Re: [Qemu-devel] [PATCH for-2.2] pci: Don't pass negative errno to 'error_set_errno()'

2014-11-12 Thread Max Reitz

On 2014-11-12 at 08:05, SeokYeon Hwang wrote:

Signed-off-by: SeokYeon Hwang 
---
  hw/pci/pcie.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 58455bd..2902f7d 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -229,7 +229,7 @@ static void pcie_cap_slot_hotplug_common(PCIDevice 
*hotplug_dev,
  /* the slot is electromechanically locked.
   * This error is propagated up to qdev and then to HMP/QMP.
   */
-error_setg_errno(errp, -EBUSY, "slot is electromechanically locked");
+error_setg_errno(errp, EBUSY, "slot is electromechanically locked");
  }
  }


Reviewed-by: Max Reitz 



Re: [Qemu-devel] [PATCH v4] error: passing a negative value to an os_errno is wrong

2014-11-12 Thread Max Reitz

On 2014-11-12 at 08:08, SeokYeon Hwang wrote:

Added 'assert(os_errno >= 0)' in 'error_set_errno()'.

Signed-off-by: SeokYeon Hwang 
---
  util/error.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/util/error.c b/util/error.c
index 2ace0d8..6c9d995 100644
--- a/util/error.c
+++ b/util/error.c
@@ -62,6 +62,7 @@ void error_set_errno(Error **errp, int os_errno, ErrorClass 
err_class,
  return;
  }
  assert(*errp == NULL);
+assert(os_errno >= 0);
  
  err = g_malloc0(sizeof(*err));
  


Reviewed-by: Max Reitz 



Re: [Qemu-devel] [PATCH 1/3] chardev: Add -qmp-pretty

2014-11-12 Thread Max Reitz

On 2014-11-12 at 05:41, Eric Blake wrote:

On 11/11/2014 06:34 AM, Max Reitz wrote:

Add a command line option for adding a QMP monitor using pretty JSON
formatting.

Signed-off-by: Max Reitz 
---
  qemu-options.hx |  8 
  vl.c| 15 ++-
  2 files changed, 18 insertions(+), 5 deletions(-)


Minor grammar suggestions:


diff --git a/qemu-options.hx b/qemu-options.hx
index da9851d..bc7b4c2 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2788,6 +2788,14 @@ STEXI
  @findex -qmp
  Like -monitor but opens in 'control' mode.
  ETEXI
+DEF("qmp-pretty", HAS_ARG, QEMU_OPTION_qmp_pretty, \
+"-qmp-pretty dev like -qmp but uses pretty JSON formatting\n",

maybe s/dev like -qmp/dev: like -qmp,/


+QEMU_ARCH_ALL)
+STEXI
+@item -qmp-pretty @var{dev}
+@findex -qmp-pretty
+Like -qmp but uses pretty JSON formatting.

maybe s/-qmp/-qmp,/



That's what I wrote at first, but then I saw the entry above: "Like 
-monitor but opens in 'control' mode" and decided to stay consistent.



Either way,
Reviewed-by: Eric Blake 


Thanks!

Max



Re: [Qemu-devel] [PATCH 2/3] iotests: _filter_qmp for pretty JSON output

2014-11-12 Thread Max Reitz

On 2014-11-12 at 05:47, Eric Blake wrote:

On 11/11/2014 06:34 AM, Max Reitz wrote:

_filter_qmp should be able to correctly filter out the QMP version
object for pretty JSON output.

Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/common.filter | 16 +++-
  1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
index 3acdb30..e24dab4 100644
--- a/tests/qemu-iotests/common.filter
+++ b/tests/qemu-iotests/common.filter
@@ -165,9 +165,23 @@ _filter_qemu()
  # replace problematic QMP output like timestamps
  _filter_qmp()
  {
+discard=0
+
  _filter_win32 | \
  sed -e 's#\("\(micro\)\?seconds": \)[0-9]\+#\1 TIMESTAMP#g' \
--e 's#^{"QMP":.*}$#QMP_VERSION#'
+-e 's#^{"QMP":.*}$#QMP_VERSION#' \
+-e 's#\\##g' | \
+while IFS='' read line; do
+if [[ $line == '"QMP": {' ]]; then

Good that this is a /bin/bash script and not /bin/sh :)


Ah, right, yes, I just copied the code I had written for filtering out 
the image-specific information from the qemu-img info output.



But - is it really worth doing this in shell?  Why not just do it in sed?


Because I don't know sed well enough. ;-)


sed -e ... \
 -e 's#\\##g' \
 -e '/"QMP": {/,/}/ c\' \
 -e 'QMP_VERSION'


Will do, thanks.

Max



Re: [Qemu-devel] [RFC PATCH] virtio-mmio: support for multiple irqs

2014-11-12 Thread Shannon Zhao
On 2014/11/11 23:11, Pawel Moll wrote:
> On Tue, 2014-11-04 at 09:35 +, Shannon Zhao wrote:
>> As the current virtio-mmio only support single irq,
>> so some advanced features such as vhost-net with irqfd
>> are not supported. And the net performance is not
>> the best without vhost-net and irqfd supporting.
> 
> Could you, please, help understanding me where does the main issue is?
> Is it about:
> 
> 1. The fact that the existing implementation blindly kicks all queues,
> instead only of the updated ones?
> 
> or:
> 
> 2. Literally having a dedicated interrupt line (remember, we're talking
> "real" interrupts here, not message signalled ones) per queue, so they
> can be handled by different processors at the same time?
> 

The main issue is that current virtio-mmio only support one interrupt which is 
shared by
config and queues. Therefore the virtio-mmio driver should read the
"VIRTIO_MMIO_INTERRUPT_STATUS" to get the interrupt reason and check whom this 
interrupt is to.

If we use vhost-net which uses irqfd to inject interrupt, the vhost-net doesn't 
update
"VIRTIO_MMIO_INTERRUPT_STATUS", then the guest driver can't read the interrupt 
reason and
doesn't call a handler to process.

So we can assign a dedicated interrupt line per queue for virtio-mmio and it 
can work with
irqfd.

> Now, if it's only about 1, the simplest solution would be to extend the
> VIRTIO_MMIO_INTERRUPT_STATUS register to signal up to 30 queues
> "readiness" in bits 2-31, still keeping bit 0 as a "combined"
> VIRTIO_MMIO_INT_VRING. In case when VIRTIO_MMIO_INT_VRING is set and
> none of the "individual" bits is (a device which doesn't support this
> feature or one that has more than 30 queues and of of those is ready) we
> would fall back to the original "kick all queues" approach. This could
> be a useful (and pretty simple) extension. In the worst case scenario it
> could be a post-1.0 standard addition, as it would provide backward
> compatibility.
> 
> However, if it's about 2, we're talking larger changes here. From the
> device perspective, we can define it as having per-queue (plus one for
> config) interrupt output *and* a "combined" output, being simple logical
> "or" of all the others. Then, the Device Tree bindings would be used to
> express the implementation choices (I'd keep the kernel parameter
> approach supporting the single interrupt case only). This is a very
> popular and well understood approach for memory mapped peripherals (for
> example, see the . It allows the system integrator to make a decision
> when it's coming to latency vs number interrupt lines trade off. The
> main issue is that we can't really impose a limit on a number of queues,
> therefore on a number of interrupts. This would require adding a new
> "interrupt acknowledge" register, which would take a number of the queue
> (or a symbolic value for the config one) instead of a bit mask. And I

Yes, maybe should add a new "interrupt acknowledge" register for backend and 
frontend to
consult the number of queues.

> must say that I'm not enjoying the idea of such substantial change to
> the specification that late in the process... (in other words: you'll
> have to put extra effort into convincing me :-)
> 
>> This patch support virtio-mmio to request multiple
>> irqs like virtio-pci. With this patch and qemu assigning
>> multiple irqs for virtio-mmio device, it's ok to use
>> vhost-net with irqfd on arm/arm64.
> 
> Could you please tell me how many queues (interrupts) are we talking
> about in this case? 5? A dozen? Hundreds?
> 

Theoretically the number of interrupts has no limit, but as the limit of ARM 
interrupt line,
the number should  be less than ARM interrupt lines. In the real situation, I 
think, the number
is generally less than 17 (8 pairs of vring interrupts and one config 
interrupt).

> Disclaimer: I have no personal experience with virtio and network (due
> to the fact how our Fast Models are implemented, I mostly us block
> devices and 9p protocol over virtio and I get enough performance from
> them :-).
> 
>> As arm doesn't support msi-x now, 
> 
> To be precise: "ARM" does "support" MSI-X :-) (google for GICv2m)

Sorry, I mean ARM with GICv2.
> 
> The correct statement would be: "normal memory mapped devices have no
> interface for message signalled interrupts (like MSI-X)"
> 
Yes, that's right.

>> we use GSI for multiple irq. 
> 
> I'm not sure what GSI stands for, but looking at the code I assume it's
> just a "normal" peripheral interrupt.
> 
>> In this patch we use "vm_try_to_find_vqs"
>> to check whether multiple irqs are supported like
>> virtio-pci.
> 
> Yeah, I can see that you have followed virtio-pci quite literally. I'm
> particularly not convinced to the one interrupt for config, one for all
> queues option. Doesn't make any sense to me here.
> 
About one interrupt for all queues, it's not a typical case. But just offer
one more choice for users. Users should configure the number of interrupts
according to th

Re: [Qemu-devel] [PATCH 3/3] iotests: Use -qmp-pretty in 067

2014-11-12 Thread Max Reitz

On 2014-11-12 at 05:51, Eric Blake wrote:

On 11/11/2014 06:34 AM, Max Reitz wrote:

067 invokes query-block, resulting in a reference output with really
long lines (which may pose a problem in email patches and always poses a
problem when the output changes, because it is hard to see what has
actually changed). Use -qmp-pretty to mitigate this issue.

Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/067 |   2 +-
  tests/qemu-iotests/067.out | 779 +
  2 files changed, 723 insertions(+), 58 deletions(-)

Large, but mechanical.  I'm trusting this one is correct (if the
testsuite still passes, you did it right) rather than actually reading it :)


Yes, for me "Passes before and the output afterwards still makes sense 
when skipping through" was enough.


Max


Reviewed-by: Eric Blake 


diff --git a/tests/qemu-iotests/067 b/tests/qemu-iotests/067
index d025192..29cd6b5 100755
--- a/tests/qemu-iotests/067
+++ b/tests/qemu-iotests/067
@@ -39,7 +39,7 @@ _supported_os Linux
  function do_run_qemu()
  {
  echo Testing: "$@"
-$QEMU -nographic -qmp stdio -serial none "$@"
+$QEMU -nographic -qmp-pretty stdio -serial none "$@"

Such a trivial change for inducing over 700 lines of change in the
expected output :)




Re: [Qemu-devel] [PATCH 11/21] iotests: Prepare for refcount_width option

2014-11-12 Thread Max Reitz

On 2014-11-11 at 18:57, Eric Blake wrote:

On 11/10/2014 06:45 AM, Max Reitz wrote:

Some tests do not work well with certain refcount widths (i.e. you
cannot create internal snapshots with refcount_width=1), so make those
widths unsupported.

Furthermore, add another filter to _filter_img_create in common.filter
which filters out the refcount_width value.

Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/007   |  4 
  tests/qemu-iotests/015   |  1 +
  tests/qemu-iotests/026   | 11 +++
  tests/qemu-iotests/029   |  1 +
  tests/qemu-iotests/051   |  1 +
  tests/qemu-iotests/058   |  1 +
  tests/qemu-iotests/067   |  7 +++
  tests/qemu-iotests/079   |  1 +
  tests/qemu-iotests/080   |  1 +
  tests/qemu-iotests/089   |  7 +++
  tests/qemu-iotests/090   |  1 +
  tests/qemu-iotests/108   |  6 ++
  tests/qemu-iotests/common.filter |  3 ++-
  13 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/tests/qemu-iotests/007 b/tests/qemu-iotests/007
index fe1a743..de39d1b 100755
--- a/tests/qemu-iotests/007
+++ b/tests/qemu-iotests/007
@@ -43,6 +43,10 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
  _supported_fmt qcow2
  _supported_proto generic
  _supported_os Linux
+# refcount_width must be at least 4 bits so we can create ten internal 
snapshots
+# (1 bit supports none, 2 bits support three, 4 bits support 15)

Feels like an off-by-one comment.  A width of 1 bit support a max
refcount of 1 (therefore no snapshots), a width of 2 bits supports a max
refcount of 3 (therefore 2 snapshots in addition to the original), a
width of 4 bits supports a max refcount of 15 (therefore only 14 snapshots).


Telling you how to correctly get to the right maximum refcount and then 
getting it wrong myself is a bit embarrassing...



+++ b/tests/qemu-iotests/067
@@ -35,6 +35,13 @@ status=1 # failure is the default!
  _supported_fmt qcow2
  _supported_proto file
  _supported_os Linux
+# Because this would change the output of query-block
+_unsupported_imgopts 'refcount_width=1[^0-9]' \
+ 'refcount_width=2[^0-9]' \
+ 'refcount_width=4[^0-9]' \
+ 'refcount_width=8[^0-9]' \
+ 'refcount_width=32[^0-9]' \
+ 'refcount_width=64[^0-9]'

It might be more compact to exploit globbing and just say:

_unsupported_imgopts 'refcount_width=?[^6]'

which leaves refcount_width=16 as the only pattern that doesn't match
the glob.  But that feels more fragile, so I can live with your longer list.


Well, maybe using ?[^6] is even better because this isn't about ruling 
out the options 1, 2, 4, 8, 32 and 64, but rather only allowing 16. 
Thus, using ?[^6] seems more explicit.



+++ b/tests/qemu-iotests/089
@@ -41,6 +41,13 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
  _supported_fmt qcow2
  _supported_proto file
  _supported_os Linux
+# Because this would change the output of qemu_io -c info
+_unsupported_imgopts 'refcount_width=1[^0-9]' \

I like how you give reasons for some tests...


+++ b/tests/qemu-iotests/090
@@ -41,6 +41,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
  _supported_fmt qcow2
  _supported_proto file nfs
  _supported_os Linux
+_unsupported_imgopts 'refcount_width=1[^0-9]'

...so why not do it for all tests?


Because the ones I didn't give reasons for are the ones I spotted first, 
so they seemed obvious to me. ;-) (I wondered the same thing when 
looking through the patches before submitting them, but decided to just 
leave it at that)


I'll add a comment for every test.


At any rate, the patch makes sense, so whether or not you tweak comments,

Reviewed-by: Eric Blake 

I'm assuming that later in the series you add a test that explicitly
covers the error message given when a refcount_order=0 (width=1) image
is attempted to be used with snapshots, since that will fail (internal
snapshots are simply not possible without a refcount that can't exceed 1).


Well, as you yourself explained, they are indeed possible if done right 
(immediate COW). But that'll go into another series.


Max



Re: [Qemu-devel] [PATCH 12/21] qcow2: Allow creation with refcount order != 4

2014-11-12 Thread Max Reitz

On 2014-11-11 at 19:05, Eric Blake wrote:

On 11/10/2014 06:45 AM, Max Reitz wrote:

Add a creation option to qcow2 for setting the refcount order of images
to be created, and respect that option's value.

This breaks some test outputs, fix them.

Signed-off-by: Max Reitz 
---
  block/qcow2.c  |  20 
  include/block/block_int.h  |   1 +
  tests/qemu-iotests/049.out | 112 ++---
  tests/qemu-iotests/079.out |  18 
  tests/qemu-iotests/082.out |  41 ++---
  tests/qemu-iotests/085.out |  38 +++
  6 files changed, 139 insertions(+), 91 deletions(-)

Is there any .json file that needs to be modified to allow this option
to be set via QMP?  I guess that can be a followup.


Good point, but I can't find any JSON object which contains a 
"lazy-refcounts" (or "lazy_refcounts") field and which seems to be 
related to image creation (if "lazy-refcounts" is somewhere not relating 
to options for opening an existing image, "refcount-width" should be 
there, too). I guess there probably is no way of directly creating an 
image over QMP yet at all.



  qemu-img create -f qcow2 TEST_DIR/t.qcow2 -- 1kilobyte
-qemu-img: Invalid image size specified! You may use k, M, G, T, P or E 
suffixes for
+qemu-img: Invalid image size specified! You may use k, M, G, T, P or E 
suffixes for
  qemu-img: kilobytes, megabytes, gigabytes, terabytes, petabytes and exabytes.

Nice that you are also getting rid of trailing whitespace.


That was actually unintentional. You seem fine with it, though, so I'll 
remove all trailing whitespace in all the test outputs I'm touching in v2.


Max


Reviewed-by: Eric Blake 





Re: [Qemu-devel] [PATCH 14/21] qcow2: Use error_report() in qcow2_amend_options()

2014-11-12 Thread Max Reitz

On 2014-11-11 at 19:11, Eric Blake wrote:

On 11/10/2014 06:45 AM, Max Reitz wrote:

Signed-off-by: Max Reitz 
---
  block/qcow2.c  | 14 ++
  tests/qemu-iotests/061.out | 14 +++---
  2 files changed, 13 insertions(+), 15 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 21a1883..beb7187 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2686,11 +2686,11 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
  } else if (!strcmp(compat, "1.1")) {
  new_version = 3;
  } else {
-fprintf(stderr, "Unknown compatibility level %s.\n", compat);
+error_report("Unknown compatibility level %s.", compat);

Not many error_report() locations include a trailing '.'


+++ b/tests/qemu-iotests/061.out
@@ -281,19 +281,19 @@ No errors were found on the image.
  === Testing invalid configurations ===
  
  Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864

-Lazy refcounts only supported with compatibility level 1.1 and above (use 
compat=1.1 or greater)
+qemu-img: Lazy refcounts only supported with compatibility level 1.1 and above 
(use compat=1.1 or greater)
  qemu-img: Error while amending options: Invalid argument
-Lazy refcounts only supported with compatibility level 1.1 and above (use 
compat=1.1 or greater)
+qemu-img: Lazy refcounts only supported with compatibility level 1.1 and above 
(use compat=1.1 or greater)
  qemu-img: Error while amending options: Invalid argument
-Unknown compatibility level 0.42.
+qemu-img: Unknown compatibility level 0.42.
  qemu-img: Error while amending options: Invalid argument
  qemu-img: Invalid parameter 'foo'
  qemu-img: Invalid options for file format 'qcow2'
-Changing the cluster size is not supported.
+qemu-img: Changing the cluster size is not supported.
  qemu-img: Error while amending options: Operation not supported
-Changing the encryption flag is not supported.
+qemu-img: Changing the encryption flag is not supported.
  qemu-img: Error while amending options: Operation not supported
-Cannot change preallocation mode.
+qemu-img: Cannot change preallocation mode.
  qemu-img: Error while amending options: Operation not supported

See - most of the messages do not end in '.'.  Probably worth cleaning
up if you respin.  But it's not a show-stopper if you leave it, so:


I'll clean it up.

Max


Reviewed-by: Eric Blake 





Re: [Qemu-devel] [PATCH 15/21] qcow2: Use abort() instead of assert(false)

2014-11-12 Thread Max Reitz

On 2014-11-11 at 19:12, Eric Blake wrote:

On 11/10/2014 06:45 AM, Max Reitz wrote:

Signed-off-by: Max Reitz 
---
  block/qcow2.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

Reviewed-by: Eric Blake 

Is it worth hoisting this one into 2.2 via the -trivial tree?


No, as explained this point can only be reached if there is some 
creation option for qcow2 images which is not handled by any of the 
branches in this function. Since there is no such thing currently in 
master and there most certainly won't be in 2.2 (thanks to hard freeze), 
it's fine to keep it out of 2.2.


Max


diff --git a/block/qcow2.c b/block/qcow2.c
index beb7187..ebf843f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2718,9 +2718,9 @@ static int qcow2_amend_options(BlockDriverState *bs, 
QemuOpts *opts,
  error_report("Cannot change refcount entry width");
  return -ENOTSUP;
  } else {
-/* if this assertion fails, this probably means a new option was
+/* if this point is reached, this probably means a new option was
   * added without having it covered here */
-assert(false);
+abort();
  }
  
  desc++;







Re: [Qemu-devel] [PATCH 05/21] qcow2: Refcount overflow and qcow2_alloc_bytes()

2014-11-12 Thread Max Reitz

On 2014-11-11 at 20:49, Eric Blake wrote:

On 11/11/2014 09:18 AM, Max Reitz wrote:


No, I was envisioning that we have a brand new image with one cluster
allocated (cluster 1 has refcount 1), then 5 times in a row we do
'savevm' to take an internal snapshot.  If I understand your code
correctly, the first two snapshots increase the refcount, so cluster 1
has a refcount of 3. Then the next snapshot can't increase the refcount,
so it instead copies the contents to cluster 2.

No, it just errors out.

qcow2_alloc_bytes() is only used for allocating space for a compressed
cluster. When taking a snapshot, update_refcount() will be called to
increase the clusters' refcounts, and that function will simply throw an
error.

That's okay for now (always better for an initial feature to be
conservative, then expand it later if there is demand).  But I wonder if
we could be made smarter in the future and auto-COW any cluster that
would otherwise exceed max refcount.  Thus, for a refcount_order=0
(width=1) image, a snapshot now doubles the size of the image (as every
single cluster would COW into a new cluster) rather than erroring out.
Food for thought; maybe worth injecting comments into this series
(whether in code or in commit messages, as appropriate) pointing out
that we thought about the future possibility even though we chose not to
allow it for now.


Ah, right, thank you. Yes, that sounds like a good idea, I'll see to it 
at some later point in time.


I think adding comments will be hard because the snapshot functions 
aren't really modified. They just try to increase the refcount and that 
may now fail earlier than it did for a refcount width of 16 bits, so 
there's no real change in behavior there, it's just that it's now 
reasonably possible to hit that case. I will add appropriate comments to 
the test case (which tests this snapshotting issue), though.


Max



Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions

2014-11-12 Thread Frank Blaschka
On Tue, Nov 11, 2014 at 04:24:24PM +0100, Alexander Graf wrote:
> 
> 
> On 11.11.14 15:08, Frank Blaschka wrote:
> > On Tue, Nov 11, 2014 at 01:51:25PM +0100, Alexander Graf wrote:
> >>
> >>
> >>
> >>> Am 11.11.2014 um 13:39 schrieb Frank Blaschka 
> >>> :
> >>>
>  On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote:
> 
> 
> > On 11.11.14 13:10, Frank Blaschka wrote:
> >> On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
> >>
> >>
> >>> On 10.11.14 15:20, Frank Blaschka wrote:
> >>> From: Frank Blaschka 
> >>>
> >>> This patch implements the s390 pci instructions in qemu. It allows
> >>> to access and drive pci devices attached to the s390 pci bus.
> >>> Because of platform constrains devices using IO BARs are not
> >>> supported. Also a device has to support MSI/MSI-X to run on s390.
> >>>
> >>> Signed-off-by: Frank Blaschka 
> >>> ---
> >>> target-s390x/Makefile.objs |   2 +-
> >>> target-s390x/kvm.c |  52 
> >>> target-s390x/pci_ic.c  | 753 
> >>> +
> >>> target-s390x/pci_ic.h  | 335 
> >>> 4 files changed, 1141 insertions(+), 1 deletion(-)
> >>> create mode 100644 target-s390x/pci_ic.c
> >>> create mode 100644 target-s390x/pci_ic.h
> >>>
> 
>  [...]
> 
> >>> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
> >>> +{
> >>> +CPUS390XState *env = &cpu->env;
> >>> +S390PCIBusDevice *pbdev;
> >>> +uint8_t r1 = (run->s390_sieic.ipb & 0x00f0) >> 20;
> >>> +uint8_t r2 = (run->s390_sieic.ipb & 0x000f) >> 16;
> >>> +PciLgStg *rp;
> >>> +uint64_t offset;
> >>> +uint64_t data;
> >>> +uint8_t len;
> >>> +
> >>> +cpu_synchronize_state(CPU(cpu));
> >>> +
> >>> +if (env->psw.mask & PSW_MASK_PSTATE) {
> >>> +program_interrupt(env, PGM_PRIVILEGED, 4);
> >>> +return 0;
> >>> +}
> >>> +
> >>> +if (r2 & 0x1) {
> >>> +program_interrupt(env, PGM_SPECIFICATION, 4);
> >>> +return 0;
> >>> +}
> >>> +
> >>> +rp = (PciLgStg *)&env->regs[r2];
> >>> +offset = env->regs[r2 + 1];
> >>> +
> >>> +pbdev = s390_pci_find_dev_by_fh(rp->fh);
> >>> +if (!pbdev) {
> >>> +DPRINTF("pcilg no pci dev\n");
> >>> +setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
> >>> +return 0;
> >>> +}
> >>> +
> >>> +len = rp->len & 0xF;
> >>> +if (rp->pcias < 6) {
> >>> +if ((8 - (offset & 0x7)) < len) {
> >>> +program_interrupt(env, PGM_OPERAND, 4);
> >>> +return 0;
> >>> +}
> >>> +MemoryRegion *mr = pbdev->pdev->io_regions[rp->pcias].memory;
> >>> +io_mem_read(mr, offset, &data, len);
> >>> +} else if (rp->pcias == 15) {
> >>> +if ((4 - (offset & 0x3)) < len) {
> >>> +program_interrupt(env, PGM_OPERAND, 4);
> >>> +return 0;
> >>> +}
> >>> +data =  pci_host_config_read_common(
> >>> +   pbdev->pdev, offset, 
> >>> pci_config_size(pbdev->pdev), len);
> >>> +
> >>> +switch (len) {
> >>> +case 1:
> >>> +break;
> >>> +case 2:
> >>> +data = cpu_to_le16(data);
> >>> +break;
> >>> +case 4:
> >>> +data = cpu_to_le32(data);
> >>> +break;
> >>> +case 8:
> >>> +data = cpu_to_le64(data);
> >>> +break;
> >>
> >> Why? Also, this is wrong. cpu_to_le64 convert between host endianness
> >> and LE. So if you're running this on an LE host, you won't swap the
> >> value and get a broken result.
> >>
> >> If you know that the value is always swapped, use bswapxx().
> >>
> >
> > Actually the code is right and required for a big endian host :-)
> > pcilg/pcistg provide access to the PCI config space which is defined
> > as PCI byte order (little endian). Since pci_host_config_read_common 
> > does
> > already a le to cpu conversion we have to convert back to PCI byte 
> > order.
> > Doing an unconditional swap would be a bug on a little endian host.
> 
>  Why would it be a bug? The value you end up writing is contents of a
>  register and thus doesn't have endianness. So if QEMU was an LE process,
> >>>
> >>> No, the s390 guest executing pcilg instruction expects to receive config 
> >>> space data
> >>> in PCI byte order.
> >>>
>  the value of data would be identical as on a BE QEMU before your swab.
>  After the swab, it would be bswap'ed on BE, but not LE. So LE hosts 
>  break.
> 
> >>>
> >>> Again on BE endian host we do the swap because of 

Re: [Qemu-devel] [PATCH 21/21] iotests: Add test for different refcount widths

2014-11-12 Thread Max Reitz

On 2014-11-11 at 20:53, Eric Blake wrote:

On 11/10/2014 06:45 AM, Max Reitz wrote:

Add a test for conversion between different refcount widths and errors
specific to certain widths (i.e. snapshots with refcount_width=1).

Signed-off-by: Max Reitz 
---
  tests/qemu-iotests/112 | 225 +
  tests/qemu-iotests/112.out | 123 +
  tests/qemu-iotests/group   |   1 +
  3 files changed, 349 insertions(+)
  create mode 100755 tests/qemu-iotests/112
  create mode 100644 tests/qemu-iotests/112.out

+
+# This tests qocw2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux

Might work on more than just Linux, but then again, it's probably worth
scrubbing the whole testsuite for situations like that, so don't worry
about it here.


+# This test will set refcount_width on its own which would conflict with the
+# manual setting; compat will be overridden as well
+_unsupported_imgopts refcount_width 'compat=0.10'
+
+function print_refcount_width()
+{
+$QEMU_IMG info "$TEST_IMG" | grep 'refcount width:' | sed -e 's/^ *//'

grep|sed is almost always a waste.  This is equivalent:

$QEMU_IMG info "$TEST_IMG" | sed -n '/refcount width:/ s/^ *//p'


As said before, I just don't know sed well enough. My knowledge of sed 
is "you can use it for regex replacement with -e", and that's about it. 
Oh, and "you can do it in-place with -i" and "someone wrote Sokoban in 
sed". And maybe even "sed better fits the term Standard EDitor than ed 
does".


But thanks a lot for telling me, I'm just always afraid to learn sed 
because it seems even more unreadable than perl to me...



+echo
+echo '=== Snapshot limit on refcount_width=1 ==='
+echo
+
+IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
+print_refcount_width
+
+$QEMU_IO -c 'write 0 512' "$TEST_IMG" | _filter_qemu_io
+
+# Should fail
+$QEMU_IMG snapshot -c foo "$TEST_IMG"
+
+# The new L1 table could/shoud be leaked

s/shoud/should/


Right.


+_check_test_img
+
+echo
+echo '=== Snapshot limit on refcount_width=2 ==='
+echo
+
+IMGOPTS="$IMGOPTS,refcount_width=2" _make_test_img 64M
+print_refcount_width
+
+$QEMU_IO -c 'write 0 512' "$TEST_IMG" | _filter_qemu_io
+
+# Should succeed
+$QEMU_IMG snapshot -c foo "$TEST_IMG"
+$QEMU_IMG snapshot -c bar "$TEST_IMG"
+# Should fail (4th reference)
+$QEMU_IMG snapshot -c baz "$TEST_IMG"
+
+# The new L1 table could/shoud be leaked

again


yyp is dangerous.


+echo
+echo '=== Amend with snapshot ==='
+echo
+
+$QEMU_IMG snapshot -c foo "$TEST_IMG"
+# Just to have different refcounts across the image
+$QEMU_IO -c 'write 0 16M' "$TEST_IMG" | _filter_qemu_io
+
+# Should not work
+$QEMU_IMG amend -o refcount_width=1 "$TEST_IMG"
+_check_test_img
+print_refcount_width

This matches your initial implementation. Someday, though, we may decide
to auto-COW any overflowed cluster, and thus allow the conversion to
succeed.  Worth a comment?


Yes, will do.


+echo '=== Testing too many references for check ==='
+echo
+
+IMGOPTS="$IMGOPTS,refcount_width=1" _make_test_img 64M
+print_refcount_width
+
+# This cluster should be created at 0x5
+$QEMU_IO -c 'write 0 64k' "$TEST_IMG" | _filter_qemu_io
+# Now make the second L2 entriy (the L2 table should be at 0x4) point to

s/entriy/entry/


I think this happened because at one point in time it said something 
about "L2 entries".



+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0

Overall a nice set of tests!



+=== Snapshot limit on refcount_width=1 ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=67108864
+refcount width: 1
+wrote 512/512 bytes at offset 0
+512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+qemu-img: Could not create snapshot 'foo': -22 (Invalid argument)
+Leaked cluster 6 refcount=1 reference=0

Bummer that the error message did not state WHY (because a cluster would
overflow refcounts), but I'm not sure how hard it would be to make that
better, and at least we correctly errored out.


Yes, I know, it's a really bad error. The problem is that no Error 
object is used in that path at all so it will be rather cumbersome, but 
I'll look into it one more time.


Max



Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions

2014-11-12 Thread Alexander Graf


On 12.11.14 09:49, Frank Blaschka wrote:
> On Tue, Nov 11, 2014 at 04:24:24PM +0100, Alexander Graf wrote:
>>
>>
>> On 11.11.14 15:08, Frank Blaschka wrote:
>>> On Tue, Nov 11, 2014 at 01:51:25PM +0100, Alexander Graf wrote:



> Am 11.11.2014 um 13:39 schrieb Frank Blaschka 
> :
>
>> On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote:
>>
>>
>>> On 11.11.14 13:10, Frank Blaschka wrote:
 On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:


> On 10.11.14 15:20, Frank Blaschka wrote:
> From: Frank Blaschka 
>
> This patch implements the s390 pci instructions in qemu. It allows
> to access and drive pci devices attached to the s390 pci bus.
> Because of platform constrains devices using IO BARs are not
> supported. Also a device has to support MSI/MSI-X to run on s390.
>
> Signed-off-by: Frank Blaschka 
> ---
> target-s390x/Makefile.objs |   2 +-
> target-s390x/kvm.c |  52 
> target-s390x/pci_ic.c  | 753 
> +
> target-s390x/pci_ic.h  | 335 
> 4 files changed, 1141 insertions(+), 1 deletion(-)
> create mode 100644 target-s390x/pci_ic.c
> create mode 100644 target-s390x/pci_ic.h
>
>>
>> [...]
>>
> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
> +{
> +CPUS390XState *env = &cpu->env;
> +S390PCIBusDevice *pbdev;
> +uint8_t r1 = (run->s390_sieic.ipb & 0x00f0) >> 20;
> +uint8_t r2 = (run->s390_sieic.ipb & 0x000f) >> 16;
> +PciLgStg *rp;
> +uint64_t offset;
> +uint64_t data;
> +uint8_t len;
> +
> +cpu_synchronize_state(CPU(cpu));
> +
> +if (env->psw.mask & PSW_MASK_PSTATE) {
> +program_interrupt(env, PGM_PRIVILEGED, 4);
> +return 0;
> +}
> +
> +if (r2 & 0x1) {
> +program_interrupt(env, PGM_SPECIFICATION, 4);
> +return 0;
> +}
> +
> +rp = (PciLgStg *)&env->regs[r2];
> +offset = env->regs[r2 + 1];
> +
> +pbdev = s390_pci_find_dev_by_fh(rp->fh);
> +if (!pbdev) {
> +DPRINTF("pcilg no pci dev\n");
> +setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
> +return 0;
> +}
> +
> +len = rp->len & 0xF;
> +if (rp->pcias < 6) {
> +if ((8 - (offset & 0x7)) < len) {
> +program_interrupt(env, PGM_OPERAND, 4);
> +return 0;
> +}
> +MemoryRegion *mr = pbdev->pdev->io_regions[rp->pcias].memory;
> +io_mem_read(mr, offset, &data, len);
> +} else if (rp->pcias == 15) {
> +if ((4 - (offset & 0x3)) < len) {
> +program_interrupt(env, PGM_OPERAND, 4);
> +return 0;
> +}
> +data =  pci_host_config_read_common(
> +   pbdev->pdev, offset, 
> pci_config_size(pbdev->pdev), len);
> +
> +switch (len) {
> +case 1:
> +break;
> +case 2:
> +data = cpu_to_le16(data);
> +break;
> +case 4:
> +data = cpu_to_le32(data);
> +break;
> +case 8:
> +data = cpu_to_le64(data);
> +break;

 Why? Also, this is wrong. cpu_to_le64 convert between host endianness
 and LE. So if you're running this on an LE host, you won't swap the
 value and get a broken result.

 If you know that the value is always swapped, use bswapxx().

>>>
>>> Actually the code is right and required for a big endian host :-)
>>> pcilg/pcistg provide access to the PCI config space which is defined
>>> as PCI byte order (little endian). Since pci_host_config_read_common 
>>> does
>>> already a le to cpu conversion we have to convert back to PCI byte 
>>> order.
>>> Doing an unconditional swap would be a bug on a little endian host.
>>
>> Why would it be a bug? The value you end up writing is contents of a
>> register and thus doesn't have endianness. So if QEMU was an LE process,
>
> No, the s390 guest executing pcilg instruction expects to receive config 
> space data
> in PCI byte order.
>
>> the value of data would be identical as on a BE QEMU before your swab.
>> After the swab, it would be bswap'ed on BE, but not LE. So LE hosts 
>> break.
>>
>
> Again

Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Claudio Fontana
On 11.11.2014 16:29, Mark Rutland wrote:
> Hi,
> 
> On Thu, Nov 06, 2014 at 01:33:20PM +, Alexander Spyridakis wrote:
>> On 6 November 2014 14:44, Peter Maydell  wrote:
>>>
>>>
 We need ACPI guest support in QEMU for AArch64 over here, with all features
 (including the ability to run ACPI code and add specific tables), for
 ACPI-based guests.
>>>
>>> The plan for providing ACPI to guests is that we run a UEFI BIOS
>>> blob which is what is responsible for providing ACPI and UEFI
>>> runtime services to guests which need them. (The UEFI blob finds
>>> out about its hardware by looking at a device tree that QEMU
>>> passes it, but that's a detail between QEMU and its bios blob).
>>> This pretty much looks like what x86 QEMU used to do with ACPI
>>> for a very long time, so we know it's a feasible approach.
>>
>> Hi Peter,
>>
>> The rational in the proposed approach is meant for cases where the
>> user does not want to rely on external firmware layers. While UEFI
>> could do what you are describing, the point is to avoid this not so
>> trivial overhead in the booting process. Especially in the case of
>> thin guests, where another software dependency is undesired.
> 
> I'm not sure how you plan to use ACPI without UEFI, as there are several
> pieces of information which ACPI misses, such as the memory map, which
> must be discovered from UEFI. How do you intend to discover the memory
> map without UEFI?
> 
> Additionally, with Linux and other generic OSs, the expectation is that
> the ACPI tables are discovered via the UEFI system table. How do you
> intend to discover the ACPI tables? Or other system information?
> 
> From experience with Linux, querying this information from UEFI is a
> trivial overhead, though a UEFI implementation might take a while to
> boot to the point where that is possible. It would be more generally
> helpful to have an optimized virtualised UEFI for this case (or perhaps
> just a UEFI frontend that presents the same interface to EFI
> applications but doesn't have to do any heavy lifting at boot).
> 
> So far the general trend with AArch64 at the system level is to use
> generic interfaces as far as possible. The generic interface for
> discovering ACPI tables is to boot as an EFI application and then to
> query the tables from UEFI. That is the interface others are likely to
> follow, and ACPI without UEFI is unlikely to be of much use to anyone
> else.
> 
> Why is it worth expending the effort on the boot protocol you suggest
> (which so far is not well defined) when there is already a portable,
> well-defined standard that others are already following?
> 
> Thanks,
> Mark.
> 

Hi,

I tend to mostly agree with this, we might look for alternative solutions for
speeding up guest startup in the future but in general if getting ACPI in the
guest for ARM64 requires also getting UEFI, then I can personally live with 
that,
especially if we strive to have the kind of optimized virtualized UEFI you 
mention.

We can in the meantime use these patches to test the "fast path" to the guest
by just hardcoding or passing on the command line what is needed to be able to 
test
information reading from ACPI from the guest side.

I cc: my colleagues Jani and Paul which have more the use case in mind and
might correct me there.

For x86 though what is the state of UEFI in QEMU? Is it relying on the OVMF
project to provide the firmware images if I understand correctly..
How is it working out in practice?
Should the same kind of approach be taken for ARM64?

As mentioned by others, I'd rather see an implementation of ACPI in QEMU which
learns from the experience of X86 (and possibly shares some code if possible),
rather than going in a different direction by creating device trees first,
and then converting them to ACPI tables somewhere in the firmware, just because
device trees are "already there", for the reasons which have already been
mentioned before by Igor and others.

I wouldn't want for ACPI to be "sort of" supported in QEMU, but with a limited
functionality which makes it not fully useful in practice. I'd rather see it
as a first class citizen instead, including the ability to run AML code.

Thanks,

Claudio



Re: [Qemu-devel] [PATCH 17/21] qcow2: Use intermediate helper CB for amend

2014-11-12 Thread Max Reitz

On 2014-11-11 at 22:05, Eric Blake wrote:

On 11/10/2014 06:45 AM, Max Reitz wrote:

If there is more than one time-consuming operation to be performed for
qcow2_amend_options(), we need an intermediate CB which coordinates the
progress of the individual operations and passes the result to the
original status callback.

Signed-off-by: Max Reitz 
---
  block/qcow2.c | 76 ++-
  1 file changed, 75 insertions(+), 1 deletion(-)

Getting trickier to review.


Yes, and 18 is the worst.


diff --git a/block/qcow2.c b/block/qcow2.c
index eaef251..e6b93d1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2655,6 +2655,71 @@ static int qcow2_downgrade(BlockDriverState *bs, int 
target_version,
  return 0;
  }
  
+typedef enum Qcow2AmendOperation {

+/* This is the value Qcow2AmendHelperCBInfo::last_operation will be
+ * statically initialized to so that the helper CB can discern the first
+ * invocation from an operation change */
+QCOW2_NO_OPERATION = 0,
+
+QCOW2_DOWNGRADING,
+} Qcow2AmendOperation;

So for this patch, you still have just one operation, but later in the
series, you add a second (and the goal of THIS patch is that it will
work even if there are 3 or more operations, even though this series
doesn't add that many).


Right.


+static void qcow2_amend_helper_cb(BlockDriverState *bs, int64_t offset,
+ int64_t total_work_size, void *opaque)

indentation looks off


Right, will fix.


+{
+Qcow2AmendHelperCBInfo *info = opaque;
+int64_t current_work_size;
+int64_t projected_work_size;

Worth asserting that info->total_operations is non-zero?  Or is there
ever a valid case for calling the callback even when there are no
sub-operations, and therefore we are automatically complete (offset ==
total_work_size)?


No, the driver does not have to call the status CB; qemu-img amend will 
mind that case by first printing 0 %, then invoking the amend operation, 
and then printing 100 %. Will add an assertion.



+
+if (info->current_operation != info->last_operation) {
+if (info->last_operation != QCOW2_NO_OPERATION) {
+info->offset_completed += info->last_work_size;
+info->operations_completed++;
+}

Would it be any easier to guarantee that we come to 100% completion by
requiring the coordinator to pass a final completion callback? [1]
  info->current_operation = QCOW2_NO_OPERATION;
  cb(bs, 0, 0, info)


No, because the amend CB does not have to be called on completion; 
img_amend() in qemu-img.c takes care of that case.



+
+info->last_operation = info->current_operation;
+}
+
+info->last_work_size = total_work_size;

Took me a while to realize that total_work_size is the incoming
(estimated) total size for the current sub-operation, and not the total
over the combination of all sub-operations...


Ah, right, I'll change the variable name, maybe to operation_work_size 
or something like that.



+
+current_work_size = info->offset_completed + total_work_size;
+
+/* current_work_size is the total work size for (operations_completed + 1)

but this comment helped.


+ * operations (which includes this one), so multiply it by the number of
+ * operations not covered and divide it by the number of operations
+ * covered to get a projection for the operations not covered */
+projected_work_size = current_work_size * (info->total_operations -
+   info->operations_completed - 1)
+/ (info->operations_completed + 1);

So, when there is just one sub-operation (which is the case until later
patches add a second), this results in the following calculation for ALL
calls during the intermediate steps of the sub-operation:

projected_work_size = total_work_size * (1 - 0 - 1) / (0 + 1)

that is, we are projecting 0 additional work because we have zero
additional stages to complete.  Am I correct that we will never enter
the callback in a state where
info->operations_completed==info->total_operations?


Yes, we won't.


(because if we do,
you'd have a computation of final_size * (1 - 1 - 1) / (1 + 1) which
looks weird).  Worth an assert()?


assert()s are always worth it.


Then again, my proposal above [1] to
guarantee a 100% completion by use of a final cleanup callback would
indeed reach the point where operations_completed==total_operations.


+
+info->original_status_cb(bs, info->offset_completed + offset,
+ current_work_size + projected_work_size,
+ info->original_cb_opaque);

So, as long as we don't add a second phase, this is strictly equivalent
to calling the original callback with the original offset (since
info->offset_completed remains 0) and original work size (since
projected_work_size remains 0).  That part works fine.

Let's see what happens if we had three phases.

Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 10:08, Alexander Graf wrote:
> 
> 
> On 12.11.14 09:49, Frank Blaschka wrote:
>> On Tue, Nov 11, 2014 at 04:24:24PM +0100, Alexander Graf wrote:
>>>
>>>
>>> On 11.11.14 15:08, Frank Blaschka wrote:
 On Tue, Nov 11, 2014 at 01:51:25PM +0100, Alexander Graf wrote:
>
>
>
>> Am 11.11.2014 um 13:39 schrieb Frank Blaschka 
>> :
>>
>>> On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote:
>>>
>>>
 On 11.11.14 13:10, Frank Blaschka wrote:
> On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
>
>
>> On 10.11.14 15:20, Frank Blaschka wrote:
>> From: Frank Blaschka 
>>
>> This patch implements the s390 pci instructions in qemu. It allows
>> to access and drive pci devices attached to the s390 pci bus.
>> Because of platform constrains devices using IO BARs are not
>> supported. Also a device has to support MSI/MSI-X to run on s390.
>>
>> Signed-off-by: Frank Blaschka 
>> ---
>> target-s390x/Makefile.objs |   2 +-
>> target-s390x/kvm.c |  52 
>> target-s390x/pci_ic.c  | 753 
>> +
>> target-s390x/pci_ic.h  | 335 
>> 4 files changed, 1141 insertions(+), 1 deletion(-)
>> create mode 100644 target-s390x/pci_ic.c
>> create mode 100644 target-s390x/pci_ic.h
>>
>>>
>>> [...]
>>>
>> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
>> +{
>> +CPUS390XState *env = &cpu->env;
>> +S390PCIBusDevice *pbdev;
>> +uint8_t r1 = (run->s390_sieic.ipb & 0x00f0) >> 20;
>> +uint8_t r2 = (run->s390_sieic.ipb & 0x000f) >> 16;
>> +PciLgStg *rp;
>> +uint64_t offset;
>> +uint64_t data;
>> +uint8_t len;
>> +
>> +cpu_synchronize_state(CPU(cpu));
>> +
>> +if (env->psw.mask & PSW_MASK_PSTATE) {
>> +program_interrupt(env, PGM_PRIVILEGED, 4);
>> +return 0;
>> +}
>> +
>> +if (r2 & 0x1) {
>> +program_interrupt(env, PGM_SPECIFICATION, 4);
>> +return 0;
>> +}
>> +
>> +rp = (PciLgStg *)&env->regs[r2];
>> +offset = env->regs[r2 + 1];
>> +
>> +pbdev = s390_pci_find_dev_by_fh(rp->fh);
>> +if (!pbdev) {
>> +DPRINTF("pcilg no pci dev\n");
>> +setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
>> +return 0;
>> +}
>> +
>> +len = rp->len & 0xF;
>> +if (rp->pcias < 6) {
>> +if ((8 - (offset & 0x7)) < len) {
>> +program_interrupt(env, PGM_OPERAND, 4);
>> +return 0;
>> +}
>> +MemoryRegion *mr = 
>> pbdev->pdev->io_regions[rp->pcias].memory;
>> +io_mem_read(mr, offset, &data, len);
>> +} else if (rp->pcias == 15) {
>> +if ((4 - (offset & 0x3)) < len) {
>> +program_interrupt(env, PGM_OPERAND, 4);
>> +return 0;
>> +}
>> +data =  pci_host_config_read_common(
>> +   pbdev->pdev, offset, 
>> pci_config_size(pbdev->pdev), len);
>> +
>> +switch (len) {
>> +case 1:
>> +break;
>> +case 2:
>> +data = cpu_to_le16(data);
>> +break;
>> +case 4:
>> +data = cpu_to_le32(data);
>> +break;
>> +case 8:
>> +data = cpu_to_le64(data);
>> +break;
>
> Why? Also, this is wrong. cpu_to_le64 convert between host endianness
> and LE. So if you're running this on an LE host, you won't swap the
> value and get a broken result.
>
> If you know that the value is always swapped, use bswapxx().
>

 Actually the code is right and required for a big endian host :-)
 pcilg/pcistg provide access to the PCI config space which is defined
 as PCI byte order (little endian). Since pci_host_config_read_common 
 does
 already a le to cpu conversion we have to convert back to PCI byte 
 order.
 Doing an unconditional swap would be a bug on a little endian host.
>>>
>>> Why would it be a bug? The value you end up writing is contents of a
>>> register and thus doesn't have endianness. So if QEMU was an LE process,
>>
>> No, the s390 guest executing pcilg instruction expects to receive config 
>> space data
>> in PCI byte order.
>>

Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions

2014-11-12 Thread Alexander Graf


On 12.11.14 10:11, Paolo Bonzini wrote:
> 
> 
> On 12/11/2014 10:08, Alexander Graf wrote:
>>
>>
>> On 12.11.14 09:49, Frank Blaschka wrote:
>>> On Tue, Nov 11, 2014 at 04:24:24PM +0100, Alexander Graf wrote:


 On 11.11.14 15:08, Frank Blaschka wrote:
> On Tue, Nov 11, 2014 at 01:51:25PM +0100, Alexander Graf wrote:
>>
>>
>>
>>> Am 11.11.2014 um 13:39 schrieb Frank Blaschka 
>>> :
>>>
 On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote:


> On 11.11.14 13:10, Frank Blaschka wrote:
>> On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
>>
>>
>>> On 10.11.14 15:20, Frank Blaschka wrote:
>>> From: Frank Blaschka 
>>>
>>> This patch implements the s390 pci instructions in qemu. It allows
>>> to access and drive pci devices attached to the s390 pci bus.
>>> Because of platform constrains devices using IO BARs are not
>>> supported. Also a device has to support MSI/MSI-X to run on s390.
>>>
>>> Signed-off-by: Frank Blaschka 
>>> ---
>>> target-s390x/Makefile.objs |   2 +-
>>> target-s390x/kvm.c |  52 
>>> target-s390x/pci_ic.c  | 753 
>>> +
>>> target-s390x/pci_ic.h  | 335 
>>> 4 files changed, 1141 insertions(+), 1 deletion(-)
>>> create mode 100644 target-s390x/pci_ic.c
>>> create mode 100644 target-s390x/pci_ic.h
>>>

 [...]

>>> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
>>> +{
>>> +CPUS390XState *env = &cpu->env;
>>> +S390PCIBusDevice *pbdev;
>>> +uint8_t r1 = (run->s390_sieic.ipb & 0x00f0) >> 20;
>>> +uint8_t r2 = (run->s390_sieic.ipb & 0x000f) >> 16;
>>> +PciLgStg *rp;
>>> +uint64_t offset;
>>> +uint64_t data;
>>> +uint8_t len;
>>> +
>>> +cpu_synchronize_state(CPU(cpu));
>>> +
>>> +if (env->psw.mask & PSW_MASK_PSTATE) {
>>> +program_interrupt(env, PGM_PRIVILEGED, 4);
>>> +return 0;
>>> +}
>>> +
>>> +if (r2 & 0x1) {
>>> +program_interrupt(env, PGM_SPECIFICATION, 4);
>>> +return 0;
>>> +}
>>> +
>>> +rp = (PciLgStg *)&env->regs[r2];
>>> +offset = env->regs[r2 + 1];
>>> +
>>> +pbdev = s390_pci_find_dev_by_fh(rp->fh);
>>> +if (!pbdev) {
>>> +DPRINTF("pcilg no pci dev\n");
>>> +setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
>>> +return 0;
>>> +}
>>> +
>>> +len = rp->len & 0xF;
>>> +if (rp->pcias < 6) {
>>> +if ((8 - (offset & 0x7)) < len) {
>>> +program_interrupt(env, PGM_OPERAND, 4);
>>> +return 0;
>>> +}
>>> +MemoryRegion *mr = 
>>> pbdev->pdev->io_regions[rp->pcias].memory;
>>> +io_mem_read(mr, offset, &data, len);
>>> +} else if (rp->pcias == 15) {
>>> +if ((4 - (offset & 0x3)) < len) {
>>> +program_interrupt(env, PGM_OPERAND, 4);
>>> +return 0;
>>> +}
>>> +data =  pci_host_config_read_common(
>>> +   pbdev->pdev, offset, 
>>> pci_config_size(pbdev->pdev), len);
>>> +
>>> +switch (len) {
>>> +case 1:
>>> +break;
>>> +case 2:
>>> +data = cpu_to_le16(data);
>>> +break;
>>> +case 4:
>>> +data = cpu_to_le32(data);
>>> +break;
>>> +case 8:
>>> +data = cpu_to_le64(data);
>>> +break;
>>
>> Why? Also, this is wrong. cpu_to_le64 convert between host endianness
>> and LE. So if you're running this on an LE host, you won't swap the
>> value and get a broken result.
>>
>> If you know that the value is always swapped, use bswapxx().
>>
>
> Actually the code is right and required for a big endian host :-)
> pcilg/pcistg provide access to the PCI config space which is defined
> as PCI byte order (little endian). Since pci_host_config_read_common 
> does
> already a le to cpu conversion we have to convert back to PCI byte 
> order.
> Doing an unconditional swap would be a bug on a little endian host.

 Why would it be a bug? The value you end up writing is contents of a
 register and thus doesn't have endianness. So 

Re: [Qemu-devel] [PATCH 2/3] vl: sanity check cpu topology

2014-11-12 Thread Andrew Jones
On Tue, Nov 11, 2014 at 04:31:24PM -0200, Eduardo Habkost wrote:
> On Tue, Nov 11, 2014 at 03:37:11PM +0100, Andrew Jones wrote:
> [...]
> > Below is a v2 I can post if it looks good to you.
> > 
> > From: Andrew Jones 
> > Date: Fri, 7 Nov 2014 15:45:07 +0100
> > Subject: [PATCH v2] vl: sanity check cpu topology
> > 
> > smp_parse allows partial or complete cpu topology to be given.
> > In either case there may be inconsistencies in the input which
> > are currently not sounding any alarms. In some cases the input
> > is even being silently corrected. Stop silently adjusting input
> > and abort when the complete cpu topology has been input, but
> > isn't correct.
> > 
> > Signed-off-by: Andrew Jones 
> 
> After applying this patch:
> 
>   $ ./install/bin/qemu-system-x86_64 -smp 12
>   cpu topology: error: sockets (1) * cores (1) * threads (1) < smp_cpus (12)
> 
> That is why I wanted to address the most obvious (and less risky) issues first
> (aborting only if all options were explicitly set), and touch automatic
> calculation later.
>

Oh right. I fixed that once, but then lost the change when trying to
produce these half fixes.

drew



Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions

2014-11-12 Thread Frank Blaschka
On Wed, Nov 12, 2014 at 10:08:19AM +0100, Alexander Graf wrote:
> 
> 
> On 12.11.14 09:49, Frank Blaschka wrote:
> > On Tue, Nov 11, 2014 at 04:24:24PM +0100, Alexander Graf wrote:
> >>
> >>
> >> On 11.11.14 15:08, Frank Blaschka wrote:
> >>> On Tue, Nov 11, 2014 at 01:51:25PM +0100, Alexander Graf wrote:
> 
> 
> 
> > Am 11.11.2014 um 13:39 schrieb Frank Blaschka 
> > :
> >
> >> On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote:
> >>
> >>
> >>> On 11.11.14 13:10, Frank Blaschka wrote:
>  On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
> 
> 
> > On 10.11.14 15:20, Frank Blaschka wrote:
> > From: Frank Blaschka 
> >
> > This patch implements the s390 pci instructions in qemu. It allows
> > to access and drive pci devices attached to the s390 pci bus.
> > Because of platform constrains devices using IO BARs are not
> > supported. Also a device has to support MSI/MSI-X to run on s390.
> >
> > Signed-off-by: Frank Blaschka 
> > ---
> > target-s390x/Makefile.objs |   2 +-
> > target-s390x/kvm.c |  52 
> > target-s390x/pci_ic.c  | 753 
> > +
> > target-s390x/pci_ic.h  | 335 
> > 4 files changed, 1141 insertions(+), 1 deletion(-)
> > create mode 100644 target-s390x/pci_ic.c
> > create mode 100644 target-s390x/pci_ic.h
> >
> >>
> >> [...]
> >>
> > +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
> > +{
> > +CPUS390XState *env = &cpu->env;
> > +S390PCIBusDevice *pbdev;
> > +uint8_t r1 = (run->s390_sieic.ipb & 0x00f0) >> 20;
> > +uint8_t r2 = (run->s390_sieic.ipb & 0x000f) >> 16;
> > +PciLgStg *rp;
> > +uint64_t offset;
> > +uint64_t data;
> > +uint8_t len;
> > +
> > +cpu_synchronize_state(CPU(cpu));
> > +
> > +if (env->psw.mask & PSW_MASK_PSTATE) {
> > +program_interrupt(env, PGM_PRIVILEGED, 4);
> > +return 0;
> > +}
> > +
> > +if (r2 & 0x1) {
> > +program_interrupt(env, PGM_SPECIFICATION, 4);
> > +return 0;
> > +}
> > +
> > +rp = (PciLgStg *)&env->regs[r2];
> > +offset = env->regs[r2 + 1];
> > +
> > +pbdev = s390_pci_find_dev_by_fh(rp->fh);
> > +if (!pbdev) {
> > +DPRINTF("pcilg no pci dev\n");
> > +setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
> > +return 0;
> > +}
> > +
> > +len = rp->len & 0xF;
> > +if (rp->pcias < 6) {
> > +if ((8 - (offset & 0x7)) < len) {
> > +program_interrupt(env, PGM_OPERAND, 4);
> > +return 0;
> > +}
> > +MemoryRegion *mr = 
> > pbdev->pdev->io_regions[rp->pcias].memory;
> > +io_mem_read(mr, offset, &data, len);
> > +} else if (rp->pcias == 15) {
> > +if ((4 - (offset & 0x3)) < len) {
> > +program_interrupt(env, PGM_OPERAND, 4);
> > +return 0;
> > +}
> > +data =  pci_host_config_read_common(
> > +   pbdev->pdev, offset, 
> > pci_config_size(pbdev->pdev), len);
> > +
> > +switch (len) {
> > +case 1:
> > +break;
> > +case 2:
> > +data = cpu_to_le16(data);
> > +break;
> > +case 4:
> > +data = cpu_to_le32(data);
> > +break;
> > +case 8:
> > +data = cpu_to_le64(data);
> > +break;
> 
>  Why? Also, this is wrong. cpu_to_le64 convert between host endianness
>  and LE. So if you're running this on an LE host, you won't swap the
>  value and get a broken result.
> 
>  If you know that the value is always swapped, use bswapxx().
> 
> >>>
> >>> Actually the code is right and required for a big endian host :-)
> >>> pcilg/pcistg provide access to the PCI config space which is defined
> >>> as PCI byte order (little endian). Since pci_host_config_read_common 
> >>> does
> >>> already a le to cpu conversion we have to convert back to PCI byte 
> >>> order.
> >>> Doing an unconditional swap would be a bug on a little endian host.
> >>
> >> Why would it be a bug? The value you end up writing is contents of a
> >> register and thus doesn't have endianness. So if QEMU was an LE 
> >> pr

Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions

2014-11-12 Thread Alexander Graf


On 12.11.14 10:19, Frank Blaschka wrote:
> On Wed, Nov 12, 2014 at 10:08:19AM +0100, Alexander Graf wrote:
>>
>>
>> On 12.11.14 09:49, Frank Blaschka wrote:
>>> On Tue, Nov 11, 2014 at 04:24:24PM +0100, Alexander Graf wrote:


 On 11.11.14 15:08, Frank Blaschka wrote:
> On Tue, Nov 11, 2014 at 01:51:25PM +0100, Alexander Graf wrote:
>>
>>
>>
>>> Am 11.11.2014 um 13:39 schrieb Frank Blaschka 
>>> :
>>>
 On Tue, Nov 11, 2014 at 01:16:04PM +0100, Alexander Graf wrote:


> On 11.11.14 13:10, Frank Blaschka wrote:
>> On Mon, Nov 10, 2014 at 04:56:21PM +0100, Alexander Graf wrote:
>>
>>
>>> On 10.11.14 15:20, Frank Blaschka wrote:
>>> From: Frank Blaschka 
>>>
>>> This patch implements the s390 pci instructions in qemu. It allows
>>> to access and drive pci devices attached to the s390 pci bus.
>>> Because of platform constrains devices using IO BARs are not
>>> supported. Also a device has to support MSI/MSI-X to run on s390.
>>>
>>> Signed-off-by: Frank Blaschka 
>>> ---
>>> target-s390x/Makefile.objs |   2 +-
>>> target-s390x/kvm.c |  52 
>>> target-s390x/pci_ic.c  | 753 
>>> +
>>> target-s390x/pci_ic.h  | 335 
>>> 4 files changed, 1141 insertions(+), 1 deletion(-)
>>> create mode 100644 target-s390x/pci_ic.c
>>> create mode 100644 target-s390x/pci_ic.h
>>>

 [...]

>>> +int kvm_pcilg_service_call(S390CPU *cpu, struct kvm_run *run)
>>> +{
>>> +CPUS390XState *env = &cpu->env;
>>> +S390PCIBusDevice *pbdev;
>>> +uint8_t r1 = (run->s390_sieic.ipb & 0x00f0) >> 20;
>>> +uint8_t r2 = (run->s390_sieic.ipb & 0x000f) >> 16;
>>> +PciLgStg *rp;
>>> +uint64_t offset;
>>> +uint64_t data;
>>> +uint8_t len;
>>> +
>>> +cpu_synchronize_state(CPU(cpu));
>>> +
>>> +if (env->psw.mask & PSW_MASK_PSTATE) {
>>> +program_interrupt(env, PGM_PRIVILEGED, 4);
>>> +return 0;
>>> +}
>>> +
>>> +if (r2 & 0x1) {
>>> +program_interrupt(env, PGM_SPECIFICATION, 4);
>>> +return 0;
>>> +}
>>> +
>>> +rp = (PciLgStg *)&env->regs[r2];
>>> +offset = env->regs[r2 + 1];
>>> +
>>> +pbdev = s390_pci_find_dev_by_fh(rp->fh);
>>> +if (!pbdev) {
>>> +DPRINTF("pcilg no pci dev\n");
>>> +setcc(cpu, ZPCI_PCI_LS_INVAL_HANDLE);
>>> +return 0;
>>> +}
>>> +
>>> +len = rp->len & 0xF;
>>> +if (rp->pcias < 6) {
>>> +if ((8 - (offset & 0x7)) < len) {
>>> +program_interrupt(env, PGM_OPERAND, 4);
>>> +return 0;
>>> +}
>>> +MemoryRegion *mr = 
>>> pbdev->pdev->io_regions[rp->pcias].memory;
>>> +io_mem_read(mr, offset, &data, len);
>>> +} else if (rp->pcias == 15) {
>>> +if ((4 - (offset & 0x3)) < len) {
>>> +program_interrupt(env, PGM_OPERAND, 4);
>>> +return 0;
>>> +}
>>> +data =  pci_host_config_read_common(
>>> +   pbdev->pdev, offset, 
>>> pci_config_size(pbdev->pdev), len);
>>> +
>>> +switch (len) {
>>> +case 1:
>>> +break;
>>> +case 2:
>>> +data = cpu_to_le16(data);
>>> +break;
>>> +case 4:
>>> +data = cpu_to_le32(data);
>>> +break;
>>> +case 8:
>>> +data = cpu_to_le64(data);
>>> +break;
>>
>> Why? Also, this is wrong. cpu_to_le64 convert between host endianness
>> and LE. So if you're running this on an LE host, you won't swap the
>> value and get a broken result.
>>
>> If you know that the value is always swapped, use bswapxx().
>>
>
> Actually the code is right and required for a big endian host :-)
> pcilg/pcistg provide access to the PCI config space which is defined
> as PCI byte order (little endian). Since pci_host_config_read_common 
> does
> already a le to cpu conversion we have to convert back to PCI byte 
> order.
> Doing an unconditional swap would be a bug on a little endian host.

 Why would it be a bug? The value you end up writing is contents of a
 register and thus doesn't have

Re: [Qemu-devel] [PATCH v2 1/3] vl: Avoid unnecessary 'if' nesting

2014-11-12 Thread Andrew Jones
On Tue, Nov 11, 2014 at 04:50:54PM -0200, Eduardo Habkost wrote:
> Just a coding style change, to make other changes easier to review.
> 
> Signed-off-by: Eduardo Habkost 
> ---
>  vl.c | 10 --
>  1 file changed, 4 insertions(+), 6 deletions(-)
> 
> diff --git a/vl.c b/vl.c
> index f4a6e5e..c1bfc9b 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1284,13 +1284,11 @@ static void smp_parse(QemuOpts *opts)
>  if (cpus == 0) {
>  cpus = cores * threads * sockets;
>  }
> +} else if (cores == 0) {
> +threads = threads > 0 ? threads : 1;
> +cores = cpus / (sockets * threads);
>  } else {
> -if (cores == 0) {
> -threads = threads > 0 ? threads : 1;
> -cores = cpus / (sockets * threads);
> -} else {
> -threads = cpus / (cores * sockets);
> -}
> +threads = cpus / (cores * sockets);
>  }
>  
>  max_cpus = qemu_opt_get_number(opts, "maxcpus", 0);
> -- 
> 1.9.3
>

Reviewed-by: Andrew Jones  



[Qemu-devel] [PATCH v3] esp: Do not overwrite ESP_TCHI after reset

2014-11-12 Thread Paolo Bonzini
From: Hannes Reinecke 

After a reset ESP_TCHI should contain the unique ID
of the chip. This value will be overwritten with the
current tranfer count if the transfer count has
previously been set.
So we should always return the chip id if ESP_TCHI
has never been written to.

Signed-off-by: Hannes Reinecke 
Signed-off-by: Paolo Bonzini 
---
 hw/scsi/esp.c | 11 +--
 include/hw/scsi/esp.h |  1 +
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/esp.c b/hw/scsi/esp.c
index 5ab44d8..272d13d 100644
--- a/hw/scsi/esp.c
+++ b/hw/scsi/esp.c
@@ -364,7 +364,7 @@ void esp_hard_reset(ESPState *s)
 {
 memset(s->rregs, 0, ESP_REGS);
 memset(s->wregs, 0, ESP_REGS);
-s->rregs[ESP_TCHI] = s->chip_id;
+s->tchi_written = 0;
 s->ti_size = 0;
 s->ti_rptr = 0;
 s->ti_wptr = 0;
@@ -422,6 +422,11 @@ uint64_t esp_reg_read(ESPState *s, uint32_t saddr)
 esp_lower_irq(s);
 
 return old_val;
+case ESP_TCHI:
+/* Return the unique id if the value has never been written */
+if (!s->tchi_written) {
+return s->chip_id;
+}
 default:
 break;
 }
@@ -432,9 +437,11 @@ void esp_reg_write(ESPState *s, uint32_t saddr, uint64_t 
val)
 {
 trace_esp_mem_writeb(saddr, s->wregs[saddr], val);
 switch (saddr) {
+case ESP_TCHI:
+s->tchi_written = true;
+/* fall through */
 case ESP_TCLO:
 case ESP_TCMID:
-case ESP_TCHI:
 s->rregs[ESP_RSTAT] &= ~STAT_TC;
 break;
 case ESP_FIFO:
diff --git a/include/hw/scsi/esp.h b/include/hw/scsi/esp.h
index e079fb8..6c79527 100644
--- a/include/hw/scsi/esp.h
+++ b/include/hw/scsi/esp.h
@@ -22,6 +22,7 @@ struct ESPState {
 uint8_t wregs[ESP_REGS];
 qemu_irq irq;
 uint8_t chip_id;
+bool tchi_written;
 int32_t ti_size;
 uint32_t ti_rptr, ti_wptr;
 uint32_t status;
-- 
1.8.3.1




Re: [Qemu-devel] [PATCH v2 3/3] vl: Don't silently change topology when all -smp options were set

2014-11-12 Thread Andrew Jones
On Tue, Nov 11, 2014 at 04:50:56PM -0200, Eduardo Habkost wrote:
> QEMU tries to change the "threads" option even if it was explicitly set
> in the command-line, and it shouldn't do that.
> 
> The right thing to do when all options (cpus, sockets, cores, threds)
> are explicitly set is to sanity check them and abort in case they don't
> make sense (i.e. when sockets*cores*threads < cpus).
> 
> Signed-off-by: Eduardo Habkost 
> ---
>  vl.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/vl.c b/vl.c
> index 2ed8b07..8880a4e 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -1287,8 +1287,14 @@ static void smp_parse(QemuOpts *opts)
>  } else if (cores == 0) {
>  threads = threads > 0 ? threads : 1;
>  cores = cpus / (sockets * threads);
> -} else {
> +} else if (threads == 0) {
>  threads = cpus / (cores * sockets);
> +} else if (sockets * cores * threads < cpus) {
> +fprintf(stderr, "cpu topology: error: "
> +"sockets (%u) * cores (%u) * threads (%u) < "
> +"smp_cpus (%u)\n",
> +sockets, cores, threads, cpus);
> +exit(1);
>  }
>  
>  max_cpus = qemu_opt_get_number(opts, "maxcpus", 0);
> -- 
> 1.9.3
>

Reviewed-by: Andrew Jones 



Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 10:22, Alexander Graf wrote:
 Absolutely lets make an example for qemu running on BE and LE

 byte orderconfig space backing   pci_default_read_config   pcilg (with 
 cpu_to_le)
 BE0x78563412 0x123456780x78563412
 LE0x78563412 0x785634120x78563412
>>>
>>> No, pci_default_read_config() always returns 0x12345678 because it
>>> returns a register, not memory.
>>>
>>
>> You mean implementation of pci_default_read_config is broken?
>> If it should return a register it should not do "return le32_to_cpu(val);"
> 
> It has to, to convert from memory (after memcpy) to an actual register
> value. Look at the value list in Paolo's email - I really have no idea
> how to explain it any better.

pci_default_read_config is reading from a *device* register, and has
absolutely zero knowledge of the host CPU endianness.

Another way to explain that the result of pci_default_read_config is
independent of the host endianness, is that the function is basically
doing this:

switch (len) {
case 1: return d->config[address];
case 2: return ldw_le_p(&d->config[address)]);
case 4: return ldl_le_p(&d->config[address)]);
default: abort();
}

So if you want to make the outcome big endian, you have to swap
unconditionally.

Paolo



[Qemu-devel] [PATCH 2/4] exec: add wrapper for host pointer access

2014-11-12 Thread Michael S. Tsirkin
host pointer accesses force pointer math, let's
add a wrapper to make them safer.

Signed-off-by: Michael S. Tsirkin 
---
 include/exec/cpu-all.h |  5 +
 exec.c | 10 +-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index c085804..9d8d408 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -313,6 +313,11 @@ typedef struct RAMBlock {
 int fd;
 } RAMBlock;
 
+static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t offset)
+{
+return (char *)block->host + offset;
+}
+
 typedef struct RAMList {
 QemuMutex mutex;
 /* Protected by the iothread lock.  */
diff --git a/exec.c b/exec.c
index ad5cf12..9648669 100644
--- a/exec.c
+++ b/exec.c
@@ -840,7 +840,7 @@ static void tlb_reset_dirty_range_all(ram_addr_t start, 
ram_addr_t length)
 
 block = qemu_get_ram_block(start);
 assert(block == qemu_get_ram_block(end - 1));
-start1 = (uintptr_t)block->host + (start - block->offset);
+start1 = (uintptr_t)ramblock_ptr(block, start - block->offset);
 cpu_tlb_reset_dirty_all(start1, length);
 }
 
@@ -1500,7 +1500,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
 QTAILQ_FOREACH(block, &ram_list.blocks, next) {
 offset = addr - block->offset;
 if (offset < block->length) {
-vaddr = block->host + offset;
+vaddr = ramblock_ptr(block, offset);
 if (block->flags & RAM_PREALLOC) {
 ;
 } else if (xen_enabled()) {
@@ -1551,7 +1551,7 @@ void *qemu_get_ram_block_host_ptr(ram_addr_t addr)
 {
 RAMBlock *block = qemu_get_ram_block(addr);
 
-return block->host;
+return ramblock_ptr(block, 0);
 }
 
 /* Return a host pointer to ram allocated with qemu_ram_alloc.
@@ -1578,7 +1578,7 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
 xen_map_cache(block->offset, block->length, 1);
 }
 }
-return block->host + (addr - block->offset);
+return ramblock_ptr(block, addr - block->offset);
 }
 
 /* Return a host pointer to guest's ram. Similar to qemu_get_ram_ptr
@@ -1597,7 +1597,7 @@ static void *qemu_ram_ptr_length(ram_addr_t addr, hwaddr 
*size)
 if (addr - block->offset < block->length) {
 if (addr - block->offset + *size > block->length)
 *size = block->length - addr + block->offset;
-return block->host + (addr - block->offset);
+return ramblock_ptr(block, addr - block->offset);
 }
 }
 
-- 
MST




[Qemu-devel] [PATCH 0/4] migration: fix CVE-2014-7840

2014-11-12 Thread Michael S. Tsirkin
This patchset fixes CVE-2014-7840: invalid
migration stream can cause arbitrary qemu memory
overwrite.
First patch includes the minimal fix for the issue.
Follow-up patches on top add extra checking to reduce the
chance this kind of bug recurs.

Note: these are already (tentatively-pending review)
queued in my tree, so only review/ack
is necessary.

Michael S. Tsirkin (4):
  migration: fix parameter validation on ram load
  exec: add wrapper for host pointer access
  cpu: assert host pointer offset within block
  cpu: verify that block->host is set

 include/exec/cpu-all.h |  7 +++
 arch_init.c|  5 +++--
 exec.c | 10 +-
 3 files changed, 15 insertions(+), 7 deletions(-)

-- 
MST




[Qemu-devel] [PATCH 4/4] cpu: verify that block->host is set

2014-11-12 Thread Michael S. Tsirkin
If it isn't, access at an offset will cause memory corruption.

Signed-off-by: Michael S. Tsirkin 
---
 include/exec/cpu-all.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 7c3a5e7..62f5581 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -316,6 +316,7 @@ typedef struct RAMBlock {
 static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t offset)
 {
 assert(offset < block->length);
+assert(block->host);
 return (char *)block->host + offset;
 }
 
-- 
MST




[Qemu-devel] [PATCH 1/4] migration: fix parameter validation on ram load

2014-11-12 Thread Michael S. Tsirkin
During migration, the values read from migration stream during ram load
are not validated. Especially offset in host_from_stream_offset() and
also the length of the writes in the callers of said function.

To fix this, we need to make sure that the [offset, offset + length]
range fits into one of the allocated memory regions.

Validating addr < len should be sufficient since data seems to always be
managed in TARGET_PAGE_SIZE chunks.

Fixes: CVE-2014-7840

Note: follow-up patches add extra checks on each block->host access.

Signed-off-by: Michael S. Tsirkin 
---
 arch_init.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 88a5ba0..593a990 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -1006,7 +1006,7 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 uint8_t len;
 
 if (flags & RAM_SAVE_FLAG_CONTINUE) {
-if (!block) {
+if (!block || block->length <= offset) {
 error_report("Ack, bad migration stream!");
 return NULL;
 }
@@ -1019,8 +1019,9 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 id[len] = 0;
 
 QTAILQ_FOREACH(block, &ram_list.blocks, next) {
-if (!strncmp(id, block->idstr, sizeof(id)))
+if (!strncmp(id, block->idstr, sizeof(id)) && block->length > offset) {
 return memory_region_get_ram_ptr(block->mr) + offset;
+}
 }
 
 error_report("Can't find block %s!", id);
-- 
MST




[Qemu-devel] [PATCH 3/4] cpu: assert host pointer offset within block

2014-11-12 Thread Michael S. Tsirkin
Make accesses safer in case we missed some
check somewhere.

Signed-off-by: Michael S. Tsirkin 
---
 include/exec/cpu-all.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h
index 9d8d408..7c3a5e7 100644
--- a/include/exec/cpu-all.h
+++ b/include/exec/cpu-all.h
@@ -315,6 +315,7 @@ typedef struct RAMBlock {
 
 static inline void *ramblock_ptr(RAMBlock *block, ram_addr_t offset)
 {
+assert(offset < block->length);
 return (char *)block->host + offset;
 }
 
-- 
MST




Re: [Qemu-devel] [PATCH 1/4] migration: fix parameter validation on ram load

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 10:44, Michael S. Tsirkin wrote:
> During migration, the values read from migration stream during ram load
> are not validated. Especially offset in host_from_stream_offset() and
> also the length of the writes in the callers of said function.
> 
> To fix this, we need to make sure that the [offset, offset + length]
> range fits into one of the allocated memory regions.
> 
> Validating addr < len should be sufficient since data seems to always be
> managed in TARGET_PAGE_SIZE chunks.
> 
> Fixes: CVE-2014-7840
> 
> Note: follow-up patches add extra checks on each block->host access.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
>  arch_init.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/arch_init.c b/arch_init.c
> index 88a5ba0..593a990 100644
> --- a/arch_init.c
> +++ b/arch_init.c
> @@ -1006,7 +1006,7 @@ static inline void *host_from_stream_offset(QEMUFile *f,
>  uint8_t len;
>  
>  if (flags & RAM_SAVE_FLAG_CONTINUE) {
> -if (!block) {
> +if (!block || block->length <= offset) {
>  error_report("Ack, bad migration stream!");
>  return NULL;
>  }
> @@ -1019,8 +1019,9 @@ static inline void *host_from_stream_offset(QEMUFile *f,
>  id[len] = 0;
>  
>  QTAILQ_FOREACH(block, &ram_list.blocks, next) {
> -if (!strncmp(id, block->idstr, sizeof(id)))
> +if (!strncmp(id, block->idstr, sizeof(id)) && block->length > 
> offset) {
>  return memory_region_get_ram_ptr(block->mr) + offset;
> +}
>  }
>  
>  error_report("Can't find block %s!", id);
> 

Sort-of Yoda conditionals, i.e. I think "offset >= block->length" and
"offset < block->length" would have been more common and indeed that's
what you use in patch 3.  That's the only comment I have.

Series

Reviewed-by: Paolo Bonzini 




Re: [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment

2014-11-12 Thread Max Reitz

On 2014-11-12 at 05:15, Eric Blake wrote:

On 11/10/2014 06:45 AM, Max Reitz wrote:

Add a function qcow2_change_refcount_order() which allows changing the
refcount order of a qcow2 image.

Signed-off-by: Max Reitz 
---
  block/qcow2-refcount.c | 424 +
  block/qcow2.h  |   4 +
  2 files changed, 428 insertions(+)

This is fairly big; you may want to get a second review from a
maintainer rather than blindly trusting me.


I didn't really see the point in splitting it up. Introducing the static 
helper functions first would not be very helpful either, so I thought 
what I'd do as a reviewer, and that was "apply the patch and just read 
through all the code". Splitting it into multiple patches would not have 
helped there (and I don't see how I could split this patch into logical 
changes, where at the start we have some inefficient but simple 
implementation and it gets better over time).


I'll need a review from a maintainer anyway, but I won't get one without 
being able to show another review first...



My review was not linear, but I left the email in linear order.  Feel
free to ask for clarification if my presentation is too hard to follow.


+
+/**
+ * This "operation" for walk_over_reftable() allocates the refblock on disk (if
+ * it is not empty) and inserts its offset into the new reftable. The size of
+ * this new reftable is increased as required.
+ */
+static int alloc_refblock(BlockDriverState *bs, uint64_t **reftable,
+  uint64_t reftable_index, uint64_t *reftable_size,
+  void *refblock, bool refblock_empty, Error **errp)
+{
+BDRVQcowState *s = bs->opaque;
+int64_t offset;
+
+if (!refblock_empty && reftable_index >= *reftable_size) {
+uint64_t *new_reftable;
+uint64_t new_reftable_size;
+
+new_reftable_size = ROUND_UP(reftable_index + 1,
+ s->cluster_size / sizeof(uint64_t));
+if (new_reftable_size > QCOW_MAX_REFTABLE_SIZE / sizeof(uint64_t)) {
+error_setg(errp,
+   "This operation would make the refcount table grow "
+   "beyond the maximum size supported by QEMU, aborting");
+return -ENOTSUP;
+}
+
+new_reftable = g_try_realloc(*reftable, new_reftable_size *
+sizeof(uint64_t));

Safe from overflow based on checks a few lines earlier.  Good.


+if (!new_reftable) {
+error_setg(errp, "Failed to increase reftable buffer size");
+return -ENOMEM;
+}
+
+memset(new_reftable + *reftable_size, 0,
+   (new_reftable_size - *reftable_size) * sizeof(uint64_t));
+
+*reftable  = new_reftable;
+*reftable_size = new_reftable_size;

Just to check my math here:

Suppose we have an image with 512-byte clusters, and are changing from
16-bit refcount (order 4) to 64-bit refcount.  Also, suppose the
existing image has exactly filled a one-cluster refcount table (that is,
there are 64 refcount blocks, each describing at a refblock with all 256
refcount entries full, for a total image size of exactly 8M).  The
original image occupies a header (1 cluster), L1 and L2 tables, and
data; but 65 of the 16k clusters tied up in the image are dedicated to
the refcount structures.

Meanwhile, the new refcount table will have to point to 256 refcount
blocks, each holding only 64 entries, which in turn implies that the
refcount table now has to be at least 4 clusters long.  But as this
requires at least 260 clusters to represent, then even if we were able
to reuse the 65 clusters of the original table, we'd still be allocating
at least 195 clusters; in reality, your code doesn't free any old
clusters until after allocating the new, because it is easier to keep
the old table live until the new table is populated.  The process of
allocating the new clusters means we actually end up with a new refcount
table of 5 clusters long, where not all 320 refblocks will be populated.
  But as long as we are keeping the old table up-to-date for the refblock
allocations, it ALSO means that we caused a rollover of the old table
from 1 cluster into 2, which itself consumes several clusters (the
larger table must be contiguous, and we must also set up a refblock to
describe the larger table, so we've added at least three clusters
associated to the original table during the course of preparing the new
table).

Hmm - that means I found a bug in your implementation.  See [3] below.


+}
+
+if (refblock_empty) {
+if (reftable_index < *reftable_size) {
+(*reftable)[reftable_index] = 0;

Necessary since you used g_try_realloc which leaves the new reftable
uninitialized.  Reasonable (rather than a memset) since the caller will
be visiting every single refblock in the table anyways.


+}
+} else {
+offset = qcow2_alloc_

[Qemu-devel] [Bug 1363641] Re: Build of v2.1.0 fails on armv7l due to undeclared __NR_select

2014-11-12 Thread Eduardo Otubo
This commit temporarily fixes this problem.

http://git.qemu.org/?p=qemu.git;a=commit;h=4cc47f8b3cc4f32586ba2f7fce1dc267da774a69

As soon as libseccomp makes a new release I'll update the dependency and
hopefully it will be fixed with proper library support.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1363641

Title:
  Build of v2.1.0 fails on armv7l due to undeclared __NR_select

Status in QEMU:
  New

Bug description:
  After `make clean` and `git clean -x -f -d` `git checkout v2.1.0 &&
  configure --prefix=/home/user/prefix-qemu-2.1.0 && make` fails due to
  missing declarations

  CCqemu-seccomp.o
  qemu-seccomp.c:28:1: error: '__NR_select' undeclared here (not in a 
function)
  qemu-seccomp.c:36:1: error: '__NR_mmap' undeclared here (not in a 
function)
  qemu-seccomp.c:57:1: error: '__NR_getrlimit' undeclared here (not in a 
function)
  qemu-seccomp.c:96:1: error: '__NR_time' undeclared here (not in a 
function)
    GEN   qmp-marshal.c
  qemu-seccomp.c:186:1: error: '__NR_alarm' undeclared here (not in a 
function)
  make: *** [qemu-seccomp.o] Error 1

  Same errors for master 8b3030114a449e66c68450acaac4b66f26d91416.
  `configure`should not succeed for a failing build if the error occurs
  due to missing dependencies, if it's a bug it needs to be fixed.
  `config.log` for v2.1.0 and 8b303011... attached. The content is
  mostly compiler output which I think is unusual for `config.log`, but
  see for yourself.

  I'm building on a debian 7.6 chroot on Synology DSM 5.0. `uname -a`
  says `Linux diskstatation 3.2.40 #4493 SMP Thu Aug 21 21:43:02 CST
  2014 armv7l GNU/Linux`.

  After installing some of the missing dependencies, i.e. `apt-get
  install liblzo2-dev libbsd-dev syslinux-common libhwloc-dev librdmacm-
  dev libsnappy-dev libibverbs-dev valgrind linux-
  headers-3.2.0-4-common` I'm getting

   CCmigration-rdma.o
  migration-rdma.c: In function 'ram_chunk_start':
  migration-rdma.c:523:12: error: cast to pointer from integer of different 
size [-Werror=int-to-pointer-cast]
  migration-rdma.c: In function '__qemu_rdma_add_block':
  migration-rdma.c:556:49: error: cast to pointer from integer of different 
size [-Werror=int-to-pointer-cast]
  migration-rdma.c:557:49: error: cast to pointer from integer of different 
size [-Werror=int-to-pointer-cast]
  migration-rdma.c: In function '__qemu_rdma_delete_block':
  migration-rdma.c:664:45: error: cast to pointer from integer of different 
size [-Werror=int-to-pointer-cast]
  migration-rdma.c:699:49: error: cast to pointer from integer of different 
size [-Werror=int-to-pointer-cast]
  migration-rdma.c: In function 'qemu_rdma_search_ram_block':
  migration-rdma.c:1113:49: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
  migration-rdma.c: In function 'qemu_rdma_register_and_get_keys':
  migration-rdma.c:1176:50: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
  migration-rdma.c:1177:29: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
  migration-rdma.c:1177:51: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
  migration-rdma.c:1178:29: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
  migration-rdma.c: In function 'qemu_rdma_post_send_control':
  migration-rdma.c:1562:36: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
  migration-rdma.c: In function 'qemu_rdma_post_recv_control':
  migration-rdma.c:1616:37: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
  migration-rdma.c: In function 'qemu_rdma_write_one':
  migration-rdma.c:1864:16: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
  migration-rdma.c:1868:53: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
  migration-rdma.c:1922:52: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
  migration-rdma.c:1923:50: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
  migration-rdma.c:1977:49: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
  migration-rdma.c:1998:49: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
  migration-rdma.c:2010:58: error: cast to pointer from integer of 
different size [-Werror=int-to-pointer-cast]
  migration-rdma.c: In function 'qemu_rdma_registration_handle':
  migration-rdma.c:3027:21: error: cast from pointer to integer of 
different size [-Werror=pointer-to-int-cast]
  migration-rdma.c:3092:41: error: cast from pointer to integer of 
different

Re: [Qemu-devel] [PATCH] virtio-scsi: dataplane: suppress guest notification

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 04:24, Ming Lei wrote:
> This patch uses vring_should_notify() to suppress
> guest notification, and looks notification frequency
> can be decreased from ~33K/sec to ~2K/sec in my test
> environment.
> 
> Suggested-by: Stefan Hajnoczi 
> Signed-off-by: Ming Lei 
> ---
>  hw/scsi/virtio-scsi-dataplane.c |7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c
> index df17229..be3b042 100644
> --- a/hw/scsi/virtio-scsi-dataplane.c
> +++ b/hw/scsi/virtio-scsi-dataplane.c
> @@ -93,9 +93,14 @@ VirtIOSCSIReq *virtio_scsi_pop_req_vring(VirtIOSCSI *s,
>  
>  void virtio_scsi_vring_push_notify(VirtIOSCSIReq *req)
>  {
> +VirtIODevice *vdev = VIRTIO_DEVICE(req->vring->parent);
> +
>  vring_push(&req->vring->vring, &req->elem,
> req->qsgl.size + req->resp_iov.size);
> -event_notifier_set(&req->vring->guest_notifier);
> +
> +if (vring_should_notify(vdev, &req->vring->vring)) {
> +event_notifier_set(&req->vring->guest_notifier);
> +}
>  }
>  
>  static void virtio_scsi_iothread_handle_ctrl(EventNotifier *notifier)
> 

Thanks, applied.

Paolo



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Mark Rutland
On Tue, Nov 11, 2014 at 09:33:12PM +, Christoffer Dall wrote:
> On Tue, Nov 11, 2014 at 04:48:07PM +, Mark Rutland wrote:
> > Hi Christoffer,
> > 
> > On Tue, Nov 11, 2014 at 04:31:01PM +, Christoffer Dall wrote:
> > > On Tue, Nov 11, 2014 at 03:29:33PM +, Mark Rutland wrote:
> > > > Hi,
> > > > 
> > > > On Thu, Nov 06, 2014 at 01:33:20PM +, Alexander Spyridakis wrote:
> > > > > On 6 November 2014 14:44, Peter Maydell  
> > > > > wrote:
> > > > > >
> > > > > >
> > > > > > > We need ACPI guest support in QEMU for AArch64 over here, with 
> > > > > > > all features
> > > > > > > (including the ability to run ACPI code and add specific tables), 
> > > > > > > for
> > > > > > > ACPI-based guests.
> > > > > >
> > > > > > The plan for providing ACPI to guests is that we run a UEFI BIOS
> > > > > > blob which is what is responsible for providing ACPI and UEFI
> > > > > > runtime services to guests which need them. (The UEFI blob finds
> > > > > > out about its hardware by looking at a device tree that QEMU
> > > > > > passes it, but that's a detail between QEMU and its bios blob).
> > > > > > This pretty much looks like what x86 QEMU used to do with ACPI
> > > > > > for a very long time, so we know it's a feasible approach.
> > > > > 
> > > > > Hi Peter,
> > > > > 
> > > > > The rational in the proposed approach is meant for cases where the
> > > > > user does not want to rely on external firmware layers. While UEFI
> > > > > could do what you are describing, the point is to avoid this not so
> > > > > trivial overhead in the booting process. Especially in the case of
> > > > > thin guests, where another software dependency is undesired.
> > > > 
> > > > I'm not sure how you plan to use ACPI without UEFI, as there are several
> > > > pieces of information which ACPI misses, such as the memory map, which
> > > > must be discovered from UEFI. How do you intend to discover the memory
> > > > map without UEFI?
> > > > 
> > > > Additionally, with Linux and other generic OSs, the expectation is that
> > > > the ACPI tables are discovered via the UEFI system table. How do you
> > > > intend to discover the ACPI tables? Or other system information?
> > > 
> > > FWIW, Xen needs to pass the RDSP pointer along with a tiny DT containing
> > > the command line and memory information to Dom0 as well.
> > 
> > When you say "memory information", is that pointers to a UEFI memory
> > map, or memory nodes? The former should work for ACPI, but I don't think
> > the latter will. I think there's a need for some discussion regarding
> > the Dom0 boot flow for ACPI. Is there any tree I can take a peek at?
> 
> Plain memory nodes.  There is no UEFI instance for Dom0.  AFAIU x86 does
> something similar (although with some custom PV thing instead of DT),
> and when Dom0 needs UEFI runtime services, this is done through specific
> hypercalls.
> 
> The Xen code is incomplete for this work, but can be followed here:
> https://git.linaro.org/people/parth.dixit/acpi-rsdp/xen.git/shortlog/refs/heads/explore-rsdp

Thanks.

> The Linux side is stuff based on the LEG kernel I think, not sure if
> it's pushed anywhere yet.
> 
> I'm cc'ing Parth and Julien here, but I agree that having a discussion
> on this could probably be good.

Sounds good to me. That might be worth running as a separate thread so
as not to confuse matters.

Perhaps just using memory nodes is fine, but so far all of the
discussions I've been in (on mailing lists and in person) regarding ACPI
have had the fundamental assumption that ACPI would require UEFI, and
the UEFI memory map is in use. Given that assumption seems to be broken
for this case, we need to revisit those discussions.

There's also a problem in that this opens the possibility of non-Xen
!UEFI + ACPI configurations, which I don't think is something we want to
encourage. Xen is somewhat a special case because of the symbiotic
relationship with Dom0.

> > Passing just the RDSP will mean that Dom0 won't get SMBIOS tables and
> > other potentially useful things, in addition to simply being yet another
> > potential boot configuration. I'm a little concerned about that.
> > 
> 
> I share your concern, but running another UEFI instance for Dom0 doesn't
> seem like a viable alternative either.  Why is this a problem on ARM and
> not on x86 though?

I believe that on x86 the fallback for !UEFI would be the e820 memory
map, which provides info regarding the type of the memory mapping, as
opposed to just the base + size. That said, I'm not that familiar with
e820, and from a quick look the provided information doesn't seem to be
that detailed.

> > > We are currently suggesting adding an RDSP property to the chosen node
> > > in the tiny DT, but a command-line arguement like kexec proposed could
> > > be another option I guess, albeit not a very pretty one.
> > 
> > I'm not sure what an RDSP command line property would have to do with
> > kexec. I'll assume I've misunderstood something.
> > 
> 
> I thou

Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Christoffer Dall
On Wed, Nov 12, 2014 at 10:38:54AM +, Mark Rutland wrote:
> On Tue, Nov 11, 2014 at 09:33:12PM +, Christoffer Dall wrote:
> > On Tue, Nov 11, 2014 at 04:48:07PM +, Mark Rutland wrote:
> > > Hi Christoffer,
> > > 
> > > On Tue, Nov 11, 2014 at 04:31:01PM +, Christoffer Dall wrote:
> > > > On Tue, Nov 11, 2014 at 03:29:33PM +, Mark Rutland wrote:
> > > > > Hi,
> > > > > 
> > > > > On Thu, Nov 06, 2014 at 01:33:20PM +, Alexander Spyridakis wrote:
> > > > > > On 6 November 2014 14:44, Peter Maydell  
> > > > > > wrote:
> > > > > > >
> > > > > > >
> > > > > > > > We need ACPI guest support in QEMU for AArch64 over here, with 
> > > > > > > > all features
> > > > > > > > (including the ability to run ACPI code and add specific 
> > > > > > > > tables), for
> > > > > > > > ACPI-based guests.
> > > > > > >
> > > > > > > The plan for providing ACPI to guests is that we run a UEFI BIOS
> > > > > > > blob which is what is responsible for providing ACPI and UEFI
> > > > > > > runtime services to guests which need them. (The UEFI blob finds
> > > > > > > out about its hardware by looking at a device tree that QEMU
> > > > > > > passes it, but that's a detail between QEMU and its bios blob).
> > > > > > > This pretty much looks like what x86 QEMU used to do with ACPI
> > > > > > > for a very long time, so we know it's a feasible approach.
> > > > > > 
> > > > > > Hi Peter,
> > > > > > 
> > > > > > The rational in the proposed approach is meant for cases where the
> > > > > > user does not want to rely on external firmware layers. While UEFI
> > > > > > could do what you are describing, the point is to avoid this not so
> > > > > > trivial overhead in the booting process. Especially in the case of
> > > > > > thin guests, where another software dependency is undesired.
> > > > > 
> > > > > I'm not sure how you plan to use ACPI without UEFI, as there are 
> > > > > several
> > > > > pieces of information which ACPI misses, such as the memory map, which
> > > > > must be discovered from UEFI. How do you intend to discover the memory
> > > > > map without UEFI?
> > > > > 
> > > > > Additionally, with Linux and other generic OSs, the expectation is 
> > > > > that
> > > > > the ACPI tables are discovered via the UEFI system table. How do you
> > > > > intend to discover the ACPI tables? Or other system information?
> > > > 
> > > > FWIW, Xen needs to pass the RDSP pointer along with a tiny DT containing
> > > > the command line and memory information to Dom0 as well.
> > > 
> > > When you say "memory information", is that pointers to a UEFI memory
> > > map, or memory nodes? The former should work for ACPI, but I don't think
> > > the latter will. I think there's a need for some discussion regarding
> > > the Dom0 boot flow for ACPI. Is there any tree I can take a peek at?
> > 
> > Plain memory nodes.  There is no UEFI instance for Dom0.  AFAIU x86 does
> > something similar (although with some custom PV thing instead of DT),
> > and when Dom0 needs UEFI runtime services, this is done through specific
> > hypercalls.
> > 
> > The Xen code is incomplete for this work, but can be followed here:
> > https://git.linaro.org/people/parth.dixit/acpi-rsdp/xen.git/shortlog/refs/heads/explore-rsdp
> 
> Thanks.
> 
> > The Linux side is stuff based on the LEG kernel I think, not sure if
> > it's pushed anywhere yet.
> > 
> > I'm cc'ing Parth and Julien here, but I agree that having a discussion
> > on this could probably be good.
> 
> Sounds good to me. That might be worth running as a separate thread so
> as not to confuse matters.

Agreed.

> 
> Perhaps just using memory nodes is fine, but so far all of the
> discussions I've been in (on mailing lists and in person) regarding ACPI
> have had the fundamental assumption that ACPI would require UEFI, and
> the UEFI memory map is in use. Given that assumption seems to be broken
> for this case, we need to revisit those discussions.
> 
> There's also a problem in that this opens the possibility of non-Xen
> !UEFI + ACPI configurations, which I don't think is something we want to
> encourage. Xen is somewhat a special case because of the symbiotic
> relationship with Dom0.
> 
> > > Passing just the RDSP will mean that Dom0 won't get SMBIOS tables and
> > > other potentially useful things, in addition to simply being yet another
> > > potential boot configuration. I'm a little concerned about that.
> > > 
> > 
> > I share your concern, but running another UEFI instance for Dom0 doesn't
> > seem like a viable alternative either.  Why is this a problem on ARM and
> > not on x86 though?
> 
> I believe that on x86 the fallback for !UEFI would be the e820 memory
> map, which provides info regarding the type of the memory mapping, as
> opposed to just the base + size. That said, I'm not that familiar with
> e820, and from a quick look the provided information doesn't seem to be
> that detailed.
> 

right, the good old PC.

> > > > We are currently suggesting adding an RDSP 

Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Mark Rutland
On Wed, Nov 12, 2014 at 09:08:55AM +, Claudio Fontana wrote:
> On 11.11.2014 16:29, Mark Rutland wrote:
> > Hi,
> > 
> > On Thu, Nov 06, 2014 at 01:33:20PM +, Alexander Spyridakis wrote:
> >> On 6 November 2014 14:44, Peter Maydell  wrote:
> >>>
> >>>
>  We need ACPI guest support in QEMU for AArch64 over here, with all 
>  features
>  (including the ability to run ACPI code and add specific tables), for
>  ACPI-based guests.
> >>>
> >>> The plan for providing ACPI to guests is that we run a UEFI BIOS
> >>> blob which is what is responsible for providing ACPI and UEFI
> >>> runtime services to guests which need them. (The UEFI blob finds
> >>> out about its hardware by looking at a device tree that QEMU
> >>> passes it, but that's a detail between QEMU and its bios blob).
> >>> This pretty much looks like what x86 QEMU used to do with ACPI
> >>> for a very long time, so we know it's a feasible approach.
> >>
> >> Hi Peter,
> >>
> >> The rational in the proposed approach is meant for cases where the
> >> user does not want to rely on external firmware layers. While UEFI
> >> could do what you are describing, the point is to avoid this not so
> >> trivial overhead in the booting process. Especially in the case of
> >> thin guests, where another software dependency is undesired.
> > 
> > I'm not sure how you plan to use ACPI without UEFI, as there are several
> > pieces of information which ACPI misses, such as the memory map, which
> > must be discovered from UEFI. How do you intend to discover the memory
> > map without UEFI?
> > 
> > Additionally, with Linux and other generic OSs, the expectation is that
> > the ACPI tables are discovered via the UEFI system table. How do you
> > intend to discover the ACPI tables? Or other system information?
> > 
> > From experience with Linux, querying this information from UEFI is a
> > trivial overhead, though a UEFI implementation might take a while to
> > boot to the point where that is possible. It would be more generally
> > helpful to have an optimized virtualised UEFI for this case (or perhaps
> > just a UEFI frontend that presents the same interface to EFI
> > applications but doesn't have to do any heavy lifting at boot).
> > 
> > So far the general trend with AArch64 at the system level is to use
> > generic interfaces as far as possible. The generic interface for
> > discovering ACPI tables is to boot as an EFI application and then to
> > query the tables from UEFI. That is the interface others are likely to
> > follow, and ACPI without UEFI is unlikely to be of much use to anyone
> > else.
> > 
> > Why is it worth expending the effort on the boot protocol you suggest
> > (which so far is not well defined) when there is already a portable,
> > well-defined standard that others are already following?
> > 
> > Thanks,
> > Mark.
> > 
> 
> Hi,
> 
> I tend to mostly agree with this, we might look for alternative
> solutions for speeding up guest startup in the future but in general
> if getting ACPI in the guest for ARM64 requires also getting UEFI,
> then I can personally live with that, especially if we strive to have
> the kind of optimized virtualized UEFI you mention.

Given that UEFI will be required for other guests (e.g. if you want to
boot a distribution's ISO image), I hope that virtualised UEFI will see
some optimisation work.

> We can in the meantime use these patches to test the "fast path" to
> the guest by just hardcoding or passing on the command line what is
> needed to be able to test information reading from ACPI from the guest
> side.
> 
> I cc: my colleagues Jani and Paul which have more the use case in mind
> and
> might correct me there.
> 
> For x86 though what is the state of UEFI in QEMU? Is it relying on the
> OVMF project to provide the firmware images if I understand
> correctly..
> How is it working out in practice?
> Should the same kind of approach be taken for ARM64?

I'm unfortunately not familiar enough with OVMF to answer any of these
questions.

> As mentioned by others, I'd rather see an implementation of ACPI in
> QEMU which learns from the experience of X86 (and possibly shares some
> code if possible), rather than going in a different direction by
> creating device trees first, and then converting them to ACPI tables
> somewhere in the firmware, just because device trees are "already
> there", for the reasons which have already been mentioned before by
> Igor and others.

For the features which ACPI provides which device trees do not (e.g. the
dynamic addition and removal of memory and CPUs), there will need to be
some sort of interface between QEMU and the ACPI implementation. That's
already outside of the realm of DT, so as previously mentioned a simple
conversion doesn't cover the general case.

For most static things, device tree should be suitable as-is (i.e.
without translation to ACPI) for all currently supported guests.

I think any ACPI implemenation for a hypervisor should provide a
demon

[Qemu-devel] [PATCH] vhdx: fix g_try_malloc0 thinko

2014-11-12 Thread Paolo Bonzini
Spotted while looking at Coverity reports (though not found by
Coverity itself).

Signed-off-by: Paolo Bonzini 
---
 block/qcow2-snapshot.c | 2 +-
 block/vhdx.c   | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
index f52d7fd..27d3bc1 100644
--- a/block/qcow2-snapshot.c
+++ b/block/qcow2-snapshot.c
@@ -465,7 +465,7 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char 
*snapshot_id)
 BDRVQcowState *s = bs->opaque;
 QCowSnapshot *sn;
 int i, snapshot_index;
-int cur_l1_bytes, sn_l1_bytes;
+size_t cur_l1_bytes, sn_l1_bytes;
 int ret;
 uint64_t *sn_l1_table = NULL;
 
diff --git a/block/vhdx.c b/block/vhdx.c
index 87c99fc..bd1e403 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1593,7 +1593,7 @@ static int vhdx_create_bat(BlockDriverState *bs, 
BDRVVHDXState *s,
 bdrv_has_zero_init(bs) == 0) {
 /* for a fixed file, the default BAT entry is not zero */
 s->bat = g_try_malloc0(length);
-if (length && s->bat != NULL) {
+if (length && s->bat == NULL) {
 ret = -ENOMEM;
 goto exit;
 }
-- 
2.1.0




[Qemu-devel] [PATCH] target-i386: eliminate dead code and hoist common code out of "if"

2014-11-12 Thread Paolo Bonzini
ist != 0 is checked in the first "if", so it cannot be true in
the "else if" part.  While at it, simplify the code and move
the ESP alignment out of the conditionals.

Reported by Coverity.

Signed-off-by: Paolo Bonzini 
---
 target-i386/seg_helper.c | 17 -
 1 file changed, 4 insertions(+), 13 deletions(-)

diff --git a/target-i386/seg_helper.c b/target-i386/seg_helper.c
index 13eefba..a59ad70 100644
--- a/target-i386/seg_helper.c
+++ b/target-i386/seg_helper.c
@@ -883,32 +883,23 @@ static void do_interrupt64(CPUX86State *env, int intno, 
int is_int,
 }
 if ((!(e2 & DESC_C_MASK) && dpl < cpl) || ist != 0) {
 /* to inner privilege */
-if (ist != 0) {
-esp = get_rsp_from_tss(env, ist + 3);
-} else {
-esp = get_rsp_from_tss(env, dpl);
-}
-esp &= ~0xfLL; /* align stack */
-ss = 0;
 new_stack = 1;
+esp = get_rsp_from_tss(env, ist != 0 ? ist + 3 : dpl);
+ss = 0;
 } else if ((e2 & DESC_C_MASK) || dpl == cpl) {
 /* to same privilege */
 if (env->eflags & VM_MASK) {
 raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
 }
 new_stack = 0;
-if (ist != 0) {
-esp = get_rsp_from_tss(env, ist + 3);
-} else {
-esp = env->regs[R_ESP];
-}
-esp &= ~0xfLL; /* align stack */
+esp = env->regs[R_ESP];
 dpl = cpl;
 } else {
 raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
 new_stack = 0; /* avoid warning */
 esp = 0; /* avoid warning */
 }
+esp &= ~0xfLL; /* align stack */
 
 PUSHQ(esp, env->segs[R_SS].selector);
 PUSHQ(esp, env->regs[R_ESP]);
-- 
2.1.0




Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Mark Rutland
[...]

> > > > > We are currently suggesting adding an RDSP property to the chosen node
> > > > > in the tiny DT, but a command-line arguement like kexec proposed could
> > > > > be another option I guess, albeit not a very pretty one.
> > > > 
> > > > I'm not sure what an RDSP command line property would have to do with
> > > > kexec. I'll assume I've misunderstood something.
> > > > 
> > > 
> > > I thought the kexec patches proposed passing the RDSP on the
> > > command-line to boot the secondary kernel, so if that ended up being
> > > supported by the kernel for kexec, maybe that could be leveraged by
> > > Xen's boot protocol.  It was an idea someone brought to me, just thought
> > > I'd mention it.
> > 
> > Ah, that's not something I'd heard of.
> 
> Maybe there was a misunderstanding then, I thought you were cc'ed on
> those discussions.

I may just have lost them in my inbox. I'm on a few too many mailing
lists these days. Sorry about that.

> > I'm not a fan of placing fundamentally required system description on
> > the command line. It's fine for explicit overrides but I don't think it
> > should be the default mechanism as that causes its own set of problems
> > (who wants to fight with their hypervisor to pass a command line to a
> > guest kernel?).
> > 
> 
> Agreed completely, but I've been lacking strong technical arguments
> against passing this stuff on the cmdline.  My personal preferred
> approach for the Xen Dom0 case is to add a property to the DT.

Something under /chosen, or a firmware node would sound preferable to
me. For UEFI we pass the system table address as
/chosen/linux,uefi-system-table = <... ...>, and I think the RDSP could
be handled similarly if necessary. The user can than override that via
the command line if desired.

Ideally, the user shouldn't have to place anything on the command line
to get a usable system. Obviously some things will be necesarry (where
is my rootfs?), but that should be the user's configuration of the
system rather than fundamental properties of said system.

The big issue I'm aware of at the moment that forces people to provide a
command line (on non-virtualised systems at least) is the default
console and the rate thereof, but that's being looked into currently.

Thanks,
Mark.



Re: [Qemu-devel] [PATCH] vhdx: fix g_try_malloc0 thinko

2014-11-12 Thread Fam Zheng
On Wed, 11/12 11:59, Paolo Bonzini wrote:
> Spotted while looking at Coverity reports (though not found by
> Coverity itself).
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/qcow2-snapshot.c | 2 +-
>  block/vhdx.c   | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/block/qcow2-snapshot.c b/block/qcow2-snapshot.c
> index f52d7fd..27d3bc1 100644
> --- a/block/qcow2-snapshot.c
> +++ b/block/qcow2-snapshot.c
> @@ -465,7 +465,7 @@ int qcow2_snapshot_goto(BlockDriverState *bs, const char 
> *snapshot_id)
>  BDRVQcowState *s = bs->opaque;
>  QCowSnapshot *sn;
>  int i, snapshot_index;
> -int cur_l1_bytes, sn_l1_bytes;
> +size_t cur_l1_bytes, sn_l1_bytes;
>  int ret;
>  uint64_t *sn_l1_table = NULL;
>  
> diff --git a/block/vhdx.c b/block/vhdx.c
> index 87c99fc..bd1e403 100644
> --- a/block/vhdx.c
> +++ b/block/vhdx.c
> @@ -1593,7 +1593,7 @@ static int vhdx_create_bat(BlockDriverState *bs, 
> BDRVVHDXState *s,
>  bdrv_has_zero_init(bs) == 0) {
>  /* for a fixed file, the default BAT entry is not zero */
>  s->bat = g_try_malloc0(length);
> -if (length && s->bat != NULL) {
> +if (length && s->bat == NULL) {
>  ret = -ENOMEM;
>  goto exit;
>  }
> -- 
> 2.1.0
> 
> 

There is already:

commit a011898d25b8a26a311d56dfe37e8d3a4374ec65
Author: Adelina Tuvenie 
Date:   Thu Sep 18 18:17:44 2014 +0300

block: allow creation of fixed vhdx images

Fam



Re: [Qemu-devel] [PATCH] vhdx: fix g_try_malloc0 thinko

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 12:09, Fam Zheng wrote:
> There is already:
> 
> commit a011898d25b8a26a311d56dfe37e8d3a4374ec65
> Author: Adelina Tuvenie 
> Date:   Thu Sep 18 18:17:44 2014 +0300
> 
> block: allow creation of fixed vhdx images

Oops, old checkout.  Thanks,

Paolo



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Arnd Bergmann
On Wednesday 12 November 2014 10:56:40 Mark Rutland wrote:
> On Wed, Nov 12, 2014 at 09:08:55AM +, Claudio Fontana wrote:
> > On 11.11.2014 16:29, Mark Rutland wrote:
> > 
> > I tend to mostly agree with this, we might look for alternative
> > solutions for speeding up guest startup in the future but in general
> > if getting ACPI in the guest for ARM64 requires also getting UEFI,
> > then I can personally live with that, especially if we strive to have
> > the kind of optimized virtualized UEFI you mention.
> 
> Given that UEFI will be required for other guests (e.g. if you want to
> boot a distribution's ISO image), I hope that virtualised UEFI will see
> some optimisation work.

I think the requirement it just for KVM to provide something that
behaves exactly like UEFI, it doesn't have to be the full Tianocore
implementation if it's easier to reimplement the boot interface.

> > As mentioned by others, I'd rather see an implementation of ACPI in
> > QEMU which learns from the experience of X86 (and possibly shares some
> > code if possible), rather than going in a different direction by
> > creating device trees first, and then converting them to ACPI tables
> > somewhere in the firmware, just because device trees are "already
> > there", for the reasons which have already been mentioned before by
> > Igor and others.
> 
> For the features which ACPI provides which device trees do not (e.g. the
> dynamic addition and removal of memory and CPUs), there will need to be
> some sort of interface between QEMU and the ACPI implementation. That's
> already outside of the realm of DT, so as previously mentioned a simple
> conversion doesn't cover the general case.

I think we need to support the low-level interfaces in the kernel for
this anyway, we should not have to use ACPI just to do memory and CPU
hotplugging in KVM, that would be silly. If ACPI is present, it can
provide a wrapper for the same interface, but KVM should not need to
be aware of the fact that ACPI is used in the guest, after it has
passed the initial ACPI blob to the kernel.

> I think any ACPI implemenation for a hypervisor should provide a
> demonstrable useful feature (e.g. hot-add of CPUs) before merging so we
> know the infrastructure is suitable.

> > I wouldn't want for ACPI to be "sort of" supported in QEMU, but with a
> > limited functionality which makes it not fully useful in practice. I'd
> > rather see it as a first class citizen instead, including the ability
> > to run AML code.
> 
> I agree that there's no point in having ACPI in a guest unless it
> provides something which dt does not. I don't know how it should be
> structured to provide those useful features.

I see it the opposite way: we shouldn't have to use ACPI just to make
use of some feature in Linux, the only reason why you'd want ACPI support
in KVM is to be able to run Windows. It makes sense for the ACPI
implementation to be compatible with the Linux ACPI code as well so
we can test it better.

Arnd



Re: [Qemu-devel] [PATCH v6 07/10] qmp: Add support of "dirty-bitmap" sync mode for drive-backup

2014-11-12 Thread Vladimir Sementsov-Ogievskiy

@@ -317,7 +321,21 @@ static void coroutine_fn backup_run(void *opaque)
  if (alloced == 0) {
  continue;
  }
+} else if (job->sync_mode == MIRROR_SYNC_MODE_DIRTY_BITMAP) {
+int i, dirty = 0;
+for (i = 0; i < BACKUP_SECTORS_PER_CLUSTER;
+ i += job->sync_bitmap_gran) {
+if (bdrv_get_dirty(bs, job->sync_bitmap,
+start * BACKUP_SECTORS_PER_CLUSTER + i)) {
+dirty = 1;
+break;
+}
+}
+if (!dirty) {
+continue;
+}
  }
+
With such solution we don't use the power of HBitmap, which provides 
"hbitmap_iter_skip_words", with O(log(n)) complexity. Here we look 
through the whole bitmap (O(n)), instead of jumping to the next dirty bit.


--
С уважением,
Владимир



[Qemu-devel] [PATCH] target-i386: fix Coverity complaints about overflows

2014-11-12 Thread Paolo Bonzini
sipi_vector is an int; it is shifted by 12 and passed as a 64-bit value,
which makes Coverity think that we wanted (uint64_t)sipi_vector << 12.

But actually it must be between 0 and 255.  Make this explicit.

Signed-off-by: Paolo Bonzini 
---
 target-i386/cpu.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 1b2c12a..015f5b5 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -1104,7 +1104,7 @@ static inline void cpu_x86_load_seg_cache(CPUX86State 
*env,
 }
 
 static inline void cpu_x86_load_seg_cache_sipi(X86CPU *cpu,
-   int sipi_vector)
+   uint8_t sipi_vector)
 {
 CPUState *cs = CPU(cpu);
 CPUX86State *env = &cpu->env;
-- 
2.1.0




[Qemu-devel] [PATCH] apic: fix incorrect handling of ExtINT interrupts wrt processor priority

2014-11-12 Thread Paolo Bonzini
This fixes another failure with ExtINT, demonstrated by QNX.  The failure
mode is as follows:
- IPI sent to cpu 0 (bit set in APIC irr)
- IPI accepted by cpu 0 (bit cleared in irr, set in isr)
- IPI sent to cpu 0 (bit set in both irr and isr)
- PIC interrupt sent to cpu 0

The PIC interrupt causes CPU_INTERRUPT_HARD to be set, but
apic_irq_pending observes that the highest pending APIC interrupt priority
(the IPI) is the same as the processor priority (since the IPI is still
being handled), so apic_get_interrupt returns a spurious interrupt rather
than the pending PIC interrupt. The result is an endless sequence of
spurious interrupts, since nothing will clear CPU_INTERRUPT_HARD.

Instead, ExtINT interrupts should have ignored the processor priority.
Calling apic_check_pic early in apic_get_interrupt ensures that
apic_deliver_pic_intr is called instead of delivering the spurious
interrupt.  apic_deliver_pic_intr then clears CPU_INTERRUPT_HARD if needed.

Reported-by: Richard Bilson 
Signed-off-by: Paolo Bonzini 
---
 hw/intc/apic.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/intc/apic.c b/hw/intc/apic.c
index 6ec5861..0f97b47 100644
--- a/hw/intc/apic.c
+++ b/hw/intc/apic.c
@@ -571,7 +571,10 @@ int apic_get_interrupt(DeviceState *dev)
 apic_sync_vapic(s, SYNC_FROM_VAPIC);
 intno = apic_irq_pending(s);
 
-if (intno == 0) {
+/* if there is an interrupt from the 8259, let the caller handle
+ * that first since ExtINT interrupts ignore the priority.
+ */
+if (intno == 0 || apic_check_pic(s)) {
 apic_sync_vapic(s, SYNC_TO_VAPIC);
 return -1;
 } else if (intno < 0) {
@@ -582,9 +585,6 @@ int apic_get_interrupt(DeviceState *dev)
 apic_set_bit(s->isr, intno);
 apic_sync_vapic(s, SYNC_TO_VAPIC);
 
-/* re-inject if there is still a pending PIC interrupt */
-apic_check_pic(s);
-
 apic_update_irq(s);
 
 return intno;
-- 
1.8.3.1




[Qemu-devel] [PATCH] virtio-scsi: Fix comment for VirtIOSCSIReq

2014-11-12 Thread Fam Zheng
The cdb is not zeroed by virtio_scsi_init_req, so fix the misleading
comment.

Suggested-by: Markus Armbruster 
Signed-off-by: Fam Zheng 
---
 include/hw/virtio/virtio-scsi.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index 9e1a49c..bf17cc9 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -209,7 +209,8 @@ typedef struct VirtIOSCSIReq {
 /* Note:
  * - fields before elem are initialized by virtio_scsi_init_req;
  * - elem is uninitialized at the time of allocation.
- * - fields after elem are zeroed by virtio_scsi_init_req.
+ * - fields after elem (except the ending cdb[]) are zeroed by
+ *   virtio_scsi_init_req.
  * */
 
 VirtQueueElement elem;
-- 
1.9.3




Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Christoffer Dall
On Wed, Nov 12, 2014 at 12:15:08PM +0100, Arnd Bergmann wrote:
> On Wednesday 12 November 2014 10:56:40 Mark Rutland wrote:
> > On Wed, Nov 12, 2014 at 09:08:55AM +, Claudio Fontana wrote:
> > > On 11.11.2014 16:29, Mark Rutland wrote:
> > > 
> > > I tend to mostly agree with this, we might look for alternative
> > > solutions for speeding up guest startup in the future but in general
> > > if getting ACPI in the guest for ARM64 requires also getting UEFI,
> > > then I can personally live with that, especially if we strive to have
> > > the kind of optimized virtualized UEFI you mention.
> > 
> > Given that UEFI will be required for other guests (e.g. if you want to
> > boot a distribution's ISO image), I hope that virtualised UEFI will see
> > some optimisation work.
> 
> I think the requirement it just for KVM to provide something that
> behaves exactly like UEFI, it doesn't have to be the full Tianocore
> implementation if it's easier to reimplement the boot interface.

We already have a port of Tinaocode that works for virt, but yes,
implementing something ligther is certainly possible.

> 
> > > As mentioned by others, I'd rather see an implementation of ACPI in
> > > QEMU which learns from the experience of X86 (and possibly shares some
> > > code if possible), rather than going in a different direction by
> > > creating device trees first, and then converting them to ACPI tables
> > > somewhere in the firmware, just because device trees are "already
> > > there", for the reasons which have already been mentioned before by
> > > Igor and others.
> > 
> > For the features which ACPI provides which device trees do not (e.g. the
> > dynamic addition and removal of memory and CPUs), there will need to be
> > some sort of interface between QEMU and the ACPI implementation. That's
> > already outside of the realm of DT, so as previously mentioned a simple
> > conversion doesn't cover the general case.
> 
> I think we need to support the low-level interfaces in the kernel for
> this anyway, we should not have to use ACPI just to do memory and CPU
> hotplugging in KVM, that would be silly.

I had that same intuitive feeling, but lacked good tecnical arguments
for it.  Care to elaborate on that?

> If ACPI is present, it can
> provide a wrapper for the same interface, but KVM should not need to
> be aware of the fact that ACPI is used in the guest, after it has
> passed the initial ACPI blob to the kernel.
> 

That's where things begin to be a bit foggy for me.  AFAIU ACPI already
has a method for doing this and I speculate that there is some IRQ
assigned to an ACPI event that causes some AML code to be interpreted by
your OS.  Wouldn't it be a matter of QEMU putting the right AML table
fragments in place to wire this up then?

-Christoffer



Re: [Qemu-devel] [PATCH] virtio-scsi: Fix comment for VirtIOSCSIReq

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 12:29, Fam Zheng wrote:
> The cdb is not zeroed by virtio_scsi_init_req, so fix the misleading
> comment.
> 
> Suggested-by: Markus Armbruster 
> Signed-off-by: Fam Zheng 
> ---
>  include/hw/virtio/virtio-scsi.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
> index 9e1a49c..bf17cc9 100644
> --- a/include/hw/virtio/virtio-scsi.h
> +++ b/include/hw/virtio/virtio-scsi.h
> @@ -209,7 +209,8 @@ typedef struct VirtIOSCSIReq {
>  /* Note:
>   * - fields before elem are initialized by virtio_scsi_init_req;
>   * - elem is uninitialized at the time of allocation.
> - * - fields after elem are zeroed by virtio_scsi_init_req.
> + * - fields after elem (except the ending cdb[]) are zeroed by
> + *   virtio_scsi_init_req.
>   * */
>  
>  VirtQueueElement elem;
> 

Applied, thanks.

Paolo



[Qemu-devel] [PATCH] apic_common: migrate missing fields

2014-11-12 Thread Paolo Bonzini
From: Pavel Dovgalyuk 

This patch adds missed sipi_vector and wait_for_sipi fields to a new
subsection of the vmstate of the apic_common module. Saving and loading
of these fields makes migration of the apic state deterministic.

Signed-off-by: Pavel Dovgalyuk 
Signed-off-by: Paolo Bonzini 
---
Found while cleaning up an old branch.

 hw/intc/apic_common.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index ce3d903..7cf795a 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -324,6 +324,18 @@ static void apic_common_realize(DeviceState *dev, Error 
**errp)
 
 }
 
+static void apic_pre_load(void *opaque)
+{
+APICCommonState *s = APIC_COMMON(opaque);
+
+/* The default is !cpu_is_bsp(s->cpu), but the common value is 0
+ * so that's what apic_common_sipi_needed checks for.  Reset to
+ * the value that is assumed when the apic_sipi subsection is
+ * absent.
+ */
+s->wait_for_sipi = 0;
+}
+
 static void apic_dispatch_pre_save(void *opaque)
 {
 APICCommonState *s = APIC_COMMON(opaque);
@@ -345,12 +357,30 @@ static int apic_dispatch_post_load(void *opaque, int 
version_id)
 return 0;
 }
 
+static bool apic_common_sipi_needed(void *opaque)
+{
+APICCommonState *s = APIC_COMMON(opaque);
+return s->wait_for_sipi != 0;
+}
+
+static const VMStateDescription vmstate_apic_common_sipi = {
+.name = "apic_sipi",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields = (VMStateField[]) {
+VMSTATE_INT32(sipi_vector, APICCommonState),
+VMSTATE_INT32(wait_for_sipi, APICCommonState),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription vmstate_apic_common = {
 .name = "apic",
 .version_id = 3,
 .minimum_version_id = 3,
 .minimum_version_id_old = 1,
 .load_state_old = apic_load_old,
+.pre_load = apic_pre_load,
 .pre_save = apic_dispatch_pre_save,
 .post_load = apic_dispatch_post_load,
 .fields = (VMStateField[]) {
@@ -375,6 +405,13 @@ static const VMStateDescription vmstate_apic_common = {
 VMSTATE_INT64(timer_expiry,
   APICCommonState), /* open-coded timer state */
 VMSTATE_END_OF_LIST()
+},
+.subsections = (VMStateSubsection[]) {
+{
+.vmsd = &vmstate_apic_common_sipi,
+.needed = apic_common_sipi_needed,
+},
+VMSTATE_END_OF_LIST()
 }
 };
 
-- 
1.8.3.1




Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 12:34, Christoffer Dall wrote:
> AFAIU ACPI already has a method for doing this

It's not defined in the spec.  QEMU defines a bunch of registers to do
that, and provides AML that works with those registers.

While these registers can be separated from the ACPI code in QEMU...

> and I speculate that there is some IRQ
> assigned to an ACPI event that causes some AML code to be interpreted by
> your OS.

... QEMU does exactly this, it uses the "general purpose event" (GPE)
mechanism to trigger the parsing of the AML.  When you hot-plug/unplug a
CPU or memory, an SCI (system control interrupt - the ACPI IRQ) is
triggered in the guest and that's not entirely disconnected from ACPI.

Perhaps you could treat it as a shared level-triggered interrupt in DT?
 I don't know.

> Wouldn't it be a matter of QEMU putting the right AML table
> fragments in place to wire this up then?

Yes.

Paolo



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 11:38, Mark Rutland wrote:
> > I share your concern, but running another UEFI instance for Dom0 doesn't
> > seem like a viable alternative either.  Why is this a problem on ARM and
> > not on x86 though?
> 
> I believe that on x86 the fallback for !UEFI would be the e820 memory
> map, which provides info regarding the type of the memory mapping, as
> opposed to just the base + size. That said, I'm not that familiar with
> e820, and from a quick look the provided information doesn't seem to be
> that detailed.

The e820 memory map is only part of it.  On x86 !UEFI you are supposed
to scan low memory for magic signatures that provides pointers to the
SMBIOS and ACPI tables.

As Christoffer said, "the good old PC". :)

SeaBIOS fishes out information from fw_cfg, and puts it in low memory.
On ARM you could use DT binary blobs instead of fw_cfg, as proposed
already (I don't remember if it was in this thread or IRC).  Then if you
want to go !UEFI you can extract the tables from those binary blobs.

Paolo



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Mark Rutland
On Wed, Nov 12, 2014 at 11:15:08AM +, Arnd Bergmann wrote:
> On Wednesday 12 November 2014 10:56:40 Mark Rutland wrote:
> > On Wed, Nov 12, 2014 at 09:08:55AM +, Claudio Fontana wrote:
> > > On 11.11.2014 16:29, Mark Rutland wrote:
> > > 
> > > I tend to mostly agree with this, we might look for alternative
> > > solutions for speeding up guest startup in the future but in general
> > > if getting ACPI in the guest for ARM64 requires also getting UEFI,
> > > then I can personally live with that, especially if we strive to have
> > > the kind of optimized virtualized UEFI you mention.
> > 
> > Given that UEFI will be required for other guests (e.g. if you want to
> > boot a distribution's ISO image), I hope that virtualised UEFI will see
> > some optimisation work.
> 
> I think the requirement it just for KVM to provide something that
> behaves exactly like UEFI, it doesn't have to be the full Tianocore
> implementation if it's easier to reimplement the boot interface.

I agree that we don't need a full Tianocore, but whatever we have must
implement the minimal interface UEFI requires (which is more than just
the boot interface).

For a "boot this EFI application" workflow you can skip most of the BDS
stuff, but you still need boot services and runtime services provided to
the application.

> > > As mentioned by others, I'd rather see an implementation of ACPI in
> > > QEMU which learns from the experience of X86 (and possibly shares some
> > > code if possible), rather than going in a different direction by
> > > creating device trees first, and then converting them to ACPI tables
> > > somewhere in the firmware, just because device trees are "already
> > > there", for the reasons which have already been mentioned before by
> > > Igor and others.
> > 
> > For the features which ACPI provides which device trees do not (e.g. the
> > dynamic addition and removal of memory and CPUs), there will need to be
> > some sort of interface between QEMU and the ACPI implementation. That's
> > already outside of the realm of DT, so as previously mentioned a simple
> > conversion doesn't cover the general case.
> 
> I think we need to support the low-level interfaces in the kernel for
> this anyway, we should not have to use ACPI just to do memory and CPU
> hotplugging in KVM, that would be silly. If ACPI is present, it can
> provide a wrapper for the same interface, but KVM should not need to
> be aware of the fact that ACPI is used in the guest, after it has
> passed the initial ACPI blob to the kernel.

The difficulty here is that there is currently no common low-level
interface between the guest and any specific hypervisor for these
facilities. This would be up to kvm tool or QEMU, not KVM itself. So
we'd have to spec one for !ACPI.

With ACPI, the interface should be common (following the ACPI spec), but
we don't have a common interface underlying that.

I am not averse to having mechanisms for !ACPI, but we'd need to spec
something.

> > I think any ACPI implemenation for a hypervisor should provide a
> > demonstrable useful feature (e.g. hot-add of CPUs) before merging so we
> > know the infrastructure is suitable.
> 
> > > I wouldn't want for ACPI to be "sort of" supported in QEMU, but with a
> > > limited functionality which makes it not fully useful in practice. I'd
> > > rather see it as a first class citizen instead, including the ability
> > > to run AML code.
> > 
> > I agree that there's no point in having ACPI in a guest unless it
> > provides something which dt does not. I don't know how it should be
> > structured to provide those useful features.
> 
> I see it the opposite way: we shouldn't have to use ACPI just to make
> use of some feature in Linux, the only reason why you'd want ACPI support
> in KVM is to be able to run Windows. It makes sense for the ACPI
> implementation to be compatible with the Linux ACPI code as well so
> we can test it better.

At the moment, ACPI specifies how these features should work. So without
inventing a whole new interface, ACPI is the way of providing those
features.

Thanks,
Mark.



Re: [Qemu-devel] [RFC PATCH v4 06/25] cpu-exec: reset exception_index correctly

2014-11-12 Thread Paolo Bonzini


On 07/11/2014 11:32, Pavel Dovgalyuk wrote:
> Exception index is reset at every entry at every entry into cpu_exec()
> function. This may cause missing the exceptions while replaying them.
> This patch moves exception_index reset to the locations where they are
> processed.
> 
> Signed-off-by: Pavel Dovgalyuk 
> ---
>  cpu-exec.c |2 +-
>  cpus.c |1 +
>  2 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 8830255..011f51f 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -358,7 +358,6 @@ int cpu_exec(CPUArchState *env)
>  }
>  
>  cc->cpu_exec_enter(cpu);
> -cpu->exception_index = -1;
>  
>  /* Calculate difference between guest clock and host clock.
>   * This delay includes the delay of the last cycle, so
> @@ -378,6 +377,7 @@ int cpu_exec(CPUArchState *env)
>  if (ret == EXCP_DEBUG) {
>  cpu_handle_debug_exception(env);
>  }
> +cpu->exception_index = -1;
>  break;
>  } else {
>  #if defined(CONFIG_USER_ONLY)
> diff --git a/cpus.c b/cpus.c
> index e53d605..7e8c507 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -1016,6 +1016,7 @@ static void *qemu_tcg_cpu_thread_fn(void *arg)
>  CPU_FOREACH(cpu) {
>  cpu->thread_id = qemu_get_thread_id();
>  cpu->created = true;
> +cpu->exception_index = -1;
>  }
>  qemu_cond_signal(&qemu_cpu_cond);
>  
> 
> 
> 

What about user-mode emulation?  Do you need to reset
cpu->exception_index for it too?

Paolo



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Mark Rutland
On Wed, Nov 12, 2014 at 11:52:22AM +, Paolo Bonzini wrote:
> 
> On 12/11/2014 11:38, Mark Rutland wrote:
> > > I share your concern, but running another UEFI instance for Dom0 doesn't
> > > seem like a viable alternative either.  Why is this a problem on ARM and
> > > not on x86 though?
> > 
> > I believe that on x86 the fallback for !UEFI would be the e820 memory
> > map, which provides info regarding the type of the memory mapping, as
> > opposed to just the base + size. That said, I'm not that familiar with
> > e820, and from a quick look the provided information doesn't seem to be
> > that detailed.
> 
> The e820 memory map is only part of it.  On x86 !UEFI you are supposed
> to scan low memory for magic signatures that provides pointers to the
> SMBIOS and ACPI tables.

Fun...

> As Christoffer said, "the good old PC". :)
> 
> SeaBIOS fishes out information from fw_cfg, and puts it in low memory.
> On ARM you could use DT binary blobs instead of fw_cfg, as proposed
> already (I don't remember if it was in this thread or IRC).  Then if you
> want to go !UEFI you can extract the tables from those binary blobs.

This sounds broken. I am very much not a fan of shoving binary blobs
into DT to workaround a shoddy boot interface.

Mark.



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 13:04, Mark Rutland wrote:
>> > SeaBIOS fishes out information from fw_cfg, and puts it in low memory.
>> > On ARM you could use DT binary blobs instead of fw_cfg, as proposed
>> > already (I don't remember if it was in this thread or IRC).  Then if you
>> > want to go !UEFI you can extract the tables from those binary blobs.
> This sounds broken. I am very much not a fan of shoving binary blobs
> into DT to workaround a shoddy boot interface.

We tried spec-ing everything and building the tables in the firmware on
x86.  You really do not want to do that; it's painful to have to update
firmware in lockstep with QEMU, and when you add the next feature it
always seems like you got the bindings wrong.

And we only had one client (SeaBIOS) while you will have at least two
(TianoCore and Linux, presumably? or is Huawei targeting OSv only?).

What we do now is we have two blobs, one with the ACPI tables and one
that tells the firmware how to relocate pointers from one table to
another.  It's been working very well for both SeaBIOS and OVMF.

Paolo



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Mark Rutland
On Wed, Nov 12, 2014 at 11:48:27AM +, Paolo Bonzini wrote:
> 
> On 12/11/2014 12:34, Christoffer Dall wrote:
> > AFAIU ACPI already has a method for doing this
> 
> It's not defined in the spec.  QEMU defines a bunch of registers to do
> that, and provides AML that works with those registers.

Huh? SCI + AML is the method, and that's defined by the spec.

The registers the AML pokes are an implementation detail abstracted by
the AML, and as such are irrelevant to the spec.

> While these registers can be separated from the ACPI code in QEMU...
> 
> > and I speculate that there is some IRQ
> > assigned to an ACPI event that causes some AML code to be interpreted by
> > your OS.
> 
> ... QEMU does exactly this, it uses the "general purpose event" (GPE)
> mechanism to trigger the parsing of the AML.  When you hot-plug/unplug a
> CPU or memory, an SCI (system control interrupt - the ACPI IRQ) is
> triggered in the guest and that's not entirely disconnected from ACPI.
> 
> Perhaps you could treat it as a shared level-triggered interrupt in DT?
>  I don't know.

Putting an interrupt in DT is trivial. The hard part is the rest of the
interface, which so far there is no specification for.

Thanks,
Mark.



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 13:18, Mark Rutland wrote:
> On Wed, Nov 12, 2014 at 11:48:27AM +, Paolo Bonzini wrote:
>>
>> On 12/11/2014 12:34, Christoffer Dall wrote:
>>> AFAIU ACPI already has a method for doing this
>>
>> It's not defined in the spec.  QEMU defines a bunch of registers to do
>> that, and provides AML that works with those registers.
> 
> Huh? SCI + AML is the method, and that's defined by the spec.

I thought Christoffer meant a method to do the actual hotplug, not just
to signal events.  If you want to "support the low-level interfaces in
the kernel for this anyway", you certainly need to know the details
underneath the AML.

>> Perhaps you could treat it as a shared level-triggered interrupt in DT?
>>  I don't know.
> 
> Putting an interrupt in DT is trivial. The hard part is the rest of the
> interface, which so far there is no specification for.

Have you looked at docs/specs/acpi_{cpu,mem}_hotplug.txt?  Writing a DT
binding for it is trivial too.  Or are we talking about two different
things?

Paolo



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Christoffer Dall
On Wed, Nov 12, 2014 at 1:27 PM, Paolo Bonzini  wrote:
>
>
> On 12/11/2014 13:18, Mark Rutland wrote:
>> On Wed, Nov 12, 2014 at 11:48:27AM +, Paolo Bonzini wrote:
>>>
>>> On 12/11/2014 12:34, Christoffer Dall wrote:
 AFAIU ACPI already has a method for doing this
>>>
>>> It's not defined in the spec.  QEMU defines a bunch of registers to do
>>> that, and provides AML that works with those registers.
>>
>> Huh? SCI + AML is the method, and that's defined by the spec.
>
> I thought Christoffer meant a method to do the actual hotplug, not just
> to signal events.  If you want to "support the low-level interfaces in
> the kernel for this anyway", you certainly need to know the details
> underneath the AML.
>
I was being deliberately vague, because I don't know how this works (I
did use the word speculate, didn't I?).

But yes, one of the reasons why I would argue that we're not in a
hurry to accept these patches is that we don't know how the actual
underlying mechanisms to do the things we want ACPI to facilitate
(e.g. cpu and memory hotplug) really work, and thus we cannot verify
the implementation or benefit directly from it today.

-Christoffer



Re: [Qemu-devel] Better Cortex-M support?

2014-11-12 Thread Liviu Ionescu

On 12 Nov 2014, at 01:08, Peter Maydell  wrote:

> Cortex-M4 support would definitely be interesting.

:-)

any hints on what is missing for Cortex-M4 support? are all thumb instructions 
supported? the hard FP? 

> If you have some fixes you think should be in mainline
> the first step is to send patches to us.

sure, I'd be happy to contribute all my changes back.

do you really need the git format-patch files sent via email? I thought that 
using a remote to my git://git.code.sf.net/p/gnuarmeclipse/qemu is easier.

I tried to make my changes not affect the current arm-softmmu configuration, 
and for this I added a new target gnuarmeclipse-softmmu.

I also added 3 new configuration variables (branding_message="", 
semihosting_native="no", verbose="no") to control the features I added, and 
enable them only on the gnuarmeclipse target. these variables are passed to 
config-host.h and used to include my changes.

hopefully these will not damage anything in the current distribution (but need 
to be checked).


some of my intermediate commits were changed in subsequent commits, so only the 
final result is relevant.


regards,

Liviu




Re: [Qemu-devel] Better Cortex-M support?

2014-11-12 Thread Peter Maydell
On 12 November 2014 12:50, Liviu Ionescu  wrote:
>
> On 12 Nov 2014, at 01:08, Peter Maydell  wrote:
>
>> Cortex-M4 support would definitely be interesting.
>
> :-)
>
> any hints on what is missing for Cortex-M4 support? are all thumb
> instructions supported? the hard FP?

The instructions themselves are generally supported for
the A profile cores, so getting that part right would mostly
involve enabling those feature bits for the new M4 CPU class.
The difficult bits are going to involve:
 * correct trapping when fp is disabled
 * getting the exception handling right, including lazy
   exception frame population
 * bits where M profile FP differs from A profile (eg fp
   status and ID registers being memory mapped)

> sure, I'd be happy to contribute all my changes back.
>
> do you really need the git format-patch files sent via
> email? I thought that using a remote to my
> git://git.code.sf.net/p/gnuarmeclipse/qemu is easier.

It's easier for us on the receiving end to have patches sent
as email, because it means we can conveniently review them
in public on the mailing list. I'm afraid a lot of our
conventions for accepting patches are based around making
life easier for the maintainers doing the review, because
we're generally pretty heavily loaded with work. There's
a summary at http://wiki.qemu.org/Contribute/SubmitAPatch
of how it works.

> I tried to make my changes not affect the current arm-softmmu
> configuration, and for this I added a new target gnuarmeclipse-softmmu.

For upstream we would just want to fix arm-softmmu, I think.

> I also added 3 new configuration variables (branding_message="",
> semihosting_native="no", verbose="no") to control the features
> I added, and enable them only on the gnuarmeclipse target.
> these variables are passed to config-host.h and used to include my changes.

It sounds like you should separate out the things which are
bug fixes to upstream from the parts which are integration
with Eclipse. The former are going to be easier to get upstream
than the latter, so you can start with the easier parts first.

> some of my intermediate commits were changed in subsequent commits,
> so only the final result is relevant.

For submitting changes upstream you'll need to clean up
this set of commits (rebase, rearrange, split or merge commits,
etc so you have one commit (patch) per thing you're fixing and
it's a clean change that stands on its own.

thanks
-- PMM



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Arnd Bergmann
On Wednesday 12 November 2014 12:34:01 Christoffer Dall wrote:
> On Wed, Nov 12, 2014 at 12:15:08PM +0100, Arnd Bergmann wrote:
> > On Wednesday 12 November 2014 10:56:40 Mark Rutland wrote:
> > > 
> > > For the features which ACPI provides which device trees do not (e.g. the
> > > dynamic addition and removal of memory and CPUs), there will need to be
> > > some sort of interface between QEMU and the ACPI implementation. That's
> > > already outside of the realm of DT, so as previously mentioned a simple
> > > conversion doesn't cover the general case.
> > 
> > I think we need to support the low-level interfaces in the kernel for
> > this anyway, we should not have to use ACPI just to do memory and CPU
> > hotplugging in KVM, that would be silly.
> 
> I had that same intuitive feeling, but lacked good tecnical arguments
> for it.  Care to elaborate on that?

ACPI always has to interface back to the hypervisor to do anything that
changes the hardware configuration, so it essentially has to perform
a hypercall or touch some virtual register.

If we need to architect that interface in qemu anyway, we should make
it sane enough for the kernel to use directly, without having to go
through ACPI, as not everyone will want to run ACPI.
 
> > If ACPI is present, it can
> > provide a wrapper for the same interface, but KVM should not need to
> > be aware of the fact that ACPI is used in the guest, after it has
> > passed the initial ACPI blob to the kernel.
> 
> That's where things begin to be a bit foggy for me.  AFAIU ACPI already
> has a method for doing this and I speculate that there is some IRQ
> assigned to an ACPI event that causes some AML code to be interpreted by
> your OS.  Wouldn't it be a matter of QEMU putting the right AML table
> fragments in place to wire this up then?

Yes, that is what I meant with a wrapper. The two choices are:

1. have an interrupt and a hypercall or mmio interface. When the
   interrupt gets triggered, we ask the interface what happened and
   do something on the same interface depending on the state of the
   system.

2. have an interrupt that causes AML code to be run, that code will
   use the hypercall or mmio interface to find out what happened and
   create an ACPI event. This event is enqueued to the generic ACPI
   hotplug handler, which depending on the state of the system decides
   to do something by calling into AML code again, which will trigger
   the underlying interface.

>From qemu's point of view, the two are doing exactly the same thing,
except that the MMIO access can be hidden in AML so the OS doesn't have
to know the interface. Note that in case of Xen, the use of hypercalls
means that the OS has to know the interface after all, so the second
half of the process is handled by drivers/xen/xen-acpi-*hotplug.c.

Note how the implementation that uses the ACPI wrapper is much
more complex than the native one that does the same thing:

-rw-r--r-- 1 arnd arnd  2119 Nov 10 16:43 drivers/xen/cpu_hotplug.c
-rw-r--r-- 1 arnd arnd 10987 Nov 10 16:43 drivers/xen/xen-acpi-cpuhotplug.c

-rw-r--r-- 1 arnd arnd  6894 Nov 10 16:43 drivers/xen/xen-balloon.c
-rw-r--r-- 1 arnd arnd 12085 Nov 10 16:43 drivers/xen/xen-acpi-memhotplug.c

Arnd



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Arnd Bergmann
On Wednesday 12 November 2014 13:27:14 Paolo Bonzini wrote:
> On 12/11/2014 13:18, Mark Rutland wrote:
> > On Wed, Nov 12, 2014 at 11:48:27AM +, Paolo Bonzini wrote:
> >> Perhaps you could treat it as a shared level-triggered interrupt in DT?
> >>  I don't know.
> > 
> > Putting an interrupt in DT is trivial. The hard part is the rest of the
> > interface, which so far there is no specification for.
> 
> Have you looked at docs/specs/acpi_{cpu,mem}_hotplug.txt?  Writing a DT
> binding for it is trivial too.  Or are we talking about two different
> things?

Interesting. I agree that doing a DT binding for these will be trivial,
and the implementation in Linux should also be straightforward, thanks
for pointing these out.

However, it seems that the implementation that qemu uses is incompatible
with ARM64, since GPE is not part of the ACPI hardware reduced mode.

Arnd



Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU

2014-11-12 Thread Christopher Covington
On 11/12/2014 03:05 AM, Petr Tesarik wrote:
> On Tue, 11 Nov 2014 12:27:44 -0500
> Christopher Covington  wrote:
> 
>> On 11/11/2014 06:22 AM, Laszlo Ersek wrote:
>>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please
>>> keep me CC'd.)
>>>
>>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib,
>>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command.
>>>
>>> The resultant vmcore is usually analyzed with the "crash" utility.
>>>
>>> The original tool producing such files is kdump. Unlike the procedure
>>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd
>>> kdump kernel), and has more information about the original guest kernel
>>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state
>>> is opaque.
>>>
>>> For this reason, the kdump preparation logic in QEMU hardcodes a number
>>> of fields in the kdump header. The direct issue is the "phys_base"
>>> field. Refer to dump.c, functions create_header32(), create_header64(),
>>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text
>>> "0").
>>>
>>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD
>>>
>>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD
>>>
>>> This works in most cases, because the guest Linux kernel indeed tends to
>>> be loaded at guest-phys address 0. However, when the guest Linux kernel
>>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map),
>>> then the guest Linux kernel is loaded at 16MB, thereby getting out of
>>> sync with the phys_base=0 setting visible in the KDUMP header.
>>>
>>> This trips up the "crash" utility.
>>>
>>> Dave worked around the issue in "crash" for ELF format dumps -- "crash"
>>> can identify QEMU as the originator of the vmcore by finding the QEMU
>>> notes in the ELF vmcore. If those are present, then "crash" employs a
>>> heuristic, probing for a phys_base up to 32MB, in 1MB steps.
>>
>> What advantages does KDUMP have over ELF?
> 
> It's smaller (data is compressed), and it contains a header with some
> useful information (e.g. the crashed kernel's version and release).

What if the ELF dumper used SHF_COMPRESSED or could dump an ELF.xz?

How does QEMU figure out the kernel version information?

Thanks,
Chris

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [Qemu-devel] [PATCH 2/3] iotests: _filter_qmp for pretty JSON output

2014-11-12 Thread Eric Blake
On 11/12/2014 01:33 AM, Max Reitz wrote:

>>> +-e 's#^{"QMP":.*}$#QMP_VERSION#' \
>>> +-e 's#\\##g' | \
>>> +while IFS='' read line; do
>>> +if [[ $line == '"QMP": {' ]]; then
>> Good that this is a /bin/bash script and not /bin/sh :)
> 
> Ah, right, yes, I just copied the code I had written for filtering out
> the image-specific information from the qemu-img info output.
> 
>> But - is it really worth doing this in shell?  Why not just do it in sed?
> 
> Because I don't know sed well enough. ;-)
> 
>> sed -e ... \
>>  -e 's#\\##g' \
>>  -e '/"QMP": {/,/}/ c\' \

Better make this line anchored in its searching:

-e '/^"QMP": {$/,/^}$/ c\' \

>>  -e 'QMP_VERSION'
> 
> Will do, thanks.
> 
> Max
> 
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Peter Maydell
On 12 November 2014 12:04, Mark Rutland  wrote:
> On Wed, Nov 12, 2014 at 11:52:22AM +, Paolo Bonzini wrote:
>> SeaBIOS fishes out information from fw_cfg, and puts it in low memory.
>> On ARM you could use DT binary blobs instead of fw_cfg, as proposed
>> already (I don't remember if it was in this thread or IRC).  Then if you
>> want to go !UEFI you can extract the tables from those binary blobs.
>
> This sounds broken. I am very much not a fan of shoving binary blobs
> into DT to workaround a shoddy boot interface.

My understanding from an IRC conversation yesterday was that at
least some of these ACPI blobs contain data which has to be constructed
at the point it is requested (ie is not fixed at the point when
QEMU starts up), because OVMF will do:
 * startup
 * prod some parts of the hardware to configure it
 * request ACPI tables via fw_cfg
and the ACPI tables have to reflect the statu of the hardware
after OVMF's poking, not before.

It wasn't entirely clear to me whether this applies equally
to the ARM UEFI setup as to x86 + OVMF, but it does suggest that
it would be better to define a memory-mapped variant of fw_cfg
rather than stuffing the blobs into the device tree. (I didn't
much like throwing the blobs in the dtb myself either.)

If somebody with more x86/ACPI knowledge could clarify what the
dynamically-constructed parts of the tables are and whether
they're likely to apply to use that would be good. I think they
involved PCI in some way, but I don't have access to my irc logs
right now to check. (I could imagine that ARM UEFI might not need
to prod and configure a PCI bus and devices in the way that an
x86 BIOS expects it has to, but that's speculation on my part,
and I dunno that I'd care to bake that assumption into the design
anyway.)

-- PMM



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 14:08, Arnd Bergmann wrote:
>>> Putting an interrupt in DT is trivial. The hard part is the rest of the
>>> interface, which so far there is no specification for.
>> 
>> Have you looked at docs/specs/acpi_{cpu,mem}_hotplug.txt?  Writing a DT
>> binding for it is trivial too.  Or are we talking about two different
>> things?
> 
> Interesting. I agree that doing a DT binding for these will be trivial,
> and the implementation in Linux should also be straightforward, thanks
> for pointing these out.
> 
> However, it seems that the implementation that qemu uses is incompatible
> with ARM64, since GPE is not part of the ACPI hardware reduced mode.

That makes it even simpler because you do not have to use SCI.

-M virt currently doesn't have a GPIO controller, but it's easy to add a
PL061 and specify a random GPIO pin in the DT bindings.  Then you can
use GPIO-signaled ACPI events in the AML, just move _Lxx methods move
from \_GPE to \_SB.GPIO under the GPIO controller.

Paolo



Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU

2014-11-12 Thread Christopher Covington
On 11/12/2014 08:26 AM, Petr Tesarik wrote:
> On Wed, 12 Nov 2014 08:18:04 -0500
> Christopher Covington  wrote:
> 
>> On 11/12/2014 03:05 AM, Petr Tesarik wrote:
>>> On Tue, 11 Nov 2014 12:27:44 -0500
>>> Christopher Covington  wrote:
>>>
 On 11/11/2014 06:22 AM, Laszlo Ersek wrote:
> (Note: I'm not subscribed to either qemu-devel or the kexec list; please
> keep me CC'd.)
>
> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib,
> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command.
>
> The resultant vmcore is usually analyzed with the "crash" utility.
>
> The original tool producing such files is kdump. Unlike the procedure
> performed by QEMU, kdump runs from *within* the guest (under a kexec'd
> kdump kernel), and has more information about the original guest kernel
> state (which is being dumped) than QEMU. To QEMU, the guest kernel state
> is opaque.
>
> For this reason, the kdump preparation logic in QEMU hardcodes a number
> of fields in the kdump header. The direct issue is the "phys_base"
> field. Refer to dump.c, functions create_header32(), create_header64(),
> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text
> "0").
>
> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD
>
> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD
>
> This works in most cases, because the guest Linux kernel indeed tends to
> be loaded at guest-phys address 0. However, when the guest Linux kernel
> is booted on top of OVMF (which has a somewhat unusual UEFI memory map),
> then the guest Linux kernel is loaded at 16MB, thereby getting out of
> sync with the phys_base=0 setting visible in the KDUMP header.
>
> This trips up the "crash" utility.
>
> Dave worked around the issue in "crash" for ELF format dumps -- "crash"
> can identify QEMU as the originator of the vmcore by finding the QEMU
> notes in the ELF vmcore. If those are present, then "crash" employs a
> heuristic, probing for a phys_base up to 32MB, in 1MB steps.

 What advantages does KDUMP have over ELF?
>>>
>>> It's smaller (data is compressed), and it contains a header with some
>>> useful information (e.g. the crashed kernel's version and release).
>>
>> What if the ELF dumper used SHF_COMPRESSED or could dump an ELF.xz?
> 
> Not the same thing. With KDUMP, each page is compressed separately, so
> if a utility like crash needs a page from the middle, it can find it
> and unpack it immediately. If we had an ELF.xz, then the whole file
> must be unpacked before it can be used. And unpacking a few terabytes
> takes ... a while. ;-)

Understood on the ELF.xz approach, but why couldn't each page (or maybe a
configurable size) be a SHF_COMPRESSED section?

Thanks,
Chris

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 14:27, Peter Maydell wrote:
> If somebody with more x86/ACPI knowledge could clarify what the
> dynamically-constructed parts of the tables are and whether
> they're likely to apply to use that would be good. I think they
> involved PCI in some way, but I don't have access to my irc logs
> right now to check. (I could imagine that ARM UEFI might not need
> to prod and configure a PCI bus and devices in the way that an
> x86 BIOS expects it has to, but that's speculation on my part,
> and I dunno that I'd care to bake that assumption into the design
> anyway.)

Yes, IIUC it's just for ACPI hotplug on devices behind a PCI-to-PCI
bridge.  On Linux/x86 we can use the PCI standard hotplug controller
(SHPC), but not on Windows.  So if it's just PCI, I tend to agree with
you.  ARM should really just use PCIe and SHPC hotplug and call it a day.

To me that actually goes in favor of putting blobs in the DT.  :)

That said, if you really really want to write kernel drivers for stuff
that x86 puts in the AML, there's little point in building the ACPI
tables in QEMU.  You already have a DT interface to talk to non-ACPI
devices, Tiano Core can use the same bits to build ACPI tables.

Paolo



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Christoffer Dall
On Wed, Nov 12, 2014 at 02:03:01PM +0100, Arnd Bergmann wrote:
> On Wednesday 12 November 2014 12:34:01 Christoffer Dall wrote:
> > On Wed, Nov 12, 2014 at 12:15:08PM +0100, Arnd Bergmann wrote:
> > > On Wednesday 12 November 2014 10:56:40 Mark Rutland wrote:
> > > > 
> > > > For the features which ACPI provides which device trees do not (e.g. the
> > > > dynamic addition and removal of memory and CPUs), there will need to be
> > > > some sort of interface between QEMU and the ACPI implementation. That's
> > > > already outside of the realm of DT, so as previously mentioned a simple
> > > > conversion doesn't cover the general case.
> > > 
> > > I think we need to support the low-level interfaces in the kernel for
> > > this anyway, we should not have to use ACPI just to do memory and CPU
> > > hotplugging in KVM, that would be silly.
> > 
> > I had that same intuitive feeling, but lacked good tecnical arguments
> > for it.  Care to elaborate on that?
> 
> ACPI always has to interface back to the hypervisor to do anything that
> changes the hardware configuration, so it essentially has to perform
> a hypercall or touch some virtual register.
> 
> If we need to architect that interface in qemu anyway, we should make
> it sane enough for the kernel to use directly, without having to go
> through ACPI, as not everyone will want to run ACPI.
>  

With the usual benefits of doing it in ACPI will not require updates on
both sides if we need to fix something, but also the usual downside of
having something obscure and pseduo-hidden, which may be broken
underneath, I suppose.

> > > If ACPI is present, it can
> > > provide a wrapper for the same interface, but KVM should not need to
> > > be aware of the fact that ACPI is used in the guest, after it has
> > > passed the initial ACPI blob to the kernel.
> > 
> > That's where things begin to be a bit foggy for me.  AFAIU ACPI already
> > has a method for doing this and I speculate that there is some IRQ
> > assigned to an ACPI event that causes some AML code to be interpreted by
> > your OS.  Wouldn't it be a matter of QEMU putting the right AML table
> > fragments in place to wire this up then?
> 
> Yes, that is what I meant with a wrapper. The two choices are:
> 
> 1. have an interrupt and a hypercall or mmio interface. When the
>interrupt gets triggered, we ask the interface what happened and
>do something on the same interface depending on the state of the
>system.
> 
> 2. have an interrupt that causes AML code to be run, that code will
>use the hypercall or mmio interface to find out what happened and
>create an ACPI event. This event is enqueued to the generic ACPI
>hotplug handler, which depending on the state of the system decides
>to do something by calling into AML code again, which will trigger
>the underlying interface.
> 
> From qemu's point of view, the two are doing exactly the same thing,
> except that the MMIO access can be hidden in AML so the OS doesn't have
> to know the interface.

right, thanks for the explanation.

> Note that in case of Xen, the use of hypercalls
> means that the OS has to know the interface after all, so the second
> half of the process is handled by drivers/xen/xen-acpi-*hotplug.c.
> 
> Note how the implementation that uses the ACPI wrapper is much
> more complex than the native one that does the same thing:
> 
> -rw-r--r-- 1 arnd arnd  2119 Nov 10 16:43 drivers/xen/cpu_hotplug.c
> -rw-r--r-- 1 arnd arnd 10987 Nov 10 16:43 drivers/xen/xen-acpi-cpuhotplug.c
> 
> -rw-r--r-- 1 arnd arnd  6894 Nov 10 16:43 drivers/xen/xen-balloon.c
> -rw-r--r-- 1 arnd arnd 12085 Nov 10 16:43 drivers/xen/xen-acpi-memhotplug.c
> 
Interesting.

-Christoffer



Re: [Qemu-devel] [PATCH] spice: do not require TCP ports

2014-11-12 Thread Gerd Hoffmann
On Di, 2014-11-11 at 13:39 +0100, Marc-André Lureau wrote:
> It is possible to use Spice server without TCP port.  On local VM,
> qemu (and libvirt) can add new clients thanks to QMP add_client command.

added to spice patch queue.

thanks,
  Gerd





Re: [Qemu-devel] [PATCH v1 00/15] android-console: Add console power command

2014-11-12 Thread Greg Bellows
Please disregard this patch review as it went to a broader audience than
intended.

If there is interest in this work then please let me know and I will be
happy to point you to a Linaro git repository that can be monitored for
future updates.

Regards,

Greg

On 11 November 2014 18:25, Greg Bellows  wrote:

> This patchset includes 3 changes:
> - Fixes to the existing Android emulator console.
> - Restructure the emulator console redir help output
> - Add support for the emulator console power command
>
> Greg Bellows (15):
>   android-console: Fix goldfish audio misnaming
>   android-console: Unify available commands output
>   android-console: Remove extra redir help message
>   android-console: Consolidate redir help text
>   android-console: Add console base power command
>   android-console: Add missing hw_has_battery prop
>   android-console: Init the battery ID state field
>   android-console: Add header for battery externs
>   android-console: Add GF battery prop print func
>   android-console: Add GF battery property getter
>   android-console: Add power ac command
>   android-console: Add power status command
>   android-console: Add power present command
>   android-console: Add power health command
>   android-console: Add power capacity command
>
>  android-commands.h |  53 
>  android-console.c  | 254
> -
>  android-console.h  |   8 ++
>  hw/misc/goldfish_battery.c | 122 --
>  include/hw/misc/goldfish_battery.h |  74 +++
>  monitor.c  |  16 ++-
>  6 files changed, 484 insertions(+), 43 deletions(-)
>  create mode 100644 include/hw/misc/goldfish_battery.h
>
> --
> 1.8.3.2
>
>


Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Mark Rutland
On Wed, Nov 12, 2014 at 12:27:14PM +, Paolo Bonzini wrote:
> 
> 
> On 12/11/2014 13:18, Mark Rutland wrote:
> > On Wed, Nov 12, 2014 at 11:48:27AM +, Paolo Bonzini wrote:
> >>
> >> On 12/11/2014 12:34, Christoffer Dall wrote:
> >>> AFAIU ACPI already has a method for doing this
> >>
> >> It's not defined in the spec.  QEMU defines a bunch of registers to do
> >> that, and provides AML that works with those registers.
> > 
> > Huh? SCI + AML is the method, and that's defined by the spec.
> 
> I thought Christoffer meant a method to do the actual hotplug, not just
> to signal events.  If you want to "support the low-level interfaces in
> the kernel for this anyway", you certainly need to know the details
> underneath the AML.

I think when Christoffer said that ACPI already had a mechanism he was
referring to the SCI + AML, as opposed to the underlying implementation.

> >> Perhaps you could treat it as a shared level-triggered interrupt in DT?
> >>  I don't know.
> > 
> > Putting an interrupt in DT is trivial. The hard part is the rest of the
> > interface, which so far there is no specification for.
> 
> Have you looked at docs/specs/acpi_{cpu,mem}_hotplug.txt?  Writing a DT
> binding for it is trivial too.  Or are we talking about two different
> things?

Writing a binding for the partiuclar device might be trivial. How this
would fit with the DT model is more complicated, and needs to be
specified. As far as I can see, those documents are quite strongly tied
to x86 ACPI (they talk in terms of OSPM, OST, GPE, APIC IDs).

I agree that we could do CPU and memory hotplug without ACPI, but we
need to specify the full firmware interface for doing so, and how this
interacts with the initial DTB if using DT. We can't just pick-and-mix
portions of ACPI and state that it's specified and standard.

Thanks,
Mark.



Re: [Qemu-devel] Better Cortex-M support?

2014-11-12 Thread Liviu Ionescu

On 12 Nov 2014, at 15:02, Peter Maydell  wrote:

> The instructions themselves are generally supported for
> the A profile cores, so getting that part right would mostly
> involve enabling those feature bits for the new M4 CPU class.

ok

> The difficult bits are going to involve:
> * correct trapping when fp is disabled
> * getting the exception handling right, including lazy
>   exception frame population
> * bits where M profile FP differs from A profile (eg fp
>   status and ID registers being memory mapped)

all seem related to FP, which, for my priority list, comes second, after 
specific vendor system registers for the supported processors.

my first priority is to make the emulator run the images generated by the 
project templates *without* any changes.

this requires the presence of the registers used by the CMSIS SystemInit() 
call, which generally are vendor specific, related to clock settings (pll & co).

> For upstream we would just want to fix arm-softmmu, I think.

for me, the plain arm-softmmu is not usable, since it forwards the semihosting 
calls via gdb instead of using native implementation, so I had to add a 
configuration variable to change this behaviour.

> For submitting changes upstream you'll need to clean up
> this set of commits (rebase, rearrange, split or merge commits,
> etc so you have one commit (patch) per thing you're fixing and
> it's a clean change that stands on its own.

I understand. I'll consider this, but we'll probably postpone until significant 
improvements are there and someone else needs them.


regards,

Liviu




Re: [Qemu-devel] Better Cortex-M support?

2014-11-12 Thread Peter Maydell
On 12 November 2014 13:43, Liviu Ionescu  wrote:
> all seem related to FP, which, for my priority list, comes second,
> after specific vendor system registers for the supported processors.
>
> my first priority is to make the emulator run the images generated
> by the project templates *without* any changes.
>
> this requires the presence of the registers used by the CMSIS
> SystemInit() call, which generally are vendor specific, related
> to clock settings (pll & co).

This sounds to me like it's part of the SoC and/or board rather
than the CPU itself. You'll need to model the SoC and board,
which can be a fair bit of work. (Our existing M3 code isn't
a great example, either. I'd suggest looking at Alistair's patches
on the list for supporting the netduino2, if you haven't already.)

>> For upstream we would just want to fix arm-softmmu, I think.
>
> for me, the plain arm-softmmu is not usable, since it forwards the
> semihosting calls via gdb instead of using native implementation,
> so I had to add a configuration variable to change this behaviour.

That sounds like the kind of thing that other people might also
want to be able to do, so it would be better to make arm-softmmu
be runtime configurable on this, I think.

thanks
-- PMM



Re: [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment

2014-11-12 Thread Eric Blake
On 11/12/2014 02:55 AM, Max Reitz wrote:
> On 2014-11-12 at 05:15, Eric Blake wrote:
>> On 11/10/2014 06:45 AM, Max Reitz wrote:
>>> Add a function qcow2_change_refcount_order() which allows changing the
>>> refcount order of a qcow2 image.
>>>
>>> Signed-off-by: Max Reitz 
>>> ---
>>>   block/qcow2-refcount.c | 424
>>> +
>>>   block/qcow2.h  |   4 +
>>>   2 files changed, 428 insertions(+)
>> This is fairly big; you may want to get a second review from a
>> maintainer rather than blindly trusting me.
> 
> I didn't really see the point in splitting it up. Introducing the static
> helper functions first would not be very helpful either, so I thought
> what I'd do as a reviewer, and that was "apply the patch and just read
> through all the code". Splitting it into multiple patches would not have
> helped there (and I don't see how I could split this patch into logical
> changes, where at the start we have some inefficient but simple
> implementation and it gets better over time).

Yes, I agree that your patch is not divisible.  Big doesn't always mean
bad :)


>>> +refcount = s->get_refcount(refblock, refblock_index);
>>> +if (new_refcount_bits < 64 && refcount >>
>>> new_refcount_bits) {
>> Technically, this get_refcount() call is dead code on the second walk,
>> since the first walk already validated things, so you could push all of
>> this code...
>>
>>> +uint64_t offset;
>>> +
>>> +qcow2_cache_put(bs, s->refcount_block_cache,
>>> &refblock);
>>> +
>>> +offset = ((reftable_index <<
>>> s->refcount_block_bits)
>>> +  + refblock_index) << s->cluster_bits;
>>> +
>>> +error_setg(errp, "Cannot decrease refcount entry
>>> width to "
>>> +   "%i bits: Cluster at offset %#"
>>> PRIx64 " has a "
>>> +   "refcount of %" PRIu64,
>>> new_refcount_bits,
>>> +   offset, refcount);
>>> +return -EINVAL;
>>> +}
>>> +
>>> +if (new_set_refcount) {
>>> +new_set_refcount(new_refblock,
>>> new_refblock_index++, refcount);
>>> +} else {
>> ...here, in the branch only run on the first walk.
> 
> Well, yes, but I wanted to keep this function as agnostic to what the
> caller wants to do with it as possible. I'd rather decide depending on
> whether index == 0 because that's a better way of discerning the first
> walk.

I was thinking a bit more about how to avoid the allocation corner case
bug, and wonder if three walks instead of 2 is the right solution, in
which case the first two walks are both allocations, and index==0 is no
longer a reliable witness of whether these checks are needed.  More on
that below.

> 
>>> +new_refblock_index++;
>>> +}
>>> +new_refblock_empty = new_refblock_empty && refcount
>>> == 0;
>> Worth condensing to 'new_refblock_empty &= !refcount'?  Maybe not.
> 
> I personally would find that harder to read.

No need to change it then.


>>> +if (new_set_refcount) {
>>> +new_set_refcount(new_refblock,
>>> new_refblock_index++, 0);
>> Would it be worth guaranteeing that every new refblock is 0-initialized
>> when allocated, so that you can skip setting a refcount to 0?  This
>> question depends on the answer about block allocation asked at [4] above.
> 
> This function sets a value in the buffer new_refblock, not in the
> cluster on disk. Therefore, in order to be able to omit this call, we'd
> have to call a memset() with 0 on new_refblock after each call to
> operation(). I don't think it's worth it. This is more explicit and
> won't cost much performance.

Yeah, that's about the same conclusion I came to after finishing the
whole review, although I didn't state it very well (this was one of my
earlier comments in my non-linear review, that I didn't touch up later).


>>> +/* The new_reftable_size is now valid and will not be changed
>>> anymore,
>>> + * so we can now allocate the reftable */
>>> +new_reftable_offset = qcow2_alloc_clusters(bs, new_reftable_size *
>>> +   sizeof(uint64_t));
>> And here is your bug, that I hinted at with the mention of [3] above.
>> This allocation can potentially cause an overflow of the existing
>> reftable to occupy one more cluster.
> 
> An additional bug is that the new reftable may be referenced by an
> existing refblock which was completely empty, though (or at least
> referenced by part of an existing refblock which was to be turned into a
> new refblock which was then completely empty, and thus omitted from
> allocation).
> 
>> Remember my thought experiment
>> above, how an 8 megabyte image rolls from 1 to 2 clusters during the
>> course of allocating refblocks

Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU

2014-11-12 Thread Petr Tesarik
On Wed, 12 Nov 2014 08:18:04 -0500
Christopher Covington  wrote:

> On 11/12/2014 03:05 AM, Petr Tesarik wrote:
> > On Tue, 11 Nov 2014 12:27:44 -0500
> > Christopher Covington  wrote:
> > 
> >> On 11/11/2014 06:22 AM, Laszlo Ersek wrote:
> >>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please
> >>> keep me CC'd.)
> >>>
> >>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib,
> >>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command.
> >>>
> >>> The resultant vmcore is usually analyzed with the "crash" utility.
> >>>
> >>> The original tool producing such files is kdump. Unlike the procedure
> >>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd
> >>> kdump kernel), and has more information about the original guest kernel
> >>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state
> >>> is opaque.
> >>>
> >>> For this reason, the kdump preparation logic in QEMU hardcodes a number
> >>> of fields in the kdump header. The direct issue is the "phys_base"
> >>> field. Refer to dump.c, functions create_header32(), create_header64(),
> >>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text
> >>> "0").
> >>>
> >>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD
> >>>
> >>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD
> >>>
> >>> This works in most cases, because the guest Linux kernel indeed tends to
> >>> be loaded at guest-phys address 0. However, when the guest Linux kernel
> >>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map),
> >>> then the guest Linux kernel is loaded at 16MB, thereby getting out of
> >>> sync with the phys_base=0 setting visible in the KDUMP header.
> >>>
> >>> This trips up the "crash" utility.
> >>>
> >>> Dave worked around the issue in "crash" for ELF format dumps -- "crash"
> >>> can identify QEMU as the originator of the vmcore by finding the QEMU
> >>> notes in the ELF vmcore. If those are present, then "crash" employs a
> >>> heuristic, probing for a phys_base up to 32MB, in 1MB steps.
> >>
> >> What advantages does KDUMP have over ELF?
> > 
> > It's smaller (data is compressed), and it contains a header with some
> > useful information (e.g. the crashed kernel's version and release).
> 
> What if the ELF dumper used SHF_COMPRESSED or could dump an ELF.xz?

Not the same thing. With KDUMP, each page is compressed separately, so
if a utility like crash needs a page from the middle, it can find it
and unpack it immediately. If we had an ELF.xz, then the whole file
must be unpacked before it can be used. And unpacking a few terabytes
takes ... a while. ;-)

> How does QEMU figure out the kernel version information?

Good question. Who can answer this part?

Petr T



Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU

2014-11-12 Thread Petr Tesarik
On Tue, 11 Nov 2014 12:27:44 -0500
Christopher Covington  wrote:

> On 11/11/2014 06:22 AM, Laszlo Ersek wrote:
> > (Note: I'm not subscribed to either qemu-devel or the kexec list; please
> > keep me CC'd.)
> > 
> > QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib,
> > kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command.
> > 
> > The resultant vmcore is usually analyzed with the "crash" utility.
> > 
> > The original tool producing such files is kdump. Unlike the procedure
> > performed by QEMU, kdump runs from *within* the guest (under a kexec'd
> > kdump kernel), and has more information about the original guest kernel
> > state (which is being dumped) than QEMU. To QEMU, the guest kernel state
> > is opaque.
> > 
> > For this reason, the kdump preparation logic in QEMU hardcodes a number
> > of fields in the kdump header. The direct issue is the "phys_base"
> > field. Refer to dump.c, functions create_header32(), create_header64(),
> > and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text
> > "0").
> > 
> > http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD
> > 
> > http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD
> > 
> > This works in most cases, because the guest Linux kernel indeed tends to
> > be loaded at guest-phys address 0. However, when the guest Linux kernel
> > is booted on top of OVMF (which has a somewhat unusual UEFI memory map),
> > then the guest Linux kernel is loaded at 16MB, thereby getting out of
> > sync with the phys_base=0 setting visible in the KDUMP header.
> > 
> > This trips up the "crash" utility.
> > 
> > Dave worked around the issue in "crash" for ELF format dumps -- "crash"
> > can identify QEMU as the originator of the vmcore by finding the QEMU
> > notes in the ELF vmcore. If those are present, then "crash" employs a
> > heuristic, probing for a phys_base up to 32MB, in 1MB steps.
> 
> What advantages does KDUMP have over ELF?

It's smaller (data is compressed), and it contains a header with some
useful information (e.g. the crashed kernel's version and release).

HTH,
Petr T



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Julien Grall

Hi,

On 12/11/2014 10:44, Christoffer Dall wrote:


I'm not a fan of placing fundamentally required system description on
the command line. It's fine for explicit overrides but I don't think it
should be the default mechanism as that causes its own set of problems
(who wants to fight with their hypervisor to pass a command line to a
guest kernel?).



Agreed completely, but I've been lacking strong technical arguments
against passing this stuff on the cmdline.  My personal preferred
approach for the Xen Dom0 case is to add a property to the DT.


IHMO, the cmdline is OS specific, therefore this solution wouldn't fit 
to support ACPI on other OS (i.e *BSD) without requiring specific 
implementation/parsing of the command line.


The DT solution would be cleaner and the bindings are already 
standardize (and starting to be used on other OS such as FreeBSD).


Regards,

--
Julien Grall



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Graeme Gregory
On Wed, Nov 12, 2014 at 11:07:22AM +, Mark Rutland wrote:
> [...]
> 
> > > > > > We are currently suggesting adding an RDSP property to the chosen 
> > > > > > node
> > > > > > in the tiny DT, but a command-line arguement like kexec proposed 
> > > > > > could
> > > > > > be another option I guess, albeit not a very pretty one.
> > > > > 
> > > > > I'm not sure what an RDSP command line property would have to do with
> > > > > kexec. I'll assume I've misunderstood something.
> > > > > 
> > > > 
> > > > I thought the kexec patches proposed passing the RDSP on the
> > > > command-line to boot the secondary kernel, so if that ended up being
> > > > supported by the kernel for kexec, maybe that could be leveraged by
> > > > Xen's boot protocol.  It was an idea someone brought to me, just thought
> > > > I'd mention it.
> > > 
> > > Ah, that's not something I'd heard of.
> > 
> > Maybe there was a misunderstanding then, I thought you were cc'ed on
> > those discussions.
> 
> I may just have lost them in my inbox. I'm on a few too many mailing
> lists these days. Sorry about that.
> 
> > > I'm not a fan of placing fundamentally required system description on
> > > the command line. It's fine for explicit overrides but I don't think it
> > > should be the default mechanism as that causes its own set of problems
> > > (who wants to fight with their hypervisor to pass a command line to a
> > > guest kernel?).
> > > 
> > 
> > Agreed completely, but I've been lacking strong technical arguments
> > against passing this stuff on the cmdline.  My personal preferred
> > approach for the Xen Dom0 case is to add a property to the DT.
> 
> Something under /chosen, or a firmware node would sound preferable to
> me. For UEFI we pass the system table address as
> /chosen/linux,uefi-system-table = <... ...>, and I think the RDSP could
> be handled similarly if necessary. The user can than override that via
> the command line if desired.
> 
We have actually done that in the past when we had to support u-boot and
bootwrapper as a bootloader. It works fine in the kernel and its a
minimal patch to support.

Graeme




Re: [Qemu-devel] State of ARM FIQ in Qemu

2014-11-12 Thread Tim Sander
Hi Greg

> > Bad mode in data abort handler detected
> > Internal error: Oops - bad mode: 0 [#1] PREEMPT SMP ARM
> > Modules linked in: firq(O) ipv6
> > CPU: 0 PID: 103 Comm: systemd-udevd Tainted: G   O 3.14.0 #1
> > task: bf2b9300 ti: bf362000 task.ti: bf362000
> > PC is at 0x1240
> > LR is at handle_fasteoi_irq+0x9c/0x13c
> > pc : []lr : [<8005cda0>]psr: 600f01d1
> > sp : bf363e70  ip : 07a7e79d  fp : 
> > r10: 76f92008  r9 : 80590080  r8 : 76e8e4d0
> > r7 : f8200100  r6 : bf363fb0  r5 : bf008414  r4 : bf0083c0
> > r3 : 80230d04  r2 : 002f  r1 :   r0 : bf0083c0
> > Flags: nZCv  IRQs off  FIQs off  Mode FIQ_32  ISA ARM  Segment user
> 
> It looks like we are in FIQ mode and interrupts have been masked.
Indeed.
 
> > Control: 10c53c7d  Table: 60004059  DAC: 0015
> > Process systemd-udevd (pid: 103, stack limit = 0xbf362240)
> > Stack: (0xbf363e70 to 0xbf364000)
> > 3e60: bf0083c0  002f
> > 80230d04
> > 3e80: bf0083c0 bf008414 bf363fb0 f8200100 76e8e4d0 80590080 76f92008
> > 
> > 3ea0: 07a7e79d bf363e70 8005cda0 1240 600f01d1  8005cd04
> > 002f
> > 3ec0: 002f 800598bc 8058cc70 8000ed00 f820010c 8059684c bf363ef8
> > 80008528
> > 3ee0: 80023730 80023744 200f0113  bf363f2c 80012180 
> > 805baa00
> > 3f00:  0100 0002 0022  bf362000 76e8e4d0
> > 80590080
> > 3f20: 76f92008  000a bf363f40 80023730 80023744 200f0113
> > 
> > 3f40: bf007a14 8059ac00  000a 8dd7 00400140 bf0079c0
> > 8058cc70
> > 3f60: 0022  f8200100 76e8e4d0 76f9201c 76f92008 
> > 80023af0
> > 3f80: 8058cc70 8000ed04 f820010c 8059684c bf363fb0 80008528 
> > 76dd3b44
> > 3fa0: 600f0010  000c 8001233c   76f93428
> > 76f93428
> > 3fc0: 76f93438  76f93448 000c 76e8e4d0 76f9201c 76f92008
> > 
> > 3fe0:  7ec115c0 76f60914 76dd3b44 600f0010  9fffd821
> > 9fffdc21
> > [<8005cda0>] (handle_fasteoi_irq) from [<80230d04>] (gic_eoi_irq+0x0/0x4c)
> 
> It certainly looks like we are going down the standard IRQ patch as you
> suggested.  I'm not a Linux driver guy, but do you see any kind of activity
> (break points, printfs, ...) through your FIQ handler?
I am reaching 0x1224 which i believe is the fiq vector address on the 
vexpress?

> > [<80230d04>] (gic_eoi_irq) from [] (0xf8200100)
> > Code: ee02af10 f57ff06f e59d8000 e59d9004 (e599b00c)
> > ---[ end trace 3dc3571209a017e1 ]---
> > Kernel panic - not syncing: Fatal exception in interrupt
> 
> It is hard to determine entirely what is happening here based on this
> info.  I do have code of my own that routes KGDB interrupts as FIQs and
> with the workaround I see the FIQs handled as expected.  Some things we can
> try to get more info in hopes of pinpointing where to look:
> 
>1. At the top of hw/intc/arm_gic.c there is the following commented out
>line:
>//#define DEBUG_GIC
>Uncomment the line, rebuild and rerun.  This will give us some trace on
>what is going through the GIC code.
I have commented out some debug lines but i see:
Breakpoint 1, gic_update_with_grouping (s=0x564dba80) at 
hw/intc/arm_gic.c:120
120 DPRINTF("Raised pending FIQ %d (cpu %d)\n", 
best_irq, cpu);

With the expected irq nr. 49 (32+17).

>2. Run qemu with the "-d int" option which will print a message on each
>interrupt.  We should see an FIQ at some point if they are occurring. The
> only issue is that there will be numerous IRQs, so you'll have to parse
> through them to find an "exception 6 [FIQ].
Here is the relevant output when the FIQ hits:
Taking exception 2 [SVC]
Taking exception 2 [SVC]
pml: pml_timer_tick: raise_irq
arm_gic: Raised pending FIQ 49 (cpu 0)
Taking exception 6 [FIQ]
pml: pml_write: update control flags: 1
pml: pml_update: start timer
pml: pml_update: lower irq
pml: pml_read: read magic
pml: pml_write: update control flags: 3
pml: pml_update: start timer
Taking exception 3 [Prefetch Abort]
...with IFSR 0x5 IFAR 0x80221d70
Taking exception 4 [Data Abort]
...with DFSR 0x805 DFAR 0x805c604c
Taking exception 4 [Data Abort]
...with DFSR 0x805 DFAR 0x805c604c
Taking exception 4 [Data Abort]

So the fiq is hitting but unfortunatly i have no idea where the data aborts are 
coming from.
I have shifted all other Irqs besides 49 to group 1 so that only irq 49 is a 
FIQ. 
Might it be that i am seeing some secure violations...
The address of the IFAR __idr_pre_get which lives in the linux kernel in 
lib/idr.c seems to
be implementing ann integer ID management. 

>3. If you set a breakpoint in your driver, is it possible to see that
>FIQs are on from the kernel debugger.  Clearly you have to try this from
> a path where interrupts are masked.  I see the following on my system
> mentioned above:
>...
>Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment

Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 14:41, Mark Rutland wrote:
> Writing a binding for the partiuclar device might be trivial. How this
> would fit with the DT model is more complicated, and needs to be
> specified. As far as I can see, those documents are quite strongly tied
> to x86 ACPI (they talk in terms of OSPM, OST, GPE, APIC IDs).

OSPM -> replace with kernel driver
OST -> used to generate events, doesn't need to be implemented in the
   kernel driver.  Or just define the meaning based on the ACPI
   _OST spec.
GPE -> replace with GPIO
APIC ID -> replace with whatever id ARM CPUs have

> I agree that we could do CPU and memory hotplug without ACPI, but we
> need to specify the full firmware interface for doing so, and how this
> interacts with the initial DTB if using DT.

The initial DTB has to expose the IDs for hotpluggable CPUs and the
range for hotpluggable memory.  In the ACPI case this goes in the SRAT.
 But none of this is insurmountable.

> We can't just pick-and-mix
> portions of ACPI and state that it's specified and standard.

But that's what you already do if you want to build ACPI tables from DT.
 You are already picking-and-mixing the variable portions of the ACPI
tables and make a DT bindings for them.

All that's left is to de-x86ify the existing spec (which is really
easy), and to figure out a DT binding for it.  It's really not unlike
any other MMIO device.

Paolo



Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU

2014-11-12 Thread Laszlo Ersek
On 11/12/14 14:26, Petr Tesarik wrote:
> On Wed, 12 Nov 2014 08:18:04 -0500
> Christopher Covington  wrote:
> 
>> On 11/12/2014 03:05 AM, Petr Tesarik wrote:
>>> On Tue, 11 Nov 2014 12:27:44 -0500
>>> Christopher Covington  wrote:
>>>
 On 11/11/2014 06:22 AM, Laszlo Ersek wrote:
> (Note: I'm not subscribed to either qemu-devel or the kexec list; please
> keep me CC'd.)
>
> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib,
> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command.
>
> The resultant vmcore is usually analyzed with the "crash" utility.
>
> The original tool producing such files is kdump. Unlike the procedure
> performed by QEMU, kdump runs from *within* the guest (under a kexec'd
> kdump kernel), and has more information about the original guest kernel
> state (which is being dumped) than QEMU. To QEMU, the guest kernel state
> is opaque.
>
> For this reason, the kdump preparation logic in QEMU hardcodes a number
> of fields in the kdump header. The direct issue is the "phys_base"
> field. Refer to dump.c, functions create_header32(), create_header64(),
> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text
> "0").
>
> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD
>
> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD
>
> This works in most cases, because the guest Linux kernel indeed tends to
> be loaded at guest-phys address 0. However, when the guest Linux kernel
> is booted on top of OVMF (which has a somewhat unusual UEFI memory map),
> then the guest Linux kernel is loaded at 16MB, thereby getting out of
> sync with the phys_base=0 setting visible in the KDUMP header.
>
> This trips up the "crash" utility.
>
> Dave worked around the issue in "crash" for ELF format dumps -- "crash"
> can identify QEMU as the originator of the vmcore by finding the QEMU
> notes in the ELF vmcore. If those are present, then "crash" employs a
> heuristic, probing for a phys_base up to 32MB, in 1MB steps.

 What advantages does KDUMP have over ELF?
>>>
>>> It's smaller (data is compressed), and it contains a header with some
>>> useful information (e.g. the crashed kernel's version and release).

Another advantage is that all zero-filled pages are represented in the
kdump file by one shared zero page.

The difference in speed of dumping is stunning.

>> What if the ELF dumper used SHF_COMPRESSED or could dump an ELF.xz?
> 
> Not the same thing. With KDUMP, each page is compressed separately, so
> if a utility like crash needs a page from the middle, it can find it
> and unpack it immediately. If we had an ELF.xz, then the whole file
> must be unpacked before it can be used. And unpacking a few terabytes
> takes ... a while. ;-)
> 
>> How does QEMU figure out the kernel version information?
> 
> Good question. Who can answer this part?

I can.

(Apologies for being a bit non-responsive, I'm swamped. I figured I'd
let the discussion unfold a bit between the kdump experts.)

So, QEMU doesn't figure out the kernel version information. It just
dumps the guest-physical frames, and that's it.

I linked the code before that populates the kdump header. The links and
function names are still visible above.

Thanks
Laszlo




Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Mark Rutland
On Wed, Nov 12, 2014 at 01:59:30PM +, Paolo Bonzini wrote:
> 
> 
> On 12/11/2014 14:41, Mark Rutland wrote:
> > Writing a binding for the partiuclar device might be trivial. How this
> > would fit with the DT model is more complicated, and needs to be
> > specified. As far as I can see, those documents are quite strongly tied
> > to x86 ACPI (they talk in terms of OSPM, OST, GPE, APIC IDs).
> 
> OSPM -> replace with kernel driver
> OST -> used to generate events, doesn't need to be implemented in the
>kernel driver.  Or just define the meaning based on the ACPI
>_OST spec.
> GPE -> replace with GPIO
> APIC ID -> replace with whatever id ARM CPUs have
> 
> > I agree that we could do CPU and memory hotplug without ACPI, but we
> > need to specify the full firmware interface for doing so, and how this
> > interacts with the initial DTB if using DT.
> 
> The initial DTB has to expose the IDs for hotpluggable CPUs and the
> range for hotpluggable memory.  In the ACPI case this goes in the SRAT.
>  But none of this is insurmountable.

My only point is that it needs to be defined, not that this definition
is impossible. That does trickle into a few places -- we currently have
no way of defining a CPU in DT which is possible but not present, for
example (the status property historically means something different per
ePAPR).

> > We can't just pick-and-mix
> > portions of ACPI and state that it's specified and standard.
> 
> But that's what you already do if you want to build ACPI tables from DT.
>  You are already picking-and-mixing the variable portions of the ACPI
> tables and make a DT bindings for them.

I don't follow. I argued _against_ trying to build ACPI tables from DT
because the two don't quite match up anyway. I argued _against_ trying
to convert ACPI tables to DT in prior discussions for similar reasons.

> All that's left is to de-x86ify the existing spec (which is really
> easy), and to figure out a DT binding for it.  It's really not unlike
> any other MMIO device.

In addition to fixing up the other specs which are affected by this
(e.g. how we describe those additional CPUs). There's also some
de-ACPIing to be done in addition to de-x86ification, and we need to be
careful to ensure we have access to all the information we require in
the absence of ACPI, and that we have a well defined behaviour on both
sides of the interface for what would previously have been implicit in
ACPI.

I'm not saying that this is impossible. It's just a greater body of work
than modifying one spec.

Thanks,
Mark.



Re: [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment

2014-11-12 Thread Eric Blake
On 11/10/2014 06:45 AM, Max Reitz wrote:
> Add a function qcow2_change_refcount_order() which allows changing the
> refcount order of a qcow2 image.

A thought: didn't you just submit a patch that marked the image as
dirty, nuked the on-disk refcount, then rebuilt one using the in-memory
refcounts?  Would reusing THAT code be any better than writing this patch?

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH 18/21] qcow2: Add function for refcount order amendment

2014-11-12 Thread Max Reitz

On 2014-11-12 at 15:19, Eric Blake wrote:

On 11/10/2014 06:45 AM, Max Reitz wrote:

Add a function qcow2_change_refcount_order() which allows changing the
refcount order of a qcow2 image.

A thought: didn't you just submit a patch that marked the image as
dirty, nuked the on-disk refcount, then rebuilt one using the in-memory
refcounts?  Would reusing THAT code be any better than writing this patch?


Yes, I thought about that, too... The problem is that that patch 
requires all refcount blocks to fit in memory at the same time (or 
generally, the qcow2 check function requires that, for now). I'd really 
like to avoid that, if possible, but maybe it isn't possible after all.


But if you say it like that ("nuke"), I guess I'll give it a try. Maybe 
it looks funny enough.


Max



Re: [Qemu-devel] Better Cortex-M support?

2014-11-12 Thread Liviu Ionescu

On 12 Nov 2014, at 15:51, Peter Maydell  wrote:

> ... I'd suggest looking at Alistair's patches
> on the list for supporting the netduino2,

will certainly do.

and I also plan to review the patches of Andre Bechus, available from 
https://github.com/beckus/qemu_stm32.

>> for me, the plain arm-softmmu is not usable, since it forwards the
>> semihosting calls via gdb instead of using native implementation,
>> so I had to add a configuration variable to change this behaviour.
> 
> That sounds like the kind of thing that other people might also
> want to be able to do, so it would be better to make arm-softmmu
> be runtime configurable on this, I think.

for the gnuarmeclipse-qemu target, since I never need to forward semihosting to 
gdb, the solution was to add a configuration variable and statically set it at 
build time to always use native calls.

--- gdbstub.c ---
/* If gdb is connected when the first semihosting syscall occurs then use
   remote gdb syscalls.  Otherwise use native file IO.  */
int use_gdb_syscalls(void)
{
#if !defined(CONFIG_SEMIHOSTING_NATIVE)
if (gdb_syscall_mode == GDB_SYS_UNKNOWN) {
gdb_syscall_mode = (gdbserver_state ? GDB_SYS_ENABLED
: GDB_SYS_DISABLED);
}
return gdb_syscall_mode == GDB_SYS_ENABLED;
#else
// Make semihosting always use native file IO.
if (gdb_syscall_mode == GDB_SYS_UNKNOWN) {
gdb_syscall_mode = GDB_SYS_DISABLED;
}
return FALSE;
#endif
}

for arm-softmmu you probably need a way to configure this at run-time, which 
would involve a new monitor command, that can be issued from the GDB client via 
'mon '.

I guess you are not happy with a new gnuarmeclipse-qemu target, but since my 
experience with arm-softmmu is almost null, I want to avoid breaking anything 
there.


regards,

Liviu





Re: [Qemu-devel] Starting X remotely

2014-11-12 Thread Nigel Horne

On 11/10/14 3:03 PM, Brian Jackson wrote:

On Monday, November 10, 2014 09:42:45 AM Nigel Horne wrote:

When I start any emulator remotely (i.e. I've logged into the host
machine using 'ssh -Y'), I get this error and the guest console is
unusable.  All works well when I am logged in directly to the host
machine's console.  I'm following the instructions in the message and
e-mailing you!

I guess the message could be more specific about the kinds of info that are
actually helpful to solve problems.

The full Qemu command line used

I get it with any qemu guest that starts in an X window.  For example:
qemu-system-x86_64 -drive 
file=kfreebsd-amd64,index=0,media=disk,cache=writeback,aio=native -drive 
file=/dev/sr0,index=1,media=cdrom -boot c -redir tcp:2232::22 -m 1024 
-machine accel=kvm,kernel_irqchip=on -cpu host -net 
user,hostname=qemu.bandsman.co.uk -net nic,model=e1000 -k en-us

In this case it appears keycode related, so keyboard settings of host, guest,
remote system you are connecting from, etc.

 So to be clear there are 3 machines involved here:

1) the guest
2) the host
3) the machine I've ssh'd in to the host from

I presume it's the keyboard of machine 3 you're talking about.  I've had 
that whatever it is - PC running Linux and MACOS/X with an Apple 
keyboard all exhibit it.

Machine 2 is a PC running Debian Linux.


The key you pressed (don't say any key, tell us an exact key that reproduces
this)

Any key.

Any other pertinent information

This is with bleading edge.





** (qemu-system-x86_64:6389): WARNING **: Couldn't connect to
accessibility bus: Failed to connect to socket /tmp/dbus-PE9NEDvtUV:
Connection refused
unknown keycodes `empty_aliases(qwerty)', please report to
qemu-devel@nongnu.org

-Nigel




smime.p7s
Description: S/MIME Cryptographic Signature


Re: [Qemu-devel] [PATCH 2/3] s390: implement pci instructions

2014-11-12 Thread Frank Blaschka
On Wed, Nov 12, 2014 at 10:36:03AM +0100, Paolo Bonzini wrote:
> 
> 
> On 12/11/2014 10:22, Alexander Graf wrote:
>  Absolutely lets make an example for qemu running on BE and LE
> 
>  byte orderconfig space backing   pci_default_read_config   pcilg 
>  (with cpu_to_le)
>  BE0x78563412 0x123456780x78563412
>  LE0x78563412 0x785634120x78563412
> >>>
> >>> No, pci_default_read_config() always returns 0x12345678 because it
> >>> returns a register, not memory.
> >>>
> >>
> >> You mean implementation of pci_default_read_config is broken?
> >> If it should return a register it should not do "return le32_to_cpu(val);"
> > 
> > It has to, to convert from memory (after memcpy) to an actual register
> > value. Look at the value list in Paolo's email - I really have no idea
> > how to explain it any better.
> 
> pci_default_read_config is reading from a *device* register, and has
> absolutely zero knowledge of the host CPU endianness.
> 
> Another way to explain that the result of pci_default_read_config is
> independent of the host endianness, is that the function is basically
> doing this:
> 
> switch (len) {
> case 1: return d->config[address];
> case 2: return ldw_le_p(&d->config[address)]);
> case 4: return ldl_le_p(&d->config[address)]);
> default: abort();
> }
> 
> So if you want to make the outcome big endian, you have to swap
> unconditionally.
> 
> Paolo

Hi Paolo, Alex,

thx a lot for all the explanation and patience.
I think I have understand your point now. I will change the code to 
unconditional swap. I feel I had a knowledge gap regarding running guest and
host which different byte orders. Hope this gap is filled now ;)

Frank

> 




Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU

2014-11-12 Thread Petr Tesarik
V Wed, 12 Nov 2014 08:28:54 -0500
Christopher Covington  napsáno:

> On 11/12/2014 08:26 AM, Petr Tesarik wrote:
> > On Wed, 12 Nov 2014 08:18:04 -0500
> > Christopher Covington  wrote:
> > 
> >> On 11/12/2014 03:05 AM, Petr Tesarik wrote:
> >>> On Tue, 11 Nov 2014 12:27:44 -0500
> >>> Christopher Covington  wrote:
> >>>
>  On 11/11/2014 06:22 AM, Laszlo Ersek wrote:
> > (Note: I'm not subscribed to either qemu-devel or the kexec list; please
> > keep me CC'd.)
> >
> > QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib,
> > kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command.
> >
> > The resultant vmcore is usually analyzed with the "crash" utility.
> >
> > The original tool producing such files is kdump. Unlike the procedure
> > performed by QEMU, kdump runs from *within* the guest (under a kexec'd
> > kdump kernel), and has more information about the original guest kernel
> > state (which is being dumped) than QEMU. To QEMU, the guest kernel state
> > is opaque.
> >
> > For this reason, the kdump preparation logic in QEMU hardcodes a number
> > of fields in the kdump header. The direct issue is the "phys_base"
> > field. Refer to dump.c, functions create_header32(), create_header64(),
> > and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text
> > "0").
> >
> > http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD
> >
> > http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD
> >
> > This works in most cases, because the guest Linux kernel indeed tends to
> > be loaded at guest-phys address 0. However, when the guest Linux kernel
> > is booted on top of OVMF (which has a somewhat unusual UEFI memory map),
> > then the guest Linux kernel is loaded at 16MB, thereby getting out of
> > sync with the phys_base=0 setting visible in the KDUMP header.
> >
> > This trips up the "crash" utility.
> >
> > Dave worked around the issue in "crash" for ELF format dumps -- "crash"
> > can identify QEMU as the originator of the vmcore by finding the QEMU
> > notes in the ELF vmcore. If those are present, then "crash" employs a
> > heuristic, probing for a phys_base up to 32MB, in 1MB steps.
> 
>  What advantages does KDUMP have over ELF?
> >>>
> >>> It's smaller (data is compressed), and it contains a header with some
> >>> useful information (e.g. the crashed kernel's version and release).
> >>
> >> What if the ELF dumper used SHF_COMPRESSED or could dump an ELF.xz?
> > 
> > Not the same thing. With KDUMP, each page is compressed separately, so
> > if a utility like crash needs a page from the middle, it can find it
> > and unpack it immediately. If we had an ELF.xz, then the whole file
> > must be unpacked before it can be used. And unpacking a few terabytes
> > takes ... a while. ;-)
> 
> Understood on the ELF.xz approach, but why couldn't each page (or maybe a
> configurable size) be a SHF_COMPRESSED section?

A machine with 64TB of RAM (already manufactured by SGI) has
17,179,869,184 pages. When KDUMP (or, actually diskdump) format was
invented, ELF files could have at most 2^16 = 65,536 program headers.
Since then, ELF specification has been extended (PN_XNUM), so the
number of sections can be stored in the sh_info field of the first ELF
section, but that only increases the number of possible sections to
2^32 = 4,294,967,296.

Yes, we could divide memory into larger chunks than pages, but:

  1. you're probably the first one to have the idea, and
  2. this is easy if you save the complete RAM content, but not quite
 that easy if some pages should be filtered out (makedumpfile).

There are a few other (minor) points, e.g.:

  * Each program header consumes 56 bytes in ELF64, while a single
bit is sufficient in KDUMP compressed files to tell if the
corresponding page is stored or not.

  * SHF_COMPRESSED currently supports only zlib compression, which is
rather slow. KDUMP supports zlib, lzo and snappy.

  * Support for KDUMP files is already present in the crash utility,
while I don't think there is any support for SHF_COMPRESSED
segments.

In short, SHF_COMPRESSED looks like a viable alternative, but right now
KDUMP is the better choice in terms of features and interoperability.

Just my two cents,
Petr T



Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU

2014-11-12 Thread Laszlo Ersek
On 11/11/14 18:27, Christopher Covington wrote:
> On 11/11/2014 06:22 AM, Laszlo Ersek wrote:
>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please
>> keep me CC'd.)
>>
>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib,
>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command.
>>
>> The resultant vmcore is usually analyzed with the "crash" utility.
>>
>> The original tool producing such files is kdump. Unlike the procedure
>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd
>> kdump kernel), and has more information about the original guest kernel
>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state
>> is opaque.
>>
>> For this reason, the kdump preparation logic in QEMU hardcodes a number
>> of fields in the kdump header. The direct issue is the "phys_base"
>> field. Refer to dump.c, functions create_header32(), create_header64(),
>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text
>> "0").
>>
>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD
>>
>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD
>>
>> This works in most cases, because the guest Linux kernel indeed tends to
>> be loaded at guest-phys address 0. However, when the guest Linux kernel
>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map),
>> then the guest Linux kernel is loaded at 16MB, thereby getting out of
>> sync with the phys_base=0 setting visible in the KDUMP header.
>>
>> This trips up the "crash" utility.
>>
>> Dave worked around the issue in "crash" for ELF format dumps -- "crash"
>> can identify QEMU as the originator of the vmcore by finding the QEMU
>> notes in the ELF vmcore. If those are present, then "crash" employs a
>> heuristic, probing for a phys_base up to 32MB, in 1MB steps.
> 
> What advantages does KDUMP have over ELF?

This has been discussed, but I'd like to give a short perspective from
personal experience.

The more obvious advantage is the smaller size, due to (a) per-page
compression (which preserves random-access for "crash"), and (b) zero
page sharing. A smaller dump file is easier to store, and easier to
upload if you're requesting assitance with debugging.

The perhaps less obvious advantage is the speed at which qemu writes the
dump. We're talking orders of magnitude, especially on rotational media.
This is because lzo and snappy are *incredibly* fast (put differently:
they incur very little CPU penalty for the same guest RAM size). The CPU
penalty is actually so small that in almost all cases the dumping
procedure stays IO-bound (in my experience: even on an SSD!). Now
combine that with a potential reduction of 4GB -> 256MB in size: that's
a sixteen-fold speedup.

(I'm allowed to praise this qemu feature, I didn't write it. :))

Thanks
Laszlo



Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU

2014-11-12 Thread Laszlo Ersek
On 11/12/14 14:28, Christopher Covington wrote:
> On 11/12/2014 08:26 AM, Petr Tesarik wrote:
>> On Wed, 12 Nov 2014 08:18:04 -0500
>> Christopher Covington  wrote:
>>
>>> On 11/12/2014 03:05 AM, Petr Tesarik wrote:
 On Tue, 11 Nov 2014 12:27:44 -0500
 Christopher Covington  wrote:

> On 11/11/2014 06:22 AM, Laszlo Ersek wrote:
>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please
>> keep me CC'd.)
>>
>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib,
>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command.
>>
>> The resultant vmcore is usually analyzed with the "crash" utility.
>>
>> The original tool producing such files is kdump. Unlike the procedure
>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd
>> kdump kernel), and has more information about the original guest kernel
>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state
>> is opaque.
>>
>> For this reason, the kdump preparation logic in QEMU hardcodes a number
>> of fields in the kdump header. The direct issue is the "phys_base"
>> field. Refer to dump.c, functions create_header32(), create_header64(),
>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text
>> "0").
>>
>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD
>>
>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD
>>
>> This works in most cases, because the guest Linux kernel indeed tends to
>> be loaded at guest-phys address 0. However, when the guest Linux kernel
>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map),
>> then the guest Linux kernel is loaded at 16MB, thereby getting out of
>> sync with the phys_base=0 setting visible in the KDUMP header.
>>
>> This trips up the "crash" utility.
>>
>> Dave worked around the issue in "crash" for ELF format dumps -- "crash"
>> can identify QEMU as the originator of the vmcore by finding the QEMU
>> notes in the ELF vmcore. If those are present, then "crash" employs a
>> heuristic, probing for a phys_base up to 32MB, in 1MB steps.
>
> What advantages does KDUMP have over ELF?

 It's smaller (data is compressed), and it contains a header with some
 useful information (e.g. the crashed kernel's version and release).
>>>
>>> What if the ELF dumper used SHF_COMPRESSED or could dump an ELF.xz?
>>
>> Not the same thing. With KDUMP, each page is compressed separately, so
>> if a utility like crash needs a page from the middle, it can find it
>> and unpack it immediately. If we had an ELF.xz, then the whole file
>> must be unpacked before it can be used. And unpacking a few terabytes
>> takes ... a while. ;-)
> 
> Understood on the ELF.xz approach, but why couldn't each page (or maybe a
> configurable size) be a SHF_COMPRESSED section?

Perhaps it could, technically -- it's just not how Qiao Nuohan
implemented the feature. I didn't research the background for this.

Thanks
Laszlo



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Paolo Bonzini


On 12/11/2014 15:10, Mark Rutland wrote:
>>> We can't just pick-and-mix
>>> portions of ACPI and state that it's specified and standard.
>>
>> But that's what you already do if you want to build ACPI tables from DT.
>>  You are already picking-and-mixing the variable portions of the ACPI
>> tables and make a DT bindings for them.
> 
> I don't follow. I argued _against_ trying to build ACPI tables from DT
> because the two don't quite match up anyway. I argued _against_ trying
> to convert ACPI tables to DT in prior discussions for similar reasons.

Sorry, that was not you-Mark, but more you-ARM.

And in fact I'd tend to agree with you, but if there are people that
want not to use ACPI or UEFI or both, I think it's better if the UEFI
firmware swallows the same pill and builds ACPI from DT.

> In addition to fixing up the other specs which are affected by this
> (e.g. how we describe those additional CPUs). There's also some
> de-ACPIing to be done in addition to de-x86ification, and we need to be
> careful to ensure we have access to all the information we require in
> the absence of ACPI, and that we have a well defined behaviour on both
> sides of the interface for what would previously have been implicit in
> ACPI.

Yes, I agree.  On the QEMU side the de-ACPIfication would have to be
done anyway (no GPE because of the reduced hardware), but you need extra
de-ACPIfication for stuff like the SRAT.

> I'm not saying that this is impossible. It's just a greater body of work
> than modifying one spec.

No doubt about that.

Paolo



Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU

2014-11-12 Thread Christopher Covington
Thanks Petr and Laszlo for entertaining my questions. I've got one last one if
you have the time.

On 11/12/2014 09:10 AM, Laszlo Ersek wrote:
> On 11/12/14 14:26, Petr Tesarik wrote:
>> On Wed, 12 Nov 2014 08:18:04 -0500
>> Christopher Covington  wrote:
>>
>>> On 11/12/2014 03:05 AM, Petr Tesarik wrote:
 On Tue, 11 Nov 2014 12:27:44 -0500
 Christopher Covington  wrote:

> On 11/11/2014 06:22 AM, Laszlo Ersek wrote:
>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please
>> keep me CC'd.)
>>
>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib,
>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command.
>>
>> The resultant vmcore is usually analyzed with the "crash" utility.
>>
>> The original tool producing such files is kdump. Unlike the procedure
>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd
>> kdump kernel), and has more information about the original guest kernel
>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state
>> is opaque.
>>
>> For this reason, the kdump preparation logic in QEMU hardcodes a number
>> of fields in the kdump header. The direct issue is the "phys_base"
>> field. Refer to dump.c, functions create_header32(), create_header64(),
>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text
>> "0").
>>
>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD
>>
>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD
>>
>> This works in most cases, because the guest Linux kernel indeed tends to
>> be loaded at guest-phys address 0. However, when the guest Linux kernel
>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map),
>> then the guest Linux kernel is loaded at 16MB, thereby getting out of
>> sync with the phys_base=0 setting visible in the KDUMP header.
>>
>> This trips up the "crash" utility.
>>
>> Dave worked around the issue in "crash" for ELF format dumps -- "crash"
>> can identify QEMU as the originator of the vmcore by finding the QEMU
>> notes in the ELF vmcore. If those are present, then "crash" employs a
>> heuristic, probing for a phys_base up to 32MB, in 1MB steps.
>
> What advantages does KDUMP have over ELF?

 It's smaller (data is compressed), and it contains a header with some
 useful information (e.g. the crashed kernel's version and release).
> 
> Another advantage is that all zero-filled pages are represented in the
> kdump file by one shared zero page.
> 
> The difference in speed of dumping is stunning.

Would you expect using SHT_NOBITS to give a similar speedup to the ELF dumper?

Thanks,
Chris

-- 
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



Re: [Qemu-devel] [Linaro-acpi] [RFC PATCH 0/7] hw/arm/virt: Dynamic ACPI v5.1 table generation

2014-11-12 Thread Claudio Fontana
On 12.11.2014 14:27, Peter Maydell wrote:
> On 12 November 2014 12:04, Mark Rutland  wrote:
>> On Wed, Nov 12, 2014 at 11:52:22AM +, Paolo Bonzini wrote:
>>> SeaBIOS fishes out information from fw_cfg, and puts it in low memory.
>>> On ARM you could use DT binary blobs instead of fw_cfg, as proposed
>>> already (I don't remember if it was in this thread or IRC).  Then if you
>>> want to go !UEFI you can extract the tables from those binary blobs.
>>
>> This sounds broken. I am very much not a fan of shoving binary blobs
>> into DT to workaround a shoddy boot interface.
> 
> My understanding from an IRC conversation yesterday was that at
> least some of these ACPI blobs contain data which has to be constructed
> at the point it is requested (ie is not fixed at the point when
> QEMU starts up), because OVMF will do:
>  * startup
>  * prod some parts of the hardware to configure it
>  * request ACPI tables via fw_cfg
> and the ACPI tables have to reflect the statu of the hardware
> after OVMF's poking, not before.
> 
> It wasn't entirely clear to me whether this applies equally
> to the ARM UEFI setup as to x86 + OVMF, but it does suggest that
> it would be better to define a memory-mapped variant of fw_cfg
> rather than stuffing the blobs into the device tree. (I didn't
> much like throwing the blobs in the dtb myself either.)
> 
> If somebody with more x86/ACPI knowledge could clarify what the
> dynamically-constructed parts of the tables are and whether
> they're likely to apply to use that would be good. I think they
> involved PCI in some way, but I don't have access to my irc logs
> right now to check. (I could imagine that ARM UEFI might not need
> to prod and configure a PCI bus and devices in the way that an
> x86 BIOS expects it has to, but that's speculation on my part,
> and I dunno that I'd care to bake that assumption into the design
> anyway.)

Would the last step you mention allow for guests to start with an already 
existing PCI interrupt map
and the BAR registers preprogrammed to point to somewhere sane?

I ask because on OSv at the moment, the situation is that for x86 we don't need 
to reprogram anything on PCI,
as everything is already nicely set up by the time the guest starts, and thus 
the BAR addresses can be read directly.
On ARM we have to reprogram the BARs manually for all devices.

Couldn't we give an easier time to each OS guest by having everything nicely 
set up on AArch64 as well?

Claudio

 











Re: [Qemu-devel] uniquely identifying KDUMP files that originate from QEMU

2014-11-12 Thread Laszlo Ersek
On 11/12/14 15:48, Christopher Covington wrote:
> Thanks Petr and Laszlo for entertaining my questions. I've got one last one if
> you have the time.
> 
> On 11/12/2014 09:10 AM, Laszlo Ersek wrote:
>> On 11/12/14 14:26, Petr Tesarik wrote:
>>> On Wed, 12 Nov 2014 08:18:04 -0500
>>> Christopher Covington  wrote:
>>>
 On 11/12/2014 03:05 AM, Petr Tesarik wrote:
> On Tue, 11 Nov 2014 12:27:44 -0500
> Christopher Covington  wrote:
>
>> On 11/11/2014 06:22 AM, Laszlo Ersek wrote:
>>> (Note: I'm not subscribed to either qemu-devel or the kexec list; please
>>> keep me CC'd.)
>>>
>>> QEMU is able to dump the guest's memory in KDUMP format (kdump-zlib,
>>> kdump-lzo, kdump-snappy) with the "dump-guest-memory" QMP command.
>>>
>>> The resultant vmcore is usually analyzed with the "crash" utility.
>>>
>>> The original tool producing such files is kdump. Unlike the procedure
>>> performed by QEMU, kdump runs from *within* the guest (under a kexec'd
>>> kdump kernel), and has more information about the original guest kernel
>>> state (which is being dumped) than QEMU. To QEMU, the guest kernel state
>>> is opaque.
>>>
>>> For this reason, the kdump preparation logic in QEMU hardcodes a number
>>> of fields in the kdump header. The direct issue is the "phys_base"
>>> field. Refer to dump.c, functions create_header32(), create_header64(),
>>> and "include/sysemu/dump.h", macro PHYS_BASE (with the replacement text
>>> "0").
>>>
>>> http://git.qemu.org/?p=qemu.git;a=blob;f=dump.c;h=9c7dad8f865af3b778589dd0847e450ba9a75b9d;hb=HEAD
>>>
>>> http://git.qemu.org/?p=qemu.git;a=blob;f=include/sysemu/dump.h;h=7e4ec5c7d96fb39c943d970d1683aa2dc171c933;hb=HEAD
>>>
>>> This works in most cases, because the guest Linux kernel indeed tends to
>>> be loaded at guest-phys address 0. However, when the guest Linux kernel
>>> is booted on top of OVMF (which has a somewhat unusual UEFI memory map),
>>> then the guest Linux kernel is loaded at 16MB, thereby getting out of
>>> sync with the phys_base=0 setting visible in the KDUMP header.
>>>
>>> This trips up the "crash" utility.
>>>
>>> Dave worked around the issue in "crash" for ELF format dumps -- "crash"
>>> can identify QEMU as the originator of the vmcore by finding the QEMU
>>> notes in the ELF vmcore. If those are present, then "crash" employs a
>>> heuristic, probing for a phys_base up to 32MB, in 1MB steps.
>>
>> What advantages does KDUMP have over ELF?
>
> It's smaller (data is compressed), and it contains a header with some
> useful information (e.g. the crashed kernel's version and release).
>>
>> Another advantage is that all zero-filled pages are represented in the
>> kdump file by one shared zero page.
>>
>> The difference in speed of dumping is stunning.
> 
> Would you expect using SHT_NOBITS to give a similar speedup to the ELF dumper?

Sorry, I don't know what SHT_NOBITS is.

Laszlo




[Qemu-devel] [PULL for-2.2 0/3] usb bugfixes for 2.2

2014-11-12 Thread Gerd Hoffmann
  Hi,

usb patch queue with three little fixes.

please pull,
  Gerd

The following changes since commit 558c2c8ddfb165a36eb95dc93125c04829d68aa7:

  Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging 
(2014-11-10 16:28:51 +)

are available in the git repository at:


  git://git.kraxel.org/qemu tags/pull-usb-20141112-1

for you to fetch changes up to 79ae25af1569a50a0ec799901a1bb280c088f121:

  usb-host: fix usb_host_speed_compat tyops (2014-11-12 15:27:23 +0100)


usb bugfixes for 2.2


Chris Johns (1):
  Provide the missing LIBUSB_LOG_LEVEL_* for older libusb or FreeBSD. 
Providing just the needed value as a defined.

Gerd Hoffmann (2):
  xhci: add sanity checks to xhci_lookup_uport
  usb-host: fix usb_host_speed_compat tyops

 hw/usb/hcd-xhci.c|  9 +
 hw/usb/host-libusb.c | 12 +---
 2 files changed, 18 insertions(+), 3 deletions(-)



[Qemu-devel] [PULL for-2.2 1/3] Provide the missing LIBUSB_LOG_LEVEL_* for older libusb or FreeBSD. Providing just the needed value as a defined.

2014-11-12 Thread Gerd Hoffmann
From: Chris Johns 

Signed-off-by: Chris Johns 
Signed-off-by: Gerd Hoffmann 
---
 hw/usb/host-libusb.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/usb/host-libusb.c b/hw/usb/host-libusb.c
index d2d161b..032a0e4 100644
--- a/hw/usb/host-libusb.c
+++ b/hw/usb/host-libusb.c
@@ -143,6 +143,12 @@ static void usb_host_attach_kernel(USBHostDevice *s);
 
 /*  */
 
+#ifndef LIBUSB_LOG_LEVEL_WARNING /* older libusb didn't define these */
+#define LIBUSB_LOG_LEVEL_WARNING 2
+#endif
+
+/*  */
+
 #define CONTROL_TIMEOUT  1/* 10 sec*/
 #define BULK_TIMEOUT 0/* unlimited */
 #define INTR_TIMEOUT 0/* unlimited */
-- 
1.8.3.1




[Qemu-devel] [PULL for-2.2 3/3] usb-host: fix usb_host_speed_compat tyops

2014-11-12 Thread Gerd Hoffmann
Signed-off-by: Gerd Hoffmann 
Reviewed-by: Gonglei 
---
 hw/usb/host-libusb.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/usb/host-libusb.c b/hw/usb/host-libusb.c
index 032a0e4..a5f9dab 100644
--- a/hw/usb/host-libusb.c
+++ b/hw/usb/host-libusb.c
@@ -749,13 +749,13 @@ static void usb_host_speed_compat(USBHostDevice *s)
 
 udev->speedmask = (1 << udev->speed);
 if (udev->speed == USB_SPEED_SUPER && compat_high) {
-udev->speedmask |= USB_SPEED_HIGH;
+udev->speedmask |= USB_SPEED_MASK_HIGH;
 }
 if (udev->speed == USB_SPEED_SUPER && compat_full) {
-udev->speedmask |= USB_SPEED_FULL;
+udev->speedmask |= USB_SPEED_MASK_FULL;
 }
 if (udev->speed == USB_SPEED_HIGH && compat_full) {
-udev->speedmask |= USB_SPEED_FULL;
+udev->speedmask |= USB_SPEED_MASK_FULL;
 }
 }
 
-- 
1.8.3.1




[Qemu-devel] [PULL for-2.2 2/3] xhci: add sanity checks to xhci_lookup_uport

2014-11-12 Thread Gerd Hoffmann
Also catch xhci_lookup_uport failures in post_load.

https://bugzilla.redhat.com/show_bug.cgi?id=1074219

Signed-off-by: Gerd Hoffmann 
---
 hw/usb/hcd-xhci.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/hw/usb/hcd-xhci.c b/hw/usb/hcd-xhci.c
index 2930b72..9a942cf 100644
--- a/hw/usb/hcd-xhci.c
+++ b/hw/usb/hcd-xhci.c
@@ -2262,6 +2262,9 @@ static USBPort *xhci_lookup_uport(XHCIState *xhci, 
uint32_t *slot_ctx)
 int i, pos, port;
 
 port = (slot_ctx[1]>>16) & 0xFF;
+if (port < 1 || port > xhci->numports) {
+return NULL;
+}
 port = xhci->ports[port-1].uport->index+1;
 pos = snprintf(path, sizeof(path), "%d", port);
 for (i = 0; i < 5; i++) {
@@ -3706,6 +3709,12 @@ static int usb_xhci_post_load(void *opaque, int 
version_id)
 xhci_mask64(ldq_le_pci_dma(pci_dev, dcbaap + 8 * slotid));
 xhci_dma_read_u32s(xhci, slot->ctx, slot_ctx, sizeof(slot_ctx));
 slot->uport = xhci_lookup_uport(xhci, slot_ctx);
+if (!slot->uport) {
+/* should not happen, but may trigger on guest bugs */
+slot->enabled = 0;
+slot->addressed = 0;
+continue;
+}
 assert(slot->uport && slot->uport->dev);
 
 for (epid = 1; epid <= 31; epid++) {
-- 
1.8.3.1




  1   2   3   >