Re: [Qemu-devel] [PATCH] usb-mtp: Limit filename to object information size

2018-12-13 Thread Gerd Hoffmann
On Thu, Dec 13, 2018 at 10:37:06PM +, Michael Hanselmann wrote:
> The filename length in MTP metadata is specified by the guest. Trusting
> it directly makes it theoretically possible for the host to copy memory
> from outside the filename buffer into a filename. In practice, though,
> there are usually NUL bytes stopping the string operations.
> 
> Also use the opportunity to not assign the filename member twice.
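The clamping described in the patch can be sketched in plain C (hypothetical helper name and parameters, not the actual QEMU code):

```c
#include <assert.h>
#include <stddef.h>

/* Hedged sketch of the hardening described above: never trust the
 * guest-supplied character count; clamp it to what actually fits in
 * the received object info, so string operations cannot run past the
 * buffer even without a terminating NUL. */
static size_t mtp_filename_chars(size_t objinfo_size, size_t filename_offset,
                                 size_t claimed_chars)
{
    size_t avail;

    if (filename_offset >= objinfo_size) {
        return 0;               /* filename lies entirely outside the data */
    }
    avail = (objinfo_size - filename_offset) / 2;   /* UTF-16 code units */
    return claimed_chars < avail ? claimed_chars : avail;
}
```

The key point is that the bound comes from the size of the data the host actually received, not from a length field the guest controls.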

Added to usb patch queue.

thanks,
  Gerd




Re: [Qemu-devel] [PATCH v3 00/16] Virtio devices split from virtio-pci

2018-12-13 Thread Gonglei (Arei)


> -Original Message-
> From: Juan Quintela [mailto:quint...@redhat.com]
> Sent: Friday, December 14, 2018 5:01 AM
> To: qemu-devel@nongnu.org
> Cc: Michael S. Tsirkin ; Thomas Huth ;
> Gerd Hoffmann ; Gonglei (Arei)
> ; Juan Quintela 
> Subject: [PATCH v3 00/16] Virtio devices split from virtio-pci
> 
> Hi
> 
> v3:
> - rebase to master
> - only compile them if CONFIG_PCI is set (thomas)
> 
> Please review.
> 
> Later, Juan.
> 
> V2:
> 
> - Rebase on top of master
> 
> Please review.
> 
> Later, Juan.
> 
> [v1]
> From previous version (in the middle of make check tests):
> - split also the bits of virtio-pci.h (mst suggestion)
> - add gpu, crypt and gpg bits
> - more cleanups
> - fix all the copyrights (the ones not changed have been there
>   forever)
> - be consistent with naming, vhost-* or virtio-*
> 
> Please review, Juan.
> 
> Juan Quintela (16):
>   virtio: split vhost vsock bits from virtio-pci
>   virtio: split virtio input host bits from virtio-pci
>   virtio: split virtio input bits from virtio-pci
>   virtio: split virtio rng bits from virtio-pci
>   virtio: split virtio balloon bits from virtio-pci
>   virtio: split virtio 9p bits from virtio-pci
>   virtio: split vhost user blk bits from virtio-pci
>   virtio: split vhost user scsi bits from virtio-pci
>   virtio: split vhost scsi bits from virtio-pci
>   virtio: split virtio scsi bits from virtio-pci
>   virtio: split virtio blk bits from virtio-pci
>   virtio: split virtio net bits from virtio-pci
>   virtio: split virtio serial bits from virtio-pci
>   virtio: split virtio gpu bits from virtio-pci.h
>   virtio: split virtio crypto bits from virtio-pci.h
>   virtio: virtio 9p really requires CONFIG_VIRTFS to work
> 
>  default-configs/virtio.mak|   3 +-
>  hw/display/virtio-gpu-pci.c   |  14 +
>  hw/display/virtio-vga.c   |   1 +
>  hw/virtio/Makefile.objs   |  15 +
>  hw/virtio/vhost-scsi-pci.c|  95 
>  hw/virtio/vhost-user-blk-pci.c| 101 
>  hw/virtio/vhost-user-scsi-pci.c   | 101 
>  hw/virtio/vhost-vsock-pci.c   |  82 
>  hw/virtio/virtio-9p-pci.c |  86 
>  hw/virtio/virtio-balloon-pci.c|  94 
>  hw/virtio/virtio-blk-pci.c|  97 
>  hw/virtio/virtio-crypto-pci.c |  14 +
>  hw/virtio/virtio-input-host-pci.c |  45 ++
>  hw/virtio/virtio-input-pci.c  | 154 ++
>  hw/virtio/virtio-net-pci.c|  96 
>  hw/virtio/virtio-pci.c| 783 --
>  hw/virtio/virtio-pci.h| 234 -
>  hw/virtio/virtio-rng-pci.c|  86 
>  hw/virtio/virtio-scsi-pci.c   | 106 
>  hw/virtio/virtio-serial-pci.c | 112 +
>  tests/Makefile.include|  20 +-
>  21 files changed, 1311 insertions(+), 1028 deletions(-)
>  create mode 100644 hw/virtio/vhost-scsi-pci.c
>  create mode 100644 hw/virtio/vhost-user-blk-pci.c
>  create mode 100644 hw/virtio/vhost-user-scsi-pci.c
>  create mode 100644 hw/virtio/vhost-vsock-pci.c
>  create mode 100644 hw/virtio/virtio-9p-pci.c
>  create mode 100644 hw/virtio/virtio-balloon-pci.c
>  create mode 100644 hw/virtio/virtio-blk-pci.c
>  create mode 100644 hw/virtio/virtio-input-host-pci.c
>  create mode 100644 hw/virtio/virtio-input-pci.c
>  create mode 100644 hw/virtio/virtio-net-pci.c
>  create mode 100644 hw/virtio/virtio-rng-pci.c
>  create mode 100644 hw/virtio/virtio-scsi-pci.c
>  create mode 100644 hw/virtio/virtio-serial-pci.c
> 
> --
> 2.19.2

For series:
Reviewed-by: Gonglei 

 
Thanks,
-Gonglei



Re: [Qemu-devel] [PATCH v11 7/7] virtio-balloon: VIRTIO_BALLOON_F_FREE_PAGE_HINT

2018-12-13 Thread Wei Wang

On 12/13/2018 11:45 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

The new feature enables the virtio-balloon device to receive hints of
guest free pages from the free page vq.

A notifier is registered to the migration precopy notifier chain. The
notifier calls free_page_start after the migration thread syncs the dirty
bitmap, so that the free page optimization starts to clear bits of free
pages from the bitmap. It calls free_page_stop before the migration
thread syncs the bitmap, which marks the end of the current round of ram
save. free_page_stop is also called to stop the optimization when an
error occurs during ram saving.
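The notifier flow described above can be sketched in plain C. The event and function names mirror the commit message, but the signatures and the event enum here are illustrative, not QEMU's actual API:

```c
#include <assert.h>
#include <stdbool.h>

/* Hedged sketch of the precopy-notifier flow: start hinting only once
 * the dirty bitmap has been synced (dirty logging already enabled),
 * and stop it before the next sync or on any ram-save error. */
enum precopy_event { BEFORE_BITMAP_SYNC, AFTER_BITMAP_SYNC, RAM_SAVE_ERROR };

static bool hinting_active;

static void free_page_start(void) { hinting_active = true; }
static void free_page_stop(void)  { hinting_active = false; }

static void balloon_precopy_notify(enum precopy_event ev)
{
    switch (ev) {
    case AFTER_BITMAP_SYNC:
        /* Safe: dirty bit logging is already on at this point. */
        free_page_start();
        break;
    case BEFORE_BITMAP_SYNC:   /* end of the current round of ram save */
    case RAM_SAVE_ERROR:       /* stop the optimization on error too */
        free_page_stop();
        break;
    }
}
```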

Note: balloon will report pages which were free at the time of this call.
As the reporting happens asynchronously, dirty bit logging must be
enabled before this free_page_start call is made. Guest reporting must be
disabled before the migration dirty bitmap is synchronized.

Signed-off-by: Wei Wang 
CC: Michael S. Tsirkin 
CC: Dr. David Alan Gilbert 
CC: Juan Quintela 
CC: Peter Xu 

I think I'm OK for this from the migration side, I'd appreciate
someone checking the virtio and aio bits.

I'm not too sure how it gets switched on and off - i.e. if we get a nice
new qemu on a new kernel, what happens when I try and migrate to the
same qemu on an older kernel without these hints?



This feature doesn't rely on the host kernel. Those hints are reported 
from the guest kernel.

So migration across different hosts wouldn't affect the use of this feature.
Please correct me if I didn't get your point.

Best,
Wei



Re: [Qemu-devel] [PATCH v11 1/7] bitmap: fix bitmap_count_one

2018-12-13 Thread Wei Wang

On 12/13/2018 10:28 PM, Dr. David Alan Gilbert wrote:

* Wei Wang (wei.w.w...@intel.com) wrote:

BITMAP_LAST_WORD_MASK(nbits) returns ~0UL (all bits set) when "nbits=0",
which makes bitmap_count_one mishandle the "nbits=0" case. It is
preferable to keep BITMAP_LAST_WORD_MASK identical to the kernel
implementation it is ported from.

So this patch fixes bitmap_count_one to handle the nbits=0 case.
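A minimal sketch of the macro and the guarded counting path, single-word case only (`__builtin_popcountl` stands in for QEMU's ctpopl; a sketch, not the actual patch):

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG ((int)(CHAR_BIT * sizeof(unsigned long)))
/* Identical to the kernel macro it is ported from: for nbits == 0 this
 * evaluates to ~0UL (all bits set), not 0. */
#define BITMAP_LAST_WORD_MASK(nbits) (~0UL >> (-(nbits) & (BITS_PER_LONG - 1)))

static long bitmap_count_one(const unsigned long *bitmap, long nbits)
{
    if (nbits == 0) {
        /* The fix: without this guard, BITMAP_LAST_WORD_MASK(0) == ~0UL
         * would make us count every bit of *bitmap instead of none. */
        return 0;
    }
    assert(nbits <= BITS_PER_LONG);   /* multi-word case omitted here */
    return __builtin_popcountl(*bitmap & BITMAP_LAST_WORD_MASK(nbits));
}
```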

OK; it's a little odd that it's only bitmap_count_one that's being fixed
for this case; but OK.


Reviewed-by: Dr. David Alan Gilbert 

Thanks.

We could also help fix other callers outside this series.
(this one is included here because it helps this optimization feature
avoid that issue).


Best,
Wei



Re: [Qemu-devel] [PATCH 0/1] checkpatch: checker for comment block

2018-12-13 Thread Markus Armbruster
Paolo Bonzini  writes:

> On 13/12/18 19:21, Peter Maydell wrote:
>> On Thu, 13 Dec 2018 at 18:07, Paolo Bonzini  wrote:
>>> On 13/12/18 19:01, Peter Maydell wrote:
 I sent a patch to do this a little while back:
  https://patchwork.kernel.org/patch/10561557/

 It didn't get applied because Paolo disagreed with having
 our tools enforcing what our style guide says.
>>>
>>> I didn't disagree with that---I disagreed with having a single style in
>>> the style guide, because unlike most other blatant violations of the
>>> coding style (eg. braces), this one is pervasive in maintained code and
>>> I don't want code that I maintain to mix two comment styles.
>>>
>>> So I proposed two alternatives:
>>>
>>> - someone fixes all the comment blocks which are "starred" but don't
>>> have a lone "/*" at the beginning, and then we can commit that patch;
>>>
>>> - we allow "/* foo" on the first line, except for doc comments and for
>>> the first line of the file (author/license block), and fix the style
>>> guide accordingly.
>> 
>> We came to a consensus on the comment style when we discussed
>> the patch which updated CODING_STYLE. I'm not personally
>> a fan of the result (I used to use "/* foo"), but what we have in the
>> doc is what we achieved consensus for. I'm not going to reopen
>> the "what should block comments look like" style debate.
>
> Sure, I don't want to do that either.  I accept the result of the
> discussion; I don't accept introducing a new warning that will cause
> over 700 files to become inconsistent sooner or later.

By design, checkpatch.pl only checks *patches*.  Existing code doesn't
trigger warnings until it gets touched.  And then it should arguably be
made to conform to CODING_STYLE.  So, what's the problem again?  :)

> Therefore, the
> only way to enforce the result of the discussion is to change the
> existing comments,

I support cleaning up comment style wholesale[*].

>for example by having a script that maintainers can
> use to change the existing comments in their files.  Having each of us
> come up with their own script or doing it by hand is probably not a good
> use of everyone's time.

Sharing tools is good. 

> Alternatively, fixing the style guide can also mean "explain why /* foo
> is allowed by checkpatch even though it does not match the coding
> style", without rehashing the discussion.
>
> (BTW it may actually be a good idea to fix _some_ instances of bad
> coding style, in particular the space-tab sequences and the files where
> there are maybe 2 or 3 tabs that ended up there by mistake.  That's a
> different topic).

You've since posted patches for that.  Thanks.

 Personally I think we should just commit my patch, and then
 we can stop having people manually pointing out where
 submitters' patches don't match CODING_STYLE.

Concur.  It has my R-by, modulo a commit message tweak.


[*] Same for other style violations.  Yes, it's churn, and yes, it'll
mess up git-blame some, but I'm convinced the presence of numerous bad
examples costs us more.  CODING_STYLE was committed almost a decade ago.
If we had cleaned up back then, the churn and the blame would be long
forgotten, and we would've spared ourselves plenty of review cycles and
quite a few style discussions.  It's late, but never too late.



Re: [Qemu-devel] [PATCH 1/2] remove space-tab sequences

2018-12-13 Thread Markus Armbruster
Paolo Bonzini  writes:

> There are not many, and they are all simple mistakes that ended up
> being committed.  Remove them.
>
> Signed-off-by: Paolo Bonzini 

The cover letter states "I am not touching space-tab in the middle of
the line, many of which are in #define lines."  I think the actual
commit should, too.



Re: [Qemu-devel] [PULL 27/32] qapi: add #if conditions to generated code members

2018-12-13 Thread Markus Armbruster
Eric Blake  writes:

> On 12/13/18 12:43 PM, Markus Armbruster wrote:
>> From: Marc-André Lureau 
>>
>> Wrap generated enum and struct members and their supporting code with
>>
>
> Git ate the line because it started with #.  Not sure if you can sneak
> in a v2 pull request that puts something sane here...

v2 sent, and my .gitconfig now has

[commit]
cleanup = scissors

Let's see how I fare with that.

Thanks!

>
>> We do enum and struct in a single patch because union tag enum and the
>> associated variants tie them together, and dealing with that to split
>> the patch doesn't seem worthwhile.
>>



[Qemu-devel] [PULL v2 27/32] qapi: Add #if conditions to generated code members

2018-12-13 Thread Markus Armbruster
From: Marc-André Lureau 

Wrap generated enum and struct members and their supporting code with
#if/#endif, using the .ifcond members added in the previous patches.

We do enum and struct in a single patch because union tag enum and the
associated variants tie them together, and dealing with that to split
the patch doesn't seem worthwhile.
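For illustration, a conditional enum member ends up bracketed like this in the generated C (a hypothetical member and condition, not from the real QEMU schema):

```c
#include <assert.h>

/* What the generator now emits for an enum whose member carries an
 * 'if' condition: the member is wrapped in #if/#endif using the
 * member's .ifcond value. CONFIG_EXAMPLE_BAR is a made-up condition. */
typedef enum ExampleEnum {
    EXAMPLE_ENUM_FOO,
#if defined(CONFIG_EXAMPLE_BAR)
    EXAMPLE_ENUM_BAR,
#endif
    EXAMPLE_ENUM__MAX,
} ExampleEnum;
```

With CONFIG_EXAMPLE_BAR undefined, the member simply vanishes from the build and EXAMPLE_ENUM__MAX shrinks accordingly, which is why the lookup table entries get the same #if/#endif wrapping.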

Signed-off-by: Marc-André Lureau 
Reviewed-by: Markus Armbruster 
Message-Id: <20181213123724.4866-18-marcandre.lur...@redhat.com>
Signed-off-by: Markus Armbruster 
---
 scripts/qapi/common.py |  4 
 scripts/qapi/introspect.py | 14 ++
 scripts/qapi/types.py  |  4 
 scripts/qapi/visit.py  |  6 ++
 4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/scripts/qapi/common.py b/scripts/qapi/common.py
index 17707b6de7..8c2d97369e 100644
--- a/scripts/qapi/common.py
+++ b/scripts/qapi/common.py
@@ -2078,11 +2078,13 @@ const QEnumLookup %(c_name)s_lookup = {
 ''',
 c_name=c_name(name))
 for m in members:
+ret += gen_if(m.ifcond)
 index = c_enum_const(name, m.name, prefix)
 ret += mcgen('''
 [%(index)s] = "%(name)s",
 ''',
  index=index, name=m.name)
+ret += gen_endif(m.ifcond)
 
 ret += mcgen('''
 },
@@ -2104,10 +2106,12 @@ typedef enum %(c_name)s {
 c_name=c_name(name))
 
 for m in enum_members:
+ret += gen_if(m.ifcond)
 ret += mcgen('''
 %(c_enum)s,
 ''',
  c_enum=c_enum_const(name, m.name, prefix))
+ret += gen_endif(m.ifcond)
 
 ret += mcgen('''
 } %(c_name)s;
diff --git a/scripts/qapi/introspect.py b/scripts/qapi/introspect.py
index 417625d54b..f7f2ca07e4 100644
--- a/scripts/qapi/introspect.py
+++ b/scripts/qapi/introspect.py
@@ -162,6 +162,8 @@ const QLitObject %(c_name)s = %(c_string)s;
 ret = {'name': member.name, 'type': self._use_type(member.type)}
 if member.optional:
 ret['default'] = None
+if member.ifcond:
+ret = (ret, {'if': member.ifcond})
 return ret
 
 def _gen_variants(self, tag_name, variants):
@@ -169,14 +171,17 @@ const QLitObject %(c_name)s = %(c_string)s;
 'variants': [self._gen_variant(v) for v in variants]}
 
 def _gen_variant(self, variant):
-return {'case': variant.name, 'type': self._use_type(variant.type)}
+return ({'case': variant.name, 'type': self._use_type(variant.type)},
+{'if': variant.ifcond})
 
 def visit_builtin_type(self, name, info, json_type):
 self._gen_qlit(name, 'builtin', {'json-type': json_type}, [])
 
 def visit_enum_type(self, name, info, ifcond, members, prefix):
 self._gen_qlit(name, 'enum',
-   {'values': [m.name for m in members]}, ifcond)
+   {'values':
+[(m.name, {'if': m.ifcond}) for m in members]},
+   ifcond)
 
 def visit_array_type(self, name, info, ifcond, element_type):
 element = self._use_type(element_type)
@@ -192,8 +197,9 @@ const QLitObject %(c_name)s = %(c_string)s;
 
 def visit_alternate_type(self, name, info, ifcond, variants):
 self._gen_qlit(name, 'alternate',
-   {'members': [{'type': self._use_type(m.type)}
-for m in variants.variants]}, ifcond)
+   {'members': [
+   ({'type': self._use_type(m.type)}, {'if': m.ifcond})
+   for m in variants.variants]}, ifcond)
 
 def visit_command(self, name, info, ifcond, arg_type, ret_type, gen,
   success_response, boxed, allow_oob, allow_preconfig):
diff --git a/scripts/qapi/types.py b/scripts/qapi/types.py
index e8d22c5081..62d4cf9f95 100644
--- a/scripts/qapi/types.py
+++ b/scripts/qapi/types.py
@@ -43,6 +43,7 @@ struct %(c_name)s {
 def gen_struct_members(members):
 ret = ''
 for memb in members:
+ret += gen_if(memb.ifcond)
 if memb.optional:
 ret += mcgen('''
 bool has_%(c_name)s;
@@ -52,6 +53,7 @@ def gen_struct_members(members):
 %(c_type)s %(c_name)s;
 ''',
  c_type=memb.type.c_type(), c_name=c_name(memb.name))
+ret += gen_endif(memb.ifcond)
 return ret
 
 
@@ -131,11 +133,13 @@ def gen_variants(variants):
 for var in variants.variants:
 if var.type.name == 'q_empty':
 continue
+ret += gen_if(var.ifcond)
 ret += mcgen('''
 %(c_type)s %(c_name)s;
 ''',
  c_type=var.type.c_unboxed_type(),
  c_name=c_name(var.name))
+ret += gen_endif(var.ifcond)
 
 ret += mcgen('''
 } u;
diff --git a/scripts/qapi/visit.py b/scripts/qapi/visit.py
index 24f85a2e85..82eab72b21 100644
--- a/scripts/qapi/visit.py
+++ b/scripts/qapi/visit.py
@@ -54,6 +54,7 @@ void visit_type_%(c_name)s_members(Visitor *v, %(c_name)s 
*obj, 

[Qemu-devel] [PULL v2 00/32] QAPI patches for 2018-12-13

2018-12-13 Thread Markus Armbruster
git-request-pull master public pull-qapi-2018-12-13-v2
The following changes since commit c3ec0fa1a8e815ecfec9eabb9c20ee206c313e07:

  Merge remote-tracking branch 'remotes/armbru/tags/pull-monitor-2018-12-12' 
into staging (2018-12-13 13:41:44 +)

are available in the Git repository at:

  git://repo.or.cz/qemu/armbru.git tags/pull-qapi-2018-12-13-v2

for you to fetch changes up to 335d10cd8e2c3bb6067804b095aaf6371fc1983e:

  qapi: add conditions to REPLICATION type/commands on the schema (2018-12-14 
06:52:48 +0100)


QAPI patches for 2018-12-13

* Rewrite the ugly parts of string-input-visitor
* Support conditional QAPI enum, struct, union and alternate members


David Hildenbrand (9):
  cutils: Add qemu_strtod() and qemu_strtod_finite()
  cutils: Fix qemu_strtosz() & friends to reject non-finite sizes
  qapi: Fix string-input-visitor to reject NaN and infinities
  qapi: Use qemu_strtod_finite() in qobject-input-visitor
  test-string-input-visitor: Add more tests
  qapi: Rewrite string-input-visitor's integer and list parsing
  test-string-input-visitor: Use virtual walk
  test-string-input-visitor: Split off uint64 list tests
  test-string-input-visitor: Add range overflow tests

Eric Blake (1):
  docs: Update references to JSON RFC

Marc-André Lureau (21):
  tests/qapi: Cover commands with 'if' and union / alternate 'data'
  qapi: rename QAPISchemaEnumType.values to .members
  qapi: break long lines at 'data' member
  qapi: Do not define enumeration value explicitly
  qapi: change enum visitor and gen_enum* to take QAPISchemaMember
  tests: print enum type members more like object type members
  qapi: factor out checking for keys
  qapi: improve reporting of unknown or missing keys
  qapi: add a dictionary form with 'name' key for enum members
  qapi: add 'if' to enum members
  qapi-events: add 'if' condition to implicit event enum
  qapi: add a dictionary form for TYPE
  qapi: Add 'if' to implicit struct members
  qapi: add 'if' to union members
  qapi: add 'if' to alternate members
  qapi: Add #if conditions to generated code members
  qapi: add 'If:' condition to enum values documentation
  qapi: add 'If:' condition to struct members documentation
  qapi: add condition to variants documentation
  qapi: add more conditions to SPICE
  qapi: add conditions to REPLICATION type/commands on the schema

Markus Armbruster (1):
  json: Fix to reject duplicate object member names

 docs/devel/qapi-code-gen.txt   |  21 +-
 docs/interop/qmp-spec.txt  |   2 +-
 include/qapi/string-input-visitor.h|   4 +-
 include/qemu/cutils.h  |   8 +-
 migration/colo.c   |  16 +-
 monitor.c  |   7 +-
 qapi/block-core.json   |  26 +-
 qapi/char.json | 150 
 qapi/migration.json|  15 +-
 qapi/misc.json |   7 +-
 qapi/net.json  |   3 +-
 qapi/qobject-input-visitor.c   |   9 +-
 qapi/string-input-visitor.c| 407 -
 qapi/tpm.json  |   5 +-
 qapi/ui.json   |   3 +-
 qobject/json-parser.c  |   5 +
 scripts/qapi/common.py | 207 +++
 scripts/qapi/doc.py|  31 +-
 scripts/qapi/events.py |  13 +-
 scripts/qapi/introspect.py |  17 +-
 scripts/qapi/types.py  |  10 +-
 scripts/qapi/visit.py  |   8 +-
 tests/Makefile.include |  11 +-
 tests/qapi-schema/alternate-base.err   |   1 +
 tests/qapi-schema/alternate-invalid-dict.err   |   1 +
 ...ict-member.exit => alternate-invalid-dict.exit} |   0
 tests/qapi-schema/alternate-invalid-dict.json  |   4 +
 ...-dict-member.out => alternate-invalid-dict.out} |   0
 tests/qapi-schema/comments.out |  14 +-
 tests/qapi-schema/doc-bad-section.out  |  13 +-
 tests/qapi-schema/doc-good.json|  11 +-
 tests/qapi-schema/doc-good.out |  22 +-
 tests/qapi-schema/doc-good.texi|   7 +-
 tests/qapi-schema/double-type.err  |   1 +
 tests/qapi-schema/empty.out|   9 +-
 tests/qapi-schema/enum-bad-member.err  |   1 +
 tests/qapi-schema/enum-bad-member.exit |   1 +
 

[Qemu-devel] [PATCH v2 27/27] target/arm: Tidy TBI handling in gen_a64_set_pc

2018-12-13 Thread Richard Henderson
We can perform this with fewer operations.

Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 65 ++
 1 file changed, 23 insertions(+), 42 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index c57c89d98a..5c06e429d4 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -261,7 +261,7 @@ void gen_a64_set_pc_im(uint64_t val)
 /* Load the PC from a generic TCG variable.
  *
  * If address tagging is enabled via the TCR TBI bits, then loading
- * an address into the PC will clear out any tag in the it:
+ * an address into the PC will clear out any tag in it:
  *  + for EL2 and EL3 there is only one TBI bit, and if it is set
  *then the address is zero-extended, clearing bits [63:56]
  *  + for EL0 and EL1, TBI0 controls addresses with bit 55 == 0
@@ -276,56 +276,37 @@ void gen_a64_set_pc_im(uint64_t val)
  */
 static void gen_a64_set_pc(DisasContext *s, TCGv_i64 src)
 {
+bool tbi0 = s->tbi0, tbi1 = s->tbi1;
 
 if (s->current_el <= 1) {
-/* Test if NEITHER or BOTH TBI values are set.  If so, no need to
- * examine bit 55 of address, can just generate code.
- * If mixed, then test via generated code
- */
-if (s->tbi0 && s->tbi1) {
-TCGv_i64 tmp_reg = tcg_temp_new_i64();
-/* Both bits set, sign extension from bit 55 into [63:56] will
- * cover both cases
- */
-tcg_gen_shli_i64(tmp_reg, src, 8);
-tcg_gen_sari_i64(cpu_pc, tmp_reg, 8);
-tcg_temp_free_i64(tmp_reg);
-} else if (!s->tbi0 && !s->tbi1) {
-/* Neither bit set, just load it as-is */
-tcg_gen_mov_i64(cpu_pc, src);
-} else {
-TCGv_i64 tcg_tmpval = tcg_temp_new_i64();
-TCGv_i64 tcg_bit55  = tcg_temp_new_i64();
-TCGv_i64 tcg_zero   = tcg_const_i64(0);
+if (tbi0 || tbi1) {
+/* Sign-extend from bit 55.  */
+tcg_gen_sextract_i64(cpu_pc, src, 0, 56);
 
-tcg_gen_andi_i64(tcg_bit55, src, (1ull << 55));
+if (tbi0 != tbi1) {
+TCGv_i64 tcg_zero = tcg_const_i64(0);
 
-if (s->tbi0) {
-/* tbi0==1, tbi1==0, so 0-fill upper byte if bit 55 = 0 */
-tcg_gen_andi_i64(tcg_tmpval, src,
- 0x00FFull);
-tcg_gen_movcond_i64(TCG_COND_EQ, cpu_pc, tcg_bit55, tcg_zero,
-tcg_tmpval, src);
-} else {
-/* tbi0==0, tbi1==1, so 1-fill upper byte if bit 55 = 1 */
-tcg_gen_ori_i64(tcg_tmpval, src,
-0xFF00ull);
-tcg_gen_movcond_i64(TCG_COND_NE, cpu_pc, tcg_bit55, tcg_zero,
-tcg_tmpval, src);
+/*
+ * The two TBI bits differ.
+ * If tbi0, then !tbi1: only use the extension if positive.
+ * if !tbi0, then tbi1: only use the extension if negative.
+ */
+tcg_gen_movcond_i64(tbi0 ? TCG_COND_GE : TCG_COND_LT,
+cpu_pc, cpu_pc, tcg_zero, cpu_pc, src);
+tcg_temp_free_i64(tcg_zero);
 }
-tcg_temp_free_i64(tcg_zero);
-tcg_temp_free_i64(tcg_bit55);
-tcg_temp_free_i64(tcg_tmpval);
+return;
 }
-} else {  /* EL > 1 */
-if (s->tbi0) {
+} else {
+if (tbi0) {
 /* Force tag byte to all zero */
-tcg_gen_andi_i64(cpu_pc, src, 0x00FFull);
-} else {
-/* Load unmodified address */
-tcg_gen_mov_i64(cpu_pc, src);
+tcg_gen_extract_i64(cpu_pc, src, 0, 56);
+return;
 }
 }
+
+/* Load unmodified address */
+tcg_gen_mov_i64(cpu_pc, src);
 }
 
 typedef struct DisasCompare64 {
-- 
2.17.2
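The tag-handling rule the patch streamlines can be illustrated as plain C operating on a concrete address (a sketch of the architectural behaviour for EL0/EL1, not TCG code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural effect of loading the PC with TBI enabled at EL0/EL1:
 * bit 55 selects the address half, the matching TBIn bit decides
 * whether bits [63:56] are replaced by copies of bit 55. */
static uint64_t set_pc(uint64_t addr, bool tbi0, bool tbi1)
{
    bool high_half = (addr >> 55) & 1;
    bool tbi = high_half ? tbi1 : tbi0;

    if (tbi) {
        /* Sign-extend from bit 55, clearing any tag in [63:56]. */
        return (uint64_t)((int64_t)(addr << 8) >> 8);
    }
    return addr;
}
```

This is exactly why the patch can emit a single sextract from bit 55 when either TBI bit is set, and then needs only one movcond to pick between the extended and original value when the two bits differ.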




[Qemu-devel] [PATCH v2 19/27] target/arm: Export aa64_va_parameters to internals.h

2018-12-13 Thread Richard Henderson
We need to reuse this from helper-a64.c.  Provide a stub
definition for CONFIG_USER_ONLY.  This matches the stub
definitions that we removed for arm_regime_tbi{0,1} before.

Signed-off-by: Richard Henderson 
---
 target/arm/internals.h | 17 +
 target/arm/helper.c|  4 ++--
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index 9ef9d01ee2..4fbef58126 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -951,4 +951,21 @@ typedef struct ARMVAParameters {
 bool using64k   : 1;
 } ARMVAParameters;
 
+#ifdef CONFIG_USER_ONLY
+static inline ARMVAParameters aa64_va_parameters(CPUARMState *env,
+ uint64_t va,
+ ARMMMUIdx mmu_idx, bool data)
+{
+return (ARMVAParameters) {
+/* 48-bit address space */
+.tsz = 16,
+/* We can't handle tagged addresses properly in user-only mode */
+.tbi = false,
+};
+}
+#else
+ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
+   ARMMMUIdx mmu_idx, bool data);
+#endif
+
 #endif
diff --git a/target/arm/helper.c b/target/arm/helper.c
index bd1b683766..b9ffc07fbc 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -9702,8 +9702,8 @@ static uint8_t convert_stage2_attrs(CPUARMState *env, 
uint8_t s2attrs)
 return (hiattr << 6) | (hihint << 4) | (loattr << 2) | lohint;
 }
 
-static ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
-  ARMMMUIdx mmu_idx, bool data)
+ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
+   ARMMMUIdx mmu_idx, bool data)
 {
 uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
 uint32_t el = regime_el(env, mmu_idx);
-- 
2.17.2




[Qemu-devel] [PATCH v2 20/27] target/arm: Implement pauth_strip

2018-12-13 Thread Richard Henderson
Stripping out the authentication data does not require any crypto,
it merely requires the virtual address parameters.

Signed-off-by: Richard Henderson 
---
 target/arm/helper-a64.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 79cc9cf47b..329af51232 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -1069,6 +1069,15 @@ static uint64_t pauth_addpac(CPUARMState *env, uint64_t 
ptr, uint64_t modifier,
 g_assert_not_reached(); /* FIXME */
 }
 
+static uint64_t pauth_original_ptr(uint64_t ptr, ARMVAParameters param)
+{
+uint64_t extfield = -param.select;
+int bot_pac_bit = 64 - param.tsz;
+int top_pac_bit = 64 - 8 * param.tbi;
+
+return deposit64(ptr, bot_pac_bit, top_pac_bit - bot_pac_bit, extfield);
+}
+
 static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
ARMPACKey *key, bool data, int keynumber)
 {
@@ -1077,7 +1086,10 @@ static uint64_t pauth_auth(CPUARMState *env, uint64_t 
ptr, uint64_t modifier,
 
 static uint64_t pauth_strip(CPUARMState *env, uint64_t ptr, bool data)
 {
-g_assert_not_reached(); /* FIXME */
+ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
+ARMVAParameters param = aa64_va_parameters(env, ptr, mmu_idx, data);
+
+return pauth_original_ptr(ptr, param);
 }
 
 static void QEMU_NORETURN pauth_trap(CPUARMState *env, int target_el,
-- 
2.17.2
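Under the assumption that the PAC occupies bits [64-tsz, top), with top = 56 when TBI is set (the tag byte is preserved) and 64 otherwise, the strip reduces to a single deposit, as the patch shows. A self-contained sketch with a local copy of a QEMU-style deposit64 (illustrative parameters stand in for ARMVAParameters):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for QEMU's deposit64: replace 'len' bits at 'start'. */
static uint64_t deposit64(uint64_t value, int start, int len, uint64_t field)
{
    uint64_t mask = (len == 64 ? ~0ull : (1ull << len) - 1) << start;
    return (value & ~mask) | ((field << start) & mask);
}

/* Strip the PAC: bit 55 selects the address half, and the PAC field
 * is overwritten with the matching canonical extension (all zeros or
 * all ones). No crypto needed, only the VA parameters. */
static uint64_t pauth_original_ptr(uint64_t ptr, int tsz, bool tbi)
{
    uint64_t extfield = ((ptr >> 55) & 1) ? ~0ull : 0;
    int bot_pac_bit = 64 - tsz;
    int top_pac_bit = 64 - 8 * tbi;   /* tag byte preserved if TBI */

    return deposit64(ptr, bot_pac_bit, top_pac_bit - bot_pac_bit, extfield);
}
```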




[Qemu-devel] [PATCH v2 18/27] target/arm: Reuse aa64_va_parameters for setting tbflags

2018-12-13 Thread Richard Henderson
The arm_regime_tbi{0,1} functions are replaceable with the new function
by passing the lowest and highest address.

Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h| 35 --
 target/arm/helper.c | 61 -
 2 files changed, 16 insertions(+), 80 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 3cc7a069ce..7c7dbc216c 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3079,41 +3079,6 @@ static inline bool arm_cpu_bswap_data(CPUARMState *env)
 }
 #endif
 
-#ifndef CONFIG_USER_ONLY
-/**
- * arm_regime_tbi0:
- * @env: CPUARMState
- * @mmu_idx: MMU index indicating required translation regime
- *
- * Extracts the TBI0 value from the appropriate TCR for the current EL
- *
- * Returns: the TBI0 value.
- */
-uint32_t arm_regime_tbi0(CPUARMState *env, ARMMMUIdx mmu_idx);
-
-/**
- * arm_regime_tbi1:
- * @env: CPUARMState
- * @mmu_idx: MMU index indicating required translation regime
- *
- * Extracts the TBI1 value from the appropriate TCR for the current EL
- *
- * Returns: the TBI1 value.
- */
-uint32_t arm_regime_tbi1(CPUARMState *env, ARMMMUIdx mmu_idx);
-#else
-/* We can't handle tagged addresses properly in user-only mode */
-static inline uint32_t arm_regime_tbi0(CPUARMState *env, ARMMMUIdx mmu_idx)
-{
-return 0;
-}
-
-static inline uint32_t arm_regime_tbi1(CPUARMState *env, ARMMMUIdx mmu_idx)
-{
-return 0;
-}
-#endif
-
 void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
   target_ulong *cs_base, uint32_t *flags);
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 3422fa5943..bd1b683766 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -8957,48 +8957,6 @@ static inline ARMMMUIdx stage_1_mmu_idx(ARMMMUIdx 
mmu_idx)
 return mmu_idx;
 }
 
-/* Returns TBI0 value for current regime el */
-uint32_t arm_regime_tbi0(CPUARMState *env, ARMMMUIdx mmu_idx)
-{
-TCR *tcr;
-uint32_t el;
-
-/* For EL0 and EL1, TBI is controlled by stage 1's TCR, so convert
- * a stage 1+2 mmu index into the appropriate stage 1 mmu index.
- */
-mmu_idx = stage_1_mmu_idx(mmu_idx);
-
-tcr = regime_tcr(env, mmu_idx);
-el = regime_el(env, mmu_idx);
-
-if (el > 1) {
-return extract64(tcr->raw_tcr, 20, 1);
-} else {
-return extract64(tcr->raw_tcr, 37, 1);
-}
-}
-
-/* Returns TBI1 value for current regime el */
-uint32_t arm_regime_tbi1(CPUARMState *env, ARMMMUIdx mmu_idx)
-{
-TCR *tcr;
-uint32_t el;
-
-/* For EL0 and EL1, TBI is controlled by stage 1's TCR, so convert
- * a stage 1+2 mmu index into the appropriate stage 1 mmu index.
- */
-mmu_idx = stage_1_mmu_idx(mmu_idx);
-
-tcr = regime_tcr(env, mmu_idx);
-el = regime_el(env, mmu_idx);
-
-if (el > 1) {
-return 0;
-} else {
-return extract64(tcr->raw_tcr, 38, 1);
-}
-}
-
 /* Return the TTBR associated with this translation regime */
 static inline uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx,
int ttbrn)
@@ -13048,9 +13006,22 @@ void cpu_get_tb_cpu_state(CPUARMState *env, 
target_ulong *pc,
 
 *pc = env->pc;
 flags = ARM_TBFLAG_AARCH64_STATE_MASK;
-/* Get control bits for tagged addresses */
-flags |= (arm_regime_tbi0(env, mmu_idx) << ARM_TBFLAG_TBI0_SHIFT);
-flags |= (arm_regime_tbi1(env, mmu_idx) << ARM_TBFLAG_TBI1_SHIFT);
+
+#ifndef CONFIG_USER_ONLY
+/* Get control bits for tagged addresses.  Note that the
+ * translator only uses this for instruction addresses.
+ */
+{
+ARMMMUIdx stage1 = stage_1_mmu_idx(mmu_idx);
+ARMVAParameters p0, p1;
+
+p0 = aa64_va_parameters(env, 0, stage1, false);
+p1 = aa64_va_parameters(env, -1, stage1, false);
+
+flags |= p0.tbi << ARM_TBFLAG_TBI0_SHIFT;
+flags |= p1.tbi << ARM_TBFLAG_TBI1_SHIFT;
+}
+#endif
 
 if (cpu_isar_feature(aa64_sve, cpu)) {
 int sve_el = sve_exception_el(env, current_el);
-- 
2.17.2




[Qemu-devel] [PATCH v2 22/27] target/arm: Implement pauth_addpac

2018-12-13 Thread Richard Henderson
This is not really functional yet, because the crypto is not yet
implemented. It does, however, follow the AddPAC pseudo-function.

Signed-off-by: Richard Henderson 
---
 target/arm/helper-a64.c | 40 +++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 87cff7d96a..19486b9677 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -1066,7 +1066,45 @@ static uint64_t pauth_computepac(uint64_t data, uint64_t 
modifier,
 static uint64_t pauth_addpac(CPUARMState *env, uint64_t ptr, uint64_t modifier,
  ARMPACKey *key, bool data)
 {
-g_assert_not_reached(); /* FIXME */
+ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
+ARMVAParameters param = aa64_va_parameters(env, ptr, mmu_idx, data);
+uint64_t pac, ext_ptr, ext, test;
+int bot_bit, top_bit;
+
+/* If tagged pointers are in use, use ptr<55>, otherwise ptr<63>.  */
+if (param.tbi) {
+ext = sextract64(ptr, 55, 1);
+} else {
+ext = sextract64(ptr, 63, 1);
+}
+
+/* Build a pointer with known good extension bits.  */
+top_bit = 64 - 8 * param.tbi;
+bot_bit = 64 - param.tsz;
+ext_ptr = deposit64(ptr, bot_bit, top_bit - bot_bit, ext);
+
+pac = pauth_computepac(ext_ptr, modifier, *key);
+
+/* Check if the ptr has good extension bits and corrupt the
+ * pointer authentication code if not.
+ */
+test = sextract64(ptr, bot_bit, top_bit - bot_bit);
+if (test != 0 && test != -1) {
+pac ^= 1ull << (top_bit - 1);
+}
+
+/* Preserve the determination between upper and lower at bit 55,
+ * and insert pointer authentication code.
+ */
+if (param.tbi) {
+ptr &= ~MAKE_64BIT_MASK(bot_bit, 55 - bot_bit + 1);
+pac &= MAKE_64BIT_MASK(bot_bit, 54 - bot_bit + 1);
+} else {
+ptr &= MAKE_64BIT_MASK(0, bot_bit);
+pac &= ~(MAKE_64BIT_MASK(55, 1) | MAKE_64BIT_MASK(0, bot_bit));
+}
+ext &= MAKE_64BIT_MASK(55, 1);
+return pac | ext | ptr;
 }
 
 static uint64_t pauth_original_ptr(uint64_t ptr, ARMVAParameters param)
-- 
2.17.2




[Qemu-devel] [PATCH v2 17/27] target/arm: Create ARMVAParameters and helpers

2018-12-13 Thread Richard Henderson
Split out functions to extract the virtual address parameters.
Let the functions choose T0 or T1 address space half, if present.
Extract (most of) the control bits that vary between EL or Tx.

Signed-off-by: Richard Henderson 

v2: Incorporate feedback wrt VTCR, HTCR, and more.
---
 target/arm/internals.h |  16 +++
 target/arm/helper.c| 286 +++--
 2 files changed, 174 insertions(+), 128 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index 1d0d0392c9..9ef9d01ee2 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -935,4 +935,20 @@ static inline ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env)
 ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env);
 #endif
 
+/*
+ * Parameters of a given virtual address, as extracted from the
+ * translation control register (TCR) for a given regime.
+ */
+typedef struct ARMVAParameters {
+unsigned tsz: 8;
+unsigned select : 1;
+bool tbi: 1;
+bool epd: 1;
+bool hpd: 1;
+bool ha : 1;
+bool hd : 1;
+bool using16k   : 1;
+bool using64k   : 1;
+} ARMVAParameters;
+
 #endif
diff --git a/target/arm/helper.c b/target/arm/helper.c
index b1c0ff923f..3422fa5943 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -9744,6 +9744,133 @@ static uint8_t convert_stage2_attrs(CPUARMState *env, uint8_t s2attrs)
 return (hiattr << 6) | (hihint << 4) | (loattr << 2) | lohint;
 }
 
+static ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
+  ARMMMUIdx mmu_idx, bool data)
+{
+uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
+uint32_t el = regime_el(env, mmu_idx);
+bool tbi, tbid, epd, hpd, ha, hd, using16k, using64k;
+int select, tsz;
+
+/* Bit 55 is always between the two regions, and is canonical for
+ * determining if address tagging is enabled.
+ */
+select = extract64(va, 55, 1);
+
+if (el > 1) {
+tsz = extract32(tcr, 0, 6);
+using64k = extract32(tcr, 14, 1);
+using16k = extract32(tcr, 15, 1);
+if (mmu_idx == ARMMMUIdx_S2NS) {
+/* VTCR_EL2 */
+tbi = tbid = hpd = false;
+} else {
+tbi = extract32(tcr, 20, 1);
+hpd = extract32(tcr, 24, 1);
+tbid = extract32(tcr, 29, 1);
+}
+ha = extract32(tcr, 21, 1);
+hd = extract32(tcr, 22, 1);
+epd = false;
+} else if (!select) {
+tsz = extract32(tcr, 0, 6);
+epd = extract32(tcr, 7, 1);
+using64k = extract32(tcr, 14, 1);
+using16k = extract32(tcr, 15, 1);
+tbi = extract64(tcr, 37, 1);
+ha = extract64(tcr, 39, 1);
+hd = extract64(tcr, 40, 1);
+hpd = extract64(tcr, 41, 1);
+tbid = extract64(tcr, 51, 1);
+} else {
+int tg = extract32(tcr, 30, 2);
+using16k = tg == 1;
+using64k = tg == 3;
+tsz = extract32(tcr, 16, 6);
+epd = extract32(tcr, 23, 1);
+tbi = extract64(tcr, 38, 1);
+ha = extract64(tcr, 39, 1);
+hd = extract64(tcr, 40, 1);
+hpd = extract64(tcr, 42, 1);
+tbid = extract64(tcr, 52, 1);
+}
+tsz = MIN(tsz, 39);  /* TODO: ARMv8.4-TTST */
+tsz = MAX(tsz, 16);  /* TODO: ARMv8.2-LVA  */
+
+return (ARMVAParameters) {
+.tsz = tsz,
+.select = select,
+.tbi = tbi & (data | !tbid),
+.epd = epd,
+.hpd = hpd,
+.ha = ha,
+.hd = hd,
+.using16k = using16k,
+.using64k = using64k,
+};
+}
+
+static ARMVAParameters aa32_va_parameters(CPUARMState *env, uint32_t va,
+  ARMMMUIdx mmu_idx)
+{
+uint64_t tcr = regime_tcr(env, mmu_idx)->raw_tcr;
+uint32_t el = regime_el(env, mmu_idx);
+int select, tsz;
+bool epd, hpd;
+
+if (mmu_idx == ARMMMUIdx_S2NS) {
+/* VTCR */
+bool sext = extract32(tcr, 4, 1);
+bool sign = extract32(tcr, 3, 1);
+
+/* If the sign-extend bit is not the same as t0sz[3], the result
+ * is unpredictable. Flag this as a guest error.
+ */
+if (sign != sext) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "AArch32: VTCR.S / VTCR.T0SZ[3] mismatch\n");
+}
+tsz = sextract32(tcr, 0, 4) + 8;
+select = 0;
+hpd = false;
+epd = false;
+} else if (el == 2) {
+/* HTCR */
+tsz = extract32(tcr, 0, 3);
+select = 0;
+hpd = extract64(tcr, 24, 1);
+epd = false;
+} else {
+int t0sz = extract32(tcr, 0, 3);
+int t1sz = extract32(tcr, 16, 3);
+
+if (t1sz == 0) {
+select = va > (0xffffffffu >> t0sz);
+} else {
+/* Note that we will detect errors later.  */
+select = va >= ~(0xffffffffu >> t1sz);
+}
+if (!select) {
+tsz = 
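The message above is truncated in the archive, but the two key moves in aa64_va_parameters() are visible: bit 55 of the VA selects the T0 or T1 half, and tsz is clamped to the architectural range. A small model of both (an illustrative sketch, not the QEMU functions themselves):

```c
#include <stdint.h>

/* Bit 55 picks between the lower (T0) and upper (T1) halves of the
 * 64-bit VA space, as the patch's select = extract64(va, 55, 1). */
static int va_select_half(uint64_t va)
{
    return (int)((va >> 55) & 1);
}

/* Mirrors tsz = MIN(tsz, 39); tsz = MAX(tsz, 16); pending the
 * ARMv8.4-TTST and ARMv8.2-LVA TODOs noted in the patch. */
static int clamp_tsz(int tsz)
{
    if (tsz > 39) {
        tsz = 39;
    }
    if (tsz < 16) {
        tsz = 16;
    }
    return tsz;
}
```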

[Qemu-devel] [PATCH v2 11/27] target/arm: Rearrange decode in disas_uncond_b_reg

2018-12-13 Thread Richard Henderson
This will enable PAuth decode in a subsequent patch.

Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 47 +-
 1 file changed, 36 insertions(+), 11 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index c84c2dbb66..30086a5d7f 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1989,32 +1989,54 @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
 rn = extract32(insn, 5, 5);
 op4 = extract32(insn, 0, 5);
 
-if (op4 != 0x0 || op3 != 0x0 || op2 != 0x1f) {
-unallocated_encoding(s);
-return;
+if (op2 != 0x1f) {
+goto do_unallocated;
 }
 
 switch (opc) {
 case 0: /* BR */
 case 1: /* BLR */
 case 2: /* RET */
-gen_a64_set_pc(s, cpu_reg(s, rn));
+switch (op3) {
+case 0:
+if (op4 != 0) {
+goto do_unallocated;
+}
+dst = cpu_reg(s, rn);
+break;
+
+default:
+goto do_unallocated;
+}
+
+gen_a64_set_pc(s, dst);
 /* BLR also needs to load return address */
 if (opc == 1) {
 tcg_gen_movi_i64(cpu_reg(s, 30), s->pc);
 }
 break;
+
 case 4: /* ERET */
 if (s->current_el == 0) {
-unallocated_encoding(s);
-return;
+goto do_unallocated;
+}
+switch (op3) {
+case 0:
+if (op4 != 0) {
+goto do_unallocated;
+}
+dst = tcg_temp_new_i64();
+tcg_gen_ld_i64(dst, cpu_env,
+   offsetof(CPUARMState, elr_el[s->current_el]));
+break;
+
+default:
+goto do_unallocated;
 }
 if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 gen_io_start();
 }
-dst = tcg_temp_new_i64();
-tcg_gen_ld_i64(dst, cpu_env,
-   offsetof(CPUARMState, elr_el[s->current_el]));
+
 gen_helper_exception_return(cpu_env, dst);
 tcg_temp_free_i64(dst);
 if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
@@ -2023,14 +2045,17 @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
 /* Must exit loop to check un-masked IRQs */
 s->base.is_jmp = DISAS_EXIT;
 return;
+
 case 5: /* DRPS */
-if (rn != 0x1f) {
-unallocated_encoding(s);
+if (op3 != 0 || op4 != 0 || rn != 0x1f) {
+goto do_unallocated;
 } else {
 unsupported_encoding(s, insn);
 }
 return;
+
 default:
+do_unallocated:
 unallocated_encoding(s);
 return;
 }
-- 
2.17.2
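The rearrangement above decodes per-field so that later PAuth forms can hang off op3/op4. The field positions follow from the extract32() calls visible in the diff plus the A64 branch-register layout; the sketch below is illustrative, using RET (encoding 0xd65f03c0) as a worked example:

```c
#include <stdint.h>

struct br_fields {
    unsigned opc, op2, op3, rn, op4;
};

/* Field split used by disas_uncond_b_reg(). */
static struct br_fields decode_br(uint32_t insn)
{
    struct br_fields f;
    f.opc = (insn >> 21) & 0xf;   /* 0=BR 1=BLR 2=RET 4=ERET 5=DRPS */
    f.op2 = (insn >> 16) & 0x1f;  /* must be 0x1f */
    f.op3 = (insn >> 10) & 0x3f;  /* selects plain vs PAuth variants */
    f.rn  = (insn >> 5)  & 0x1f;
    f.op4 = insn & 0x1f;
    return f;
}
```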




[Qemu-devel] [PATCH v2 23/27] target/arm: Implement pauth_computepac

2018-12-13 Thread Richard Henderson
This is the main crypto routine, an implementation of QARMA.
This matches the ARM pseudocode as closely as possible.

Signed-off-by: Richard Henderson 
---
 target/arm/helper-a64.c | 241 +++-
 1 file changed, 240 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 19486b9677..1da7867a42 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -1057,10 +1057,249 @@ uint32_t HELPER(sqrt_f16)(uint32_t a, void *fpstp)
  * Helpers for ARMv8.3-PAuth.
  */
 
+static uint64_t pac_cell_shuffle(uint64_t i)
+{
+uint64_t o = 0;
+
+o |= extract64(i, 52, 4);
+o |= extract64(i, 24, 4) << 4;
+o |= extract64(i, 44, 4) << 8;
+o |= extract64(i,  0, 4) << 12;
+
+o |= extract64(i, 28, 4) << 16;
+o |= extract64(i, 48, 4) << 20;
+o |= extract64(i,  4, 4) << 24;
+o |= extract64(i, 40, 4) << 28;
+
+o |= i & MAKE_64BIT_MASK(32, 4);
+o |= extract64(i, 12, 4) << 36;
+o |= extract64(i, 56, 4) << 40;
+o |= extract64(i,  8, 4) << 44;
+
+o |= extract64(i, 36, 4) << 48;
+o |= extract64(i, 16, 4) << 52;
+o |= extract64(i, 40, 4) << 56;
+o |= i & MAKE_64BIT_MASK(60, 4);
+
+return o;
+}
+
+static uint64_t pac_cell_inv_shuffle(uint64_t i)
+{
+uint64_t o = 0;
+
+o |= extract64(i, 12, 4);
+o |= extract64(i, 24, 4) << 4;
+o |= extract64(i, 48, 4) << 8;
+o |= extract64(i, 36, 4) << 12;
+
+o |= extract64(i, 56, 4) << 16;
+o |= extract64(i, 44, 4) << 20;
+o |= extract64(i,  4, 4) << 24;
+o |= extract64(i, 16, 4) << 28;
+
+o |= i & MAKE_64BIT_MASK(32, 4);
+o |= extract64(i, 52, 4) << 36;
+o |= extract64(i, 28, 4) << 40;
+o |= extract64(i,  8, 4) << 44;
+
+o |= extract64(i, 20, 4) << 48;
+o |= extract64(i,  0, 4) << 52;
+o |= extract64(i, 40, 4) << 56;
+o |= i & MAKE_64BIT_MASK(60, 4);
+
+return o;
+}
+
+static uint64_t pac_sub(uint64_t i)
+{
+static const uint8_t sub[16] = {
+0xb, 0x6, 0x8, 0xf, 0xc, 0x0, 0x9, 0xe,
+0x3, 0x7, 0x4, 0x5, 0xd, 0x2, 0x1, 0xa,
+};
+uint64_t o = 0;
+int b;
+
+for (b = 0; b < 64; b += 16) {
+o |= (uint64_t)sub[(i >> b) & 0xf] << b;
+}
+return o;
+}
+
+static uint64_t pac_inv_sub(uint64_t i)
+{
+static const uint8_t inv_sub[16] = {
+0x5, 0xe, 0xd, 0x8, 0xa, 0xb, 0x1, 0x9,
+0x2, 0x6, 0xf, 0x0, 0x4, 0xc, 0x7, 0x3,
+};
+uint64_t o = 0;
+int b;
+
+for (b = 0; b < 64; b += 16) {
+o |= (uint64_t)inv_sub[(i >> b) & 0xf] << b;
+}
+return o;
+}
+
+static int rot_cell(int cell, int n)
+{
+cell |= cell << 4;
+cell >>= n;
+return cell & 0xf;
+}
+
+static uint64_t pac_mult(uint64_t i)
+{
+uint64_t o = 0;
+int b;
+
+for (b = 0; b < 4 * 4; b += 4) {
+int i0, i4, i8, ic, t0, t1, t2, t3;
+
+i0 = extract64(i, b, 4);
+i4 = extract64(i, b + 4 * 4, 4);
+i8 = extract64(i, b + 8 * 4, 4);
+ic = extract64(i, b + 12 * 4, 4);
+
+t0 = rot_cell(i8, 1) ^ rot_cell(i4, 2) ^ rot_cell(i0, 1);
+t1 = rot_cell(ic, 1) ^ rot_cell(i4, 1) ^ rot_cell(i0, 2);
+t2 = rot_cell(ic, 2) ^ rot_cell(i8, 1) ^ rot_cell(i0, 1);
+t3 = rot_cell(ic, 2) ^ rot_cell(i8, 2) ^ rot_cell(i4, 1);
+
+o |= (uint64_t)t3 << b;
+o |= (uint64_t)t2 << (b + 4 * 4);
+o |= (uint64_t)t1 << (b + 8 * 4);
+o |= (uint64_t)t0 << (b + 12 * 4);
+}
+return o;
+}
+
+static uint64_t tweak_cell_rot(uint64_t cell)
+{
+return (cell >> 1) | (((cell ^ (cell >> 1)) & 1) << 3);
+}
+
+static uint64_t tweak_shuffle(uint64_t i)
+{
+uint64_t o = 0;
+
+o |= extract64(i, 16, 4) << 0;
+o |= extract64(i, 20, 4) << 4;
+o |= tweak_cell_rot(extract64(i, 24, 4)) << 8;
+o |= extract64(i, 28, 4) << 12;
+
+o |= tweak_cell_rot(extract64(i, 44, 4)) << 16;
+o |= extract64(i,  8, 4) << 20;
+o |= extract64(i, 12, 4) << 24;
+o |= tweak_cell_rot(extract64(i, 32, 4)) << 28;
+
+o |= extract64(i, 48, 4) << 32;
+o |= extract64(i, 52, 4) << 36;
+o |= extract64(i, 56, 4) << 40;
+o |= tweak_cell_rot(extract64(i, 60, 4)) << 44;
+
+o |= tweak_cell_rot(extract64(i,  0, 4)) << 48;
+o |= extract64(i,  4, 4) << 52;
+o |= tweak_cell_rot(extract64(i, 40, 4)) << 56;
+o |= tweak_cell_rot(extract64(i, 36, 4)) << 60;
+
+return o;
+}
+
+static uint64_t tweak_cell_inv_rot(uint64_t cell)
+{
+return ((cell << 1) & 0xf) | ((cell & 1) ^ (cell >> 3));
+}
+
+static uint64_t tweak_inv_shuffle(uint64_t i)
+{
+uint64_t o = 0;
+
+o |= tweak_cell_inv_rot(extract64(i, 48, 4));
+o |= extract64(i, 52, 4) << 4;
+o |= extract64(i, 20, 4) << 8;
+o |= extract64(i, 24, 4) << 12;
+
+o |= extract64(i,  0, 4) << 16;
+o |= extract64(i,  4, 4) << 20;
+o |= tweak_cell_inv_rot(extract64(i,  8, 4)) << 24;
+o |= extract64(i, 12, 4) << 28;
+
+o |= tweak_cell_inv_rot(extract64(i, 28, 4)) << 32;
+o 
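The message above is truncated, but the substitution layer is complete: pac_inv_sub must undo pac_sub nibble-by-nibble, and rot_cell is a 4-bit rotate right. Both properties can be checked directly (tables copied from the patch; rot_cell reproduced verbatim):

```c
#include <stdint.h>

static const uint8_t sub[16] = {
    0xb, 0x6, 0x8, 0xf, 0xc, 0x0, 0x9, 0xe,
    0x3, 0x7, 0x4, 0x5, 0xd, 0x2, 0x1, 0xa,
};
static const uint8_t inv_sub[16] = {
    0x5, 0xe, 0xd, 0x8, 0xa, 0xb, 0x1, 0x9,
    0x2, 0x6, 0xf, 0x0, 0x4, 0xc, 0x7, 0x3,
};

/* 4-bit rotate right by n, exactly as rot_cell() in the patch. */
static int rot_cell(int cell, int n)
{
    cell |= cell << 4;
    cell >>= n;
    return cell & 0xf;
}

/* The two s-boxes must be mutual inverses for auth to undo pac. */
static int sboxes_are_inverse(void)
{
    for (int b = 0; b < 16; b++) {
        if (inv_sub[sub[b]] != b) {
            return 0;
        }
    }
    return 1;
}
```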

[Qemu-devel] [PATCH v2 21/27] target/arm: Implement pauth_auth

2018-12-13 Thread Richard Henderson
This is not really functional yet, because the crypto is not yet
implemented.  This, however, follows the Auth pseudocode function.

Signed-off-by: Richard Henderson 
---
 target/arm/helper-a64.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 329af51232..87cff7d96a 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -1081,7 +1081,26 @@ static uint64_t pauth_original_ptr(uint64_t ptr, ARMVAParameters param)
 static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
ARMPACKey *key, bool data, int keynumber)
 {
-g_assert_not_reached(); /* FIXME */
+ARMMMUIdx mmu_idx = arm_stage1_mmu_idx(env);
+ARMVAParameters param = aa64_va_parameters(env, ptr, mmu_idx, data);
+int bot_bit, top_bit;
+uint64_t pac, orig_ptr, test;
+
+orig_ptr = pauth_original_ptr(ptr, param);
+pac = pauth_computepac(orig_ptr, modifier, *key);
+bot_bit = 64 - param.tsz;
+top_bit = 64 - 8 * param.tbi;
+
+test = (pac ^ ptr) & ~MAKE_64BIT_MASK(55, 1);
+if (unlikely(extract64(test, bot_bit, top_bit - bot_bit))) {
+int error_code = (keynumber << 1) | (keynumber ^ 1);
+if (param.tbi) {
+return deposit64(ptr, 53, 2, error_code);
+} else {
+return deposit64(ptr, 61, 2, error_code);
+}
+}
+return orig_ptr;
 }
 
 static uint64_t pauth_strip(CPUARMState *env, uint64_t ptr, bool data)
-- 
2.17.2
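On a failed authentication, the patch above poisons two high bits of the pointer with a key-dependent error code: (keynumber << 1) | (keynumber ^ 1) yields 0b01 for the A keys (keynumber 0) and 0b10 for the B keys (keynumber 1). A one-liner makes that concrete:

```c
/* Error code deposited into ptr<54:53> (TBI) or ptr<62:61> (no TBI)
 * by pauth_auth() when the computed PAC does not match. */
static int pauth_error_code(int keynumber)
{
    return (keynumber << 1) | (keynumber ^ 1);
}
```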




[Qemu-devel] [PATCH v2 09/27] target/arm: Move helper_exception_return to helper-a64.c

2018-12-13 Thread Richard Henderson
This function is only used by AArch64.  Code movement only.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper-a64.h |   2 +
 target/arm/helper.h |   1 -
 target/arm/helper-a64.c | 155 
 target/arm/op_helper.c  | 155 
 4 files changed, 157 insertions(+), 156 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index 28aa0af69d..55299896c4 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -86,6 +86,8 @@ DEF_HELPER_2(advsimd_f16tosinth, i32, f16, ptr)
 DEF_HELPER_2(advsimd_f16touinth, i32, f16, ptr)
 DEF_HELPER_2(sqrt_f16, f16, f16, ptr)
 
+DEF_HELPER_1(exception_return, void, env)
+
 DEF_HELPER_FLAGS_3(pacia, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(pacib, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(pacda, TCG_CALL_NO_WG, i64, env, i64, i64)
diff --git a/target/arm/helper.h b/target/arm/helper.h
index 8c9590091b..53a38188c6 100644
--- a/target/arm/helper.h
+++ b/target/arm/helper.h
@@ -79,7 +79,6 @@ DEF_HELPER_2(get_cp_reg64, i64, env, ptr)
 
 DEF_HELPER_3(msr_i_pstate, void, env, i32, i32)
 DEF_HELPER_1(clear_pstate_ss, void, env)
-DEF_HELPER_1(exception_return, void, env)
 
 DEF_HELPER_2(get_r13_banked, i32, env, i32)
 DEF_HELPER_3(set_r13_banked, void, env, i32, i32)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index bb64700e10..f70c8d9818 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -887,6 +887,161 @@ uint32_t HELPER(advsimd_f16touinth)(uint32_t a, void 
*fpstp)
 return float16_to_uint16(a, fpst);
 }
 
+static int el_from_spsr(uint32_t spsr)
+{
+/* Return the exception level that this SPSR is requesting a return to,
+ * or -1 if it is invalid (an illegal return)
+ */
+if (spsr & PSTATE_nRW) {
+switch (spsr & CPSR_M) {
+case ARM_CPU_MODE_USR:
+return 0;
+case ARM_CPU_MODE_HYP:
+return 2;
+case ARM_CPU_MODE_FIQ:
+case ARM_CPU_MODE_IRQ:
+case ARM_CPU_MODE_SVC:
+case ARM_CPU_MODE_ABT:
+case ARM_CPU_MODE_UND:
+case ARM_CPU_MODE_SYS:
+return 1;
+case ARM_CPU_MODE_MON:
+/* Returning to Mon from AArch64 is never possible,
+ * so this is an illegal return.
+ */
+default:
+return -1;
+}
+} else {
+if (extract32(spsr, 1, 1)) {
+/* Return with reserved M[1] bit set */
+return -1;
+}
+if (extract32(spsr, 0, 4) == 1) {
+/* return to EL0 with M[0] bit set */
+return -1;
+}
+return extract32(spsr, 2, 2);
+}
+}
+
+void HELPER(exception_return)(CPUARMState *env)
+{
+int cur_el = arm_current_el(env);
+unsigned int spsr_idx = aarch64_banked_spsr_index(cur_el);
+uint32_t spsr = env->banked_spsr[spsr_idx];
+int new_el;
+bool return_to_aa64 = (spsr & PSTATE_nRW) == 0;
+
+aarch64_save_sp(env, cur_el);
+
+arm_clear_exclusive(env);
+
+/* We must squash the PSTATE.SS bit to zero unless both of the
+ * following hold:
+ *  1. debug exceptions are currently disabled
+ *  2. singlestep will be active in the EL we return to
+ * We check 1 here and 2 after we've done the pstate/cpsr write() to
+ * transition to the EL we're going to.
+ */
+if (arm_generate_debug_exceptions(env)) {
+spsr &= ~PSTATE_SS;
+}
+
+new_el = el_from_spsr(spsr);
+if (new_el == -1) {
+goto illegal_return;
+}
+if (new_el > cur_el
+|| (new_el == 2 && !arm_feature(env, ARM_FEATURE_EL2))) {
+/* Disallow return to an EL which is unimplemented or higher
+ * than the current one.
+ */
+goto illegal_return;
+}
+
+if (new_el != 0 && arm_el_is_aa64(env, new_el) != return_to_aa64) {
+/* Return to an EL which is configured for a different register width */
+goto illegal_return;
+}
+
+if (new_el == 2 && arm_is_secure_below_el3(env)) {
+/* Return to the non-existent secure-EL2 */
+goto illegal_return;
+}
+
+if (new_el == 1 && (arm_hcr_el2_eff(env) & HCR_TGE)) {
+goto illegal_return;
+}
+
+qemu_mutex_lock_iothread();
+arm_call_pre_el_change_hook(arm_env_get_cpu(env));
+qemu_mutex_unlock_iothread();
+
+if (!return_to_aa64) {
+env->aarch64 = 0;
+/* We do a raw CPSR write because aarch64_sync_64_to_32()
+ * will sort the register banks out for us, and we've already
+ * caught all the bad-mode cases in el_from_spsr().
+ */
+cpsr_write(env, spsr, ~0, CPSRWriteRaw);
+if (!arm_singlestep_active(env)) {
+env->uncached_cpsr &= ~PSTATE_SS;
+}
+aarch64_sync_64_to_32(env);
+
+if (spsr & CPSR_T) {
+env->regs[15] = env->elr_el[cur_el] & ~0x1;
+
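The AArch64 (nRW clear) branch of el_from_spsr() in the moved code is a compact validity check worth restating: M[1] is reserved, M[0] may not be set at EL0, and M[3:2] names the target EL. A standalone transcription of just that branch:

```c
#include <stdint.h>

/* The nRW-clear (return-to-AArch64) half of el_from_spsr(). */
static int el_from_spsr_a64(uint32_t spsr)
{
    if ((spsr >> 1) & 1) {
        return -1;              /* reserved M[1] bit set */
    }
    if ((spsr & 0xf) == 1) {
        return -1;              /* EL0 with M[0] set is illegal */
    }
    return (spsr >> 2) & 3;     /* M[3:2] is the target EL */
}
```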

[Qemu-devel] [PATCH v2 24/27] target/arm: Add PAuth system registers

2018-12-13 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/helper.c | 70 +
 1 file changed, 70 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index b9ffc07fbc..f1e9254c9a 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -5061,6 +5061,70 @@ static CPAccessResult access_lor_other(CPUARMState *env,
 return access_lor_ns(env);
 }
 
+#ifdef TARGET_AARCH64
+static CPAccessResult access_pauth(CPUARMState *env, const ARMCPRegInfo *ri,
+   bool isread)
+{
+int el = arm_current_el(env);
+
+if (el < 2 &&
+arm_feature(env, ARM_FEATURE_EL2) &&
+!(arm_hcr_el2_eff(env) & HCR_APK)) {
+return CP_ACCESS_TRAP_EL2;
+}
+if (el < 3 &&
+arm_feature(env, ARM_FEATURE_EL3) &&
+!(env->cp15.scr_el3 & SCR_APK)) {
+return CP_ACCESS_TRAP_EL3;
+}
+return CP_ACCESS_OK;
+}
+
+static const ARMCPRegInfo pauth_reginfo[] = {
+{ .name = "APDAKEYLOW_EL1", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 2, .opc2 = 0,
+  .access = PL1_RW, .accessfn = access_pauth,
+  .fieldoffset = offsetof(CPUARMState, apda_key.lo) },
+{ .name = "APDAKEYHI_EL1", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 2, .opc2 = 1,
+  .access = PL1_RW, .accessfn = access_pauth,
+  .fieldoffset = offsetof(CPUARMState, apda_key.hi) },
+{ .name = "APDBKEYLOW_EL1", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 2, .opc2 = 2,
+  .access = PL1_RW, .accessfn = access_pauth,
+  .fieldoffset = offsetof(CPUARMState, apdb_key.lo) },
+{ .name = "APDBKEYHI_EL1", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 2, .opc2 = 3,
+  .access = PL1_RW, .accessfn = access_pauth,
+  .fieldoffset = offsetof(CPUARMState, apdb_key.hi) },
+{ .name = "APGAKEYLOW_EL1", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 3, .opc2 = 0,
+  .access = PL1_RW, .accessfn = access_pauth,
+  .fieldoffset = offsetof(CPUARMState, apia_key.lo) },
+{ .name = "APGAKEYHI_EL1", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 3, .opc2 = 1,
+  .access = PL1_RW, .accessfn = access_pauth,
+  .fieldoffset = offsetof(CPUARMState, apia_key.hi) },
+{ .name = "APIAKEYLOW_EL1", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 1, .opc2 = 0,
+  .access = PL1_RW, .accessfn = access_pauth,
+  .fieldoffset = offsetof(CPUARMState, apia_key.lo) },
+{ .name = "APIAKEYHI_EL1", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 1, .opc2 = 1,
+  .access = PL1_RW, .accessfn = access_pauth,
+  .fieldoffset = offsetof(CPUARMState, apia_key.hi) },
+{ .name = "APIBKEYLOW_EL1", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 1, .opc2 = 2,
+  .access = PL1_RW, .accessfn = access_pauth,
+  .fieldoffset = offsetof(CPUARMState, apib_key.lo) },
+{ .name = "APIBKEYHI_EL1", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 0, .crn = 2, .crm = 1, .opc2 = 3,
+  .access = PL1_RW, .accessfn = access_pauth,
+  .fieldoffset = offsetof(CPUARMState, apib_key.hi) },
+REGINFO_SENTINEL
+};
+#endif
+
 void register_cp_regs_for_features(ARMCPU *cpu)
 {
 /* Register all the coprocessor registers based on feature bits */
@@ -5845,6 +5909,12 @@ void register_cp_regs_for_features(ARMCPU *cpu)
define_one_arm_cp_reg(cpu, &zcr_el3_reginfo);
 }
 }
+
+#ifdef TARGET_AARCH64
+if (cpu_isar_feature(aa64_pauth, cpu)) {
+define_arm_cp_regs(cpu, pauth_reginfo);
+}
+#endif
 }
 
 void arm_cpu_register_gdb_regs_for_features(ARMCPU *cpu)
-- 
2.17.2




[Qemu-devel] [PATCH v2 07/27] target/arm: Decode PAuth within disas_data_proc_1src

2018-12-13 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 146 +
 1 file changed, 146 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index c5ec430b42..7ba4c996cf 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4564,6 +4564,7 @@ static void handle_rev16(DisasContext *s, unsigned int sf,
 static void disas_data_proc_1src(DisasContext *s, uint32_t insn)
 {
 unsigned int sf, opcode, opcode2, rn, rd;
+TCGv_i64 tcg_rd;
 
 if (extract32(insn, 29, 1)) {
 unallocated_encoding(s);
@@ -4602,7 +4603,152 @@ static void disas_data_proc_1src(DisasContext *s, uint32_t insn)
 case MAP(1, 0x00, 0x05):
 handle_cls(s, sf, rn, rd);
 break;
+case MAP(1, 0x01, 0x00): /* PACIA */
+if (s->pauth_active) {
+tcg_rd = cpu_reg(s, rd);
+gen_helper_pacia(tcg_rd, cpu_env, tcg_rd, cpu_reg_sp(s, rn));
+} else if (!dc_isar_feature(aa64_pauth, s)) {
+goto do_unallocated;
+}
+break;
+case MAP(1, 0x01, 0x01): /* PACIB */
+if (s->pauth_active) {
+tcg_rd = cpu_reg(s, rd);
+gen_helper_pacib(tcg_rd, cpu_env, tcg_rd, cpu_reg_sp(s, rn));
+} else if (!dc_isar_feature(aa64_pauth, s)) {
+goto do_unallocated;
+}
+break;
+case MAP(1, 0x01, 0x02): /* PACDA */
+if (s->pauth_active) {
+tcg_rd = cpu_reg(s, rd);
+gen_helper_pacda(tcg_rd, cpu_env, tcg_rd, cpu_reg_sp(s, rn));
+} else if (!dc_isar_feature(aa64_pauth, s)) {
+goto do_unallocated;
+}
+break;
+case MAP(1, 0x01, 0x03): /* PACDB */
+if (s->pauth_active) {
+tcg_rd = cpu_reg(s, rd);
+gen_helper_pacdb(tcg_rd, cpu_env, tcg_rd, cpu_reg_sp(s, rn));
+} else if (!dc_isar_feature(aa64_pauth, s)) {
+goto do_unallocated;
+}
+break;
+case MAP(1, 0x01, 0x04): /* AUTIA */
+if (s->pauth_active) {
+tcg_rd = cpu_reg(s, rd);
+gen_helper_autia(tcg_rd, cpu_env, tcg_rd, cpu_reg_sp(s, rn));
+} else if (!dc_isar_feature(aa64_pauth, s)) {
+goto do_unallocated;
+}
+break;
+case MAP(1, 0x01, 0x05): /* AUTIB */
+if (s->pauth_active) {
+tcg_rd = cpu_reg(s, rd);
+gen_helper_autib(tcg_rd, cpu_env, tcg_rd, cpu_reg_sp(s, rn));
+} else if (!dc_isar_feature(aa64_pauth, s)) {
+goto do_unallocated;
+}
+break;
+case MAP(1, 0x01, 0x06): /* AUTDA */
+if (s->pauth_active) {
+tcg_rd = cpu_reg(s, rd);
+gen_helper_autda(tcg_rd, cpu_env, tcg_rd, cpu_reg_sp(s, rn));
+} else if (!dc_isar_feature(aa64_pauth, s)) {
+goto do_unallocated;
+}
+break;
+case MAP(1, 0x01, 0x07): /* AUTDB */
+if (s->pauth_active) {
+tcg_rd = cpu_reg(s, rd);
+gen_helper_autdb(tcg_rd, cpu_env, tcg_rd, cpu_reg_sp(s, rn));
+} else if (!dc_isar_feature(aa64_pauth, s)) {
+goto do_unallocated;
+}
+break;
+case MAP(1, 0x01, 0x08): /* PACIZA */
+if (!dc_isar_feature(aa64_pauth, s) || rn != 31) {
+goto do_unallocated;
+} else if (s->pauth_active) {
+tcg_rd = cpu_reg(s, rd);
+gen_helper_pacia(tcg_rd, cpu_env, tcg_rd, new_tmp_a64_zero(s));
+}
+break;
+case MAP(1, 0x01, 0x09): /* PACIZB */
+if (!dc_isar_feature(aa64_pauth, s) || rn != 31) {
+goto do_unallocated;
+} else if (s->pauth_active) {
+tcg_rd = cpu_reg(s, rd);
+gen_helper_pacib(tcg_rd, cpu_env, tcg_rd, new_tmp_a64_zero(s));
+}
+break;
+case MAP(1, 0x01, 0x0a): /* PACDZA */
+if (!dc_isar_feature(aa64_pauth, s) || rn != 31) {
+goto do_unallocated;
+} else if (s->pauth_active) {
+tcg_rd = cpu_reg(s, rd);
+gen_helper_pacda(tcg_rd, cpu_env, tcg_rd, new_tmp_a64_zero(s));
+}
+break;
+case MAP(1, 0x01, 0x0b): /* PACDZB */
+if (!dc_isar_feature(aa64_pauth, s) || rn != 31) {
+goto do_unallocated;
+} else if (s->pauth_active) {
+tcg_rd = cpu_reg(s, rd);
+gen_helper_pacdb(tcg_rd, cpu_env, tcg_rd, new_tmp_a64_zero(s));
+}
+break;
+case MAP(1, 0x01, 0x0c): /* AUTIZA */
+if (!dc_isar_feature(aa64_pauth, s) || rn != 31) {
+goto do_unallocated;
+} else if (s->pauth_active) {
+tcg_rd = cpu_reg(s, rd);
+gen_helper_autia(tcg_rd, cpu_env, tcg_rd, new_tmp_a64_zero(s));
+}
+break;
+case MAP(1, 0x01, 0x0d): /* AUTIZB */
+if (!dc_isar_feature(aa64_pauth, s) || rn != 31) {
+goto do_unallocated;
+} else if (s->pauth_active) {
+tcg_rd = 

[Qemu-devel] [PATCH v2 15/27] target/arm: Introduce arm_mmu_idx

2018-12-13 Thread Richard Henderson
The pattern

  ARMMMUIdx mmu_idx = core_to_arm_mmu_idx(env, cpu_mmu_index(env, false));

is computing the full ARMMMUIdx, stripping off the ARM bits,
and then putting them back.

Avoid the extra two steps with the appropriate helper function.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 

v2: Move arm_mmu_idx declaration to internals.h.
---
 target/arm/cpu.h   |  9 -
 target/arm/internals.h |  8 
 target/arm/helper.c| 27 ---
 3 files changed, 32 insertions(+), 12 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 6435997111..3cc7a069ce 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2747,7 +2747,14 @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
 /* Return the MMU index for a v7M CPU in the specified security state */
 ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate);
 
-/* Determine the current mmu_idx to use for normal loads/stores */
+/**
+ * cpu_mmu_index:
+ * @env: The cpu environment
+ * @ifetch: True for code access, false for data access.
+ *
+ * Return the core mmu index for the current translation regime.
+ * This function is used by generic TCG code paths.
+ */
 int cpu_mmu_index(CPUARMState *env, bool ifetch);
 
 /* Indexes used when registering address spaces with cpu_address_space_init */
diff --git a/target/arm/internals.h b/target/arm/internals.h
index 6bc0daf560..4a52fe58b6 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -912,4 +912,12 @@ void arm_cpu_update_virq(ARMCPU *cpu);
  */
 void arm_cpu_update_vfiq(ARMCPU *cpu);
 
+/**
+ * arm_mmu_idx:
+ * @env: The cpu environment
+ *
+ * Return the full ARMMMUIdx for the current translation regime.
+ */
+ARMMMUIdx arm_mmu_idx(CPUARMState *env);
+
 #endif
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 56960411e3..50c1db16dd 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -7117,7 +7117,7 @@ static bool v7m_push_callee_stack(ARMCPU *cpu, uint32_t lr, bool dotailchain,
 limit = env->v7m.msplim[M_REG_S];
 }
 } else {
-mmu_idx = core_to_arm_mmu_idx(env, cpu_mmu_index(env, false));
+mmu_idx = arm_mmu_idx(env);
frame_sp_p = &env->regs[13];
 limit = v7m_sp_limit(env);
 }
@@ -7298,7 +7298,7 @@ static bool v7m_push_stack(ARMCPU *cpu)
CPUARMState *env = &cpu->env;
 uint32_t xpsr = xpsr_read(env);
 uint32_t frameptr = env->regs[13];
-ARMMMUIdx mmu_idx = core_to_arm_mmu_idx(env, cpu_mmu_index(env, false));
+ARMMMUIdx mmu_idx = arm_mmu_idx(env);
 
 /* Align stack pointer if the guest wants that */
 if ((frameptr & 4) &&
@@ -11073,7 +11073,7 @@ hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
 int prot;
 bool ret;
 ARMMMUFaultInfo fi = {};
-ARMMMUIdx mmu_idx = core_to_arm_mmu_idx(env, cpu_mmu_index(env, false));
+ARMMMUIdx mmu_idx = arm_mmu_idx(env);
 
 *attrs = (MemTxAttrs) {};
 
@@ -12977,26 +12977,31 @@ ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
 return arm_v7m_mmu_idx_for_secstate_and_priv(env, secstate, priv);
 }
 
-int cpu_mmu_index(CPUARMState *env, bool ifetch)
+ARMMMUIdx arm_mmu_idx(CPUARMState *env)
 {
-int el = arm_current_el(env);
+int el;
 
 if (arm_feature(env, ARM_FEATURE_M)) {
-ARMMMUIdx mmu_idx = arm_v7m_mmu_idx_for_secstate(env, env->v7m.secure);
-
-return arm_to_core_mmu_idx(mmu_idx);
+return arm_v7m_mmu_idx_for_secstate(env, env->v7m.secure);
 }
 
+el = arm_current_el(env);
 if (el < 2 && arm_is_secure_below_el3(env)) {
-return arm_to_core_mmu_idx(ARMMMUIdx_S1SE0 + el);
+return ARMMMUIdx_S1SE0 + el;
+} else {
+return ARMMMUIdx_S12NSE0 + el;
 }
-return el;
+}
+
+int cpu_mmu_index(CPUARMState *env, bool ifetch)
+{
+return arm_to_core_mmu_idx(arm_mmu_idx(env));
 }
 
 void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
   target_ulong *cs_base, uint32_t *pflags)
 {
-ARMMMUIdx mmu_idx = core_to_arm_mmu_idx(env, cpu_mmu_index(env, false));
+ARMMMUIdx mmu_idx = arm_mmu_idx(env);
 int current_el = arm_current_el(env);
 int fp_el = fp_exception_el(env, current_el);
 uint32_t flags;
-- 
2.17.2
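The refactor above works because the ARMMMUIdx values for a regime are consecutive, so arm_mmu_idx() can compute base + el directly instead of round-tripping through the core index. The sketch below uses illustrative enum values, not QEMU's actual ARMMMUIdx encoding (which carries extra flag bits):

```c
/* Illustrative only: consecutive per-EL indices, mirroring the
 * ARMMMUIdx_S12NSE0 + el / ARMMMUIdx_S1SE0 + el arithmetic. */
enum { S12NSE0 = 0, S12NSE1, S1E2, S1E3, S1SE0, S1SE1 };

static int mmu_idx_sketch(int el, int secure_below_el3)
{
    if (el < 2 && secure_below_el3) {
        return S1SE0 + el;
    }
    return S12NSE0 + el;
}
```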




Re: [Qemu-devel] [PULL 00/32] QAPI patches for 2018-12-13

2018-12-13 Thread Markus Armbruster
NAK, expect v2 to correct the commit message accident pointed out by
Eric.



[Qemu-devel] [PATCH v2 26/27] target/arm: Enable PAuth for user-only, part 2

2018-12-13 Thread Richard Henderson
FIXME: We should have an attribute that controls the EL1 enable bits.
We may not always want to turn on pointer authentication with -cpu max.

Signed-off-by: Richard Henderson 
---
 target/arm/cpu.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 0b185f8d30..bc2c9eb551 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -162,6 +162,12 @@ static void arm_cpu_reset(CPUState *s)
 env->pstate = PSTATE_MODE_EL0t;
 /* Userspace expects access to DC ZVA, CTL_EL0 and the cache ops */
 env->cp15.sctlr_el[1] |= SCTLR_UCT | SCTLR_UCI | SCTLR_DZE;
+/* Enable all PAC keys. */
+env->cp15.sctlr_el[1] |= SCTLR_EnIA | SCTLR_EnIB;
+env->cp15.sctlr_el[1] |= SCTLR_EnDA | SCTLR_EnDB;
+/* Enable all PAC instructions */
+env->cp15.hcr_el2 |= HCR_API;
+env->cp15.scr_el3 |= SCR_API;
 /* and to the FP/Neon instructions */
 env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 20, 2, 3);
 /* and to the SVE instructions */
-- 
2.17.2




[Qemu-devel] [PATCH v2 16/27] target/arm: Introduce arm_stage1_mmu_idx

2018-12-13 Thread Richard Henderson
While we could expose stage_1_mmu_idx, the combination is
probably going to be more useful.

Signed-off-by: Richard Henderson 
---
 target/arm/internals.h | 15 +++
 target/arm/helper.c|  7 +++
 2 files changed, 22 insertions(+)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index 4a52fe58b6..1d0d0392c9 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -920,4 +920,19 @@ void arm_cpu_update_vfiq(ARMCPU *cpu);
  */
 ARMMMUIdx arm_mmu_idx(CPUARMState *env);
 
+/**
+ * arm_stage1_mmu_idx:
+ * @env: The cpu environment
+ *
+ * Return the ARMMMUIdx for the stage1 traversal for the current regime.
+ */
+#ifdef CONFIG_USER_ONLY
+static inline ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env)
+{
+return ARMMMUIdx_S1NSE0;
+}
+#else
+ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env);
+#endif
+
 #endif
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 50c1db16dd..b1c0ff923f 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -12998,6 +12998,13 @@ int cpu_mmu_index(CPUARMState *env, bool ifetch)
 return arm_to_core_mmu_idx(arm_mmu_idx(env));
 }
 
+#ifndef CONFIG_USER_ONLY
+ARMMMUIdx arm_stage1_mmu_idx(CPUARMState *env)
+{
+return stage_1_mmu_idx(arm_mmu_idx(env));
+}
+#endif
+
 void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
   target_ulong *cs_base, uint32_t *pflags)
 {
-- 
2.17.2




[Qemu-devel] [PATCH v2 08/27] target/arm: Decode PAuth within disas_data_proc_2src

2018-12-13 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 7ba4c996cf..d034a5edf3 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4884,6 +4884,13 @@ static void disas_data_proc_2src(DisasContext *s, uint32_t insn)
 case 11: /* RORV */
 handle_shift_reg(s, A64_SHIFT_TYPE_ROR, sf, rm, rn, rd);
 break;
+case 12: /* PACGA */
+if (sf == 0 || !dc_isar_feature(aa64_pauth, s)) {
+goto do_unallocated;
+}
+gen_helper_pacga(cpu_reg(s, rd), cpu_env,
+ cpu_reg(s, rn), cpu_reg_sp(s, rm));
+break;
 case 16:
 case 17:
 case 18:
@@ -4899,6 +4906,7 @@ static void disas_data_proc_2src(DisasContext *s, uint32_t insn)
 break;
 }
 default:
+do_unallocated:
 unallocated_encoding(s);
 break;
 }
-- 
2.17.2




[Qemu-devel] [PATCH v2 10/27] target/arm: Add new_pc argument to helper_exception_return

2018-12-13 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper-a64.h|  2 +-
 target/arm/helper-a64.c| 10 +-
 target/arm/translate-a64.c |  7 ++-
 3 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index 55299896c4..aff8d6c9f3 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -86,7 +86,7 @@ DEF_HELPER_2(advsimd_f16tosinth, i32, f16, ptr)
 DEF_HELPER_2(advsimd_f16touinth, i32, f16, ptr)
 DEF_HELPER_2(sqrt_f16, f16, f16, ptr)
 
-DEF_HELPER_1(exception_return, void, env)
+DEF_HELPER_2(exception_return, void, env, i64)
 
 DEF_HELPER_FLAGS_3(pacia, TCG_CALL_NO_WG, i64, env, i64, i64)
 DEF_HELPER_FLAGS_3(pacib, TCG_CALL_NO_WG, i64, env, i64, i64)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index f70c8d9818..79cc9cf47b 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -925,7 +925,7 @@ static int el_from_spsr(uint32_t spsr)
 }
 }
 
-void HELPER(exception_return)(CPUARMState *env)
+void HELPER(exception_return)(CPUARMState *env, uint64_t new_pc)
 {
 int cur_el = arm_current_el(env);
 unsigned int spsr_idx = aarch64_banked_spsr_index(cur_el);
@@ -991,9 +991,9 @@ void HELPER(exception_return)(CPUARMState *env)
 aarch64_sync_64_to_32(env);
 
 if (spsr & CPSR_T) {
-env->regs[15] = env->elr_el[cur_el] & ~0x1;
+env->regs[15] = new_pc & ~0x1;
 } else {
-env->regs[15] = env->elr_el[cur_el] & ~0x3;
+env->regs[15] = new_pc & ~0x3;
 }
 qemu_log_mask(CPU_LOG_INT, "Exception return from AArch64 EL%d to "
   "AArch32 EL%d PC 0x%" PRIx32 "\n",
@@ -1005,7 +1005,7 @@ void HELPER(exception_return)(CPUARMState *env)
 env->pstate &= ~PSTATE_SS;
 }
 aarch64_restore_sp(env, new_el);
-env->pc = env->elr_el[cur_el];
+env->pc = new_pc;
 qemu_log_mask(CPU_LOG_INT, "Exception return from AArch64 EL%d to "
   "AArch64 EL%d PC 0x%" PRIx64 "\n",
   cur_el, new_el, env->pc);
@@ -1031,7 +1031,7 @@ illegal_return:
  * no change to exception level, execution state or stack pointer
  */
 env->pstate |= PSTATE_IL;
-env->pc = env->elr_el[cur_el];
+env->pc = new_pc;
 spsr &= PSTATE_NZCV | PSTATE_DAIF;
 spsr |= pstate_read(env) & ~(PSTATE_NZCV | PSTATE_DAIF);
 pstate_write(env, spsr);
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index d034a5edf3..c84c2dbb66 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1981,6 +1981,7 @@ static void disas_exc(DisasContext *s, uint32_t insn)
 static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
 {
 unsigned int opc, op2, op3, rn, op4;
+TCGv_i64 dst;
 
 opc = extract32(insn, 21, 4);
 op2 = extract32(insn, 16, 5);
@@ -2011,7 +2012,11 @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
 if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 gen_io_start();
 }
-gen_helper_exception_return(cpu_env);
+dst = tcg_temp_new_i64();
+tcg_gen_ld_i64(dst, cpu_env,
+   offsetof(CPUARMState, elr_el[s->current_el]));
+gen_helper_exception_return(cpu_env, dst);
+tcg_temp_free_i64(dst);
 if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 gen_io_end();
 }
-- 
2.17.2




[Qemu-devel] [PATCH v2 25/27] target/arm: Enable PAuth for -cpu max

2018-12-13 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/cpu64.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 1d57be0c91..84f70b2a24 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -316,6 +316,10 @@ static void aarch64_max_initfn(Object *obj)
 
 t = cpu->isar.id_aa64isar1;
 t = FIELD_DP64(t, ID_AA64ISAR1, FCMA, 1);
+t = FIELD_DP64(t, ID_AA64ISAR1, APA, 1); /* PAuth, architected only */
+t = FIELD_DP64(t, ID_AA64ISAR1, API, 0);
+t = FIELD_DP64(t, ID_AA64ISAR1, GPA, 1);
+t = FIELD_DP64(t, ID_AA64ISAR1, GPI, 0);
 cpu->isar.id_aa64isar1 = t;
 
 t = cpu->isar.id_aa64pfr0;
-- 
2.17.2




[Qemu-devel] [PATCH v2 14/27] target/arm: Move cpu_mmu_index out of line

2018-12-13 Thread Richard Henderson
This function is, or will shortly become, too big to inline.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h| 48 +
 target/arm/helper.c | 44 +
 2 files changed, 49 insertions(+), 43 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 898243c93e..6435997111 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2739,54 +2739,16 @@ static inline int arm_mmu_idx_to_el(ARMMMUIdx mmu_idx)
 }
 
 /* Return the MMU index for a v7M CPU in the specified security and
- * privilege state
+ * privilege state.
  */
-static inline ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
-  bool secstate,
-  bool priv)
-{
-ARMMMUIdx mmu_idx = ARM_MMU_IDX_M;
-
-if (priv) {
-mmu_idx |= ARM_MMU_IDX_M_PRIV;
-}
-
-if (armv7m_nvic_neg_prio_requested(env->nvic, secstate)) {
-mmu_idx |= ARM_MMU_IDX_M_NEGPRI;
-}
-
-if (secstate) {
-mmu_idx |= ARM_MMU_IDX_M_S;
-}
-
-return mmu_idx;
-}
+ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
+bool secstate, bool priv);
 
 /* Return the MMU index for a v7M CPU in the specified security state */
-static inline ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env,
- bool secstate)
-{
-bool priv = arm_current_el(env) != 0;
-
-return arm_v7m_mmu_idx_for_secstate_and_priv(env, secstate, priv);
-}
+ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate);
 
 /* Determine the current mmu_idx to use for normal loads/stores */
-static inline int cpu_mmu_index(CPUARMState *env, bool ifetch)
-{
-int el = arm_current_el(env);
-
-if (arm_feature(env, ARM_FEATURE_M)) {
-ARMMMUIdx mmu_idx = arm_v7m_mmu_idx_for_secstate(env, env->v7m.secure);
-
-return arm_to_core_mmu_idx(mmu_idx);
-}
-
-if (el < 2 && arm_is_secure_below_el3(env)) {
-return arm_to_core_mmu_idx(ARMMMUIdx_S1SE0 + el);
-}
-return el;
-}
+int cpu_mmu_index(CPUARMState *env, bool ifetch);
 
 /* Indexes used when registering address spaces with cpu_address_space_init */
 typedef enum ARMASIdx {
diff --git a/target/arm/helper.c b/target/arm/helper.c
index bd0cff5c27..56960411e3 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -12949,6 +12949,50 @@ int fp_exception_el(CPUARMState *env, int cur_el)
 return 0;
 }
 
+ARMMMUIdx arm_v7m_mmu_idx_for_secstate_and_priv(CPUARMState *env,
+bool secstate, bool priv)
+{
+ARMMMUIdx mmu_idx = ARM_MMU_IDX_M;
+
+if (priv) {
+mmu_idx |= ARM_MMU_IDX_M_PRIV;
+}
+
+if (armv7m_nvic_neg_prio_requested(env->nvic, secstate)) {
+mmu_idx |= ARM_MMU_IDX_M_NEGPRI;
+}
+
+if (secstate) {
+mmu_idx |= ARM_MMU_IDX_M_S;
+}
+
+return mmu_idx;
+}
+
+/* Return the MMU index for a v7M CPU in the specified security state */
+ARMMMUIdx arm_v7m_mmu_idx_for_secstate(CPUARMState *env, bool secstate)
+{
+bool priv = arm_current_el(env) != 0;
+
+return arm_v7m_mmu_idx_for_secstate_and_priv(env, secstate, priv);
+}
+
+int cpu_mmu_index(CPUARMState *env, bool ifetch)
+{
+int el = arm_current_el(env);
+
+if (arm_feature(env, ARM_FEATURE_M)) {
+ARMMMUIdx mmu_idx = arm_v7m_mmu_idx_for_secstate(env, env->v7m.secure);
+
+return arm_to_core_mmu_idx(mmu_idx);
+}
+
+if (el < 2 && arm_is_secure_below_el3(env)) {
+return arm_to_core_mmu_idx(ARMMMUIdx_S1SE0 + el);
+}
+return el;
+}
+
 void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
   target_ulong *cs_base, uint32_t *pflags)
 {
-- 
2.17.2




[Qemu-devel] [PATCH v2 03/27] target/arm: Add PAuth active bit to tbflags

2018-12-13 Thread Richard Henderson
There are 5 bits of state that could be added, but to save
space within tbflags, add only a single enable bit.
Helpers will determine the rest of the state at runtime.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 

v2: Fix whitespace, comment grammar.
---
 target/arm/cpu.h   |  4 
 target/arm/translate.h |  2 ++
 target/arm/helper.c| 19 +++
 target/arm/translate-a64.c |  1 +
 4 files changed, 26 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index cd2519d43e..898243c93e 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3032,6 +3032,8 @@ static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
 #define ARM_TBFLAG_SVEEXC_EL_MASK   (0x3 << ARM_TBFLAG_SVEEXC_EL_SHIFT)
 #define ARM_TBFLAG_ZCR_LEN_SHIFT4
 #define ARM_TBFLAG_ZCR_LEN_MASK (0xf << ARM_TBFLAG_ZCR_LEN_SHIFT)
+#define ARM_TBFLAG_PAUTH_ACTIVE_SHIFT  8
+#define ARM_TBFLAG_PAUTH_ACTIVE_MASK   (1ull << ARM_TBFLAG_PAUTH_ACTIVE_SHIFT)
 
 /* some convenience accessor macros */
 #define ARM_TBFLAG_AARCH64_STATE(F) \
@@ -3074,6 +3076,8 @@ static inline bool arm_cpu_data_is_big_endian(CPUARMState *env)
 (((F) & ARM_TBFLAG_SVEEXC_EL_MASK) >> ARM_TBFLAG_SVEEXC_EL_SHIFT)
 #define ARM_TBFLAG_ZCR_LEN(F) \
 (((F) & ARM_TBFLAG_ZCR_LEN_MASK) >> ARM_TBFLAG_ZCR_LEN_SHIFT)
+#define ARM_TBFLAG_PAUTH_ACTIVE(F) \
+(((F) & ARM_TBFLAG_PAUTH_ACTIVE_MASK) >> ARM_TBFLAG_PAUTH_ACTIVE_SHIFT)
 
 static inline bool bswap_code(bool sctlr_b)
 {
diff --git a/target/arm/translate.h b/target/arm/translate.h
index 1550aa8bc7..d8a8bb4e9c 100644
--- a/target/arm/translate.h
+++ b/target/arm/translate.h
@@ -68,6 +68,8 @@ typedef struct DisasContext {
 bool is_ldex;
 /* True if a single-step exception will be taken to the current EL */
 bool ss_same_el;
+/* True if v8.3-PAuth is active.  */
+bool pauth_active;
 /* Bottom two bits of XScale c15_cpar coprocessor access control reg */
 int c15_cpar;
 /* TCG op of the current insn_start.  */
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 644599b29d..bd0cff5c27 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -12981,6 +12981,25 @@ void cpu_get_tb_cpu_state(CPUARMState *env, target_ulong *pc,
 flags |= sve_el << ARM_TBFLAG_SVEEXC_EL_SHIFT;
 flags |= zcr_len << ARM_TBFLAG_ZCR_LEN_SHIFT;
 }
+
+if (cpu_isar_feature(aa64_pauth, cpu)) {
+/*
+ * In order to save space in flags, we record only whether
+ * pauth is "inactive", meaning all insns are implemented as
+ * a nop, or "active" when some action must be performed.
+ * The decision of which action to take is left to a helper.
+ */
+uint64_t sctlr;
+if (current_el == 0) {
+/* FIXME: ARMv8.1-VHE S2 translation regime.  */
+sctlr = env->cp15.sctlr_el[1];
+} else {
+sctlr = env->cp15.sctlr_el[current_el];
+}
+if (sctlr & (SCTLR_EnIA | SCTLR_EnIB | SCTLR_EnDA | SCTLR_EnDB)) {
+flags |= ARM_TBFLAG_PAUTH_ACTIVE_MASK;
+}
+}
 } else {
 *pc = env->regs[15];
 flags = (env->thumb << ARM_TBFLAG_THUMB_SHIFT)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index e1da1e4d6f..7c1cc1ce8e 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -13407,6 +13407,7 @@ static void aarch64_tr_init_disas_context(DisasContextBase *dcbase,
 dc->fp_excp_el = ARM_TBFLAG_FPEXC_EL(dc->base.tb->flags);
 dc->sve_excp_el = ARM_TBFLAG_SVEEXC_EL(dc->base.tb->flags);
 dc->sve_len = (ARM_TBFLAG_ZCR_LEN(dc->base.tb->flags) + 1) * 16;
+dc->pauth_active = ARM_TBFLAG_PAUTH_ACTIVE(dc->base.tb->flags);
 dc->vec_len = 0;
 dc->vec_stride = 0;
 dc->cp_regs = arm_cpu->cp_regs;
-- 
2.17.2




[Qemu-devel] [PATCH v2 04/27] target/arm: Add PAuth helpers

2018-12-13 Thread Richard Henderson
The cryptographic internals are stubbed out for now,
but the enable and trap bits are checked.

Signed-off-by: Richard Henderson 

v2: Remove trap from xpac* helpers; these are now side-effect free.
Use struct ARMPACKey.
---
 target/arm/helper-a64.h |  12 +++
 target/arm/internals.h  |   6 ++
 target/arm/helper-a64.c | 166 
 3 files changed, 184 insertions(+)

diff --git a/target/arm/helper-a64.h b/target/arm/helper-a64.h
index 9d3a907049..28aa0af69d 100644
--- a/target/arm/helper-a64.h
+++ b/target/arm/helper-a64.h
@@ -85,3 +85,15 @@ DEF_HELPER_2(advsimd_rinth, f16, f16, ptr)
 DEF_HELPER_2(advsimd_f16tosinth, i32, f16, ptr)
 DEF_HELPER_2(advsimd_f16touinth, i32, f16, ptr)
 DEF_HELPER_2(sqrt_f16, f16, f16, ptr)
+
+DEF_HELPER_FLAGS_3(pacia, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(pacib, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(pacda, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(pacdb, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(pacga, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autia, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autib, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autda, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_3(autdb, TCG_CALL_NO_WG, i64, env, i64, i64)
+DEF_HELPER_FLAGS_2(xpaci, TCG_CALL_NO_RWG_SE, i64, env, i64)
+DEF_HELPER_FLAGS_2(xpacd, TCG_CALL_NO_RWG_SE, i64, env, i64)
diff --git a/target/arm/internals.h b/target/arm/internals.h
index 78e026d6e9..6bc0daf560 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -259,6 +259,7 @@ enum arm_exception_class {
 EC_CP14DTTRAP = 0x06,
 EC_ADVSIMDFPACCESSTRAP= 0x07,
 EC_FPIDTRAP   = 0x08,
+EC_PACTRAP= 0x09,
 EC_CP14RRTTRAP= 0x0c,
 EC_ILLEGALSTATE   = 0x0e,
 EC_AA32_SVC   = 0x11,
@@ -426,6 +427,11 @@ static inline uint32_t syn_sve_access_trap(void)
 return EC_SVEACCESSTRAP << ARM_EL_EC_SHIFT;
 }
 
+static inline uint32_t syn_pactrap(void)
+{
+return EC_PACTRAP << ARM_EL_EC_SHIFT;
+}
+
 static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
 {
 return (EC_INSNABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
diff --git a/target/arm/helper-a64.c b/target/arm/helper-a64.c
index 61799d20e1..bb64700e10 100644
--- a/target/arm/helper-a64.c
+++ b/target/arm/helper-a64.c
@@ -898,4 +898,170 @@ uint32_t HELPER(sqrt_f16)(uint32_t a, void *fpstp)
 return float16_sqrt(a, s);
 }
 
+/*
+ * Helpers for ARMv8.3-PAuth.
+ */
 
+static uint64_t pauth_computepac(uint64_t data, uint64_t modifier,
+ ARMPACKey key)
+{
+g_assert_not_reached(); /* FIXME */
+}
+
+static uint64_t pauth_addpac(CPUARMState *env, uint64_t ptr, uint64_t modifier,
+ ARMPACKey *key, bool data)
+{
+g_assert_not_reached(); /* FIXME */
+}
+
+static uint64_t pauth_auth(CPUARMState *env, uint64_t ptr, uint64_t modifier,
+   ARMPACKey *key, bool data, int keynumber)
+{
+g_assert_not_reached(); /* FIXME */
+}
+
+static uint64_t pauth_strip(CPUARMState *env, uint64_t ptr, bool data)
+{
+g_assert_not_reached(); /* FIXME */
+}
+
+static void QEMU_NORETURN pauth_trap(CPUARMState *env, int target_el,
+ uintptr_t ra)
+{
+CPUState *cs = ENV_GET_CPU(env);
+
+cs->exception_index = EXCP_UDEF;
+env->exception.syndrome = syn_pactrap();
+env->exception.target_el = target_el;
+cpu_loop_exit_restore(cs, ra);
+}
+
+static void pauth_check_trap(CPUARMState *env, int el, uintptr_t ra)
+{
+if (el < 2 && arm_feature(env, ARM_FEATURE_EL2)) {
+uint64_t hcr = arm_hcr_el2_eff(env);
+bool trap = !(hcr & HCR_API);
+/* FIXME: ARMv8.1-VHE: trap only applies to EL1&0 regime.  */
+/* FIXME: ARMv8.3-NV: HCR_NV trap takes precedence for ERETA[AB].  */
+if (trap) {
+pauth_trap(env, 2, ra);
+}
+}
+if (el < 3 && arm_feature(env, ARM_FEATURE_EL3)) {
+if (!(env->cp15.scr_el3 & SCR_API)) {
+pauth_trap(env, 3, ra);
+}
+}
+}
+
+static bool pauth_key_enabled(CPUARMState *env, int el, uint32_t bit)
+{
+uint32_t sctlr;
+if (el == 0) {
+/* FIXME: ARMv8.1-VHE S2 translation regime.  */
+sctlr = env->cp15.sctlr_el[1];
+} else {
+sctlr = env->cp15.sctlr_el[el];
+}
+return (sctlr & bit) != 0;
+}
+
+uint64_t HELPER(pacia)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+int el = arm_current_el(env);
+if (!pauth_key_enabled(env, el, SCTLR_EnIA)) {
+return x;
+}
+pauth_check_trap(env, el, GETPC());
+return pauth_addpac(env, x, y, &env->apia_key, false);
+}
+
+uint64_t HELPER(pacib)(CPUARMState *env, uint64_t x, uint64_t y)
+{
+int el = arm_current_el(env);
+if (!pauth_key_enabled(env, el, SCTLR_EnIB)) {
+

[Qemu-devel] [PATCH v2 12/27] target/arm: Decode PAuth within disas_uncond_b_reg

2018-12-13 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 82 +-
 1 file changed, 81 insertions(+), 1 deletion(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 30086a5d7f..e62d248894 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1982,6 +1982,7 @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
 {
 unsigned int opc, op2, op3, rn, op4;
 TCGv_i64 dst;
+TCGv_i64 modifier;
 
 opc = extract32(insn, 21, 4);
 op2 = extract32(insn, 16, 5);
@@ -1999,12 +2000,44 @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
 case 2: /* RET */
 switch (op3) {
 case 0:
+/* BR, BLR, RET */
 if (op4 != 0) {
 goto do_unallocated;
 }
 dst = cpu_reg(s, rn);
 break;
 
+case 2:
+case 3:
+if (!dc_isar_feature(aa64_pauth, s)) {
+goto do_unallocated;
+}
+if (opc == 2) {
+/* RETAA, RETAB */
+if (rn != 0x1f || op4 != 0x1f) {
+goto do_unallocated;
+}
+rn = 30;
+modifier = cpu_X[31];
+} else {
+/* BRAAZ, BRABZ, BLRAAZ, BLRABZ */
+if (op4 != 0x1f) {
+goto do_unallocated;
+}
+modifier = new_tmp_a64_zero(s);
+}
+if (s->pauth_active) {
+dst = new_tmp_a64(s);
+if (op3 == 2) {
+gen_helper_autia(dst, cpu_env, cpu_reg(s, rn), modifier);
+} else {
+gen_helper_autib(dst, cpu_env, cpu_reg(s, rn), modifier);
+}
+} else {
+dst = cpu_reg(s, rn);
+}
+break;
+
 default:
 goto do_unallocated;
 }
@@ -2016,12 +2049,38 @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
 }
 break;
 
+case 8: /* BRAA */
+case 9: /* BLRAA */
+if (!dc_isar_feature(aa64_pauth, s)) {
+goto do_unallocated;
+}
+if (op3 != 2 && op3 != 3) {
+goto do_unallocated;
+}
+if (s->pauth_active) {
+dst = new_tmp_a64(s);
+modifier = cpu_reg_sp(s, op4);
+if (op3 == 2) {
+gen_helper_autia(dst, cpu_env, cpu_reg(s, rn), modifier);
+} else {
+gen_helper_autib(dst, cpu_env, cpu_reg(s, rn), modifier);
+}
+} else {
+dst = cpu_reg(s, rn);
+}
+gen_a64_set_pc(s, dst);
+/* BLRAA also needs to load return address */
+if (opc == 9) {
+tcg_gen_movi_i64(cpu_reg(s, 30), s->pc);
+}
+break;
+
 case 4: /* ERET */
 if (s->current_el == 0) {
 goto do_unallocated;
 }
 switch (op3) {
-case 0:
+case 0: /* ERET */
 if (op4 != 0) {
 goto do_unallocated;
 }
@@ -2030,6 +2089,27 @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
offsetof(CPUARMState, elr_el[s->current_el]));
 break;
 
+case 2: /* ERETAA */
+case 3: /* ERETAB */
+if (!dc_isar_feature(aa64_pauth, s)) {
+goto do_unallocated;
+}
+if (rn != 0x1f || op4 != 0x1f) {
+goto do_unallocated;
+}
+dst = tcg_temp_new_i64();
+tcg_gen_ld_i64(dst, cpu_env,
+   offsetof(CPUARMState, elr_el[s->current_el]));
+if (s->pauth_active) {
+modifier = cpu_X[31];
+if (op3 == 2) {
+gen_helper_autia(dst, cpu_env, dst, modifier);
+} else {
+gen_helper_autib(dst, cpu_env, dst, modifier);
+}
+}
+break;
+
 default:
 goto do_unallocated;
 }
-- 
2.17.2




[Qemu-devel] [PATCH v2 02/27] target/arm: Add SCTLR bits through ARMv8.5

2018-12-13 Thread Richard Henderson
Post v8.4 bits taken from SysReg_v85_xml-00bet8.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 

v2: Review fixups from Peter.
---
 target/arm/cpu.h | 45 +
 1 file changed, 33 insertions(+), 12 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 39d4afdfe6..cd2519d43e 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -989,12 +989,15 @@ void pmccntr_sync(CPUARMState *env);
 #define SCTLR_A   (1U << 1)
 #define SCTLR_C   (1U << 2)
 #define SCTLR_W   (1U << 3) /* up to v6; RAO in v7 */
-#define SCTLR_SA  (1U << 3)
+#define SCTLR_nTLSMD_32 (1U << 3) /* v8.2-LSMAOC, AArch32 only */
+#define SCTLR_SA  (1U << 3) /* AArch64 only */
 #define SCTLR_P   (1U << 4) /* up to v5; RAO in v6 and v7 */
+#define SCTLR_LSMAOE_32 (1U << 4) /* v8.2-LSMAOC, AArch32 only */
 #define SCTLR_SA0 (1U << 4) /* v8 onward, AArch64 only */
 #define SCTLR_D   (1U << 5) /* up to v5; RAO in v6 */
 #define SCTLR_CP15BEN (1U << 5) /* v7 onward */
 #define SCTLR_L   (1U << 6) /* up to v5; RAO in v6 and v7; RAZ in v8 */
+#define SCTLR_nAA (1U << 6) /* when v8.4-LSE is implemented */
 #define SCTLR_B   (1U << 7) /* up to v6; RAZ in v7 */
 #define SCTLR_ITD (1U << 7) /* v8 onward */
 #define SCTLR_S   (1U << 8) /* up to v6; RAZ in v7 */
@@ -1002,35 +1005,53 @@ void pmccntr_sync(CPUARMState *env);
 #define SCTLR_R   (1U << 9) /* up to v6; RAZ in v7 */
 #define SCTLR_UMA (1U << 9) /* v8 onward, AArch64 only */
 #define SCTLR_F   (1U << 10) /* up to v6 */
-#define SCTLR_SW  (1U << 10) /* v7 onward */
-#define SCTLR_Z   (1U << 11)
+#define SCTLR_SW  (1U << 10) /* v7, RES0 in v8 */
+#define SCTLR_Z   (1U << 11) /* in v7, RES1 in v8 */
+#define SCTLR_EOS (1U << 11) /* v8.5-ExS */
 #define SCTLR_I   (1U << 12)
-#define SCTLR_V   (1U << 13)
+#define SCTLR_V   (1U << 13) /* AArch32 only */
+#define SCTLR_EnDB(1U << 13) /* v8.3, AArch64 only */
 #define SCTLR_RR  (1U << 14) /* up to v7 */
 #define SCTLR_DZE (1U << 14) /* v8 onward, AArch64 only */
 #define SCTLR_L4  (1U << 15) /* up to v6; RAZ in v7 */
 #define SCTLR_UCT (1U << 15) /* v8 onward, AArch64 only */
 #define SCTLR_DT  (1U << 16) /* up to ??, RAO in v6 and v7 */
 #define SCTLR_nTWI(1U << 16) /* v8 onward */
-#define SCTLR_HA  (1U << 17)
+#define SCTLR_HA  (1U << 17) /* up to v7, RES0 in v8 */
 #define SCTLR_BR  (1U << 17) /* PMSA only */
 #define SCTLR_IT  (1U << 18) /* up to ??, RAO in v6 and v7 */
 #define SCTLR_nTWE(1U << 18) /* v8 onward */
 #define SCTLR_WXN (1U << 19)
 #define SCTLR_ST  (1U << 20) /* up to ??, RAZ in v6 */
-#define SCTLR_UWXN(1U << 20) /* v7 onward */
-#define SCTLR_FI  (1U << 21)
-#define SCTLR_U   (1U << 22)
+#define SCTLR_UWXN(1U << 20) /* v7 onward, AArch32 only */
+#define SCTLR_FI  (1U << 21) /* up to v7, v8 RES0 */
+#define SCTLR_IESB(1U << 21) /* v8.2-IESB, AArch64 only */
+#define SCTLR_U   (1U << 22) /* up to v6, RAO in v7 */
+#define SCTLR_EIS (1U << 22) /* v8.5-ExS */
 #define SCTLR_XP  (1U << 23) /* up to v6; v7 onward RAO */
+#define SCTLR_SPAN(1U << 23) /* v8.1-PAN */
 #define SCTLR_VE  (1U << 24) /* up to v7 */
 #define SCTLR_E0E (1U << 24) /* v8 onward, AArch64 only */
 #define SCTLR_EE  (1U << 25)
 #define SCTLR_L2  (1U << 26) /* up to v6, RAZ in v7 */
 #define SCTLR_UCI (1U << 26) /* v8 onward, AArch64 only */
-#define SCTLR_NMFI(1U << 27)
-#define SCTLR_TRE (1U << 28)
-#define SCTLR_AFE (1U << 29)
-#define SCTLR_TE  (1U << 30)
+#define SCTLR_NMFI(1U << 27) /* up to v7, RAZ in v7VE and v8 */
+#define SCTLR_EnDA(1U << 27) /* v8.3, AArch64 only */
+#define SCTLR_TRE (1U << 28) /* AArch32 only */
+#define SCTLR_nTLSMD_64 (1U << 28) /* v8.2-LSMAOC, AArch64 only */
+#define SCTLR_AFE (1U << 29) /* AArch32 only */
+#define SCTLR_LSMAOE_64 (1U << 29) /* v8.2-LSMAOC, AArch64 only */
+#define SCTLR_TE  (1U << 30) /* AArch32 only */
+#define SCTLR_EnIB(1U << 30) /* v8.3, AArch64 only */
+#define SCTLR_EnIA(1U << 31) /* v8.3, AArch64 only */
+#define SCTLR_BT0 (1ULL << 35) /* v8.5-BTI */
+#define SCTLR_BT1 (1ULL << 36) /* v8.5-BTI */
+#define SCTLR_ITFSB   (1ULL << 37) /* v8.5-MemTag */
+#define SCTLR_TCF0(3ULL << 38) /* v8.5-MemTag */
+#define SCTLR_TCF (3ULL << 40) /* v8.5-MemTag */
+#define SCTLR_ATA0(1ULL << 42) /* v8.5-MemTag */
+#define SCTLR_ATA (1ULL << 43) /* v8.5-MemTag */
+#define SCTLR_DSSBS   (1ULL << 44) /* v8.5 */
 
 #define CPTR_TCPAC    (1U << 31)
 #define CPTR_TTA  (1U << 20)
-- 
2.17.2




[Qemu-devel] [PATCH v2 06/27] target/arm: Rearrange decode in disas_data_proc_1src

2018-12-13 Thread Richard Henderson
Now properly signals unallocated for REV64 with SF=0.
Allows for the opcode2 field to be decoded shortly.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 31 ++-
 1 file changed, 22 insertions(+), 9 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 0df344f9e8..c5ec430b42 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -4563,38 +4563,51 @@ static void handle_rev16(DisasContext *s, unsigned int sf,
  */
 static void disas_data_proc_1src(DisasContext *s, uint32_t insn)
 {
-unsigned int sf, opcode, rn, rd;
+unsigned int sf, opcode, opcode2, rn, rd;
 
-if (extract32(insn, 29, 1) || extract32(insn, 16, 5)) {
+if (extract32(insn, 29, 1)) {
 unallocated_encoding(s);
 return;
 }
 
 sf = extract32(insn, 31, 1);
 opcode = extract32(insn, 10, 6);
+opcode2 = extract32(insn, 16, 5);
 rn = extract32(insn, 5, 5);
 rd = extract32(insn, 0, 5);
 
-switch (opcode) {
-case 0: /* RBIT */
+#define MAP(SF, O2, O1) ((SF) | (O1 << 1) | (O2 << 7))
+
+switch (MAP(sf, opcode2, opcode)) {
+case MAP(0, 0x00, 0x00): /* RBIT */
+case MAP(1, 0x00, 0x00):
 handle_rbit(s, sf, rn, rd);
 break;
-case 1: /* REV16 */
+case MAP(0, 0x00, 0x01): /* REV16 */
+case MAP(1, 0x00, 0x01):
 handle_rev16(s, sf, rn, rd);
 break;
-case 2: /* REV32 */
+case MAP(0, 0x00, 0x02): /* REV/REV32 */
+case MAP(1, 0x00, 0x02):
 handle_rev32(s, sf, rn, rd);
 break;
-case 3: /* REV64 */
+case MAP(1, 0x00, 0x03): /* REV64 */
 handle_rev64(s, sf, rn, rd);
 break;
-case 4: /* CLZ */
+case MAP(0, 0x00, 0x04): /* CLZ */
+case MAP(1, 0x00, 0x04):
 handle_clz(s, sf, rn, rd);
 break;
-case 5: /* CLS */
+case MAP(0, 0x00, 0x05): /* CLS */
+case MAP(1, 0x00, 0x05):
 handle_cls(s, sf, rn, rd);
 break;
+default:
+unallocated_encoding(s);
+break;
 }
+
+#undef MAP
 }
 
 static void handle_div(DisasContext *s, bool is_signed, unsigned int sf,
-- 
2.17.2




[Qemu-devel] [PATCH v2 01/27] target/arm: Add state for the ARMv8.3-PAuth extension

2018-12-13 Thread Richard Henderson
Add storage space for the 5 encryption keys.

Signed-off-by: Richard Henderson 

v2: Remove pointless double migration.
Use a struct to make it clear which half is which.
---
 target/arm/cpu.h | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index c943f35dd9..39d4afdfe6 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -201,11 +201,16 @@ typedef struct ARMVectorReg {
 uint64_t d[2 * ARM_MAX_VQ] QEMU_ALIGNED(16);
 } ARMVectorReg;
 
-/* In AArch32 mode, predicate registers do not exist at all.  */
 #ifdef TARGET_AARCH64
+/* In AArch32 mode, predicate registers do not exist at all.  */
 typedef struct ARMPredicateReg {
 uint64_t p[2 * ARM_MAX_VQ / 8] QEMU_ALIGNED(16);
 } ARMPredicateReg;
+
+/* In AArch32 mode, PAC keys do not exist at all.  */
+typedef struct ARMPACKey {
+uint64_t lo, hi;
+} ARMPACKey;
 #endif
 
 
@@ -605,6 +610,14 @@ typedef struct CPUARMState {
 uint32_t cregs[16];
 } iwmmxt;
 
+#ifdef TARGET_AARCH64
+ARMPACKey apia_key;
+ARMPACKey apib_key;
+ARMPACKey apda_key;
+ARMPACKey apdb_key;
+ARMPACKey apga_key;
+#endif
+
 #if defined(CONFIG_USER_ONLY)
 /* For usermode syscall translation.  */
 int eabi;
@@ -3324,6 +3337,21 @@ static inline bool isar_feature_aa64_fcma(const ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64isar1, ID_AA64ISAR1, FCMA) != 0;
 }
 
+static inline bool isar_feature_aa64_pauth(const ARMISARegisters *id)
+{
+/*
+ * Note that while QEMU will only implement the architected algorithm
+ * QARMA, and thus APA+GPA, the host cpu for kvm may use implementation
+ * defined algorithms, and thus API+GPI, and this predicate controls
+ * migration of the 128-bit keys.
+ */
+return (id->id_aa64isar1 &
+(FIELD_DP64(0, ID_AA64ISAR1, APA, -1) |
+ FIELD_DP64(0, ID_AA64ISAR1, API, -1) |
+ FIELD_DP64(0, ID_AA64ISAR1, GPA, -1) |
+ FIELD_DP64(0, ID_AA64ISAR1, GPI, -1))) != 0;
+}
+
 static inline bool isar_feature_aa64_fp16(const ARMISARegisters *id)
 {
 /* We always set the AdvSIMD and FP fields identically wrt FP16.  */
-- 
2.17.2




[Qemu-devel] [PATCH v2 05/27] target/arm: Decode PAuth within system hint space

2018-12-13 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 93 +-
 1 file changed, 81 insertions(+), 12 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 7c1cc1ce8e..0df344f9e8 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1471,33 +1471,102 @@ static void handle_hint(DisasContext *s, uint32_t insn,
 }
 
 switch (selector) {
-case 0: /* NOP */
-return;
-case 3: /* WFI */
+case 000: /* NOP */
+break;
+case 003: /* WFI */
 s->base.is_jmp = DISAS_WFI;
-return;
+break;
+case 001: /* YIELD */
 /* When running in MTTCG we don't generate jumps to the yield and
  * WFE helpers as it won't affect the scheduling of other vCPUs.
  * If we wanted to more completely model WFE/SEV so we don't busy
  * spin unnecessarily we would need to do something more involved.
  */
-case 1: /* YIELD */
 if (!(tb_cflags(s->base.tb) & CF_PARALLEL)) {
 s->base.is_jmp = DISAS_YIELD;
 }
-return;
-case 2: /* WFE */
+break;
+case 002: /* WFE */
 if (!(tb_cflags(s->base.tb) & CF_PARALLEL)) {
 s->base.is_jmp = DISAS_WFE;
 }
-return;
-case 4: /* SEV */
-case 5: /* SEVL */
+break;
+case 004: /* SEV */
+case 005: /* SEVL */
 /* we treat all as NOP at least for now */
-return;
+break;
+case 007: /* XPACLRI */
+if (s->pauth_active) {
+gen_helper_xpaci(cpu_X[30], cpu_env, cpu_X[30]);
+}
+break;
+case 010: /* PACIA1716 */
+if (s->pauth_active) {
+gen_helper_pacia(cpu_X[17], cpu_env, cpu_X[17], cpu_X[16]);
+}
+break;
+case 012: /* PACIB1716 */
+if (s->pauth_active) {
+gen_helper_pacib(cpu_X[17], cpu_env, cpu_X[17], cpu_X[16]);
+}
+break;
+case 014: /* AUTIA1716 */
+if (s->pauth_active) {
+gen_helper_autia(cpu_X[17], cpu_env, cpu_X[17], cpu_X[16]);
+}
+break;
+case 016: /* AUTIB1716 */
+if (s->pauth_active) {
+gen_helper_autib(cpu_X[17], cpu_env, cpu_X[17], cpu_X[16]);
+}
+break;
+case 030: /* PACIAZ */
+if (s->pauth_active) {
+gen_helper_pacia(cpu_X[30], cpu_env, cpu_X[30],
+new_tmp_a64_zero(s));
+}
+break;
+case 031: /* PACIASP */
+if (s->pauth_active) {
+gen_helper_pacia(cpu_X[30], cpu_env, cpu_X[30], cpu_X[31]);
+}
+break;
+case 032: /* PACIBZ */
+if (s->pauth_active) {
+gen_helper_pacib(cpu_X[30], cpu_env, cpu_X[30],
+new_tmp_a64_zero(s));
+}
+break;
+case 033: /* PACIBSP */
+if (s->pauth_active) {
+gen_helper_pacib(cpu_X[30], cpu_env, cpu_X[30], cpu_X[31]);
+}
+break;
+case 034: /* AUTIAZ */
+if (s->pauth_active) {
+gen_helper_autia(cpu_X[30], cpu_env, cpu_X[30],
+  new_tmp_a64_zero(s));
+}
+break;
+case 035: /* AUTIASP */
+if (s->pauth_active) {
+gen_helper_autia(cpu_X[30], cpu_env, cpu_X[30], cpu_X[31]);
+}
+break;
+case 036: /* AUTIBZ */
+if (s->pauth_active) {
+gen_helper_autib(cpu_X[30], cpu_env, cpu_X[30],
+  new_tmp_a64_zero(s));
+}
+break;
+case 037: /* AUTIBSP */
+if (s->pauth_active) {
+gen_helper_autib(cpu_X[30], cpu_env, cpu_X[30], cpu_X[31]);
+}
+break;
 default:
 /* default specified as NOP equivalent */
-return;
+break;
 }
 }
 
-- 
2.17.2




[Qemu-devel] [PATCH v2 00/27] target/arm: Implement ARMv8.3-PAuth

2018-12-13 Thread Richard Henderson
Lots of little changes since v1, but many of which are noted
within each patch.  This version works in system mode, using

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=for-next/core


r~

Richard Henderson (27):
  target/arm: Add state for the ARMv8.3-PAuth extension
  target/arm: Add SCTLR bits through ARMv8.5
  target/arm: Add PAuth active bit to tbflags
  target/arm: Add PAuth helpers
  target/arm: Decode PAuth within system hint space
  target/arm: Rearrange decode in disas_data_proc_1src
  target/arm: Decode PAuth within disas_data_proc_1src
  target/arm: Decode PAuth within disas_data_proc_2src
  target/arm: Move helper_exception_return to helper-a64.c
  target/arm: Add new_pc argument to helper_exception_return
  target/arm: Rearrange decode in disas_uncond_b_reg
  target/arm: Decode PAuth within disas_uncond_b_reg
  target/arm: Decode Load/store register (pac)
  target/arm: Move cpu_mmu_index out of line
  target/arm: Introduce arm_mmu_idx
  target/arm: Introduce arm_stage1_mmu_idx
  target/arm: Create ARMVAParameters and helpers
  target/arm: Reuse aa64_va_parameters for setting tbflags
  target/arm: Export aa64_va_parameters to internals.h
  target/arm: Implement pauth_strip
  target/arm: Implement pauth_auth
  target/arm: Implement pauth_addpac
  target/arm: Implement pauth_computepac
  target/arm: Add PAuth system registers
  target/arm: Enable PAuth for -cpu max
  target/arm: Enable PAuth for user-only, part 2
  target/arm: Tidy TBI handling in gen_a64_set_pc

 target/arm/cpu.h   | 171 +-
 target/arm/helper-a64.h|  14 +
 target/arm/helper.h|   1 -
 target/arm/internals.h |  62 
 target/arm/translate.h |   2 +
 target/arm/cpu.c   |   6 +
 target/arm/cpu64.c |   4 +
 target/arm/helper-a64.c| 629 +
 target/arm/helper.c| 500 ++---
 target/arm/op_helper.c | 155 -
 target/arm/translate-a64.c | 534 ++-
 11 files changed, 1581 insertions(+), 497 deletions(-)

-- 
2.17.2




[Qemu-devel] [PATCH v2 13/27] target/arm: Decode Load/store register (pac)

2018-12-13 Thread Richard Henderson
Not that there are any stores involved, but why argue with ARM's
naming convention.

Signed-off-by: Richard Henderson 
---
 target/arm/translate-a64.c | 62 ++
 1 file changed, 62 insertions(+)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index e62d248894..c57c89d98a 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -3146,6 +3146,65 @@ static void disas_ldst_atomic(DisasContext *s, uint32_t 
insn,
s->be_data | size | MO_ALIGN);
 }
 
+/* PAC memory operations
+ *
+ *  31  30      27  26    24    22  21       12  11  10    5     0
+ * +------+-------+---+-----+-----+---+--------+---+---+----+-----+
+ * | size | 1 1 1 | V | 0 0 | M S | 1 |  imm9  | W | 1 | Rn |  Rt |
+ * +------+-------+---+-----+-----+---+--------+---+---+----+-----+
+ *
+ * Rt: the result register
+ * Rn: base address or SP
+ * Rs: the source register for the operation
+ * V: vector flag (always 0 as of v8.3)
+ * M: clear for key DA, set for key DB
+ * W: pre-indexing flag
+ * S: sign for imm9.
+ */
+static void disas_ldst_pac(DisasContext *s, uint32_t insn,
+   int size, int rt, bool is_vector)
+{
+int rn = extract32(insn, 5, 5);
+bool is_wback = extract32(insn, 11, 1);
+bool use_key_a = !extract32(insn, 23, 1);
+int offset, memidx;
+TCGv_i64 tcg_addr, tcg_rt;
+
+if (size != 3 || is_vector || !dc_isar_feature(aa64_pauth, s)) {
+unallocated_encoding(s);
+return;
+}
+
+if (rn == 31) {
+gen_check_sp_alignment(s);
+}
+tcg_addr = read_cpu_reg_sp(s, rn, 1);
+
+if (s->pauth_active) {
+if (use_key_a) {
+gen_helper_autda(tcg_addr, cpu_env, tcg_addr, cpu_X[31]);
+} else {
+gen_helper_autdb(tcg_addr, cpu_env, tcg_addr, cpu_X[31]);
+}
+}
+
+/* Form the 10-bit signed, scaled offset.  */
+offset = (extract32(insn, 22, 1) << 9) | extract32(insn, 12, 9);
+offset = sextract32(offset << size, 0, 10 + size);
+tcg_gen_addi_i64(tcg_addr, tcg_addr, offset);
+
+tcg_rt = cpu_reg(s, rt);
+memidx = get_mem_index(s);
+do_gpr_ld_memidx(s, tcg_rt, tcg_addr, size,
+ /* is_signed */ false, /* extend */ false, memidx,
+ /* iss_valid */ true, /* iss_srt */ rt,
+ /* iss_sf */ true, /* iss_ar */ false);
+
+if (is_wback) {
+tcg_gen_mov_i64(cpu_reg_sp(s, rn), tcg_addr);
+}
+}
+
 /* Load/store register (all forms) */
 static void disas_ldst_reg(DisasContext *s, uint32_t insn)
 {
@@ -3171,6 +3230,9 @@ static void disas_ldst_reg(DisasContext *s, uint32_t insn)
 case 2:
 disas_ldst_reg_roffset(s, insn, opc, size, rt, is_vector);
 return;
+default:
+disas_ldst_pac(s, insn, size, rt, is_vector);
+return;
 }
 break;
 case 1:
-- 
2.17.2




Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting

2018-12-13 Thread Jason Wang



On 2018/12/13 at 10:56 PM, Michael S. Tsirkin wrote:

On Thu, Dec 13, 2018 at 11:41:06AM +0800, Yongji Xie wrote:

On Thu, 13 Dec 2018 at 10:58, Jason Wang  wrote:


On 2018/12/12 at 5:18 PM, Yongji Xie wrote:

Ok, then we can simply forbid increasing the avail_idx in this case?

Basically, it's a question of whether or not it's better to do it at
the level of virtio instead of vhost. I'm pretty sure if we expose
sufficient information, it could be done without touching vhost-user.
And we won't need to deal with e.g. migration and other cases.


OK, I get your point. That's indeed an alternative way. But this feature seems
to be only useful to the vhost-user backend.

I admit I could not think of a use case other than vhost-user.



I'm not sure whether it makes sense to
touch the virtio protocol for this feature.

Some possible advantages:

- Feature could be determined and noticed by user or management layer.

- There's no need to invent a ring-layout-specific protocol to record
in-flight descriptors. E.g. if my understanding is correct, for this series
and for the example above, it still cannot work for the packed virtqueue
since the descriptor id is not sufficient (a descriptor could be overwritten
by a used one). You probably need to keep a (partial) copy of the descriptor
ring for this.

- No need to deal with migration; all the information is in guest memory.


Yes, we have those advantages. But it seems like handling this at the
vhost-user level could be easier to maintain in a production environment.
We can support old guests, and the bug fix will not depend on a guest
kernel update.


Yes. But my main concern is the layout-specific data structure. If
it could be done through a generic structure (can it?), it would be
fine. Otherwise, I believe we don't want another negotiation about what
kind of layout the backend supports for reconnect.


Yes, the current layout in shared memory doesn't support the packed
virtqueue because the information for a descriptor in the descriptor
ring is no longer available once the device fetches it.

I also thought about a generic structure before, but I failed... So I
tried another way to achieve that in this series. On the QEMU side, we
just provide a shared memory region to the backend and we don't define
anything for this memory. On the backend side, they should know how to
use that memory to record inflight I/O no matter what kind of
virtqueue they use. Thus, if we update the virtqueue for a new virtio
spec in the future, we don't need to touch QEMU or the guest. What do
you think about it?

Thanks,
Yongji

I think that's a good direction to take, yes.
Backends need to be very careful about the layout,
with versioning etc.



I'm not sure this could be done 100% transparently to QEMU. E.g. you need
to deal with reset, I think, and you need to carefully choose the size of
the region. Which means you need to negotiate the size and layout with the
backend, and deal with migration for them. This is another sin
of splitting the virtio dataplane from qemu anyway.



Thanks








[Qemu-devel] [PATCH qemu v2] spapr-iommu: Always advertise the maximum possible DMA window size

2018-12-13 Thread Alexey Kardashevskiy
When deciding about the huge DMA window, the typical Linux pseries guest
uses the maximum allowed RAM size as the upper limit. We did the same
on QEMU side to match that logic. Now we are going to support a GPU RAM
pass through which is not available at the guest boot time as it requires
the guest driver interaction. As a result, the guest requests a smaller
window than it should. Therefore the guest needs to be patched to
understand this new memory and so does QEMU.

Instead of reimplementing here whatever solution we choose for the guest,
this advertises the biggest possible window size limited by 32 bit
(as defined by LoPAPR). Since the window size has to be power-of-two
(the create rtas call receives a window shift, not a size),
this uses 0x8000.0000 as the maximum number of TCEs possible (rather than
the 32bit maximum of 0xffff.ffff).

This is safe as:
1. The guest visible emulated table is allocated in KVM (actual pages
are allocated in page fault handler) and QEMU (actual pages are allocated
when updated);
2. The hardware table (and corresponding userspace address table)
supports sparse allocation and also checks for locked_vm limit so
it is unable to cause the host any damage.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v2:
* replaced 0xffff.ffff with 0x8000.0000 as a top limit
---
 hw/ppc/spapr_rtas_ddw.c | 19 +++
 1 file changed, 3 insertions(+), 16 deletions(-)

diff --git a/hw/ppc/spapr_rtas_ddw.c b/hw/ppc/spapr_rtas_ddw.c
index 329feb1..cb8a410 100644
--- a/hw/ppc/spapr_rtas_ddw.c
+++ b/hw/ppc/spapr_rtas_ddw.c
@@ -96,9 +96,8 @@ static void rtas_ibm_query_pe_dma_window(PowerPCCPU *cpu,
  uint32_t nret, target_ulong rets)
 {
 sPAPRPHBState *sphb;
-uint64_t buid, max_window_size;
+uint64_t buid;
 uint32_t avail, addr, pgmask = 0;
-MachineState *machine = MACHINE(spapr);
 
 if ((nargs != 3) || (nret != 5)) {
 goto param_error_exit;
@@ -114,27 +113,15 @@ static void rtas_ibm_query_pe_dma_window(PowerPCCPU *cpu,
 /* Translate page mask to LoPAPR format */
 pgmask = spapr_page_mask_to_query_mask(sphb->page_size_mask);
 
-/*
- * This is "Largest contiguous block of TCEs allocated specifically
- * for (that is, are reserved for) this PE".
- * Return the maximum number as maximum supported RAM size was in 4K pages.
- */
-if (machine->ram_size == machine->maxram_size) {
-max_window_size = machine->ram_size;
-} else {
-max_window_size = machine->device_memory->base +
-  memory_region_size(&machine->device_memory->mr);
-}
-
 avail = SPAPR_PCI_DMA_MAX_WINDOWS - spapr_phb_get_active_win_num(sphb);
 
 rtas_st(rets, 0, RTAS_OUT_SUCCESS);
 rtas_st(rets, 1, avail);
-rtas_st(rets, 2, max_window_size >> SPAPR_TCE_PAGE_SHIFT);
+rtas_st(rets, 2, 0x80000000); /* The largest window we can possibly have */
 rtas_st(rets, 3, pgmask);
 rtas_st(rets, 4, 0); /* DMA migration mask, not supported */
 
-trace_spapr_iommu_ddw_query(buid, addr, avail, max_window_size, pgmask);
+trace_spapr_iommu_ddw_query(buid, addr, avail, 0x80000000, pgmask);
 return;
 
 param_error_exit:
-- 
2.17.1




Re: [Qemu-devel] [PATCH 1/3] memory_ldst: Add atomic ops for PTE updates

2018-12-13 Thread Benjamin Herrenschmidt
On Thu, 2018-12-13 at 21:01 -0600, Richard Henderson wrote:
> On 12/13/18 5:58 PM, Benjamin Herrenschmidt wrote:
> > +#ifdef CONFIG_ATOMIC64
> > +/* This is meant to be used for atomic PTE updates under MT-TCG */
> > +uint32_t glue(address_space_cmpxchgq_notdirty, SUFFIX)(ARG1_DECL,
> > +hwaddr addr, uint64_t old, uint64_t new, MemTxAttrs attrs, MemTxResult 
> > *result)
> > +{
> > +uint8_t *ptr;
> > +MemoryRegion *mr;
> > +hwaddr l = 8;
> > +hwaddr addr1;
> > +MemTxResult r;
> > +uint8_t dirty_log_mask;
> > +
> > +/* Must test result */
> > +assert(result);
> > +
> > +RCU_READ_LOCK();
> > +mr = TRANSLATE(addr, &addr1, &l, true, attrs);
> > +if (l < 8 || !memory_access_is_direct(mr, true)) {
> > +r = MEMTX_ERROR;
> > +} else {
> > +uint32_t orig = old;
> > +
> > +ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
> > +old = atomic_cmpxchg(ptr, orig, new);
> > +
> 
> I think you need atomic_cmpxchg__nocheck here.
> 
> Failure would be with a 32-bit host that supports ATOMIC64.
> E.g. i686.

I'm confused by this and the comments around the definition of
ATOMIC_REG_SIZE :)

So would we have CONFIG_ATOMIC64 in that case, and if yes, why, if all the
atomic_* end up barfing?

Or rather, why set CONFIG_ATOMIC64 if we ought not to use 64-bit
atomics?

Also we should probably define ATOMIC_REG_SIZE to 8 for ppc64...

Cheers
Ben.
> 
> r~




Re: [Qemu-devel] [PATCH] fixup! target/arm: Move id_aa64mmfr* to ARMISARegisters

2018-12-13 Thread Richard Henderson
On 12/13/18 9:18 PM, Richard Henderson wrote:
> I didn't get this fix pushed back into the patch set that I actually
> sent last week.  The patch is in target-arm.next, and I'm sure you
> would have eventually seen the error in testing.
> 
> 
> r~
> ---
>  target/arm/kvm64.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Oops, didn't clean out the directory before generating the pull.
Obviously this isn't in the branch.


r~



Re: [Qemu-devel] [PATCH qemu RFC 3/7] pci: Move NVIDIA vendor id to the rest of ids

2018-12-13 Thread Alexey Kardashevskiy



On 21/11/2018 05:27, Alistair Francis wrote:
> On Tue, Nov 13, 2018 at 12:42 AM Alexey Kardashevskiy  wrote:
>>
>> sPAPR code will use it too so move it from VFIO to the common code.
>>
>> Signed-off-by: Alexey Kardashevskiy 
> 
> Reviewed-by: Alistair Francis 



And who is taking this? I am going to repost the patchset; posting
this one over and over again seems redundant. Thanks,



> 
> Alistair
> 
>> ---
>>  include/hw/pci/pci_ids.h | 2 ++
>>  hw/vfio/pci-quirks.c | 2 --
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/hw/pci/pci_ids.h b/include/hw/pci/pci_ids.h
>> index 63acc72..3ed7d10 100644
>> --- a/include/hw/pci/pci_ids.h
>> +++ b/include/hw/pci/pci_ids.h
>> @@ -271,4 +271,6 @@
>>
>>  #define PCI_VENDOR_ID_SYNOPSYS   0x16C3
>>
>> +#define PCI_VENDOR_ID_NVIDIA 0x10de
>> +
>>  #endif
>> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c
>> index eae31c7..40a1200 100644
>> --- a/hw/vfio/pci-quirks.c
>> +++ b/hw/vfio/pci-quirks.c
>> @@ -526,8 +526,6 @@ static void vfio_probe_ati_bar2_quirk(VFIOPCIDevice 
>> *vdev, int nr)
>>   * note it for future reference.
>>   */
>>
>> -#define PCI_VENDOR_ID_NVIDIA0x10de
>> -
>>  /*
>>   * Nvidia has several different methods to get to config space, the
>>   * nouveu project has several of these documented here:
>> --
>> 2.17.1
>>
>>

-- 
Alexey



[Qemu-devel] [PULL 27/32] tcg/mips: Improve the add2/sub2 command to use TCG_TARGET_REG_BITS

2018-12-13 Thread Richard Henderson
From: Alistair Francis 

Instead of hard coding 31 for the shift right, use TCG_TARGET_REG_BITS - 1.

Signed-off-by: Alistair Francis 
Message-Id: 
<7dfbddf7014a595150aa79011ddb342c3cc17ec3.1544648105.git.alistair.fran...@wdc.com>
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.inc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index a06ff257fa..be0bc92e8e 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -792,7 +792,7 @@ static void tcg_out_addsub2(TCGContext *s, TCGReg rl, 
TCGReg rh, TCGReg al,
 tcg_out_opc_imm(s, OPC_ADDIU, rl, al, bl);
 tcg_out_opc_imm(s, OPC_SLTIU, TCG_TMP0, rl, bl);
 } else if (rl == al && rl == bl) {
-tcg_out_opc_sa(s, OPC_SRL, TCG_TMP0, al, 31);
+tcg_out_opc_sa(s, OPC_SRL, TCG_TMP0, al, TCG_TARGET_REG_BITS - 1);
 tcg_out_opc_reg(s, OPC_ADDU, rl, al, bl);
 } else {
 tcg_out_opc_reg(s, OPC_ADDU, rl, al, bl);
-- 
2.17.2




[Qemu-devel] [PULL 32/32] xxhash: match output against the original xxhash32

2018-12-13 Thread Richard Henderson
From: "Emilio G. Cota" 

Change the order in which we extract a/b and c/d to
match the output of the upstream xxhash32.

Tested with:
  https://github.com/cota/xxhash/tree/qemu

Reviewed-by: Alex Bennée 
Tested-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
Signed-off-by: Richard Henderson 
---
 include/qemu/xxhash.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/qemu/xxhash.h b/include/qemu/xxhash.h
index fe35dde328..076f1f6054 100644
--- a/include/qemu/xxhash.h
+++ b/include/qemu/xxhash.h
@@ -55,10 +55,10 @@ qemu_xxhash7(uint64_t ab, uint64_t cd, uint32_t e, uint32_t 
f, uint32_t g)
 uint32_t v2 = QEMU_XXHASH_SEED + PRIME32_2;
 uint32_t v3 = QEMU_XXHASH_SEED + 0;
 uint32_t v4 = QEMU_XXHASH_SEED - PRIME32_1;
-uint32_t a = ab >> 32;
-uint32_t b = ab;
-uint32_t c = cd >> 32;
-uint32_t d = cd;
+uint32_t a = ab;
+uint32_t b = ab >> 32;
+uint32_t c = cd;
+uint32_t d = cd >> 32;
 uint32_t h32;
 
 v1 += a * PRIME32_2;
-- 
2.17.2




[Qemu-devel] [PULL 25/32] tcg/optimize: Optimize bswap

2018-12-13 Thread Richard Henderson
Somehow we forgot these operations, once upon a time.
This will allow immediate stores to have their bswap
optimized away.

Signed-off-by: Richard Henderson 
---
 tcg/optimize.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 5dbe11c3c8..6b98ec13e6 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -353,6 +353,15 @@ static TCGArg do_constant_folding_2(TCGOpcode op, TCGArg 
x, TCGArg y)
 CASE_OP_32_64(ext16u):
 return (uint16_t)x;
 
+CASE_OP_32_64(bswap16):
+return bswap16(x);
+
+CASE_OP_32_64(bswap32):
+return bswap32(x);
+
+case INDEX_op_bswap64_i64:
+return bswap64(x);
+
 case INDEX_op_ext_i32_i64:
 case INDEX_op_ext32s_i64:
 return (int32_t)x;
@@ -1105,6 +1114,9 @@ void tcg_optimize(TCGContext *s)
 CASE_OP_32_64(ext16s):
 CASE_OP_32_64(ext16u):
 CASE_OP_32_64(ctpop):
+CASE_OP_32_64(bswap16):
+CASE_OP_32_64(bswap32):
+case INDEX_op_bswap64_i64:
 case INDEX_op_ext32s_i64:
 case INDEX_op_ext32u_i64:
 case INDEX_op_ext_i32_i64:
-- 
2.17.2




[Qemu-devel] [PULL 26/32] tcg: Add TCG_TARGET_HAS_MEMORY_BSWAP

2018-12-13 Thread Richard Henderson
For now, defined universally as true, since we previously required
backends to implement swapped memory operations.  Future patches
may now remove that support where it is onerous.

Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.h |   1 +
 tcg/arm/tcg-target.h |   1 +
 tcg/i386/tcg-target.h|   2 +
 tcg/mips/tcg-target.h|   1 +
 tcg/ppc/tcg-target.h |   1 +
 tcg/s390/tcg-target.h|   1 +
 tcg/sparc/tcg-target.h   |   1 +
 tcg/tci/tcg-target.h |   2 +
 tcg/tcg-op.c | 118 ++-
 9 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/tcg/aarch64/tcg-target.h b/tcg/aarch64/tcg-target.h
index 9aea1d1771..f966a4fcb3 100644
--- a/tcg/aarch64/tcg-target.h
+++ b/tcg/aarch64/tcg-target.h
@@ -137,6 +137,7 @@ typedef enum {
 #define TCG_TARGET_HAS_mul_vec  1
 
 #define TCG_TARGET_DEFAULT_MO (0)
+#define TCG_TARGET_HAS_MEMORY_BSWAP 1
 
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
diff --git a/tcg/arm/tcg-target.h b/tcg/arm/tcg-target.h
index 94b3578c55..16172f73a3 100644
--- a/tcg/arm/tcg-target.h
+++ b/tcg/arm/tcg-target.h
@@ -131,6 +131,7 @@ enum {
 };
 
 #define TCG_TARGET_DEFAULT_MO (0)
+#define TCG_TARGET_HAS_MEMORY_BSWAP 1
 
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index c523d5f5e1..f378d29568 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -220,6 +220,8 @@ static inline void tb_target_set_jmp_target(uintptr_t 
tc_ptr,
 
 #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
 
+#define TCG_TARGET_HAS_MEMORY_BSWAP  1
+
 #ifdef CONFIG_SOFTMMU
 #define TCG_TARGET_NEED_LDST_LABELS
 #endif
diff --git a/tcg/mips/tcg-target.h b/tcg/mips/tcg-target.h
index a8222476f0..5cb8672470 100644
--- a/tcg/mips/tcg-target.h
+++ b/tcg/mips/tcg-target.h
@@ -203,6 +203,7 @@ extern bool use_mips32r2_instructions;
 #endif
 
 #define TCG_TARGET_DEFAULT_MO (0)
+#define TCG_TARGET_HAS_MEMORY_BSWAP 1
 
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
diff --git a/tcg/ppc/tcg-target.h b/tcg/ppc/tcg-target.h
index be52ad1d2e..52c1bb04b1 100644
--- a/tcg/ppc/tcg-target.h
+++ b/tcg/ppc/tcg-target.h
@@ -128,6 +128,7 @@ void flush_icache_range(uintptr_t start, uintptr_t stop);
 void tb_target_set_jmp_target(uintptr_t, uintptr_t, uintptr_t);
 
 #define TCG_TARGET_DEFAULT_MO (0)
+#define TCG_TARGET_HAS_MEMORY_BSWAP 1
 
 #ifdef CONFIG_SOFTMMU
 #define TCG_TARGET_NEED_LDST_LABELS
diff --git a/tcg/s390/tcg-target.h b/tcg/s390/tcg-target.h
index 6f2b06a7d1..853ed6e7aa 100644
--- a/tcg/s390/tcg-target.h
+++ b/tcg/s390/tcg-target.h
@@ -135,6 +135,7 @@ extern uint64_t s390_facilities;
 #define TCG_TARGET_CALL_STACK_OFFSET   160
 
 #define TCG_TARGET_EXTEND_ARGS 1
+#define TCG_TARGET_HAS_MEMORY_BSWAP   1
 
 #define TCG_TARGET_DEFAULT_MO (TCG_MO_ALL & ~TCG_MO_ST_LD)
 
diff --git a/tcg/sparc/tcg-target.h b/tcg/sparc/tcg-target.h
index d8339bf010..a0ed2a3342 100644
--- a/tcg/sparc/tcg-target.h
+++ b/tcg/sparc/tcg-target.h
@@ -164,6 +164,7 @@ extern bool use_vis3_instructions;
 #define TCG_AREG0 TCG_REG_I0
 
 #define TCG_TARGET_DEFAULT_MO (0)
+#define TCG_TARGET_HAS_MEMORY_BSWAP 1
 
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
diff --git a/tcg/tci/tcg-target.h b/tcg/tci/tcg-target.h
index 26140d78cb..086f34e69a 100644
--- a/tcg/tci/tcg-target.h
+++ b/tcg/tci/tcg-target.h
@@ -198,6 +198,8 @@ static inline void flush_icache_range(uintptr_t start, 
uintptr_t stop)
We prefer consistency across hosts on this.  */
 #define TCG_TARGET_DEFAULT_MO  (0)
 
+#define TCG_TARGET_HAS_MEMORY_BSWAP 1
+
 static inline void tb_target_set_jmp_target(uintptr_t tc_ptr,
 uintptr_t jmp_addr, uintptr_t addr)
 {
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 887b371a81..1ad095cc35 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -2694,25 +2694,78 @@ static void tcg_gen_req_mo(TCGBar type)
 
 void tcg_gen_qemu_ld_i32(TCGv_i32 val, TCGv addr, TCGArg idx, TCGMemOp memop)
 {
+TCGMemOp orig_memop;
+
 tcg_gen_req_mo(TCG_MO_LD_LD | TCG_MO_ST_LD);
 memop = tcg_canonicalize_memop(memop, 0, 0);
 trace_guest_mem_before_tcg(tcg_ctx->cpu, cpu_env,
addr, trace_mem_get_info(memop, 0));
+
+orig_memop = memop;
+if (!TCG_TARGET_HAS_MEMORY_BSWAP && (memop & MO_BSWAP)) {
+memop &= ~MO_BSWAP;
+/* The bswap primitive requires zero-extended input.  */
+if ((memop & MO_SSIZE) == MO_SW) {
+memop &= ~MO_SIGN;
+}
+}
+
 gen_ldst_i32(INDEX_op_qemu_ld_i32, val, addr, memop, idx);
+
+if ((orig_memop ^ memop) & MO_BSWAP) {
+switch (orig_memop & MO_SIZE) {
+case MO_16:
+tcg_gen_bswap16_i32(val, val);
+if (orig_memop & MO_SIGN) {
+tcg_gen_ext16s_i32(val, val);
+}
+   

[Qemu-devel] [PULL 28/32] tcg: Drop nargs from tcg_op_insert_{before, after}

2018-12-13 Thread Richard Henderson
From: "Emilio G. Cota" 

It's unused since 75e8b9b7aa0b95a761b9add7e2f09248b101a392.

Signed-off-by: Emilio G. Cota 
Message-Id: <20181209193749.12277-9-c...@braap.org>
Reviewed-by: Richard Henderson 
Signed-off-by: Richard Henderson 
---
 tcg/tcg.h  |  4 ++--
 tcg/optimize.c |  4 ++--
 tcg/tcg.c  | 10 --
 3 files changed, 8 insertions(+), 10 deletions(-)

diff --git a/tcg/tcg.h b/tcg/tcg.h
index f4efbaa680..a745e926bb 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -1073,8 +1073,8 @@ void tcg_gen_callN(void *func, TCGTemp *ret, int nargs, 
TCGTemp **args);
 
 TCGOp *tcg_emit_op(TCGOpcode opc);
 void tcg_op_remove(TCGContext *s, TCGOp *op);
-TCGOp *tcg_op_insert_before(TCGContext *s, TCGOp *op, TCGOpcode opc, int narg);
-TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *op, TCGOpcode opc, int narg);
+TCGOp *tcg_op_insert_before(TCGContext *s, TCGOp *op, TCGOpcode opc);
+TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *op, TCGOpcode opc);
 
 void tcg_optimize(TCGContext *s);
 
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 6b98ec13e6..01e80c3e46 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1261,7 +1261,7 @@ void tcg_optimize(TCGContext *s)
 uint64_t a = ((uint64_t)ah << 32) | al;
 uint64_t b = ((uint64_t)bh << 32) | bl;
 TCGArg rl, rh;
-TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32, 2);
+TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32);
 
 if (opc == INDEX_op_add2_i32) {
 a += b;
@@ -1283,7 +1283,7 @@ void tcg_optimize(TCGContext *s)
 uint32_t b = arg_info(op->args[3])->val;
 uint64_t r = (uint64_t)a * b;
 TCGArg rl, rh;
-TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32, 2);
+TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32);
 
 rl = op->args[0];
 rh = op->args[1];
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 54f1272187..963cb37892 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -2205,16 +2205,14 @@ TCGOp *tcg_emit_op(TCGOpcode opc)
 return op;
 }
 
-TCGOp *tcg_op_insert_before(TCGContext *s, TCGOp *old_op,
-TCGOpcode opc, int nargs)
+TCGOp *tcg_op_insert_before(TCGContext *s, TCGOp *old_op, TCGOpcode opc)
 {
 TCGOp *new_op = tcg_op_alloc(opc);
 QTAILQ_INSERT_BEFORE(old_op, new_op, link);
 return new_op;
 }
 
-TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *old_op,
-   TCGOpcode opc, int nargs)
+TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *old_op, TCGOpcode opc)
 {
 TCGOp *new_op = tcg_op_alloc(opc);
 QTAILQ_INSERT_AFTER(&s->ops, old_op, new_op, link);
@@ -2552,7 +2550,7 @@ static bool liveness_pass_2(TCGContext *s)
 TCGOpcode lopc = (arg_ts->type == TCG_TYPE_I32
   ? INDEX_op_ld_i32
   : INDEX_op_ld_i64);
-TCGOp *lop = tcg_op_insert_before(s, op, lopc, 3);
+TCGOp *lop = tcg_op_insert_before(s, op, lopc);
 
 lop->args[0] = temp_arg(dir_ts);
 lop->args[1] = temp_arg(arg_ts->mem_base);
@@ -2621,7 +2619,7 @@ static bool liveness_pass_2(TCGContext *s)
 TCGOpcode sopc = (arg_ts->type == TCG_TYPE_I32
   ? INDEX_op_st_i32
   : INDEX_op_st_i64);
-TCGOp *sop = tcg_op_insert_after(s, op, sopc, 3);
+TCGOp *sop = tcg_op_insert_after(s, op, sopc);
 
 sop->args[0] = temp_arg(dir_ts);
 sop->args[1] = temp_arg(arg_ts->mem_base);
-- 
2.17.2




[Qemu-devel] [PULL 30/32] exec: introduce qemu_xxhash{2,4,5,6,7}

2018-12-13 Thread Richard Henderson
From: "Emilio G. Cota" 

Before moving them all to include/qemu/xxhash.h.

Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
Signed-off-by: Richard Henderson 
---
 include/exec/tb-hash-xx.h | 41 +--
 include/exec/tb-hash.h|  2 +-
 tests/qht-bench.c |  2 +-
 util/qsp.c| 12 ++--
 4 files changed, 39 insertions(+), 18 deletions(-)

diff --git a/include/exec/tb-hash-xx.h b/include/exec/tb-hash-xx.h
index 747a9a612c..98ce4b628a 100644
--- a/include/exec/tb-hash-xx.h
+++ b/include/exec/tb-hash-xx.h
@@ -42,23 +42,23 @@
 #define PRIME32_4   668265263U
 #define PRIME32_5   374761393U
 
-#define TB_HASH_XX_SEED 1
+#define QEMU_XXHASH_SEED 1
 
 /*
  * xxhash32, customized for input variables that are not guaranteed to be
  * contiguous in memory.
  */
 static inline uint32_t
-tb_hash_func7(uint64_t a0, uint64_t b0, uint32_t e, uint32_t f, uint32_t g)
+qemu_xxhash7(uint64_t ab, uint64_t cd, uint32_t e, uint32_t f, uint32_t g)
 {
-uint32_t v1 = TB_HASH_XX_SEED + PRIME32_1 + PRIME32_2;
-uint32_t v2 = TB_HASH_XX_SEED + PRIME32_2;
-uint32_t v3 = TB_HASH_XX_SEED + 0;
-uint32_t v4 = TB_HASH_XX_SEED - PRIME32_1;
-uint32_t a = a0 >> 32;
-uint32_t b = a0;
-uint32_t c = b0 >> 32;
-uint32_t d = b0;
+uint32_t v1 = QEMU_XXHASH_SEED + PRIME32_1 + PRIME32_2;
+uint32_t v2 = QEMU_XXHASH_SEED + PRIME32_2;
+uint32_t v3 = QEMU_XXHASH_SEED + 0;
+uint32_t v4 = QEMU_XXHASH_SEED - PRIME32_1;
+uint32_t a = ab >> 32;
+uint32_t b = ab;
+uint32_t c = cd >> 32;
+uint32_t d = cd;
 uint32_t h32;
 
 v1 += a * PRIME32_2;
@@ -98,4 +98,25 @@ tb_hash_func7(uint64_t a0, uint64_t b0, uint32_t e, uint32_t 
f, uint32_t g)
 return h32;
 }
 
+static inline uint32_t qemu_xxhash2(uint64_t ab)
+{
+return qemu_xxhash7(ab, 0, 0, 0, 0);
+}
+
+static inline uint32_t qemu_xxhash4(uint64_t ab, uint64_t cd)
+{
+return qemu_xxhash7(ab, cd, 0, 0, 0);
+}
+
+static inline uint32_t qemu_xxhash5(uint64_t ab, uint64_t cd, uint32_t e)
+{
+return qemu_xxhash7(ab, cd, e, 0, 0);
+}
+
+static inline uint32_t qemu_xxhash6(uint64_t ab, uint64_t cd, uint32_t e,
+uint32_t f)
+{
+return qemu_xxhash7(ab, cd, e, f, 0);
+}
+
 #endif /* EXEC_TB_HASH_XX_H */
diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 0526c4f678..731ba4c272 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -61,7 +61,7 @@ static inline
 uint32_t tb_hash_func(tb_page_addr_t phys_pc, target_ulong pc, uint32_t flags,
   uint32_t cf_mask, uint32_t trace_vcpu_dstate)
 {
-return tb_hash_func7(phys_pc, pc, flags, cf_mask, trace_vcpu_dstate);
+return qemu_xxhash7(phys_pc, pc, flags, cf_mask, trace_vcpu_dstate);
 }
 
 #endif
diff --git a/tests/qht-bench.c b/tests/qht-bench.c
index 636750d39f..0278f4da04 100644
--- a/tests/qht-bench.c
+++ b/tests/qht-bench.c
@@ -105,7 +105,7 @@ static bool is_equal(const void *ap, const void *bp)
 
 static uint32_t h(unsigned long v)
 {
-return tb_hash_func7(v, 0, 0, 0, 0);
+return qemu_xxhash2(v);
 }
 
 static uint32_t hval(unsigned long v)
diff --git a/util/qsp.c b/util/qsp.c
index a848b09c6d..dc29c41fde 100644
--- a/util/qsp.c
+++ b/util/qsp.c
@@ -135,13 +135,13 @@ QemuCondWaitFunc qemu_cond_wait_func = 
qemu_cond_wait_impl;
  * without it we still get a pretty unique hash.
  */
 static inline
-uint32_t do_qsp_callsite_hash(const QSPCallSite *callsite, uint64_t a)
+uint32_t do_qsp_callsite_hash(const QSPCallSite *callsite, uint64_t ab)
 {
-uint64_t b = (uint64_t)(uintptr_t)callsite->obj;
+uint64_t cd = (uint64_t)(uintptr_t)callsite->obj;
 uint32_t e = callsite->line;
 uint32_t f = callsite->type;
 
-return tb_hash_func7(a, b, e, f, 0);
+return qemu_xxhash6(ab, cd, e, f);
 }
 
 static inline
@@ -169,11 +169,11 @@ static uint32_t qsp_entry_no_thread_hash(const QSPEntry 
*entry)
 static uint32_t qsp_entry_no_thread_obj_hash(const QSPEntry *entry)
 {
 const QSPCallSite *callsite = entry->callsite;
-uint64_t a = g_str_hash(callsite->file);
-uint64_t b = callsite->line;
+uint64_t ab = g_str_hash(callsite->file);
+uint64_t cd = callsite->line;
 uint32_t e = callsite->type;
 
-return tb_hash_func7(a, b, e, 0, 0);
+return qemu_xxhash5(ab, cd, e);
 }
 
 static bool qsp_callsite_cmp(const void *ap, const void *bp)
-- 
2.17.2




[Qemu-devel] [PULL 14/32] tcg/arm: Return false on failure from patch_reloc

2018-12-13 Thread Richard Henderson
This does require an extra two checks within the slow paths
to replace the assert that we're removing.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/arm/tcg-target.inc.c | 22 --
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index deefa20fbf..49f57d655e 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -187,10 +187,14 @@ static const uint8_t tcg_cond_to_arm_cond[] = {
 [TCG_COND_GTU] = COND_HI,
 };
 
-static inline void reloc_pc24(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
+static inline bool reloc_pc24(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 {
 ptrdiff_t offset = (tcg_ptr_byte_diff(target, code_ptr) - 8) >> 2;
-*code_ptr = (*code_ptr & ~0xffffff) | (offset & 0xffffff);
+if (offset == sextract32(offset, 0, 24)) {
+*code_ptr = (*code_ptr & ~0xffffff) | (offset & 0xffffff);
+return true;
+}
+return false;
 }
 
 static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
@@ -199,7 +203,7 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 tcg_debug_assert(addend == 0);
 
 if (type == R_ARM_PC24) {
-reloc_pc24(code_ptr, (tcg_insn_unit *)value);
+return reloc_pc24(code_ptr, (tcg_insn_unit *)value);
 } else if (type == R_ARM_PC13) {
 intptr_t diff = value - (uintptr_t)(code_ptr + 2);
 tcg_insn_unit insn = *code_ptr;
@@ -213,7 +217,11 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 } else {
 int rd = extract32(insn, 12, 4);
 int rt = rd == TCG_REG_PC ? TCG_REG_TMP : rd;
-assert(diff >= 0x1000 && diff < 0x100000);
+
+if (diff < 0x1000 || diff >= 0x100000) {
+return false;
+}
+
 /* add rt, pc, #high */
*code_ptr++ = ((insn & 0xf0000000) | (1 << 25) | ARITH_ADD
| (TCG_REG_PC << 16) | (rt << 12)
@@ -1372,7 +1380,8 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, 
TCGLabelQemuLdst *lb)
 TCGMemOp opc = get_memop(oi);
 void *func;
 
-reloc_pc24(lb->label_ptr[0], s->code_ptr);
+bool ok = reloc_pc24(lb->label_ptr[0], s->code_ptr);
+tcg_debug_assert(ok);
 
 argreg = tcg_out_arg_reg32(s, TCG_REG_R0, TCG_AREG0);
 if (TARGET_LONG_BITS == 64) {
@@ -1432,7 +1441,8 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, 
TCGLabelQemuLdst *lb)
 TCGMemOpIdx oi = lb->oi;
 TCGMemOp opc = get_memop(oi);
 
-reloc_pc24(lb->label_ptr[0], s->code_ptr);
+bool ok = reloc_pc24(lb->label_ptr[0], s->code_ptr);
+tcg_debug_assert(ok);
 
 argreg = TCG_REG_R0;
 argreg = tcg_out_arg_reg32(s, argreg, TCG_AREG0);
-- 
2.17.2




[Qemu-devel] [PULL 12/32] tcg/i386: Return false on failure from patch_reloc

2018-12-13 Thread Richard Henderson
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.inc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 5c88f1f36b..28192f4608 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -175,7 +175,7 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 case R_386_PC32:
 value -= (uintptr_t)code_ptr;
 if (value != (int32_t)value) {
-tcg_abort();
+return false;
 }
 /* FALLTHRU */
 case R_386_32:
@@ -184,7 +184,7 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 case R_386_PC8:
 value -= (uintptr_t)code_ptr;
 if (value != (int8_t)value) {
-tcg_abort();
+return false;
 }
 tcg_patch8(code_ptr, value);
 break;
-- 
2.17.2




[Qemu-devel] [PULL 21/32] tcg/i386: Precompute all guest_base parameters

2018-12-13 Thread Richard Henderson
These values are constant between all qemu_ld/st invocations;
there is no need to figure this out each time.  If we cannot
use a segment or an offset directly for guest_base, load the
value into a register in the prologue.

Reviewed-by: Emilio G. Cota 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.inc.c | 101 +++---
 1 file changed, 40 insertions(+), 61 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index f7b548545a..3fb2f4b971 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1857,22 +1857,31 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, 
TCGLabelQemuLdst *l)
 tcg_out_push(s, retaddr);
 tcg_out_jmp(s, qemu_st_helpers[opc & (MO_BSWAP | MO_SIZE)]);
 }
-#elif defined(__x86_64__) && defined(__linux__)
-# include <asm/prctl.h>
-# include <sys/prctl.h>
-
+#elif TCG_TARGET_REG_BITS == 32
+# define x86_guest_base_seg 0
+# define x86_guest_base_index   -1
+# define x86_guest_base_offset  guest_base
+#else
+static int x86_guest_base_seg;
+static int x86_guest_base_index = -1;
+static int32_t x86_guest_base_offset;
+# if defined(__x86_64__) && defined(__linux__)
+#  include <asm/prctl.h>
+#  include <sys/prctl.h>
 int arch_prctl(int code, unsigned long addr);
-
-static int guest_base_flags;
-static inline void setup_guest_base_seg(void)
+static inline int setup_guest_base_seg(void)
 {
 if (arch_prctl(ARCH_SET_GS, guest_base) == 0) {
-guest_base_flags = P_GS;
+return P_GS;
 }
+return 0;
 }
-#else
-# define guest_base_flags 0
-static inline void setup_guest_base_seg(void) { }
+# else
+static inline int setup_guest_base_seg(void)
+{
+return 0;
+}
+# endif
 #endif /* SOFTMMU */
 
 static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
@@ -2011,27 +2020,9 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg 
*args, bool is64)
 add_qemu_ldst_label(s, true, is64, oi, datalo, datahi, addrlo, addrhi,
 s->code_ptr, label_ptr);
 #else
-{
-int32_t offset = guest_base;
-int index = -1;
-int seg = 0;
-
-/*
- * Recall we store 32-bit values zero-extended.  No need for
- * further manual extension or an addr32 (0x67) prefix.
- */
-if (guest_base == 0 || guest_base_flags) {
-seg = guest_base_flags;
-offset = 0;
-} else if (TCG_TARGET_REG_BITS == 64 && offset != guest_base) {
-tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_L1, guest_base);
-index = TCG_REG_L1;
-offset = 0;
-}
-
-tcg_out_qemu_ld_direct(s, datalo, datahi,
-   addrlo, index, offset, seg, is64, opc);
-}
+tcg_out_qemu_ld_direct(s, datalo, datahi, addrlo, x86_guest_base_index,
+   x86_guest_base_offset, x86_guest_base_seg,
+   is64, opc);
 #endif
 }
 
@@ -2147,28 +2138,8 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg 
*args, bool is64)
 add_qemu_ldst_label(s, false, is64, oi, datalo, datahi, addrlo, addrhi,
 s->code_ptr, label_ptr);
 #else
-{
-int32_t offset = guest_base;
-int index = -1;
-int seg = 0;
-
-/*
- * Recall we store 32-bit values zero-extended.  No need for
- * further manual extension or an addr32 (0x67) prefix.
- */
-if (guest_base == 0 || guest_base_flags) {
-seg = guest_base_flags;
-offset = 0;
-} else if (TCG_TARGET_REG_BITS == 64 && offset != guest_base) {
-/* ??? Note that we require L0 free for bswap.  */
-tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_L1, guest_base);
-index = TCG_REG_L1;
-offset = 0;
-}
-
-tcg_out_qemu_st_direct(s, datalo, datahi,
-   addrlo, index, offset, seg, opc);
-}
+tcg_out_qemu_st_direct(s, datalo, datahi, addrlo, x86_guest_base_index,
+   x86_guest_base_offset, x86_guest_base_seg, opc);
 #endif
 }
 
@@ -3415,6 +3386,21 @@ static void tcg_target_qemu_prologue(TCGContext *s)
 (ARRAY_SIZE(tcg_target_callee_save_regs) + 2) * 4
 + stack_addend);
 #else
+# if !defined(CONFIG_SOFTMMU) && TCG_TARGET_REG_BITS == 64
+if (guest_base) {
+int seg = setup_guest_base_seg();
+if (seg != 0) {
+x86_guest_base_seg = seg;
+} else if (guest_base == (int32_t)guest_base) {
+x86_guest_base_offset = guest_base;
+} else {
+/* Choose R12 because, as a base, it requires a SIB byte. */
+x86_guest_base_index = TCG_REG_R12;
+tcg_out_movi(s, TCG_TYPE_PTR, x86_guest_base_index, guest_base);
+tcg_regset_set_reg(s->reserved_regs, x86_guest_base_index);
+}
+}
+# endif
 tcg_out_mov(s, TCG_TYPE_PTR, TCG_AREG0, tcg_target_call_iarg_regs[0]);
 tcg_out_addi(s, 

[Qemu-devel] [PULL 23/32] tcg: Clean up generic bswap32

2018-12-13 Thread Richard Henderson
Based on the only current user, Sparc:

New code uses 1 constant that takes 2 insns to create, plus 8.
Old code used 2 constants that took 2 insns to create, plus 9.
The result is a new total of 10 vs an old total of 13.
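
As an illustrative cross-check in plain C (host-side sketch; the helper
name bswap32_sketch is invented, not part of TCG):

```c
#include <stdint.h>

/* Mirrors the generated op sequence: one 0x00ff00ff mask swaps
   adjacent bytes, then a 16-bit rotate swaps the halfwords. */
static uint32_t bswap32_sketch(uint32_t arg)
{
    uint32_t t0, t1;

    t0 = (arg >> 8) & 0x00ff00ffu;      /* .a.c */
    t1 = (arg & 0x00ff00ffu) << 8;      /* b.d. */
    arg = t0 | t1;                      /* badc */
    return (arg >> 16) | (arg << 16);   /* dcba */
}
```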

Signed-off-by: Richard Henderson 
---
 tcg/tcg-op.c | 54 ++--
 1 file changed, 27 insertions(+), 27 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 7a8015c5a9..a956499e46 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1012,22 +1012,22 @@ void tcg_gen_bswap32_i32(TCGv_i32 ret, TCGv_i32 arg)
 if (TCG_TARGET_HAS_bswap32_i32) {
 tcg_gen_op2_i32(INDEX_op_bswap32_i32, ret, arg);
 } else {
-TCGv_i32 t0, t1;
-t0 = tcg_temp_new_i32();
-t1 = tcg_temp_new_i32();
+TCGv_i32 t0 = tcg_temp_new_i32();
+TCGv_i32 t1 = tcg_temp_new_i32();
+TCGv_i32 t2 = tcg_const_i32(0x00ff00ff);
 
-tcg_gen_shli_i32(t0, arg, 24);
+/* arg = abcd */
+tcg_gen_shri_i32(t0, arg, 8);   /*  t0 = .abc */
+tcg_gen_and_i32(t1, arg, t2);   /*  t1 = .b.d */
+tcg_gen_and_i32(t0, t0, t2);/*  t0 = .a.c */
+tcg_temp_free_i32(t2);
+tcg_gen_shli_i32(t1, t1, 8);/*  t1 = b.d. */
+tcg_gen_or_i32(ret, t0, t1);/* ret = badc */
 
-tcg_gen_andi_i32(t1, arg, 0xff00);
-tcg_gen_shli_i32(t1, t1, 8);
-tcg_gen_or_i32(t0, t0, t1);
+tcg_gen_shri_i32(t0, ret, 16);  /*  t0 = ..ba */
+tcg_gen_shli_i32(t1, ret, 16);  /*  t1 = dc.. */
+tcg_gen_or_i32(ret, t0, t1);/* ret = dcba */
 
-tcg_gen_shri_i32(t1, arg, 8);
-tcg_gen_andi_i32(t1, t1, 0xff00);
-tcg_gen_or_i32(t0, t0, t1);
-
-tcg_gen_shri_i32(t1, arg, 24);
-tcg_gen_or_i32(ret, t0, t1);
 tcg_temp_free_i32(t0);
 tcg_temp_free_i32(t1);
 }
@@ -1638,23 +1638,23 @@ void tcg_gen_bswap32_i64(TCGv_i64 ret, TCGv_i64 arg)
 } else if (TCG_TARGET_HAS_bswap32_i64) {
 tcg_gen_op2_i64(INDEX_op_bswap32_i64, ret, arg);
 } else {
-TCGv_i64 t0, t1;
-t0 = tcg_temp_new_i64();
-t1 = tcg_temp_new_i64();
+TCGv_i64 t0 = tcg_temp_new_i64();
+TCGv_i64 t1 = tcg_temp_new_i64();
+TCGv_i64 t2 = tcg_const_i64(0x00ff00ff);
 
-tcg_gen_shli_i64(t0, arg, 24);
-tcg_gen_ext32u_i64(t0, t0);
+/* arg = ....abcd */
+tcg_gen_shri_i64(t0, arg, 8);   /*  t0 = .....abc */
+tcg_gen_and_i64(t1, arg, t2);   /*  t1 = .....b.d */
+tcg_gen_and_i64(t0, t0, t2);/*  t0 = .....a.c */
+tcg_temp_free_i64(t2);
+tcg_gen_shli_i64(t1, t1, 8);/*  t1 = ....b.d. */
+tcg_gen_or_i64(ret, t0, t1);/* ret = ....badc */
 
-tcg_gen_andi_i64(t1, arg, 0xff00);
-tcg_gen_shli_i64(t1, t1, 8);
-tcg_gen_or_i64(t0, t0, t1);
+tcg_gen_shli_i64(t1, ret, 48);  /*  t1 = dc...... */
+tcg_gen_shri_i64(t0, ret, 16);  /*  t0 = ......ba */
+tcg_gen_shri_i64(t1, t1, 32);   /*  t1 = ....dc.. */
+tcg_gen_or_i64(ret, t0, t1);/* ret = ....dcba */
 
-tcg_gen_shri_i64(t1, arg, 8);
-tcg_gen_andi_i64(t1, t1, 0xff00);
-tcg_gen_or_i64(t0, t0, t1);
-
-tcg_gen_shri_i64(t1, arg, 24);
-tcg_gen_or_i64(ret, t0, t1);
 tcg_temp_free_i64(t0);
 tcg_temp_free_i64(t1);
 }
-- 
2.17.2




[Qemu-devel] [PULL 08/32] tcg/s390: Remove retranslation code

2018-12-13 Thread Richard Henderson
There is no longer a need for preserving branch offset operands,
as we no longer re-translate.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.inc.c | 10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 17c435ade5..96c344142e 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -1329,13 +1329,11 @@ static void tgen_branch(TCGContext *s, int cc, TCGLabel 
*l)
 static void tgen_compare_branch(TCGContext *s, S390Opcode opc, int cc,
 TCGReg r1, TCGReg r2, TCGLabel *l)
 {
-intptr_t off;
+intptr_t off = 0;
 
 if (l->has_value) {
 off = l->u.value_ptr - s->code_ptr;
 } else {
-/* We need to keep the offset unchanged for retranslation.  */
-off = s->code_ptr[1];
 tcg_out_reloc(s, s->code_ptr + 1, R_390_PC16DBL, l, 2);
 }
 
@@ -1347,13 +1345,11 @@ static void tgen_compare_branch(TCGContext *s, 
S390Opcode opc, int cc,
 static void tgen_compare_imm_branch(TCGContext *s, S390Opcode opc, int cc,
 TCGReg r1, int i2, TCGLabel *l)
 {
-tcg_target_long off;
+tcg_target_long off = 0;
 
 if (l->has_value) {
 off = l->u.value_ptr - s->code_ptr;
 } else {
-/* We need to keep the offset unchanged for retranslation.  */
-off = s->code_ptr[1];
 tcg_out_reloc(s, s->code_ptr + 1, R_390_PC16DBL, l, 2);
 }
 
@@ -1696,7 +1692,6 @@ static void tcg_out_qemu_ld(TCGContext* s, TCGReg 
data_reg, TCGReg addr_reg,
 
 base_reg = tcg_out_tlb_read(s, addr_reg, opc, mem_index, 1);
 
-/* We need to keep the offset unchanged for retranslation.  */
 tcg_out16(s, RI_BRC | (S390_CC_NE << 4));
 label_ptr = s->code_ptr;
 s->code_ptr += 1;
@@ -1724,7 +1719,6 @@ static void tcg_out_qemu_st(TCGContext* s, TCGReg 
data_reg, TCGReg addr_reg,
 
 base_reg = tcg_out_tlb_read(s, addr_reg, opc, mem_index, 0);
 
-/* We need to keep the offset unchanged for retranslation.  */
 tcg_out16(s, RI_BRC | (S390_CC_NE << 4));
 label_ptr = s->code_ptr;
 s->code_ptr += 1;
-- 
2.17.2




[Qemu-devel] [PULL 31/32] include: move exec/tb-hash-xx.h to qemu/xxhash.h

2018-12-13 Thread Richard Henderson
From: "Emilio G. Cota" 

Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
Signed-off-by: Richard Henderson 
---
 include/exec/tb-hash.h   | 2 +-
 include/{exec/tb-hash-xx.h => qemu/xxhash.h} | 6 +++---
 tests/qht-bench.c| 2 +-
 util/qsp.c   | 2 +-
 4 files changed, 6 insertions(+), 6 deletions(-)
 rename include/{exec/tb-hash-xx.h => qemu/xxhash.h} (97%)

diff --git a/include/exec/tb-hash.h b/include/exec/tb-hash.h
index 731ba4c272..4f3a37d927 100644
--- a/include/exec/tb-hash.h
+++ b/include/exec/tb-hash.h
@@ -20,7 +20,7 @@
 #ifndef EXEC_TB_HASH_H
 #define EXEC_TB_HASH_H
 
-#include "exec/tb-hash-xx.h"
+#include "qemu/xxhash.h"
 
 #ifdef CONFIG_SOFTMMU
 
diff --git a/include/exec/tb-hash-xx.h b/include/qemu/xxhash.h
similarity index 97%
rename from include/exec/tb-hash-xx.h
rename to include/qemu/xxhash.h
index 98ce4b628a..fe35dde328 100644
--- a/include/exec/tb-hash-xx.h
+++ b/include/qemu/xxhash.h
@@ -31,8 +31,8 @@
  * - xxHash source repository : https://github.com/Cyan4973/xxHash
  */
 
-#ifndef EXEC_TB_HASH_XX_H
-#define EXEC_TB_HASH_XX_H
+#ifndef QEMU_XXHASH_H
+#define QEMU_XXHASH_H
 
 #include "qemu/bitops.h"
 
@@ -119,4 +119,4 @@ static inline uint32_t qemu_xxhash6(uint64_t ab, uint64_t 
cd, uint32_t e,
 return qemu_xxhash7(ab, cd, e, f, 0);
 }
 
-#endif /* EXEC_TB_HASH_XX_H */
+#endif /* QEMU_XXHASH_H */
diff --git a/tests/qht-bench.c b/tests/qht-bench.c
index 0278f4da04..ab4e708180 100644
--- a/tests/qht-bench.c
+++ b/tests/qht-bench.c
@@ -9,7 +9,7 @@
 #include "qemu/atomic.h"
 #include "qemu/qht.h"
 #include "qemu/rcu.h"
-#include "exec/tb-hash-xx.h"
+#include "qemu/xxhash.h"
 
 struct thread_stats {
 size_t rd;
diff --git a/util/qsp.c b/util/qsp.c
index dc29c41fde..410f1ba004 100644
--- a/util/qsp.c
+++ b/util/qsp.c
@@ -61,7 +61,7 @@
 #include "qemu/timer.h"
 #include "qemu/qht.h"
 #include "qemu/rcu.h"
-#include "exec/tb-hash-xx.h"
+#include "qemu/xxhash.h"
 
 enum QSPType {
 QSP_MUTEX,
-- 
2.17.2




[Qemu-devel] [PULL 18/32] tcg/i386: Propagate is64 to tcg_out_qemu_ld_slow_path

2018-12-13 Thread Richard Henderson
This helps preserve the invariant that all TCG_TYPE_I32 values
are stored zero-extended in the 64-bit host registers.

Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.inc.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 6bf4f84b20..695b406b4e 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1692,7 +1692,8 @@ static inline void tcg_out_tlb_load(TCGContext *s, TCGReg 
addrlo, TCGReg addrhi,
  * Record the context of a call to the out of line helper code for the slow 
path
  * for a load or store, so that we can later generate the correct helper code
  */
-static void add_qemu_ldst_label(TCGContext *s, bool is_ld, TCGMemOpIdx oi,
+static void add_qemu_ldst_label(TCGContext *s, bool is_ld, bool is_64,
+TCGMemOpIdx oi,
 TCGReg datalo, TCGReg datahi,
 TCGReg addrlo, TCGReg addrhi,
 tcg_insn_unit *raddr,
@@ -1702,6 +1703,7 @@ static void add_qemu_ldst_label(TCGContext *s, bool 
is_ld, TCGMemOpIdx oi,
 
 label->is_ld = is_ld;
 label->oi = oi;
+label->type = is_64 ? TCG_TYPE_I64 : TCG_TYPE_I32;
 label->datalo_reg = datalo;
 label->datahi_reg = datahi;
 label->addrlo_reg = addrlo;
@@ -1722,6 +1724,7 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, 
TCGLabelQemuLdst *l)
 TCGMemOp opc = get_memop(oi);
 TCGReg data_reg;
 tcg_insn_unit **label_ptr = &l->label_ptr[0];
+int rexw = (l->type == TCG_TYPE_I64 ? P_REXW : 0);
 
 /* resolve label address */
 tcg_patch32(label_ptr[0], s->code_ptr - label_ptr[0] - 4);
@@ -1760,10 +1763,10 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, 
TCGLabelQemuLdst *l)
 data_reg = l->datalo_reg;
 switch (opc & MO_SSIZE) {
 case MO_SB:
-tcg_out_ext8s(s, data_reg, TCG_REG_EAX, P_REXW);
+tcg_out_ext8s(s, data_reg, TCG_REG_EAX, rexw);
 break;
 case MO_SW:
-tcg_out_ext16s(s, data_reg, TCG_REG_EAX, P_REXW);
+tcg_out_ext16s(s, data_reg, TCG_REG_EAX, rexw);
 break;
 #if TCG_TARGET_REG_BITS == 64
 case MO_SL:
@@ -2014,7 +2017,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg 
*args, bool is64)
 tcg_out_qemu_ld_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, is64, opc);
 
 /* Record the current context of a load into ldst label */
-add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
+add_qemu_ldst_label(s, true, is64, oi, datalo, datahi, addrlo, addrhi,
 s->code_ptr, label_ptr);
 #else
 {
@@ -2154,7 +2157,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg 
*args, bool is64)
 tcg_out_qemu_st_direct(s, datalo, datahi, TCG_REG_L1, 0, 0, opc);
 
 /* Record the current context of a store into ldst label */
-add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
+add_qemu_ldst_label(s, false, is64, oi, datalo, datahi, addrlo, addrhi,
 s->code_ptr, label_ptr);
 #else
 {
-- 
2.17.2




[Qemu-devel] [PULL 24/32] tcg: Clean up generic bswap64

2018-12-13 Thread Richard Henderson
Based on the only current user, Sparc:

New code uses 2 constants that take 2 insns to load from constant pool,
plus 13.  Old code used 6 constants that took 1 or 2 insns to create,
plus 21.  The result is a new total of 17 vs an old total of 29.
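
As an illustrative cross-check in plain C (host-side sketch; the helper
name bswap64_sketch is invented, not part of TCG):

```c
#include <stdint.h>

/* Mirrors the generated op sequence: swap adjacent bytes, then
   adjacent halfwords, then the two 32-bit halves. */
static uint64_t bswap64_sketch(uint64_t x)
{
    uint64_t t0, t1;

    t0 = (x >> 8) & 0x00ff00ff00ff00ffull;
    t1 = (x & 0x00ff00ff00ff00ffull) << 8;
    x = t0 | t1;                            /* badcfehg */

    t0 = (x >> 16) & 0x0000ffff0000ffffull;
    t1 = (x & 0x0000ffff0000ffffull) << 16;
    x = t0 | t1;                            /* dcbahgfe */

    return (x >> 32) | (x << 32);           /* hgfedcba */
}
```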

Signed-off-by: Richard Henderson 
---
 tcg/tcg-op.c | 43 ++-
 1 file changed, 18 insertions(+), 25 deletions(-)

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index a956499e46..887b371a81 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -1678,37 +1678,30 @@ void tcg_gen_bswap64_i64(TCGv_i64 ret, TCGv_i64 arg)
 } else {
 TCGv_i64 t0 = tcg_temp_new_i64();
 TCGv_i64 t1 = tcg_temp_new_i64();
+TCGv_i64 t2 = tcg_temp_new_i64();
 
-tcg_gen_shli_i64(t0, arg, 56);
+/* arg = abcdefgh */
+tcg_gen_movi_i64(t2, 0x00ff00ff00ff00ffull);
+tcg_gen_shri_i64(t0, arg, 8);   /*  t0 = .abcdefg */
+tcg_gen_and_i64(t1, arg, t2);   /*  t1 = .b.d.f.h */
+tcg_gen_and_i64(t0, t0, t2);/*  t0 = .a.c.e.g */
+tcg_gen_shli_i64(t1, t1, 8);/*  t1 = b.d.f.h. */
+tcg_gen_or_i64(ret, t0, t1);/* ret = badcfehg */
 
-tcg_gen_andi_i64(t1, arg, 0x0000ff00);
-tcg_gen_shli_i64(t1, t1, 40);
-tcg_gen_or_i64(t0, t0, t1);
+tcg_gen_movi_i64(t2, 0x0000ffff0000ffffull);
+tcg_gen_shri_i64(t0, ret, 16);  /*  t0 = ..badcfe */
+tcg_gen_and_i64(t1, ret, t2);   /*  t1 = ..dc..hg */
+tcg_gen_and_i64(t0, t0, t2);/*  t0 = ..ba..fe */
+tcg_gen_shli_i64(t1, t1, 16);   /*  t1 = dc..hg.. */
+tcg_gen_or_i64(ret, t0, t1);/* ret = dcbahgfe */
 
-tcg_gen_andi_i64(t1, arg, 0x00ff0000);
-tcg_gen_shli_i64(t1, t1, 24);
-tcg_gen_or_i64(t0, t0, t1);
+tcg_gen_shri_i64(t0, ret, 32);  /*  t0 = ....dcba */
+tcg_gen_shli_i64(t1, ret, 32);  /*  t1 = hgfe.... */
+tcg_gen_or_i64(ret, t0, t1);/* ret = hgfedcba */
 
-tcg_gen_andi_i64(t1, arg, 0xff000000);
-tcg_gen_shli_i64(t1, t1, 8);
-tcg_gen_or_i64(t0, t0, t1);
-
-tcg_gen_shri_i64(t1, arg, 8);
-tcg_gen_andi_i64(t1, t1, 0xff000000);
-tcg_gen_or_i64(t0, t0, t1);
-
-tcg_gen_shri_i64(t1, arg, 24);
-tcg_gen_andi_i64(t1, t1, 0x00ff0000);
-tcg_gen_or_i64(t0, t0, t1);
-
-tcg_gen_shri_i64(t1, arg, 40);
-tcg_gen_andi_i64(t1, t1, 0x0000ff00);
-tcg_gen_or_i64(t0, t0, t1);
-
-tcg_gen_shri_i64(t1, arg, 56);
-tcg_gen_or_i64(ret, t0, t1);
 tcg_temp_free_i64(t0);
 tcg_temp_free_i64(t1);
+tcg_temp_free_i64(t2);
 }
 }
 
-- 
2.17.2




[Qemu-devel] [PULL 16/32] tcg/s390x: Return false on failure from patch_reloc

2018-12-13 Thread Richard Henderson
This does require an extra two checks within the slow paths
to replace the assert that we're moving.  Also add two checks
within existing functions that lacked any kind of assert for
an out-of-range branch.
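
The fits-in-N-bits idiom used in these checks can be stated in plain C
(fits_int16 is an invented name for this sketch):

```c
#include <stdbool.h>
#include <stdint.h>

/* A PC-relative displacement can be patched into a 16-bit field
   only if truncating to int16_t and sign-extending back is
   lossless, i.e. the value round-trips unchanged. */
static bool fits_int16(int64_t pcrel2)
{
    return pcrel2 == (int16_t)pcrel2;
}
```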

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/s390/tcg-target.inc.c | 34 +++---
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 68a4c60394..39ecf609a1 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -377,23 +377,29 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 
 switch (type) {
 case R_390_PC16DBL:
-assert(pcrel2 == (int16_t)pcrel2);
-tcg_patch16(code_ptr, pcrel2);
+if (pcrel2 == (int16_t)pcrel2) {
+tcg_patch16(code_ptr, pcrel2);
+return true;
+}
 break;
 case R_390_PC32DBL:
-assert(pcrel2 == (int32_t)pcrel2);
-tcg_patch32(code_ptr, pcrel2);
+if (pcrel2 == (int32_t)pcrel2) {
+tcg_patch32(code_ptr, pcrel2);
+return true;
+}
 break;
 case R_390_20:
-assert(value == sextract64(value, 0, 20));
-old = *(uint32_t *)code_ptr & 0xf00000ff;
-old |= ((value & 0xfff) << 16) | ((value & 0xff000) >> 4);
-tcg_patch32(code_ptr, old);
+if (value == sextract64(value, 0, 20)) {
+old = *(uint32_t *)code_ptr & 0xf00000ff;
+old |= ((value & 0xfff) << 16) | ((value & 0xff000) >> 4);
+tcg_patch32(code_ptr, old);
+return true;
+}
 break;
 default:
 g_assert_not_reached();
 }
-return true;
+return false;
 }
 
 /* parse target specific constraints */
@@ -1334,6 +1340,7 @@ static void tgen_compare_branch(TCGContext *s, S390Opcode 
opc, int cc,
 
 if (l->has_value) {
 off = l->u.value_ptr - s->code_ptr;
+tcg_debug_assert(off == (int16_t)off);
 } else {
 tcg_out_reloc(s, s->code_ptr + 1, R_390_PC16DBL, l, 2);
 }
@@ -1350,6 +1357,7 @@ static void tgen_compare_imm_branch(TCGContext *s, 
S390Opcode opc, int cc,
 
 if (l->has_value) {
 off = l->u.value_ptr - s->code_ptr;
+tcg_debug_assert(off == (int16_t)off);
 } else {
 tcg_out_reloc(s, s->code_ptr + 1, R_390_PC16DBL, l, 2);
 }
@@ -1615,7 +1623,9 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, 
TCGLabelQemuLdst *lb)
 TCGMemOpIdx oi = lb->oi;
 TCGMemOp opc = get_memop(oi);
 
-patch_reloc(lb->label_ptr[0], R_390_PC16DBL, (intptr_t)s->code_ptr, 2);
+bool ok = patch_reloc(lb->label_ptr[0], R_390_PC16DBL,
+  (intptr_t)s->code_ptr, 2);
+tcg_debug_assert(ok);
 
 tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_R2, TCG_AREG0);
 if (TARGET_LONG_BITS == 64) {
@@ -1636,7 +1646,9 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, 
TCGLabelQemuLdst *lb)
 TCGMemOpIdx oi = lb->oi;
 TCGMemOp opc = get_memop(oi);
 
-patch_reloc(lb->label_ptr[0], R_390_PC16DBL, (intptr_t)s->code_ptr, 2);
+bool ok = patch_reloc(lb->label_ptr[0], R_390_PC16DBL,
+  (intptr_t)s->code_ptr, 2);
+tcg_debug_assert(ok);
 
 tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_R2, TCG_AREG0);
 if (TARGET_LONG_BITS == 64) {
-- 
2.17.2




[Qemu-devel] [PULL 20/32] tcg/i386: Assume 32-bit values are zero-extended

2018-12-13 Thread Richard Henderson
We now have an invariant that all TCG_TYPE_I32 values are
zero-extended, which means that we do not need to extend
them again during qemu_ld/st, either explicitly via a separate
tcg_out_ext32u or implicitly via P_ADDR32.
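
The invariant can be phrased in plain C (host_reg_from_i32 is an
invented name; on x86-64 the implicit zero-extension is what a 32-bit
MOV into a 64-bit register provides):

```c
#include <stdint.h>

/* A TCG_TYPE_I32 value held in a 64-bit host register is kept
   zero-extended, so it can be used directly as an address
   component without a further ext32u or addr32 (0x67) prefix. */
static uint64_t host_reg_from_i32(uint32_t v)
{
    return v;   /* implicit zero-extension on widening */
}
```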

Reviewed-by: Emilio G. Cota 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.inc.c | 103 +++---
 1 file changed, 40 insertions(+), 63 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index fe864e9ef9..f7b548545a 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -309,13 +309,11 @@ static inline int tcg_target_const_match(tcg_target_long 
val, TCGType type,
 #define P_EXT38 0x200   /* 0x0f 0x38 opcode prefix */
 #define P_DATA16 0x400   /* 0x66 opcode prefix */
 #if TCG_TARGET_REG_BITS == 64
-# define P_ADDR32   0x800   /* 0x67 opcode prefix */
 # define P_REXW 0x1000  /* Set REX.W = 1 */
 # define P_REXB_R   0x2000  /* REG field as byte register */
 # define P_REXB_RM  0x4000  /* R/M field as byte register */
 # define P_GS   0x8000  /* gs segment override */
 #else
-# define P_ADDR32  0
 # define P_REXW0
 # define P_REXB_R  0
 # define P_REXB_RM 0
@@ -528,9 +526,6 @@ static void tcg_out_opc(TCGContext *s, int opc, int r, int 
rm, int x)
 tcg_debug_assert((opc & P_REXW) == 0);
 tcg_out8(s, 0x66);
 }
-if (opc & P_ADDR32) {
-tcg_out8(s, 0x67);
-}
 if (opc & P_SIMDF3) {
 tcg_out8(s, 0xf3);
 } else if (opc & P_SIMDF2) {
@@ -1659,11 +1654,7 @@ static inline void tcg_out_tlb_load(TCGContext *s, 
TCGReg addrlo, TCGReg addrhi,
 tcg_out_modrm_offset(s, OPC_CMP_GvEv + trexw, r1, r0, 0);
 
 /* Prepare for both the fast path add of the tlb addend, and the slow
-   path function argument setup.  There are two cases worth note:
-   For 32-bit guest and x86_64 host, MOVL zero-extends the guest address
-   before the fastpath ADDQ below.  For 64-bit guest and x32 host, MOVQ
-   copies the entire guest address for the slow path, while truncation
-   for the 32-bit host happens with the fastpath ADDL below.  */
+   path function argument setup.  */
 tcg_out_mov(s, ttype, r1, addrlo);
 
 /* jne slow_path */
@@ -2022,41 +2013,31 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg 
*args, bool is64)
 #else
 {
 int32_t offset = guest_base;
-TCGReg base = addrlo;
 int index = -1;
 int seg = 0;
 
-/* For a 32-bit guest, the high 32 bits may contain garbage.
-   We can do this with the ADDR32 prefix if we're not using
-   a guest base, or when using segmentation.  Otherwise we
-   need to zero-extend manually.  */
+/*
+ * Recall we store 32-bit values zero-extended.  No need for
+ * further manual extension or an addr32 (0x67) prefix.
+ */
 if (guest_base == 0 || guest_base_flags) {
 seg = guest_base_flags;
 offset = 0;
-if (TCG_TARGET_REG_BITS > TARGET_LONG_BITS) {
-seg |= P_ADDR32;
-}
-} else if (TCG_TARGET_REG_BITS == 64) {
-if (TARGET_LONG_BITS == 32) {
-tcg_out_ext32u(s, TCG_REG_L0, base);
-base = TCG_REG_L0;
-}
-if (offset != guest_base) {
-tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_L1, guest_base);
-index = TCG_REG_L1;
-offset = 0;
-}
+} else if (TCG_TARGET_REG_BITS == 64 && offset != guest_base) {
+tcg_out_movi(s, TCG_TYPE_I64, TCG_REG_L1, guest_base);
+index = TCG_REG_L1;
+offset = 0;
 }
 
 tcg_out_qemu_ld_direct(s, datalo, datahi,
-   base, index, offset, seg, is64, opc);
+   addrlo, index, offset, seg, is64, opc);
 }
 #endif
 }
 
 static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
-   TCGReg base, intptr_t ofs, int seg,
-   TCGMemOp memop)
+   TCGReg base, int index, intptr_t ofs,
+   int seg, TCGMemOp memop)
 {
 /* ??? Ideally we wouldn't need a scratch register.  For user-only,
we could perform the bswap twice to restore the original value
@@ -2080,8 +2061,8 @@ static void tcg_out_qemu_st_direct(TCGContext *s, TCGReg 
datalo, TCGReg datahi,
 tcg_out_mov(s, TCG_TYPE_I32, scratch, datalo);
 datalo = scratch;
 }
-tcg_out_modrm_offset(s, OPC_MOVB_EvGv + P_REXB_R + seg,
- datalo, base, ofs);
+tcg_out_modrm_sib_offset(s, OPC_MOVB_EvGv + P_REXB_R + seg,
+ datalo, base, index, 0, ofs);
 break;
  

[Qemu-devel] [PULL 22/32] tcg/i386: Add setup_guest_base_seg for FreeBSD

2018-12-13 Thread Richard Henderson
Reviewed-by: Emilio G. Cota 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.inc.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 3fb2f4b971..c21c3272f2 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1876,6 +1876,15 @@ static inline int setup_guest_base_seg(void)
 }
 return 0;
 }
+# elif defined (__FreeBSD__) || defined (__FreeBSD_kernel__)
+#  include <machine/sysarch.h>
+static inline int setup_guest_base_seg(void)
+{
+if (sysarch(AMD64_SET_GSBASE, &guest_base) == 0) {
+return P_GS;
+}
+return 0;
+}
 # else
 static inline int setup_guest_base_seg(void)
 {
-- 
2.17.2




[Qemu-devel] [PULL 06/32] tcg/arm: Fold away "noaddr" branch routines

2018-12-13 Thread Richard Henderson
There is one use apiece for these.  There is no longer a need for
preserving branch offset operands, as we no longer re-translate.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/arm/tcg-target.inc.c | 22 +++---
 1 file changed, 3 insertions(+), 19 deletions(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 1142eb13ad..1651f00281 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -366,22 +366,6 @@ static inline void tcg_out_b(TCGContext *s, int cond, 
int32_t offset)
 (((offset - 8) >> 2) & 0x00ff));
 }
 
-static inline void tcg_out_b_noaddr(TCGContext *s, int cond)
-{
-/* We pay attention here to not modify the branch target by masking
-   the corresponding bytes.  This ensure that caches and memory are
-   kept coherent during retranslation. */
-tcg_out32(s, deposit32(*s->code_ptr, 24, 8, (cond << 4) | 0x0a));
-}
-
-static inline void tcg_out_bl_noaddr(TCGContext *s, int cond)
-{
-/* We pay attention here to not modify the branch target by masking
-   the corresponding bytes.  This ensure that caches and memory are
-   kept coherent during retranslation. */
-tcg_out32(s, deposit32(*s->code_ptr, 24, 8, (cond << 4) | 0x0b));
-}
-
 static inline void tcg_out_bl(TCGContext *s, int cond, int32_t offset)
 {
 tcg_out32(s, (cond << 28) | 0x0b00 |
@@ -1082,7 +1066,7 @@ static inline void tcg_out_goto_label(TCGContext *s, int 
cond, TCGLabel *l)
 tcg_out_goto(s, cond, l->u.value_ptr);
 } else {
 tcg_out_reloc(s, s->code_ptr, R_ARM_PC24, l, 0);
-tcg_out_b_noaddr(s, cond);
+tcg_out_b(s, cond, 0);
 }
 }
 
@@ -1628,7 +1612,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg 
*args, bool is64)
 /* This a conditional BL only to load a pointer within this opcode into LR
for the slow path.  We will not be using the value for a tail call.  */
 label_ptr = s->code_ptr;
-tcg_out_bl_noaddr(s, COND_NE);
+tcg_out_bl(s, COND_NE, 0);
 
 tcg_out_qemu_ld_index(s, opc, datalo, datahi, addrlo, addend);
 
@@ -1760,7 +1744,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg 
*args, bool is64)
 
 /* The conditional call must come last, as we're going to return here.  */
 label_ptr = s->code_ptr;
-tcg_out_bl_noaddr(s, COND_NE);
+tcg_out_bl(s, COND_NE, 0);
 
 add_qemu_ldst_label(s, false, oi, datalo, datahi, addrlo, addrhi,
 s->code_ptr, label_ptr);
-- 
2.17.2




[Qemu-devel] [PULL 29/32] qht-bench: document -p flag

2018-12-13 Thread Richard Henderson
From: "Emilio G. Cota" 

Which we forgot to do in bd224fce60 ("qht-bench: add -p flag
to precompute hash values", 2018-09-26).

Reviewed-by: Alex Bennée 
Signed-off-by: Emilio G. Cota 
Signed-off-by: Richard Henderson 
---
 tests/qht-bench.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qht-bench.c b/tests/qht-bench.c
index 2089e2bed1..636750d39f 100644
--- a/tests/qht-bench.c
+++ b/tests/qht-bench.c
@@ -72,6 +72,7 @@ static const char commands_string[] =
 " -n = number of threads\n"
 "\n"
 " -o = offset at which keys start\n"
+" -p = precompute hashes\n"
 "\n"
 " -g = set -s,-k,-K,-l,-r to the same value\n"
 " -s = initial size hint\n"
-- 
2.17.2




[Qemu-devel] [PULL 04/32] tcg/aarch64: Fold away "noaddr" branch routines

2018-12-13 Thread Richard Henderson
There is one use apiece for these.  There is no longer a need for
preserving branch offset operands, as we no longer re-translate.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c | 21 ++---
 1 file changed, 2 insertions(+), 19 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index a41b633960..28de0226fb 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -1129,23 +1129,6 @@ static inline void tcg_out_goto_long(TCGContext *s, 
tcg_insn_unit *target)
 }
 }
 
-static inline void tcg_out_goto_noaddr(TCGContext *s)
-{
-/* We pay attention here to not modify the branch target by reading from
-   the buffer. This ensure that caches and memory are kept coherent during
-   retranslation.  Mask away possible garbage in the high bits for the
-   first translation, while keeping the offset bits for retranslation. */
-uint32_t old = tcg_in32(s);
-tcg_out_insn(s, 3206, B, old);
-}
-
-static inline void tcg_out_goto_cond_noaddr(TCGContext *s, TCGCond c)
-{
-/* See comments in tcg_out_goto_noaddr.  */
-uint32_t old = tcg_in32(s) >> 5;
-tcg_out_insn(s, 3202, B_C, c, old);
-}
-
 static inline void tcg_out_callr(TCGContext *s, TCGReg reg)
 {
 tcg_out_insn(s, 3207, BLR, reg);
@@ -1192,7 +1175,7 @@ static inline void tcg_out_goto_label(TCGContext *s, 
TCGLabel *l)
 {
 if (!l->has_value) {
 tcg_out_reloc(s, s->code_ptr, R_AARCH64_JUMP26, l, 0);
-tcg_out_goto_noaddr(s);
+tcg_out_insn(s, 3206, B, 0);
 } else {
 tcg_out_goto(s, l->u.value_ptr);
 }
@@ -1523,7 +1506,7 @@ static void tcg_out_tlb_read(TCGContext *s, TCGReg 
addr_reg, TCGMemOp opc,
 
 /* If not equal, we jump to the slow path. */
 *label_ptr = s->code_ptr;
-tcg_out_goto_cond_noaddr(s, TCG_COND_NE);
+tcg_out_insn(s, 3202, B_C, TCG_COND_NE, 0);
 }
 
 #endif /* CONFIG_SOFTMMU */
-- 
2.17.2




[Qemu-devel] [PULL 15/32] tcg/ppc: Return false on failure from patch_reloc

2018-12-13 Thread Richard Henderson
The reloc_pc{14,24}_val routines retain their asserts.
Use these directly within the slow paths.
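
The patching pattern can be sketched in plain C (reloc_pc14_sketch is
an invented name; the 0x41820000 test value below is a hypothetical
branch encoding, used only to show the field masking):

```c
#include <stdbool.h>
#include <stdint.h>

/* The 14-bit conditional-branch displacement occupies bits 2..15
   of the insn word, so patching succeeds only when the byte
   displacement fits in a signed 16 bits. */
static bool reloc_pc14_sketch(uint32_t *insn, intptr_t disp)
{
    if (disp != (int16_t)disp) {
        return false;            /* out of range: caller must cope */
    }
    *insn = (*insn & ~0xfffcu) | (disp & 0xfffc);
    return true;
}
```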

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.inc.c | 32 +---
 1 file changed, 21 insertions(+), 11 deletions(-)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 860b0d36e1..8c1cfdd7ac 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -193,9 +193,14 @@ static uint32_t reloc_pc24_val(tcg_insn_unit *pc, 
tcg_insn_unit *target)
 return disp & 0x3fc;
 }
 
-static void reloc_pc24(tcg_insn_unit *pc, tcg_insn_unit *target)
+static bool reloc_pc24(tcg_insn_unit *pc, tcg_insn_unit *target)
 {
-*pc = (*pc & ~0x3fc) | reloc_pc24_val(pc, target);
+ptrdiff_t disp = tcg_ptr_byte_diff(target, pc);
+if (in_range_b(disp)) {
+*pc = (*pc & ~0x3fc) | (disp & 0x3fc);
+return true;
+}
+return false;
 }
 
 static uint16_t reloc_pc14_val(tcg_insn_unit *pc, tcg_insn_unit *target)
@@ -205,9 +210,14 @@ static uint16_t reloc_pc14_val(tcg_insn_unit *pc, 
tcg_insn_unit *target)
 return disp & 0xfffc;
 }
 
-static void reloc_pc14(tcg_insn_unit *pc, tcg_insn_unit *target)
+static bool reloc_pc14(tcg_insn_unit *pc, tcg_insn_unit *target)
 {
-*pc = (*pc & ~0xfffc) | reloc_pc14_val(pc, target);
+ptrdiff_t disp = tcg_ptr_byte_diff(target, pc);
+if (disp == (int16_t) disp) {
+*pc = (*pc & ~0xfffc) | (disp & 0xfffc);
+return true;
+}
+return false;
 }
 
 /* parse target specific constraints */
@@ -524,11 +534,9 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 
 switch (type) {
 case R_PPC_REL14:
-reloc_pc14(code_ptr, target);
-break;
+return reloc_pc14(code_ptr, target);
 case R_PPC_REL24:
-reloc_pc24(code_ptr, target);
-break;
+return reloc_pc24(code_ptr, target);
 case R_PPC_ADDR16:
 /* We are abusing this relocation type.  This points to a pair
of insns, addis + load.  If the displacement is small, we
@@ -540,7 +548,9 @@ static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 } else {
 int16_t lo = value;
 int hi = value - lo;
-assert(hi + lo == value);
+if (hi + lo != value) {
+return false;
+}
 code_ptr[0] = deposit32(code_ptr[0], 0, 16, hi >> 16);
 code_ptr[1] = deposit32(code_ptr[1], 0, 16, lo);
 }
@@ -1638,7 +1648,7 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, 
TCGLabelQemuLdst *lb)
 TCGMemOp opc = get_memop(oi);
 TCGReg hi, lo, arg = TCG_REG_R3;
 
-reloc_pc14(lb->label_ptr[0], s->code_ptr);
+**lb->label_ptr |= reloc_pc14_val(*lb->label_ptr, s->code_ptr);
 
 tcg_out_mov(s, TCG_TYPE_PTR, arg++, TCG_AREG0);
 
@@ -1683,7 +1693,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 TCGMemOp s_bits = opc & MO_SIZE;
 TCGReg hi, lo, arg = TCG_REG_R3;
 
-reloc_pc14(lb->label_ptr[0], s->code_ptr);
+**lb->label_ptr |= reloc_pc14_val(*lb->label_ptr, s->code_ptr);
 
 tcg_out_mov(s, TCG_TYPE_PTR, arg++, TCG_AREG0);
 
-- 
2.17.2




[Qemu-devel] [PULL 19/32] tcg/i386: Implement INDEX_op_extr{lh}_i64_i32 for 32-bit guests

2018-12-13 Thread Richard Henderson
This preserves the invariant that all TCG_TYPE_I32 values are
zero-extended in the 64-bit host register.
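The invariant is easy to state outside of TCG: when a 32-bit value lives in a 64-bit host register, its upper half must be zero, and both new opcodes preserve that by construction. A hedged C sketch (helper names are illustrative, not QEMU's API):

```c
#include <stdint.h>

/* Sketch only: model a TCG_TYPE_I32 value held in a 64-bit host
 * register as a uint64_t whose upper 32 bits must stay clear.
 * extrl is then a plain 32-bit move and extrh a logical shift
 * right by 32 -- both leave the result zero-extended. */
static uint64_t extrl_i64_i32(uint64_t v)
{
    return (uint32_t)v;   /* low half; the cast clears the upper bits */
}

static uint64_t extrh_i64_i32(uint64_t v)
{
    return v >> 32;       /* logical shift; upper bits become zero */
}
```

This mirrors the single instruction the backend emits for each case (a 32-bit mov for extrl, a shift for extrh).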

Reviewed-by: Emilio G. Cota 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.h | 5 +++--
 tcg/i386/tcg-target.inc.c | 6 ++
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 2441658865..c523d5f5e1 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -135,8 +135,9 @@ extern bool have_avx2;
 #define TCG_TARGET_HAS_direct_jump  1
 
 #if TCG_TARGET_REG_BITS == 64
-#define TCG_TARGET_HAS_extrl_i64_i32    0
-#define TCG_TARGET_HAS_extrh_i64_i32    0
+/* Keep target addresses zero-extended in a register.  */
+#define TCG_TARGET_HAS_extrl_i64_i32    (TARGET_LONG_BITS == 32)
+#define TCG_TARGET_HAS_extrh_i64_i32    (TARGET_LONG_BITS == 32)
 #define TCG_TARGET_HAS_div2_i64         1
 #define TCG_TARGET_HAS_rot_i64          1
 #define TCG_TARGET_HAS_ext8s_i64        1
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 695b406b4e..fe864e9ef9 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -2549,12 +2549,16 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
 break;
 case INDEX_op_extu_i32_i64:
 case INDEX_op_ext32u_i64:
+case INDEX_op_extrl_i64_i32:
 tcg_out_ext32u(s, a0, a1);
 break;
 case INDEX_op_ext_i32_i64:
 case INDEX_op_ext32s_i64:
 tcg_out_ext32s(s, a0, a1);
 break;
+case INDEX_op_extrh_i64_i32:
+tcg_out_shifti(s, SHIFT_SHR + P_REXW, a0, 32);
+break;
 #endif
 
 OP_32_64(deposit):
@@ -2918,6 +2922,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 case INDEX_op_neg_i64:
 case INDEX_op_not_i32:
 case INDEX_op_not_i64:
+case INDEX_op_extrh_i64_i32:
return &r_0;
 
 case INDEX_op_ext8s_i32:
@@ -2933,6 +2938,7 @@ static const TCGTargetOpDef *tcg_target_op_def(TCGOpcode op)
 case INDEX_op_ext32u_i64:
 case INDEX_op_ext_i32_i64:
 case INDEX_op_extu_i32_i64:
+case INDEX_op_extrl_i64_i32:
 case INDEX_op_extract_i32:
 case INDEX_op_extract_i64:
 case INDEX_op_sextract_i32:
-- 
2.17.2




[Qemu-devel] [PULL 11/32] tcg: Return success from patch_reloc

2018-12-13 Thread Richard Henderson
This will move the assert for success from within (subroutines of)
patch_reloc into the callers.  It will also let new code do something
different when a relocation is out of range.

For the moment, all backends are trivially converted to return true.
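The shape of the new contract can be sketched in isolation (helper names below are made up for illustration): the patcher checks that a displacement fits the instruction's immediate field and reports failure instead of asserting, so a caller may choose to do something different, such as fall back to a longer code sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low 'len' bits of x, like QEMU's sextract64(). */
static int64_t sext_low(uint64_t x, int len)
{
    return (int64_t)(x << (64 - len)) >> (64 - len);
}

/* Deposit a signed 19-bit displacement at bit 5 of the insn word;
 * return false (rather than abort) when it does not fit. */
static bool apply_reloc_s19(uint32_t *insn, int64_t disp)
{
    if (disp != sext_low((uint64_t)disp, 19)) {
        return false;   /* out of range: report, don't assert */
    }
    *insn = (*insn & ~(0x7ffffu << 5)) | (((uint32_t)disp & 0x7ffff) << 5);
    return true;
}
```

The per-backend patches later in the series convert each reloc_* helper to this pattern.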

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c | 3 ++-
 tcg/arm/tcg-target.inc.c | 3 ++-
 tcg/i386/tcg-target.inc.c| 3 ++-
 tcg/mips/tcg-target.inc.c| 3 ++-
 tcg/ppc/tcg-target.inc.c | 3 ++-
 tcg/s390/tcg-target.inc.c| 3 ++-
 tcg/sparc/tcg-target.inc.c   | 5 +++--
 tcg/tcg.c| 8 +---
 tcg/tci/tcg-target.inc.c | 3 ++-
 9 files changed, 22 insertions(+), 12 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 28de0226fb..16f08c59c4 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -94,7 +94,7 @@ static inline void reloc_pc19(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 *code_ptr = deposit32(*code_ptr, 5, 19, offset);
 }
 
-static inline void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static inline bool patch_reloc(tcg_insn_unit *code_ptr, int type,
intptr_t value, intptr_t addend)
 {
 tcg_debug_assert(addend == 0);
@@ -109,6 +109,7 @@ static inline void patch_reloc(tcg_insn_unit *code_ptr, int type,
 default:
 tcg_abort();
 }
+return true;
 }
 
 #define TCG_CT_CONST_AIMM 0x100
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index 1651f00281..deefa20fbf 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -193,7 +193,7 @@ static inline void reloc_pc24(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 *code_ptr = (*code_ptr & ~0xffffff) | (offset & 0xffffff);
 }
 
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 intptr_t value, intptr_t addend)
 {
 tcg_debug_assert(addend == 0);
@@ -229,6 +229,7 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 } else {
 g_assert_not_reached();
 }
+return true;
 }
 
 #define TCG_CT_CONST_ARM  0x100
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 436195894b..5c88f1f36b 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -167,7 +167,7 @@ static bool have_lzcnt;
 
 static tcg_insn_unit *tb_ret_addr;
 
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 intptr_t value, intptr_t addend)
 {
 value += addend;
@@ -191,6 +191,7 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 default:
 tcg_abort();
 }
+return true;
 }
 
 #if TCG_TARGET_REG_BITS == 64
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index e21cb1ae28..a06ff257fa 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -168,12 +168,13 @@ static inline void reloc_26(tcg_insn_unit *pc, tcg_insn_unit *target)
 *pc = deposit32(*pc, 0, 26, reloc_26_val(pc, target));
 }
 
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 intptr_t value, intptr_t addend)
 {
 tcg_debug_assert(type == R_MIPS_PC16);
 tcg_debug_assert(addend == 0);
 reloc_pc16(code_ptr, (tcg_insn_unit *)value);
+return true;
 }
 
 #define TCG_CT_CONST_ZERO 0x100
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index 2e2a22f579..860b0d36e1 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -513,7 +513,7 @@ static const uint32_t tcg_to_isel[] = {
 [TCG_COND_GTU] = ISEL | BC_(7, CR_GT),
 };
 
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 intptr_t value, intptr_t addend)
 {
 tcg_insn_unit *target;
@@ -548,6 +548,7 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 default:
 g_assert_not_reached();
 }
+return true;
 }
 
 static void tcg_out_mem_long(TCGContext *s, int opi, int opx, TCGReg rt,
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index 96c344142e..68a4c60394 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -366,7 +366,7 @@ static void * const qemu_st_helpers[16] = {
 static tcg_insn_unit *tb_ret_addr;
 uint64_t s390_facilities;
 
-static void patch_reloc(tcg_insn_unit *code_ptr, int type,
+static bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 intptr_t value, intptr_t addend)
 {
 intptr_t pcrel2;
@@ -393,6 +393,7 @@ static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 default:
 g_assert_not_reached();
 }
+return true;
 }
 
 /* parse target specific constraints */
diff --git a/tcg/sparc/tcg-target.inc.c 

[Qemu-devel] [PULL 00/32] tcg patch queue

2018-12-13 Thread Richard Henderson
The following changes since commit 2d894e48362ad2a576fca929dcca1787f43a8af6:

  Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging (2018-12-13 17:50:45 +)

are available in the Git repository at:

  https://github.com/rth7680/qemu.git tags/pull-tcg-20181213

for you to fetch changes up to 99f70ba5b6b4566509b2069a8d29c6686b8115de:

  xxhash: match output against the original xxhash32 (2018-12-13 18:56:11 -0600)


- Remove retranslation remnants
- Return success from patch_reloc
- Preserve 32-bit values as zero-extended on x86_64
- Make bswap during memory ops as optional
- Cleanup xxhash


Alistair Francis (1):
  tcg/mips: Improve the add2/sub2 command to use TCG_TARGET_REG_BITS

Emilio G. Cota (5):
  tcg: Drop nargs from tcg_op_insert_{before,after}
  qht-bench: document -p flag
  exec: introduce qemu_xxhash{2,4,5,6,7}
  include: move exec/tb-hash-xx.h to qemu/xxhash.h
  xxhash: match output against the original xxhash32

Richard Henderson (26):
  tcg/i386: Always use %ebp for TCG_AREG0
  tcg/i386: Move TCG_REG_CALL_STACK from define to enum
  tcg/aarch64: Remove reloc_pc26_atomic
  tcg/aarch64: Fold away "noaddr" branch routines
  tcg/arm: Remove reloc_pc24_atomic
  tcg/arm: Fold away "noaddr" branch routines
  tcg/ppc: Fold away "noaddr" branch routines
  tcg/s390: Remove retranslation code
  tcg/sparc: Remove retranslation code
  tcg/mips: Remove retranslation code
  tcg: Return success from patch_reloc
  tcg/i386: Return false on failure from patch_reloc
  tcg/aarch64: Return false on failure from patch_reloc
  tcg/arm: Return false on failure from patch_reloc
  tcg/ppc: Return false on failure from patch_reloc
  tcg/s390x: Return false on failure from patch_reloc
  tcg/i386: Propagate is64 to tcg_out_qemu_ld_direct
  tcg/i386: Propagate is64 to tcg_out_qemu_ld_slow_path
  tcg/i386: Implement INDEX_op_extr{lh}_i64_i32 for 32-bit guests
  tcg/i386: Assume 32-bit values are zero-extended
  tcg/i386: Precompute all guest_base parameters
  tcg/i386: Add setup_guest_base_seg for FreeBSD
  tcg: Clean up generic bswap32
  tcg: Clean up generic bswap64
  tcg/optimize: Optimize bswap
  tcg: Add TCG_TARGET_HAS_MEMORY_BSWAP

 include/exec/tb-hash.h   |   4 +-
 include/{exec/tb-hash-xx.h => qemu/xxhash.h} |  47 --
 tcg/aarch64/tcg-target.h |   1 +
 tcg/arm/tcg-target.h |   1 +
 tcg/i386/tcg-target.h|  17 +--
 tcg/mips/tcg-target.h|   1 +
 tcg/ppc/tcg-target.h |   1 +
 tcg/s390/tcg-target.h|   1 +
 tcg/sparc/tcg-target.h   |   1 +
 tcg/tcg.h|   4 +-
 tcg/tci/tcg-target.h |   2 +
 tcg/aarch64/tcg-target.inc.c |  71 +++--
 tcg/arm/tcg-target.inc.c |  55 +++
 tcg/i386/tcg-target.inc.c| 208 --
 tcg/mips/tcg-target.inc.c|  12 +-
 tcg/optimize.c   |  16 +-
 tcg/ppc/tcg-target.inc.c |  60 
 tcg/s390/tcg-target.inc.c|  45 +++---
 tcg/sparc/tcg-target.inc.c   |  13 +-
 tcg/tcg-op.c | 215 ---
 tcg/tcg.c|  18 +--
 tcg/tci/tcg-target.inc.c |   3 +-
 tests/qht-bench.c|   5 +-
 util/qsp.c   |  14 +-
 24 files changed, 452 insertions(+), 363 deletions(-)
 rename include/{exec/tb-hash-xx.h => qemu/xxhash.h} (73%)



[Qemu-devel] [PULL 17/32] tcg/i386: Propagate is64 to tcg_out_qemu_ld_direct

2018-12-13 Thread Richard Henderson
This helps preserve the invariant that all TCG_TYPE_I32 values
are stored zero-extended in the 64-bit host registers.

Reviewed-by: Emilio G. Cota 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.inc.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 28192f4608..6bf4f84b20 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -1883,10 +1883,11 @@ static inline void setup_guest_base_seg(void) { }
 
 static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
TCGReg base, int index, intptr_t ofs,
-   int seg, TCGMemOp memop)
+   int seg, bool is64, TCGMemOp memop)
 {
 const TCGMemOp real_bswap = memop & MO_BSWAP;
 TCGMemOp bswap = real_bswap;
+int rexw = is64 * P_REXW;
 int movop = OPC_MOVL_GvEv;
 
 if (have_movbe && real_bswap) {
@@ -1900,7 +1901,7 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
  base, index, 0, ofs);
 break;
 case MO_SB:
-tcg_out_modrm_sib_offset(s, OPC_MOVSBL + P_REXW + seg, datalo,
+tcg_out_modrm_sib_offset(s, OPC_MOVSBL + rexw + seg, datalo,
  base, index, 0, ofs);
 break;
 case MO_UW:
@@ -1920,9 +1921,9 @@ static void tcg_out_qemu_ld_direct(TCGContext *s, TCGReg datalo, TCGReg datahi,
  base, index, 0, ofs);
 tcg_out_rolw_8(s, datalo);
 }
-tcg_out_modrm(s, OPC_MOVSWL + P_REXW, datalo, datalo);
+tcg_out_modrm(s, OPC_MOVSWL + rexw, datalo, datalo);
 } else {
-tcg_out_modrm_sib_offset(s, OPC_MOVSWL + P_REXW + seg,
+tcg_out_modrm_sib_offset(s, OPC_MOVSWL + rexw + seg,
  datalo, base, index, 0, ofs);
 }
 break;
@@ -2010,7 +2011,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
  label_ptr, offsetof(CPUTLBEntry, addr_read));
 
 /* TLB Hit.  */
-tcg_out_qemu_ld_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, opc);
+tcg_out_qemu_ld_direct(s, datalo, datahi, TCG_REG_L1, -1, 0, 0, is64, opc);
 
 /* Record the current context of a load into ldst label */
 add_qemu_ldst_label(s, true, oi, datalo, datahi, addrlo, addrhi,
@@ -2045,7 +2046,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is64)
 }
 
 tcg_out_qemu_ld_direct(s, datalo, datahi,
-   base, index, offset, seg, opc);
+   base, index, offset, seg, is64, opc);
 }
 #endif
 }
-- 
2.17.2




[Qemu-devel] [PULL 13/32] tcg/aarch64: Return false on failure from patch_reloc

2018-12-13 Thread Richard Henderson
This does require an extra two checks within the slow paths
to replace the assert that we're moving.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c | 37 
 1 file changed, 21 insertions(+), 16 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 16f08c59c4..0562e0aa40 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -78,20 +78,26 @@ static const int tcg_target_call_oarg_regs[1] = {
 #define TCG_REG_GUEST_BASE TCG_REG_X28
 #endif
 
-static inline void reloc_pc26(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
+static inline bool reloc_pc26(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 {
 ptrdiff_t offset = target - code_ptr;
-tcg_debug_assert(offset == sextract64(offset, 0, 26));
-/* read instruction, mask away previous PC_REL26 parameter contents,
-   set the proper offset, then write back the instruction. */
-*code_ptr = deposit32(*code_ptr, 0, 26, offset);
+if (offset == sextract64(offset, 0, 26)) {
+/* read instruction, mask away previous PC_REL26 parameter contents,
+   set the proper offset, then write back the instruction. */
+*code_ptr = deposit32(*code_ptr, 0, 26, offset);
+return true;
+}
+return false;
 }
 
-static inline void reloc_pc19(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
+static inline bool reloc_pc19(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 {
 ptrdiff_t offset = target - code_ptr;
-tcg_debug_assert(offset == sextract64(offset, 0, 19));
-*code_ptr = deposit32(*code_ptr, 5, 19, offset);
+if (offset == sextract64(offset, 0, 19)) {
+*code_ptr = deposit32(*code_ptr, 5, 19, offset);
+return true;
+}
+return false;
 }
 
 static inline bool patch_reloc(tcg_insn_unit *code_ptr, int type,
@@ -101,15 +107,12 @@ static inline bool patch_reloc(tcg_insn_unit *code_ptr, int type,
 switch (type) {
 case R_AARCH64_JUMP26:
 case R_AARCH64_CALL26:
-reloc_pc26(code_ptr, (tcg_insn_unit *)value);
-break;
+return reloc_pc26(code_ptr, (tcg_insn_unit *)value);
 case R_AARCH64_CONDBR19:
-reloc_pc19(code_ptr, (tcg_insn_unit *)value);
-break;
+return reloc_pc19(code_ptr, (tcg_insn_unit *)value);
 default:
-tcg_abort();
+g_assert_not_reached();
 }
-return true;
 }
 
 #define TCG_CT_CONST_AIMM 0x100
@@ -1387,7 +1390,8 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 TCGMemOp opc = get_memop(oi);
 TCGMemOp size = opc & MO_SIZE;
 
-reloc_pc19(lb->label_ptr[0], s->code_ptr);
+bool ok = reloc_pc19(lb->label_ptr[0], s->code_ptr);
+tcg_debug_assert(ok);
 
 tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_X0, TCG_AREG0);
 tcg_out_mov(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
@@ -1409,7 +1413,8 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 TCGMemOp opc = get_memop(oi);
 TCGMemOp size = opc & MO_SIZE;
 
-reloc_pc19(lb->label_ptr[0], s->code_ptr);
+bool ok = reloc_pc19(lb->label_ptr[0], s->code_ptr);
+tcg_debug_assert(ok);
 
 tcg_out_mov(s, TCG_TYPE_PTR, TCG_REG_X0, TCG_AREG0);
 tcg_out_mov(s, TARGET_LONG_BITS == 64, TCG_REG_X1, lb->addrlo_reg);
-- 
2.17.2




[Qemu-devel] [PULL 09/32] tcg/sparc: Remove retranslation code

2018-12-13 Thread Richard Henderson
There is no longer a need for preserving branch offset operands,
as we no longer re-translate.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/sparc/tcg-target.inc.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 04bdc3df5e..671a04c54b 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -639,13 +639,11 @@ static void tcg_out_bpcc0(TCGContext *s, int scond, int flags, int off19)
 
 static void tcg_out_bpcc(TCGContext *s, int scond, int flags, TCGLabel *l)
 {
-int off19;
+int off19 = 0;
 
 if (l->has_value) {
 off19 = INSN_OFF19(tcg_pcrel_diff(s, l->u.value_ptr));
 } else {
-/* Make sure to preserve destinations during retranslation.  */
-off19 = *s->code_ptr & INSN_OFF19(-1);
 tcg_out_reloc(s, s->code_ptr, R_SPARC_WDISP19, l, 0);
 }
 tcg_out_bpcc0(s, scond, flags, off19);
@@ -685,13 +683,11 @@ static void tcg_out_brcond_i64(TCGContext *s, TCGCond cond, TCGReg arg1,
 {
 /* For 64-bit signed comparisons vs zero, we can avoid the compare.  */
 if (arg2 == 0 && !is_unsigned_cond(cond)) {
-int off16;
+int off16 = 0;
 
 if (l->has_value) {
 off16 = INSN_OFF16(tcg_pcrel_diff(s, l->u.value_ptr));
 } else {
-/* Make sure to preserve destinations during retranslation.  */
-off16 = *s->code_ptr & INSN_OFF16(-1);
 tcg_out_reloc(s, s->code_ptr, R_SPARC_WDISP16, l, 0);
 }
 tcg_out32(s, INSN_OP(0) | INSN_OP2(3) | BPR_PT | INSN_RS1(arg1)
-- 
2.17.2




[Qemu-devel] [PULL 07/32] tcg/ppc: Fold away "noaddr" branch routines

2018-12-13 Thread Richard Henderson
There is no longer a need for preserving branch offset operands,
as we no longer re-translate.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/ppc/tcg-target.inc.c | 25 +++--
 1 file changed, 7 insertions(+), 18 deletions(-)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index c2f729ee8f..2e2a22f579 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -210,18 +210,6 @@ static void reloc_pc14(tcg_insn_unit *pc, tcg_insn_unit *target)
 *pc = (*pc & ~0xfffc) | reloc_pc14_val(pc, target);
 }
 
-static inline void tcg_out_b_noaddr(TCGContext *s, int insn)
-{
-unsigned retrans = *s->code_ptr & 0x3fffffc;
-tcg_out32(s, insn | retrans);
-}
-
-static inline void tcg_out_bc_noaddr(TCGContext *s, int insn)
-{
-unsigned retrans = *s->code_ptr & 0xfffc;
-tcg_out32(s, insn | retrans);
-}
-
 /* parse target specific constraints */
 static const char *target_parse_constraint(TCGArgConstraint *ct,
const char *ct_str, TCGType type)
@@ -1179,11 +1167,11 @@ static void tcg_out_setcond(TCGContext *s, TCGType type, TCGCond cond,
 static void tcg_out_bc(TCGContext *s, int bc, TCGLabel *l)
 {
 if (l->has_value) {
-tcg_out32(s, bc | reloc_pc14_val(s->code_ptr, l->u.value_ptr));
+bc |= reloc_pc14_val(s->code_ptr, l->u.value_ptr);
 } else {
 tcg_out_reloc(s, s->code_ptr, R_PPC_REL14, l, 0);
-tcg_out_bc_noaddr(s, bc);
 }
+tcg_out32(s, bc);
 }
 
 static void tcg_out_brcond(TCGContext *s, TCGCond cond,
@@ -1771,7 +1759,7 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, bool is_64)
 
 /* Load a pointer into the current opcode w/conditional branch-link. */
 label_ptr = s->code_ptr;
-tcg_out_bc_noaddr(s, BC | BI(7, CR_EQ) | BO_COND_FALSE | LK);
+tcg_out32(s, BC | BI(7, CR_EQ) | BO_COND_FALSE | LK);
 
 rbase = TCG_REG_R3;
 #else  /* !CONFIG_SOFTMMU */
@@ -1846,7 +1834,7 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, bool is_64)
 
 /* Load a pointer into the current opcode w/conditional branch-link. */
 label_ptr = s->code_ptr;
-tcg_out_bc_noaddr(s, BC | BI(7, CR_EQ) | BO_COND_FALSE | LK);
+tcg_out32(s, BC | BI(7, CR_EQ) | BO_COND_FALSE | LK);
 
 rbase = TCG_REG_R3;
 #else  /* !CONFIG_SOFTMMU */
@@ -2044,13 +2032,14 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
 case INDEX_op_br:
 {
 TCGLabel *l = arg_label(args[0]);
+uint32_t insn = B;
 
 if (l->has_value) {
-tcg_out_b(s, 0, l->u.value_ptr);
+insn |= reloc_pc24_val(s->code_ptr, l->u.value_ptr);
 } else {
 tcg_out_reloc(s, s->code_ptr, R_PPC_REL24, l, 0);
-tcg_out_b_noaddr(s, B);
 }
+tcg_out32(s, insn);
 }
 break;
 case INDEX_op_ld8u_i32:
-- 
2.17.2




[Qemu-devel] [PULL 05/32] tcg/arm: Remove reloc_pc24_atomic

2018-12-13 Thread Richard Henderson
It is unused since 3fb53fb4d12f2e7833bd1659e6013237b130ef20.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/arm/tcg-target.inc.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index e1fbf465cb..1142eb13ad 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -193,14 +193,6 @@ static inline void reloc_pc24(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 *code_ptr = (*code_ptr & ~0xffffff) | (offset & 0xffffff);
 }
 
-static inline void reloc_pc24_atomic(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
-{
-ptrdiff_t offset = (tcg_ptr_byte_diff(target, code_ptr) - 8) >> 2;
-tcg_insn_unit insn = atomic_read(code_ptr);
-tcg_debug_assert(offset == sextract32(offset, 0, 24));
-atomic_set(code_ptr, deposit32(insn, 0, 24, offset));
-}
-
 static void patch_reloc(tcg_insn_unit *code_ptr, int type,
 intptr_t value, intptr_t addend)
 {
-- 
2.17.2




[Qemu-devel] [PATCH] fixup! target/arm: Move id_aa64mmfr* to ARMISARegisters

2018-12-13 Thread Richard Henderson
I didn't get this fix pushed back into the patch set that I actually
sent last week.  The patch is in target-arm.next, and I'm sure you
would have eventually seen the error in testing.


r~
---
 target/arm/kvm64.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/kvm64.c b/target/arm/kvm64.c
index ad83e1479c..089af9c5f0 100644
--- a/target/arm/kvm64.c
+++ b/target/arm/kvm64.c
@@ -538,9 +538,9 @@ bool kvm_arm_get_host_cpu_features(ARMHostCPUFeatures *ahcf)
   ARM64_SYS_REG(3, 0, 0, 6, 0));
 err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64isar1,
   ARM64_SYS_REG(3, 0, 0, 6, 1));
-err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64mmfr0,
+err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64mmfr0,
   ARM64_SYS_REG(3, 0, 0, 7, 0));
-err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64mmfr1,
+err |= read_sys_reg64(fdarray[2], &ahcf->isar.id_aa64mmfr1,
   ARM64_SYS_REG(3, 0, 0, 7, 1));
 
 /*
-- 
2.17.2




[Qemu-devel] [PULL 10/32] tcg/mips: Remove retranslation code

2018-12-13 Thread Richard Henderson
There is no longer a need for preserving branch offset operands,
as we no longer re-translate.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/mips/tcg-target.inc.c | 7 +--
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index cff525373b..e21cb1ae28 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -483,12 +483,7 @@ static inline void tcg_out_opc_bf64(TCGContext *s, MIPSInsn opc, MIPSInsn opm,
 static inline void tcg_out_opc_br(TCGContext *s, MIPSInsn opc,
   TCGReg rt, TCGReg rs)
 {
-/* We pay attention here to not modify the branch target by reading
-   the existing value and using it again. This ensure that caches and
-   memory are kept coherent during retranslation. */
-uint16_t offset = (uint16_t)*s->code_ptr;
-
-tcg_out_opc_imm(s, opc, rt, rs, offset);
+tcg_out_opc_imm(s, opc, rt, rs, 0);
 }
 
 /*
-- 
2.17.2




[Qemu-devel] [PULL 02/32] tcg/i386: Move TCG_REG_CALL_STACK from define to enum

2018-12-13 Thread Richard Henderson
Reviewed-by: Alex Bennée 
Reviewed-by: Emilio G. Cota 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 7488c3d869..2441658865 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -86,10 +86,10 @@ typedef enum {
 TCG_REG_RDI = TCG_REG_EDI,
 
 TCG_AREG0 = TCG_REG_EBP,
+TCG_REG_CALL_STACK = TCG_REG_ESP
 } TCGReg;
 
 /* used for function call generation */
-#define TCG_REG_CALL_STACK TCG_REG_ESP 
 #define TCG_TARGET_STACK_ALIGN 16
 #if defined(_WIN64)
 #define TCG_TARGET_CALL_STACK_OFFSET 32
-- 
2.17.2




[Qemu-devel] [PULL 01/32] tcg/i386: Always use %ebp for TCG_AREG0

2018-12-13 Thread Richard Henderson
For x86_64, this can remove a REX prefix resulting in smaller code
when manipulating globals of type i32, as we move them between backing
store via cpu_env, aka TCG_AREG0.

Reviewed-by: Alex Bennée 
Reviewed-by: Emilio G. Cota 
Signed-off-by: Richard Henderson 
---
 tcg/i386/tcg-target.h | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/tcg/i386/tcg-target.h b/tcg/i386/tcg-target.h
index 9fdf37f23c..7488c3d869 100644
--- a/tcg/i386/tcg-target.h
+++ b/tcg/i386/tcg-target.h
@@ -84,6 +84,8 @@ typedef enum {
 TCG_REG_RBP = TCG_REG_EBP,
 TCG_REG_RSI = TCG_REG_ESI,
 TCG_REG_RDI = TCG_REG_EDI,
+
+TCG_AREG0 = TCG_REG_EBP,
 } TCGReg;
 
 /* used for function call generation */
@@ -194,12 +196,6 @@ extern bool have_avx2;
 #define TCG_TARGET_extract_i64_valid(ofs, len) \
 (((ofs) == 8 && (len) == 8) || ((ofs) + (len)) == 32)
 
-#if TCG_TARGET_REG_BITS == 64
-# define TCG_AREG0 TCG_REG_R14
-#else
-# define TCG_AREG0 TCG_REG_EBP
-#endif
-
 static inline void flush_icache_range(uintptr_t start, uintptr_t stop)
 {
 }
-- 
2.17.2




[Qemu-devel] [PULL 03/32] tcg/aarch64: Remove reloc_pc26_atomic

2018-12-13 Thread Richard Henderson
It is unused since b68686bd4bfeb70040b4099df993dfa0b4f37b03.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 tcg/aarch64/tcg-target.inc.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 083592a4d7..a41b633960 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -87,18 +87,6 @@ static inline void reloc_pc26(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 *code_ptr = deposit32(*code_ptr, 0, 26, offset);
 }
 
-static inline void reloc_pc26_atomic(tcg_insn_unit *code_ptr,
- tcg_insn_unit *target)
-{
-ptrdiff_t offset = target - code_ptr;
-tcg_insn_unit insn;
-tcg_debug_assert(offset == sextract64(offset, 0, 26));
-/* read instruction, mask away previous PC_REL26 parameter contents,
-   set the proper offset, then write back the instruction. */
-insn = atomic_read(code_ptr);
-atomic_set(code_ptr, deposit32(insn, 0, 26, offset));
-}
-
 static inline void reloc_pc19(tcg_insn_unit *code_ptr, tcg_insn_unit *target)
 {
 ptrdiff_t offset = target - code_ptr;
-- 
2.17.2




Re: [Qemu-devel] [PATCH 1/3] memory_ldst: Add atomic ops for PTE updates

2018-12-13 Thread Richard Henderson
On 12/13/18 5:58 PM, Benjamin Herrenschmidt wrote:
> +#ifdef CONFIG_ATOMIC64
> +/* This is meant to be used for atomic PTE updates under MT-TCG */
> +uint32_t glue(address_space_cmpxchgq_notdirty, SUFFIX)(ARG1_DECL,
> +hwaddr addr, uint64_t old, uint64_t new, MemTxAttrs attrs, MemTxResult *result)
> +{
> +uint8_t *ptr;
> +MemoryRegion *mr;
> +hwaddr l = 8;
> +hwaddr addr1;
> +MemTxResult r;
> +uint8_t dirty_log_mask;
> +
> +/* Must test result */
> +assert(result);
> +
> +RCU_READ_LOCK();
> +mr = TRANSLATE(addr, &addr1, &l, true, attrs);
> +if (l < 8 || !memory_access_is_direct(mr, true)) {
> +r = MEMTX_ERROR;
> +} else {
> +uint32_t orig = old;
> +
> +ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
> +old = atomic_cmpxchg(ptr, orig, new);
> +

I think you need atomic_cmpxchg__nocheck here.

Failure would be with a 32-bit host that supports ATOMIC64.
E.g. i686.
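The operation under review -- a compare-and-swap on a guest PTE -- can be sketched with plain C11 atomics (the function name and shape below are illustrative, not QEMU's API; QEMU's atomic_cmpxchg()/atomic_cmpxchg__nocheck() macros wrap the same compiler builtin, and the __nocheck variant is what skips the pointer-size sanity check that trips on a 64-bit operand with a 32-bit host):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hedged sketch of an atomic PTE update: succeed only when the entry
 * still holds the value we read earlier, so a racing update by the
 * guest (or another vCPU thread) is never silently overwritten. */
static bool pte_try_update(_Atomic uint64_t *pte,
                           uint64_t expected, uint64_t desired)
{
    /* On mismatch, 'expected' is overwritten with the current value;
     * here only the success/failure result is reported. */
    return atomic_compare_exchange_strong(pte, &expected, desired);
}
```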


r~



[Qemu-devel] [PATCH qemu v2] hmp: Print if memory section is registered with an accelerator

2018-12-13 Thread Alexey Kardashevskiy
This adds an accelerator name to the "info mtree -f" output to tell the user if
a particular memory section is registered with the accelerator;
the primary user for this is KVM and such information is useful
for debugging purposes.

This adds a has_memory() callback to the accelerator class allowing any
accelerator to have a label in that memory tree dump.

Since memory sections are passed to memory listeners and get registered
in accelerators (rather than memory regions), this only prints new labels
for flatviews attached to the system address space.

An example:
 Root memory region: system
  -002f (prio 0, ram): /objects/mem0 kvm
  0030-005f (prio 0, ram): /objects/mem1 kvm
  2020-203f (prio 1, i/o): virtio-pci
  20008000-2000803f (prio 0, i/o): capabilities
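The optional-callback dispatch that drives the new label can be sketched in plain C (all names below are hypothetical; in the patch the callback lives in AccelClass and is consulted once per flatview range):

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t hwaddr;   /* stand-in for QEMU's exec/hwaddr.h */

/* Minimal sketch: the dump appends a label only when the accelerator
 * class implements has_memory() and it reports the range as known. */
typedef struct AccelClassSketch {
    const char *name;
    bool (*has_memory)(hwaddr start, hwaddr size);   /* may be NULL */
} AccelClassSketch;

/* Pretend backend: claims every range below an arbitrary RAM limit. */
static bool fake_kvm_has_memory(hwaddr start, hwaddr size)
{
    return start + size <= 0x60000000;
}

static const char *section_label(const AccelClassSketch *ac,
                                 hwaddr start, hwaddr size)
{
    return (ac->has_memory && ac->has_memory(start, size)) ? ac->name : "";
}
```

Accelerators that do not implement the hook simply leave the pointer NULL and the dump is unchanged for them.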

Signed-off-by: Alexey Kardashevskiy 
---

This supersedes "[PATCH qemu] hmp: Print if memory section is registered in KVM"

---
Changes:
v2:
* added an accelerator callback instead of hardcoding it to kvm only
---
 include/sysemu/accel.h |  2 ++
 accel/kvm/kvm-all.c| 10 ++
 memory.c   | 22 ++
 3 files changed, 34 insertions(+)

diff --git a/include/sysemu/accel.h b/include/sysemu/accel.h
index 637358f..30b456d 100644
--- a/include/sysemu/accel.h
+++ b/include/sysemu/accel.h
@@ -25,6 +25,7 @@
 
 #include "qom/object.h"
 #include "hw/qdev-properties.h"
+#include "exec/hwaddr.h"
 
 typedef struct AccelState {
 /*< private >*/
@@ -41,6 +42,7 @@ typedef struct AccelClass {
 int (*available)(void);
 int (*init_machine)(MachineState *ms);
 void (*setup_post)(MachineState *ms, AccelState *accel);
+bool (*has_memory)(MachineState *ms, hwaddr start_addr, hwaddr size);
 bool *allowed;
 /*
  * Array of global properties that would be applied when specific
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 4880a05..634f386 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -2589,11 +2589,21 @@ int kvm_get_one_reg(CPUState *cs, uint64_t id, void *target)
 return r;
 }
 
+static bool kvm_accel_has_memory(MachineState *ms, hwaddr start_addr,
+ hwaddr size)
+{
+KVMState *kvm = KVM_STATE(ms->accelerator);
+KVMMemoryListener *kml = &kvm->memory_listener;
+
+return NULL != kvm_lookup_matching_slot(kml, start_addr, size);
+}
+
 static void kvm_accel_class_init(ObjectClass *oc, void *data)
 {
 AccelClass *ac = ACCEL_CLASS(oc);
 ac->name = "KVM";
 ac->init_machine = kvm_init;
+ac->has_memory = kvm_accel_has_memory;
 ac->allowed = &kvm_allowed;
 }
 
diff --git a/memory.c b/memory.c
index d14c6de..61e758a 100644
--- a/memory.c
+++ b/memory.c
@@ -29,7 +29,9 @@
 #include "exec/ram_addr.h"
 #include "sysemu/kvm.h"
 #include "sysemu/sysemu.h"
+#include "sysemu/accel.h"
 #include "hw/qdev-properties.h"
+#include "hw/boards.h"
 #include "migration/vmstate.h"
 
 //#define DEBUG_UNASSIGNED
@@ -2924,6 +2926,8 @@ struct FlatViewInfo {
 int counter;
 bool dispatch_tree;
 bool owner;
+AccelClass *ac;
+const char *ac_name;
 };
 
 static void mtree_print_flatview(gpointer key, gpointer value,
@@ -2939,6 +2943,7 @@ static void mtree_print_flatview(gpointer key, gpointer value,
 int n = view->nr;
 int i;
 AddressSpace *as;
+bool system_as = false;
 
 p(f, "FlatView #%d\n", fvi->counter);
 ++fvi->counter;
@@ -2950,6 +2955,9 @@ static void mtree_print_flatview(gpointer key, gpointer value,
 p(f, ", alias %s", memory_region_name(as->root->alias));
 }
 p(f, "\n");
+if (as == &address_space_memory) {
+system_as = true;
+}
 }
 
 p(f, " Root memory region: %s\n",
@@ -2985,6 +2993,13 @@ static void mtree_print_flatview(gpointer key, gpointer value,
 if (fvi->owner) {
 mtree_print_mr_owner(p, f, mr);
 }
+
+if (system_as && fvi->ac &&
+fvi->ac->has_memory(current_machine,
+int128_get64(range->addr.start),
+MR_SIZE(range->addr.size) + 1)) {
+p(f, " %s", fvi->ac_name);
+}
 p(f, "\n");
 range++;
 }
@@ -3028,6 +3043,13 @@ void mtree_info(fprintf_function mon_printf, void *f, bool flatview,
 };
 GArray *fv_address_spaces;
 GHashTable *views = g_hash_table_new(g_direct_hash, g_direct_equal);
+AccelClass *ac = ACCEL_GET_CLASS(current_machine->accelerator);
+
+if (ac->has_memory) {
+fvi.ac = ac;
+fvi.ac_name = current_machine->accel ? current_machine->accel :
+object_class_get_name(OBJECT_CLASS(ac));
+}
 
 /* Gather all FVs in one table */
 QTAILQ_FOREACH(as, _spaces, address_spaces_link) {
-- 
2.17.1




Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting

2018-12-13 Thread Yongji Xie
On Fri, 14 Dec 2018 at 10:20, Michael S. Tsirkin  wrote:
>
> On Fri, Dec 14, 2018 at 09:56:41AM +0800, Yongji Xie wrote:
> > On Thu, 13 Dec 2018 at 22:45, Michael S. Tsirkin  wrote:
> > >
> > > On Thu, Dec 06, 2018 at 02:35:46PM +0800, elohi...@gmail.com wrote:
> > > > From: Xie Yongji 
> > > >
> > > > This patchset aims to let qemu reconnect to the vhost-user-blk
> > > > backend after the backend crashes or restarts.
> > > >
> > > > The patch 1 tries to implement the sync connection for
> > > > "reconnect socket".
> > > >
> > > > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> > > > to support offering shared memory to backend to record
> > > > its inflight I/O.
> > > >
> > > > Patches 3 and 4 are the corresponding libvhost-user patches of
> > > > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> > > >
> > > > The patch 5 supports vhost-user-blk to reconnect backend when
> > > > connection closed.
> > > >
> > > > The patch 6 tells qemu that we support reconnecting now.
> > > >
> > > > To use it, we could start qemu with:
> > > >
> > > > qemu-system-x86_64 \
> > > > -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> > > > -device vhost-user-blk-pci,chardev=char0 \
> > > >
> > > > and start vhost-user-blk backend with:
> > > >
> > > > vhost-user-blk -b /path/file -s /path/vhost.socket
> > > >
> > > > Then we can restart vhost-user-blk at any time during VM running.
> > > >
> > > > Xie Yongji (6):
> > > >   char-socket: Enable "wait" option for client mode
> > > >   vhost-user: Add shared memory to record inflight I/O
> > > >   libvhost-user: Introduce vu_queue_map_desc()
> > > >   libvhost-user: Support recording inflight I/O in shared memory
> > > >   vhost-user-blk: Add support for reconnecting backend
> > > >   contrib/vhost-user-blk: enable inflight I/O recording
> > >
> > > What is missing in all this is documentation.
> > > Specifically docs/interop/vhost-user.txt.
> > >
> > > At a high level the design is IMO a good one.
> > >
> > > However I would prefer reading the protocol first before
> > > the code.
> > >
> > > So here's what I managed to figure out, and it matches
> > > how I imagined it would work when I was still
> > > thinking about out of order for net:
> > >
> > > - backend allocates memory to keep its stuff around
> > > - sends it to qemu so it can maintain it
> > > - gets it back on reconnect
> > >
> > > The format and size etc. are all up to the backend;
> > > a good implementation would probably implement some
> > > kind of versioning.
> > >
> > > Is this what this implements?
> > >
> >
> > Definitely, yes. And the comments look good to me. QEMU gets the size and
> > version from the backend, then allocates memory and sends it back with the
> > version. The backend knows how to use the memory according to the version.
> > If we do that, should we allocate the memory per device rather than
> > per virtqueue?
> >
> > Thanks,
> > Yongji
>
> It's up to you. Maybe both.
>

OK. I think I may still keep it in virtqueue level in v2. Thank you.

Thanks,
Yongji



Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting

2018-12-13 Thread Michael S. Tsirkin
On Fri, Dec 14, 2018 at 09:56:41AM +0800, Yongji Xie wrote:
> On Thu, 13 Dec 2018 at 22:45, Michael S. Tsirkin  wrote:
> >
> > On Thu, Dec 06, 2018 at 02:35:46PM +0800, elohi...@gmail.com wrote:
> > > From: Xie Yongji 
> > >
> > > This patchset aims to let qemu reconnect to the vhost-user-blk
> > > backend after the backend crashes or restarts.
> > >
> > > The patch 1 tries to implement the sync connection for
> > > "reconnect socket".
> > >
> > > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> > > to support offering shared memory to backend to record
> > > its inflight I/O.
> > >
> > > Patches 3 and 4 are the corresponding libvhost-user patches of
> > > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> > >
> > > The patch 5 supports vhost-user-blk to reconnect backend when
> > > connection closed.
> > >
> > > The patch 6 tells qemu that we support reconnecting now.
> > >
> > > To use it, we could start qemu with:
> > >
> > > qemu-system-x86_64 \
> > > -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> > > -device vhost-user-blk-pci,chardev=char0 \
> > >
> > > and start vhost-user-blk backend with:
> > >
> > > vhost-user-blk -b /path/file -s /path/vhost.socket
> > >
> > > Then we can restart vhost-user-blk at any time during VM running.
> > >
> > > Xie Yongji (6):
> > >   char-socket: Enable "wait" option for client mode
> > >   vhost-user: Add shared memory to record inflight I/O
> > >   libvhost-user: Introduce vu_queue_map_desc()
> > >   libvhost-user: Support recording inflight I/O in shared memory
> > >   vhost-user-blk: Add support for reconnecting backend
> > >   contrib/vhost-user-blk: enable inflight I/O recording
> >
> > What is missing in all this is documentation.
> > Specifically docs/interop/vhost-user.txt.
> >
> > At a high level the design is IMO a good one.
> >
> > However I would prefer reading the protocol first before
> > the code.
> >
> > So here's what I managed to figure out, and it matches
> > how I imagined it would work when I was still
> > thinking about out of order for net:
> >
> > - backend allocates memory to keep its stuff around
> > - sends it to qemu so it can maintain it
> > - gets it back on reconnect
> >
> > The format and size etc. are all up to the backend;
> > a good implementation would probably implement some
> > kind of versioning.
> >
> > Is this what this implements?
> >
> 
> Definitely, yes. And the comments look good to me. QEMU gets the size and
> version from the backend, then allocates memory and sends it back with the
> version. The backend knows how to use the memory according to the version.
> If we do that, should we allocate the memory per device rather than
> per virtqueue?
> 
> Thanks,
> Yongji

It's up to you. Maybe both.

-- 
MST



Re: [Qemu-devel] [PATCH for-4.0 0/6] vhost-user-blk: Add support for backend reconnecting

2018-12-13 Thread Yongji Xie
On Thu, 13 Dec 2018 at 22:45, Michael S. Tsirkin  wrote:
>
> On Thu, Dec 06, 2018 at 02:35:46PM +0800, elohi...@gmail.com wrote:
> > From: Xie Yongji 
> >
> > This patchset aims to let qemu reconnect to the vhost-user-blk
> > backend after the backend crashes or restarts.
> >
> > The patch 1 tries to implement the sync connection for
> > "reconnect socket".
> >
> > The patch 2 introduces a new message VHOST_USER_SET_VRING_INFLIGHT
> > to support offering shared memory to backend to record
> > its inflight I/O.
> >
> > Patches 3 and 4 are the corresponding libvhost-user patches of
> > patch 2. Make libvhost-user support VHOST_USER_SET_VRING_INFLIGHT.
> >
> > The patch 5 supports vhost-user-blk to reconnect backend when
> > connection closed.
> >
> > The patch 6 tells qemu that we support reconnecting now.
> >
> > To use it, we could start qemu with:
> >
> > qemu-system-x86_64 \
> > -chardev socket,id=char0,path=/path/vhost.socket,reconnect=1,wait \
> > -device vhost-user-blk-pci,chardev=char0 \
> >
> > and start vhost-user-blk backend with:
> >
> > vhost-user-blk -b /path/file -s /path/vhost.socket
> >
> > Then we can restart vhost-user-blk at any time during VM running.
> >
> > Xie Yongji (6):
> >   char-socket: Enable "wait" option for client mode
> >   vhost-user: Add shared memory to record inflight I/O
> >   libvhost-user: Introduce vu_queue_map_desc()
> >   libvhost-user: Support recording inflight I/O in shared memory
> >   vhost-user-blk: Add support for reconnecting backend
> >   contrib/vhost-user-blk: enable inflight I/O recording
>
> What is missing in all this is documentation.
> Specifically docs/interop/vhost-user.txt.
>
> At a high level the design is IMO a good one.
>
> However I would prefer reading the protocol first before
> the code.
>
> So here's what I managed to figure out, and it matches
> how I imagined it would work when I was still
> thinking about out of order for net:
>
> - backend allocates memory to keep its stuff around
> - sends it to qemu so it can maintain it
> - gets it back on reconnect
>
> The format and size etc. are all up to the backend;
> a good implementation would probably implement some
> kind of versioning.
>
> Is this what this implements?
>

Definitely, yes. And the comments look good to me. QEMU gets the size and
version from the backend, then allocates memory and sends it back with the
version. The backend knows how to use the memory according to the version.
If we do that, should we allocate the memory per device rather than
per virtqueue?

Thanks,
Yongji



[Qemu-devel] [PATCH qemu v3] ppc/spapr: Receive and store device tree blob from SLOF

2018-12-13 Thread Alexey Kardashevskiy
SLOF receives a device tree and updates it with various properties
before switching to the guest kernel and QEMU is not aware of any changes
made by SLOF. Since there is no real RTAS (QEMU implements it), it makes
sense to pass the SLOF final device tree to QEMU to let it implement
RTAS related tasks better, such as PCI host bus adapter hotplug.

Specifically, now QEMU can find out the actual XICS phandle (for PHB
hotplug) and the RTAS linux,rtas-entry/base properties (for firmware
assisted NMI - FWNMI).

This stores the initial DT blob in the sPAPR machine and replaces it
in the KVMPPC_H_UPDATE_DT (new private hypercall) handler.

This adds an @update_dt_enabled machine property to allow backward
migration.

SLOF already has a hypercall since
https://github.com/aik/SLOF/commit/e6fc84652c9c0073f9183

This makes use of the new fdt_check_full() helper. In order to allow
the configure script to pick the correct DTC version, this adjusts
the DTC presence test.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v3:
* fixed leaked fdt_blob during migration
---
 configure  |  2 +-
 include/hw/ppc/spapr.h |  7 ++-
 hw/ppc/spapr.c | 43 +-
 hw/ppc/spapr_hcall.c   | 42 +
 hw/ppc/trace-events|  3 +++
 5 files changed, 94 insertions(+), 3 deletions(-)

diff --git a/configure b/configure
index 0a3c6a7..e5312da 100755
--- a/configure
+++ b/configure
@@ -3880,7 +3880,7 @@ if test "$fdt" != "no" ; then
   cat > $TMPC << EOF
#include <libfdt.h>
#include <libfdt_env.h>
-int main(void) { fdt_first_subnode(0, 0); return 0; }
+int main(void) { fdt_check_full(NULL, 0); return 0; }
 EOF
   if compile_prog "" "$fdt_libs" ; then
 # system DTC is good - use it
diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h
index 1987640..86c90df 100644
--- a/include/hw/ppc/spapr.h
+++ b/include/hw/ppc/spapr.h
@@ -102,6 +102,7 @@ struct sPAPRMachineClass {
 
 /*< public >*/
 bool dr_lmb_enabled;   /* enable dynamic-reconfig/hotplug of LMBs */
+bool update_dt_enabled;/* enable KVMPPC_H_UPDATE_DT */
 bool use_ohci_by_default;  /* use USB-OHCI instead of XHCI */
 bool pre_2_10_has_unused_icps;
 bool legacy_irq_allocation;
@@ -138,6 +139,9 @@ struct sPAPRMachineState {
 int vrma_adjust;
 ssize_t rtas_size;
 void *rtas_blob;
+uint32_t fdt_size;
+uint32_t fdt_initial_size;
+void *fdt_blob;
 long kernel_size;
 bool kernel_le;
 uint32_t initrd_base;
@@ -464,7 +468,8 @@ struct sPAPRMachineState {
 #define KVMPPC_H_LOGICAL_MEMOP  (KVMPPC_HCALL_BASE + 0x1)
 /* Client Architecture support */
 #define KVMPPC_H_CAS(KVMPPC_HCALL_BASE + 0x2)
-#define KVMPPC_HCALL_MAXKVMPPC_H_CAS
+#define KVMPPC_H_UPDATE_DT  (KVMPPC_HCALL_BASE + 0x3)
+#define KVMPPC_HCALL_MAXKVMPPC_H_UPDATE_DT
 
 typedef struct sPAPRDeviceTreeUpdateHeader {
 uint32_t version_id;
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 8a18250..42752bd 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1654,7 +1654,10 @@ static void spapr_machine_reset(void)
 /* Load the fdt */
 qemu_fdt_dumpdtb(fdt, fdt_totalsize(fdt));
 cpu_physical_memory_write(fdt_addr, fdt, fdt_totalsize(fdt));
-g_free(fdt);
+g_free(spapr->fdt_blob);
+spapr->fdt_size = fdt_totalsize(fdt);
+spapr->fdt_initial_size = spapr->fdt_size;
+spapr->fdt_blob = fdt;
 
 /* Set up the entry state */
 spapr_cpu_set_entry_state(first_ppc_cpu, SPAPR_ENTRY_POINT, fdt_addr);
@@ -1908,6 +1911,39 @@ static const VMStateDescription vmstate_spapr_irq_map = {
 },
 };
 
+static bool spapr_dtb_needed(void *opaque)
+{
+sPAPRMachineClass *smc = SPAPR_MACHINE_GET_CLASS(opaque);
+
+return smc->update_dt_enabled;
+}
+
+static int spapr_dtb_pre_load(void *opaque)
+{
+sPAPRMachineState *spapr = (sPAPRMachineState *)opaque;
+
+g_free(spapr->fdt_blob);
+spapr->fdt_blob = NULL;
+spapr->fdt_size = 0;
+
+return 0;
+}
+
+static const VMStateDescription vmstate_spapr_dtb = {
+.name = "spapr_dtb",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = spapr_dtb_needed,
+.pre_load = spapr_dtb_pre_load,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(fdt_initial_size, sPAPRMachineState),
+VMSTATE_UINT32(fdt_size, sPAPRMachineState),
+VMSTATE_VBUFFER_ALLOC_UINT32(fdt_blob, sPAPRMachineState, 0, NULL,
+ fdt_size),
+VMSTATE_END_OF_LIST()
+},
+};
+
 static const VMStateDescription vmstate_spapr = {
 .name = "spapr",
 .version_id = 3,
@@ -1937,6 +1973,7 @@ static const VMStateDescription vmstate_spapr = {
 &vmstate_spapr_cap_ibs,
 &vmstate_spapr_irq_map,
 &vmstate_spapr_cap_nested_kvm_hv,
+&vmstate_spapr_dtb,
 NULL
 }
 };
@@ -3871,6 +3908,7 @@ static void spapr_machine_class_init(ObjectClass *oc, void *data)
 hc->unplug = spapr_machine_device_unplug;
 
 smc->dr_lmb_enabled = true;
+   

Re: [Qemu-devel] [PATCH v2] util: check the return value of fcntl in qemu_set_{block, nonblock}

2018-12-13 Thread Li Qiang
Hi all,

Here is the error.

  GTESTER check-qtest-x86_64
Unable to get file status flag on fd 21860: Bad file descriptor(errno=9)
  GTESTER check-qtest-aarch64
Broken pipe
GTester: last random seed: R02S3f0d6981dd97231d06e0b2966baf94b9
Unable to get file status flag on fd 21965: Bad file descriptor(errno=9)
Broken pipe
GTester: last random seed: R02S29fde958e7ee4c26c4f295ff4dbd47d4
Unable to get file status flag on fd 21890: Bad file descriptor(errno=9)
Broken pipe
GTester: last random seed: R02S6d074187e5c8501255c96b247f5c8e3f
Unable to get file status flag on fd 21923: Bad file descriptor(errno=9)
Broken pipe
GTester: last random seed: R02S446127f38eb9e8b4f181e6fc95026ba0
  GTESTER tests/test-qht-par
Could not access KVM kernel module: No such file or directory


The fds '21860', '21965', '21890' and '21923' are a little strange. Following is
my guess.

21860 --> 0x5564
21965 --> 0x55CD
21890 --> 0x5582
21923 --> 0x55A3

They look like uninitialized stack values that the 'fd' memory holds.
It seems 'qemu_chr_fe_get_msgfds' failed first, so 'fd' is an
uninitialized value, which causes the 'assert' in my first patch to fail.

Thanks,
Li Qiang



 wrote on Friday, December 14, 2018 at 12:01 AM:

> Patchew URL:
> https://patchew.org/QEMU/1544701071-2922-1-git-send-email-liq...@gmail.com/
>
>
>
> Hi,
>
> This series failed the docker-quick@centos7 build test. Please find the
> testing commands and
> their output below. If you have Docker installed, you can probably
> reproduce it
> locally.
>
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> time make docker-test-quick@centos7 SHOW_ENV=1 J=8
> === TEST SCRIPT END ===
>
> libpmem support   no
> libudev   no
>
> WARNING: Use of SDL 1.2 is deprecated and will be removed in
> WARNING: future releases. Please switch to using SDL 2.0
>
> NOTE: cross-compilers enabled:  'cc'
>   GEN x86_64-softmmu/config-devices.mak.tmp
>
>
> The full log is available at
>
> http://patchew.org/logs/1544701071-2922-1-git-send-email-liq...@gmail.com/testing.docker-quick@centos7/?type=message
> .
> ---
> Email generated automatically by Patchew [http://patchew.org/].
> Please send your feedback to patchew-de...@redhat.com


Re: [Qemu-devel] [PATCH 2/2] avoid TABs in files that only contain a few

2018-12-13 Thread Michael S. Tsirkin
On Thu, Dec 13, 2018 at 11:37:37PM +0100, Paolo Bonzini wrote:
> Most files that have TABs only contain a handful of them.  Change
> them to spaces so that we don't confuse people.
> 
> disas, standard-headers, linux-headers and libdecnumber are imported
> from other projects and probably should be exempted from the check.

For sure for standard-headers and linux-headers, since if someone
does contribute a patch we want them to contribute upstream.

> Outside those, after this patch the following files still contain both
> 8-space and TAB sequences at the beginning of the line.  Many of them
> have a majority of TABs, or were initially committed with all tabs.
> 
> bsd-user/i386/target_syscall.h
> bsd-user/x86_64/target_syscall.h
> crypto/aes.c
> hw/audio/fmopl.c
> hw/audio/fmopl.h
> hw/block/tc58128.c
> hw/display/cirrus_vga.c
> hw/display/xenfb.c
> hw/dma/etraxfs_dma.c
> hw/intc/sh_intc.c
> hw/misc/mst_fpga.c
> hw/net/pcnet.c
> hw/sh4/sh7750.c
> hw/timer/m48t59.c
> hw/timer/sh_timer.c
> include/crypto/aes.h
> include/disas/bfd.h
> include/hw/sh4/sh.h
> libdecnumber/decNumber.c
> linux-headers/asm-generic/unistd.h
> linux-headers/linux/kvm.h
> linux-user/alpha/target_syscall.h
> linux-user/arm/nwfpe/double_cpdo.c
> linux-user/arm/nwfpe/fpa11_cpdt.c
> linux-user/arm/nwfpe/fpa11_cprt.c
> linux-user/arm/nwfpe/fpa11.h
> linux-user/flat.h
> linux-user/flatload.c
> linux-user/i386/target_syscall.h
> linux-user/ppc/target_syscall.h
> linux-user/sparc/target_syscall.h
> linux-user/syscall.c
> linux-user/syscall_defs.h
> linux-user/x86_64/target_syscall.h
> slirp/cksum.c
> slirp/if.c
> slirp/ip.h
> slirp/ip_icmp.c
> slirp/ip_icmp.h
> slirp/ip_input.c
> slirp/ip_output.c
> slirp/mbuf.c
> slirp/misc.c
> slirp/sbuf.c
> slirp/socket.c
> slirp/socket.h
> slirp/tcp_input.c
> slirp/tcpip.h
> slirp/tcp_output.c
> slirp/tcp_subr.c
> slirp/tcp_timer.c
> slirp/tftp.c
> slirp/udp.c
> slirp/udp.h
> target/cris/cpu.h
> target/cris/mmu.c
> target/cris/op_helper.c
> target/sh4/helper.c
> target/sh4/op_helper.c
> target/sh4/translate.c
> tcg/sparc/tcg-target.inc.c
> tests/tcg/cris/check_addo.c
> tests/tcg/cris/check_moveq.c
> tests/tcg/cris/check_swap.c
> tests/tcg/multiarch/test-mmap.c
> ui/vnc-enc-hextile-template.h
> ui/vnc-enc-zywrle.h
> util/envlist.c
> util/readline.c
> 
> The following have only TABs:
> 
> bsd-user/i386/target_signal.h
> bsd-user/sparc64/target_signal.h
> bsd-user/sparc64/target_syscall.h
> bsd-user/sparc/target_signal.h
> bsd-user/sparc/target_syscall.h
> bsd-user/x86_64/target_signal.h
> crypto/desrfb.c
> hw/audio/intel-hda-defs.h
> hw/core/uboot_image.h
> hw/sh4/sh7750_regnames.c
> hw/sh4/sh7750_regs.h
> include/hw/cris/etraxfs_dma.h
> linux-user/alpha/termbits.h
> linux-user/arm/nwfpe/fpopcode.h
> linux-user/arm/nwfpe/fpsr.h
> linux-user/arm/syscall_nr.h
> linux-user/arm/target_signal.h
> linux-user/cris/target_signal.h
> linux-user/i386/target_signal.h
> linux-user/linux_loop.h
> linux-user/m68k/target_signal.h
> linux-user/microblaze/target_signal.h
> linux-user/mips64/target_signal.h
> linux-user/mips/target_signal.h
> linux-user/mips/target_syscall.h
> linux-user/mips/termbits.h
> linux-user/ppc/target_signal.h
> linux-user/sh4/target_signal.h
> linux-user/sh4/termbits.h
> linux-user/sparc64/target_syscall.h
> linux-user/sparc/target_signal.h
> linux-user/x86_64/target_signal.h
> linux-user/x86_64/termbits.h
> pc-bios/optionrom/optionrom.h
> slirp/mbuf.h
> slirp/misc.h
> slirp/sbuf.h
> slirp/tcp.h
> slirp/tcp_timer.h
> slirp/tcp_var.h
> target/i386/svm.h
> target/sparc/asi.h
> target/xtensa/core-dc232b/xtensa-modules.inc.c
> target/xtensa/core-dc233c/xtensa-modules.inc.c
> target/xtensa/core-de212/core-isa.h
> target/xtensa/core-de212/xtensa-modules.inc.c
> target/xtensa/core-fsf/xtensa-modules.inc.c
> target/xtensa/core-sample_controller/core-isa.h
> target/xtensa/core-sample_controller/xtensa-modules.inc.c
> target/xtensa/core-test_kc705_be/core-isa.h
> target/xtensa/core-test_kc705_be/xtensa-modules.inc.c
> tests/tcg/cris/check_abs.c
> tests/tcg/cris/check_addc.c
> tests/tcg/cris/check_addcm.c
> tests/tcg/cris/check_addoq.c
> tests/tcg/cris/check_bound.c
> tests/tcg/cris/check_ftag.c
> tests/tcg/cris/check_int64.c
> tests/tcg/cris/check_lz.c
> tests/tcg/cris/check_openpf5.c
> tests/tcg/cris/check_sigalrm.c
> tests/tcg/cris/crisutils.h
> tests/tcg/cris/sys.c
> tests/tcg/i386/test-i386-ssse3.c
> ui/vgafont.h
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  block/bochs.c 

Re: [Qemu-devel] [PATCH 0/4] xxhash patches for 4.0

2018-12-13 Thread Richard Henderson
On 11/23/18 5:02 PM, Emilio G. Cota wrote:
> (Plus a qht-bench trivial patch.)
> 
> Note that these apply on top of rth's tcg-next-for-4.0.
> 
> Thanks,

Queued, thanks.


r~



Re: [Qemu-devel] [PATCH for-4.0 v8 6/7] qemu_thread_create: propagate the error to callers to handle

2018-12-13 Thread David Gibson
On Thu, 13 Dec 2018 08:26:48 +0100
Markus Armbruster  wrote:

> There's a question for David Gibson inline.  Please search for /ppc/.
> 
> Fei Li  writes:
> 
> > Make qemu_thread_create() return a Boolean to indicate if it succeeds
> > rather than failing with an error. And add an Error parameter to hold
> > the error message and let the callers handle it.  
> 
> The "rather than failing with an error" is misleading.  Before the
> patch, we report to stderr and abort().  What about:
> 
> qemu-thread: Make qemu_thread_create() handle errors properly
> 
> qemu_thread_create() abort()s on error.  Not nice.  Give it a
> return value and an Error ** argument, so it can return success /
> failure.
> 
> Still missing from the commit message then: how you update the callers.
> Let's see below.

[snip]
> > --- a/hw/ppc/spapr_hcall.c
> > +++ b/hw/ppc/spapr_hcall.c
> > @@ -478,6 +478,7 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
> >  sPAPRPendingHPT *pending = spapr->pending_hpt;
> >  uint64_t current_ram_size;
> >  int rc;
> > +Error *local_err = NULL;
> >  
> >  if (spapr->resize_hpt == SPAPR_RESIZE_HPT_DISABLED) {
> >  return H_AUTHORITY;
> > @@ -538,8 +539,13 @@ static target_ulong h_resize_hpt_prepare(PowerPCCPU *cpu,
> >  pending->shift = shift;
> >  pending->ret = H_HARDWARE;
> >  
> > -qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
> > -   hpt_prepare_thread, pending, QEMU_THREAD_DETACHED);
> > +if (!qemu_thread_create(&pending->thread, "sPAPR HPT prepare",
> > +hpt_prepare_thread, pending,
> > +QEMU_THREAD_DETACHED, _err)) {
> > +error_reportf_err(local_err, "failed to create hpt_prepare_thread: ");
> > +g_free(pending);
> > +return H_RESOURCE;
> > +}
> >  
> >  spapr->pending_hpt = pending;
> >
> 
> This is a caller that returns an error code on failure.  You change it
> to report the error, then return failure.  The return failure part looks
> fine.  Whether reporting the error is appropriate I can't say for sure.
> No other failure mode reports anything.  David, what do you think?

I think it's reasonable here.  In this context error returns and
reported errors are for different audiences.  The error returns are for
the guest, the reported errors are for the guest administrator or
management layers.  This particular failure is essentially a host-side
fault that is mostly relevant to the VM management.  We have to
say *something* to the guest to explain that the action couldn't go
forward and H_RESOURCE makes as much sense as anything.

-- 
David Gibson 
Principal Software Engineer, Virtualization, Red Hat


pgpdt11zSAAb3.pgp
Description: OpenPGP digital signature


[Qemu-devel] [PATCH v1 5/5] sifive_uart: Implement interrupt pending register

2018-12-13 Thread Alistair Francis
From: Nathaniel Graff 

The watermark bits are set in the interrupt pending register according
to the configuration of txcnt and rxcnt in the txctrl and rxctrl
registers.

Since the UART TX does not implement a FIFO, the txwm bit is set as long
as the TX watermark level is greater than zero.

Signed-off-by: Nathaniel Graff 
Reviewed-by: Michael Clark 
Reviewed-by: Alistair Francis 
Signed-off-by: Alistair Francis 
---
 hw/riscv/sifive_uart.c | 24 +++-
 include/hw/riscv/sifive_uart.h |  3 +++
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/hw/riscv/sifive_uart.c b/hw/riscv/sifive_uart.c
index b0c3798cf2..456a3d3697 100644
--- a/hw/riscv/sifive_uart.c
+++ b/hw/riscv/sifive_uart.c
@@ -28,12 +28,26 @@
  * Not yet implemented:
  *
  * Transmit FIFO using "qemu/fifo8.h"
- * SIFIVE_UART_IE_TXWM interrupts
- * SIFIVE_UART_IE_RXWM interrupts must honor fifo watermark
- * Rx FIFO watermark interrupt trigger threshold
- * Tx FIFO watermark interrupt trigger threshold.
  */
 
+/* Returns the state of the IP (interrupt pending) register */
+static uint64_t uart_ip(SiFiveUARTState *s)
+{
+uint64_t ret = 0;
+
+uint64_t txcnt = SIFIVE_UART_GET_TXCNT(s->txctrl);
+uint64_t rxcnt = SIFIVE_UART_GET_RXCNT(s->rxctrl);
+
+if (txcnt != 0) {
+ret |= SIFIVE_UART_IP_TXWM;
+}
+if (s->rx_fifo_len > rxcnt) {
+ret |= SIFIVE_UART_IP_RXWM;
+}
+
+return ret;
+}
+
 static void update_irq(SiFiveUARTState *s)
 {
 int cond = 0;
@@ -69,7 +83,7 @@ uart_read(void *opaque, hwaddr addr, unsigned int size)
 case SIFIVE_UART_IE:
 return s->ie;
 case SIFIVE_UART_IP:
-return s->rx_fifo_len ? SIFIVE_UART_IP_RXWM : 0;
+return uart_ip(s);
 case SIFIVE_UART_TXCTRL:
 return s->txctrl;
 case SIFIVE_UART_RXCTRL:
diff --git a/include/hw/riscv/sifive_uart.h b/include/hw/riscv/sifive_uart.h
index 504f18a60f..c8dc1c57fd 100644
--- a/include/hw/riscv/sifive_uart.h
+++ b/include/hw/riscv/sifive_uart.h
@@ -43,6 +43,9 @@ enum {
 SIFIVE_UART_IP_RXWM   = 2  /* Receive watermark interrupt pending */
 };
 
+#define SIFIVE_UART_GET_TXCNT(txctrl)   ((txctrl >> 16) & 0x7)
+#define SIFIVE_UART_GET_RXCNT(rxctrl)   ((rxctrl >> 16) & 0x7)
+
 #define TYPE_SIFIVE_UART "riscv.sifive.uart"
 
 #define SIFIVE_UART(obj) \
-- 
2.19.1




[Qemu-devel] [PATCH v1 4/5] RISC-V: Enable second UART on sifive_e and sifive_u

2018-12-13 Thread Alistair Francis
From: Michael Clark 

Previously the second UARTs on the sifive_e and sifive_u machines
were disabled due to check-qtest-riscv32 and check-qtest-riscv64
failures. Recent changes in the QEMU core serial code have
resolved these failures, so the second UARTs can be instantiated.

Cc: Palmer Dabbelt 
Cc: Sagar Karandikar 
Cc: Bastian Koppelmann 
Cc: Alistair Francis 
Signed-off-by: Michael Clark 
Reviewed-by: Alistair Francis 
Signed-off-by: Alistair Francis 
---
 hw/riscv/sifive_e.c | 5 ++---
 hw/riscv/sifive_u.c | 5 ++---
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/riscv/sifive_e.c b/hw/riscv/sifive_e.c
index cb513cc3bb..5d9d65ff29 100644
--- a/hw/riscv/sifive_e.c
+++ b/hw/riscv/sifive_e.c
@@ -192,9 +192,8 @@ static void riscv_sifive_e_soc_realize(DeviceState *dev, Error **errp)
 memmap[SIFIVE_E_QSPI0].base, memmap[SIFIVE_E_QSPI0].size);
 sifive_mmio_emulate(sys_mem, "riscv.sifive.e.pwm0",
 memmap[SIFIVE_E_PWM0].base, memmap[SIFIVE_E_PWM0].size);
-/* sifive_uart_create(sys_mem, memmap[SIFIVE_E_UART1].base,
-serial_hd(1), qdev_get_gpio_in(DEVICE(s->plic),
-   SIFIVE_E_UART1_IRQ)); */
+sifive_uart_create(sys_mem, memmap[SIFIVE_E_UART1].base,
+serial_hd(1), qdev_get_gpio_in(DEVICE(s->plic), SIFIVE_E_UART1_IRQ));
 sifive_mmio_emulate(sys_mem, "riscv.sifive.e.qspi1",
 memmap[SIFIVE_E_QSPI1].base, memmap[SIFIVE_E_QSPI1].size);
 sifive_mmio_emulate(sys_mem, "riscv.sifive.e.pwm1",
diff --git a/hw/riscv/sifive_u.c b/hw/riscv/sifive_u.c
index ef07df2442..3591898011 100644
--- a/hw/riscv/sifive_u.c
+++ b/hw/riscv/sifive_u.c
@@ -350,9 +350,8 @@ static void riscv_sifive_u_soc_realize(DeviceState *dev, Error **errp)
 memmap[SIFIVE_U_PLIC].size);
 sifive_uart_create(system_memory, memmap[SIFIVE_U_UART0].base,
 serial_hd(0), qdev_get_gpio_in(DEVICE(s->plic), SIFIVE_U_UART0_IRQ));
-/* sifive_uart_create(system_memory, memmap[SIFIVE_U_UART1].base,
-serial_hd(1), qdev_get_gpio_in(DEVICE(s->plic),
-   SIFIVE_U_UART1_IRQ)); */
+sifive_uart_create(system_memory, memmap[SIFIVE_U_UART1].base,
+serial_hd(1), qdev_get_gpio_in(DEVICE(s->plic), SIFIVE_U_UART1_IRQ));
 sifive_clint_create(memmap[SIFIVE_U_CLINT].base,
 memmap[SIFIVE_U_CLINT].size, smp_cpus,
 SIFIVE_SIP_BASE, SIFIVE_TIMECMP_BASE, SIFIVE_TIME_BASE);
-- 
2.19.1




[Qemu-devel] [PATCH v1 2/5] RISC-V: Fix CLINT timecmp low 32-bit writes

2018-12-13 Thread Alistair Francis
From: Michael Clark 

A missing shift made updates to the low order bits
of timecmp erroneously copy the old low order bits
into the high order bits of the 64-bit timecmp
register. Add the missing shift and rename timecmp
local variables to timecmp_hi and timecmp_lo.

This bug didn't show up because the low order bits are
usually written first, followed by the high order bits,
meaning the high order bits only held an invalid value
between the timecmp_lo and timecmp_hi updates.

Cc: Palmer Dabbelt 
Cc: Sagar Karandikar 
Cc: Bastian Koppelmann 
Cc: Alistair Francis 
Co-Authored-by: Johannes Haring 
Signed-off-by: Michael Clark 
Reviewed-by: Alistair Francis 
Signed-off-by: Alistair Francis 
---
 hw/riscv/sifive_clint.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/riscv/sifive_clint.c b/hw/riscv/sifive_clint.c
index 0d2fd52487..d4c159e937 100644
--- a/hw/riscv/sifive_clint.c
+++ b/hw/riscv/sifive_clint.c
@@ -146,15 +146,15 @@ static void sifive_clint_write(void *opaque, hwaddr addr, uint64_t value,
 error_report("clint: invalid timecmp hartid: %zu", hartid);
 } else if ((addr & 0x7) == 0) {
 /* timecmp_lo */
-uint64_t timecmp = env->timecmp;
+uint64_t timecmp_hi = env->timecmp >> 32;
 sifive_clint_write_timecmp(RISCV_CPU(cpu),
-timecmp << 32 | (value & 0xFFFFFFFF));
+timecmp_hi << 32 | (value & 0xFFFFFFFF));
 return;
 } else if ((addr & 0x7) == 4) {
 /* timecmp_hi */
-uint64_t timecmp = env->timecmp;
+uint64_t timecmp_lo = env->timecmp;
 sifive_clint_write_timecmp(RISCV_CPU(cpu),
-value << 32 | (timecmp & 0xFFFFFFFF));
+value << 32 | (timecmp_lo & 0xFFFFFFFF));
 } else {
 error_report("clint: invalid timecmp write: %08x", (uint32_t)addr);
 }
-- 
2.19.1




[Qemu-devel] [PATCH v1 3/5] RISC-V: Fix PLIC pending bitfield reads

2018-12-13 Thread Alistair Francis
From: Michael Clark 

The address calculation for the pending bitfield had
a copy-paste bug. This bug went unnoticed because the Linux
PLIC driver does not read the pending bitfield; rather, it
reads pending interrupt numbers from the claim register
and writes acknowledgements back to the claim register.

Cc: Palmer Dabbelt 
Cc: Sagar Karandikar 
Cc: Bastian Koppelmann 
Cc: Alistair Francis 
Reported-by: Vincent Siles 
Signed-off-by: Michael Clark 
Reviewed-by: Alistair Francis 
Signed-off-by: Alistair Francis 
---
 hw/riscv/sifive_plic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/riscv/sifive_plic.c b/hw/riscv/sifive_plic.c
index 9cf9a1f986..d12ec3fc9a 100644
--- a/hw/riscv/sifive_plic.c
+++ b/hw/riscv/sifive_plic.c
@@ -214,7 +214,7 @@ static uint64_t sifive_plic_read(void *opaque, hwaddr addr, unsigned size)
 } else if (addr >= plic->pending_base && /* 1 bit per source */
addr < plic->pending_base + (plic->num_sources >> 3))
 {
-uint32_t word = (addr - plic->priority_base) >> 2;
+uint32_t word = (addr - plic->pending_base) >> 2;
 if (RISCV_DEBUG_PLIC) {
 qemu_log("plic: read pending: word=%d value=%d\n",
 word, plic->pending[word]);
-- 
2.19.1




[Qemu-devel] [PATCH v1 1/5] RISC-V: Add hartid and \n to interrupt logging

2018-12-13 Thread Alistair Francis
From: Michael Clark 

Add the newline that was erroneously removed when
converting to qemu_log, and change the hard-coded
core number to the actual hartid.

Cc: Sagar Karandikar 
Cc: Bastian Koppelmann 
Cc: Palmer Dabbelt 
Cc: Alistair Francis 
Signed-off-by: Michael Clark 
Reviewed-by: Alistair Francis 
Signed-off-by: Alistair Francis 
---
 target/riscv/cpu_helper.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/target/riscv/cpu_helper.c b/target/riscv/cpu_helper.c
index 86f9f4730c..0234c2d528 100644
--- a/target/riscv/cpu_helper.c
+++ b/target/riscv/cpu_helper.c
@@ -445,11 +445,13 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 if (RISCV_DEBUG_INTERRUPT) {
 int log_cause = cs->exception_index & RISCV_EXCP_INT_MASK;
 if (cs->exception_index & RISCV_EXCP_INT_FLAG) {
-qemu_log_mask(LOG_TRACE, "core   0: trap %s, epc 0x" TARGET_FMT_lx,
-riscv_intr_names[log_cause], env->pc);
+qemu_log_mask(LOG_TRACE, "core "
+TARGET_FMT_ld ": trap %s, epc 0x" TARGET_FMT_lx "\n",
+env->mhartid, riscv_intr_names[log_cause], env->pc);
 } else {
-qemu_log_mask(LOG_TRACE, "core   0: intr %s, epc 0x" TARGET_FMT_lx,
-riscv_excp_names[log_cause], env->pc);
+qemu_log_mask(LOG_TRACE, "core "
+TARGET_FMT_ld ": intr %s, epc 0x" TARGET_FMT_lx "\n",
+env->mhartid, riscv_excp_names[log_cause], env->pc);
 }
 }
 
@@ -511,8 +513,8 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 
 if (hasbadaddr) {
 if (RISCV_DEBUG_INTERRUPT) {
-qemu_log_mask(LOG_TRACE, "core " TARGET_FMT_ld
-": badaddr 0x" TARGET_FMT_lx, env->mhartid, env->badaddr);
+qemu_log_mask(LOG_TRACE, "core " TARGET_FMT_ld ": badaddr 0x"
+TARGET_FMT_lx "\n", env->mhartid, env->badaddr);
 }
 env->sbadaddr = env->badaddr;
 } else {
@@ -536,8 +538,8 @@ void riscv_cpu_do_interrupt(CPUState *cs)
 
 if (hasbadaddr) {
 if (RISCV_DEBUG_INTERRUPT) {
-qemu_log_mask(LOG_TRACE, "core " TARGET_FMT_ld
-": badaddr 0x" TARGET_FMT_lx, env->mhartid, env->badaddr);
+qemu_log_mask(LOG_TRACE, "core " TARGET_FMT_ld ": badaddr 0x"
+TARGET_FMT_lx "\n", env->mhartid, env->badaddr);
 }
 env->mbadaddr = env->badaddr;
 } else {
-- 
2.19.1




[Qemu-devel] [PATCH v1 0/5] Misc RISC-V fixes

2018-12-13 Thread Alistair Francis
This series is another go at reducing the diff between the RISC-V fork
(https://github.com/riscv/riscv-qemu/) and mainline QEMU.

This is a small series with only a handful of changes as I don't want to
have to deal with too many conflicts all at once and I don't want to
create too much conflict with the ongoing decodetree work.

Once the decodetree work goes in we can look at rebasing the changes in
the RISC-V fork that touch translate.c.

Michael Clark (4):
  RISC-V: Add hartid and \n to interrupt logging
  RISC-V: Fix CLINT timecmp low 32-bit writes
  RISC-V: Fix PLIC pending bitfield reads
  RISC-V: Enable second UART on sifive_e and sifive_u

Nathaniel Graff (1):
  sifive_uart: Implement interrupt pending register

 hw/riscv/sifive_clint.c|  8 
 hw/riscv/sifive_e.c|  5 ++---
 hw/riscv/sifive_plic.c |  2 +-
 hw/riscv/sifive_u.c|  5 ++---
 hw/riscv/sifive_uart.c | 24 +++-
 include/hw/riscv/sifive_uart.h |  3 +++
 target/riscv/cpu_helper.c  | 18 ++
 7 files changed, 41 insertions(+), 24 deletions(-)

-- 
2.19.1




Re: [Qemu-devel] [RFC v2 08/38] tcg: drop nargs from tcg_op_insert_{before, after}

2018-12-13 Thread Richard Henderson
On 12/9/18 1:37 PM, Emilio G. Cota wrote:
> It's unused.
> 
> Signed-off-by: Emilio G. Cota 
> ---
>  tcg/tcg.h  |  4 ++--
>  tcg/optimize.c |  4 ++--
>  tcg/tcg.c  | 10 --
>  3 files changed, 8 insertions(+), 10 deletions(-)

Cherry-picked this into tcg-next.
The nargs argument is unused since 75e8b9b7aa0b95a761b9add7e2f09248b101a392.


r~



Re: [Qemu-devel] [PATCH 2/3] i386: Atomically update PTEs with mttcg

2018-12-13 Thread Benjamin Herrenschmidt
Note to RISC-V folks: you may want to adapt your code to do the same as
this; as far as I know, what you do today is endian-broken if host and
guest endianness differ.

Cheers,
Ben. 

On Fri, 2018-12-14 at 10:58 +1100, Benjamin Herrenschmidt wrote:
> Afaik, this isn't well documented (at least it wasn't when I last looked)
> but OSes such as Linux rely on this behaviour:
> 
> The HW updates to the page tables need to be done atomically with the
> checking of the present bit (and other permissions).
> 
> This is what allows Linux to do simple xchg of PTEs with 0 and assume
> the value read has "final" stable dirty and accessed bits (the TLB
> invalidation is deferred).
> 
> Signed-off-by: Benjamin Herrenschmidt 
> ---
>  target/i386/excp_helper.c | 104 +-
>  1 file changed, 80 insertions(+), 24 deletions(-)
> 
> diff --git a/target/i386/excp_helper.c b/target/i386/excp_helper.c
> index 49231f6b69..93fc24c011 100644
> --- a/target/i386/excp_helper.c
> +++ b/target/i386/excp_helper.c
> @@ -157,11 +157,45 @@ int x86_cpu_handle_mmu_fault(CPUState *cs, vaddr addr, 
> int size,
>  
>  #else
>  
> +static inline uint64_t update_entry(CPUState *cs, target_ulong addr,
> +uint64_t orig_entry, uint32_t bits)
> +{
> +uint64_t new_entry = orig_entry | bits;
> +
> +/* Write the updated bottom 32-bits */
> +if (qemu_tcg_mttcg_enabled()) {
> +uint32_t old_le = cpu_to_le32(orig_entry);
> +uint32_t new_le = cpu_to_le32(new_entry);
> +MemTxResult result;
> +uint32_t old_ret;
> +
> +old_ret = address_space_cmpxchgl_notdirty(cs->as, addr,
> +  old_le, new_le,
> +  MEMTXATTRS_UNSPECIFIED,
> +  &result);
> +if (result == MEMTX_OK) {
> +if (old_ret != old_le && old_ret != new_le) {
> +new_entry = 0;
> +}
> +return new_entry;
> +}
> +
> +/* Do we need to support this case where PTEs aren't in RAM ?
> + *
> + * For now fallback to non-atomic case
> + */
> +}
> +
> +x86_stl_phys_notdirty(cs, addr, new_entry);
> +
> +return new_entry;
> +}
> +
>  static hwaddr get_hphys(CPUState *cs, hwaddr gphys, MMUAccessType 
> access_type,
>  int *prot)
>  {
>  CPUX86State *env = &X86_CPU(cs)->env;
> -uint64_t rsvd_mask = PG_HI_RSVD_MASK;
> +uint64_t rsvd_mask;
>  uint64_t ptep, pte;
>  uint64_t exit_info_1 = 0;
>  target_ulong pde_addr, pte_addr;
> @@ -172,6 +206,8 @@ static hwaddr get_hphys(CPUState *cs, hwaddr gphys, 
> MMUAccessType access_type,
>  return gphys;
>  }
>  
> + restart:
> +rsvd_mask = PG_HI_RSVD_MASK;
>  if (!(env->nested_pg_mode & SVM_NPT_NXE)) {
>  rsvd_mask |= PG_NX_MASK;
>  }
> @@ -198,8 +234,10 @@ static hwaddr get_hphys(CPUState *cs, hwaddr gphys, 
> MMUAccessType access_type,
>  goto do_fault_rsvd;
>  }
>  if (!(pml4e & PG_ACCESSED_MASK)) {
> -pml4e |= PG_ACCESSED_MASK;
> -x86_stl_phys_notdirty(cs, pml4e_addr, pml4e);
> +pml4e = update_entry(cs, pml4e_addr, pml4e, 
> PG_ACCESSED_MASK);
> +if (!pml4e) {
> +goto restart;
> +}
>  }
>  ptep &= pml4e ^ PG_NX_MASK;
>  pdpe_addr = (pml4e & PG_ADDRESS_MASK) +
> @@ -213,8 +251,10 @@ static hwaddr get_hphys(CPUState *cs, hwaddr gphys, 
> MMUAccessType access_type,
>  }
>  ptep &= pdpe ^ PG_NX_MASK;
>  if (!(pdpe & PG_ACCESSED_MASK)) {
> -pdpe |= PG_ACCESSED_MASK;
> -x86_stl_phys_notdirty(cs, pdpe_addr, pdpe);
> +pdpe = update_entry(cs, pdpe_addr, pdpe, PG_ACCESSED_MASK);
> +if (!pdpe) {
> +goto restart;
> +}
>  }
>  if (pdpe & PG_PSE_MASK) {
>  /* 1 GB page */
> @@ -256,8 +296,10 @@ static hwaddr get_hphys(CPUState *cs, hwaddr gphys, 
> MMUAccessType access_type,
>  }
>  /* 4 KB page */
>  if (!(pde & PG_ACCESSED_MASK)) {
> -pde |= PG_ACCESSED_MASK;
> -x86_stl_phys_notdirty(cs, pde_addr, pde);
> +pde = update_entry(cs, pde_addr, pde, PG_ACCESSED_MASK);
> +if (!pde) {
> +goto restart;
> +}
>  }
>  pte_addr = (pde & PG_ADDRESS_MASK) + (((gphys >> 12) & 0x1ff) << 3);
>  pte = x86_ldq_phys(cs, pte_addr);
> @@ -295,8 +337,10 @@ static hwaddr get_hphys(CPUState *cs, hwaddr gphys, 
> MMUAccessType access_type,
>  }
>  
>  if (!(pde & PG_ACCESSED_MASK)) {
> -pde |= PG_ACCESSED_MASK;
> -x86_stl_phys_notdirty(cs, pde_addr, pde);
> +pde = 

Re: [Qemu-devel] [PATCH 3/3] ppc: Fix radix RC updates

2018-12-13 Thread Benjamin Herrenschmidt
On Fri, 2018-12-14 at 10:58 +1100, Benjamin Herrenschmidt wrote:
> They should be atomic for MTTCG. Note: a real POWER9 core doesn't actually
> implement atomic PTE updates; it always faults and leaves it to SW to
> handle. Only the nest MMU (used by some accelerator devices and GPUs)
> implements those HW updates.
> 
> However, the architecture does allow the core to do it, and doing so
> in TCG is faster than letting the guest do it.

Note: ppc hash MMU needs fixes too but of a different nature. I have
some queued up as well, but as-is, they are entangled with other ppc
target fixes, so I'll send them separately via David.

Cheers,
Ben.

> Signed-off-by: Benjamin Herrenschmidt 
> ---
>  target/ppc/cpu.h |  1 +
>  target/ppc/mmu-radix64.c | 70 +---
>  2 files changed, 59 insertions(+), 12 deletions(-)
> 
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index ab68abe8a2..afdef2af2f 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -493,6 +493,7 @@ struct ppc_slb_t {
>  #define DSISR_AMR0x0020
>  /* Unsupported Radix Tree Configuration */
>  #define DSISR_R_BADCONFIG0x0008
> +#define DSISR_ATOMIC_RC  0x0004
>  
>  /* SRR1 error code fields */
>  
> diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
> index ab76cbc835..dba95aabdc 100644
> --- a/target/ppc/mmu-radix64.c
> +++ b/target/ppc/mmu-radix64.c
> @@ -28,6 +28,15 @@
>  #include "mmu-radix64.h"
>  #include "mmu-book3s-v3.h"
>  
> +static inline bool ppc_radix64_hw_rc_updates(CPUPPCState *env)
> +{
> +#ifdef CONFIG_ATOMIC64
> +return true;
> +#else
> +return !qemu_tcg_mttcg_enabled();
> +#endif
> +}
> +
>  static bool ppc_radix64_get_fully_qualified_addr(CPUPPCState *env, vaddr 
> eaddr,
>   uint64_t *lpid, uint64_t 
> *pid)
>  {
> @@ -120,11 +129,18 @@ static bool ppc_radix64_check_prot(PowerPCCPU *cpu, int 
> rwx, uint64_t pte,
>  return true;
>  }
>  
> +/* Check RC bits if necessary */
> +if (!ppc_radix64_hw_rc_updates(env)) {
> +if (!(pte & R_PTE_R) || ((rwx == 1) && !(pte & R_PTE_C))) {
> +*fault_cause |= DSISR_ATOMIC_RC;
> +return true;
> +}
> +}
> +
>  return false;
>  }
>  
> -static void ppc_radix64_set_rc(PowerPCCPU *cpu, int rwx, uint64_t pte,
> -   hwaddr pte_addr, int *prot)
> +static uint64_t ppc_radix64_set_rc(PowerPCCPU *cpu, int rwx, uint64_t pte, 
> hwaddr pte_addr)
>  {
>  CPUState *cs = CPU(cpu);
>  uint64_t npte;
> @@ -133,17 +149,38 @@ static void ppc_radix64_set_rc(PowerPCCPU *cpu, int 
> rwx, uint64_t pte,
>  
>  if (rwx == 1) { /* Store/Write */
>  npte |= R_PTE_C; /* Set change bit */
> -} else {
> -/*
> - * Treat the page as read-only for now, so that a later write
> - * will pass through this function again to set the C bit.
> - */
> -*prot &= ~PAGE_WRITE;
>  }
> +if (pte == npte) {
> +return pte;
> +}
> +
> +#ifdef CONFIG_ATOMIC64
> +if (qemu_tcg_mttcg_enabled()) {
> +uint64_t old_be = cpu_to_be64(pte);
> +uint64_t new_be = cpu_to_be64(npte);
> +MemTxResult result;
> +uint64_t old_ret;
> +
> +old_ret = address_space_cmpxchgq_notdirty(cs->as, pte_addr,
> +  old_be, new_be,
> +  MEMTXATTRS_UNSPECIFIED,
> +  &result);
> +if (result == MEMTX_OK) {
> +if (old_ret != old_be && old_ret != new_be) {
> +return 0;
> +}
> +return npte;
> +}
>  
> -if (pte ^ npte) { /* If pte has changed then write it back */
> -stq_phys(cs->as, pte_addr, npte);
> +/* Do we need to support this case where PTEs aren't in RAM ?
> + *
> + * For now fallback to non-atomic case
> + */
>  }
> +#endif
> +
> +stq_phys(cs->as, pte_addr, npte);
> +return npte;
>  }
>  
>  static uint64_t ppc_radix64_walk_tree(PowerPCCPU *cpu, vaddr eaddr,
> @@ -234,6 +271,7 @@ int ppc_radix64_handle_mmu_fault(PowerPCCPU *cpu, vaddr 
> eaddr, int rwx,
>  
>  /* Walk Radix Tree from Process Table Entry to Convert EA to RA */
>  page_size = PRTBE_R_GET_RTS(prtbe0);
> + restart:
>  pte = ppc_radix64_walk_tree(cpu, eaddr & R_EADDR_MASK,
>  prtbe0 & PRTBE_R_RPDB, prtbe0 & PRTBE_R_RPDS,
> &raddr, &page_size, &fault_cause, &pte_addr);
> @@ -244,8 +282,16 @@ int ppc_radix64_handle_mmu_fault(PowerPCCPU *cpu, vaddr 
> eaddr, int rwx,
>  }
>  
>  /* Update Reference and Change Bits */
> -ppc_radix64_set_rc(cpu, rwx, pte, pte_addr, &prot);
> -
> +if (ppc_radix64_hw_rc_updates(env)) {
> +pte = ppc_radix64_set_rc(cpu, rwx, pte, pte_addr);
> +if (!pte) {
> +goto 

[Qemu-devel] [PATCH 3/3] ppc: Fix radix RC updates

2018-12-13 Thread Benjamin Herrenschmidt
They should be atomic for MTTCG. Note: a real POWER9 core doesn't actually
implement atomic PTE updates; it always faults and leaves it to SW to
handle. Only the nest MMU (used by some accelerator devices and GPUs)
implements those HW updates.

However, the architecture does allow the core to do it, and doing so
in TCG is faster than letting the guest do it.

Signed-off-by: Benjamin Herrenschmidt 
---
 target/ppc/cpu.h |  1 +
 target/ppc/mmu-radix64.c | 70 +---
 2 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index ab68abe8a2..afdef2af2f 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -493,6 +493,7 @@ struct ppc_slb_t {
 #define DSISR_AMR0x0020
 /* Unsupported Radix Tree Configuration */
 #define DSISR_R_BADCONFIG0x0008
+#define DSISR_ATOMIC_RC  0x0004
 
 /* SRR1 error code fields */
 
diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
index ab76cbc835..dba95aabdc 100644
--- a/target/ppc/mmu-radix64.c
+++ b/target/ppc/mmu-radix64.c
@@ -28,6 +28,15 @@
 #include "mmu-radix64.h"
 #include "mmu-book3s-v3.h"
 
+static inline bool ppc_radix64_hw_rc_updates(CPUPPCState *env)
+{
+#ifdef CONFIG_ATOMIC64
+return true;
+#else
+return !qemu_tcg_mttcg_enabled();
+#endif
+}
+
 static bool ppc_radix64_get_fully_qualified_addr(CPUPPCState *env, vaddr eaddr,
  uint64_t *lpid, uint64_t *pid)
 {
@@ -120,11 +129,18 @@ static bool ppc_radix64_check_prot(PowerPCCPU *cpu, int rwx, uint64_t pte,
 return true;
 }
 
+/* Check RC bits if necessary */
+if (!ppc_radix64_hw_rc_updates(env)) {
+if (!(pte & R_PTE_R) || ((rwx == 1) && !(pte & R_PTE_C))) {
+*fault_cause |= DSISR_ATOMIC_RC;
+return true;
+}
+}
+
 return false;
 }
 
-static void ppc_radix64_set_rc(PowerPCCPU *cpu, int rwx, uint64_t pte,
-   hwaddr pte_addr, int *prot)
+static uint64_t ppc_radix64_set_rc(PowerPCCPU *cpu, int rwx, uint64_t pte, hwaddr pte_addr)
 {
 CPUState *cs = CPU(cpu);
 uint64_t npte;
@@ -133,17 +149,38 @@ static void ppc_radix64_set_rc(PowerPCCPU *cpu, int rwx, uint64_t pte,
 
 if (rwx == 1) { /* Store/Write */
 npte |= R_PTE_C; /* Set change bit */
-} else {
-/*
- * Treat the page as read-only for now, so that a later write
- * will pass through this function again to set the C bit.
- */
-*prot &= ~PAGE_WRITE;
 }
+if (pte == npte) {
+return pte;
+}
+
+#ifdef CONFIG_ATOMIC64
+if (qemu_tcg_mttcg_enabled()) {
+uint64_t old_be = cpu_to_be64(pte);
+uint64_t new_be = cpu_to_be64(npte);
+MemTxResult result;
+uint64_t old_ret;
+
+old_ret = address_space_cmpxchgq_notdirty(cs->as, pte_addr,
+  old_be, new_be,
+  MEMTXATTRS_UNSPECIFIED,
+  &result);
+if (result == MEMTX_OK) {
+if (old_ret != old_be && old_ret != new_be) {
+return 0;
+}
+return npte;
+}
 
-if (pte ^ npte) { /* If pte has changed then write it back */
-stq_phys(cs->as, pte_addr, npte);
+/* Do we need to support this case where PTEs aren't in RAM ?
+ *
+ * For now fallback to non-atomic case
+ */
 }
+#endif
+
+stq_phys(cs->as, pte_addr, npte);
+return npte;
 }
 
 static uint64_t ppc_radix64_walk_tree(PowerPCCPU *cpu, vaddr eaddr,
@@ -234,6 +271,7 @@ int ppc_radix64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr, int rwx,
 
 /* Walk Radix Tree from Process Table Entry to Convert EA to RA */
 page_size = PRTBE_R_GET_RTS(prtbe0);
+ restart:
 pte = ppc_radix64_walk_tree(cpu, eaddr & R_EADDR_MASK,
 prtbe0 & PRTBE_R_RPDB, prtbe0 & PRTBE_R_RPDS,
 , _size, _cause, _addr);
@@ -244,8 +282,16 @@ int ppc_radix64_handle_mmu_fault(PowerPCCPU *cpu, vaddr eaddr, int rwx,
 }
 
 /* Update Reference and Change Bits */
-ppc_radix64_set_rc(cpu, rwx, pte, pte_addr, &prot);
-
+if (ppc_radix64_hw_rc_updates(env)) {
+pte = ppc_radix64_set_rc(cpu, rwx, pte, pte_addr);
+if (!pte) {
+goto restart;
+}
+}
+/* If the page doesn't have C, treat it as read only */
+if (!(pte & R_PTE_C)) {
+prot &= ~PAGE_WRITE;
+}
 tlb_set_page(cs, eaddr & TARGET_PAGE_MASK, raddr & TARGET_PAGE_MASK,
  prot, mmu_idx, 1UL << page_size);
 return 0;
-- 
2.19.2




[Qemu-devel] [PATCH 1/3] memory_ldst: Add atomic ops for PTE updates

2018-12-13 Thread Benjamin Herrenschmidt
On some architectures, PTE updates for dirty and changed bits need
to be performed atomically. This adds a couple of address_space_cmpxchg*
helpers for that purpose.

Signed-off-by: Benjamin Herrenschmidt 
---
 include/exec/memory_ldst.inc.h |  6 +++
 memory_ldst.inc.c  | 78 ++
 2 files changed, 84 insertions(+)

diff --git a/include/exec/memory_ldst.inc.h b/include/exec/memory_ldst.inc.h
index 272c20f02e..f3cfa7e9a6 100644
--- a/include/exec/memory_ldst.inc.h
+++ b/include/exec/memory_ldst.inc.h
@@ -28,6 +28,12 @@ extern uint64_t glue(address_space_ldq, SUFFIX)(ARG1_DECL,
 hwaddr addr, MemTxAttrs attrs, MemTxResult *result);
 extern void glue(address_space_stl_notdirty, SUFFIX)(ARG1_DECL,
 hwaddr addr, uint32_t val, MemTxAttrs attrs, MemTxResult *result);
+extern uint32_t glue(address_space_cmpxchgl_notdirty, SUFFIX)(ARG1_DECL,
+hwaddr addr, uint32_t old, uint32_t new, MemTxAttrs attrs,
+MemTxResult *result);
+extern uint64_t glue(address_space_cmpxchgq_notdirty, SUFFIX)(ARG1_DECL,
+hwaddr addr, uint64_t old, uint64_t new, MemTxAttrs attrs,
+MemTxResult *result);
 extern void glue(address_space_stw, SUFFIX)(ARG1_DECL,
 hwaddr addr, uint32_t val, MemTxAttrs attrs, MemTxResult *result);
 extern void glue(address_space_stl, SUFFIX)(ARG1_DECL,
diff --git a/memory_ldst.inc.c b/memory_ldst.inc.c
index acf865b900..7ab6de37ba 100644
--- a/memory_ldst.inc.c
+++ b/memory_ldst.inc.c
@@ -320,6 +320,84 @@ void glue(address_space_stl_notdirty, SUFFIX)(ARG1_DECL,
 RCU_READ_UNLOCK();
 }
 
+/* This is meant to be used for atomic PTE updates under MT-TCG */
+uint32_t glue(address_space_cmpxchgl_notdirty, SUFFIX)(ARG1_DECL,
+hwaddr addr, uint32_t old, uint32_t new, MemTxAttrs attrs, MemTxResult *result)
+{
+uint8_t *ptr;
+MemoryRegion *mr;
+hwaddr l = 4;
+hwaddr addr1;
+MemTxResult r;
+uint8_t dirty_log_mask;
+
+/* Must test result */
+assert(result);
+
+RCU_READ_LOCK();
+mr = TRANSLATE(addr, &addr1, &l, true, attrs);
+if (l < 4 || !memory_access_is_direct(mr, true)) {
+r = MEMTX_ERROR;
+} else {
+uint32_t orig = old;
+
+ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
+old = atomic_cmpxchg(ptr, orig, new);
+
+if (old == orig) {
+dirty_log_mask = memory_region_get_dirty_log_mask(mr);
+dirty_log_mask &= ~(1 << DIRTY_MEMORY_CODE);
+cpu_physical_memory_set_dirty_range(memory_region_get_ram_addr(mr) + addr,
+4, dirty_log_mask);
+}
+r = MEMTX_OK;
+}
+*result = r;
+RCU_READ_UNLOCK();
+
+return old;
+}
+
+#ifdef CONFIG_ATOMIC64
+/* This is meant to be used for atomic PTE updates under MT-TCG */
+uint64_t glue(address_space_cmpxchgq_notdirty, SUFFIX)(ARG1_DECL,
+hwaddr addr, uint64_t old, uint64_t new, MemTxAttrs attrs, MemTxResult *result)
+{
+uint8_t *ptr;
+MemoryRegion *mr;
+hwaddr l = 8;
+hwaddr addr1;
+MemTxResult r;
+uint8_t dirty_log_mask;
+
+/* Must test result */
+assert(result);
+
+RCU_READ_LOCK();
+mr = TRANSLATE(addr, &addr1, &l, true, attrs);
+if (l < 8 || !memory_access_is_direct(mr, true)) {
+r = MEMTX_ERROR;
+} else {
+uint64_t orig = old;
+
+ptr = qemu_map_ram_ptr(mr->ram_block, addr1);
+old = atomic_cmpxchg(ptr, orig, new);
+
+if (old == orig) {
+dirty_log_mask = memory_region_get_dirty_log_mask(mr);
+dirty_log_mask &= ~(1 << DIRTY_MEMORY_CODE);
+cpu_physical_memory_set_dirty_range(memory_region_get_ram_addr(mr) + addr,
+8, dirty_log_mask);
+}
+r = MEMTX_OK;
+}
+*result = r;
+RCU_READ_UNLOCK();
+
+return old;
+}
+#endif /* CONFIG_ATOMIC64 */
+
 /* warning: addr must be aligned */
 static inline void glue(address_space_stl_internal, SUFFIX)(ARG1_DECL,
 hwaddr addr, uint32_t val, MemTxAttrs attrs,
-- 
2.19.2




[Qemu-devel] [PATCH 2/3] i386: Atomically update PTEs with mttcg

2018-12-13 Thread Benjamin Herrenschmidt
Afaik, this isn't well documented (at least it wasn't when I last looked)
but OSes such as Linux rely on this behaviour:

The HW updates to the page tables need to be done atomically with the
checking of the present bit (and other permissions).

This is what allows Linux to do simple xchg of PTEs with 0 and assume
the value read has "final" stable dirty and accessed bits (the TLB
invalidation is deferred).

Signed-off-by: Benjamin Herrenschmidt 
---
 target/i386/excp_helper.c | 104 +-
 1 file changed, 80 insertions(+), 24 deletions(-)

diff --git a/target/i386/excp_helper.c b/target/i386/excp_helper.c
index 49231f6b69..93fc24c011 100644
--- a/target/i386/excp_helper.c
+++ b/target/i386/excp_helper.c
@@ -157,11 +157,45 @@ int x86_cpu_handle_mmu_fault(CPUState *cs, vaddr addr, int size,
 
 #else
 
+static inline uint64_t update_entry(CPUState *cs, target_ulong addr,
+uint64_t orig_entry, uint32_t bits)
+{
+uint64_t new_entry = orig_entry | bits;
+
+/* Write the updated bottom 32-bits */
+if (qemu_tcg_mttcg_enabled()) {
+uint32_t old_le = cpu_to_le32(orig_entry);
+uint32_t new_le = cpu_to_le32(new_entry);
+MemTxResult result;
+uint32_t old_ret;
+
+old_ret = address_space_cmpxchgl_notdirty(cs->as, addr,
+  old_le, new_le,
+  MEMTXATTRS_UNSPECIFIED,
+  &result);
+if (result == MEMTX_OK) {
+if (old_ret != old_le && old_ret != new_le) {
+new_entry = 0;
+}
+return new_entry;
+}
+
+/* Do we need to support this case where PTEs aren't in RAM ?
+ *
+ * For now fallback to non-atomic case
+ */
+}
+
+x86_stl_phys_notdirty(cs, addr, new_entry);
+
+return new_entry;
+}
+
 static hwaddr get_hphys(CPUState *cs, hwaddr gphys, MMUAccessType access_type,
 int *prot)
 {
CPUX86State *env = &X86_CPU(cs)->env;
-uint64_t rsvd_mask = PG_HI_RSVD_MASK;
+uint64_t rsvd_mask;
 uint64_t ptep, pte;
 uint64_t exit_info_1 = 0;
 target_ulong pde_addr, pte_addr;
@@ -172,6 +206,8 @@ static hwaddr get_hphys(CPUState *cs, hwaddr gphys, MMUAccessType access_type,
 return gphys;
 }
 
+ restart:
+rsvd_mask = PG_HI_RSVD_MASK;
 if (!(env->nested_pg_mode & SVM_NPT_NXE)) {
 rsvd_mask |= PG_NX_MASK;
 }
@@ -198,8 +234,10 @@ static hwaddr get_hphys(CPUState *cs, hwaddr gphys, MMUAccessType access_type,
 goto do_fault_rsvd;
 }
 if (!(pml4e & PG_ACCESSED_MASK)) {
-pml4e |= PG_ACCESSED_MASK;
-x86_stl_phys_notdirty(cs, pml4e_addr, pml4e);
+pml4e = update_entry(cs, pml4e_addr, pml4e, PG_ACCESSED_MASK);
+if (!pml4e) {
+goto restart;
+}
 }
 ptep &= pml4e ^ PG_NX_MASK;
 pdpe_addr = (pml4e & PG_ADDRESS_MASK) +
@@ -213,8 +251,10 @@ static hwaddr get_hphys(CPUState *cs, hwaddr gphys, MMUAccessType access_type,
 }
 ptep &= pdpe ^ PG_NX_MASK;
 if (!(pdpe & PG_ACCESSED_MASK)) {
-pdpe |= PG_ACCESSED_MASK;
-x86_stl_phys_notdirty(cs, pdpe_addr, pdpe);
+pdpe = update_entry(cs, pdpe_addr, pdpe, PG_ACCESSED_MASK);
+if (!pdpe) {
+goto restart;
+}
 }
 if (pdpe & PG_PSE_MASK) {
 /* 1 GB page */
@@ -256,8 +296,10 @@ static hwaddr get_hphys(CPUState *cs, hwaddr gphys, MMUAccessType access_type,
 }
 /* 4 KB page */
 if (!(pde & PG_ACCESSED_MASK)) {
-pde |= PG_ACCESSED_MASK;
-x86_stl_phys_notdirty(cs, pde_addr, pde);
+pde = update_entry(cs, pde_addr, pde, PG_ACCESSED_MASK);
+if (!pde) {
+goto restart;
+}
 }
 pte_addr = (pde & PG_ADDRESS_MASK) + (((gphys >> 12) & 0x1ff) << 3);
 pte = x86_ldq_phys(cs, pte_addr);
@@ -295,8 +337,10 @@ static hwaddr get_hphys(CPUState *cs, hwaddr gphys, MMUAccessType access_type,
 }
 
 if (!(pde & PG_ACCESSED_MASK)) {
-pde |= PG_ACCESSED_MASK;
-x86_stl_phys_notdirty(cs, pde_addr, pde);
+pde = update_entry(cs, pde_addr, pde, PG_ACCESSED_MASK);
+if (!pde) {
+goto restart;
+}
 }
 
 /* page directory entry */
@@ -376,7 +420,7 @@ int x86_cpu_handle_mmu_fault(CPUState *cs, vaddr addr, int size,
 int error_code = 0;
 int is_dirty, prot, page_size, is_write, is_user;
 hwaddr paddr;
-uint64_t rsvd_mask = PG_HI_RSVD_MASK;
+uint64_t rsvd_mask;
 uint32_t page_offset;
 target_ulong vaddr;
 
@@ -401,6 +445,8 @@ int 

Re: [Qemu-devel] [PATCH 2/2] avoid TABs in files that only contain a few

2018-12-13 Thread David Gibson
On Thu, Dec 13, 2018 at 11:37:37PM +0100, Paolo Bonzini wrote:
> Most files that have TABs only contain a handful of them.  Change
> them to spaces so that we don't confuse people.
> 
> disas, standard-headers, linux-headers and libdecnumber are imported
> from other projects and probably should be exempted from the check.
> Outside those, after this patch the following files still contain both
> 8-space and TAB sequences at the beginning of the line.  Many of them
> have a majority of TABs, or were initially committed with all tabs.
> 
> bsd-user/i386/target_syscall.h
> bsd-user/x86_64/target_syscall.h
> crypto/aes.c
> hw/audio/fmopl.c
> hw/audio/fmopl.h
> hw/block/tc58128.c
> hw/display/cirrus_vga.c
> hw/display/xenfb.c
> hw/dma/etraxfs_dma.c
> hw/intc/sh_intc.c
> hw/misc/mst_fpga.c
> hw/net/pcnet.c
> hw/sh4/sh7750.c
> hw/timer/m48t59.c
> hw/timer/sh_timer.c
> include/crypto/aes.h
> include/disas/bfd.h
> include/hw/sh4/sh.h
> libdecnumber/decNumber.c
> linux-headers/asm-generic/unistd.h
> linux-headers/linux/kvm.h
> linux-user/alpha/target_syscall.h
> linux-user/arm/nwfpe/double_cpdo.c
> linux-user/arm/nwfpe/fpa11_cpdt.c
> linux-user/arm/nwfpe/fpa11_cprt.c
> linux-user/arm/nwfpe/fpa11.h
> linux-user/flat.h
> linux-user/flatload.c
> linux-user/i386/target_syscall.h
> linux-user/ppc/target_syscall.h
> linux-user/sparc/target_syscall.h
> linux-user/syscall.c
> linux-user/syscall_defs.h
> linux-user/x86_64/target_syscall.h
> slirp/cksum.c
> slirp/if.c
> slirp/ip.h
> slirp/ip_icmp.c
> slirp/ip_icmp.h
> slirp/ip_input.c
> slirp/ip_output.c
> slirp/mbuf.c
> slirp/misc.c
> slirp/sbuf.c
> slirp/socket.c
> slirp/socket.h
> slirp/tcp_input.c
> slirp/tcpip.h
> slirp/tcp_output.c
> slirp/tcp_subr.c
> slirp/tcp_timer.c
> slirp/tftp.c
> slirp/udp.c
> slirp/udp.h
> target/cris/cpu.h
> target/cris/mmu.c
> target/cris/op_helper.c
> target/sh4/helper.c
> target/sh4/op_helper.c
> target/sh4/translate.c
> tcg/sparc/tcg-target.inc.c
> tests/tcg/cris/check_addo.c
> tests/tcg/cris/check_moveq.c
> tests/tcg/cris/check_swap.c
> tests/tcg/multiarch/test-mmap.c
> ui/vnc-enc-hextile-template.h
> ui/vnc-enc-zywrle.h
> util/envlist.c
> util/readline.c
> 
> The following have only TABs:
> 
> bsd-user/i386/target_signal.h
> bsd-user/sparc64/target_signal.h
> bsd-user/sparc64/target_syscall.h
> bsd-user/sparc/target_signal.h
> bsd-user/sparc/target_syscall.h
> bsd-user/x86_64/target_signal.h
> crypto/desrfb.c
> hw/audio/intel-hda-defs.h
> hw/core/uboot_image.h
> hw/sh4/sh7750_regnames.c
> hw/sh4/sh7750_regs.h
> include/hw/cris/etraxfs_dma.h
> linux-user/alpha/termbits.h
> linux-user/arm/nwfpe/fpopcode.h
> linux-user/arm/nwfpe/fpsr.h
> linux-user/arm/syscall_nr.h
> linux-user/arm/target_signal.h
> linux-user/cris/target_signal.h
> linux-user/i386/target_signal.h
> linux-user/linux_loop.h
> linux-user/m68k/target_signal.h
> linux-user/microblaze/target_signal.h
> linux-user/mips64/target_signal.h
> linux-user/mips/target_signal.h
> linux-user/mips/target_syscall.h
> linux-user/mips/termbits.h
> linux-user/ppc/target_signal.h
> linux-user/sh4/target_signal.h
> linux-user/sh4/termbits.h
> linux-user/sparc64/target_syscall.h
> linux-user/sparc/target_signal.h
> linux-user/x86_64/target_signal.h
> linux-user/x86_64/termbits.h
> pc-bios/optionrom/optionrom.h
> slirp/mbuf.h
> slirp/misc.h
> slirp/sbuf.h
> slirp/tcp.h
> slirp/tcp_timer.h
> slirp/tcp_var.h
> target/i386/svm.h
> target/sparc/asi.h
> target/xtensa/core-dc232b/xtensa-modules.inc.c
> target/xtensa/core-dc233c/xtensa-modules.inc.c
> target/xtensa/core-de212/core-isa.h
> target/xtensa/core-de212/xtensa-modules.inc.c
> target/xtensa/core-fsf/xtensa-modules.inc.c
> target/xtensa/core-sample_controller/core-isa.h
> target/xtensa/core-sample_controller/xtensa-modules.inc.c
> target/xtensa/core-test_kc705_be/core-isa.h
> target/xtensa/core-test_kc705_be/xtensa-modules.inc.c
> tests/tcg/cris/check_abs.c
> tests/tcg/cris/check_addc.c
> tests/tcg/cris/check_addcm.c
> tests/tcg/cris/check_addoq.c
> tests/tcg/cris/check_bound.c
> tests/tcg/cris/check_ftag.c
> tests/tcg/cris/check_int64.c
> tests/tcg/cris/check_lz.c
> tests/tcg/cris/check_openpf5.c
> tests/tcg/cris/check_sigalrm.c
> tests/tcg/cris/crisutils.h
> tests/tcg/cris/sys.c
> tests/tcg/i386/test-i386-ssse3.c
> ui/vgafont.h
> 
> Signed-off-by: Paolo Bonzini 

ppc parts

Acked-by: David Gibson 

> ---
>  block/bochs.c  | 22 ++---
>  block/file-posix.c |  2 +-
>  
