Re: [PATCH v1 5/8] migration: Export dirty-limit time info

2022-10-01 Thread Hyman Huang




On 2022/10/2 2:31, Markus Armbruster wrote:

huang...@chinatelecom.cn writes:


From: Hyman Huang(黄勇) 

Export dirty limit throttle time and estimated ring full
time, through which we can observe the process of dirty
limit during live migration.

Signed-off-by: Hyman Huang(黄勇) 


[...]


diff --git a/qapi/migration.json b/qapi/migration.json
index bc4bc96..c263d54 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -242,6 +242,12 @@
  #   Present and non-empty when migration is blocked.
  #   (since 6.0)
  #
+# @dirty-limit-throttle-us-per-full: Throttle time (us) during the period of
+#dirty ring full (since 7.0)
+#
+# @dirty-limit-us-ring-full: Estimated periodic time (us) of dirty ring full.
+#(since 7.0)
+#


Can you explain what is measured here a bit more verbosely?
The two fields in migration info aim to export the dirty-limit throttle 
time so that upper-layer applications can observe the progress of live 
migration, much like 'cpu-throttle-percentage'.


The commit "tests: Add migration dirty-limit capability test" make use 
of the 'dirty-limit-throttle-us-per-full' to checkout if dirty-limit has 
started, the commit "tests/migration: Introduce dirty-limit into 
guestperf" introduce the two field so guestperf tools also show the 
process of dirty-limit migration.


I also use qmp_query_migrate to observe the migration by checking these 
two fields, roughly as sketched below.


I'm not sure whether the above explanation is exactly what you wanted; 
please feel free to start a discussion about this feature.


Thanks, Markus.

Yong



  # Since: 0.14
  ##
  { 'struct': 'MigrationInfo',
@@ -259,7 +265,9 @@
 '*postcopy-blocktime' : 'uint32',
 '*postcopy-vcpu-blocktime': ['uint32'],
 '*compression': 'CompressionStats',
-   '*socket-address': ['SocketAddress'] } }
+   '*socket-address': ['SocketAddress'],
+   '*dirty-limit-throttle-us-per-full': 'int64',
+   '*dirty-limit-us-ring-full': 'int64'} }
  
  ##

  # @query-migrate:


[...]



--
Best regards

Hyman Huang(黄勇)



Re: [PATCH for 7.1] linux-user: fix compat with glibc >= 2.36 sys/mount.h

2022-10-01 Thread Andreas Schwab
On Aug 02 2022, Daniel P. Berrangé wrote:

> This patch removes linux/fs.h, meaning we have to define
> various FS_IOC constants that are now unavailable.

This breaks a lot of ioctl emulations, as it lacks their definitions:

#define BLKGETSIZE64     _IOR(0x12,114,size_t)
#define BLKDISCARD       _IO(0x12,119)
#define BLKIOMIN         _IO(0x12,120)
#define BLKIOOPT         _IO(0x12,121)
#define BLKALIGNOFF      _IO(0x12,122)
#define BLKPBSZGET       _IO(0x12,123)
#define BLKDISCARDZEROES _IO(0x12,124)
#define BLKSECDISCARD    _IO(0x12,125)
#define BLKROTATIONAL    _IO(0x12,126)
#define BLKZEROOUT       _IO(0x12,127)

#define FIBMAP           _IO(0x00,1)
#define FICLONE          _IOW(0x94, 9, int)
#define FIGETBSZ         _IO(0x00,2)

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."



Re: [PATCH qemu] mips/malta: pass RNG seed to kernel via env var

2022-10-01 Thread Jason A. Donenfeld
On Sat, Oct 1, 2022 at 9:32 PM Bernhard Reutner-Fischer
 wrote:
>
> On Sat, 1 Oct 2022 21:06:48 +0200
> "Jason A. Donenfeld"  wrote:
>
> > On Fri, Sep 30, 2022 at 04:05:20PM +0200, Jason A. Donenfeld wrote:
> > > With the kernel patch linked below, Linux ingests a RNG seed
> > > passed from the hypervisor. So, pass this for the Malta platform, and
> > > reinitialize it on reboot too, so that it's always fresh.
> > >
> > > Link: 
> > > https://lore.kernel.org/linux-mips/20220930140138.575751-1-ja...@zx2c4.com/
> >
> > The kernel side of this has now landed, so we can move ahead on the QEMU
> > side:
> > https://git.kernel.org/pub/scm/linux/kernel/git/mips/linux.git/commit/?id=056a68cea01edfa78b3474af1bfa39cc6bcc7bee
> >
>
> s/a RNG/an RNG/
>
> What about rngseed=""?
> len=min(4711,0)
> hex2bin(..0) will return false so ok.
> rndseed="0" is problably fine, but is it worthy and desired? 00, 0x0.
> Other than that, sounds plausible. IMHO.
> thanks,

Not sure I understand the substantive part of your message. You're
wondering whether it's okay to ingest length 1 (or one half?) inputs?
The kernel will take whatever the firmware gives it; that's by design.
The firmware in turn provides whatever it can, optimally 32 bytes as
QEMU does with this patch.

Maybe you could use some more words to describe what your thoughts are?

Jason



Re: access guest address from within instruction

2022-10-01 Thread BitFriends
Well, it doesn't give errors, but warnings about unsigned longs being
converted to TCGv_i64, whose exact definition I cannot find in the qemu
repo. Where is it located? When stepping through the instruction's code,
the value that should be read isn't read. Maybe that'll work once the
warnings are fixed.

Regards

On Sat, Oct 1, 2022 at 22:23, Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 10/1/22 13:10, BitFriends wrote:
> > Hello,
> >
> > I am trying to create a custom instruction that accesses guest memory
> specified by an
> > address in a register. I specifically want to read from that address. So
> I tried to do
> > that using "tcg_gen_qemu_ld_i64(, env->regs[R_EDI], 0, MO_LEUQ);",
> but that doesn't
> > save any result in res.
>
> This statement should have given you compilation errors, so I don't know
> what you mean by
> "doesn't save any result".  There's clearly a disconnect between what you
> describe and
> what you actually attempted.
>
> Anyway, by the name you can see that function "gen"erates a "tcg"
> operation, which is then
> later compiled by the jit, the output of which is later executed to
> produce a result.
> Which is, in general, what you want for implementing a custom instruction.
>
>
> r~
>


Re: [PATCH v3 00/26] target/i386: pc-relative translation blocks

2022-10-01 Thread Paolo Bonzini
On Sat, Oct 1, 2022 at 16:09, Richard Henderson wrote:

> This is the x86 specific changes required to reduce the
> amount of translation for address space randomization.
> For v3, quite a few changes based on Paolo's feedback.
>

Reviewed-by: Paolo Bonzini 


>
> r~
>
> Based-on: 20220930212622.108363-1-richard.hender...@linaro.org
> ("[PATCH v6 00/18] tcg: CPUTLBEntryFull and TARGET_TB_PCREL")
>
>
> Richard Henderson (26):
>   target/i386: Remove pc_start
>   target/i386: Return bool from disas_insn
>   target/i386: Remove cur_eip argument to gen_exception
>   target/i386: Remove cur_eip, next_eip arguments to gen_interrupt
>   target/i386: Create gen_update_eip_cur
>   target/i386: Create gen_update_eip_next
>   target/i386: Introduce DISAS_EOB*
>   target/i386: Use DISAS_EOB* in gen_movl_seg_T0
>   target/i386: Use DISAS_EOB_NEXT
>   target/i386: Use DISAS_EOB_ONLY
>   target/i386: Create cur_insn_len, cur_insn_len_i32
>   target/i386: Remove cur_eip, next_eip arguments to gen_repz*
>   target/i386: Introduce DISAS_JUMP
>   target/i386: Truncate values for lcall_real to i32
>   target/i386: Create eip_next_*
>   target/i386: Use DISAS_TOO_MANY to exit after gen_io_start
>   target/i386: Create gen_jmp_rel
>   target/i386: Use gen_jmp_rel for loop, repz, jecxz insns
>   target/i386: Use gen_jmp_rel for gen_jcc
>   target/i386: Use gen_jmp_rel for DISAS_TOO_MANY
>   target/i386: Remove MemOp argument to gen_op_j*_ecx
>   target/i386: Merge gen_jmp_tb and gen_goto_tb into gen_jmp_rel
>   target/i386: Create eip_cur_tl
>   target/i386: Add cpu_eip
>   target/i386: Inline gen_jmp_im
>   target/i386: Enable TARGET_TB_PCREL
>
>  target/i386/cpu-param.h  |   4 +
>  target/i386/helper.h |   2 +-
>  target/i386/tcg/seg_helper.c |   6 +-
>  target/i386/tcg/tcg-cpu.c|   8 +-
>  target/i386/tcg/translate.c  | 830 ++-
>  5 files changed, 448 insertions(+), 402 deletions(-)
>
> --
> 2.34.1
>
>


Re: access guest address from within instruction

2022-10-01 Thread Richard Henderson

On 10/1/22 13:10, BitFriends wrote:

Hello,

I am trying to create a custom instruction that accesses guest memory specified by an 
address in a register. I specifically want to read from that address. So I tried to do 
that using "tcg_gen_qemu_ld_i64(, env->regs[R_EDI], 0, MO_LEUQ);", but that doesn't 
save any result in res.


This statement should have given you compilation errors, so I don't know what you mean by 
"doesn't save any result".  There's clearly a disconnect between what you describe and 
what you actually attempted.


Anyway, by the name you can see that function "gen"erates a "tcg" operation, which is then 
later compiled by the jit, the output of which is later executed to produce a result. 
Which is, in general, what you want for implementing a custom instruction.
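
As a minimal sketch of that pattern (assuming target/i386's
translate.c context, where cpu_regs[] are the TCGv globals for the
guest registers, 's' is the DisasContext, and a 64-bit guest so that
TCGv and TCGv_i64 coincide; TCGv_i64 itself is the opaque handle type
declared in include/tcg/tcg.h):

    /* Emit a TCG op that, when the translated block later executes,
     * loads 8 bytes from the guest address held in EDI. */
    TCGv_i64 res = tcg_temp_new_i64();
    tcg_gen_qemu_ld_i64(res, cpu_regs[R_EDI], s->mem_index, MO_LEUQ);
    /* The value exists only at execution time; consume it with more
     * generated ops, e.g. write it back to EAX: */
    tcg_gen_mov_i64(cpu_regs[R_EAX], res);

The key point is that the destination must be a TCG value, not a host
C variable: a plain uint64_t can never receive the result directly at
translation time.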



r~



access guest address from within instruction

2022-10-01 Thread BitFriends
Hello,

I am trying to create a custom instruction that accesses guest memory
specified by an address in a register. I specifically want to read from
that address. So I tried to do that using "tcg_gen_qemu_ld_i64(&res,
env->regs[R_EDI], 0, MO_LEUQ);", but that doesn't save any result in res.
So either my call is wrong, or I need to translate that guest address to a
usable host address. I can't find much about this stuff in the docs. Could
anyone help me out with that?

Cheers

BitFriends


[PATCH] tests/avocado: Add missing require_netdev('user') checks

2022-10-01 Thread Peter Maydell
Some avocado tests fail if QEMU was built without libslirp. Add
require_netdev('user') checks where necessary:

These tests try to ping 10.0.2.2 and expect it to succeed:
  boot_linux_console.py:BootLinuxConsole.test_arm_emcraft_sf2
  boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_sd
  ppc_bamboo.py:BambooMachine.test_ppc_bamboo

These tests run a commandline that includes '-net user':
  machine_aspeed.py:AST2x00Machine.test_arm_ast2500_evb_builroot
  (and others that use the do_test_arm_aspeed_buidroot_start()
  or do_test_arm_aspeed_sdk_start() helper functions)

These changes seem to be sufficient for 'make check-avocado'
to not fail on a --disable-slirp build.

Signed-off-by: Peter Maydell 
---
 tests/avocado/boot_linux_console.py | 4 
 tests/avocado/machine_aspeed.py | 3 +++
 tests/avocado/ppc_bamboo.py | 1 +
 3 files changed, 8 insertions(+)

diff --git a/tests/avocado/boot_linux_console.py 
b/tests/avocado/boot_linux_console.py
index f26e036ab58..ca9d09b0d7c 100644
--- a/tests/avocado/boot_linux_console.py
+++ b/tests/avocado/boot_linux_console.py
@@ -381,6 +381,8 @@ def test_arm_emcraft_sf2(self):
 :avocado: tags=u-boot
 :avocado: tags=accel:tcg
 """
+self.require_netdev('user')
+
 uboot_url = ('https://raw.githubusercontent.com/'
  'Subbaraya-Sundeep/qemu-test-binaries/'
  'fe371d32e50ca682391e1e70ab98c2942aeffb01/u-boot')
@@ -779,6 +781,8 @@ def test_arm_orangepi_sd(self):
 :avocado: tags=machine:orangepi-pc
 :avocado: tags=device:sd
 """
+self.require_netdev('user')
+
 deb_url = ('https://apt.armbian.com/pool/main/l/'

'linux-5.10.16-sunxi/linux-image-current-sunxi_21.02.2_armhf.deb')
 deb_hash = '9fa84beda245cabf0b4fa84cf6eaa7738ead1da0'
diff --git a/tests/avocado/machine_aspeed.py b/tests/avocado/machine_aspeed.py
index 0f64eb636c2..124649a24b5 100644
--- a/tests/avocado/machine_aspeed.py
+++ b/tests/avocado/machine_aspeed.py
@@ -93,6 +93,8 @@ def test_arm_ast2500_romulus_openbmc_v2_9_0(self):
 self.do_test_arm_aspeed(image_path)
 
 def do_test_arm_aspeed_buidroot_start(self, image, cpu_id):
+self.require_netdev('user')
+
 self.vm.set_console()
 self.vm.add_args('-drive', 'file=' + image + ',if=mtd,format=raw',
  '-net', 'nic', '-net', 'user')
@@ -193,6 +195,7 @@ def wait_for_console_pattern(self, success_message, 
vm=None):
  vm=vm)
 
 def do_test_arm_aspeed_sdk_start(self, image, cpu_id):
+self.require_netdev('user')
 self.vm.set_console()
 self.vm.add_args('-drive', 'file=' + image + ',if=mtd,format=raw',
  '-net', 'nic', '-net', 'user')
diff --git a/tests/avocado/ppc_bamboo.py b/tests/avocado/ppc_bamboo.py
index 102ff252dff..a81be3d6088 100644
--- a/tests/avocado/ppc_bamboo.py
+++ b/tests/avocado/ppc_bamboo.py
@@ -23,6 +23,7 @@ def test_ppc_bamboo(self):
 :avocado: tags=accel:tcg
 """
 self.require_accelerator("tcg")
+self.require_netdev('user')
 tar_url = ('http://landley.net/aboriginal/downloads/binaries/'
'system-image-powerpc-440fp.tar.gz')
 tar_hash = '53e5f16414b195b82d2c70272f81c2eedb39bad9'
-- 
2.25.1




Re: [PATCH qemu] mips/malta: pass RNG seed to kernel via env var

2022-10-01 Thread Bernhard Reutner-Fischer
On Sat, 1 Oct 2022 21:06:48 +0200
"Jason A. Donenfeld"  wrote:

> On Fri, Sep 30, 2022 at 04:05:20PM +0200, Jason A. Donenfeld wrote:
> > With the kernel patch linked below, Linux ingests a RNG seed
> > passed from the hypervisor. So, pass this for the Malta platform, and
> > reinitialize it on reboot too, so that it's always fresh.
> > 
> > Link: 
> > https://lore.kernel.org/linux-mips/20220930140138.575751-1-ja...@zx2c4.com/ 
> >  
> 
> The kernel side of this has now landed, so we can move ahead on the QEMU
> side:
> https://git.kernel.org/pub/scm/linux/kernel/git/mips/linux.git/commit/?id=056a68cea01edfa78b3474af1bfa39cc6bcc7bee
> 

s/a RNG/an RNG/

What about rngseed=""?
len=min(4711,0)
hex2bin(..0) will return false so ok.
rndseed="0" is problably fine, but is it worthy and desired? 00, 0x0.
Other than that, sounds plausible. IMHO.
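
For illustration, a sketch of the edge case above (hypothetical shape
only; the real parser is in the linked kernel commit). An empty
rngseed= yields a length of zero, so hex2bin() is never asked to
convert anything and no seed is ingested:

    /* Hypothetical parser shape, not the actual kernel code. */
    char *s = fw_getenv("rngseed");
    if (s) {
        u8 buf[32];
        size_t len = min_t(size_t, strlen(s) / 2, sizeof(buf));
        /* rngseed="" gives len == 0: nothing converted or added. */
        if (len && hex2bin(buf, s, len) == 0)
            add_device_randomness(buf, len);
    }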
thanks,



Re: [PULL v2 00/15] x86 + misc changes for 2022-09-29

2022-10-01 Thread Paolo Bonzini
On Sat, Oct 1, 2022 at 1:01 AM Stefan Hajnoczi  wrote:
>
> This pull request doesn't build:
>
> ../meson.build:545:95: ERROR: Expecting endif got rparen.
> gdbus_codegen_error = '@0@ uses gdbus-codegen, which does not support
> control flow integrity')
>
> https://gitlab.com/qemu-project/qemu/-/jobs/3112498668

I'm really sorry. :( I have now pushed the delta, but I'll wait for CI
to pass and send a pull request on Monday.

Paolo




Re: [PATCH qemu] mips/malta: pass RNG seed to kernel via env var

2022-10-01 Thread Jason A. Donenfeld
On Fri, Sep 30, 2022 at 04:05:20PM +0200, Jason A. Donenfeld wrote:
> With the kernel patch linked below, Linux ingests a RNG seed
> passed from the hypervisor. So, pass this for the Malta platform, and
> reinitialize it on reboot too, so that it's always fresh.
> 
> Link: 
> https://lore.kernel.org/linux-mips/20220930140138.575751-1-ja...@zx2c4.com/

The kernel side of this has now landed, so we can move ahead on the QEMU
side:
https://git.kernel.org/pub/scm/linux/kernel/git/mips/linux.git/commit/?id=056a68cea01edfa78b3474af1bfa39cc6bcc7bee



Re: [PATCH v1 5/8] migration: Export dirty-limit time info

2022-10-01 Thread Markus Armbruster
huang...@chinatelecom.cn writes:

> From: Hyman Huang(黄勇) 
>
> Export dirty limit throttle time and estimated ring full
> time, through which we can observe the process of dirty
> limit during live migration.
>
> Signed-off-by: Hyman Huang(黄勇) 

[...]

> diff --git a/qapi/migration.json b/qapi/migration.json
> index bc4bc96..c263d54 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -242,6 +242,12 @@
>  #   Present and non-empty when migration is blocked.
>  #   (since 6.0)
>  #
> +# @dirty-limit-throttle-us-per-full: Throttle time (us) during the period of
> +#dirty ring full (since 7.0)
> +#
> +# @dirty-limit-us-ring-full: Estimated periodic time (us) of dirty ring full.
> +#(since 7.0)
> +#

Can you explain what is measured here a bit more verbosely?

>  # Since: 0.14
>  ##
>  { 'struct': 'MigrationInfo',
> @@ -259,7 +265,9 @@
> '*postcopy-blocktime' : 'uint32',
> '*postcopy-vcpu-blocktime': ['uint32'],
> '*compression': 'CompressionStats',
> -   '*socket-address': ['SocketAddress'] } }
> +   '*socket-address': ['SocketAddress'],
> +   '*dirty-limit-throttle-us-per-full': 'int64',
> +   '*dirty-limit-us-ring-full': 'int64'} }
>  
>  ##
>  # @query-migrate:

[...]




[PATCH v3 41/42] target/arm: Implement FEAT_HAFDBS

2022-10-01 Thread Richard Henderson
Perform the atomic update for hardware management of the access flag
and the dirty bit.

A limitation of the implementation so far is that the page table
itself must already be writable, i.e. the dirty bit for the stage2
page table must already be set, which means we cannot set both dirty
bits at the same time.

This is allowed because it is CONSTRAINED UNPREDICTABLE whether any
atomic update happens at all.  The implementation is allowed to simply
fall back on software update at any time.

Signed-off-by: Richard Henderson 
---
 docs/system/arm/emulation.rst |   1 +
 target/arm/cpu64.c|   1 +
 target/arm/ptw.c  | 119 --
 3 files changed, 115 insertions(+), 6 deletions(-)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index be7bbffe59..c3582d075e 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -31,6 +31,7 @@ the following architecture extensions:
 - FEAT_FRINTTS (Floating-point to integer instructions)
 - FEAT_FlagM (Flag manipulation instructions v2)
 - FEAT_FlagM2 (Enhancements to flag manipulation instructions)
+- FEAT_HAFDBS (Hardware management of the access flag and dirty bit state)
 - FEAT_HCX (Support for the HCRX_EL2 register)
 - FEAT_HPDS (Hierarchical permission disables)
 - FEAT_I8MM (AArch64 Int8 matrix multiplication instructions)
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index e6314e86d2..b064dc7964 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -1116,6 +1116,7 @@ static void aarch64_max_initfn(Object *obj)
 cpu->isar.id_aa64mmfr0 = t;
 
 t = cpu->isar.id_aa64mmfr1;
+t = FIELD_DP64(t, ID_AA64MMFR1, HAFDBS, 2);   /* FEAT_HAFDBS */
 t = FIELD_DP64(t, ID_AA64MMFR1, VMIDBITS, 2); /* FEAT_VMID16 */
 t = FIELD_DP64(t, ID_AA64MMFR1, VH, 1);   /* FEAT_VHE */
 t = FIELD_DP64(t, ID_AA64MMFR1, HPDS, 1); /* FEAT_HPDS */
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 45734b0d28..14ab56d1b5 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -223,6 +223,7 @@ static bool S2_attrs_are_device(uint64_t hcr, uint8_t attrs)
 typedef struct {
 bool is_secure;
 bool be;
+bool rw;
 void *hphys;
 hwaddr gphys;
 } S1TranslateResult;
@@ -261,7 +262,8 @@ static bool S1_ptw_translate(CPUARMState *env, ARMMMUIdx 
mmu_idx,
 pte_attrs = s2.cacheattrs.attrs;
 pte_secure = s2.f.attrs.secure;
 }
-res->hphys = NULL;
+res->hphys = NULL;  /* force slow path */
+res->rw = false;/* debug never modifies */
 } else {
 CPUTLBEntryFull *full;
 int flags;
@@ -276,6 +278,7 @@ static bool S1_ptw_translate(CPUARMState *env, ARMMMUIdx 
mmu_idx,
 goto fail;
 }
 res->gphys = full->phys_addr;
+res->rw = full->prot & PAGE_WRITE;
 pte_attrs = full->pte_attrs;
 pte_secure = full->attrs.secure;
 }
@@ -381,6 +384,56 @@ static uint64_t arm_ldq_ptw(CPUARMState *env, const 
S1TranslateResult *s1,
 return data;
 }
 
+static uint64_t arm_casq_ptw(CPUARMState *env, uint64_t old_val,
+ uint64_t new_val, const S1TranslateResult *s1,
+ ARMMMUFaultInfo *fi)
+{
+uint64_t cur_val;
+
+if (unlikely(!s1->hphys)) {
+fi->type = ARMFault_UnsuppAtomicUpdate;
+fi->s1ptw = true;
+return 0;
+}
+
+#ifndef CONFIG_ATOMIC64
+/*
+ * We can't support the atomic operation on the host.  We should be
+ * running in round-robin mode though, which means that we would only
+ * race with dma i/o.
+ */
+qemu_mutex_lock_iothread();
+if (s1->be) {
+cur_val = ldq_be_p(s1->hphys);
+if (cur_val == old_val) {
+stq_be_p(s1->hphys, new_val);
+}
+} else {
+cur_val = ldq_le_p(s1->hphys);
+if (cur_val == old_val) {
+stq_le_p(s1->hphys, new_val);
+}
+}
+qemu_mutex_unlock_iothread();
+#else
+if (s1->be) {
+old_val = cpu_to_be64(old_val);
+new_val = cpu_to_be64(new_val);
+cur_val = qatomic_cmpxchg__nocheck((uint64_t *)s1->hphys,
+   old_val, new_val);
+cur_val = be64_to_cpu(cur_val);
+} else {
+old_val = cpu_to_le64(old_val);
+new_val = cpu_to_le64(new_val);
+cur_val = qatomic_cmpxchg__nocheck((uint64_t *)s1->hphys,
+   old_val, new_val);
+cur_val = le64_to_cpu(cur_val);
+}
+#endif
+
+return cur_val;
+}
+
 static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
  uint32_t *table, uint32_t address)
 {
@@ -1290,6 +1343,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
 goto do_fault;
 }
 
+ restart_atomic_update:
 if (!(descriptor & 1) || (!(descriptor & 2) && (level == 3))) {
 /* Invalid, or the Reserved 

[PATCH v3 40/42] target/arm: Consider GP an attribute in get_phys_addr_lpae

2022-10-01 Thread Richard Henderson
Both GP and DBM are in the upper attribute block.
Extend the computation of attrs to include them,
then simplify the setting of guarded.

Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index c68fd73617..45734b0d28 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -1079,7 +1079,6 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
 uint32_t el = regime_el(env, mmu_idx);
 uint64_t descaddrmask;
 bool aarch64 = arm_el_is_aa64(env, el);
-bool guarded = false;
 S1TranslateResult s1;
 uint64_t descriptor;
 bool nstable;
@@ -1341,7 +1340,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
 descaddr &= ~(page_size - 1);
 descaddr |= (address & (page_size - 1));
 /* Extract attributes from the descriptor */
-attrs = descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(52, 12));
+attrs = descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(50, 14));
 
 if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
 /* Stage 2 table descriptors do not include any attribute fields */
@@ -1349,7 +1348,6 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
 }
 /* Merge in attributes from table descriptors */
 attrs |= nstable << 5; /* NS */
-guarded = extract64(descriptor, 50, 1);  /* GP */
 if (param.hpd) {
 /* HPD disables all the table attributes except NSTable.  */
 goto skip_attrs;
@@ -1402,7 +1400,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
 
 /* When in aarch64 mode, and BTI is enabled, remember GP in the TLB.  */
 if (aarch64 && cpu_isar_feature(aa64_bti, cpu)) {
-result->f.guarded = guarded;
+result->f.guarded = extract64(attrs, 50, 1); /* GP */
 }
 
 if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
-- 
2.34.1




[PATCH v3 38/42] target/arm: Fix fault reporting in get_phys_addr_lpae

2022-10-01 Thread Richard Henderson
Always overriding fi->type was incorrect, as we would not properly
propagate the fault type from S1_ptw_translate or arm_ldq_ptw.
Simplify things by providing a new label for a translation fault.
For other faults, store into fi directly.

Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 31 +--
 1 file changed, 13 insertions(+), 18 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index e6b385a8b1..01a27b30fb 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -1065,8 +1065,6 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
 {
 ARMCPU *cpu = env_archcpu(env);
-/* Read an LPAE long-descriptor translation table. */
-ARMFaultType fault_type = ARMFault_Translation;
 uint32_t level;
 ARMVAParameters param;
 uint64_t ttbr;
@@ -1104,8 +1102,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
  * so our choice is to always raise the fault.
  */
 if (param.tsz_oob) {
-fault_type = ARMFault_Translation;
-goto do_fault;
+goto do_translation_fault;
 }
 
 addrsize = 64 - 8 * param.tbi;
@@ -1142,8 +1139,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
addrsize - inputsize);
 if (-top_bits != param.select) {
 /* The gap between the two regions is a Translation fault */
-fault_type = ARMFault_Translation;
-goto do_fault;
+goto do_translation_fault;
 }
 }
 
@@ -1175,7 +1171,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
  * Translation table walk disabled => Translation fault on TLB miss
  * Note: This is always 0 on 64-bit EL2 and EL3.
  */
-goto do_fault;
+goto do_translation_fault;
 }
 
 if (mmu_idx != ARMMMUIdx_Stage2 && mmu_idx != ARMMMUIdx_Stage2_S) {
@@ -1206,8 +1202,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
 if (param.ds && stride == 9 && sl2) {
 if (sl0 != 0) {
 level = 0;
-fault_type = ARMFault_Translation;
-goto do_fault;
+goto do_translation_fault;
 }
 startlevel = -1;
 } else if (!aarch64 || stride == 9) {
@@ -1226,8 +1221,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
 ok = check_s2_mmu_setup(cpu, aarch64, startlevel,
 inputsize, stride, outputsize);
 if (!ok) {
-fault_type = ARMFault_Translation;
-goto do_fault;
+goto do_translation_fault;
 }
 level = startlevel;
 }
@@ -1249,7 +1243,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
 descaddr |= extract64(ttbr, 2, 4) << 48;
 } else if (descaddr >> outputsize) {
 level = 0;
-fault_type = ARMFault_AddressSize;
+fi->type = ARMFault_AddressSize;
 goto do_fault;
 }
 
@@ -1299,7 +1293,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
 
 if (!(descriptor & 1) || (!(descriptor & 2) && (level == 3))) {
 /* Invalid, or the Reserved level 3 encoding */
-goto do_fault;
+goto do_translation_fault;
 }
 
 descaddr = descriptor & descaddrmask;
@@ -1317,7 +1311,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
 descaddr |= extract64(descriptor, 12, 4) << 48;
 }
 } else if (descaddr >> outputsize) {
-fault_type = ARMFault_AddressSize;
+fi->type = ARMFault_AddressSize;
 goto do_fault;
 }
 
@@ -1374,9 +1368,9 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
  * Here descaddr is the final physical address, and attributes
  * are all in attrs.
  */
-fault_type = ARMFault_AccessFlag;
 if ((attrs & (1 << 8)) == 0) {
 /* Access flag */
+fi->type = ARMFault_AccessFlag;
 goto do_fault;
 }
 
@@ -1393,8 +1387,8 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
 result->f.prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
 }
 
-fault_type = ARMFault_Permission;
 if (!(result->f.prot & (1 << access_type))) {
+fi->type = ARMFault_Permission;
 goto do_fault;
 }
 
@@ -1439,8 +1433,9 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
 result->f.lg_page_size = ctz64(page_size);
 return false;
 
-do_fault:
-fi->type = fault_type;
+ do_translation_fault:
+fi->type = ARMFault_Translation;
+ do_fault:
 fi->level = level;
 /* Tag the error as S2 for failed S1 PTW at S2 or ordinary S2.  */
 fi->stage2 = fi->s1ptw || (mmu_idx == ARMMMUIdx_Stage2 ||
-- 
2.34.1




[PATCH v3 35/42] target/arm: Move S1_ptw_translate outside arm_ld[lq]_ptw

2022-10-01 Thread Richard Henderson
Separate S1 translation from the actual lookup.
Will enable lpae hardware updates.

Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 92 +---
 1 file changed, 48 insertions(+), 44 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index d356b0b22d..84b55b640b 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -315,38 +315,29 @@ static bool S1_ptw_translate(CPUARMState *env, ARMMMUIdx 
mmu_idx,
 }
 
 /* All loads done in the course of a page table walk go through here. */
-static uint32_t arm_ldl_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
-ARMMMUIdx mmu_idx, ARMMMUIdx ptw_idx,
-bool debug, ARMMMUFaultInfo *fi)
+static uint32_t arm_ldl_ptw(CPUARMState *env, const S1TranslateResult *s1,
+ARMMMUFaultInfo *fi)
 {
 CPUState *cs = env_cpu(env);
-S1TranslateResult s1;
 uint32_t data;
 
-if (!S1_ptw_translate(env, mmu_idx, ptw_idx, addr, is_secure,
-  debug, &s1, fi)) {
-/* Failure. */
-assert(fi->s1ptw);
-return 0;
-}
-
-if (likely(s1.hphys)) {
+if (likely(s1->hphys)) {
 /* Page tables are in RAM, and we have the host address. */
-if (s1.be) {
-data = ldl_be_p(s1.hphys);
+if (s1->be) {
+data = ldl_be_p(s1->hphys);
 } else {
-data = ldl_le_p(s1.hphys);
+data = ldl_le_p(s1->hphys);
 }
 } else {
 /* Page tables are in MMIO. */
-MemTxAttrs attrs = { .secure = s1.is_secure };
+MemTxAttrs attrs = { .secure = s1->is_secure };
 AddressSpace *as = arm_addressspace(cs, attrs);
 MemTxResult result = MEMTX_OK;
 
-if (s1.be) {
-data = address_space_ldl_be(as, s1.gphys, attrs, &result);
+if (s1->be) {
+data = address_space_ldl_be(as, s1->gphys, attrs, &result);
 } else {
-data = address_space_ldl_le(as, s1.gphys, attrs, &result);
+data = address_space_ldl_le(as, s1->gphys, attrs, &result);
 }
 if (unlikely(result != MEMTX_OK)) {
 fi->type = ARMFault_SyncExternalOnWalk;
@@ -357,38 +348,29 @@ static uint32_t arm_ldl_ptw(CPUARMState *env, hwaddr 
addr, bool is_secure,
 return data;
 }
 
-static uint64_t arm_ldq_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
-ARMMMUIdx mmu_idx, ARMMMUIdx ptw_idx,
-bool debug, ARMMMUFaultInfo *fi)
+static uint64_t arm_ldq_ptw(CPUARMState *env, const S1TranslateResult *s1,
+ARMMMUFaultInfo *fi)
 {
 CPUState *cs = env_cpu(env);
-S1TranslateResult s1;
 uint64_t data;
 
-if (!S1_ptw_translate(env, mmu_idx, ptw_idx, addr, is_secure,
-  debug, &s1, fi)) {
-/* Failure. */
-assert(fi->s1ptw);
-return 0;
-}
-
-if (likely(s1.hphys)) {
+if (likely(s1->hphys)) {
 /* Page tables are in RAM, and we have the host address. */
-if (s1.be) {
-data = ldq_be_p(s1.hphys);
+if (s1->be) {
+data = ldq_be_p(s1->hphys);
 } else {
-data = ldq_le_p(s1.hphys);
+data = ldq_le_p(s1->hphys);
 }
 } else {
 /* Page tables are in MMIO. */
-MemTxAttrs attrs = { .secure = s1.is_secure };
+MemTxAttrs attrs = { .secure = s1->is_secure };
 AddressSpace *as = arm_addressspace(cs, attrs);
 MemTxResult result = MEMTX_OK;
 
-if (s1.be) {
-data = address_space_ldq_be(as, s1.gphys, attrs, &result);
+if (s1->be) {
+data = address_space_ldq_be(as, s1->gphys, attrs, &result);
 } else {
-data = address_space_ldq_le(as, s1.gphys, attrs, &result);
+data = address_space_ldq_le(as, s1->gphys, attrs, &result);
 }
 if (unlikely(result != MEMTX_OK)) {
 fi->type = ARMFault_SyncExternalOnWalk;
@@ -520,6 +502,7 @@ static bool get_phys_addr_v5(CPUARMState *env, uint32_t 
address,
 int domain = 0;
 int domain_prot;
 hwaddr phys_addr;
+S1TranslateResult s1;
 uint32_t dacr;
 
 /* Pagetable walk.  */
@@ -529,7 +512,11 @@ static bool get_phys_addr_v5(CPUARMState *env, uint32_t 
address,
 fi->type = ARMFault_Translation;
 goto do_fault;
 }
-desc = arm_ldl_ptw(env, table, is_secure, mmu_idx, ptw_idx, debug, fi);
+if (!S1_ptw_translate(env, mmu_idx, ptw_idx, table,
+  is_secure, debug, &s1, fi)) {
+goto do_fault;
+}
+desc = arm_ldl_ptw(env, &s1, fi);
 if (fi->type != ARMFault_None) {
 goto do_fault;
 }
@@ -567,7 +554,11 @@ static bool get_phys_addr_v5(CPUARMState *env, uint32_t 
address,
 /* Fine pagetable.  */
 table = (desc & 0xf000) | ((address >> 8) & 0xffc);
 }
-desc = arm_ldl_ptw(env, table, is_secure, mmu_idx, ptw_idx, 

[PATCH v3 32/42] target/arm: Extract HA and HD in aa64_va_parameters

2022-10-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/internals.h | 2 ++
 target/arm/helper.c| 8 +++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index a50189e2e4..e95b6b1b8f 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1014,6 +1014,8 @@ typedef struct ARMVAParameters {
 bool using64k   : 1;
 bool tsz_oob: 1;  /* tsz has been clamped to legal range */
 bool ds : 1;
+bool ha : 1;
+bool hd : 1;
 } ARMVAParameters;
 
 ARMVAParameters aa64_va_parameters(CPUARMState *env, uint64_t va,
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 19a03eb200..70ae3816b9 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -10280,7 +10280,7 @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, 
uint64_t va,
ARMMMUIdx mmu_idx, bool data)
 {
 uint64_t tcr = regime_tcr(env, mmu_idx);
-bool epd, hpd, using16k, using64k, tsz_oob, ds;
+bool epd, hpd, using16k, using64k, tsz_oob, ds, ha, hd;
 int select, tsz, tbi, max_tsz, min_tsz, ps, sh;
 ARMCPU *cpu = env_archcpu(env);
 
@@ -10298,6 +10298,8 @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, 
uint64_t va,
 epd = false;
 sh = extract32(tcr, 12, 2);
 ps = extract32(tcr, 16, 3);
+ha = extract32(tcr, 21, 1) && cpu_isar_feature(aa64_hafs, cpu);
+hd = extract32(tcr, 22, 1) && cpu_isar_feature(aa64_hdbs, cpu);
 ds = extract64(tcr, 32, 1);
 } else {
 /*
@@ -10322,6 +10324,8 @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, 
uint64_t va,
 hpd = extract64(tcr, 42, 1);
 }
 ps = extract64(tcr, 32, 3);
+ha = extract64(tcr, 39, 1) && cpu_isar_feature(aa64_hafs, cpu);
+hd = extract64(tcr, 40, 1) && cpu_isar_feature(aa64_hdbs, cpu);
 ds = extract64(tcr, 59, 1);
 }
 
@@ -10393,6 +10397,8 @@ ARMVAParameters aa64_va_parameters(CPUARMState *env, 
uint64_t va,
 .using64k = using64k,
 .tsz_oob = tsz_oob,
 .ds = ds,
+.ha = ha,
+.hd = ha & hd,
 };
 }
 
-- 
2.34.1




[PATCH v3 31/42] target/arm: Add isar predicates for FEAT_HAFDBS

2022-10-01 Thread Richard Henderson
The MMFR1 field may indicate support for hardware update of the
access flag alone, or of both the access flag and the dirty bit.

Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 7108568685..e499a84850 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -4100,6 +4100,16 @@ static inline bool isar_feature_aa64_lva(const 
ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, VARANGE) != 0;
 }
 
+static inline bool isar_feature_aa64_hafs(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HAFDBS) != 0;
+}
+
+static inline bool isar_feature_aa64_hdbs(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HAFDBS) >= 2;
+}
+
 static inline bool isar_feature_aa64_tts2uxn(const ARMISARegisters *id)
 {
 return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, XNX) != 0;
-- 
2.34.1




[PATCH v3 42/42] target/arm: Use the max page size in a 2-stage ptw

2022-10-01 Thread Richard Henderson
We had only been reporting the stage2 page size.  This causes
problems if stage1 is using a larger page size (16k, 2M, etc),
but stage2 is using a smaller page size, because cputlb does
not set large_page_{addr,mask} properly.

Fix by using the max of the two page sizes.

Reported-by: Marc Zyngier 
Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 14ab56d1b5..985a5703c3 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -2550,7 +2550,7 @@ static bool get_phys_addr_twostage(CPUARMState *env, 
target_ulong address,
ARMMMUFaultInfo *fi)
 {
 hwaddr ipa;
-int s1_prot;
+int s1_prot, s1_lgpgsz;
 bool ret, ipa_secure, s2walk_secure;
 ARMCacheAttrs cacheattrs1;
 ARMMMUIdx s2_mmu_idx, s2_ptw_idx;
@@ -2592,6 +2592,7 @@ static bool get_phys_addr_twostage(CPUARMState *env, 
target_ulong address,
  * Save the stage1 results so that we may merge prot and cacheattrs later.
  */
 s1_prot = result->f.prot;
+s1_lgpgsz = result->f.lg_page_size;
 cacheattrs1 = result->cacheattrs;
 memset(result, 0, sizeof(*result));
 
@@ -2607,6 +2608,14 @@ static bool get_phys_addr_twostage(CPUARMState *env, 
target_ulong address,
 return ret;
 }
 
+/*
+ * Use the maximum of the S1 & S2 page size, so that invalidation
+ * of pages > TARGET_PAGE_SIZE works correctly.
+ */
+if (result->f.lg_page_size < s1_lgpgsz) {
+result->f.lg_page_size = s1_lgpgsz;
+}
+
 /* Combine the S1 and S2 cache attributes. */
 hcr = arm_hcr_el2_eff_secstate(env, is_secure);
 if (hcr & HCR_DC) {
-- 
2.34.1




[PATCH v3 27/42] target/arm: Use softmmu tlbs for page table walking

2022-10-01 Thread Richard Henderson
So far, limit the change to S1_ptw_translate, arm_ldl_ptw, and
arm_ldq_ptw.  Use probe_access_full to find the host address,
and, if one is found, use a host load.  If the probe fails, we've got our
fault info already.  On the off chance that page tables are not
in RAM, continue to use the address_space_ld* functions.

Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h|   5 +
 target/arm/ptw.c| 207 ++--
 target/arm/tlb_helper.c |  17 +++-
 3 files changed, 155 insertions(+), 74 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 732c0c00ac..7108568685 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -225,6 +225,8 @@ typedef struct CPUARMTBFlags {
 target_ulong flags2;
 } CPUARMTBFlags;
 
+typedef struct ARMMMUFaultInfo ARMMMUFaultInfo;
+
 typedef struct CPUArchState {
 /* Regs for current mode.  */
 uint32_t regs[16];
@@ -715,6 +717,9 @@ typedef struct CPUArchState {
 struct CPUBreakpoint *cpu_breakpoint[16];
 struct CPUWatchpoint *cpu_watchpoint[16];
 
+/* Optional fault info across tlb lookup. */
+ARMMMUFaultInfo *tlb_fi;
+
 /* Fields up to this point are cleared by a CPU reset */
 struct {} end_reset_fields;
 
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 45adb9d5a9..ba496c3421 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -9,6 +9,7 @@
 #include "qemu/osdep.h"
 #include "qemu/log.h"
 #include "qemu/range.h"
+#include "exec/exec-all.h"
 #include "cpu.h"
 #include "internals.h"
 #include "idau.h"
@@ -191,7 +192,7 @@ static bool regime_translation_disabled(CPUARMState *env, 
ARMMMUIdx mmu_idx,
 return (regime_sctlr(env, mmu_idx) & SCTLR_M) == 0;
 }
 
-static bool ptw_attrs_are_device(uint64_t hcr, ARMCacheAttrs cacheattrs)
+static bool S2_attrs_are_device(uint64_t hcr, uint8_t attrs)
 {
 /*
  * For an S1 page table walk, the stage 1 attributes are always
@@ -202,41 +203,72 @@ static bool ptw_attrs_are_device(uint64_t hcr, 
ARMCacheAttrs cacheattrs)
  * With HCR_EL2.FWB == 1 this is when descriptor bit [4] is 0, ie
  * when cacheattrs.attrs bit [2] is 0.
  */
-assert(cacheattrs.is_s2_format);
 if (hcr & HCR_FWB) {
-return (cacheattrs.attrs & 0x4) == 0;
+return (attrs & 0x4) == 0;
 } else {
-return (cacheattrs.attrs & 0xc) == 0;
+return (attrs & 0xc) == 0;
 }
 }
 
 /* Translate a S1 pagetable walk through S2 if needed.  */
-static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
-   hwaddr addr, bool *is_secure_ptr, bool debug,
-   ARMMMUFaultInfo *fi)
+static bool S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx, hwaddr addr,
+ bool *is_secure_ptr, void **hphys, hwaddr *gphys,
+ bool debug, ARMMMUFaultInfo *fi)
 {
 bool is_secure = *is_secure_ptr;
 ARMMMUIdx s2_mmu_idx = is_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
+bool s2_phys = false;
+uint8_t pte_attrs;
+bool pte_secure;
 
-if (arm_mmu_idx_is_stage1_of_2(mmu_idx) &&
-!regime_translation_disabled(env, s2_mmu_idx, is_secure)) {
-GetPhysAddrResult s2 = {};
-uint64_t hcr;
-int ret;
+if (!arm_mmu_idx_is_stage1_of_2(mmu_idx)
+|| regime_translation_disabled(env, s2_mmu_idx, is_secure)) {
+s2_mmu_idx = is_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS;
+s2_phys = true;
+}
 
-ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, s2_mmu_idx,
- is_secure, false, debug, &s2, fi);
-if (ret) {
-assert(fi->type != ARMFault_None);
-fi->s2addr = addr;
-fi->stage2 = true;
-fi->s1ptw = true;
-fi->s1ns = !is_secure;
-return ~0;
+if (unlikely(debug)) {
+/*
+ * From gdbstub, do not use softmmu so that we don't modify the
+ * state of the cpu at all, including softmmu tlb contents.
+ */
+if (s2_phys) {
+*gphys = addr;
+pte_attrs = 0;
+pte_secure = is_secure;
+} else {
+GetPhysAddrResult s2 = { };
+if (!get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, s2_mmu_idx,
+is_secure, false, debug, &s2, fi)) {
+goto fail;
+}
+*gphys = s2.f.phys_addr;
+pte_attrs = s2.cacheattrs.attrs;
+pte_secure = s2.f.attrs.secure;
 }
+*hphys = NULL;
+} else {
+CPUTLBEntryFull *full;
+int flags;
 
-hcr = arm_hcr_el2_eff_secstate(env, is_secure);
-if ((hcr & HCR_PTW) && ptw_attrs_are_device(hcr, s2.cacheattrs)) {
+env->tlb_fi = fi;
+flags = probe_access_full(env, addr, MMU_DATA_LOAD,
+  arm_to_core_mmu_idx(s2_mmu_idx),
+  true, hphys, &full, 0);
+

[PATCH v3 39/42] target/arm: Don't shift attrs in get_phys_addr_lpae

2022-10-01 Thread Richard Henderson
Leave the upper and lower attributes in the place they originate
from in the descriptor.  Shifting them around is confusing, since
one cannot read the bit numbers out of the manual.  Also, new
attributes have been added which would alter the shifts.

Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 31 +++
 1 file changed, 15 insertions(+), 16 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 01a27b30fb..c68fd73617 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -1071,7 +1071,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
 hwaddr descaddr, indexmask, indexmask_grainsize;
 uint32_t tableattrs;
 target_ulong page_size;
-uint32_t attrs;
+uint64_t attrs;
 int32_t stride;
 int addrsize, inputsize, outputsize;
 uint64_t tcr = regime_tcr(env, mmu_idx);
@@ -1341,49 +1341,48 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
uint64_t address,
 descaddr &= ~(page_size - 1);
 descaddr |= (address & (page_size - 1));
 /* Extract attributes from the descriptor */
-attrs = extract64(descriptor, 2, 10)
-| (extract64(descriptor, 52, 12) << 10);
+attrs = descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(52, 12));
 
 if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
 /* Stage 2 table descriptors do not include any attribute fields */
 goto skip_attrs;
 }
 /* Merge in attributes from table descriptors */
-attrs |= nstable << 3; /* NS */
+attrs |= nstable << 5; /* NS */
 guarded = extract64(descriptor, 50, 1);  /* GP */
 if (param.hpd) {
 /* HPD disables all the table attributes except NSTable.  */
 goto skip_attrs;
 }
-attrs |= extract32(tableattrs, 0, 2) << 11; /* XN, PXN */
+attrs |= extract64(tableattrs, 0, 2) << 53; /* XN, PXN */
 /*
  * The sense of AP[1] vs APTable[0] is reversed, as APTable[0] == 1
  * means "force PL1 access only", which means forcing AP[1] to 0.
  */
-attrs &= ~(extract32(tableattrs, 2, 1) << 4);   /* !APT[0] => AP[1] */
-attrs |= extract32(tableattrs, 3, 1) << 5;  /* APT[1] => AP[2] */
+attrs &= ~(extract64(tableattrs, 2, 1) << 6);   /* !APT[0] => AP[1] */
+attrs |= extract32(tableattrs, 3, 1) << 7;  /* APT[1] => AP[2] */
  skip_attrs:
 
 /*
  * Here descaddr is the final physical address, and attributes
  * are all in attrs.
  */
-if ((attrs & (1 << 8)) == 0) {
+if ((attrs & (1 << 10)) == 0) {
 /* Access flag */
 fi->type = ARMFault_AccessFlag;
 goto do_fault;
 }
 
-ap = extract32(attrs, 4, 2);
+ap = extract32(attrs, 6, 2);
 
 if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
 ns = mmu_idx == ARMMMUIdx_Stage2;
-xn = extract32(attrs, 11, 2);
+xn = extract64(attrs, 54, 2);
 result->f.prot = get_S2prot(env, ap, xn, s1_is_el0);
 } else {
-ns = extract32(attrs, 3, 1);
-xn = extract32(attrs, 12, 1);
-pxn = extract32(attrs, 11, 1);
+ns = extract32(attrs, 5, 1);
+xn = extract64(attrs, 54, 1);
+pxn = extract64(attrs, 53, 1);
 result->f.prot = get_S1prot(env, mmu_idx, aarch64, ap, ns, xn, pxn);
 }
 
@@ -1408,10 +1407,10 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
uint64_t address,
 
 if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
 result->cacheattrs.is_s2_format = true;
-result->cacheattrs.attrs = extract32(attrs, 0, 4);
+result->cacheattrs.attrs = extract32(attrs, 2, 4);
 } else {
 /* Index into MAIR registers for cache attributes */
-uint8_t attrindx = extract32(attrs, 0, 3);
+uint8_t attrindx = extract32(attrs, 2, 3);
 uint64_t mair = env->cp15.mair_el[regime_el(env, mmu_idx)];
 assert(attrindx <= 7);
 result->cacheattrs.is_s2_format = false;
@@ -1426,7 +1425,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
 if (param.ds) {
 result->cacheattrs.shareability = param.sh;
 } else {
-result->cacheattrs.shareability = extract32(attrs, 6, 2);
+result->cacheattrs.shareability = extract32(attrs, 8, 2);
 }
 
 result->f.phys_addr = descaddr;
-- 
2.34.1




[PATCH v3 24/42] target/arm: Add ARMMMUIdx_Phys_{S,NS}

2022-10-01 Thread Richard Henderson
Not yet used, but add mmu indexes for 1-1 mapping
to physical addresses.

Signed-off-by: Richard Henderson 
---
 target/arm/cpu-param.h |  2 +-
 target/arm/cpu.h   |  7 ++-
 target/arm/ptw.c   | 19 +--
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
index 689a9645dc..98bd9e435e 100644
--- a/target/arm/cpu-param.h
+++ b/target/arm/cpu-param.h
@@ -40,6 +40,6 @@
 bool guarded;
 #endif
 
-#define NB_MMU_MODES 8
+#define NB_MMU_MODES 10
 
 #endif
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index c8cad2ef7c..0effa85c56 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2899,8 +2899,9 @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
  * EL2 EL2&0 +PAN
  * EL2 (aka NS PL2)
  * EL3 (aka S PL1)
+ * Physical (NS & S)
  *
- * for a total of 8 different mmu_idx.
+ * for a total of 10 different mmu_idx.
  *
  * R profile CPUs have an MPU, but can use the same set of MMU indexes
  * as A profile. They only need to distinguish EL0 and EL1 (and
@@ -2965,6 +2966,10 @@ typedef enum ARMMMUIdx {
 ARMMMUIdx_E2= 6 | ARM_MMU_IDX_A,
 ARMMMUIdx_E3= 7 | ARM_MMU_IDX_A,
 
+/* TLBs with 1-1 mapping to the physical address spaces. */
+ARMMMUIdx_Phys_NS   = 8 | ARM_MMU_IDX_A,
+ARMMMUIdx_Phys_S= 9 | ARM_MMU_IDX_A,
+
 /*
  * These are not allocated TLBs and are used only for AT system
  * instructions or for the first stage of an S12 page table walk.
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index ccfef2caca..05dcacf45b 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -179,6 +179,11 @@ static bool regime_translation_disabled(CPUARMState *env, 
ARMMMUIdx mmu_idx,
 case ARMMMUIdx_E3:
 break;
 
+case ARMMMUIdx_Phys_NS:
+case ARMMMUIdx_Phys_S:
+/* No translation for physical address spaces. */
+return true;
+
 default:
 g_assert_not_reached();
 }
@@ -2286,10 +2291,17 @@ static bool get_phys_addr_disabled(CPUARMState *env, 
target_ulong address,
 {
 uint8_t memattr = 0x00;/* Device nGnRnE */
 uint8_t shareability = 0;  /* non-sharable */
+int r_el;
 
-if (mmu_idx != ARMMMUIdx_Stage2 && mmu_idx != ARMMMUIdx_Stage2_S) {
-int r_el = regime_el(env, mmu_idx);
+switch (mmu_idx) {
+case ARMMMUIdx_Stage2:
+case ARMMMUIdx_Stage2_S:
+case ARMMMUIdx_Phys_NS:
+case ARMMMUIdx_Phys_S:
+break;
 
+default:
+r_el = regime_el(env, mmu_idx);
 if (arm_el_is_aa64(env, r_el)) {
 int pamax = arm_pamax(env_archcpu(env));
 uint64_t tcr = env->cp15.tcr_el[r_el];
@@ -2338,6 +2350,7 @@ static bool get_phys_addr_disabled(CPUARMState *env, 
target_ulong address,
 shareability = 2; /* outer sharable */
 }
 result->cacheattrs.is_s2_format = false;
+break;
 }
 
 result->f.phys_addr = address;
@@ -2543,6 +2556,7 @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
 is_secure = arm_is_secure_below_el3(env);
 break;
 case ARMMMUIdx_Stage2:
+case ARMMMUIdx_Phys_NS:
 case ARMMMUIdx_MPrivNegPri:
 case ARMMMUIdx_MUserNegPri:
 case ARMMMUIdx_MPriv:
@@ -2551,6 +2565,7 @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
 break;
 case ARMMMUIdx_E3:
 case ARMMMUIdx_Stage2_S:
+case ARMMMUIdx_Phys_S:
 case ARMMMUIdx_MSPrivNegPri:
 case ARMMMUIdx_MSUserNegPri:
 case ARMMMUIdx_MSPriv:
-- 
2.34.1




[PATCH v3 33/42] target/arm: Split out S1TranslateResult type

2022-10-01 Thread Richard Henderson
Consolidate the results of S1_ptw_translate into one struct.

Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 70 +---
 1 file changed, 36 insertions(+), 34 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 7a77bea2c7..99ad894180 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -220,13 +220,18 @@ static bool S2_attrs_are_device(uint64_t hcr, uint8_t 
attrs)
 }
 }
 
+typedef struct {
+bool is_secure;
+void *hphys;
+hwaddr gphys;
+} S1TranslateResult;
+
 /* Translate a S1 pagetable walk through S2 if needed.  */
 static bool S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
  ARMMMUIdx s2_mmu_idx, hwaddr addr,
- bool *is_secure_ptr, void **hphys, hwaddr *gphys,
- bool debug, ARMMMUFaultInfo *fi)
+ bool is_secure, bool debug,
+ S1TranslateResult *res, ARMMMUFaultInfo *fi)
 {
-bool is_secure = *is_secure_ptr;
 uint8_t pte_attrs;
 bool s2_phys, pte_secure;
 
@@ -238,7 +243,7 @@ static bool S1_ptw_translate(CPUARMState *env, ARMMMUIdx 
mmu_idx,
  * state of the cpu at all, including softmmu tlb contents.
  */
 if (s2_phys) {
-*gphys = addr;
+res->gphys = addr;
 pte_attrs = 0;
 pte_secure = is_secure;
 } else {
@@ -251,11 +256,11 @@ static bool S1_ptw_translate(CPUARMState *env, ARMMMUIdx 
mmu_idx,
&s2, fi)) {
 goto fail;
 }
-*gphys = s2.f.phys_addr;
+res->gphys = s2.f.phys_addr;
 pte_attrs = s2.cacheattrs.attrs;
 pte_secure = s2.f.attrs.secure;
 }
-*hphys = NULL;
+res->hphys = NULL;
 } else {
 CPUTLBEntryFull *full;
 int flags;
@@ -263,13 +268,13 @@ static bool S1_ptw_translate(CPUARMState *env, ARMMMUIdx 
mmu_idx,
 env->tlb_fi = fi;
 flags = probe_access_full(env, addr, MMU_DATA_LOAD,
   arm_to_core_mmu_idx(s2_mmu_idx),
-  true, hphys, &full, 0);
+  true, &res->hphys, &full, 0);
 env->tlb_fi = NULL;
 
 if (unlikely(flags & TLB_INVALID_MASK)) {
 goto fail;
 }
-*gphys = full->phys_addr;
+res->gphys = full->phys_addr;
 pte_attrs = full->pte_attrs;
 pte_secure = full->attrs.secure;
 }
@@ -291,12 +296,11 @@ static bool S1_ptw_translate(CPUARMState *env, ARMMMUIdx 
mmu_idx,
 }
 }
 
-if (is_secure) {
-/* Check if page table walk is to secure or non-secure PA space. */
-*is_secure_ptr = !(pte_secure
-   ? env->cp15.vstcr_el2 & VSTCR_SW
-   : env->cp15.vtcr_el2 & VTCR_NSW);
-}
+/* Check if page table walk is to secure or non-secure PA space. */
+res->is_secure = (is_secure &&
+  !(pte_secure
+? env->cp15.vstcr_el2 & VSTCR_SW
+: env->cp15.vtcr_el2 & VTCR_NSW));
 return true;
 
  fail:
@@ -314,36 +318,35 @@ static uint32_t arm_ldl_ptw(CPUARMState *env, hwaddr 
addr, bool is_secure,
 bool debug, ARMMMUFaultInfo *fi)
 {
 CPUState *cs = env_cpu(env);
-void *hphys;
-hwaddr gphys;
+S1TranslateResult s1;
 uint32_t data;
 bool be;
 
-if (!S1_ptw_translate(env, mmu_idx, ptw_idx, addr, &is_secure,
-  &hphys, &gphys, debug, fi)) {
+if (!S1_ptw_translate(env, mmu_idx, ptw_idx, addr, is_secure,
+  debug, &s1, fi)) {
 /* Failure. */
 assert(fi->s1ptw);
 return 0;
 }
 
 be = regime_translation_big_endian(env, mmu_idx);
-if (likely(hphys)) {
+if (likely(s1.hphys)) {
 /* Page tables are in RAM, and we have the host address. */
 if (be) {
-data = ldl_be_p(hphys);
+data = ldl_be_p(s1.hphys);
 } else {
-data = ldl_le_p(hphys);
+data = ldl_le_p(s1.hphys);
 }
 } else {
 /* Page tables are in MMIO. */
-MemTxAttrs attrs = { .secure = is_secure };
+MemTxAttrs attrs = { .secure = s1.is_secure };
 AddressSpace *as = arm_addressspace(cs, attrs);
 MemTxResult result = MEMTX_OK;
 
 if (be) {
-data = address_space_ldl_be(as, gphys, attrs, &result);
+data = address_space_ldl_be(as, s1.gphys, attrs, &result);
 } else {
-data = address_space_ldl_le(as, gphys, attrs, &result);
+data = address_space_ldl_le(as, s1.gphys, attrs, &result);
 }
 if (unlikely(result != MEMTX_OK)) {
 fi->type = ARMFault_SyncExternalOnWalk;
@@ -359,36 +362,35 @@ static uint64_t arm_ldq_ptw(CPUARMState *env, hwaddr 
addr, bool is_secure,
 

[PATCH v3 37/42] target/arm: Remove loop from get_phys_addr_lpae

2022-10-01 Thread Richard Henderson
The unconditional loop was used both to iterate over levels
and to control parsing of attributes.  Use an explicit goto
in both cases.

While this appears less clean for iterating over levels, we
will need to jump back into the middle of this loop for
atomic updates, which is even uglier.

Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 176 +++
 1 file changed, 88 insertions(+), 88 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 84b55b640b..e6b385a8b1 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -1082,6 +1082,9 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t 
address,
 uint64_t descaddrmask;
 bool aarch64 = arm_el_is_aa64(env, el);
 bool guarded = false;
+S1TranslateResult s1;
+uint64_t descriptor;
+bool nstable;
 
 /* TODO: This code does not support shareability levels. */
 if (aarch64) {
@@ -1280,96 +1283,93 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
uint64_t address,
  * bits at each step.
  */
 tableattrs = is_secure ? 0 : (1 << 4);
-for (;;) {
-S1TranslateResult s1;
-uint64_t descriptor;
-bool nstable;
 
-descaddr |= (address >> (stride * (4 - level))) & indexmask;
-descaddr &= ~7ULL;
-nstable = extract32(tableattrs, 4, 1);
-if (!S1_ptw_translate(env, mmu_idx, ptw_idx, descaddr,
-  !nstable, debug, , fi)) {
-goto do_fault;
-}
+descriptor = arm_ldq_ptw(env, &s1, fi);
-if (fi->type != ARMFault_None) {
-goto do_fault;
-}
-
-if (!(descriptor & 1) ||
-(!(descriptor & 2) && (level == 3))) {
-/* Invalid, or the Reserved level 3 encoding */
-goto do_fault;
-}
-
-descaddr = descriptor & descaddrmask;
-
-/*
- * For FEAT_LPA and PS=6, bits [51:48] of descaddr are in [15:12]
- * of descriptor.  For FEAT_LPA2 and effective DS, bits [51:50] of
- * descaddr are in [9:8].  Otherwise, if descaddr is out of range,
- * raise AddressSizeFault.
- */
-if (outputsize > 48) {
-if (param.ds) {
-descaddr |= extract64(descriptor, 8, 2) << 50;
-} else {
-descaddr |= extract64(descriptor, 12, 4) << 48;
-}
-} else if (descaddr >> outputsize) {
-fault_type = ARMFault_AddressSize;
-goto do_fault;
-}
-
-if ((descriptor & 2) && (level < 3)) {
-/*
- * Table entry. The top five bits are attributes which may
- * propagate down through lower levels of the table (and
- * which are all arranged so that 0 means "no effect", so
- * we can gather them up by ORing in the bits at each level).
- */
-tableattrs |= extract64(descriptor, 59, 5);
-level++;
-indexmask = indexmask_grainsize;
-continue;
-}
-/*
- * Block entry at level 1 or 2, or page entry at level 3.
- * These are basically the same thing, although the number
- * of bits we pull in from the vaddr varies. Note that although
- * descaddrmask masks enough of the low bits of the descriptor
- * to give a correct page or table address, the address field
- * in a block descriptor is smaller; so we need to explicitly
- * clear the lower bits here before ORing in the low vaddr bits.
- */
-page_size = (1ULL << ((stride * (4 - level)) + 3));
-descaddr &= ~(hwaddr)(page_size - 1);
-descaddr |= (address & (page_size - 1));
-/* Extract attributes from the descriptor */
-attrs = extract64(descriptor, 2, 10)
-| (extract64(descriptor, 52, 12) << 10);
-
-if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
-/* Stage 2 table descriptors do not include any attribute fields */
-break;
-}
-/* Merge in attributes from table descriptors */
-attrs |= nstable << 3; /* NS */
-guarded = extract64(descriptor, 50, 1);  /* GP */
-if (param.hpd) {
-/* HPD disables all the table attributes except NSTable.  */
-break;
-}
-attrs |= extract32(tableattrs, 0, 2) << 11; /* XN, PXN */
-/*
- * The sense of AP[1] vs APTable[0] is reversed, as APTable[0] == 1
- * means "force PL1 access only", which means forcing AP[1] to 0.
- */
-attrs &= ~(extract32(tableattrs, 2, 1) << 4);   /* !APT[0] => AP[1] */
-attrs |= extract32(tableattrs, 3, 1) << 5;  /* APT[1] => AP[2] */
-break;
+ next_level:
+descaddr |= (address >> (stride * (4 - level))) & indexmask;
+descaddr &= ~7ULL;
+nstable = extract32(tableattrs, 4, 1);
+if (!S1_ptw_translate(env, mmu_idx, ptw_idx, 

[PATCH v3 30/42] target/arm: Add ptw_idx argument to S1_ptw_translate

2022-10-01 Thread Richard Henderson
Hoist the computation of the mmu_idx for the ptw up to
get_phys_addr_with_secure_debug and get_phys_addr_twostage.
This removes the duplicate check for stage2 disabled
from the middle of the walk, performing it only once.

Pass ptw_idx through get_phys_addr_{v5,v6,lpae} and arm_{ldl,ldq}_ptw.

Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 104 ---
 1 file changed, 71 insertions(+), 33 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 445382ab03..7a77bea2c7 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -17,7 +17,8 @@
 
 static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
MMUAccessType access_type, ARMMMUIdx mmu_idx,
-   bool is_secure, bool s1_is_el0, bool debug,
+   ARMMMUIdx ptw_idx, bool is_secure,
+   bool s1_is_el0, bool debug,
GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
 __attribute__((nonnull));
 
@@ -220,21 +221,16 @@ static bool S2_attrs_are_device(uint64_t hcr, uint8_t 
attrs)
 }
 
 /* Translate a S1 pagetable walk through S2 if needed.  */
-static bool S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx, hwaddr addr,
+static bool S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
+ ARMMMUIdx s2_mmu_idx, hwaddr addr,
  bool *is_secure_ptr, void **hphys, hwaddr *gphys,
  bool debug, ARMMMUFaultInfo *fi)
 {
 bool is_secure = *is_secure_ptr;
-ARMMMUIdx s2_mmu_idx = is_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
-bool s2_phys = false;
 uint8_t pte_attrs;
-bool pte_secure;
+bool s2_phys, pte_secure;
 
-if (!arm_mmu_idx_is_stage1_of_2(mmu_idx)
-|| regime_translation_disabled(env, s2_mmu_idx, is_secure)) {
-s2_mmu_idx = is_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS;
-s2_phys = true;
-}
+s2_phys = s2_mmu_idx == ARMMMUIdx_Phys_S || s2_mmu_idx == ARMMMUIdx_Phys_NS;
 
 if (unlikely(debug)) {
 /*
@@ -247,8 +243,12 @@ static bool S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx, hwaddr addr,
 pte_secure = is_secure;
 } else {
 GetPhysAddrResult s2 = { };
+ARMMMUIdx phys_idx = (is_secure ? ARMMMUIdx_Phys_S
+  : ARMMMUIdx_Phys_NS);
+
 if (!get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, s2_mmu_idx,
-is_secure, false, debug, &s2, fi)) {
+phys_idx, is_secure, false, debug,
+&s2, fi)) {
 goto fail;
 }
 *gphys = s2.f.phys_addr;
@@ -310,7 +310,8 @@ static bool S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx, hwaddr addr,
 
 /* All loads done in the course of a page table walk go through here. */
 static uint32_t arm_ldl_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
-ARMMMUIdx mmu_idx, bool debug, ARMMMUFaultInfo *fi)
+ARMMMUIdx mmu_idx, ARMMMUIdx ptw_idx,
+bool debug, ARMMMUFaultInfo *fi)
 {
 CPUState *cs = env_cpu(env);
 void *hphys;
@@ -318,7 +319,7 @@ static uint32_t arm_ldl_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
  uint32_t data;
  bool be;
  
-if (!S1_ptw_translate(env, mmu_idx, addr, &is_secure,
+if (!S1_ptw_translate(env, mmu_idx, ptw_idx, addr, &is_secure,
   &hphys, &gphys, debug, fi)) {
  /* Failure. */
  assert(fi->s1ptw);
@@ -354,7 +355,8 @@ static uint32_t arm_ldl_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
 }
 
 static uint64_t arm_ldq_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
-ARMMMUIdx mmu_idx, bool debug, ARMMMUFaultInfo *fi)
+ARMMMUIdx mmu_idx, ARMMMUIdx ptw_idx,
+bool debug, ARMMMUFaultInfo *fi)
 {
 CPUState *cs = env_cpu(env);
 void *hphys;
@@ -362,7 +364,7 @@ static uint64_t arm_ldq_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
  uint64_t data;
  bool be;
  
-if (!S1_ptw_translate(env, mmu_idx, addr, &is_secure,
+if (!S1_ptw_translate(env, mmu_idx, ptw_idx, addr, &is_secure,
   &hphys, &gphys, debug, fi)) {
  /* Failure. */
  assert(fi->s1ptw);
@@ -507,7 +509,7 @@ static int simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
 
 static bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
  MMUAccessType access_type, ARMMMUIdx mmu_idx,
- bool is_secure, bool debug,
+ ARMMMUIdx ptw_idx, bool is_secure, bool debug,
  GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
 {
 int level = 1;
@@ -527,7 +529,7 @@ static bool 
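
The hoisted selection described in the commit message boils down to the following shape, done once per translation rather than at every descriptor load (a sketch under the series' naming conventions, not the committed hunk):

/* Sketch: choose the mmu_idx used for S1 descriptor loads up front. */
ARMMMUIdx ptw_idx;
if (arm_mmu_idx_is_stage1_of_2(mmu_idx)) {
    ARMMMUIdx s2_idx = is_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
    /* With stage 2 disabled, descriptor loads use the flat physical map. */
    ptw_idx = regime_translation_disabled(env, s2_idx, is_secure)
            ? (is_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS)
            : s2_idx;
} else {
    ptw_idx = is_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS;
}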

[PATCH v3 22/42] target/arm: Use probe_access_full for MTE

2022-10-01 Thread Richard Henderson
The CPUTLBEntryFull structure now stores the original pte attributes, as
well as the physical address.  Therefore, we no longer need a separate
bit in MemTxAttrs, nor do we need to walk the tree of memory regions.

Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h   |  1 -
 target/arm/sve_ldst_internal.h |  1 +
 target/arm/mte_helper.c| 61 +-
 target/arm/sve_helper.c| 54 ++
 target/arm/tlb_helper.c|  4 ---
 5 files changed, 35 insertions(+), 86 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 0f82f4aa1d..2694a93894 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3394,7 +3394,6 @@ static inline MemTxAttrs *typecheck_memtxattrs(MemTxAttrs *x)
  * generic target bits directly.
  */
 #define arm_tlb_bti_gp(x) (typecheck_memtxattrs(x)->target_tlb_bit0)
-#define arm_tlb_mte_tagged(x) (typecheck_memtxattrs(x)->target_tlb_bit1)
 
 /*
  * AArch64 usage of the PAGE_TARGET_* bits for linux-user.
diff --git a/target/arm/sve_ldst_internal.h b/target/arm/sve_ldst_internal.h
index b5c473fc48..4f159ec4ad 100644
--- a/target/arm/sve_ldst_internal.h
+++ b/target/arm/sve_ldst_internal.h
@@ -134,6 +134,7 @@ typedef struct {
 void *host;
 int flags;
 MemTxAttrs attrs;
+bool tagged;
 } SVEHostPage;
 
 bool sve_probe_page(SVEHostPage *info, bool nofault, CPUARMState *env,
diff --git a/target/arm/mte_helper.c b/target/arm/mte_helper.c
index fdd23ab3f8..a81c4a3318 100644
--- a/target/arm/mte_helper.c
+++ b/target/arm/mte_helper.c
@@ -105,10 +105,9 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int ptr_mmu_idx,
   TARGET_PAGE_BITS - LOG2_TAG_GRANULE - 1);
 return tags + index;
 #else
-uintptr_t index;
 CPUTLBEntryFull *full;
+MemTxAttrs attrs;
 int in_page, flags;
-ram_addr_t ptr_ra;
 hwaddr ptr_paddr, tag_paddr, xlat;
 MemoryRegion *mr;
 ARMASIdx tag_asi;
@@ -124,30 +123,12 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int ptr_mmu_idx,
   * valid.  Indicate to probe_access_flags no-fault, then assert that
   * we received a valid page.
   */
-flags = probe_access_flags(env, ptr, ptr_access, ptr_mmu_idx,
-   ra == 0, &host, ra);
+flags = probe_access_full(env, ptr, ptr_access, ptr_mmu_idx,
+  ra == 0, &host, &full, ra);
 assert(!(flags & TLB_INVALID_MASK));
 
-/*
- * Find the CPUTLBEntryFull for ptr.  This *must* be present in the TLB
- * because we just found the mapping.
- * TODO: Perhaps there should be a cputlb helper that returns a
- * matching tlb entry + iotlb entry.
- */
-index = tlb_index(env, ptr_mmu_idx, ptr);
-# ifdef CONFIG_DEBUG_TCG
-{
-CPUTLBEntry *entry = tlb_entry(env, ptr_mmu_idx, ptr);
-target_ulong comparator = (ptr_access == MMU_DATA_LOAD
-   ? entry->addr_read
-   : tlb_addr_write(entry));
-g_assert(tlb_hit(comparator, ptr));
-}
-# endif
-full = &env_tlb(env)->d[ptr_mmu_idx].fulltlb[index];
-
  /* If the virtual page MemAttr != Tagged, access unchecked. */
-if (!arm_tlb_mte_tagged(&full->attrs)) {
+if (full->pte_attrs != 0xf0) {
 return NULL;
 }
 
@@ -162,6 +143,13 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int ptr_mmu_idx,
 return NULL;
 }
 
+/*
+ * Remember these values across the second lookup below,
+ * which may invalidate this pointer via tlb resize.
+ */
+ptr_paddr = full->phys_addr;
+attrs = full->attrs;
+
 /*
  * The Normal memory access can extend to the next page.  E.g. a single
  * 8-byte access to the last byte of a page will check only the last
@@ -170,9 +158,8 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int ptr_mmu_idx,
   */
  in_page = -(ptr | TARGET_PAGE_MASK);
  if (unlikely(ptr_size > in_page)) {
-void *ignore;
-flags |= probe_access_flags(env, ptr + in_page, ptr_access,
-ptr_mmu_idx, ra == 0, &ignore, ra);
+flags |= probe_access_full(env, ptr + in_page, ptr_access,
+   ptr_mmu_idx, ra == 0, &host, &full, ra);
 assert(!(flags & TLB_INVALID_MASK));
 }
 
@@ -180,33 +167,17 @@ static uint8_t *allocation_tag_mem(CPUARMState *env, int ptr_mmu_idx,
 if (unlikely(flags & TLB_WATCHPOINT)) {
 int wp = ptr_access == MMU_DATA_LOAD ? BP_MEM_READ : BP_MEM_WRITE;
 assert(ra != 0);
-cpu_check_watchpoint(env_cpu(env), ptr, ptr_size,
- full->attrs, wp, ra);
+cpu_check_watchpoint(env_cpu(env), ptr, ptr_size, attrs, wp, ra);
 }
 
-/*
- * Find the physical address within the normal mem space.
- * The memory region lookup must succeed because TLB_MMIO was
- * not set in the cputlb lookup above.
- */
-mr = 
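
With the attributes cached, the "is this page Tagged?" test collapses to a single byte compare against the MAIR encoding; a minimal sketch, assuming the CPUTLBEntryFull was fetched via probe_access_full as above:

/* Sketch: 0xf0 is the MAIR encoding for Tagged Normal WB RWA memory. */
static bool pte_is_tagged(const CPUTLBEntryFull *full)
{
    return full->pte_attrs == 0xf0;
}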

[PATCH v3 34/42] target/arm: Move be test for regime into S1TranslateResult

2022-10-01 Thread Richard Henderson
Hoist this test out of arm_ld[lq]_ptw into S1_ptw_translate.

Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 14 ++
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 99ad894180..d356b0b22d 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -222,6 +222,7 @@ static bool S2_attrs_are_device(uint64_t hcr, uint8_t attrs)
 
 typedef struct {
 bool is_secure;
+bool be;
 void *hphys;
 hwaddr gphys;
 } S1TranslateResult;
@@ -301,6 +302,7 @@ static bool S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
   !(pte_secure
 ? env->cp15.vstcr_el2 & VSTCR_SW
 : env->cp15.vtcr_el2 & VTCR_NSW));
+res->be = regime_translation_big_endian(env, mmu_idx);
 return true;
 
  fail:
@@ -320,7 +322,6 @@ static uint32_t arm_ldl_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
  CPUState *cs = env_cpu(env);
  S1TranslateResult s1;
  uint32_t data;
-bool be;
  
  if (!S1_ptw_translate(env, mmu_idx, ptw_idx, addr, is_secure,
    debug, &s1, fi)) {
@@ -329,10 +330,9 @@ static uint32_t arm_ldl_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
 return 0;
 }
 
-be = regime_translation_big_endian(env, mmu_idx);
 if (likely(s1.hphys)) {
 /* Page tables are in RAM, and we have the host address. */
-if (be) {
+if (s1.be) {
 data = ldl_be_p(s1.hphys);
 } else {
 data = ldl_le_p(s1.hphys);
@@ -343,7 +343,7 @@ static uint32_t arm_ldl_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
  AddressSpace *as = arm_addressspace(cs, attrs);
  MemTxResult result = MEMTX_OK;
  
-if (be) {
+if (s1.be) {
  data = address_space_ldl_be(as, s1.gphys, attrs, &result);
  } else {
  data = address_space_ldl_le(as, s1.gphys, attrs, &result);
@@ -364,7 +364,6 @@ static uint64_t arm_ldq_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
 CPUState *cs = env_cpu(env);
 S1TranslateResult s1;
 uint64_t data;
-bool be;
 
  if (!S1_ptw_translate(env, mmu_idx, ptw_idx, addr, is_secure,
    debug, &s1, fi)) {
@@ -373,10 +372,9 @@ static uint64_t arm_ldq_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
 return 0;
 }
 
-be = regime_translation_big_endian(env, mmu_idx);
 if (likely(s1.hphys)) {
 /* Page tables are in RAM, and we have the host address. */
-if (be) {
+if (s1.be) {
 data = ldq_be_p(s1.hphys);
 } else {
 data = ldq_le_p(s1.hphys);
@@ -387,7 +385,7 @@ static uint64_t arm_ldq_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
  AddressSpace *as = arm_addressspace(cs, attrs);
  MemTxResult result = MEMTX_OK;
  
-if (be) {
+if (s1.be) {
  data = address_space_ldq_be(as, s1.gphys, attrs, &result);
  } else {
  data = address_space_ldq_le(as, s1.gphys, attrs, &result);
-- 
2.34.1
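
The pattern here is to compute a regime fact once in S1_ptw_translate and carry it in the result, rather than re-derive it at each load. The RAM fast path then reduces to picking an accessor; a sketch (the MMIO fallback is elided):

/* Sketch: with be cached in S1TranslateResult, a 32-bit descriptor
 * load from RAM is just a host-endian accessor selection. */
static uint32_t load_descriptor32(const S1TranslateResult *s1)
{
    return s1->be ? ldl_be_p(s1->hphys) : ldl_le_p(s1->hphys);
}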




[PATCH v3 26/42] target/arm: Plumb debug into S1_ptw_translate

2022-10-01 Thread Richard Henderson
Before using softmmu page tables for the ptw, plumb down
a debug parameter so that we can query page table entries
from gdbstub without modifying cpu state.

Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 66 +---
 1 file changed, 40 insertions(+), 26 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 05dcacf45b..45adb9d5a9 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -16,7 +16,7 @@
 
 static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
MMUAccessType access_type, ARMMMUIdx mmu_idx,
-   bool is_secure, bool s1_is_el0,
+   bool is_secure, bool s1_is_el0, bool debug,
GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
 __attribute__((nonnull));
 
@@ -212,7 +212,7 @@ static bool ptw_attrs_are_device(uint64_t hcr, ARMCacheAttrs cacheattrs)
 
 /* Translate a S1 pagetable walk through S2 if needed.  */
 static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
-   hwaddr addr, bool *is_secure_ptr,
+   hwaddr addr, bool *is_secure_ptr, bool debug,
ARMMMUFaultInfo *fi)
 {
 bool is_secure = *is_secure_ptr;
@@ -225,7 +225,7 @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
  int ret;
  
  ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, s2_mmu_idx,
- is_secure, false, &s2, fi);
+ is_secure, false, debug, &s2, fi);
 if (ret) {
 assert(fi->type != ARMFault_None);
 fi->s2addr = addr;
@@ -268,7 +268,7 @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
 
 /* All loads done in the course of a page table walk go through here. */
 static uint32_t arm_ldl_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
-ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
+ARMMMUIdx mmu_idx, bool debug, ARMMMUFaultInfo *fi)
 {
 CPUState *cs = env_cpu(env);
 MemTxAttrs attrs = {};
@@ -276,7 +276,7 @@ static uint32_t arm_ldl_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
  AddressSpace *as;
  uint32_t data;
  
-addr = S1_ptw_translate(env, mmu_idx, addr, &is_secure, fi);
+addr = S1_ptw_translate(env, mmu_idx, addr, &is_secure, debug, fi);
 attrs.secure = is_secure;
 as = arm_addressspace(cs, attrs);
 if (fi->s1ptw) {
@@ -296,7 +296,7 @@ static uint32_t arm_ldl_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
 }
 
 static uint64_t arm_ldq_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
-ARMMMUIdx mmu_idx, ARMMMUFaultInfo *fi)
+ARMMMUIdx mmu_idx, bool debug, ARMMMUFaultInfo *fi)
 {
 CPUState *cs = env_cpu(env);
 MemTxAttrs attrs = {};
@@ -304,7 +304,7 @@ static uint64_t arm_ldq_ptw(CPUARMState *env, hwaddr addr, bool is_secure,
  AddressSpace *as;
  uint64_t data;
  
-addr = S1_ptw_translate(env, mmu_idx, addr, &is_secure, fi);
+addr = S1_ptw_translate(env, mmu_idx, addr, &is_secure, debug, fi);
 attrs.secure = is_secure;
 as = arm_addressspace(cs, attrs);
 if (fi->s1ptw) {
@@ -433,8 +433,8 @@ static int simple_ap_to_rw_prot(CPUARMState *env, ARMMMUIdx mmu_idx, int ap)
 
 static bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
  MMUAccessType access_type, ARMMMUIdx mmu_idx,
- bool is_secure, GetPhysAddrResult *result,
- ARMMMUFaultInfo *fi)
+ bool is_secure, bool debug,
+ GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
 {
 int level = 1;
 uint32_t table;
@@ -453,7 +453,7 @@ static bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
 fi->type = ARMFault_Translation;
 goto do_fault;
 }
-desc = arm_ldl_ptw(env, table, is_secure, mmu_idx, fi);
+desc = arm_ldl_ptw(env, table, is_secure, mmu_idx, debug, fi);
 if (fi->type != ARMFault_None) {
 goto do_fault;
 }
@@ -491,7 +491,7 @@ static bool get_phys_addr_v5(CPUARMState *env, uint32_t address,
 /* Fine pagetable.  */
 table = (desc & 0xf000) | ((address >> 8) & 0xffc);
 }
-desc = arm_ldl_ptw(env, table, is_secure, mmu_idx, fi);
+desc = arm_ldl_ptw(env, table, is_secure, mmu_idx, debug, fi);
 if (fi->type != ARMFault_None) {
 goto do_fault;
 }
@@ -552,8 +552,8 @@ do_fault:
 
 static bool get_phys_addr_v6(CPUARMState *env, uint32_t address,
  MMUAccessType access_type, ARMMMUIdx mmu_idx,
- bool is_secure, GetPhysAddrResult *result,
- ARMMMUFaultInfo *fi)
+ bool is_secure, bool debug,
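
The reason a single flag suffices: a debug (gdbstub) walk must not fault into the guest or disturb the softmmu TLB, so its descriptor loads reduce to plain address-space reads. A hedged sketch of that shape, not the literal hunk (endianness handling omitted; the helper name is hypothetical):

/* Sketch: a side-effect-free descriptor load for the debug path. */
static uint32_t ptw_ldl_debug(CPUARMState *env, hwaddr addr,
                              MemTxAttrs attrs, ARMMMUFaultInfo *fi)
{
    CPUState *cs = env_cpu(env);
    AddressSpace *as = arm_addressspace(cs, attrs);
    MemTxResult result = MEMTX_OK;
    uint32_t data = address_space_ldl(as, addr, attrs, &result);

    if (result != MEMTX_OK) {
        fi->type = ARMFault_SyncExternalOnWalk;
        fi->ea = arm_extabort_type(result);
    }
    return data;
}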

[PATCH v3 21/42] target/arm: Enable TARGET_PAGE_ENTRY_EXTRA

2022-10-01 Thread Richard Henderson
Copy the attrs and shareability fields into the TLB.  This will eventually
be used by S1_ptw_translate to report stage1 translation failures,
and by do_ats_write to fill in PAR_EL1.

Signed-off-by: Richard Henderson 
---
 target/arm/cpu-param.h  | 8 
 target/arm/tlb_helper.c | 3 +++
 2 files changed, 11 insertions(+)

diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
index 08681828ac..118ca0e5c0 100644
--- a/target/arm/cpu-param.h
+++ b/target/arm/cpu-param.h
@@ -30,6 +30,14 @@
  */
 # define TARGET_PAGE_BITS_VARY
 # define TARGET_PAGE_BITS_MIN  10
+
+/*
+ * Cache the attrs and sharability fields from the page table entry.
+ */
+# define TARGET_PAGE_ENTRY_EXTRA  \
+ uint8_t pte_attrs;   \
+ uint8_t shareability;
+
 #endif
 
 #define NB_MMU_MODES 8
diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
index 49601394ec..353edbeb1d 100644
--- a/target/arm/tlb_helper.c
+++ b/target/arm/tlb_helper.c
@@ -236,6 +236,9 @@ bool arm_cpu_tlb_fill(CPUState *cs, vaddr address, int size,
 arm_tlb_mte_tagged(&res.f.attrs) = true;
  }
  
+res.f.pte_attrs = res.cacheattrs.attrs;
+res.f.shareability = res.cacheattrs.shareability;
+
  tlb_set_page_full(cs, mmu_idx, address, &res.f);
 return true;
 } else if (probe) {
-- 
2.34.1
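
For readers unfamiliar with the mechanism: TARGET_PAGE_ENTRY_EXTRA is pasted into the core CPUTLBEntryFull definition, so the fields cached here ride along with every TLB entry for free. A sketch of the effect (core fields abbreviated; the real definition lives in include/exec/cpu-defs.h):

typedef struct CPUTLBEntryFull {
    hwaddr phys_addr;          /* physical address of the page */
    MemTxAttrs attrs;          /* memory transaction attributes */
    /* ... other core fields elided ... */
#ifdef TARGET_PAGE_ENTRY_EXTRA
    /* For Arm this expands to: uint8_t pte_attrs; uint8_t shareability; */
    TARGET_PAGE_ENTRY_EXTRA
#endif
} CPUTLBEntryFull;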




[PATCH v3 28/42] target/arm: Split out get_phys_addr_twostage

2022-10-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 196 +--
 1 file changed, 106 insertions(+), 90 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index ba496c3421..3f5733a237 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -21,6 +21,15 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
 __attribute__((nonnull));
 
+static bool get_phys_addr_with_secure_debug(CPUARMState *env,
+target_ulong address,
+MMUAccessType access_type,
+ARMMMUIdx mmu_idx,
+bool is_secure, bool debug,
+GetPhysAddrResult *result,
+ARMMMUFaultInfo *fi)
+__attribute__((nonnull));
+
 /* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
 static const uint8_t pamax_map[] = {
 [0] = 32,
@@ -2426,6 +2435,98 @@ static bool get_phys_addr_disabled(CPUARMState *env, target_ulong address,
 return 0;
 }
 
+static bool get_phys_addr_twostage(CPUARMState *env, target_ulong address,
+   MMUAccessType access_type,
+   ARMMMUIdx s1_mmu_idx,
+   bool is_secure, bool debug,
+   GetPhysAddrResult *result,
+   ARMMMUFaultInfo *fi)
+{
+hwaddr ipa;
+int s1_prot;
+int ret;
+bool ipa_secure, s2walk_secure;
+ARMCacheAttrs cacheattrs1;
+ARMMMUIdx s2_mmu_idx;
+bool is_el0;
+uint64_t hcr;
+
+ret = get_phys_addr_with_secure_debug(env, address, access_type,
+  s1_mmu_idx, is_secure, debug,
+  result, fi);
+
+/* If S1 fails or S2 is disabled, return early.  */
+if (ret || regime_translation_disabled(env, ARMMMUIdx_Stage2, is_secure)) {
+return ret;
+}
+
+ipa = result->f.phys_addr;
+ipa_secure = result->f.attrs.secure;
+if (is_secure) {
+/* Select TCR based on the NS bit from the S1 walk. */
+s2walk_secure = !(ipa_secure
+  ? env->cp15.vstcr_el2 & VSTCR_SW
+  : env->cp15.vtcr_el2 & VTCR_NSW);
+} else {
+assert(!ipa_secure);
+s2walk_secure = false;
+}
+
+s2_mmu_idx = (s2walk_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2);
+is_el0 = s1_mmu_idx == ARMMMUIdx_Stage1_E0;
+
+/*
+ * S1 is done, now do S2 translation.
+ * Save the stage1 results so that we may merge prot and cacheattrs later.
+ */
+s1_prot = result->f.prot;
+cacheattrs1 = result->cacheattrs;
+memset(result, 0, sizeof(*result));
+
+ret = get_phys_addr_lpae(env, ipa, access_type, s2_mmu_idx,
+ s2walk_secure, is_el0, debug, result, fi);
+fi->s2addr = ipa;
+
+/* Combine the S1 and S2 perms.  */
+result->f.prot &= s1_prot;
+
+/* If S2 fails, return early.  */
+if (ret) {
+return ret;
+}
+
+/* Combine the S1 and S2 cache attributes. */
+hcr = arm_hcr_el2_eff_secstate(env, is_secure);
+if (hcr & HCR_DC) {
+/*
+ * HCR.DC forces the first stage attributes to
+ *  Normal Non-Shareable,
+ *  Inner Write-Back Read-Allocate Write-Allocate,
+ *  Outer Write-Back Read-Allocate Write-Allocate.
+ * Do not overwrite Tagged within attrs.
+ */
+if (cacheattrs1.attrs != 0xf0) {
+cacheattrs1.attrs = 0xff;
+}
+cacheattrs1.shareability = 0;
+}
+result->cacheattrs = combine_cacheattrs(hcr, cacheattrs1,
+result->cacheattrs);
+
+/* Check if IPA translates to secure or non-secure PA space. */
+if (is_secure) {
+if (ipa_secure) {
+result->f.attrs.secure =
+!(env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW));
+} else {
+result->f.attrs.secure =
+!((env->cp15.vtcr_el2 & (VTCR_NSA | VTCR_NSW))
+|| (env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW)));
+}
+}
+return 0;
+}
+
 static bool get_phys_addr_with_secure_debug(CPUARMState *env,
 target_ulong address,
 MMUAccessType access_type,
@@ -2442,97 +2543,12 @@ static bool get_phys_addr_with_secure_debug(CPUARMState *env,
  * translations if mmu_idx is a two-stage regime.
  */
 if (arm_feature(env, ARM_FEATURE_EL2)) {
-hwaddr ipa;
-int s1_prot;
-int ret;
-bool ipa_secure, s2walk_secure;
-ARMCacheAttrs cacheattrs1;
- 

[PATCH v3 19/42] target/arm: Fix cacheattr in get_phys_addr_disabled

2022-10-01 Thread Richard Henderson
Do not apply memattr or shareability for Stage2 translations.
Make sure to apply HCR_{DC,DCT} only to Regime_EL10, per the
pseudocode in AArch64.S1DisabledOutput.

Signed-off-by: Richard Henderson 
---
v3: Do not use a switch or a goto.
---
 target/arm/ptw.c | 48 +---
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index e494a9de67..8d27a98a42 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -2282,11 +2282,12 @@ static bool get_phys_addr_disabled(CPUARMState *env, target_ulong address,
GetPhysAddrResult *result,
ARMMMUFaultInfo *fi)
 {
-uint64_t hcr;
-uint8_t memattr;
+uint8_t memattr = 0x00;/* Device nGnRnE */
+uint8_t shareability = 0;  /* non-sharable */
 
 if (mmu_idx != ARMMMUIdx_Stage2 && mmu_idx != ARMMMUIdx_Stage2_S) {
 int r_el = regime_el(env, mmu_idx);
+
 if (arm_el_is_aa64(env, r_el)) {
 int pamax = arm_pamax(env_archcpu(env));
 uint64_t tcr = env->cp15.tcr_el[r_el];
@@ -2314,32 +2315,33 @@ static bool get_phys_addr_disabled(CPUARMState *env, target_ulong address,
  */
 address = extract64(address, 0, 52);
 }
+
+/* Fill in cacheattr a-la AArch64.TranslateAddressS1Off. */
+if (r_el == 1) {
+uint64_t hcr = arm_hcr_el2_eff_secstate(env, is_secure);
+if (hcr & HCR_DC) {
+if (hcr & HCR_DCT) {
+memattr = 0xf0;  /* Tagged, Normal, WB, RWA */
+} else {
+memattr = 0xff;  /* Normal, WB, RWA */
+}
+}
+}
+if (memattr == 0 && access_type == MMU_INST_FETCH) {
+if (regime_sctlr(env, mmu_idx) & SCTLR_I) {
+memattr = 0xee;  /* Normal, WT, RA, NT */
+} else {
+memattr = 0x44;  /* Normal, NC, No */
+}
+shareability = 2; /* outer sharable */
+}
+result->cacheattrs.is_s2_format = false;
 }
 
 result->phys = address;
 result->prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
 result->page_size = TARGET_PAGE_SIZE;
-
-/* Fill in cacheattr a-la AArch64.TranslateAddressS1Off. */
-hcr = arm_hcr_el2_eff_secstate(env, is_secure);
-result->cacheattrs.shareability = 0;
-result->cacheattrs.is_s2_format = false;
-if (hcr & HCR_DC) {
-if (hcr & HCR_DCT) {
-memattr = 0xf0;  /* Tagged, Normal, WB, RWA */
-} else {
-memattr = 0xff;  /* Normal, WB, RWA */
-}
-} else if (access_type == MMU_INST_FETCH) {
-if (regime_sctlr(env, mmu_idx) & SCTLR_I) {
-memattr = 0xee;  /* Normal, WT, RA, NT */
-} else {
-memattr = 0x44;  /* Normal, NC, No */
-}
-result->cacheattrs.shareability = 2; /* outer sharable */
-} else {
-memattr = 0x00;  /* Device, nGnRnE */
-}
+result->cacheattrs.shareability = shareability;
 result->cacheattrs.attrs = memattr;
 return 0;
 }
-- 
2.34.1
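
The hex constants are MAIR-format attribute bytes. As a reading aid only, a hypothetical decoder naming the encodings used in this patch:

/* Hypothetical helper, not part of the patch: name the MAIR bytes above. */
static const char *mair_attr_name(uint8_t attr)
{
    switch (attr) {
    case 0x00: return "Device-nGnRnE";
    case 0x44: return "Normal, Non-cacheable";
    case 0xee: return "Normal, Write-Through, Read-Allocate, Non-Transient";
    case 0xf0: return "Tagged Normal, Write-Back, RW-Allocate";
    case 0xff: return "Normal, Write-Back, RW-Allocate";
    default:   return "other";
    }
}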




[PATCH v3 23/42] target/arm: Use probe_access_full for BTI

2022-10-01 Thread Richard Henderson
Add a field to TARGET_PAGE_ENTRY_EXTRA to hold the guarded bit.
In is_guarded_page, use probe_access_full instead of just guessing
that the tlb entry is still present.  This also handles the FIXME about
executing from device memory.

Signed-off-by: Richard Henderson 
---
 target/arm/cpu-param.h |  8 
 target/arm/cpu.h   | 13 -
 target/arm/internals.h |  1 +
 target/arm/ptw.c   |  7 ---
 target/arm/translate-a64.c | 22 --
 5 files changed, 17 insertions(+), 34 deletions(-)

diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
index 118ca0e5c0..689a9645dc 100644
--- a/target/arm/cpu-param.h
+++ b/target/arm/cpu-param.h
@@ -32,12 +32,12 @@
 # define TARGET_PAGE_BITS_MIN  10
 
 /*
- * Cache the attrs and sharability fields from the page table entry.
+ * Cache the attrs, sharability, and gp fields from the page table entry.
  */
 # define TARGET_PAGE_ENTRY_EXTRA  \
- uint8_t pte_attrs;   \
- uint8_t shareability;
-
+uint8_t pte_attrs;\
+uint8_t shareability; \
+bool guarded;
 #endif
 
 #define NB_MMU_MODES 8
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 2694a93894..c8cad2ef7c 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3382,19 +3382,6 @@ static inline uint64_t *aa64_vfp_qreg(CPUARMState *env, unsigned regno)
 /* Shared between translate-sve.c and sve_helper.c.  */
 extern const uint64_t pred_esz_masks[5];
 
-/* Helper for the macros below, validating the argument type. */
-static inline MemTxAttrs *typecheck_memtxattrs(MemTxAttrs *x)
-{
-return x;
-}
-
-/*
- * Lvalue macros for ARM TLB bits that we must cache in the TCG TLB.
- * Using these should be a bit more self-documenting than using the
- * generic target bits directly.
- */
-#define arm_tlb_bti_gp(x) (typecheck_memtxattrs(x)->target_tlb_bit0)
-
 /*
  * AArch64 usage of the PAGE_TARGET_* bits for linux-user.
  * Note that with the Linux kernel, PROT_MTE may not be cleared by mprotect
diff --git a/target/arm/internals.h b/target/arm/internals.h
index fd17aee459..a50189e2e4 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1067,6 +1067,7 @@ typedef struct ARMCacheAttrs {
 unsigned int attrs:8;
 unsigned int shareability:2; /* as in the SH field of the VMSAv8-64 PTEs */
 bool is_s2_format:1;
+bool guarded:1;  /* guarded bit of the v8-64 PTE */
 } ARMCacheAttrs;
 
 /* Fields that are valid upon success. */
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 1bc194ffa1..ccfef2caca 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -1319,9 +1319,10 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
   */
  result->f.attrs.secure = false;
  }
-/* When in aarch64 mode, and BTI is enabled, remember GP in the IOTLB.  */
-if (aarch64 && guarded && cpu_isar_feature(aa64_bti, cpu)) {
-arm_tlb_bti_gp(&result->f.attrs) = true;
+
+/* When in aarch64 mode, and BTI is enabled, remember GP in the TLB.  */
+if (aarch64 && cpu_isar_feature(aa64_bti, cpu)) {
+result->f.guarded = guarded;
 }
 
 if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 5b67375f4e..22802d1d2f 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -14601,22 +14601,16 @@ static bool is_guarded_page(CPUARMState *env, DisasContext *s)
 #ifdef CONFIG_USER_ONLY
 return page_get_flags(addr) & PAGE_BTI;
 #else
+CPUTLBEntryFull *full;
+void *host;
 int mmu_idx = arm_to_core_mmu_idx(s->mmu_idx);
-unsigned int index = tlb_index(env, mmu_idx, addr);
-CPUTLBEntry *entry = tlb_entry(env, mmu_idx, addr);
+int flags;
 
-/*
- * We test this immediately after reading an insn, which means
- * that any normal page must be in the TLB.  The only exception
- * would be for executing from flash or device memory, which
- * does not retain the TLB entry.
- *
- * FIXME: Assume false for those, for now.  We could use
- * arm_cpu_get_phys_page_attrs_debug to re-read the page
- * table entry even for that case.
- */
-return (tlb_hit(entry->addr_code, addr) &&
-arm_tlb_bti_gp(&env_tlb(env)->d[mmu_idx].fulltlb[index].attrs));
+flags = probe_access_full(env, addr, MMU_INST_FETCH, mmu_idx,
+  false, &host, &full, 0);
+assert(!(flags & TLB_INVALID_MASK));
+
+return full->guarded;
 #endif
 }
 
-- 
2.34.1
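
Restated as a standalone helper, the new is_guarded_page body amounts to this sketch; probe_access_full refills the entry when needed, which is what resolves the flash/device FIXME:

/* Sketch of the read-back path (mirrors the hunk above). */
static bool page_is_guarded(CPUARMState *env, uint64_t addr, int mmu_idx)
{
    CPUTLBEntryFull *full;
    void *host;
    int flags = probe_access_full(env, addr, MMU_INST_FETCH, mmu_idx,
                                  false, &host, &full, 0);

    assert(!(flags & TLB_INVALID_MASK));
    return full->guarded;
}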




[PATCH v3 25/42] target/arm: Move ARMMMUIdx_Stage2 to a real tlb mmu_idx

2022-10-01 Thread Richard Henderson
We had been marking this ARM_MMU_IDX_NOTLB; move it to a real tlb.
Flush the tlb when invalidating stage 1+2 translations.

Signed-off-by: Richard Henderson 
---
 target/arm/cpu-param.h |  2 +-
 target/arm/cpu.h   | 23 +--
 target/arm/helper.c|  4 +++-
 3 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
index 98bd9e435e..283618f601 100644
--- a/target/arm/cpu-param.h
+++ b/target/arm/cpu-param.h
@@ -40,6 +40,6 @@
 bool guarded;
 #endif
 
-#define NB_MMU_MODES 10
+#define NB_MMU_MODES 12
 
 #endif
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 0effa85c56..732c0c00ac 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2900,8 +2900,9 @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
  * EL2 (aka NS PL2)
  * EL3 (aka S PL1)
  * Physical (NS & S)
+ * Stage2 (NS & S)
  *
- * for a total of 10 different mmu_idx.
+ * for a total of 12 different mmu_idx.
  *
  * R profile CPUs have an MPU, but can use the same set of MMU indexes
  * as A profile. They only need to distinguish EL0 and EL1 (and
@@ -2970,6 +2971,15 @@ typedef enum ARMMMUIdx {
 ARMMMUIdx_Phys_NS   = 8 | ARM_MMU_IDX_A,
 ARMMMUIdx_Phys_S= 9 | ARM_MMU_IDX_A,
 
+/*
+ * Used for second stage of an S12 page table walk, or for descriptor
+ * loads during first stage of an S1 page table walk.  Note that both
+ * are in use simultaneously for SecureEL2: the security state for
+ * the S2 ptw is selected by the NS bit from the S1 ptw.
+ */
+ARMMMUIdx_Stage2= 10 | ARM_MMU_IDX_A,
+ARMMMUIdx_Stage2_S  = 11 | ARM_MMU_IDX_A,
+
 /*
  * These are not allocated TLBs and are used only for AT system
  * instructions or for the first stage of an S12 page table walk.
@@ -2977,15 +2987,6 @@ typedef enum ARMMMUIdx {
 ARMMMUIdx_Stage1_E0 = 0 | ARM_MMU_IDX_NOTLB,
 ARMMMUIdx_Stage1_E1 = 1 | ARM_MMU_IDX_NOTLB,
 ARMMMUIdx_Stage1_E1_PAN = 2 | ARM_MMU_IDX_NOTLB,
-/*
- * Not allocated a TLB: used only for second stage of an S12 page
- * table walk, or for descriptor loads during first stage of an S1
- * page table walk. Note that if we ever want to have a TLB for this
- * then various TLB flush insns which currently are no-ops or flush
- * only stage 1 MMU indexes will need to change to flush stage 2.
- */
-ARMMMUIdx_Stage2 = 3 | ARM_MMU_IDX_NOTLB,
-ARMMMUIdx_Stage2_S   = 4 | ARM_MMU_IDX_NOTLB,
 
 /*
  * M-profile.
@@ -3016,6 +3017,8 @@ typedef enum ARMMMUIdxBit {
 TO_CORE_BIT(E20_2),
 TO_CORE_BIT(E20_2_PAN),
 TO_CORE_BIT(E3),
+TO_CORE_BIT(Stage2),
+TO_CORE_BIT(Stage2_S),
 
 TO_CORE_BIT(MUser),
 TO_CORE_BIT(MPriv),
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 6fe85c6642..19a03eb200 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -4319,7 +4319,9 @@ static int alle1_tlbmask(CPUARMState *env)
  */
 return (ARMMMUIdxBit_E10_1 |
 ARMMMUIdxBit_E10_1_PAN |
-ARMMMUIdxBit_E10_0);
+ARMMMUIdxBit_E10_0 |
+ARMMMUIdxBit_Stage2 |
+ARMMMUIdxBit_Stage2_S);
 }
 
 static int e2_tlbmask(CPUARMState *env)
-- 
2.34.1




[PATCH v3 17/42] target/arm: Fix ATS12NSO* from S PL1

2022-10-01 Thread Richard Henderson
Use arm_hcr_el2_eff_secstate instead of arm_hcr_el2_eff, so
that we use is_secure instead of the current security state.
These AT* operations have been broken since arm_hcr_el2_eff
gained a check for "el2 enabled" for Secure EL2.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index a0dce9c313..7bf79779da 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -146,7 +146,7 @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
 }
 }
 
-hcr_el2 = arm_hcr_el2_eff(env);
+hcr_el2 = arm_hcr_el2_eff_secstate(env, is_secure);
 
 switch (mmu_idx) {
 case ARMMMUIdx_Stage2:
@@ -230,7 +230,7 @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
 return ~0;
 }
 
-hcr = arm_hcr_el2_eff(env);
+hcr = arm_hcr_el2_eff_secstate(env, is_secure);
 if ((hcr & HCR_PTW) && ptw_attrs_are_device(hcr, s2.cacheattrs)) {
 /*
  * PTW set and S1 walk touched S2 Device memory:
@@ -2341,7 +2341,7 @@ bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
 }
 
 /* Combine the S1 and S2 cache attributes. */
-hcr = arm_hcr_el2_eff(env);
+hcr = arm_hcr_el2_eff_secstate(env, is_secure);
 if (hcr & HCR_DC) {
 /*
  * HCR.DC forces the first stage attributes to
@@ -2474,7 +2474,7 @@ bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
 result->page_size = TARGET_PAGE_SIZE;
 
 /* Fill in cacheattr a-la AArch64.TranslateAddressS1Off. */
-hcr = arm_hcr_el2_eff(env);
+hcr = arm_hcr_el2_eff_secstate(env, is_secure);
 result->cacheattrs.shareability = 0;
 result->cacheattrs.is_s2_format = false;
 if (hcr & HCR_DC) {
-- 
2.34.1




[PATCH v3 20/42] target/arm: Use tlb_set_page_full

2022-10-01 Thread Richard Henderson
Adjust GetPhysAddrResult to fill in CPUTLBEntryFull,
so that it may be passed directly to tlb_set_page_full.

The change is large, but mostly mechanical.  The major
non-mechanical change is page_size -> lg_page_size.
Most of the time this is obvious, and is related to
TARGET_PAGE_BITS.

Signed-off-by: Richard Henderson 
---
 target/arm/internals.h  |   5 +-
 target/arm/helper.c |  12 +--
 target/arm/m_helper.c   |  20 ++---
 target/arm/ptw.c| 181 
 target/arm/tlb_helper.c |   9 +-
 5 files changed, 112 insertions(+), 115 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index b509d70851..fd17aee459 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1071,10 +1071,7 @@ typedef struct ARMCacheAttrs {
 
 /* Fields that are valid upon success. */
 typedef struct GetPhysAddrResult {
-hwaddr phys;
-target_ulong page_size;
-int prot;
-MemTxAttrs attrs;
+CPUTLBEntryFull f;
 ARMCacheAttrs cacheattrs;
 } GetPhysAddrResult;
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 4eec22b1f8..6fe85c6642 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -3320,8 +3320,8 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
 /* Create a 64-bit PAR */
 par64 = (1 << 11); /* LPAE bit always set */
 if (!ret) {
-par64 |= res.phys & ~0xfffULL;
-if (!res.attrs.secure) {
+par64 |= res.f.phys_addr & ~0xfffULL;
+if (!res.f.attrs.secure) {
 par64 |= (1 << 9); /* NS */
 }
 par64 |= (uint64_t)res.cacheattrs.attrs << 56; /* ATTR */
@@ -3345,13 +3345,13 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
  */
 if (!ret) {
 /* We do not set any attribute bits in the PAR */
-if (res.page_size == (1 << 24)
+if (res.f.lg_page_size == 24
 && arm_feature(env, ARM_FEATURE_V7)) {
-par64 = (res.phys & 0xff00) | (1 << 1);
+par64 = (res.f.phys_addr & 0xff00) | (1 << 1);
 } else {
-par64 = res.phys & 0xf000;
+par64 = res.f.phys_addr & 0xf000;
 }
-if (!res.attrs.secure) {
+if (!res.f.attrs.secure) {
 par64 |= (1 << 9); /* NS */
 }
 } else {
diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index 203ba411f6..355cd4d60a 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -223,8 +223,8 @@ static bool v7m_stack_write(ARMCPU *cpu, uint32_t addr, uint32_t value,
  }
  goto pend_fault;
  }
-address_space_stl_le(arm_addressspace(cs, res.attrs), res.phys, value,
- res.attrs, &txres);
+address_space_stl_le(arm_addressspace(cs, res.f.attrs), res.f.phys_addr,
+ value, res.f.attrs, &txres);
 if (txres != MEMTX_OK) {
 /* BusFault trying to write the data */
 if (mode == STACK_LAZYFP) {
@@ -298,8 +298,8 @@ static bool v7m_stack_read(ARMCPU *cpu, uint32_t *dest, uint32_t addr,
  goto pend_fault;
  }
  
-value = address_space_ldl(arm_addressspace(cs, res.attrs), res.phys,
-  res.attrs, &txres);
+value = address_space_ldl(arm_addressspace(cs, res.f.attrs),
+  res.f.phys_addr, res.f.attrs, &txres);
 if (txres != MEMTX_OK) {
 /* BusFault trying to read the data */
 qemu_log_mask(CPU_LOG_INT, "...BusFault with BFSR.UNSTKERR\n");
@@ -2022,8 +2022,8 @@ static bool v7m_read_half_insn(ARMCPU *cpu, ARMMMUIdx mmu_idx, bool secure,
  qemu_log_mask(CPU_LOG_INT, "...really MemManage with CFSR.IACCVIOL\n");
  return false;
  }
-*insn = address_space_lduw_le(arm_addressspace(cs, res.attrs), res.phys,
-  res.attrs, &txres);
+*insn = address_space_lduw_le(arm_addressspace(cs, res.f.attrs),
+  res.f.phys_addr, res.f.attrs, &txres);
 if (txres != MEMTX_OK) {
 env->v7m.cfsr[M_REG_NS] |= R_V7M_CFSR_IBUSERR_MASK;
 armv7m_nvic_set_pending(env->nvic, ARMV7M_EXCP_BUS, false);
@@ -2069,8 +2069,8 @@ static bool v7m_read_sg_stack_word(ARMCPU *cpu, ARMMMUIdx mmu_idx,
  }
  return false;
  }
-value = address_space_ldl(arm_addressspace(cs, res.attrs), res.phys,
-  res.attrs, &txres);
+value = address_space_ldl(arm_addressspace(cs, res.f.attrs),
+  res.f.phys_addr, res.f.attrs, &txres);
 if (txres != MEMTX_OK) {
 /* BusFault trying to read the data */
 qemu_log_mask(CPU_LOG_INT,
@@ -2817,8 +2817,8 @@ uint32_t HELPER(v7m_tt)(CPUARMState *env, uint32_t addr, uint32_t op)
 } else {
 mrvalid = true;
 }
-r = res.prot & PAGE_READ;
-rw = res.prot & PAGE_WRITE;
+r = res.f.prot & PAGE_READ;
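
The consumer side of this change is visible in the tlb_helper.c hunk elsewhere in the series; condensed as a sketch, the fill path becomes:

/* Sketch: the walker's result embeds a CPUTLBEntryFull that is handed
 * to the core TLB as-is; lg_page_size is log2, vs TARGET_PAGE_BITS. */
GetPhysAddrResult res = {};
ARMMMUFaultInfo fi = {};

if (!get_phys_addr(env, address, access_type,
                   core_to_arm_mmu_idx(env, mmu_idx), &res, &fi)) {
    res.f.pte_attrs = res.cacheattrs.attrs;
    res.f.shareability = res.cacheattrs.shareability;
    tlb_set_page_full(cs, mmu_idx, address, &res.f);
}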

[PATCH v3 36/42] target/arm: Add ARMFault_UnsuppAtomicUpdate

2022-10-01 Thread Richard Henderson
This fault type is to be used with FEAT_HAFDBS when
the guest enables hw updates, but places the tables
in memory where atomic updates are unsupported.

Signed-off-by: Richard Henderson 
---
 target/arm/internals.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index e95b6b1b8f..4a2b1ec31c 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -338,6 +338,7 @@ typedef enum ARMFaultType {
 ARMFault_AsyncExternal,
 ARMFault_Debug,
 ARMFault_TLBConflict,
+ARMFault_UnsuppAtomicUpdate,
 ARMFault_Lockdown,
 ARMFault_Exclusive,
 ARMFault_ICacheMaint,
@@ -524,6 +525,9 @@ static inline uint32_t arm_fi_to_lfsc(ARMMMUFaultInfo *fi)
 case ARMFault_TLBConflict:
 fsc = 0x30;
 break;
+case ARMFault_UnsuppAtomicUpdate:
+fsc = 0x31;
+break;
 case ARMFault_Lockdown:
 fsc = 0x34;
 break;
-- 
2.34.1




[PATCH v3 15/42] target/arm: Remove env argument from combined_attrs_fwb

2022-10-01 Thread Richard Henderson
This value is unused.

Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index cb072792a2..2f0161 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -2172,8 +2172,7 @@ static uint8_t force_cacheattr_nibble_wb(uint8_t attr)
  * s1 and s2 for the HCR_EL2.FWB == 1 case, returning the
  * combined attributes in MAIR_EL1 format.
  */
-static uint8_t combined_attrs_fwb(CPUARMState *env,
-  ARMCacheAttrs s1, ARMCacheAttrs s2)
+static uint8_t combined_attrs_fwb(ARMCacheAttrs s1, ARMCacheAttrs s2)
 {
 switch (s2.attrs) {
 case 7:
@@ -2246,7 +2245,7 @@ static ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
 
 /* Combine memory type and cacheability attributes */
 if (arm_hcr_el2_eff(env) & HCR_FWB) {
-ret.attrs = combined_attrs_fwb(env, s1, s2);
+ret.attrs = combined_attrs_fwb(s1, s2);
 } else {
 ret.attrs = combined_attrs_nofwb(env, s1, s2);
 }
-- 
2.34.1




[PATCH v3 18/42] target/arm: Split out get_phys_addr_disabled

2022-10-01 Thread Richard Henderson
Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 138 +--
 1 file changed, 74 insertions(+), 64 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 7bf79779da..e494a9de67 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -2272,6 +2272,78 @@ static ARMCacheAttrs combine_cacheattrs(uint64_t hcr,
 return ret;
 }
 
+/*
+ * MMU disabled.  S1 addresses within aa64 translation regimes are
+ * still checked for bounds -- see AArch64.S1DisabledOutput().
+ */
+static bool get_phys_addr_disabled(CPUARMState *env, target_ulong address,
+   MMUAccessType access_type,
+   ARMMMUIdx mmu_idx, bool is_secure,
+   GetPhysAddrResult *result,
+   ARMMMUFaultInfo *fi)
+{
+uint64_t hcr;
+uint8_t memattr;
+
+if (mmu_idx != ARMMMUIdx_Stage2 && mmu_idx != ARMMMUIdx_Stage2_S) {
+int r_el = regime_el(env, mmu_idx);
+if (arm_el_is_aa64(env, r_el)) {
+int pamax = arm_pamax(env_archcpu(env));
+uint64_t tcr = env->cp15.tcr_el[r_el];
+int addrtop, tbi;
+
+tbi = aa64_va_parameter_tbi(tcr, mmu_idx);
+if (access_type == MMU_INST_FETCH) {
+tbi &= ~aa64_va_parameter_tbid(tcr, mmu_idx);
+}
+tbi = (tbi >> extract64(address, 55, 1)) & 1;
+addrtop = (tbi ? 55 : 63);
+
+if (extract64(address, pamax, addrtop - pamax + 1) != 0) {
+fi->type = ARMFault_AddressSize;
+fi->level = 0;
+fi->stage2 = false;
+return 1;
+}
+
+/*
+ * When TBI is disabled, we've just validated that all of the
+ * bits above PAMax are zero, so logically we only need to
+ * clear the top byte for TBI.  But it's clearer to follow
+ * the pseudocode set of addrdesc.paddress.
+ */
+address = extract64(address, 0, 52);
+}
+}
+
+result->phys = address;
+result->prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
+result->page_size = TARGET_PAGE_SIZE;
+
+/* Fill in cacheattr a-la AArch64.TranslateAddressS1Off. */
+hcr = arm_hcr_el2_eff_secstate(env, is_secure);
+result->cacheattrs.shareability = 0;
+result->cacheattrs.is_s2_format = false;
+if (hcr & HCR_DC) {
+if (hcr & HCR_DCT) {
+memattr = 0xf0;  /* Tagged, Normal, WB, RWA */
+} else {
+memattr = 0xff;  /* Normal, WB, RWA */
+}
+} else if (access_type == MMU_INST_FETCH) {
+if (regime_sctlr(env, mmu_idx) & SCTLR_I) {
+memattr = 0xee;  /* Normal, WT, RA, NT */
+} else {
+memattr = 0x44;  /* Normal, NC, No */
+}
+result->cacheattrs.shareability = 2; /* outer sharable */
+} else {
+memattr = 0x00;  /* Device, nGnRnE */
+}
+result->cacheattrs.attrs = memattr;
+return 0;
+}
+
 bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
MMUAccessType access_type, ARMMMUIdx mmu_idx,
bool is_secure, GetPhysAddrResult *result,
@@ -2432,71 +2504,9 @@ bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
 /* Definitely a real MMU, not an MPU */
 
 if (regime_translation_disabled(env, mmu_idx, is_secure)) {
-uint64_t hcr;
-uint8_t memattr;
-
-/*
- * MMU disabled.  S1 addresses within aa64 translation regimes are
- * still checked for bounds -- see AArch64.TranslateAddressS1Off.
- */
-if (mmu_idx != ARMMMUIdx_Stage2 && mmu_idx != ARMMMUIdx_Stage2_S) {
-int r_el = regime_el(env, mmu_idx);
-if (arm_el_is_aa64(env, r_el)) {
-int pamax = arm_pamax(env_archcpu(env));
-uint64_t tcr = env->cp15.tcr_el[r_el];
-int addrtop, tbi;
-
-tbi = aa64_va_parameter_tbi(tcr, mmu_idx);
-if (access_type == MMU_INST_FETCH) {
-tbi &= ~aa64_va_parameter_tbid(tcr, mmu_idx);
-}
-tbi = (tbi >> extract64(address, 55, 1)) & 1;
-addrtop = (tbi ? 55 : 63);
-
-if (extract64(address, pamax, addrtop - pamax + 1) != 0) {
-fi->type = ARMFault_AddressSize;
-fi->level = 0;
-fi->stage2 = false;
-return 1;
-}
-
-/*
- * When TBI is disabled, we've just validated that all of the
- * bits above PAMax are zero, so logically we only need to
- * clear the top byte for TBI.  But it's clearer to follow
- * the pseudocode set of addrdesc.paddress.
- */
-   

[PATCH v3 14/42] target/arm: Hoist read of *is_secure in S1_ptw_translate

2022-10-01 Thread Richard Henderson
Rename the argument to is_secure_ptr, and introduce a
local variable is_secure with the value.  We only write
back to the pointer toward the end of the function.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 1ea29bec58..cb072792a2 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -207,24 +207,25 @@ static bool ptw_attrs_are_device(CPUARMState *env, ARMCacheAttrs cacheattrs)
 
 /* Translate a S1 pagetable walk through S2 if needed.  */
 static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
-   hwaddr addr, bool *is_secure,
+   hwaddr addr, bool *is_secure_ptr,
ARMMMUFaultInfo *fi)
 {
-ARMMMUIdx s2_mmu_idx = *is_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
+bool is_secure = *is_secure_ptr;
+ARMMMUIdx s2_mmu_idx = is_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
 
 if (arm_mmu_idx_is_stage1_of_2(mmu_idx) &&
-!regime_translation_disabled(env, s2_mmu_idx, *is_secure)) {
+!regime_translation_disabled(env, s2_mmu_idx, is_secure)) {
 GetPhysAddrResult s2 = {};
 int ret;
 
 ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, s2_mmu_idx,
- *is_secure, false, &s2, fi);
+ is_secure, false, &s2, fi);
 if (ret) {
 assert(fi->type != ARMFault_None);
 fi->s2addr = addr;
 fi->stage2 = true;
 fi->s1ptw = true;
-fi->s1ns = !*is_secure;
+fi->s1ns = !is_secure;
 return ~0;
 }
 if ((arm_hcr_el2_eff(env) & HCR_PTW) &&
@@ -237,19 +238,20 @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
 fi->s2addr = addr;
 fi->stage2 = true;
 fi->s1ptw = true;
-fi->s1ns = !*is_secure;
+fi->s1ns = !is_secure;
 return ~0;
 }
 
 if (arm_is_secure_below_el3(env)) {
 /* Check if page table walk is to secure or non-secure PA space. */
-if (*is_secure) {
-*is_secure = !(env->cp15.vstcr_el2 & VSTCR_SW);
+if (is_secure) {
+is_secure = !(env->cp15.vstcr_el2 & VSTCR_SW);
 } else {
-*is_secure = !(env->cp15.vtcr_el2 & VTCR_NSW);
+is_secure = !(env->cp15.vtcr_el2 & VTCR_NSW);
 }
+*is_secure_ptr = is_secure;
 } else {
-assert(!*is_secure);
+assert(!is_secure);
 }
 
 addr = s2.phys;
-- 
2.34.1




[PATCH v3 29/42] target/arm: Use bool consistently for get_phys_addr subroutines

2022-10-01 Thread Richard Henderson
The return type of the functions is already bool, but in a few
instances we used an integer type with the return statement.

Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 3f5733a237..445382ab03 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -2432,7 +2432,7 @@ static bool get_phys_addr_disabled(CPUARMState *env, target_ulong address,
 result->f.lg_page_size = TARGET_PAGE_BITS;
 result->cacheattrs.shareability = shareability;
 result->cacheattrs.attrs = memattr;
-return 0;
+return false;
 }
 
 static bool get_phys_addr_twostage(CPUARMState *env, target_ulong address,
@@ -2444,8 +2444,7 @@ static bool get_phys_addr_twostage(CPUARMState *env, target_ulong address,
 {
 hwaddr ipa;
 int s1_prot;
-int ret;
-bool ipa_secure, s2walk_secure;
+bool ret, ipa_secure, s2walk_secure;
 ARMCacheAttrs cacheattrs1;
 ARMMMUIdx s2_mmu_idx;
 bool is_el0;
@@ -2524,7 +2523,7 @@ static bool get_phys_addr_twostage(CPUARMState *env, target_ulong address,
 || (env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW)));
 }
 }
-return 0;
+return false;
 }
 
 static bool get_phys_addr_with_secure_debug(CPUARMState *env,
-- 
2.34.1




[PATCH v3 16/42] target/arm: Pass HCR to attribute subroutines.

2022-10-01 Thread Richard Henderson
These subroutines did not need ENV for anything except
retrieving the effective value of HCR anyway.

We have computed the effective value of HCR in the callers,
and this will be especially important for interpreting HCR
in a non-current security state.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 2f0161..a0dce9c313 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -186,7 +186,7 @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
 return (regime_sctlr(env, mmu_idx) & SCTLR_M) == 0;
 }
 
-static bool ptw_attrs_are_device(CPUARMState *env, ARMCacheAttrs cacheattrs)
+static bool ptw_attrs_are_device(uint64_t hcr, ARMCacheAttrs cacheattrs)
 {
 /*
  * For an S1 page table walk, the stage 1 attributes are always
@@ -198,7 +198,7 @@ static bool ptw_attrs_are_device(CPUARMState *env, ARMCacheAttrs cacheattrs)
  * when cacheattrs.attrs bit [2] is 0.
  */
 assert(cacheattrs.is_s2_format);
-if (arm_hcr_el2_eff(env) & HCR_FWB) {
+if (hcr & HCR_FWB) {
 return (cacheattrs.attrs & 0x4) == 0;
 } else {
 return (cacheattrs.attrs & 0xc) == 0;
@@ -216,6 +216,7 @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
 if (arm_mmu_idx_is_stage1_of_2(mmu_idx) &&
 !regime_translation_disabled(env, s2_mmu_idx, is_secure)) {
 GetPhysAddrResult s2 = {};
+uint64_t hcr;
 int ret;
 
 ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, s2_mmu_idx,
@@ -228,8 +229,9 @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
 fi->s1ns = !is_secure;
 return ~0;
 }
-if ((arm_hcr_el2_eff(env) & HCR_PTW) &&
-ptw_attrs_are_device(env, s2.cacheattrs)) {
+
+hcr = arm_hcr_el2_eff(env);
+if ((hcr & HCR_PTW) && ptw_attrs_are_device(hcr, s2.cacheattrs)) {
 /*
  * PTW set and S1 walk touched S2 Device memory:
  * generate Permission fault.
@@ -2059,14 +2061,14 @@ static bool get_phys_addr_pmsav8(CPUARMState *env, uint32_t address,
  * ref: shared/translation/attrs/S2AttrDecode()
  *  .../S2ConvertAttrsHints()
  */
-static uint8_t convert_stage2_attrs(CPUARMState *env, uint8_t s2attrs)
+static uint8_t convert_stage2_attrs(uint64_t hcr, uint8_t s2attrs)
 {
 uint8_t hiattr = extract32(s2attrs, 2, 2);
 uint8_t loattr = extract32(s2attrs, 0, 2);
 uint8_t hihint = 0, lohint = 0;
 
 if (hiattr != 0) { /* normal memory */
-if (arm_hcr_el2_eff(env) & HCR_CD) { /* cache disabled */
+if (hcr & HCR_CD) { /* cache disabled */
 hiattr = loattr = 1; /* non-cacheable */
 } else {
 if (hiattr != 1) { /* Write-through or write-back */
@@ -2112,12 +2114,12 @@ static uint8_t combine_cacheattr_nibble(uint8_t s1, uint8_t s2)
  * s1 and s2 for the HCR_EL2.FWB == 0 case, returning the
  * combined attributes in MAIR_EL1 format.
  */
-static uint8_t combined_attrs_nofwb(CPUARMState *env,
+static uint8_t combined_attrs_nofwb(uint64_t hcr,
 ARMCacheAttrs s1, ARMCacheAttrs s2)
 {
 uint8_t s1lo, s2lo, s1hi, s2hi, s2_mair_attrs, ret_attrs;
 
-s2_mair_attrs = convert_stage2_attrs(env, s2.attrs);
+s2_mair_attrs = convert_stage2_attrs(hcr, s2.attrs);
 
 s1lo = extract32(s1.attrs, 0, 4);
 s2lo = extract32(s2_mair_attrs, 0, 4);
@@ -2217,7 +2219,7 @@ static uint8_t combined_attrs_fwb(ARMCacheAttrs s1, ARMCacheAttrs s2)
  * @s1:  Attributes from stage 1 walk
  * @s2:  Attributes from stage 2 walk
  */
-static ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
+static ARMCacheAttrs combine_cacheattrs(uint64_t hcr,
 ARMCacheAttrs s1, ARMCacheAttrs s2)
 {
 ARMCacheAttrs ret;
@@ -2244,10 +2246,10 @@ static ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
 }
 
 /* Combine memory type and cacheability attributes */
-if (arm_hcr_el2_eff(env) & HCR_FWB) {
+if (hcr & HCR_FWB) {
 ret.attrs = combined_attrs_fwb(s1, s2);
 } else {
-ret.attrs = combined_attrs_nofwb(env, s1, s2);
+ret.attrs = combined_attrs_nofwb(hcr, s1, s2);
 }
 
 /*
@@ -2290,6 +2292,7 @@ bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
 ARMCacheAttrs cacheattrs1;
 ARMMMUIdx s2_mmu_idx;
 bool is_el0;
+uint64_t hcr;
 
 ret = get_phys_addr_with_secure(env, address, access_type,
 s1_mmu_idx, is_secure, result, fi);
@@ -2338,7 +2341,8 @@ bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
 }
 
 /* Combine the S1 and S2 cache attributes. */
-if (arm_hcr_el2_eff(env) & HCR_DC) {
+   

[PATCH v3 12/42] target/arm: Drop secure check for HCR.TGE vs SCTLR_EL1.M

2022-10-01 Thread Richard Henderson
Now that Secure EL2 exists, the effect of TGE is no longer
limited to non-secure state.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 2875ea881c..1ea29bec58 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -157,8 +157,8 @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
 case ARMMMUIdx_E10_0:
 case ARMMMUIdx_E10_1:
 case ARMMMUIdx_E10_1_PAN:
-/* TGE means that NS EL0/1 act as if SCTLR_EL1.M is zero */
-if (!is_secure && (hcr_el2 & HCR_TGE)) {
+/* TGE means that EL0/1 act as if SCTLR_EL1.M is zero */
+if (hcr_el2 & HCR_TGE) {
 return true;
 }
 break;
-- 
2.34.1




[PATCH v3 13/42] target/arm: Introduce arm_hcr_el2_eff_secstate

2022-10-01 Thread Richard Henderson
For page walking, we may require HCR for a security state
that is not "current".

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h| 20 +---
 target/arm/helper.c | 11 ---
 2 files changed, 21 insertions(+), 10 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 6475dc0cfd..0f82f4aa1d 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2406,15 +2406,15 @@ static inline bool arm_is_secure(CPUARMState *env)
  * Return true if the current security state has AArch64 EL2 or AArch32 Hyp.
  * This corresponds to the pseudocode EL2Enabled()
  */
+static inline bool arm_is_el2_enabled_secstate(CPUARMState *env, bool secure)
+{
+return arm_feature(env, ARM_FEATURE_EL2)
+   && (!secure || (env->cp15.scr_el3 & SCR_EEL2));
+}
+
 static inline bool arm_is_el2_enabled(CPUARMState *env)
 {
-if (arm_feature(env, ARM_FEATURE_EL2)) {
-if (arm_is_secure_below_el3(env)) {
-return (env->cp15.scr_el3 & SCR_EEL2) != 0;
-}
-return true;
-}
-return false;
+return arm_is_el2_enabled_secstate(env, arm_is_secure_below_el3(env));
 }
 
 #else
@@ -2428,6 +2428,11 @@ static inline bool arm_is_secure(CPUARMState *env)
 return false;
 }
 
+static inline bool arm_is_el2_enabled_secstate(CPUARMState *env, bool secure)
+{
+return false;
+}
+
 static inline bool arm_is_el2_enabled(CPUARMState *env)
 {
 return false;
@@ -2440,6 +2445,7 @@ static inline bool arm_is_el2_enabled(CPUARMState *env)
  * "for all purposes other than a direct read or write access of HCR_EL2."
  * Not included here is HCR_RW.
  */
+uint64_t arm_hcr_el2_eff_secstate(CPUARMState *env, bool secure);
 uint64_t arm_hcr_el2_eff(CPUARMState *env);
 uint64_t arm_hcrx_el2_eff(CPUARMState *env);
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 0fd0c73092..4eec22b1f8 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -5216,15 +5216,15 @@ static void hcr_writelow(CPUARMState *env, const ARMCPRegInfo *ri,
 }
 
 /*
- * Return the effective value of HCR_EL2.
+ * Return the effective value of HCR_EL2, at the given security state.
  * Bits that are not included here:
  * RW   (read from SCR_EL3.RW as needed)
  */
-uint64_t arm_hcr_el2_eff(CPUARMState *env)
+uint64_t arm_hcr_el2_eff_secstate(CPUARMState *env, bool secure)
 {
 uint64_t ret = env->cp15.hcr_el2;
 
-if (!arm_is_el2_enabled(env)) {
+if (!arm_is_el2_enabled_secstate(env, secure)) {
 /*
  * "This register has no effect if EL2 is not enabled in the
  * current Security state".  This is ARMv8.4-SecEL2 speak for
@@ -5283,6 +5283,11 @@ uint64_t arm_hcr_el2_eff(CPUARMState *env)
 return ret;
 }
 
+uint64_t arm_hcr_el2_eff(CPUARMState *env)
+{
+return arm_hcr_el2_eff_secstate(env, arm_is_secure_below_el3(env));
+}
+
 /*
  * Corresponds to ARM pseudocode function ELIsInHost().
  */
-- 
2.34.1
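
As a quick sanity statement of the refactoring: the unqualified query is now the secstate query evaluated at the current security state, so this invariant holds (an illustrative assertion, not code in the patch):

assert(arm_hcr_el2_eff(env) ==
       arm_hcr_el2_eff_secstate(env, arm_is_secure_below_el3(env)));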




[PATCH v3 09/42] target/arm: Add is_secure parameter to do_ats_write

2022-10-01 Thread Richard Henderson
Use get_phys_addr_with_secure directly.  For a-profile, this is the
one place where the value of is_secure may not equal arm_is_secure(env).

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/helper.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 772218f0d2..3adeb4cab4 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -3188,7 +3188,8 @@ static CPAccessResult ats_access(CPUARMState *env, const ARMCPRegInfo *ri,
 
 #ifdef CONFIG_TCG
 static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
- MMUAccessType access_type, ARMMMUIdx mmu_idx)
+ MMUAccessType access_type, ARMMMUIdx mmu_idx,
+ bool is_secure)
 {
 bool ret;
 uint64_t par64;
@@ -3196,7 +3197,8 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
  ARMMMUFaultInfo fi = {};
  GetPhysAddrResult res = {};
  
-ret = get_phys_addr(env, value, access_type, mmu_idx, &res, &fi);
+ret = get_phys_addr_with_secure(env, value, access_type, mmu_idx,
+is_secure, &res, &fi);
 
 /*
  * ATS operations only do S1 or S1+S2 translations, so we never
@@ -3368,6 +3370,7 @@ static void ats_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
 switch (el) {
 case 3:
 mmu_idx = ARMMMUIdx_SE3;
+secure = true;
 break;
 case 2:
 g_assert(!secure);  /* ARMv8.4-SecEL2 is 64-bit only */
@@ -3389,6 +3392,7 @@ static void ats_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
 switch (el) {
 case 3:
 mmu_idx = ARMMMUIdx_SE10_0;
+secure = true;
 break;
 case 2:
 g_assert(!secure);  /* ARMv8.4-SecEL2 is 64-bit only */
@@ -3404,16 +3408,18 @@ static void ats_write(CPUARMState *env, const ARMCPRegInfo *ri, uint64_t value)
 case 4:
 /* stage 1+2 NonSecure PL1: ATS12NSOPR, ATS12NSOPW */
 mmu_idx = ARMMMUIdx_E10_1;
+secure = false;
 break;
 case 6:
 /* stage 1+2 NonSecure PL0: ATS12NSOUR, ATS12NSOUW */
 mmu_idx = ARMMMUIdx_E10_0;
+secure = false;
 break;
 default:
 g_assert_not_reached();
 }
 
-par64 = do_ats_write(env, value, access_type, mmu_idx);
+par64 = do_ats_write(env, value, access_type, mmu_idx, secure);
 
 A32_BANKED_CURRENT_REG_SET(env, par, par64);
 #else
@@ -3429,7 +3435,8 @@ static void ats1h_write(CPUARMState *env, const ARMCPRegInfo *ri,
 MMUAccessType access_type = ri->opc2 & 1 ? MMU_DATA_STORE : MMU_DATA_LOAD;
 uint64_t par64;
 
-par64 = do_ats_write(env, value, access_type, ARMMMUIdx_E2);
+/* There is no SecureEL2 for AArch32. */
+par64 = do_ats_write(env, value, access_type, ARMMMUIdx_E2, false);
 
 A32_BANKED_CURRENT_REG_SET(env, par, par64);
 #else
@@ -3472,6 +3479,7 @@ static void ats_write64(CPUARMState *env, const ARMCPRegInfo *ri,
 break;
 case 6: /* AT S1E3R, AT S1E3W */
 mmu_idx = ARMMMUIdx_SE3;
+secure = true;
 break;
 default:
 g_assert_not_reached();
@@ -3490,7 +3498,8 @@ static void ats_write64(CPUARMState *env, const ARMCPRegInfo *ri,
 g_assert_not_reached();
 }
 
-env->cp15.par_el[1] = do_ats_write(env, value, access_type, mmu_idx);
+env->cp15.par_el[1] = do_ats_write(env, value, access_type,
+   mmu_idx, secure);
 #else
 /* Handled by hardware accelerator. */
 g_assert_not_reached();
-- 
2.34.1




[PATCH v3 11/42] target/arm: Reorg regime_translation_disabled

2022-10-01 Thread Richard Henderson
Use a switch on mmu_idx for the a-profile indexes, instead of
three different if's vs regime_el and arm_mmu_idx_is_stage1_of_2.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 32 +---
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 9be11f1673..2875ea881c 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -148,21 +148,39 @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
 
 hcr_el2 = arm_hcr_el2_eff(env);
 
-if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
+switch (mmu_idx) {
+case ARMMMUIdx_Stage2:
+case ARMMMUIdx_Stage2_S:
 /* HCR.DC means HCR.VM behaves as 1 */
 return (hcr_el2 & (HCR_DC | HCR_VM)) == 0;
-}
 
-if (hcr_el2 & HCR_TGE) {
+case ARMMMUIdx_E10_0:
+case ARMMMUIdx_E10_1:
+case ARMMMUIdx_E10_1_PAN:
 /* TGE means that NS EL0/1 act as if SCTLR_EL1.M is zero */
-if (!is_secure && regime_el(env, mmu_idx) == 1) {
+if (!is_secure && (hcr_el2 & HCR_TGE)) {
 return true;
 }
-}
+break;
 
-if ((hcr_el2 & HCR_DC) && arm_mmu_idx_is_stage1_of_2(mmu_idx)) {
+case ARMMMUIdx_Stage1_E0:
+case ARMMMUIdx_Stage1_E1:
+case ARMMMUIdx_Stage1_E1_PAN:
 /* HCR.DC means SCTLR_EL1.M behaves as 0 */
-return true;
+if (hcr_el2 & HCR_DC) {
+return true;
+}
+break;
+
+case ARMMMUIdx_E20_0:
+case ARMMMUIdx_E20_2:
+case ARMMMUIdx_E20_2_PAN:
+case ARMMMUIdx_E2:
+case ARMMMUIdx_E3:
+break;
+
+default:
+g_assert_not_reached();
 }
 
 return (regime_sctlr(env, mmu_idx) & SCTLR_M) == 0;
-- 
2.34.1




[PATCH v3 10/42] target/arm: Fold secure and non-secure a-profile mmu indexes

2022-10-01 Thread Richard Henderson
For a-profile aarch64, which does not bank system registers, it takes
quite a lot of code to switch between security states.  In the process,
registers such as TCR_EL{1,2} must be swapped, which in itself requires
the flushing of softmmu tlbs.  Therefore it doesn't buy us anything to
separate tlbs by security state.

Retain the distinction between Stage2 and Stage2_S.

This will be important as we implement FEAT_RME, and do not wish to
add a third set of mmu indexes for Realm state.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/cpu-param.h |   2 +-
 target/arm/cpu.h   |  72 +++
 target/arm/internals.h |  31 +---
 target/arm/helper.c| 144 +
 target/arm/ptw.c   |  25 ++-
 target/arm/translate-a64.c |   8 ---
 target/arm/translate.c |   6 +-
 7 files changed, 85 insertions(+), 203 deletions(-)

diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
index 68ffb12427..08681828ac 100644
--- a/target/arm/cpu-param.h
+++ b/target/arm/cpu-param.h
@@ -32,6 +32,6 @@
 # define TARGET_PAGE_BITS_MIN  10
 #endif
 
-#define NB_MMU_MODES 15
+#define NB_MMU_MODES 8
 
 #endif
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 790328c598..6475dc0cfd 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2878,26 +2878,27 @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
  * table over and over.
  *  6. we need separate EL1/EL2 mmu_idx for handling the Privileged Access
  * Never (PAN) bit within PSTATE.
+ *  7. we fold together the secure and non-secure regimes for A-profile,
+ * because there are no banked system registers for aarch64, so the
+ * process of switching between secure and non-secure is
+ * already heavyweight.
  *
  * This gives us the following list of cases:
  *
- * NS EL0 EL1&0 stage 1+2 (aka NS PL0)
- * NS EL1 EL1&0 stage 1+2 (aka NS PL1)
- * NS EL1 EL1&0 stage 1+2 +PAN
- * NS EL0 EL2&0
- * NS EL2 EL2&0
- * NS EL2 EL2&0 +PAN
- * NS EL2 (aka NS PL2)
- * S EL0 EL1&0 (aka S PL0)
- * S EL1 EL1&0 (not used if EL3 is 32 bit)
- * S EL1 EL1&0 +PAN
- * S EL3 (aka S PL1)
+ * EL0 EL1&0 stage 1+2 (aka NS PL0)
+ * EL1 EL1&0 stage 1+2 (aka NS PL1)
+ * EL1 EL1&0 stage 1+2 +PAN
+ * EL0 EL2&0
+ * EL2 EL2&0
+ * EL2 EL2&0 +PAN
+ * EL2 (aka NS PL2)
+ * EL3 (aka S PL1)
  *
- * for a total of 11 different mmu_idx.
+ * for a total of 8 different mmu_idx.
  *
  * R profile CPUs have an MPU, but can use the same set of MMU indexes
- * as A profile. They only need to distinguish NS EL0 and NS EL1 (and
- * NS EL2 if we ever model a Cortex-R52).
+ * as A profile. They only need to distinguish EL0 and EL1 (and
+ * EL2 if we ever model a Cortex-R52).
  *
  * M profile CPUs are rather different as they do not have a true MMU.
  * They have the following different MMU indexes:
@@ -2936,9 +2937,6 @@ bool write_cpustate_to_list(ARMCPU *cpu, bool kvm_sync);
 #define ARM_MMU_IDX_NOTLB 0x20  /* does not have a TLB */
 #define ARM_MMU_IDX_M 0x40  /* M profile */
 
-/* Meanings of the bits for A profile mmu idx values */
-#define ARM_MMU_IDX_A_NS 0x8
-
 /* Meanings of the bits for M profile mmu idx values */
 #define ARM_MMU_IDX_M_PRIV   0x1
 #define ARM_MMU_IDX_M_NEGPRI 0x2
@@ -2952,22 +2950,14 @@ typedef enum ARMMMUIdx {
 /*
  * A-profile.
  */
-ARMMMUIdx_SE10_0 =  0 | ARM_MMU_IDX_A,
-ARMMMUIdx_SE20_0 =  1 | ARM_MMU_IDX_A,
-ARMMMUIdx_SE10_1 =  2 | ARM_MMU_IDX_A,
-ARMMMUIdx_SE20_2 =  3 | ARM_MMU_IDX_A,
-ARMMMUIdx_SE10_1_PAN =  4 | ARM_MMU_IDX_A,
-ARMMMUIdx_SE20_2_PAN =  5 | ARM_MMU_IDX_A,
-ARMMMUIdx_SE2=  6 | ARM_MMU_IDX_A,
-ARMMMUIdx_SE3=  7 | ARM_MMU_IDX_A,
-
-ARMMMUIdx_E10_0 = ARMMMUIdx_SE10_0 | ARM_MMU_IDX_A_NS,
-ARMMMUIdx_E20_0 = ARMMMUIdx_SE20_0 | ARM_MMU_IDX_A_NS,
-ARMMMUIdx_E10_1 = ARMMMUIdx_SE10_1 | ARM_MMU_IDX_A_NS,
-ARMMMUIdx_E20_2 = ARMMMUIdx_SE20_2 | ARM_MMU_IDX_A_NS,
-ARMMMUIdx_E10_1_PAN = ARMMMUIdx_SE10_1_PAN | ARM_MMU_IDX_A_NS,
-ARMMMUIdx_E20_2_PAN = ARMMMUIdx_SE20_2_PAN | ARM_MMU_IDX_A_NS,
-ARMMMUIdx_E2= ARMMMUIdx_SE2 | ARM_MMU_IDX_A_NS,
+ARMMMUIdx_E10_0 = 0 | ARM_MMU_IDX_A,
+ARMMMUIdx_E20_0 = 1 | ARM_MMU_IDX_A,
+ARMMMUIdx_E10_1 = 2 | ARM_MMU_IDX_A,
+ARMMMUIdx_E20_2 = 3 | ARM_MMU_IDX_A,
+ARMMMUIdx_E10_1_PAN = 4 | ARM_MMU_IDX_A,
+ARMMMUIdx_E20_2_PAN = 5 | ARM_MMU_IDX_A,
+ARMMMUIdx_E2= 6 | ARM_MMU_IDX_A,
+ARMMMUIdx_E3= 7 | ARM_MMU_IDX_A,
 
 /*
  * These are not allocated TLBs and are used only for AT system
@@ -2976,9 +2966,6 @@ typedef enum ARMMMUIdx {
 ARMMMUIdx_Stage1_E0 = 0 | ARM_MMU_IDX_NOTLB,
 ARMMMUIdx_Stage1_E1 = 1 | ARM_MMU_IDX_NOTLB,
 ARMMMUIdx_Stage1_E1_PAN = 2 | ARM_MMU_IDX_NOTLB,
-ARMMMUIdx_Stage1_SE0 = 3 | ARM_MMU_IDX_NOTLB,
-ARMMMUIdx_Stage1_SE1 = 4 | ARM_MMU_IDX_NOTLB,
-ARMMMUIdx_Stage1_SE1_PAN = 5 | ARM_MMU_IDX_NOTLB,

[PATCH v3 08/42] target/arm: Merge regime_is_secure into get_phys_addr

2022-10-01 Thread Richard Henderson
This is the last use of regime_is_secure; remove it
entirely before changing the layout of ARMMMUIdx.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/internals.h | 42 
 target/arm/ptw.c   | 44 --
 2 files changed, 42 insertions(+), 44 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index 3524d11dc5..14428730d4 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -670,48 +670,6 @@ static inline bool regime_has_2_ranges(ARMMMUIdx mmu_idx)
 }
 }
 
-/* Return true if this address translation regime is secure */
-static inline bool regime_is_secure(CPUARMState *env, ARMMMUIdx mmu_idx)
-{
-switch (mmu_idx) {
-case ARMMMUIdx_E10_0:
-case ARMMMUIdx_E10_1:
-case ARMMMUIdx_E10_1_PAN:
-case ARMMMUIdx_E20_0:
-case ARMMMUIdx_E20_2:
-case ARMMMUIdx_E20_2_PAN:
-case ARMMMUIdx_Stage1_E0:
-case ARMMMUIdx_Stage1_E1:
-case ARMMMUIdx_Stage1_E1_PAN:
-case ARMMMUIdx_E2:
-case ARMMMUIdx_Stage2:
-case ARMMMUIdx_MPrivNegPri:
-case ARMMMUIdx_MUserNegPri:
-case ARMMMUIdx_MPriv:
-case ARMMMUIdx_MUser:
-return false;
-case ARMMMUIdx_SE3:
-case ARMMMUIdx_SE10_0:
-case ARMMMUIdx_SE10_1:
-case ARMMMUIdx_SE10_1_PAN:
-case ARMMMUIdx_SE20_0:
-case ARMMMUIdx_SE20_2:
-case ARMMMUIdx_SE20_2_PAN:
-case ARMMMUIdx_Stage1_SE0:
-case ARMMMUIdx_Stage1_SE1:
-case ARMMMUIdx_Stage1_SE1_PAN:
-case ARMMMUIdx_SE2:
-case ARMMMUIdx_Stage2_S:
-case ARMMMUIdx_MSPrivNegPri:
-case ARMMMUIdx_MSUserNegPri:
-case ARMMMUIdx_MSPriv:
-case ARMMMUIdx_MSUser:
-return true;
-default:
-g_assert_not_reached();
-}
-}
-
 static inline bool regime_is_pan(CPUARMState *env, ARMMMUIdx mmu_idx)
 {
 switch (mmu_idx) {
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 542111f99e..9454ee9df5 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -2499,9 +2499,49 @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
MMUAccessType access_type, ARMMMUIdx mmu_idx,
GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
 {
+bool is_secure;
+
+switch (mmu_idx) {
+case ARMMMUIdx_E10_0:
+case ARMMMUIdx_E10_1:
+case ARMMMUIdx_E10_1_PAN:
+case ARMMMUIdx_E20_0:
+case ARMMMUIdx_E20_2:
+case ARMMMUIdx_E20_2_PAN:
+case ARMMMUIdx_Stage1_E0:
+case ARMMMUIdx_Stage1_E1:
+case ARMMMUIdx_Stage1_E1_PAN:
+case ARMMMUIdx_E2:
+case ARMMMUIdx_Stage2:
+case ARMMMUIdx_MPrivNegPri:
+case ARMMMUIdx_MUserNegPri:
+case ARMMMUIdx_MPriv:
+case ARMMMUIdx_MUser:
+is_secure = false;
+break;
+case ARMMMUIdx_SE3:
+case ARMMMUIdx_SE10_0:
+case ARMMMUIdx_SE10_1:
+case ARMMMUIdx_SE10_1_PAN:
+case ARMMMUIdx_SE20_0:
+case ARMMMUIdx_SE20_2:
+case ARMMMUIdx_SE20_2_PAN:
+case ARMMMUIdx_Stage1_SE0:
+case ARMMMUIdx_Stage1_SE1:
+case ARMMMUIdx_Stage1_SE1_PAN:
+case ARMMMUIdx_SE2:
+case ARMMMUIdx_Stage2_S:
+case ARMMMUIdx_MSPrivNegPri:
+case ARMMMUIdx_MSUserNegPri:
+case ARMMMUIdx_MSPriv:
+case ARMMMUIdx_MSUser:
+is_secure = true;
+break;
+default:
+g_assert_not_reached();
+}
 return get_phys_addr_with_secure(env, address, access_type, mmu_idx,
- regime_is_secure(env, mmu_idx),
- result, fi);
+ is_secure, result, fi);
 }
 
 hwaddr arm_cpu_get_phys_page_attrs_debug(CPUState *cs, vaddr addr,
-- 
2.34.1




[PATCH v3 06/42] target/arm: Add is_secure parameter to v7m_read_half_insn

2022-10-01 Thread Richard Henderson
Remove the use of regime_is_secure from v7m_read_half_insn, using
the new parameter instead.

As it happens, both callers pass true, propagated from the argument
to arm_v7m_mmu_idx_for_secstate which created the mmu_idx argument,
but that is a detail of v7m_handle_execute_nsc we need not expose
to the callee.

Reviewed-by: Peter Maydell 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 target/arm/m_helper.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/target/arm/m_helper.c b/target/arm/m_helper.c
index 5ee4ee15b3..203ba411f6 100644
--- a/target/arm/m_helper.c
+++ b/target/arm/m_helper.c
@@ -1981,7 +1981,7 @@ static bool do_v7m_function_return(ARMCPU *cpu)
 return true;
 }
 
-static bool v7m_read_half_insn(ARMCPU *cpu, ARMMMUIdx mmu_idx,
+static bool v7m_read_half_insn(ARMCPU *cpu, ARMMMUIdx mmu_idx, bool secure,
uint32_t addr, uint16_t *insn)
 {
 /*
@@ -2003,8 +2003,7 @@ static bool v7m_read_half_insn(ARMCPU *cpu, ARMMMUIdx mmu_idx,
 ARMMMUFaultInfo fi = {};
 MemTxResult txres;
 
-v8m_security_lookup(env, addr, MMU_INST_FETCH, mmu_idx,
-regime_is_secure(env, mmu_idx), );
+v8m_security_lookup(env, addr, MMU_INST_FETCH, mmu_idx, secure, );
 if (!sattrs.nsc || sattrs.ns) {
 /*
  * This must be the second half of the insn, and it straddles a
@@ -2109,7 +2108,7 @@ static bool v7m_handle_execute_nsc(ARMCPU *cpu)
 /* We want to do the MPU lookup as secure; work out what mmu_idx that is */
 mmu_idx = arm_v7m_mmu_idx_for_secstate(env, true);
 
-if (!v7m_read_half_insn(cpu, mmu_idx, env->regs[15], )) {
+if (!v7m_read_half_insn(cpu, mmu_idx, true, env->regs[15], )) {
 return false;
 }
 
@@ -2125,7 +2124,7 @@ static bool v7m_handle_execute_nsc(ARMCPU *cpu)
 goto gen_invep;
 }
 
-if (!v7m_read_half_insn(cpu, mmu_idx, env->regs[15] + 2, )) {
+if (!v7m_read_half_insn(cpu, mmu_idx, true, env->regs[15] + 2, )) {
 return false;
 }
 
-- 
2.34.1




[PATCH v3 05/42] target/arm: Split out get_phys_addr_with_secure

2022-10-01 Thread Richard Henderson
Retain the existing get_phys_addr interface using the security
state derived from mmu_idx.  Move the kerneldoc comments to the
header file where they belong.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
v3: Move the kerneldoc to internals.h
---
 target/arm/internals.h | 40 ++
 target/arm/ptw.c   | 44 ++
 2 files changed, 55 insertions(+), 29 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index 307a596505..3524d11dc5 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1145,6 +1145,46 @@ typedef struct GetPhysAddrResult {
 ARMCacheAttrs cacheattrs;
 } GetPhysAddrResult;
 
+/**
+ * get_phys_addr_with_secure: get the physical address for a virtual address
+ * @env: CPUARMState
+ * @address: virtual address to get physical address for
+ * @access_type: 0 for read, 1 for write, 2 for execute
+ * @mmu_idx: MMU index indicating required translation regime
+ * @is_secure: security state for the access
+ * @result: set on translation success.
+ * @fi: set to fault info if the translation fails
+ *
+ * Find the physical address corresponding to the given virtual address,
+ * by doing a translation table walk on MMU based systems or using the
+ * MPU state on MPU based systems.
+ *
+ * Returns false if the translation was successful. Otherwise, phys_ptr, attrs,
+ * prot and page_size may not be filled in, and the populated fsr value provides
+ * information on why the translation aborted, in the format of a
+ * DFSR/IFSR fault register, with the following caveats:
+ *  * we honour the short vs long DFSR format differences.
+ *  * the WnR bit is never set (the caller must do this).
+ *  * for PSMAv5 based systems we don't bother to return a full FSR format
+ *value.
+ */
+bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
+   MMUAccessType access_type,
+   ARMMMUIdx mmu_idx, bool is_secure,
+   GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
+__attribute__((nonnull));
+
+/**
+ * get_phys_addr: get the physical address for a virtual address
+ * @env: CPUARMState
+ * @address: virtual address to get physical address for
+ * @access_type: 0 for read, 1 for write, 2 for execute
+ * @mmu_idx: MMU index indicating required translation regime
+ * @result: set on translation success.
+ * @fi: set to fault info if the translation fails
+ *
+ * Similarly, but use the security regime of @mmu_idx.
+ */
 bool get_phys_addr(CPUARMState *env, target_ulong address,
MMUAccessType access_type, ARMMMUIdx mmu_idx,
GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index f9b7c316d0..542111f99e 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -2260,35 +2260,12 @@ static ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
 return ret;
 }
 
-/**
- * get_phys_addr - get the physical address for this virtual address
- *
- * Find the physical address corresponding to the given virtual address,
- * by doing a translation table walk on MMU based systems or using the
- * MPU state on MPU based systems.
- *
- * Returns false if the translation was successful. Otherwise, phys_ptr, attrs,
- * prot and page_size may not be filled in, and the populated fsr value provides
- * information on why the translation aborted, in the format of a
- * DFSR/IFSR fault register, with the following caveats:
- *  * we honour the short vs long DFSR format differences.
- *  * the WnR bit is never set (the caller must do this).
- *  * for PSMAv5 based systems we don't bother to return a full FSR format
- *value.
- *
- * @env: CPUARMState
- * @address: virtual address to get physical address for
- * @access_type: 0 for read, 1 for write, 2 for execute
- * @mmu_idx: MMU index indicating required translation regime
- * @result: set on translation success.
- * @fi: set to fault info if the translation fails
- */
-bool get_phys_addr(CPUARMState *env, target_ulong address,
-   MMUAccessType access_type, ARMMMUIdx mmu_idx,
-   GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
+bool get_phys_addr_with_secure(CPUARMState *env, target_ulong address,
+   MMUAccessType access_type, ARMMMUIdx mmu_idx,
+   bool is_secure, GetPhysAddrResult *result,
+   ARMMMUFaultInfo *fi)
 {
 ARMMMUIdx s1_mmu_idx = stage_1_mmu_idx(mmu_idx);
-bool is_secure = regime_is_secure(env, mmu_idx);
 
 if (mmu_idx != s1_mmu_idx) {
 /*
@@ -2304,8 +2281,8 @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
 ARMMMUIdx s2_mmu_idx;
 bool is_el0;
 
-ret = get_phys_addr(env, address, access_type, s1_mmu_idx,
-result, fi);
+ret = 

[PATCH v3 07/42] target/arm: Add TBFLAG_M32.SECURE

2022-10-01 Thread Richard Henderson
Remove the use of regime_is_secure from arm_tr_init_disas_context.
Instead, provide the value of v8m_secure directly from tb_flags.
When building the flags, use env->v7m.secure directly rather than
regime_is_secure, as per arm_mmu_idx_el.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h   | 2 ++
 target/arm/helper.c| 4 
 target/arm/translate.c | 3 +--
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 33cdbc0143..790328c598 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3197,6 +3197,8 @@ FIELD(TBFLAG_M32, NEW_FP_CTXT_NEEDED, 3, 1) /* Not cached. */
 FIELD(TBFLAG_M32, FPCCR_S_WRONG, 4, 1)  /* Not cached. */
 /* Set if MVE insns are definitely not predicated by VPR or LTPSIZE */
 FIELD(TBFLAG_M32, MVE_NO_PRED, 5, 1)/* Not cached. */
+/* Set if in secure mode */
+FIELD(TBFLAG_M32, SECURE, 6, 1)
 
 /*
  * Bit usage when in AArch64 state
diff --git a/target/arm/helper.c b/target/arm/helper.c
index b5dac651e7..772218f0d2 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -10935,6 +10935,10 @@ static CPUARMTBFlags rebuild_hflags_m32(CPUARMState *env, int fp_el,
 DP_TBFLAG_M32(flags, STACKCHECK, 1);
 }
 
+if (arm_feature(env, ARM_FEATURE_M_SECURITY) && env->v7m.secure) {
+DP_TBFLAG_M32(flags, SECURE, 1);
+}
+
 return rebuild_hflags_common_32(env, fp_el, mmu_idx, flags);
 }
 
diff --git a/target/arm/translate.c b/target/arm/translate.c
index 5aaccbbf71..ac647e0262 100644
--- a/target/arm/translate.c
+++ b/target/arm/translate.c
@@ -9351,8 +9351,7 @@ static void arm_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
 dc->vfp_enabled = 1;
 dc->be_data = MO_TE;
 dc->v7m_handler_mode = EX_TBFLAG_M32(tb_flags, HANDLER);
-dc->v8m_secure = arm_feature(env, ARM_FEATURE_M_SECURITY) &&
-regime_is_secure(env, dc->mmu_idx);
+dc->v8m_secure = EX_TBFLAG_M32(tb_flags, SECURE);
 dc->v8m_stackcheck = EX_TBFLAG_M32(tb_flags, STACKCHECK);
 dc->v8m_fpccr_s_wrong = EX_TBFLAG_M32(tb_flags, FPCCR_S_WRONG);
 dc->v7m_new_fp_ctxt_needed =
-- 
2.34.1




[PATCH v3 02/42] target/arm: Add is_secure parameter to get_phys_addr_lpae

2022-10-01 Thread Richard Henderson
Remove the use of regime_is_secure from get_phys_addr_lpae,
using the new parameter instead.

Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
v3: Update to use s2walk_secure.
---
 target/arm/ptw.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index b8c494ad9f..b7c999ffce 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -16,8 +16,8 @@
 
 static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
MMUAccessType access_type, ARMMMUIdx mmu_idx,
-   bool s1_is_el0, GetPhysAddrResult *result,
-   ARMMMUFaultInfo *fi)
+   bool is_secure, bool s1_is_el0,
+   GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
 __attribute__((nonnull));
 
 /* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
@@ -207,8 +207,8 @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
 GetPhysAddrResult s2 = {};
 int ret;
 
-ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, s2_mmu_idx, false,
- , fi);
+ret = get_phys_addr_lpae(env, addr, MMU_DATA_LOAD, s2_mmu_idx,
+ *is_secure, false, , fi);
 if (ret) {
 assert(fi->type != ARMFault_None);
 fi->s2addr = addr;
@@ -965,8 +965,8 @@ static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
  */
 static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
MMUAccessType access_type, ARMMMUIdx mmu_idx,
-   bool s1_is_el0, GetPhysAddrResult *result,
-   ARMMMUFaultInfo *fi)
+   bool is_secure, bool s1_is_el0,
+   GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
 {
 ARMCPU *cpu = env_archcpu(env);
 /* Read an LPAE long-descriptor translation table. */
@@ -1183,7 +1183,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
  * remain non-secure. We implement this by just ORing in the NSTable/NS
  * bits at each step.
  */
-tableattrs = regime_is_secure(env, mmu_idx) ? 0 : (1 << 4);
+tableattrs = is_secure ? 0 : (1 << 4);
 for (;;) {
 uint64_t descriptor;
 bool nstable;
@@ -2337,7 +2337,7 @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
 memset(result, 0, sizeof(*result));
 
 ret = get_phys_addr_lpae(env, ipa, access_type, s2_mmu_idx,
- is_el0, result, fi);
+ s2walk_secure, is_el0, result, fi);
 fi->s2addr = ipa;
 
 /* Combine the S1 and S2 perms.  */
@@ -2505,8 +2505,8 @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
 }
 
 if (regime_using_lpae_format(env, mmu_idx)) {
-return get_phys_addr_lpae(env, address, access_type, mmu_idx, false,
-  result, fi);
+return get_phys_addr_lpae(env, address, access_type, mmu_idx,
+  is_secure, false, result, fi);
 } else if (regime_sctlr(env, mmu_idx) & SCTLR_XP) {
 return get_phys_addr_v6(env, address, access_type, mmu_idx,
 is_secure, result, fi);
-- 
2.34.1




[PATCH v3 03/42] target/arm: Fix S2 disabled check in S1_ptw_translate

2022-10-01 Thread Richard Henderson
Pass the correct stage2 mmu_idx to regime_translation_disabled;
previously it was only computed after that check.

Signed-off-by: Richard Henderson 
---
v3: Move earlier in the patch set.
---
 target/arm/ptw.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index b7c999ffce..5192418c0e 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -200,10 +200,10 @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
hwaddr addr, bool *is_secure,
ARMMMUFaultInfo *fi)
 {
+ARMMMUIdx s2_mmu_idx = *is_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
+
 if (arm_mmu_idx_is_stage1_of_2(mmu_idx) &&
-!regime_translation_disabled(env, ARMMMUIdx_Stage2)) {
-ARMMMUIdx s2_mmu_idx = *is_secure ? ARMMMUIdx_Stage2_S
-  : ARMMMUIdx_Stage2;
+!regime_translation_disabled(env, s2_mmu_idx)) {
 GetPhysAddrResult s2 = {};
 int ret;
 
-- 
2.34.1




[PATCH v3 01/42] target/arm: Split s2walk_secure from ipa_secure in get_phys_addr

2022-10-01 Thread Richard Henderson
The starting security state comes with the translation regime,
not the current state of arm_is_secure_below_el3().

Create a new local variable, s2walk_secure, which does not need
to be written back to result->attrs.secure -- we compute that
value later, after the S2 walk is complete.

Signed-off-by: Richard Henderson 
---
v3: Do not modify ipa_secure, per review.
---
 target/arm/ptw.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 2ddfc028ab..b8c494ad9f 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -2298,7 +2298,7 @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
 hwaddr ipa;
 int s1_prot;
 int ret;
-bool ipa_secure;
+bool ipa_secure, s2walk_secure;
 ARMCacheAttrs cacheattrs1;
 ARMMMUIdx s2_mmu_idx;
 bool is_el0;
@@ -2313,17 +2313,17 @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
 
 ipa = result->phys;
 ipa_secure = result->attrs.secure;
-if (arm_is_secure_below_el3(env)) {
-if (ipa_secure) {
-result->attrs.secure = !(env->cp15.vstcr_el2 & VSTCR_SW);
-} else {
-result->attrs.secure = !(env->cp15.vtcr_el2 & VTCR_NSW);
-}
+if (is_secure) {
+/* Select TCR based on the NS bit from the S1 walk. */
+s2walk_secure = !(ipa_secure
+  ? env->cp15.vstcr_el2 & VSTCR_SW
+  : env->cp15.vtcr_el2 & VTCR_NSW);
 } else {
 assert(!ipa_secure);
+s2walk_secure = false;
 }
 
-s2_mmu_idx = (result->attrs.secure
+s2_mmu_idx = (s2walk_secure
   ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2);
 is_el0 = mmu_idx == ARMMMUIdx_E10_0 || mmu_idx == ARMMMUIdx_SE10_0;
 
@@ -2366,7 +2366,7 @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
 result->cacheattrs);
 
 /* Check if IPA translates to secure or non-secure PA space. */
-if (arm_is_secure_below_el3(env)) {
+if (is_secure) {
 if (ipa_secure) {
 result->attrs.secure =
 !(env->cp15.vstcr_el2 & (VSTCR_SA | VSTCR_SW));
-- 
2.34.1




[PATCH v3 00/42] target/arm: Implement FEAT_HAFDBS

2022-10-01 Thread Richard Henderson
This is a major reorg to arm page table walking.  While the result
here is "merely" Hardware-assisted Access Flag and Dirty Bit Setting
(HAFDBS), the ultimate goal is the Realm Management Extension (RME).
RME "recommends" that HAFDBS be implemented (I_CSLWZ).

For HAFDBS, being able to find a host pointer for the ram that
backs a given page table entry is required in order to perform the
atomic update to that PTE.  The easiest way to find a host pointer
is to use the existing softtlb mechanism.  Thus all of the page
table walkers have been adjusted to take an mmu_idx that corresponds
to the regime in which the page table is stored.  In some cases,
this is a new "physical" mmu_idx that has a permanent 1-1 mapping.
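
To make the motivation concrete, here is a minimal standalone sketch
(not code from this series) of the kind of update HAFDBS requires;
desc_set_af and descp are illustrative names, and only the AF bit
position (bit 10 of an LPAE descriptor) is architectural:

    #include <stdint.h>
    #include <stdatomic.h>

    #define DESC_AF (1ull << 10)    /* Access Flag */

    /*
     * Set the Access Flag as a hardware-managed update would:
     * an atomic read-modify-write on the descriptor itself.
     */
    static uint64_t desc_set_af(_Atomic uint64_t *descp)
    {
        uint64_t old = atomic_load_explicit(descp, memory_order_relaxed);

        /* On failure the CAS reloads 'old'; retry until AF is merged in. */
        while (!atomic_compare_exchange_weak(descp, &old, old | DESC_AF)) {
        }
        return old | DESC_AF;
    }

The compare-and-swap needs a real host pointer to the descriptor's
backing ram, which is exactly what routing the walkers through the
softmmu tlb provides.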

For RME, "physical" addresses also have page permissions, coming
from the Root realm Granule Protection Table, which can be thought
of as a third stage page table lookup.  So eventually the new
Secure and Nonsecure physical mmu indexes will joined by
Realm and Root physical mmu indexes, and all of them will take
the new Granule Page Table into account.

Previously, we had A-profile allocate separate mmu_idx for secure
vs non-secure.  I've done away with that.  Now, I flush all mmu_idx
when SCR_EL3.NS is changed.  I did not see how we could reasonably
add 8 more mmu_idx for Realm.  Moreover, I had a look through ARM
Trusted Firmware, at the code paths used to change between Secure
and Nonsecure.  We wind up flushing all of these mmu_idx anyway while
swapping the EL1+EL2 cpregs, so there is no gain at all in attempting
to keep them live at the same time within qemu.

Changes for v3:
  * 20-odd patches upstreamed.
  * Changes to the base CPUTLBEntryFull patch set, propagated.
  * Queries via arm_cpu_get_phys_page_attrs_debug, i.e. gdbstub,
do not use the softmmu tlb, and so do not modify cpu state.


r~


Based-on: 20220930212622.108363-1-richard.hender...@linaro.org
("[PATCH v6 00/18] tcg: CPUTLBEntryFull and TARGET_TB_PCREL")


Richard Henderson (42):
  target/arm: Split s2walk_secure from ipa_secure in get_phys_addr
  target/arm: Add is_secure parameter to get_phys_addr_lpae
  target/arm: Fix S2 disabled check in S1_ptw_translate
  target/arm: Add is_secure parameter to regime_translation_disabled
  target/arm: Split out get_phys_addr_with_secure
  target/arm: Add is_secure parameter to v7m_read_half_insn
  target/arm: Add TBFLAG_M32.SECURE
  target/arm: Merge regime_is_secure into get_phys_addr
  target/arm: Add is_secure parameter to do_ats_write
  target/arm: Fold secure and non-secure a-profile mmu indexes
  target/arm: Reorg regime_translation_disabled
  target/arm: Drop secure check for HCR.TGE vs SCTLR_EL1.M
  target/arm: Introduce arm_hcr_el2_eff_secstate
  target/arm: Hoist read of *is_secure in S1_ptw_translate
  target/arm: Remove env argument from combined_attrs_fwb
  target/arm: Pass HCR to attribute subroutines.
  target/arm: Fix ATS12NSO* from S PL1
  target/arm: Split out get_phys_addr_disabled
  target/arm: Fix cacheattr in get_phys_addr_disabled
  target/arm: Use tlb_set_page_full
  target/arm: Enable TARGET_PAGE_ENTRY_EXTRA
  target/arm: Use probe_access_full for MTE
  target/arm: Use probe_access_full for BTI
  target/arm: Add ARMMMUIdx_Phys_{S,NS}
  target/arm: Move ARMMMUIdx_Stage2 to a real tlb mmu_idx
  target/arm: Plumb debug into S1_ptw_translate
  target/arm: Use softmmu tlbs for page table walking
  target/arm: Split out get_phys_addr_twostage
  target/arm: Use bool consistently for get_phys_addr subroutines
  target/arm: Add ptw_idx argument to S1_ptw_translate
  target/arm: Add isar predicates for FEAT_HAFDBS
  target/arm: Extract HA and HD in aa64_va_parameters
  target/arm: Split out S1TranslateResult type
  target/arm: Move be test for regime into S1TranslateResult
  target/arm: Move S1_ptw_translate outside arm_ld[lq]_ptw
  target/arm: Add ARMFault_UnsuppAtomicUpdate
  target/arm: Remove loop from get_phys_addr_lpae
  target/arm: Fix fault reporting in get_phys_addr_lpae
  target/arm: Don't shift attrs in get_phys_addr_lpae
  target/arm: Consider GP an attribute in get_phys_addr_lpae
  target/arm: Implement FEAT_HAFDBS
  target/arm: Use the max page size in a 2-stage ptw

 docs/system/arm/emulation.rst  |1 +
 target/arm/cpu-param.h |   10 +-
 target/arm/cpu.h   |  143 ++--
 target/arm/internals.h |  125 ++-
 target/arm/sve_ldst_internal.h |1 +
 target/arm/cpu64.c |1 +
 target/arm/helper.c|  200 +++--
 target/arm/m_helper.c  |   29 +-
 target/arm/mte_helper.c|   61 +-
 target/arm/ptw.c   | 1315 
 target/arm/sve_helper.c|   54 +-
 target/arm/tlb_helper.c|   31 +-
 target/arm/translate-a64.c |   30 +-
 target/arm/translate.c |9 +-
 14 files changed, 1113 insertions(+), 897 deletions(-)

-- 
2.34.1




[PATCH v3 04/42] target/arm: Add is_secure parameter to regime_translation_disabled

2022-10-01 Thread Richard Henderson
Remove the use of regime_is_secure from regime_translation_disabled,
using the new parameter instead.

This fixes a bug in S1_ptw_translate and get_phys_addr where we had
passed ARMMMUIdx_Stage2 and not ARMMMUIdx_Stage2_S to determine if
Stage2 is disabled, affecting FEAT_SEL2.

Reviewed-by: Peter Maydell 
Reviewed-by: Alex Bennée 
Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 20 +++-
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 5192418c0e..f9b7c316d0 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -131,12 +131,13 @@ static uint64_t regime_ttbr(CPUARMState *env, ARMMMUIdx mmu_idx, int ttbrn)
 }
 
 /* Return true if the specified stage of address translation is disabled */
-static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx)
+static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx,
+bool is_secure)
 {
 uint64_t hcr_el2;
 
 if (arm_feature(env, ARM_FEATURE_M)) {
-switch (env->v7m.mpu_ctrl[regime_is_secure(env, mmu_idx)] &
+switch (env->v7m.mpu_ctrl[is_secure] &
 (R_V7M_MPU_CTRL_ENABLE_MASK | R_V7M_MPU_CTRL_HFNMIENA_MASK)) {
 case R_V7M_MPU_CTRL_ENABLE_MASK:
 /* Enabled, but not for HardFault and NMI */
@@ -163,7 +164,7 @@ static bool regime_translation_disabled(CPUARMState *env, ARMMMUIdx mmu_idx)
 
 if (hcr_el2 & HCR_TGE) {
 /* TGE means that NS EL0/1 act as if SCTLR_EL1.M is zero */
-if (!regime_is_secure(env, mmu_idx) && regime_el(env, mmu_idx) == 1) {
+if (!is_secure && regime_el(env, mmu_idx) == 1) {
 return true;
 }
 }
@@ -203,7 +204,7 @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
 ARMMMUIdx s2_mmu_idx = *is_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
 
 if (arm_mmu_idx_is_stage1_of_2(mmu_idx) &&
-!regime_translation_disabled(env, s2_mmu_idx)) {
+!regime_translation_disabled(env, s2_mmu_idx, *is_secure)) {
 GetPhysAddrResult s2 = {};
 int ret;
 
@@ -1357,7 +1358,7 @@ static bool get_phys_addr_pmsav5(CPUARMState *env, uint32_t address,
 uint32_t base;
 bool is_user = regime_is_user(env, mmu_idx);
 
-if (regime_translation_disabled(env, mmu_idx)) {
+if (regime_translation_disabled(env, mmu_idx, is_secure)) {
 /* MPU disabled.  */
 result->phys = address;
 result->prot = PAGE_READ | PAGE_WRITE | PAGE_EXEC;
@@ -1521,7 +1522,7 @@ static bool get_phys_addr_pmsav7(CPUARMState *env, uint32_t address,
 result->page_size = TARGET_PAGE_SIZE;
 result->prot = 0;
 
-if (regime_translation_disabled(env, mmu_idx) ||
+if (regime_translation_disabled(env, mmu_idx, secure) ||
 m_is_ppb_region(env, address)) {
 /*
  * MPU disabled or M profile PPB access: use default memory map.
@@ -1733,7 +1734,7 @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
  * are done in arm_v7m_load_vector(), which always does a direct
  * read using address_space_ldl(), rather than going via this function.
  */
-if (regime_translation_disabled(env, mmu_idx)) { /* MPU disabled */
+if (regime_translation_disabled(env, mmu_idx, secure)) { /* MPU disabled */
 hit = true;
 } else if (m_is_ppb_region(env, address)) {
 hit = true;
@@ -2307,7 +2308,8 @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
 result, fi);
 
 /* If S1 fails or S2 is disabled, return early.  */
-if (ret || regime_translation_disabled(env, ARMMMUIdx_Stage2)) {
+if (ret || regime_translation_disabled(env, ARMMMUIdx_Stage2,
+   is_secure)) {
 return ret;
 }
 
@@ -2438,7 +2440,7 @@ bool get_phys_addr(CPUARMState *env, target_ulong address,
 
 /* Definitely a real MMU, not an MPU */
 
-if (regime_translation_disabled(env, mmu_idx)) {
+if (regime_translation_disabled(env, mmu_idx, is_secure)) {
 uint64_t hcr;
 uint8_t memattr;
 
-- 
2.34.1




Re: [PATCH v1 0/8] migration: introduce dirtylimit capability

2022-10-01 Thread Hyman Huang




On 2022/10/1 22:37, Markus Armbruster wrote:

huang...@chinatelecom.cn writes:


From: Hyman Huang(黄勇) 

v1:
- make parameter vcpu-dirty-limit experimental
- switch dirty limit off when cancel migrate
- add cancel logic in migration test

Please review, thanks,

Yong


Are you still pursuing this feature?
Yes, of course, but the detailed test report has not been prepared yet, and
the last 3 commits of this patchset have not been commented on. I'm waiting
for that review, and the next version can then be improved substantially.


Ping to Daniel and David: what do you think of these 3 test patches?
I would be very pleased if you could help me with the review.  :)

Thanks

Yong



Abstract


This series adds a new migration capability called "dirtylimit".  It can
be enabled when the dirty ring is enabled, and it improves vCPU performance
during migration. It is based on the previous patchset:
https://lore.kernel.org/qemu-devel/cover.1656177590.git.huang...@chinatelecom.cn/

As mentioned in the patchset "support dirty restraint on vCPU", the dirtylimit
way of migration avoids penalizing the read process. This series wires up the
vCPU dirty limit and wraps it as the dirtylimit capability of migration. I
introduce two parameters, vcpu-dirtylimit-period and vcpu-dirtylimit, to
implement the setup of dirtylimit during live migration.
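
As a rough illustration (not taken from the series itself -- the values are
made up, and the names come from the patch subjects below), the capability
would be driven over QMP along these lines:

    {"execute": "migrate-set-capabilities",
     "arguments": {"capabilities": [
         {"capability": "dirty-limit", "state": true}]}}
    {"execute": "migrate-set-parameters",
     "arguments": {"x-vcpu-dirty-limit-period": 500,
                   "x-vcpu-dirty-limit": 1}}

followed by the usual "migrate" command.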

To validate the implementation, I tested live migration of a 32 vCPU VM with
the following model:
dirty only vcpu0 and vcpu1 with a heavy memory workload and leave the rest of
the vCPUs untouched, while running UnixBench on vcpu8-vcpu15 with the CPU
affinity set by the following command:
taskset -c 8-15 ./Run -i 2 -c 8 {unixbench test item}

The results are as follows:

host cpu: Intel(R) Xeon(R) Platinum 8378A
host interface speed: 1000Mb/s
   |---------------------+--------+------------+---------------|
   | UnixBench test item | Normal | Dirtylimit | Auto-converge |
   |---------------------+--------+------------+---------------|
   | dhry2reg            | 32800  | 32786      | 25292         |
   | whetstone-double    | 10326  | 10315      | 9847          |
   | pipe                | 15442  | 15271      | 14506         |
   | context1            | 7260   | 6235       | 4514          |
   | spawn               | 3663   | 3317       | 3249          |
   | syscall             | 4669   | 4667       | 3841          |
   |---------------------+--------+------------+---------------|
From the data above we can conclude that vCPUs that do not dirty memory
in the VM are almost unaffected during dirtylimit migration, whereas the
auto-converge way does affect them.

I also tested the total time of dirtylimit migration with varying dirty memory
sizes in the VM.

scenario 1:
host cpu: Intel(R) Xeon(R) Platinum 8378A
host interface speed: 1000Mb/s
   |-----------------------+----------------+-------------------|
   | dirty memory size(MB) | Dirtylimit(ms) | Auto-converge(ms) |
   |-----------------------+----------------+-------------------|
   | 60                    | 2014           | 2131              |
   | 70                    | 5381           | 12590             |
   | 90                    | 6037           | 33545             |
   | 110                   | 7660           | [*]               |
   |-----------------------+----------------+-------------------|
   [*]: in this case, migration did not converge.

scenario 2:
host cpu: Intel(R) Xeon(R) CPU E5-2650
host interface speed: 1Mb/s
   |-----------------------+----------------+-------------------|
   | dirty memory size(MB) | Dirtylimit(ms) | Auto-converge(ms) |
   |-----------------------+----------------+-------------------|
   | 1600                  | 15842          | 27548             |
   | 2000                  | 19026          | 38447             |
   | 2400                  | 19897          | 46381             |
   | 2800                  | 22338          | 57149             |
   |-----------------------+----------------+-------------------|
The data above shows that the dirtylimit way of migration can also reduce the
total migration time, and that it achieves convergence more easily in some cases.

In addition to implementing the dirtylimit capability itself, this series
adds 3 tests for migration, so that developers can easily play around with it:
  1. qtest for dirty limit migration
  2. support dirty ring way of migration for guestperf tool
  3. support dirty limit migration for guestperf tool

Please review, thanks!

Hyman Huang (8):
   qapi/migration: Introduce x-vcpu-dirty-limit-period parameter
   qapi/migration: Introduce x-vcpu-dirty-limit parameters
   migration: Introduce dirty-limit capability
   migration: Implement dirty-limit convergence algo
   migration: Export dirty-limit time info
   tests: Add migration dirty-limit capability test
   tests/migration: Introduce dirty-ring-size option into guestperf
   tests/migration: Introduce dirty-limit into guestperf

  include/sysemu/dirtylimit.h |   2 +
  migration/migration.c   |  51 

[PATCH v3 25/26] target/i386: Inline gen_jmp_im

2022-10-01 Thread Richard Henderson
Expand this function at each of its callers.

Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index f08fa060c4..689a45256c 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -517,19 +517,14 @@ static inline void gen_op_st_rm_T0_A0(DisasContext *s, int idx, int d)
 }
 }
 
-static void gen_jmp_im(DisasContext *s, target_ulong pc)
-{
-tcg_gen_movi_tl(cpu_eip, pc);
-}
-
 static void gen_update_eip_cur(DisasContext *s)
 {
-gen_jmp_im(s, s->base.pc_next - s->cs_base);
+tcg_gen_movi_tl(cpu_eip, s->base.pc_next - s->cs_base);
 }
 
 static void gen_update_eip_next(DisasContext *s)
 {
-gen_jmp_im(s, s->pc - s->cs_base);
+tcg_gen_movi_tl(cpu_eip, s->pc - s->cs_base);
 }
 
 static int cur_insn_len(DisasContext *s)
@@ -2767,17 +2762,17 @@ static void gen_jmp_rel(DisasContext *s, MemOp ot, int diff, int tb_num)
 gen_update_cc_op(s);
 set_cc_op(s, CC_OP_DYNAMIC);
 if (!s->jmp_opt) {
-gen_jmp_im(s, dest);
+tcg_gen_movi_tl(cpu_eip, dest);
 gen_eob(s);
 } else if (translator_use_goto_tb(>base, dest))  {
 /* jump to same page: we can use a direct jump */
 tcg_gen_goto_tb(tb_num);
-gen_jmp_im(s, dest);
+tcg_gen_movi_tl(cpu_eip, dest);
 tcg_gen_exit_tb(s->base.tb, tb_num);
 s->base.is_jmp = DISAS_NORETURN;
 } else {
 /* jump to another page */
-gen_jmp_im(s, dest);
+tcg_gen_movi_tl(cpu_eip, dest);
 gen_jr(s);
 }
 }
-- 
2.34.1




[PATCH v3 23/26] target/i386: Create eip_cur_tl

2022-10-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 2e7b94700b..5b0dab8633 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -562,6 +562,11 @@ static TCGv eip_next_tl(DisasContext *s)
 return tcg_constant_tl(s->pc - s->cs_base);
 }
 
+static TCGv eip_cur_tl(DisasContext *s)
+{
+return tcg_constant_tl(s->base.pc_next - s->cs_base);
+}
+
 /* Compute SEG:REG into A0.  SEG is selected from the override segment
(OVR_SEG) and the default segment (DEF_SEG).  OVR_SEG may be -1 to
indicate no override.  */
@@ -6617,7 +6622,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
offsetof(CPUX86State, segs[R_CS].selector));
 tcg_gen_st16_i32(s->tmp2_i32, cpu_env,
  offsetof(CPUX86State, fpcs));
-tcg_gen_st_tl(tcg_constant_tl(s->base.pc_next - s->cs_base),
+tcg_gen_st_tl(eip_cur_tl(s),
   cpu_env, offsetof(CPUX86State, fpip));
 }
 }
-- 
2.34.1




Re: [PATCH v1 0/8] migration: introduce dirtylimit capability

2022-10-01 Thread Markus Armbruster
huang...@chinatelecom.cn writes:

> From: Hyman Huang(黄勇) 
>
> v1:
> - make parameter vcpu-dirty-limit experimental 
> - switch dirty limit off when cancel migrate
> - add cancel logic in migration test 
>
> Please review, thanks,
>
> Yong 

Are you still pursuing this feature?

> Abstract
> 
>
> This series added a new migration capability called "dirtylimit".  It can
> be enabled when dirty ring is enabled, and it'll improve the vCPU performance
> during the process of migration. It is based on the previous patchset:
> https://lore.kernel.org/qemu-devel/cover.1656177590.git.huang...@chinatelecom.cn/
>
> As mentioned in patchset "support dirty restraint on vCPU", dirtylimit way of
> migration can make the read-process not be penalized. This series wires up the
> vcpu dirty limit and wrappers as dirtylimit capability of migration. I 
> introduce
> two parameters vcpu-dirtylimit-period and vcpu-dirtylimit to implement the 
> setup 
> of dirtylimit during live migration.
>
> To validate the implementation, i tested a 32 vCPU vm live migration with 
> such 
> model:
> Only dirty vcpu0, vcpu1 with heavy memory workoad and leave the rest vcpus
> untouched, running unixbench on the vpcu8-vcpu15 by setup the cpu affinity as
> the following command:
> taskset -c 8-15 ./Run -i 2 -c 8 {unixbench test item}
>
> The following are results:
>
> host cpu: Intel(R) Xeon(R) Platinum 8378A
> host interface speed: 1000Mb/s
>   |---------------------+--------+------------+---------------|
>   | UnixBench test item | Normal | Dirtylimit | Auto-converge |
>   |---------------------+--------+------------+---------------|
>   | dhry2reg            | 32800  | 32786      | 25292         |
>   | whetstone-double    | 10326  | 10315      | 9847          |
>   | pipe                | 15442  | 15271      | 14506         |
>   | context1            | 7260   | 6235       | 4514          |
>   | spawn               | 3663   | 3317       | 3249          |
>   | syscall             | 4669   | 4667       | 3841          |
>   |---------------------+--------+------------+---------------|
> From the data above we can draw a conclusion that vcpus that do not dirty
> memory
> in vm are almost unaffected during the dirtylimit migration, but the auto 
> converge
> way does. 
>
> I also tested the total time of dirtylimit migration with variable dirty 
> memory
> size in vm.
>
> senario 1:
> host cpu: Intel(R) Xeon(R) Platinum 8378A
> host interface speed: 1000Mb/s
>   |-----------------------+----------------+-------------------|
>   | dirty memory size(MB) | Dirtylimit(ms) | Auto-converge(ms) |
>   |-----------------------+----------------+-------------------|
>   | 60                    | 2014           | 2131              |
>   | 70                    | 5381           | 12590             |
>   | 90                    | 6037           | 33545             |
>   | 110                   | 7660           | [*]               |
>   |-----------------------+----------------+-------------------|
>   [*]: This case means migration is not convergent. 
>
> senario 2:
> host cpu: Intel(R) Xeon(R) CPU E5-2650
> host interface speed: 1Mb/s
>   |-----------------------+----------------+-------------------|
>   | dirty memory size(MB) | Dirtylimit(ms) | Auto-converge(ms) |
>   |-----------------------+----------------+-------------------|
>   | 1600                  | 15842          | 27548             |
>   | 2000                  | 19026          | 38447             |
>   | 2400                  | 19897          | 46381             |
>   | 2800                  | 22338          | 57149             |
>   |-----------------------+----------------+-------------------|
> Above data shows that dirtylimit way of migration can also reduce the total
> time of migration and it achieves convergence more easily in some case.
>
> In addition to implement dirtylimit capability itself, this series
> add 3 tests for migration, aiming at playing around for developer simply: 
>  1. qtest for dirty limit migration
>  2. support dirty ring way of migration for guestperf tool
>  3. support dirty limit migration for guestperf tool
>
> Please review, thanks !
>
> Hyman Huang (8):
>   qapi/migration: Introduce x-vcpu-dirty-limit-period parameter
>   qapi/migration: Introduce x-vcpu-dirty-limit parameters
>   migration: Introduce dirty-limit capability
>   migration: Implement dirty-limit convergence algo
>   migration: Export dirty-limit time info
>   tests: Add migration dirty-limit capability test
>   tests/migration: Introduce dirty-ring-size option into guestperf
>   tests/migration: Introduce dirty-limit into guestperf
>
>  include/sysemu/dirtylimit.h |   2 +
>  migration/migration.c   |  51 +++
>  migration/migration.h   |   1 +
>  migration/ram.c |  53 ---
>  migration/trace-events  |   1 +
>  monitor/hmp-cmds.c  |  26 ++
>  

[PATCH v3 26/26] target/i386: Enable TARGET_TB_PCREL

2022-10-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/i386/cpu-param.h |   4 ++
 target/i386/tcg/tcg-cpu.c   |   8 ++-
 target/i386/tcg/translate.c | 130 
 3 files changed, 113 insertions(+), 29 deletions(-)

diff --git a/target/i386/cpu-param.h b/target/i386/cpu-param.h
index 9740bd7abd..1e79389761 100644
--- a/target/i386/cpu-param.h
+++ b/target/i386/cpu-param.h
@@ -25,4 +25,8 @@
 #define TARGET_PAGE_BITS 12
 #define NB_MMU_MODES 3
 
+#ifndef CONFIG_USER_ONLY
+# define TARGET_TB_PCREL 1
+#endif
+
 #endif
diff --git a/target/i386/tcg/tcg-cpu.c b/target/i386/tcg/tcg-cpu.c
index 6cf14c83ff..828244abe2 100644
--- a/target/i386/tcg/tcg-cpu.c
+++ b/target/i386/tcg/tcg-cpu.c
@@ -49,9 +49,11 @@ static void x86_cpu_exec_exit(CPUState *cs)
 static void x86_cpu_synchronize_from_tb(CPUState *cs,
 const TranslationBlock *tb)
 {
-X86CPU *cpu = X86_CPU(cs);
-
-cpu->env.eip = tb_pc(tb) - tb->cs_base;
+/* The instruction pointer is always up to date with TARGET_TB_PCREL. */
+if (!TARGET_TB_PCREL) {
+CPUX86State *env = cs->env_ptr;
+env->eip = tb_pc(tb) - tb->cs_base;
+}
 }
 
 #ifndef CONFIG_USER_ONLY
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 689a45256c..279a3ae999 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -78,6 +78,7 @@ typedef struct DisasContext {
 
 target_ulong pc;   /* pc = eip + cs_base */
 target_ulong cs_base;  /* base of CS segment */
+target_ulong pc_save;
 
 MemOp aflag;
 MemOp dflag;
@@ -480,9 +481,10 @@ static void gen_add_A0_im(DisasContext *s, int val)
 }
 }
 
-static inline void gen_op_jmp_v(TCGv dest)
+static inline void gen_op_jmp_v(DisasContext *s, TCGv dest)
 {
 tcg_gen_mov_tl(cpu_eip, dest);
+s->pc_save = -1;
 }
 
 static inline
@@ -519,12 +521,24 @@ static inline void gen_op_st_rm_T0_A0(DisasContext *s, int idx, int d)
 
 static void gen_update_eip_cur(DisasContext *s)
 {
-tcg_gen_movi_tl(cpu_eip, s->base.pc_next - s->cs_base);
+assert(s->pc_save != -1);
+if (TARGET_TB_PCREL) {
+tcg_gen_addi_tl(cpu_eip, cpu_eip, s->base.pc_next - s->pc_save);
+} else {
+tcg_gen_movi_tl(cpu_eip, s->base.pc_next - s->cs_base);
+}
+s->pc_save = s->base.pc_next;
 }
 
 static void gen_update_eip_next(DisasContext *s)
 {
-tcg_gen_movi_tl(cpu_eip, s->pc - s->cs_base);
+assert(s->pc_save != -1);
+if (TARGET_TB_PCREL) {
+tcg_gen_addi_tl(cpu_eip, cpu_eip, s->pc - s->pc_save);
+} else {
+tcg_gen_movi_tl(cpu_eip, s->pc - s->cs_base);
+}
+s->pc_save = s->pc;
 }
 
 static int cur_insn_len(DisasContext *s)
@@ -539,6 +553,7 @@ static TCGv_i32 cur_insn_len_i32(DisasContext *s)
 
 static TCGv_i32 eip_next_i32(DisasContext *s)
 {
+assert(s->pc_save != -1);
 /*
  * This function has two users: lcall_real (always 16-bit mode), and
  * iret_protected (16, 32, or 64-bit mode).  IRET only uses the value
@@ -550,17 +565,38 @@ static TCGv_i32 eip_next_i32(DisasContext *s)
 if (CODE64(s)) {
 return tcg_constant_i32(-1);
 }
-return tcg_constant_i32(s->pc - s->cs_base);
+if (TARGET_TB_PCREL) {
+TCGv_i32 ret = tcg_temp_new_i32();
+tcg_gen_trunc_tl_i32(ret, cpu_eip);
+tcg_gen_addi_i32(ret, ret, s->pc - s->pc_save);
+return ret;
+} else {
+return tcg_constant_i32(s->pc - s->cs_base);
+}
 }
 
 static TCGv eip_next_tl(DisasContext *s)
 {
-return tcg_constant_tl(s->pc - s->cs_base);
+assert(s->pc_save != -1);
+if (TARGET_TB_PCREL) {
+TCGv ret = tcg_temp_new();
+tcg_gen_addi_tl(ret, cpu_eip, s->pc - s->pc_save);
+return ret;
+} else {
+return tcg_constant_tl(s->pc - s->cs_base);
+}
 }
 
 static TCGv eip_cur_tl(DisasContext *s)
 {
-return tcg_constant_tl(s->base.pc_next - s->cs_base);
+assert(s->pc_save != -1);
+if (TARGET_TB_PCREL) {
+TCGv ret = tcg_temp_new();
+tcg_gen_addi_tl(ret, cpu_eip, s->base.pc_next - s->pc_save);
+return ret;
+} else {
+return tcg_constant_tl(s->base.pc_next - s->cs_base);
+}
 }
 
 /* Compute SEG:REG into A0.  SEG is selected from the override segment
@@ -2260,7 +2296,12 @@ static TCGv gen_lea_modrm_1(DisasContext *s, AddressParts a)
 ea = cpu_regs[a.base];
 }
 if (!ea) {
-tcg_gen_movi_tl(s->A0, a.disp);
+if (TARGET_TB_PCREL && a.base == -2) {
+/* With cpu_eip ~= pc_save, the expression is pc-relative. */
+tcg_gen_addi_tl(s->A0, cpu_eip, a.disp - s->pc_save);
+} else {
+tcg_gen_movi_tl(s->A0, a.disp);
+}
 ea = s->A0;
 } else if (a.disp != 0) {
 tcg_gen_addi_tl(s->A0, ea, a.disp);
@@ -2748,32 +2789,58 @@ static void gen_jr(DisasContext *s)
 /* Jump to eip+diff, truncating the result to OT. */
 static void gen_jmp_rel(DisasContext *s, MemOp ot, 

[PATCH v3 22/26] target/i386: Merge gen_jmp_tb and gen_goto_tb into gen_jmp_rel

2022-10-01 Thread Richard Henderson
These functions have only one caller, and the logic is more
obvious this way.

Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 50 +
 1 file changed, 17 insertions(+), 33 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 9294f12f66..2e7b94700b 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -224,7 +224,6 @@ STUB_HELPER(wrmsr, TCGv_env env)
 
 static void gen_eob(DisasContext *s);
 static void gen_jr(DisasContext *s);
-static void gen_jmp_tb(DisasContext *s, target_ulong eip, int tb_num);
 static void gen_jmp_rel(DisasContext *s, MemOp ot, int diff, int tb_num);
 static void gen_jmp_rel_csize(DisasContext *s, int diff, int tb_num);
 static void gen_op(DisasContext *s1, int op, MemOp ot, int d);
@@ -2393,23 +2392,6 @@ static inline int insn_const_size(MemOp ot)
 }
 }
 
-static void gen_goto_tb(DisasContext *s, int tb_num, target_ulong eip)
-{
-target_ulong pc = s->cs_base + eip;
-
-if (translator_use_goto_tb(>base, pc))  {
-/* jump to same page: we can use a direct jump */
-tcg_gen_goto_tb(tb_num);
-gen_jmp_im(s, eip);
-tcg_gen_exit_tb(s->base.tb, tb_num);
-s->base.is_jmp = DISAS_NORETURN;
-} else {
-/* jump to another page */
-gen_jmp_im(s, eip);
-gen_jr(s);
-}
-}
-
 static void gen_jcc(DisasContext *s, int b, int diff)
 {
 TCGLabel *l1 = gen_new_label();
@@ -2762,20 +2744,6 @@ static void gen_jr(DisasContext *s)
 do_gen_eob_worker(s, false, false, true);
 }
 
-/* generate a jump to eip. No segment change must happen before as a
-   direct call to the next block may occur */
-static void gen_jmp_tb(DisasContext *s, target_ulong eip, int tb_num)
-{
-gen_update_cc_op(s);
-set_cc_op(s, CC_OP_DYNAMIC);
-if (s->jmp_opt) {
-gen_goto_tb(s, tb_num, eip);
-} else {
-gen_jmp_im(s, eip);
-gen_eob(s);
-}
-}
-
 /* Jump to eip+diff, truncating the result to OT. */
 static void gen_jmp_rel(DisasContext *s, MemOp ot, int diff, int tb_num)
 {
@@ -2789,7 +2757,23 @@ static void gen_jmp_rel(DisasContext *s, MemOp ot, int diff, int tb_num)
 dest &= 0x;
 }
 }
-gen_jmp_tb(s, dest, tb_num);
+
+gen_update_cc_op(s);
+set_cc_op(s, CC_OP_DYNAMIC);
+if (!s->jmp_opt) {
+gen_jmp_im(s, dest);
+gen_eob(s);
+} else if (translator_use_goto_tb(>base, dest))  {
+/* jump to same page: we can use a direct jump */
+tcg_gen_goto_tb(tb_num);
+gen_jmp_im(s, dest);
+tcg_gen_exit_tb(s->base.tb, tb_num);
+s->base.is_jmp = DISAS_NORETURN;
+} else {
+/* jump to another page */
+gen_jmp_im(s, dest);
+gen_jr(s);
+}
 }
 
 /* Jump to eip+diff, truncating to the current code size. */
-- 
2.34.1




[PATCH v3 19/26] target/i386: Use gen_jmp_rel for gen_jcc

2022-10-01 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 57 -
 1 file changed, 18 insertions(+), 39 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 434a6ad6cd..5b84be4975 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2409,32 +2409,14 @@ static void gen_goto_tb(DisasContext *s, int tb_num, target_ulong eip)
 }
 }
 
-static inline void gen_jcc(DisasContext *s, int b,
-   target_ulong val, target_ulong next_eip)
+static void gen_jcc(DisasContext *s, int b, int diff)
 {
-TCGLabel *l1, *l2;
+TCGLabel *l1 = gen_new_label();
 
-if (s->jmp_opt) {
-l1 = gen_new_label();
-gen_jcc1(s, b, l1);
-
-gen_goto_tb(s, 0, next_eip);
-
-gen_set_label(l1);
-gen_goto_tb(s, 1, val);
-} else {
-l1 = gen_new_label();
-l2 = gen_new_label();
-gen_jcc1(s, b, l1);
-
-gen_jmp_im(s, next_eip);
-tcg_gen_br(l2);
-
-gen_set_label(l1);
-gen_jmp_im(s, val);
-gen_set_label(l2);
-gen_eob(s);
-}
+gen_jcc1(s, b, l1);
+gen_jmp_rel_csize(s, 0, 1);
+gen_set_label(l1);
+gen_jmp_rel(s, s->dflag, diff, 0);
 }
 
 static void gen_cmovcc1(CPUX86State *env, DisasContext *s, MemOp ot, int b,
@@ -4780,7 +4762,6 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 int shift;
 MemOp ot, aflag, dflag;
 int modrm, reg, rm, mod, op, opreg, val;
-target_ulong next_eip, tval;
 bool orig_cc_op_dirty = s->cc_op_dirty;
 CCOp orig_cc_op = s->cc_op;
 
@@ -6933,22 +6914,20 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 break;
 case 0x70 ... 0x7f: /* jcc Jb */
-tval = (int8_t)insn_get(env, s, MO_8);
-goto do_jcc;
+{
+int diff = (int8_t)insn_get(env, s, MO_8);
+gen_bnd_jmp(s);
+gen_jcc(s, b, diff);
+}
+break;
 case 0x180 ... 0x18f: /* jcc Jv */
-if (dflag != MO_16) {
-tval = (int32_t)insn_get(env, s, MO_32);
-} else {
-tval = (int16_t)insn_get(env, s, MO_16);
+{
+int diff = (dflag != MO_16
+? (int32_t)insn_get(env, s, MO_32)
+: (int16_t)insn_get(env, s, MO_16));
+gen_bnd_jmp(s);
+gen_jcc(s, b, diff);
 }
-do_jcc:
-next_eip = s->pc - s->cs_base;
-tval += next_eip;
-if (dflag == MO_16) {
-tval &= 0x;
-}
-gen_bnd_jmp(s);
-gen_jcc(s, b, tval, next_eip);
 break;
 
 case 0x190 ... 0x19f: /* setcc Gv */
-- 
2.34.1




[PATCH v3 16/26] target/i386: Use DISAS_TOO_MANY to exit after gen_io_start

2022-10-01 Thread Richard Henderson
We can set is_jmp early, using only one if, and let that
be overwritten by gen_rep*'s calls to gen_jmp_tb.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 42 +
 1 file changed, 10 insertions(+), 32 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index be29ea7a03..11aaba8a65 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -5660,14 +5660,12 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 gen_io_start();
+s->base.is_jmp = DISAS_TOO_MANY;
 }
 gen_helper_rdrand(s->T0, cpu_env);
 rm = (modrm & 7) | REX_B(s);
 gen_op_mov_reg_v(s, dflag, rm, s->T0);
 set_cc_op(s, CC_OP_EFLAGS);
-if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
-gen_jmp(s, s->pc - s->cs_base);
-}
 break;
 
 default:
@@ -6704,15 +6702,12 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 gen_io_start();
+s->base.is_jmp = DISAS_TOO_MANY;
 }
 if (prefixes & (PREFIX_REPZ | PREFIX_REPNZ)) {
 gen_repz_ins(s, ot);
-/* jump generated by gen_repz_ins */
 } else {
 gen_ins(s, ot);
-if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
-gen_jmp(s, s->pc - s->cs_base);
-}
 }
 break;
 case 0x6e: /* outsS */
@@ -6725,15 +6720,12 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 gen_io_start();
+s->base.is_jmp = DISAS_TOO_MANY;
 }
 if (prefixes & (PREFIX_REPZ | PREFIX_REPNZ)) {
 gen_repz_outs(s, ot);
-/* jump generated by gen_repz_outs */
 } else {
 gen_outs(s, ot);
-if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
-gen_jmp(s, s->pc - s->cs_base);
-}
 }
 break;
 
@@ -6750,13 +6742,11 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 gen_io_start();
+s->base.is_jmp = DISAS_TOO_MANY;
 }
 gen_helper_in_func(ot, s->T1, s->tmp2_i32);
 gen_op_mov_reg_v(s, ot, R_EAX, s->T1);
 gen_bpt_io(s, s->tmp2_i32, ot);
-if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
-gen_jmp(s, s->pc - s->cs_base);
-}
 break;
 case 0xe6:
 case 0xe7:
@@ -6768,14 +6758,12 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 gen_io_start();
+s->base.is_jmp = DISAS_TOO_MANY;
 }
 gen_op_mov_v_reg(s, ot, s->T1, R_EAX);
 tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1);
 gen_helper_out_func(ot, s->tmp2_i32, s->tmp3_i32);
 gen_bpt_io(s, s->tmp2_i32, ot);
-if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
-gen_jmp(s, s->pc - s->cs_base);
-}
 break;
 case 0xec:
 case 0xed:
@@ -6787,13 +6775,11 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 gen_io_start();
+s->base.is_jmp = DISAS_TOO_MANY;
 }
 gen_helper_in_func(ot, s->T1, s->tmp2_i32);
 gen_op_mov_reg_v(s, ot, R_EAX, s->T1);
 gen_bpt_io(s, s->tmp2_i32, ot);
-if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
-gen_jmp(s, s->pc - s->cs_base);
-}
 break;
 case 0xee:
 case 0xef:
@@ -6805,14 +6791,12 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 gen_io_start();
+s->base.is_jmp = DISAS_TOO_MANY;
 }
 gen_op_mov_v_reg(s, ot, s->T1, R_EAX);
 tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1);
 gen_helper_out_func(ot, s->tmp2_i32, s->tmp3_i32);
 gen_bpt_io(s, s->tmp2_i32, ot);
-if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
-gen_jmp(s, s->pc - s->cs_base);
-}
 break;
 
 //
@@ -7478,11 +7462,9 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_update_eip_cur(s);
 if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 gen_io_start();
+s->base.is_jmp = DISAS_TOO_MANY;
 }
 gen_helper_rdtsc(cpu_env);
-if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
-gen_jmp(s, s->pc - s->cs_base);
-}
 break;
 case 0x133: /* rdpmc */
 gen_update_cc_op(s);
@@ -7939,11 +7921,9 @@ static bool 

[PATCH v3 24/26] target/i386: Add cpu_eip

2022-10-01 Thread Richard Henderson
Create a tcg global temp for this, and use it instead of explicit stores.

Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 5b0dab8633..f08fa060c4 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -64,6 +64,7 @@
 
 /* global register indexes */
 static TCGv cpu_cc_dst, cpu_cc_src, cpu_cc_src2;
+static TCGv cpu_eip;
 static TCGv_i32 cpu_cc_op;
 static TCGv cpu_regs[CPU_NB_REGS];
 static TCGv cpu_seg_base[6];
@@ -481,7 +482,7 @@ static void gen_add_A0_im(DisasContext *s, int val)
 
 static inline void gen_op_jmp_v(TCGv dest)
 {
-tcg_gen_st_tl(dest, cpu_env, offsetof(CPUX86State, eip));
+tcg_gen_mov_tl(cpu_eip, dest);
 }
 
 static inline
@@ -518,7 +519,7 @@ static inline void gen_op_st_rm_T0_A0(DisasContext *s, int 
idx, int d)
 
 static void gen_jmp_im(DisasContext *s, target_ulong pc)
 {
-gen_op_jmp_v(tcg_constant_tl(pc));
+tcg_gen_movi_tl(cpu_eip, pc);
 }
 
 static void gen_update_eip_cur(DisasContext *s)
@@ -8614,6 +8615,13 @@ void tcg_x86_init(void)
 [R_EDI] = "edi",
 [R_EBP] = "ebp",
 [R_ESP] = "esp",
+#endif
+};
+static const char eip_name[] = {
+#ifdef TARGET_X86_64
+"rip"
+#else
+"eip"
 #endif
 };
 static const char seg_base_names[6][8] = {
@@ -8640,6 +8648,7 @@ void tcg_x86_init(void)
 "cc_src");
 cpu_cc_src2 = tcg_global_mem_new(cpu_env, offsetof(CPUX86State, cc_src2),
  "cc_src2");
+cpu_eip = tcg_global_mem_new(cpu_env, offsetof(CPUX86State, eip), 
eip_name);
 
 for (i = 0; i < CPU_NB_REGS; ++i) {
 cpu_regs[i] = tcg_global_mem_new(cpu_env,
-- 
2.34.1




[PATCH v3 15/26] target/i386: Create eip_next_*

2022-10-01 Thread Richard Henderson
Create helpers for loading the address of the next insn.
Use tcg_constant_* in adjacent code where convenient.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 55 +++--
 1 file changed, 34 insertions(+), 21 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 1aa5b37ea6..be29ea7a03 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -541,6 +541,27 @@ static TCGv_i32 cur_insn_len_i32(DisasContext *s)
 return tcg_constant_i32(cur_insn_len(s));
 }
 
+static TCGv_i32 eip_next_i32(DisasContext *s)
+{
+/*
+ * This function has two users: lcall_real (always 16-bit mode), and
+ * iret_protected (16, 32, or 64-bit mode).  IRET only uses the value
+ * when EFLAGS.NT is set, which is illegal in 64-bit mode, which is
+ * why passing a 32-bit value isn't broken.  To avoid using this where
+ * we shouldn't, return -1 in 64-bit mode so that execution goes into
+ * the weeds quickly.
+ */
+if (CODE64(s)) {
+return tcg_constant_i32(-1);
+}
+return tcg_constant_i32(s->pc - s->cs_base);
+}
+
+static TCGv eip_next_tl(DisasContext *s)
+{
+return tcg_constant_tl(s->pc - s->cs_base);
+}
+
 /* Compute SEG:REG into A0.  SEG is selected from the override segment
(OVR_SEG) and the default segment (DEF_SEG).  OVR_SEG may be -1 to
indicate no override.  */
@@ -1213,12 +1234,9 @@ static void gen_bpt_io(DisasContext *s, TCGv_i32 t_port, 
int ot)
 /* user-mode cpu should not be in IOBPT mode */
 g_assert_not_reached();
 #else
-TCGv_i32 t_size = tcg_const_i32(1 << ot);
-TCGv t_next = tcg_const_tl(s->pc - s->cs_base);
-
+TCGv_i32 t_size = tcg_constant_i32(1 << ot);
+TCGv t_next = eip_next_tl(s);
 gen_helper_bpt_io(cpu_env, t_port, t_size, t_next);
-tcg_temp_free_i32(t_size);
-tcg_temp_free(t_next);
 #endif /* CONFIG_USER_ONLY */
 }
 }
@@ -5324,9 +5342,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 if (dflag == MO_16) {
 tcg_gen_ext16u_tl(s->T0, s->T0);
 }
-next_eip = s->pc - s->cs_base;
-tcg_gen_movi_tl(s->T1, next_eip);
-gen_push_v(s, s->T1);
+gen_push_v(s, eip_next_tl(s));
 gen_op_jmp_v(s->T0);
 gen_bnd_jmp(s);
 s->base.is_jmp = DISAS_JUMP;
@@ -5342,14 +5358,14 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 if (PE(s) && !VM86(s)) {
 tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
 gen_helper_lcall_protected(cpu_env, s->tmp2_i32, s->T1,
-   tcg_const_i32(dflag - 1),
-   tcg_const_tl(s->pc - s->cs_base));
+   tcg_constant_i32(dflag - 1),
+   eip_next_tl(s));
 } else {
 tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
 tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1);
 gen_helper_lcall_real(cpu_env, s->tmp2_i32, s->tmp3_i32,
-  tcg_const_i32(dflag - 1),
-  tcg_const_i32(s->pc - s->cs_base));
+  tcg_constant_i32(dflag - 1),
+  eip_next_i32(s));
 }
 s->base.is_jmp = DISAS_JUMP;
 break;
@@ -5372,7 +5388,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 if (PE(s) && !VM86(s)) {
 tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
 gen_helper_ljmp_protected(cpu_env, s->tmp2_i32, s->T1,
-  tcg_const_tl(s->pc - s->cs_base));
+  eip_next_tl(s));
 } else {
 gen_op_movl_seg_T0_vm(s, R_CS);
 gen_op_jmp_v(s->T1);
@@ -6854,8 +6870,8 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 gen_helper_iret_real(cpu_env, tcg_const_i32(dflag - 1));
 } else {
-gen_helper_iret_protected(cpu_env, tcg_const_i32(dflag - 1),
-  tcg_const_i32(s->pc - s->cs_base));
+gen_helper_iret_protected(cpu_env, tcg_constant_i32(dflag - 1),
+  eip_next_i32(s));
 }
 set_cc_op(s, CC_OP_EFLAGS);
 s->base.is_jmp = DISAS_EOB_ONLY;
@@ -6867,15 +6883,13 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 } else {
 tval = (int16_t)insn_get(env, s, MO_16);
 }
-next_eip = s->pc - s->cs_base;
-tval += next_eip;
+tval += s->pc - s->cs_base;
 if (dflag == MO_16) {
 tval &= 0xffff;
 } else if 

[PATCH v3 20/26] target/i386: Use gen_jmp_rel for DISAS_TOO_MANY

2022-10-01 Thread Richard Henderson
With gen_jmp_rel, we may chain between two translation blocks
which may only be separated because of TB size limits.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Richard Henderson 
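
Context, not part of the patch: DISAS_TOO_MANY means the block ended
only because of a size or page limit, so the next instruction starts
exactly at s->pc. A sketch of what the new tb_stop case amounts to:

    /* "jump to s->pc + 0", masked to the current code size; when the
     * destination can be reached directly this emits goto_tb, letting
     * the two blocks chain instead of bouncing through the exec loop. */
    gen_update_cc_op(dc);
    gen_jmp_rel_csize(dc, 0, 0);
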
---
 target/i386/tcg/translate.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 5b84be4975..cf23ae6e5e 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -8798,6 +8798,9 @@ static void i386_tr_tb_stop(DisasContextBase *dcbase, 
CPUState *cpu)
 case DISAS_NORETURN:
 break;
 case DISAS_TOO_MANY:
+gen_update_cc_op(dc);
+gen_jmp_rel_csize(dc, 0, 0);
+break;
 case DISAS_EOB_NEXT:
 gen_update_cc_op(dc);
 gen_update_eip_cur(dc);
-- 
2.34.1




[PATCH v3 14/26] target/i386: Truncate values for lcall_real to i32

2022-10-01 Thread Richard Henderson
Use i32 not int or tl for eip and cs arguments.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Richard Henderson 
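
A note on the mechanics, not part of the patch: the translator-side
change is the truncation of the target_ulong temporaries before the
call. Real-mode CS and EIP never exceed 32 bits, so nothing is lost:

    /* tcg_gen_trunc_tl_i32() keeps the low 32 bits of a target_ulong;
     * on 32-bit targets it is effectively a plain move. */
    tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);   /* new CS  */
    tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1);   /* new EIP */
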
---
 target/i386/helper.h | 2 +-
 target/i386/tcg/seg_helper.c | 6 ++
 target/i386/tcg/translate.c  | 3 ++-
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/target/i386/helper.h b/target/i386/helper.h
index ac3b4d1ee3..39a3c24182 100644
--- a/target/i386/helper.h
+++ b/target/i386/helper.h
@@ -37,7 +37,7 @@ DEF_HELPER_2(lldt, void, env, int)
 DEF_HELPER_2(ltr, void, env, int)
 DEF_HELPER_3(load_seg, void, env, int, int)
 DEF_HELPER_4(ljmp_protected, void, env, int, tl, tl)
-DEF_HELPER_5(lcall_real, void, env, int, tl, int, int)
+DEF_HELPER_5(lcall_real, void, env, i32, i32, int, i32)
 DEF_HELPER_5(lcall_protected, void, env, int, tl, int, tl)
 DEF_HELPER_2(iret_real, void, env, int)
 DEF_HELPER_3(iret_protected, void, env, int, int)
diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c
index bffd82923f..539189b4d1 100644
--- a/target/i386/tcg/seg_helper.c
+++ b/target/i386/tcg/seg_helper.c
@@ -1504,14 +1504,12 @@ void helper_ljmp_protected(CPUX86State *env, int 
new_cs, target_ulong new_eip,
 }
 
 /* real mode call */
-void helper_lcall_real(CPUX86State *env, int new_cs, target_ulong new_eip1,
-   int shift, int next_eip)
+void helper_lcall_real(CPUX86State *env, uint32_t new_cs, uint32_t new_eip,
+   int shift, uint32_t next_eip)
 {
-int new_eip;
 uint32_t esp, esp_mask;
 target_ulong ssp;
 
-new_eip = new_eip1;
 esp = env->regs[R_ESP];
 esp_mask = get_sp_mask(env->segs[R_SS].flags);
 ssp = env->segs[R_SS].base;
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 7db6f617a1..1aa5b37ea6 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -5346,7 +5346,8 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
tcg_const_tl(s->pc - s->cs_base));
 } else {
 tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
-gen_helper_lcall_real(cpu_env, s->tmp2_i32, s->T1,
+tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1);
+gen_helper_lcall_real(cpu_env, s->tmp2_i32, s->tmp3_i32,
   tcg_const_i32(dflag - 1),
   tcg_const_i32(s->pc - s->cs_base));
 }
-- 
2.34.1




[PATCH v3 18/26] target/i386: Use gen_jmp_rel for loop, repz, jecxz insns

2022-10-01 Thread Richard Henderson
With gen_jmp_rel, we may chain to the next tb instead of merely
writing to eip and exiting.  For repz, subtract cur_insn_len to
restart the current insn.

Signed-off-by: Richard Henderson 
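
The repz restart is the subtle part. A sketch, not part of the patch,
of why a displacement of -cur_insn_len(s) re-executes the current
instruction (cur_insn_len comes from an earlier patch in the series):

    /* s->base.pc_next is the first byte of the insn and s->pc is one
     * past its last byte, so: */
    int len = cur_insn_len(s);            /* s->pc - s->base.pc_next */

    /* destination = s->pc + (-len) = s->base.pc_next, i.e. the start
     * of the rep-prefixed insn, now reachable as a chained TB: */
    gen_jmp_rel_csize(s, -len, 0);
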
---
 target/i386/tcg/translate.c | 36 +++-
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index ba1bd7c707..434a6ad6cd 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -224,9 +224,9 @@ STUB_HELPER(wrmsr, TCGv_env env)
 
 static void gen_eob(DisasContext *s);
 static void gen_jr(DisasContext *s);
-static void gen_jmp(DisasContext *s, target_ulong eip);
 static void gen_jmp_tb(DisasContext *s, target_ulong eip, int tb_num);
 static void gen_jmp_rel(DisasContext *s, MemOp ot, int diff, int tb_num);
+static void gen_jmp_rel_csize(DisasContext *s, int diff, int tb_num);
 static void gen_op(DisasContext *s1, int op, MemOp ot, int d);
 static void gen_exception_gpf(DisasContext *s);
 
@@ -1185,7 +1185,7 @@ static TCGLabel *gen_jz_ecx_string(DisasContext *s)
 TCGLabel *l2 = gen_new_label();
 gen_op_jnz_ecx(s, s->aflag, l1);
 gen_set_label(l2);
-gen_jmp_tb(s, s->pc - s->cs_base, 1);
+gen_jmp_rel_csize(s, 0, 1);
 gen_set_label(l1);
 return l2;
 }
@@ -1288,7 +1288,7 @@ static void gen_repz(DisasContext *s, MemOp ot,
 if (s->repz_opt) {
 gen_op_jz_ecx(s, s->aflag, l2);
 }
-gen_jmp(s, s->base.pc_next - s->cs_base);
+gen_jmp_rel_csize(s, -cur_insn_len(s), 0);
 }
 
 #define GEN_REPZ(op) \
@@ -1308,7 +1308,7 @@ static void gen_repz2(DisasContext *s, MemOp ot, int nz,
 if (s->repz_opt) {
 gen_op_jz_ecx(s, s->aflag, l2);
 }
-gen_jmp(s, s->base.pc_next - s->cs_base);
+gen_jmp_rel_csize(s, -cur_insn_len(s), 0);
 }
 
 #define GEN_REPZ2(op) \
@@ -2793,6 +2793,7 @@ static void gen_jmp_tb(DisasContext *s, target_ulong eip, 
int tb_num)
 }
 }
 
+/* Jump to eip+diff, truncating the result to OT. */
 static void gen_jmp_rel(DisasContext *s, MemOp ot, int diff, int tb_num)
 {
 target_ulong dest = s->pc - s->cs_base + diff;
@@ -2808,9 +2809,11 @@ static void gen_jmp_rel(DisasContext *s, MemOp ot, int 
diff, int tb_num)
 gen_jmp_tb(s, dest, tb_num);
 }
 
-static void gen_jmp(DisasContext *s, target_ulong eip)
+/* Jump to eip+diff, truncating to the current code size. */
+static void gen_jmp_rel_csize(DisasContext *s, int diff, int tb_num)
 {
-gen_jmp_tb(s, eip, 0);
+/* CODE64 ignores the OT argument, so we need not consider it. */
+gen_jmp_rel(s, CODE32(s) ? MO_32 : MO_16, diff, tb_num);
 }
 
 static inline void gen_ldq_env_A0(DisasContext *s, int offset)
@@ -7404,24 +7407,18 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 case 0xe2: /* loop */
 case 0xe3: /* jecxz */
 {
-TCGLabel *l1, *l2, *l3;
-
-tval = (int8_t)insn_get(env, s, MO_8);
-tval += s->pc - s->cs_base;
-if (dflag == MO_16) {
-tval &= 0xffff;
-}
+TCGLabel *l1, *l2;
+int diff = (int8_t)insn_get(env, s, MO_8);
 
 l1 = gen_new_label();
 l2 = gen_new_label();
-l3 = gen_new_label();
 gen_update_cc_op(s);
 b &= 3;
 switch(b) {
 case 0: /* loopnz */
 case 1: /* loopz */
 gen_op_add_reg_im(s, s->aflag, R_ECX, -1);
-gen_op_jz_ecx(s, s->aflag, l3);
+gen_op_jz_ecx(s, s->aflag, l2);
 gen_jcc1(s, (JCC_Z << 1) | (b ^ 1), l1);
 break;
 case 2: /* loop */
@@ -7434,14 +7431,11 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 break;
 }
 
-gen_set_label(l3);
-gen_update_eip_next(s);
-tcg_gen_br(l2);
+gen_set_label(l2);
+gen_jmp_rel_csize(s, 0, 1);
 
 gen_set_label(l1);
-gen_jmp_im(s, tval);
-gen_set_label(l2);
-s->base.is_jmp = DISAS_EOB_ONLY;
+gen_jmp_rel(s, dflag, diff, 0);
 }
 break;
 case 0x130: /* wrmsr */
-- 
2.34.1




[PATCH v3 05/26] target/i386: Create gen_update_eip_cur

2022-10-01 Thread Richard Henderson
Like gen_update_cc_op, sync EIP before doing something
that could raise an exception.  Replace all gen_jmp_im
that use s->base.pc_next.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Richard Henderson 
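
Background, not part of the patch: helpers that raise an exception
longjmp out of the generated code, so env->eip must already point at
the faulting instruction when they run. The pattern being factored
out is roughly:

    /* before any helper that may raise #GP, #UD, an interrupt, ... */
    gen_update_cc_op(s);      /* flush lazy condition-code state     */
    gen_update_eip_cur(s);    /* env->eip = current insn, so the     */
                              /* exception sees the right address    */
    gen_helper_raise_exception(cpu_env, tcg_constant_i32(trapno));
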
---
 target/i386/tcg/translate.c | 52 -
 1 file changed, 28 insertions(+), 24 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 5a9c3b1e71..85253e1e17 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -511,10 +511,14 @@ static inline void gen_op_st_rm_T0_A0(DisasContext *s, 
int idx, int d)
 }
 }
 
-static inline void gen_jmp_im(DisasContext *s, target_ulong pc)
+static void gen_jmp_im(DisasContext *s, target_ulong pc)
 {
-tcg_gen_movi_tl(s->tmp0, pc);
-gen_op_jmp_v(s->tmp0);
+gen_op_jmp_v(tcg_constant_tl(pc));
+}
+
+static void gen_update_eip_cur(DisasContext *s)
+{
+gen_jmp_im(s, s->base.pc_next - s->cs_base);
 }
 
 /* Compute SEG:REG into A0.  SEG is selected from the override segment
@@ -703,7 +707,7 @@ static bool gen_check_io(DisasContext *s, MemOp ot, 
TCGv_i32 port,
 target_ulong next_eip = s->pc - s->cs_base;
 
 gen_update_cc_op(s);
-gen_jmp_im(s, cur_eip);
+gen_update_eip_cur(s);
 if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) {
 svm_flags |= SVM_IOIO_REP_MASK;
 }
@@ -1335,7 +1339,7 @@ static void gen_helper_fp_arith_STN_ST0(int op, int opreg)
 static void gen_exception(DisasContext *s, int trapno)
 {
 gen_update_cc_op(s);
-gen_jmp_im(s, s->base.pc_next - s->cs_base);
+gen_update_eip_cur(s);
 gen_helper_raise_exception(cpu_env, tcg_const_i32(trapno));
 s->base.is_jmp = DISAS_NORETURN;
 }
@@ -2630,7 +2634,7 @@ static void gen_unknown_opcode(CPUX86State *env, 
DisasContext *s)
 static void gen_interrupt(DisasContext *s, int intno)
 {
 gen_update_cc_op(s);
-gen_jmp_im(s, s->base.pc_next - s->cs_base);
+gen_update_eip_cur(s);
 gen_helper_raise_interrupt(cpu_env, tcg_constant_i32(intno),
tcg_constant_i32(s->pc - s->base.pc_next));
 s->base.is_jmp = DISAS_NORETURN;
@@ -6831,7 +6835,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 do_lret:
 if (PE(s) && !VM86(s)) {
 gen_update_cc_op(s);
-gen_jmp_im(s, s->base.pc_next - s->cs_base);
+gen_update_eip_cur(s);
 gen_helper_lret_protected(cpu_env, tcg_const_i32(dflag - 1),
   tcg_const_i32(val));
 } else {
@@ -7327,7 +7331,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 if (prefixes & PREFIX_REPZ) {
 gen_update_cc_op(s);
-gen_jmp_im(s, s->base.pc_next - s->cs_base);
+gen_update_eip_cur(s);
 gen_helper_pause(cpu_env, tcg_const_i32(s->pc - s->base.pc_next));
 s->base.is_jmp = DISAS_NORETURN;
 }
@@ -7353,7 +7357,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 if (CODE64(s))
 goto illegal_op;
 gen_update_cc_op(s);
-gen_jmp_im(s, s->base.pc_next - s->cs_base);
+gen_update_eip_cur(s);
 gen_helper_into(cpu_env, tcg_const_i32(s->pc - s->base.pc_next));
 break;
 #ifdef WANT_ICEBP
@@ -7460,7 +7464,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 case 0x132: /* rdmsr */
 if (check_cpl0(s)) {
 gen_update_cc_op(s);
-gen_jmp_im(s, s->base.pc_next - s->cs_base);
+gen_update_eip_cur(s);
 if (b & 2) {
 gen_helper_rdmsr(cpu_env);
 } else {
@@ -7472,7 +7476,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 break;
 case 0x131: /* rdtsc */
 gen_update_cc_op(s);
-gen_jmp_im(s, s->base.pc_next - s->cs_base);
+gen_update_eip_cur(s);
 if (tb_cflags(s->base.tb) & CF_USE_ICOUNT) {
 gen_io_start();
 }
@@ -7483,7 +7487,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 break;
 case 0x133: /* rdpmc */
 gen_update_cc_op(s);
-gen_jmp_im(s, s->base.pc_next - s->cs_base);
+gen_update_eip_cur(s);
 gen_helper_rdpmc(cpu_env);
 s->base.is_jmp = DISAS_NORETURN;
 break;
@@ -7513,7 +7517,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 case 0x105: /* syscall */
 /* XXX: is it usable in real mode ? */
 gen_update_cc_op(s);
-gen_jmp_im(s, s->base.pc_next - s->cs_base);
+gen_update_eip_cur(s);
 gen_helper_syscall(cpu_env, tcg_const_i32(s->pc - s->base.pc_next));
 /* TF handling for the syscall insn is different. The TF bit is  
checked
after the syscall insn completes. This allows #DB to not be
@@ -7539,13 +7543,13 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 #endif
 case 0x1a2: /* cpuid */
 gen_update_cc_op(s);
-

[PATCH v3 21/26] target/i386: Remove MemOp argument to gen_op_j*_ecx

2022-10-01 Thread Richard Henderson
These functions are always passed aflag, so we might as well
read it from DisasContext directly.  While we're at it, use
a common subroutine for these two functions.

Signed-off-by: Richard Henderson 
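
A sketch of what the merged subroutine emits, not part of the patch;
gen_extu() masks the copy of ECX down to the current address size
before the compare, which is why aflag is the only size that makes
sense here:

    /* for s->aflag == MO_16 this is effectively:
     *     tmp0 = ECX & 0xffff;
     *     if (tmp0 <cond> 0) goto label1;                       */
    tcg_gen_mov_tl(s->tmp0, cpu_regs[R_ECX]);
    gen_extu(s->aflag, s->tmp0);
    tcg_gen_brcondi_tl(cond, s->tmp0, 0, label1);
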
---
 target/i386/tcg/translate.c | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index cf23ae6e5e..9294f12f66 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -676,20 +676,21 @@ static void gen_exts(MemOp ot, TCGv reg)
 gen_ext_tl(reg, reg, ot, true);
 }
 
-static inline
-void gen_op_jnz_ecx(DisasContext *s, MemOp size, TCGLabel *label1)
+static void gen_op_j_ecx(DisasContext *s, TCGCond cond, TCGLabel *label1)
 {
 tcg_gen_mov_tl(s->tmp0, cpu_regs[R_ECX]);
-gen_extu(size, s->tmp0);
-tcg_gen_brcondi_tl(TCG_COND_NE, s->tmp0, 0, label1);
+gen_extu(s->aflag, s->tmp0);
+tcg_gen_brcondi_tl(cond, s->tmp0, 0, label1);
 }
 
-static inline
-void gen_op_jz_ecx(DisasContext *s, MemOp size, TCGLabel *label1)
+static inline void gen_op_jz_ecx(DisasContext *s, TCGLabel *label1)
 {
-tcg_gen_mov_tl(s->tmp0, cpu_regs[R_ECX]);
-gen_extu(size, s->tmp0);
-tcg_gen_brcondi_tl(TCG_COND_EQ, s->tmp0, 0, label1);
+gen_op_j_ecx(s, TCG_COND_EQ, label1);
+}
+
+static inline void gen_op_jnz_ecx(DisasContext *s, TCGLabel *label1)
+{
+gen_op_j_ecx(s, TCG_COND_NE, label1);
 }
 
 static void gen_helper_in_func(MemOp ot, TCGv v, TCGv_i32 n)
@@ -1183,7 +1184,7 @@ static TCGLabel *gen_jz_ecx_string(DisasContext *s)
 {
 TCGLabel *l1 = gen_new_label();
 TCGLabel *l2 = gen_new_label();
-gen_op_jnz_ecx(s, s->aflag, l1);
+gen_op_jnz_ecx(s, l1);
 gen_set_label(l2);
 gen_jmp_rel_csize(s, 0, 1);
 gen_set_label(l1);
@@ -1286,7 +1287,7 @@ static void gen_repz(DisasContext *s, MemOp ot,
  * before rep string_insn
  */
 if (s->repz_opt) {
-gen_op_jz_ecx(s, s->aflag, l2);
+gen_op_jz_ecx(s, l2);
 }
 gen_jmp_rel_csize(s, -cur_insn_len(s), 0);
 }
@@ -1306,7 +1307,7 @@ static void gen_repz2(DisasContext *s, MemOp ot, int nz,
 gen_update_cc_op(s);
 gen_jcc1(s, (JCC_Z << 1) | (nz ^ 1), l2);
 if (s->repz_opt) {
-gen_op_jz_ecx(s, s->aflag, l2);
+gen_op_jz_ecx(s, l2);
 }
 gen_jmp_rel_csize(s, -cur_insn_len(s), 0);
 }
@@ -7397,16 +7398,16 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 case 0: /* loopnz */
 case 1: /* loopz */
 gen_op_add_reg_im(s, s->aflag, R_ECX, -1);
-gen_op_jz_ecx(s, s->aflag, l2);
+gen_op_jz_ecx(s, l2);
 gen_jcc1(s, (JCC_Z << 1) | (b ^ 1), l1);
 break;
 case 2: /* loop */
 gen_op_add_reg_im(s, s->aflag, R_ECX, -1);
-gen_op_jnz_ecx(s, s->aflag, l1);
+gen_op_jnz_ecx(s, l1);
 break;
 default:
 case 3: /* jcxz */
-gen_op_jz_ecx(s, s->aflag, l1);
+gen_op_jz_ecx(s, l1);
 break;
 }
 
-- 
2.34.1




[PATCH v3 12/26] target/i386: Remove cur_eip, next_eip arguments to gen_repz*

2022-10-01 Thread Richard Henderson
All callers pass s->base.pc_next and s->pc, which we can just
as well compute within the functions.  Pull out common helpers
and reduce the amount of code under macros.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Richard Henderson 
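
The GEN_REPZ macros survive, but each now expands to a one-line
wrapper passing the string primitive as a function pointer. Roughly
(a sketch; the exact expansion falls in the truncated part of the
diff below):

    #define GEN_REPZ(op) \
        static inline void gen_repz_ ## op(DisasContext *s, MemOp ot) \
        { gen_repz(s, ot, gen_##op); }
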
---
 target/i386/tcg/translate.c | 116 ++--
 1 file changed, 57 insertions(+), 59 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index fe99c4361c..c8ef9f0356 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -736,7 +736,7 @@ static bool gen_check_io(DisasContext *s, MemOp ot, 
TCGv_i32 port,
 #endif
 }
 
-static inline void gen_movs(DisasContext *s, MemOp ot)
+static void gen_movs(DisasContext *s, MemOp ot)
 {
 gen_string_movl_A0_ESI(s);
 gen_op_ld_v(s, ot, s->T0, s->A0);
@@ -1156,18 +1156,18 @@ static inline void gen_jcc1(DisasContext *s, int b, 
TCGLabel *l1)
 
 /* XXX: does not work with gdbstub "ice" single step - not a
serious problem */
-static TCGLabel *gen_jz_ecx_string(DisasContext *s, target_ulong next_eip)
+static TCGLabel *gen_jz_ecx_string(DisasContext *s)
 {
 TCGLabel *l1 = gen_new_label();
 TCGLabel *l2 = gen_new_label();
 gen_op_jnz_ecx(s, s->aflag, l1);
 gen_set_label(l2);
-gen_jmp_tb(s, next_eip, 1);
+gen_jmp_tb(s, s->pc - s->cs_base, 1);
 gen_set_label(l1);
 return l2;
 }
 
-static inline void gen_stos(DisasContext *s, MemOp ot)
+static void gen_stos(DisasContext *s, MemOp ot)
 {
 gen_op_mov_v_reg(s, MO_32, s->T0, R_EAX);
 gen_string_movl_A0_EDI(s);
@@ -1176,7 +1176,7 @@ static inline void gen_stos(DisasContext *s, MemOp ot)
 gen_op_add_reg_T0(s, s->aflag, R_EDI);
 }
 
-static inline void gen_lods(DisasContext *s, MemOp ot)
+static void gen_lods(DisasContext *s, MemOp ot)
 {
 gen_string_movl_A0_ESI(s);
 gen_op_ld_v(s, ot, s->T0, s->A0);
@@ -1185,7 +1185,7 @@ static inline void gen_lods(DisasContext *s, MemOp ot)
 gen_op_add_reg_T0(s, s->aflag, R_ESI);
 }
 
-static inline void gen_scas(DisasContext *s, MemOp ot)
+static void gen_scas(DisasContext *s, MemOp ot)
 {
 gen_string_movl_A0_EDI(s);
 gen_op_ld_v(s, ot, s->T1, s->A0);
@@ -1194,7 +1194,7 @@ static inline void gen_scas(DisasContext *s, MemOp ot)
 gen_op_add_reg_T0(s, s->aflag, R_EDI);
 }
 
-static inline void gen_cmps(DisasContext *s, MemOp ot)
+static void gen_cmps(DisasContext *s, MemOp ot)
 {
 gen_string_movl_A0_EDI(s);
 gen_op_ld_v(s, ot, s->T1, s->A0);
@@ -1222,7 +1222,7 @@ static void gen_bpt_io(DisasContext *s, TCGv_i32 t_port, 
int ot)
 }
 }
 
-static inline void gen_ins(DisasContext *s, MemOp ot)
+static void gen_ins(DisasContext *s, MemOp ot)
 {
 gen_string_movl_A0_EDI(s);
 /* Note: we must do this dummy write first to be restartable in
@@ -1238,7 +1238,7 @@ static inline void gen_ins(DisasContext *s, MemOp ot)
 gen_bpt_io(s, s->tmp2_i32, ot);
 }
 
-static inline void gen_outs(DisasContext *s, MemOp ot)
+static void gen_outs(DisasContext *s, MemOp ot)
 {
 gen_string_movl_A0_ESI(s);
 gen_op_ld_v(s, ot, s->T0, s->A0);
@@ -1252,42 +1252,49 @@ static inline void gen_outs(DisasContext *s, MemOp ot)
 gen_bpt_io(s, s->tmp2_i32, ot);
 }
 
-/* same method as Valgrind : we generate jumps to current or next
-   instruction */
-#define GEN_REPZ(op)  \
-static inline void gen_repz_ ## op(DisasContext *s, MemOp ot,  \
- target_ulong cur_eip, target_ulong next_eip) \
-{ \
-TCGLabel *l2; \
-gen_update_cc_op(s);  \
-l2 = gen_jz_ecx_string(s, next_eip);  \
-gen_ ## op(s, ot);\
-gen_op_add_reg_im(s, s->aflag, R_ECX, -1);\
-/* a loop would cause two single step exceptions if ECX = 1   \
-   before rep string_insn */  \
-if (s->repz_opt)  \
-gen_op_jz_ecx(s, s->aflag, l2);   \
-gen_jmp(s, cur_eip);  \
+/* Generate jumps to current or next instruction */
+static void gen_repz(DisasContext *s, MemOp ot,
+ void (*fn)(DisasContext *s, MemOp ot))
+{
+TCGLabel *l2;
+gen_update_cc_op(s);
+l2 = gen_jz_ecx_string(s);
+fn(s, ot);
+gen_op_add_reg_im(s, s->aflag, R_ECX, -1);
+/*
+ * A loop would cause two single step exceptions if ECX = 1
+ * before rep string_insn
+ */
+if (s->repz_opt) {
+gen_op_jz_ecx(s, s->aflag, l2);
+}
+gen_jmp(s, s->base.pc_next - s->cs_base);
 }
 
-#define GEN_REPZ2(op)   

[PATCH v3 03/26] target/i386: Remove cur_eip argument to gen_exception

2022-10-01 Thread Richard Henderson
All callers pass s->base.pc_next - s->cs_base, which we can just
as well compute within the function.  Note the special case of
EXCP_VSYSCALL in which s->cs_base wasn't subtracted, but cs_base
is always zero in 64-bit mode, when vsyscall is used.

Reviewed-by: Paolo Bonzini 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
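
The vsyscall remark rests on an invariant worth spelling out (an
editorial sketch, not from the patch):

    /* 64-bit mode forces the CS segment base to zero, and the
     * vsyscall page exists only in 64-bit mode, so on that path
     *
     *     s->base.pc_next - s->cs_base == s->base.pc_next
     *
     * and folding the subtraction into gen_exception() changes
     * nothing for EXCP_VSYSCALL either. */
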
---
 target/i386/tcg/translate.c | 26 +-
 1 file changed, 13 insertions(+), 13 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 3f3e79c096..617832fcb0 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -1332,10 +1332,10 @@ static void gen_helper_fp_arith_STN_ST0(int op, int 
opreg)
 }
 }
 
-static void gen_exception(DisasContext *s, int trapno, target_ulong cur_eip)
+static void gen_exception(DisasContext *s, int trapno)
 {
 gen_update_cc_op(s);
-gen_jmp_im(s, cur_eip);
+gen_jmp_im(s, s->base.pc_next - s->cs_base);
 gen_helper_raise_exception(cpu_env, tcg_const_i32(trapno));
 s->base.is_jmp = DISAS_NORETURN;
 }
@@ -1344,13 +1344,13 @@ static void gen_exception(DisasContext *s, int trapno, 
target_ulong cur_eip)
the instruction is known, but it isn't allowed in the current cpu mode.  */
 static void gen_illegal_opcode(DisasContext *s)
 {
-gen_exception(s, EXCP06_ILLOP, s->base.pc_next - s->cs_base);
+gen_exception(s, EXCP06_ILLOP);
 }
 
 /* Generate #GP for the current instruction. */
 static void gen_exception_gpf(DisasContext *s)
 {
-gen_exception(s, EXCP0D_GPF, s->base.pc_next - s->cs_base);
+gen_exception(s, EXCP0D_GPF);
 }
 
 /* Check for cpl == 0; if not, raise #GP and return false. */
@@ -3267,7 +3267,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, 
int b)
 }
 /* simple MMX/SSE operation */
 if (s->flags & HF_TS_MASK) {
-gen_exception(s, EXCP07_PREX, s->base.pc_next - s->cs_base);
+gen_exception(s, EXCP07_PREX);
 return;
 }
 if (s->flags & HF_EM_MASK) {
@@ -6077,7 +6077,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 if (s->flags & (HF_EM_MASK | HF_TS_MASK)) {
 /* if CR0.EM or CR0.TS are set, generate an FPU exception */
 /* XXX: what to do if illegal op ? */
-gen_exception(s, EXCP07_PREX, s->base.pc_next - s->cs_base);
+gen_exception(s, EXCP07_PREX);
 break;
 }
 modrm = x86_ldub_code(env, s);
@@ -7302,7 +7302,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 goto illegal_op;
 val = x86_ldub_code(env, s);
 if (val == 0) {
-gen_exception(s, EXCP00_DIVZ, s->base.pc_next - s->cs_base);
+gen_exception(s, EXCP00_DIVZ);
 } else {
 gen_helper_aam(cpu_env, tcg_const_i32(val));
 set_cc_op(s, CC_OP_LOGICB);
@@ -7336,7 +7336,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 case 0x9b: /* fwait */
 if ((s->flags & (HF_MP_MASK | HF_TS_MASK)) ==
 (HF_MP_MASK | HF_TS_MASK)) {
-gen_exception(s, EXCP07_PREX, s->base.pc_next - s->cs_base);
+gen_exception(s, EXCP07_PREX);
 } else {
 gen_helper_fwait(cpu_env);
 }
@@ -8393,7 +8393,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 goto illegal_op;
 }
 if ((s->flags & HF_EM_MASK) || (s->flags & HF_TS_MASK)) {
-gen_exception(s, EXCP07_PREX, s->base.pc_next - s->cs_base);
+gen_exception(s, EXCP07_PREX);
 break;
 }
 gen_lea_modrm(env, s, modrm);
@@ -8406,7 +8406,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 goto illegal_op;
 }
 if ((s->flags & HF_EM_MASK) || (s->flags & HF_TS_MASK)) {
-gen_exception(s, EXCP07_PREX, s->base.pc_next - s->cs_base);
+gen_exception(s, EXCP07_PREX);
 break;
 }
 gen_lea_modrm(env, s, modrm);
@@ -8418,7 +8418,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 goto illegal_op;
 }
 if (s->flags & HF_TS_MASK) {
-gen_exception(s, EXCP07_PREX, s->base.pc_next - s->cs_base);
+gen_exception(s, EXCP07_PREX);
 break;
 }
 gen_lea_modrm(env, s, modrm);
@@ -8431,7 +8431,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 goto illegal_op;
 }
 if (s->flags & HF_TS_MASK) {
-gen_exception(s, EXCP07_PREX, s->base.pc_next - s->cs_base);
+gen_exception(s, EXCP07_PREX);
 break;
 }
 gen_helper_update_mxcsr(cpu_env);
@@ -8822,7 +8822,7 @@ static void i386_tr_translate_insn(DisasContextBase 
*dcbase, CPUState *cpu)
  * Detect entry into the vsyscall 

[PATCH v3 17/26] target/i386: Create gen_jmp_rel

2022-10-01 Thread Richard Henderson
Create a common helper for pc-relative branches.  The jmp jb insn
was missing a mask for CODE32.  In all cases the CODE64 check was
incorrectly placed, allowing PREFIX_DATA to truncate %rip to 16 bits.

Signed-off-by: Richard Henderson 
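
A worked example of the bug being fixed, not part of the patch: take
a 0x66-prefixed (PREFIX_DATA) "jmp rel8" in 64-bit mode, so dflag ==
MO_16 while CODE64(s) is true:

    /* old: mask applied without regard to CODE64
     *     tval = rip_next + diff;     e.g. 0x40001007
     *     tval &= 0xffff;             ->   0x1007, %rip truncated
     *
     * new: masks apply only outside 64-bit mode             */
    target_ulong dest = s->pc - s->cs_base + diff;
    if (!CODE64(s)) {
        dest &= (ot == MO_16 ? 0xffff : 0xffffffff);
    }
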
---
 target/i386/tcg/translate.c | 58 ++---
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 11aaba8a65..ba1bd7c707 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -226,6 +226,7 @@ static void gen_eob(DisasContext *s);
 static void gen_jr(DisasContext *s);
 static void gen_jmp(DisasContext *s, target_ulong eip);
 static void gen_jmp_tb(DisasContext *s, target_ulong eip, int tb_num);
+static void gen_jmp_rel(DisasContext *s, MemOp ot, int diff, int tb_num);
 static void gen_op(DisasContext *s1, int op, MemOp ot, int d);
 static void gen_exception_gpf(DisasContext *s);
 
@@ -2792,6 +2793,21 @@ static void gen_jmp_tb(DisasContext *s, target_ulong 
eip, int tb_num)
 }
 }
 
+static void gen_jmp_rel(DisasContext *s, MemOp ot, int diff, int tb_num)
+{
+target_ulong dest = s->pc - s->cs_base + diff;
+
+/* In 64-bit mode, operand size is fixed at 64 bits. */
+if (!CODE64(s)) {
+if (ot == MO_16) {
+dest &= 0xffff;
+} else {
+dest &= 0xffffffff;
+}
+}
+gen_jmp_tb(s, dest, tb_num);
+}
+
 static void gen_jmp(DisasContext *s, target_ulong eip)
 {
 gen_jmp_tb(s, eip, 0);
@@ -6862,20 +6878,12 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 break;
 case 0xe8: /* call im */
 {
-if (dflag != MO_16) {
-tval = (int32_t)insn_get(env, s, MO_32);
-} else {
-tval = (int16_t)insn_get(env, s, MO_16);
-}
-tval += s->pc - s->cs_base;
-if (dflag == MO_16) {
-tval &= 0xffff;
-} else if (!CODE64(s)) {
-tval &= 0xffffffff;
-}
+int diff = (dflag != MO_16
+? (int32_t)insn_get(env, s, MO_32)
+: (int16_t)insn_get(env, s, MO_16));
 gen_push_v(s, eip_next_tl(s));
 gen_bnd_jmp(s);
-gen_jmp(s, tval);
+gen_jmp_rel(s, dflag, diff, 0);
 }
 break;
 case 0x9a: /* lcall im */
@@ -6893,19 +6901,13 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 goto do_lcall;
 case 0xe9: /* jmp im */
-if (dflag != MO_16) {
-tval = (int32_t)insn_get(env, s, MO_32);
-} else {
-tval = (int16_t)insn_get(env, s, MO_16);
+{
+int diff = (dflag != MO_16
+? (int32_t)insn_get(env, s, MO_32)
+: (int16_t)insn_get(env, s, MO_16));
+gen_bnd_jmp(s);
+gen_jmp_rel(s, dflag, diff, 0);
 }
-tval += s->pc - s->cs_base;
-if (dflag == MO_16) {
-tval &= 0xffff;
-} else if (!CODE64(s)) {
-tval &= 0xffffffff;
-}
-gen_bnd_jmp(s);
-gen_jmp(s, tval);
 break;
 case 0xea: /* ljmp im */
 {
@@ -6922,12 +6924,10 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 goto do_ljmp;
 case 0xeb: /* jmp Jb */
-tval = (int8_t)insn_get(env, s, MO_8);
-tval += s->pc - s->cs_base;
-if (dflag == MO_16) {
-tval &= 0xffff;
+{
+int diff = (int8_t)insn_get(env, s, MO_8);
+gen_jmp_rel(s, dflag, diff, 0);
 }
-gen_jmp(s, tval);
 break;
 case 0x70 ... 0x7f: /* jcc Jb */
 tval = (int8_t)insn_get(env, s, MO_8);
-- 
2.34.1




[PATCH v3 10/26] target/i386: Use DISAS_EOB_ONLY

2022-10-01 Thread Richard Henderson
Replace lone calls to gen_eob() with the new enumerator.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 717c978381..6b16c0b62c 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -6835,7 +6835,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 /* add stack offset */
 gen_stack_update(s, val + (2 << dflag));
 }
-gen_eob(s);
+s->base.is_jmp = DISAS_EOB_ONLY;
 break;
 case 0xcb: /* lret */
 val = 0;
@@ -6853,7 +6853,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
   tcg_const_i32(s->pc - s->cs_base));
 }
 set_cc_op(s, CC_OP_EFLAGS);
-gen_eob(s);
+s->base.is_jmp = DISAS_EOB_ONLY;
 break;
 case 0xe8: /* call im */
 {
@@ -7439,7 +7439,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_set_label(l1);
 gen_jmp_im(s, tval);
 gen_set_label(l2);
-gen_eob(s);
+s->base.is_jmp = DISAS_EOB_ONLY;
 }
 break;
 case 0x130: /* wrmsr */
@@ -7480,7 +7480,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_exception_gpf(s);
 } else {
 gen_helper_sysenter(cpu_env);
-gen_eob(s);
+s->base.is_jmp = DISAS_EOB_ONLY;
 }
 break;
 case 0x135: /* sysexit */
@@ -7491,7 +7491,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_exception_gpf(s);
 } else {
 gen_helper_sysexit(cpu_env, tcg_const_i32(dflag - 1));
-gen_eob(s);
+s->base.is_jmp = DISAS_EOB_ONLY;
 }
 break;
 #ifdef TARGET_X86_64
@@ -8574,7 +8574,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_update_eip_next(s);
 gen_helper_rsm(cpu_env);
 #endif /* CONFIG_USER_ONLY */
-gen_eob(s);
+s->base.is_jmp = DISAS_EOB_ONLY;
 break;
 case 0x1b8: /* SSE4.2 popcnt */
 if ((prefixes & (PREFIX_REPZ | PREFIX_LOCK | PREFIX_REPNZ)) !=
-- 
2.34.1




[PATCH v3 13/26] target/i386: Introduce DISAS_JUMP

2022-10-01 Thread Richard Henderson
Drop the unused dest argument to gen_jr().
Remove most of the calls to gen_jr, and use DISAS_JUMP.
Remove some unused loads of eip for lcall and ljmp.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Richard Henderson 
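
Background, not part of the patch: by the time DISAS_JUMP is acted
on, EIP has already been written by gen_op_jmp_v(), so gen_jr() can
end the block with a dynamic jump. Conceptually the jr path boils
down to:

    /* look up the next TB by current cpu state (env->eip included);
     * fall back to the exec loop if no TB is found */
    tcg_gen_lookup_and_goto_ptr();
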
---
 target/i386/tcg/translate.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index c8ef9f0356..7db6f617a1 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -135,6 +135,7 @@ typedef struct DisasContext {
 #define DISAS_EOB_ONLY DISAS_TARGET_0
 #define DISAS_EOB_NEXT DISAS_TARGET_1
 #define DISAS_EOB_INHIBIT_IRQ  DISAS_TARGET_2
+#define DISAS_JUMP DISAS_TARGET_3
 
 /* The environment in which user-only runs is constrained. */
 #ifdef CONFIG_USER_ONLY
@@ -222,7 +223,7 @@ STUB_HELPER(wrmsr, TCGv_env env)
 #endif
 
 static void gen_eob(DisasContext *s);
-static void gen_jr(DisasContext *s, TCGv dest);
+static void gen_jr(DisasContext *s);
 static void gen_jmp(DisasContext *s, target_ulong eip);
 static void gen_jmp_tb(DisasContext *s, target_ulong eip, int tb_num);
 static void gen_op(DisasContext *s1, int op, MemOp ot, int d);
@@ -2385,7 +2386,7 @@ static void gen_goto_tb(DisasContext *s, int tb_num, 
target_ulong eip)
 } else {
 /* jump to another page */
 gen_jmp_im(s, eip);
-gen_jr(s, s->tmp0);
+gen_jr(s);
 }
 }
 
@@ -2754,7 +2755,7 @@ static void gen_eob(DisasContext *s)
 }
 
 /* Jump to register */
-static void gen_jr(DisasContext *s, TCGv dest)
+static void gen_jr(DisasContext *s)
 {
 do_gen_eob_worker(s, false, false, true);
 }
@@ -5328,7 +5329,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_push_v(s, s->T1);
 gen_op_jmp_v(s->T0);
 gen_bnd_jmp(s);
-gen_jr(s, s->T0);
+s->base.is_jmp = DISAS_JUMP;
 break;
 case 3: /* lcall Ev */
 if (mod == 3) {
@@ -5349,8 +5350,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
   tcg_const_i32(dflag - 1),
   tcg_const_i32(s->pc - s->cs_base));
 }
-tcg_gen_ld_tl(s->tmp4, cpu_env, offsetof(CPUX86State, eip));
-gen_jr(s, s->tmp4);
+s->base.is_jmp = DISAS_JUMP;
 break;
 case 4: /* jmp Ev */
 if (dflag == MO_16) {
@@ -5358,7 +5358,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 gen_op_jmp_v(s->T0);
 gen_bnd_jmp(s);
-gen_jr(s, s->T0);
+s->base.is_jmp = DISAS_JUMP;
 break;
 case 5: /* ljmp Ev */
 if (mod == 3) {
@@ -5376,8 +5376,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_op_movl_seg_T0_vm(s, R_CS);
 gen_op_jmp_v(s->T1);
 }
-tcg_gen_ld_tl(s->tmp4, cpu_env, offsetof(CPUX86State, eip));
-gen_jr(s, s->tmp4);
+s->base.is_jmp = DISAS_JUMP;
 break;
 case 6: /* push Ev */
 gen_push_v(s, s->T0);
@@ -6808,7 +6807,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 /* Note that gen_pop_T0 uses a zero-extending load.  */
 gen_op_jmp_v(s->T0);
 gen_bnd_jmp(s);
-gen_jr(s, s->T0);
+s->base.is_jmp = DISAS_JUMP;
 break;
 case 0xc3: /* ret */
 ot = gen_pop_T0(s);
@@ -6816,7 +6815,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 /* Note that gen_pop_T0 uses a zero-extending load.  */
 gen_op_jmp_v(s->T0);
 gen_bnd_jmp(s);
-gen_jr(s, s->T0);
+s->base.is_jmp = DISAS_JUMP;
 break;
 case 0xca: /* lret im */
 val = x86_ldsw_code(env, s);
@@ -8846,6 +8845,9 @@ static void i386_tr_tb_stop(DisasContextBase *dcbase, 
CPUState *cpu)
 gen_update_eip_cur(dc);
 gen_eob_inhibit_irq(dc, true);
 break;
+case DISAS_JUMP:
+gen_jr(dc);
+break;
 default:
 g_assert_not_reached();
 }
-- 
2.34.1




[PATCH v3 08/26] target/i386: Use DISAS_EOB* in gen_movl_seg_T0

2022-10-01 Thread Richard Henderson
Set is_jmp properly in gen_movl_seg_T0, so that the callers
need do nothing special.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 36 +---
 1 file changed, 5 insertions(+), 31 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index caa22af5a7..8c0ef0f212 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2457,13 +2457,15 @@ static void gen_movl_seg_T0(DisasContext *s, X86Seg 
seg_reg)
because ss32 may change. For R_SS, translation must always
stop as a special handling must be done to disable hardware
interrupts for the next instruction */
-if (seg_reg == R_SS || (CODE32(s) && seg_reg < R_FS)) {
-s->base.is_jmp = DISAS_TOO_MANY;
+if (seg_reg == R_SS) {
+s->base.is_jmp = DISAS_EOB_INHIBIT_IRQ;
+} else if (CODE32(s) && seg_reg < R_FS) {
+s->base.is_jmp = DISAS_EOB_NEXT;
 }
 } else {
 gen_op_movl_seg_T0_vm(s, seg_reg);
 if (seg_reg == R_SS) {
-s->base.is_jmp = DISAS_TOO_MANY;
+s->base.is_jmp = DISAS_EOB_INHIBIT_IRQ;
 }
 }
 }
@@ -5726,26 +5728,12 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 ot = gen_pop_T0(s);
 gen_movl_seg_T0(s, reg);
 gen_pop_update(s, ot);
-/* Note that reg == R_SS in gen_movl_seg_T0 always sets is_jmp.  */
-if (s->base.is_jmp) {
-gen_update_eip_next(s);
-if (reg == R_SS) {
-s->flags &= ~HF_TF_MASK;
-gen_eob_inhibit_irq(s, true);
-} else {
-gen_eob(s);
-}
-}
 break;
 case 0x1a1: /* pop fs */
 case 0x1a9: /* pop gs */
 ot = gen_pop_T0(s);
 gen_movl_seg_T0(s, (b >> 3) & 7);
 gen_pop_update(s, ot);
-if (s->base.is_jmp) {
-gen_update_eip_next(s);
-gen_eob(s);
-}
 break;
 
 /**/
@@ -5792,16 +5780,6 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 goto illegal_op;
 gen_ldst_modrm(env, s, modrm, MO_16, OR_TMP0, 0);
 gen_movl_seg_T0(s, reg);
-/* Note that reg == R_SS in gen_movl_seg_T0 always sets is_jmp.  */
-if (s->base.is_jmp) {
-gen_update_eip_next(s);
-if (reg == R_SS) {
-s->flags &= ~HF_TF_MASK;
-gen_eob_inhibit_irq(s, true);
-} else {
-gen_eob(s);
-}
-}
 break;
 case 0x8c: /* mov Gv, seg */
 modrm = x86_ldub_code(env, s);
@@ -5991,10 +5969,6 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_movl_seg_T0(s, op);
 /* then put the data */
 gen_op_mov_reg_v(s, ot, reg, s->T1);
-if (s->base.is_jmp) {
-gen_update_eip_next(s);
-gen_eob(s);
-}
 break;
 
 //
-- 
2.34.1




[PATCH v3 11/26] target/i386: Create cur_insn_len, cur_insn_len_i32

2022-10-01 Thread Richard Henderson
Create common routines for computing the length of the insn.
Use tcg_constant_i32 in the new function, while we're at it.

Reviewed-by: Paolo Bonzini 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
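
One detail the message only hints at, noted here for readers: the new
helper uses tcg_constant_i32(), which returns a pooled, read-only
constant, unlike the older tcg_const_i32() that allocates a temp the
caller must free. A sketch of the difference:

    TCGv_i32 a = tcg_const_i32(5);      /* temp: caller must call   */
                                        /* tcg_temp_free_i32(a)     */
    TCGv_i32 b = tcg_constant_i32(5);   /* pooled constant: never   */
                                        /* freed, safe to share     */
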
---
 target/i386/tcg/translate.c | 31 +++
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 6b16c0b62c..fe99c4361c 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -530,6 +530,16 @@ static void gen_update_eip_next(DisasContext *s)
 gen_jmp_im(s, s->pc - s->cs_base);
 }
 
+static int cur_insn_len(DisasContext *s)
+{
+return s->pc - s->base.pc_next;
+}
+
+static TCGv_i32 cur_insn_len_i32(DisasContext *s)
+{
+return tcg_constant_i32(cur_insn_len(s));
+}
+
 /* Compute SEG:REG into A0.  SEG is selected from the override segment
(OVR_SEG) and the default segment (DEF_SEG).  OVR_SEG may be -1 to
indicate no override.  */
@@ -712,9 +722,6 @@ static bool gen_check_io(DisasContext *s, MemOp ot, 
TCGv_i32 port,
 gen_helper_check_io(cpu_env, port, tcg_constant_i32(1 << ot));
 }
 if (GUEST(s)) {
-target_ulong cur_eip = s->base.pc_next - s->cs_base;
-target_ulong next_eip = s->pc - s->cs_base;
-
 gen_update_cc_op(s);
 gen_update_eip_cur(s);
 if (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) {
@@ -723,7 +730,7 @@ static bool gen_check_io(DisasContext *s, MemOp ot, 
TCGv_i32 port,
 svm_flags |= 1 << (SVM_IOIO_SIZE_SHIFT + ot);
 gen_helper_svm_check_io(cpu_env, port,
 tcg_constant_i32(svm_flags),
-tcg_constant_i32(next_eip - cur_eip));
+cur_insn_len_i32(s));
 }
 return true;
 #endif
@@ -2028,7 +2035,7 @@ static uint64_t advance_pc(CPUX86State *env, DisasContext 
*s, int num_bytes)
 }
 
 s->pc += num_bytes;
-if (unlikely(s->pc - s->base.pc_next > X86_MAX_INSN_LENGTH)) {
+if (unlikely(cur_insn_len(s) > X86_MAX_INSN_LENGTH)) {
 /* If the instruction's 16th byte is on a different page than the 1st, 
a
  * page fault on the second page wins over the general protection fault
  * caused by the instruction being too long.
@@ -2647,7 +2654,7 @@ static void gen_interrupt(DisasContext *s, int intno)
 gen_update_cc_op(s);
 gen_update_eip_cur(s);
 gen_helper_raise_interrupt(cpu_env, tcg_constant_i32(intno),
-   tcg_constant_i32(s->pc - s->base.pc_next));
+   cur_insn_len_i32(s));
 s->base.is_jmp = DISAS_NORETURN;
 }
 
@@ -7314,7 +7321,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 if (prefixes & PREFIX_REPZ) {
 gen_update_cc_op(s);
 gen_update_eip_cur(s);
-gen_helper_pause(cpu_env, tcg_const_i32(s->pc - s->base.pc_next));
+gen_helper_pause(cpu_env, cur_insn_len_i32(s));
 s->base.is_jmp = DISAS_NORETURN;
 }
 break;
@@ -7340,7 +7347,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 goto illegal_op;
 gen_update_cc_op(s);
 gen_update_eip_cur(s);
-gen_helper_into(cpu_env, tcg_const_i32(s->pc - s->base.pc_next));
+gen_helper_into(cpu_env, cur_insn_len_i32(s));
 break;
 #ifdef WANT_ICEBP
 case 0xf1: /* icebp (undocumented, exits to external debugger) */
@@ -7499,7 +7506,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 /* XXX: is it usable in real mode ? */
 gen_update_cc_op(s);
 gen_update_eip_cur(s);
-gen_helper_syscall(cpu_env, tcg_const_i32(s->pc - s->base.pc_next));
+gen_helper_syscall(cpu_env, cur_insn_len_i32(s));
 /* TF handling for the syscall insn is different. The TF bit is  
checked
after the syscall insn completes. This allows #DB to not be
generated after one has entered CPL0 if TF is set in FMASK.  */
@@ -7531,7 +7538,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 if (check_cpl0(s)) {
 gen_update_cc_op(s);
 gen_update_eip_cur(s);
-gen_helper_hlt(cpu_env, tcg_const_i32(s->pc - s->base.pc_next));
+gen_helper_hlt(cpu_env, cur_insn_len_i32(s));
 s->base.is_jmp = DISAS_NORETURN;
 }
 break;
@@ -7640,7 +7647,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 gen_update_cc_op(s);
 gen_update_eip_cur(s);
-gen_helper_mwait(cpu_env, tcg_const_i32(s->pc - s->base.pc_next));
+gen_helper_mwait(cpu_env, cur_insn_len_i32(s));
 s->base.is_jmp = DISAS_NORETURN;
 break;
 
@@ -7716,7 +7723,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_update_cc_op(s);
 gen_update_eip_cur(s);
 gen_helper_vmrun(cpu_env, 

[PATCH v3 02/26] target/i386: Return bool from disas_insn

2022-10-01 Thread Richard Henderson
Instead of returning the new pc, which is present in
DisasContext, return true if an insn was translated.
This is false when we detect a page crossing and must
undo the insn under translation.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
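
The new caller contract, restated as a sketch (not part of the
patch):

    if (disas_insn(dc, cpu)) {
        /* insn fully translated; commit the new pc */
        dc->base.pc_next = dc->pc;
    } else {
        /* page-crossing case: the partial insn was removed with
         * tcg_remove_ops_after() and the TB ends before it */
    }
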
---
 target/i386/tcg/translate.c | 44 +++--
 1 file changed, 23 insertions(+), 21 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 16bf56dbc7..3f3e79c096 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -4707,7 +4707,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, 
int b)
 
 /* convert one instruction. s->base.is_jmp is set if the translation must
be stopped. Return the next pc value */
-static target_ulong disas_insn(DisasContext *s, CPUState *cpu)
+static bool disas_insn(DisasContext *s, CPUState *cpu)
 {
 CPUX86State *env = cpu->env_ptr;
 int b, prefixes;
@@ -4734,15 +4734,16 @@ static target_ulong disas_insn(DisasContext *s, 
CPUState *cpu)
 break;
 case 1:
 gen_exception_gpf(s);
-return s->pc;
+return true;
 case 2:
 /* Restore state that may affect the next instruction. */
+s->pc = s->base.pc_next;
 s->cc_op_dirty = orig_cc_op_dirty;
 s->cc_op = orig_cc_op;
 s->base.num_insns--;
 tcg_remove_ops_after(s->prev_insn_end);
 s->base.is_jmp = DISAS_TOO_MANY;
-return s->base.pc_next;
+return false;
 default:
 g_assert_not_reached();
 }
@@ -8644,13 +8645,13 @@ static target_ulong disas_insn(DisasContext *s, 
CPUState *cpu)
 default:
 goto unknown_op;
 }
-return s->pc;
+return true;
  illegal_op:
 gen_illegal_opcode(s);
-return s->pc;
+return true;
  unknown_op:
 gen_unknown_opcode(env, s);
-return s->pc;
+return true;
 }
 
 void tcg_x86_init(void)
@@ -8815,7 +8816,6 @@ static void i386_tr_insn_start(DisasContextBase *dcbase, 
CPUState *cpu)
 static void i386_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
 {
 DisasContext *dc = container_of(dcbase, DisasContext, base);
-target_ulong pc_next;
 
 #ifdef TARGET_VSYSCALL_PAGE
 /*
@@ -8828,21 +8828,23 @@ static void i386_tr_translate_insn(DisasContextBase 
*dcbase, CPUState *cpu)
 }
 #endif
 
-pc_next = disas_insn(dc, cpu);
-dc->base.pc_next = pc_next;
+if (disas_insn(dc, cpu)) {
+target_ulong pc_next = dc->pc;
+dc->base.pc_next = pc_next;
 
-if (dc->base.is_jmp == DISAS_NEXT) {
-if (dc->flags & (HF_TF_MASK | HF_INHIBIT_IRQ_MASK)) {
-/*
- * If single step mode, we generate only one instruction and
- * generate an exception.
- * If irq were inhibited with HF_INHIBIT_IRQ_MASK, we clear
- * the flag and abort the translation to give the irqs a
- * chance to happen.
- */
-dc->base.is_jmp = DISAS_TOO_MANY;
-} else if (!is_same_page(&dc->base, pc_next)) {
-dc->base.is_jmp = DISAS_TOO_MANY;
+if (dc->base.is_jmp == DISAS_NEXT) {
+if (dc->flags & (HF_TF_MASK | HF_INHIBIT_IRQ_MASK)) {
+/*
+ * If single step mode, we generate only one instruction and
+ * generate an exception.
+ * If irq were inhibited with HF_INHIBIT_IRQ_MASK, we clear
+ * the flag and abort the translation to give the irqs a
+ * chance to happen.
+ */
+dc->base.is_jmp = DISAS_TOO_MANY;
+} else if (!is_same_page(&dc->base, pc_next)) {
+dc->base.is_jmp = DISAS_TOO_MANY;
+}
 }
 }
 }
-- 
2.34.1




[PATCH v3 04/26] target/i386: Remove cur_eip, next_eip arguments to gen_interrupt

2022-10-01 Thread Richard Henderson
All callers pass s->base.pc_next and s->pc, which we can just as
well compute within the function.  Adjust to use tcg_constant_i32
while we're at it.

Reviewed-by: Paolo Bonzini 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 13 ++---
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 617832fcb0..5a9c3b1e71 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2627,13 +2627,12 @@ static void gen_unknown_opcode(CPUX86State *env, 
DisasContext *s)
 
 /* an interrupt is different from an exception because of the
privilege checks */
-static void gen_interrupt(DisasContext *s, int intno,
-  target_ulong cur_eip, target_ulong next_eip)
+static void gen_interrupt(DisasContext *s, int intno)
 {
 gen_update_cc_op(s);
-gen_jmp_im(s, cur_eip);
-gen_helper_raise_interrupt(cpu_env, tcg_const_i32(intno),
-   tcg_const_i32(next_eip - cur_eip));
+gen_jmp_im(s, s->base.pc_next - s->cs_base);
+gen_helper_raise_interrupt(cpu_env, tcg_constant_i32(intno),
+   tcg_constant_i32(s->pc - s->base.pc_next));
 s->base.is_jmp = DISAS_NORETURN;
 }
 
@@ -7342,12 +7341,12 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 break;
 case 0xcc: /* int3 */
-gen_interrupt(s, EXCP03_INT3, s->base.pc_next - s->cs_base, s->pc - 
s->cs_base);
+gen_interrupt(s, EXCP03_INT3);
 break;
 case 0xcd: /* int N */
 val = x86_ldub_code(env, s);
 if (check_vm86_iopl(s)) {
-gen_interrupt(s, val, s->base.pc_next - s->cs_base, s->pc - 
s->cs_base);
+gen_interrupt(s, val);
 }
 break;
 case 0xce: /* into */
-- 
2.34.1




[PATCH v3 06/26] target/i386: Create gen_update_eip_next

2022-10-01 Thread Richard Henderson
Sync EIP before exiting a translation block.
Replace all gen_jmp_im that use s->pc.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 45 -
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 85253e1e17..4c1548da8e 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -521,6 +521,11 @@ static void gen_update_eip_cur(DisasContext *s)
 gen_jmp_im(s, s->base.pc_next - s->cs_base);
 }
 
+static void gen_update_eip_next(DisasContext *s)
+{
+gen_jmp_im(s, s->pc - s->cs_base);
+}
+
 /* Compute SEG:REG into A0.  SEG is selected from the override segment
(OVR_SEG) and the default segment (DEF_SEG).  OVR_SEG may be -1 to
indicate no override.  */
@@ -5719,7 +5724,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_pop_update(s, ot);
 /* Note that reg == R_SS in gen_movl_seg_T0 always sets is_jmp.  */
 if (s->base.is_jmp) {
-gen_jmp_im(s, s->pc - s->cs_base);
+gen_update_eip_next(s);
 if (reg == R_SS) {
 s->flags &= ~HF_TF_MASK;
 gen_eob_inhibit_irq(s, true);
@@ -5734,7 +5739,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_movl_seg_T0(s, (b >> 3) & 7);
 gen_pop_update(s, ot);
 if (s->base.is_jmp) {
-gen_jmp_im(s, s->pc - s->cs_base);
+gen_update_eip_next(s);
 gen_eob(s);
 }
 break;
@@ -5785,7 +5790,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_movl_seg_T0(s, reg);
 /* Note that reg == R_SS in gen_movl_seg_T0 always sets is_jmp.  */
 if (s->base.is_jmp) {
-gen_jmp_im(s, s->pc - s->cs_base);
+gen_update_eip_next(s);
 if (reg == R_SS) {
 s->flags &= ~HF_TF_MASK;
 gen_eob_inhibit_irq(s, true);
@@ -5983,7 +5988,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 /* then put the data */
 gen_op_mov_reg_v(s, ot, reg, s->T1);
 if (s->base.is_jmp) {
-gen_jmp_im(s, s->pc - s->cs_base);
+gen_update_eip_next(s);
 gen_eob(s);
 }
 break;
@@ -7039,7 +7044,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_pop_update(s, ot);
 set_cc_op(s, CC_OP_EFLAGS);
 /* abort translation because TF/AC flag may change */
-gen_jmp_im(s, s->pc - s->cs_base);
+gen_update_eip_next(s);
 gen_eob(s);
 }
 break;
@@ -7375,7 +7380,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 if (check_iopl(s)) {
 gen_helper_sti(cpu_env);
 /* interruptions are enabled only the first insn after sti */
-gen_jmp_im(s, s->pc - s->cs_base);
+gen_update_eip_next(s);
 gen_eob_inhibit_irq(s, true);
 }
 break;
@@ -7451,7 +7456,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 
 gen_set_label(l3);
-gen_jmp_im(s, next_eip);
+gen_update_eip_next(s);
 tcg_gen_br(l2);
 
 gen_set_label(l1);
@@ -7469,7 +7474,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_helper_rdmsr(cpu_env);
 } else {
 gen_helper_wrmsr(cpu_env);
-gen_jmp_im(s, s->pc - s->cs_base);
+gen_update_eip_next(s);
 gen_eob(s);
 }
 }
@@ -7669,7 +7674,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 goto illegal_op;
 }
 gen_helper_clac(cpu_env);
-gen_jmp_im(s, s->pc - s->cs_base);
+gen_update_eip_next(s);
 gen_eob(s);
 break;
 
@@ -7679,7 +7684,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 goto illegal_op;
 }
 gen_helper_stac(cpu_env);
-gen_jmp_im(s, s->pc - s->cs_base);
+gen_update_eip_next(s);
 gen_eob(s);
 break;
 
@@ -7724,7 +7729,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 tcg_gen_trunc_tl_i32(s->tmp2_i32, cpu_regs[R_ECX]);
 gen_helper_xsetbv(cpu_env, s->tmp2_i32, s->tmp1_i64);
 /* End TB because translation flags may change.  */
-gen_jmp_im(s, s->pc - s->cs_base);
+gen_update_eip_next(s);
 gen_eob(s);
 break;
 
@@ -7786,7 +7791,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 gen_update_cc_op(s);
 gen_helper_stgi(cpu_env);
-gen_jmp_im(s, s->pc - s->cs_base);
+gen_update_eip_next(s);
 gen_eob(s);
 break;
 
@@ -7825,7 +7830,7 @@ static 

[PATCH v3 00/26] target/i386: pc-relative translation blocks

2022-10-01 Thread Richard Henderson
These are the x86-specific changes required to reduce the
amount of translation needed under address space randomization.
For v3, quite a few changes based on Paolo's feedback.


r~

Based-on: 20220930212622.108363-1-richard.hender...@linaro.org
("[PATCH v6 00/18] tcg: CPUTLBEntryFull and TARGET_TB_PCREL")


Richard Henderson (26):
  target/i386: Remove pc_start
  target/i386: Return bool from disas_insn
  target/i386: Remove cur_eip argument to gen_exception
  target/i386: Remove cur_eip, next_eip arguments to gen_interrupt
  target/i386: Create gen_update_eip_cur
  target/i386: Create gen_update_eip_next
  target/i386: Introduce DISAS_EOB*
  target/i386: Use DISAS_EOB* in gen_movl_seg_T0
  target/i386: Use DISAS_EOB_NEXT
  target/i386: Use DISAS_EOB_ONLY
  target/i386: Create cur_insn_len, cur_insn_len_i32
  target/i386: Remove cur_eip, next_eip arguments to gen_repz*
  target/i386: Introduce DISAS_JUMP
  target/i386: Truncate values for lcall_real to i32
  target/i386: Create eip_next_*
  target/i386: Use DISAS_TOO_MANY to exit after gen_io_start
  target/i386: Create gen_jmp_rel
  target/i386: Use gen_jmp_rel for loop, repz, jecxz insns
  target/i386: Use gen_jmp_rel for gen_jcc
  target/i386: Use gen_jmp_rel for DISAS_TOO_MANY
  target/i386: Remove MemOp argument to gen_op_j*_ecx
  target/i386: Merge gen_jmp_tb and gen_goto_tb into gen_jmp_rel
  target/i386: Create eip_cur_tl
  target/i386: Add cpu_eip
  target/i386: Inline gen_jmp_im
  target/i386: Enable TARGET_TB_PCREL

 target/i386/cpu-param.h  |   4 +
 target/i386/helper.h |   2 +-
 target/i386/tcg/seg_helper.c |   6 +-
 target/i386/tcg/tcg-cpu.c|   8 +-
 target/i386/tcg/translate.c  | 830 ++-
 5 files changed, 448 insertions(+), 402 deletions(-)

-- 
2.34.1




[PATCH v3 01/26] target/i386: Remove pc_start

2022-10-01 Thread Richard Henderson
The DisasContext member and the disas_insn local variable of
the same name are identical to DisasContextBase.pc_next.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 114 +++-
 1 file changed, 60 insertions(+), 54 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 44af8c107f..16bf56dbc7 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -76,7 +76,6 @@ typedef struct DisasContext {
 DisasContextBase base;
 
 target_ulong pc;   /* pc = eip + cs_base */
-target_ulong pc_start; /* pc at TB entry */
 target_ulong cs_base;  /* base of CS segment */
 
 MemOp aflag;
@@ -1345,13 +1344,13 @@ static void gen_exception(DisasContext *s, int trapno, 
target_ulong cur_eip)
the instruction is known, but it isn't allowed in the current cpu mode.  */
 static void gen_illegal_opcode(DisasContext *s)
 {
-gen_exception(s, EXCP06_ILLOP, s->pc_start - s->cs_base);
+gen_exception(s, EXCP06_ILLOP, s->base.pc_next - s->cs_base);
 }
 
 /* Generate #GP for the current instruction. */
 static void gen_exception_gpf(DisasContext *s)
 {
-gen_exception(s, EXCP0D_GPF, s->pc_start - s->cs_base);
+gen_exception(s, EXCP0D_GPF, s->base.pc_next - s->cs_base);
 }
 
 /* Check for cpl == 0; if not, raise #GP and return false. */
@@ -2016,7 +2015,7 @@ static uint64_t advance_pc(CPUX86State *env, DisasContext 
*s, int num_bytes)
 }
 
 s->pc += num_bytes;
-if (unlikely(s->pc - s->pc_start > X86_MAX_INSN_LENGTH)) {
+if (unlikely(s->pc - s->base.pc_next > X86_MAX_INSN_LENGTH)) {
 /* If the instruction's 16th byte is on a different page than the 1st, 
a
  * page fault on the second page wins over the general protection fault
  * caused by the instruction being too long.
@@ -2614,7 +2613,7 @@ static void gen_unknown_opcode(CPUX86State *env, 
DisasContext *s)
 if (qemu_loglevel_mask(LOG_UNIMP)) {
 FILE *logfile = qemu_log_trylock();
 if (logfile) {
-target_ulong pc = s->pc_start, end = s->pc;
+target_ulong pc = s->base.pc_next, end = s->pc;
 
 fprintf(logfile, "ILLOPC: " TARGET_FMT_lx ":", pc);
 for (; pc < end; ++pc) {
@@ -3226,8 +3225,7 @@ static const struct SSEOpHelper_table7 sse_op_table7[256] 
= {
 goto illegal_op; \
 } while (0)
 
-static void gen_sse(CPUX86State *env, DisasContext *s, int b,
-target_ulong pc_start)
+static void gen_sse(CPUX86State *env, DisasContext *s, int b)
 {
 int b1, op1_offset, op2_offset, is_xmm, val;
 int modrm, mod, rm, reg;
@@ -3269,7 +3267,7 @@ static void gen_sse(CPUX86State *env, DisasContext *s, 
int b,
 }
 /* simple MMX/SSE operation */
 if (s->flags & HF_TS_MASK) {
-gen_exception(s, EXCP07_PREX, pc_start - s->cs_base);
+gen_exception(s, EXCP07_PREX, s->base.pc_next - s->cs_base);
 return;
 }
 if (s->flags & HF_EM_MASK) {
@@ -4717,11 +4715,10 @@ static target_ulong disas_insn(DisasContext *s, 
CPUState *cpu)
 MemOp ot, aflag, dflag;
 int modrm, reg, rm, mod, op, opreg, val;
 target_ulong next_eip, tval;
-target_ulong pc_start = s->base.pc_next;
 bool orig_cc_op_dirty = s->cc_op_dirty;
 CCOp orig_cc_op = s->cc_op;
 
-s->pc_start = s->pc = pc_start;
+s->pc = s->base.pc_next;
 s->override = -1;
 #ifdef TARGET_X86_64
 s->rex_w = false;
@@ -4745,7 +4742,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState 
*cpu)
 s->base.num_insns--;
 tcg_remove_ops_after(s->prev_insn_end);
 s->base.is_jmp = DISAS_TOO_MANY;
-return pc_start;
+return s->base.pc_next;
 default:
 g_assert_not_reached();
 }
@@ -6079,7 +6076,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState 
*cpu)
 if (s->flags & (HF_EM_MASK | HF_TS_MASK)) {
 /* if CR0.EM or CR0.TS are set, generate an FPU exception */
 /* XXX: what to do if illegal op ? */
-gen_exception(s, EXCP07_PREX, pc_start - s->cs_base);
+gen_exception(s, EXCP07_PREX, s->base.pc_next - s->cs_base);
 break;
 }
 modrm = x86_ldub_code(env, s);
@@ -6620,7 +6617,7 @@ static target_ulong disas_insn(DisasContext *s, CPUState 
*cpu)
offsetof(CPUX86State, segs[R_CS].selector));
 tcg_gen_st16_i32(s->tmp2_i32, cpu_env,
  offsetof(CPUX86State, fpcs));
-tcg_gen_st_tl(tcg_constant_tl(pc_start - s->cs_base),
+tcg_gen_st_tl(tcg_constant_tl(s->base.pc_next - s->cs_base),
   cpu_env, offsetof(CPUX86State, fpip));
 }
 }
@@ -6632,7 +6629,8 @@ static target_ulong disas_insn(DisasContext *s, CPUState 
*cpu)
 case 0xa5:
 ot = mo_b_d(b, dflag);
  

[PATCH v3 07/26] target/i386: Introduce DISAS_EOB*

2022-10-01 Thread Richard Henderson
Add a few DISAS_TARGET_* aliases to reduce the number of
calls to gen_eob() and gen_eob_inhibit_irq().  So far,
only update i386_tr_translate_insn for exiting the block
because of single-step or previous inhibit irq.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 4c1548da8e..caa22af5a7 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -132,6 +132,10 @@ typedef struct DisasContext {
 TCGOp *prev_insn_end;
 } DisasContext;
 
+#define DISAS_EOB_ONLY DISAS_TARGET_0
+#define DISAS_EOB_NEXT DISAS_TARGET_1
+#define DISAS_EOB_INHIBIT_IRQ  DISAS_TARGET_2
+
 /* The environment in which user-only runs is constrained. */
 #ifdef CONFIG_USER_ONLY
 #define PE(S) true
@@ -8849,7 +8853,7 @@ static void i386_tr_translate_insn(DisasContextBase 
*dcbase, CPUState *cpu)
  * the flag and abort the translation to give the irqs a
  * chance to happen.
  */
-dc->base.is_jmp = DISAS_TOO_MANY;
+dc->base.is_jmp = DISAS_EOB_NEXT;
} else if (!is_same_page(&dc->base, pc_next)) {
 dc->base.is_jmp = DISAS_TOO_MANY;
 }
@@ -8861,9 +8865,24 @@ static void i386_tr_tb_stop(DisasContextBase *dcbase, 
CPUState *cpu)
 {
 DisasContext *dc = container_of(dcbase, DisasContext, base);
 
-if (dc->base.is_jmp == DISAS_TOO_MANY) {
+switch (dc->base.is_jmp) {
+case DISAS_NORETURN:
+break;
+case DISAS_TOO_MANY:
+case DISAS_EOB_NEXT:
+gen_update_cc_op(dc);
 gen_update_eip_cur(dc);
+/* fall through */
+case DISAS_EOB_ONLY:
 gen_eob(dc);
+break;
+case DISAS_EOB_INHIBIT_IRQ:
+gen_update_cc_op(dc);
+gen_update_eip_cur(dc);
+gen_eob_inhibit_irq(dc, true);
+break;
+default:
+g_assert_not_reached();
 }
 }
 
-- 
2.34.1




[PATCH v3 09/26] target/i386: Use DISAS_EOB_NEXT

2022-10-01 Thread Richard Henderson
Replace sequences of gen_update_cc_op, gen_update_eip_next,
and gen_eob with the new is_jmp enumerator.

Reviewed-by: Paolo Bonzini 
Signed-off-by: Richard Henderson 
---
 target/i386/tcg/translate.c | 40 -
 1 file changed, 13 insertions(+), 27 deletions(-)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 8c0ef0f212..717c978381 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -7022,8 +7022,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_pop_update(s, ot);
 set_cc_op(s, CC_OP_EFLAGS);
 /* abort translation because TF/AC flag may change */
-gen_update_eip_next(s);
-gen_eob(s);
+s->base.is_jmp = DISAS_EOB_NEXT;
 }
 break;
 case 0x9e: /* sahf */
@@ -7452,8 +7451,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_helper_rdmsr(cpu_env);
 } else {
 gen_helper_wrmsr(cpu_env);
-gen_update_eip_next(s);
-gen_eob(s);
+s->base.is_jmp = DISAS_EOB_NEXT;
 }
 }
 break;
@@ -7652,8 +7650,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 goto illegal_op;
 }
 gen_helper_clac(cpu_env);
-gen_update_eip_next(s);
-gen_eob(s);
+s->base.is_jmp = DISAS_EOB_NEXT;
 break;
 
 case 0xcb: /* stac */
@@ -7662,8 +7659,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 goto illegal_op;
 }
 gen_helper_stac(cpu_env);
-gen_update_eip_next(s);
-gen_eob(s);
+s->base.is_jmp = DISAS_EOB_NEXT;
 break;
 
 CASE_MODRM_MEM_OP(1): /* sidt */
@@ -7707,8 +7703,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 tcg_gen_trunc_tl_i32(s->tmp2_i32, cpu_regs[R_ECX]);
 gen_helper_xsetbv(cpu_env, s->tmp2_i32, s->tmp1_i64);
 /* End TB because translation flags may change.  */
-gen_update_eip_next(s);
-gen_eob(s);
+s->base.is_jmp = DISAS_EOB_NEXT;
 break;
 
 case 0xd8: /* VMRUN */
@@ -7769,8 +7764,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 }
 gen_update_cc_op(s);
 gen_helper_stgi(cpu_env);
-gen_update_eip_next(s);
-gen_eob(s);
+s->base.is_jmp = DISAS_EOB_NEXT;
 break;
 
 case 0xdd: /* CLGI */
@@ -7808,8 +7802,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 tcg_gen_ext32u_tl(s->A0, cpu_regs[R_EAX]);
 }
 gen_helper_flush_page(cpu_env, s->A0);
-gen_update_eip_next(s);
-gen_eob(s);
+s->base.is_jmp = DISAS_EOB_NEXT;
 break;
 
 CASE_MODRM_MEM_OP(2): /* lgdt */
@@ -7892,8 +7885,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 tcg_gen_andi_tl(s->T1, s->T1, ~0xe);
 tcg_gen_or_tl(s->T0, s->T0, s->T1);
 gen_helper_write_crN(cpu_env, tcg_constant_i32(0), s->T0);
-gen_update_eip_next(s);
-gen_eob(s);
+s->base.is_jmp = DISAS_EOB_NEXT;
 break;
 
 CASE_MODRM_MEM_OP(7): /* invlpg */
@@ -7903,8 +7895,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_svm_check_intercept(s, SVM_EXIT_INVLPG);
 gen_lea_modrm(env, s, modrm);
 gen_helper_flush_page(cpu_env, s->A0);
-gen_update_eip_next(s);
-gen_eob(s);
+s->base.is_jmp = DISAS_EOB_NEXT;
 break;
 
 case 0xf8: /* swapgs */
@@ -8303,8 +8294,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_svm_check_intercept(s, SVM_EXIT_WRITE_CR0 + reg);
 gen_op_mov_v_reg(s, ot, s->T0, rm);
 gen_helper_write_crN(cpu_env, tcg_constant_i32(reg), s->T0);
-gen_update_eip_next(s);
-gen_eob(s);
+s->base.is_jmp = DISAS_EOB_NEXT;
 } else {
 gen_svm_check_intercept(s, SVM_EXIT_READ_CR0 + reg);
 gen_helper_read_crN(s->T0, cpu_env, tcg_constant_i32(reg));
@@ -8338,8 +8328,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_op_mov_v_reg(s, ot, s->T0, rm);
 tcg_gen_movi_i32(s->tmp2_i32, reg);
 gen_helper_set_dr(cpu_env, s->tmp2_i32, s->T0);
-gen_update_eip_next(s);
-gen_eob(s);
+s->base.is_jmp = DISAS_EOB_NEXT;
 } else {
 gen_svm_check_intercept(s, SVM_EXIT_READ_DR0 + reg);
 tcg_gen_movi_i32(s->tmp2_i32, reg);
@@ -8353,8 +8342,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
 gen_svm_check_intercept(s, 

Re: [PATCH v9 3/4] target/riscv: smstateen check for fcsr

2022-10-01 Thread mchitale
On Thu, 2022-09-29 at 09:09 +0800, weiwei wrote:
> On 2022/9/19 14:29, Mayuresh Chitale wrote:
> > If smstateen is implemented and sstateen0.fcsr is clear, then floating
> > point operations must raise an illegal instruction exception or a
> > virtual instruction trap, as relevant.
> > 
> > Signed-off-by: Mayuresh Chitale 
> > ---
> >   target/riscv/csr.c| 23 +
> >   target/riscv/insn_trans/trans_rvf.c.inc   | 40
> > +--
> >   target/riscv/insn_trans/trans_rvzfh.c.inc | 12 +++
> >   3 files changed, 72 insertions(+), 3 deletions(-)
> > 
> > diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> > index 59d5aa74ee..edaecf53ce 100644
> > --- a/target/riscv/csr.c
> > +++ b/target/riscv/csr.c
> > @@ -84,6 +84,10 @@ static RISCVException fs(CPURISCVState *env, int
> > csrno)
> >   !RISCV_CPU(env_cpu(env))->cfg.ext_zfinx) {
> >   return RISCV_EXCP_ILLEGAL_INST;
> >   }
> > +
> > +if (!env->debugger && !riscv_cpu_fp_enabled(env)) {
> > +return smstateen_acc_ok(env, 0, SMSTATEEN0_FCSR);
> > +}
> >   #endif
> >   return RISCV_EXCP_NONE;
> >   }
> > @@ -2024,6 +2028,9 @@ static RISCVException
> > write_mstateen0(CPURISCVState *env, int csrno,
> > target_ulong new_val)
> >   {
> >   uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
> > +if (!riscv_has_ext(env, RVF)) {
> > +wr_mask |= SMSTATEEN0_FCSR;
> > +}
> >   
> >   return write_mstateen(env, csrno, wr_mask, new_val);
> >   }
> > @@ -2072,6 +2079,10 @@ static RISCVException
> > write_mstateen0h(CPURISCVState *env, int csrno,
> >   {
> >   uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
> >   
> > +if (!riscv_has_ext(env, RVF)) {
> > +wr_mask |= SMSTATEEN0_FCSR;
> > +}
> > +
> >   return write_mstateenh(env, csrno, wr_mask, new_val);
> >   }
> >   
> > @@ -2121,6 +2132,10 @@ static RISCVException
> > write_hstateen0(CPURISCVState *env, int csrno,
> >   {
> >   uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
> >   
> > +if (!riscv_has_ext(env, RVF)) {
> > +wr_mask |= SMSTATEEN0_FCSR;
> > +}
> > +
> >   return write_hstateen(env, csrno, wr_mask, new_val);
> >   }
> >   
> > @@ -2172,6 +2187,10 @@ static RISCVException
> > write_hstateen0h(CPURISCVState *env, int csrno,
> >   {
> >   uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
> >   
> > +if (!riscv_has_ext(env, RVF)) {
> > +wr_mask |= SMSTATEEN0_FCSR;
> > +}
> > +
> >   return write_hstateenh(env, csrno, wr_mask, new_val);
> >   }
> >   
> > @@ -2231,6 +2250,10 @@ static RISCVException
> > write_sstateen0(CPURISCVState *env, int csrno,
> >   {
> >   uint64_t wr_mask = SMSTATEEN_STATEEN | SMSTATEEN0_HSENVCFG;
> >   
> > +if (!riscv_has_ext(env, RVF)) {
> > +wr_mask |= SMSTATEEN0_FCSR;
> > +}
> > +
> >   return write_sstateen(env, csrno, wr_mask, new_val);
> >   }
> >   
> > diff --git a/target/riscv/insn_trans/trans_rvf.c.inc
> > b/target/riscv/insn_trans/trans_rvf.c.inc
> > index a1d3eb52ad..ce8a0cc34b 100644
> > --- a/target/riscv/insn_trans/trans_rvf.c.inc
> > +++ b/target/riscv/insn_trans/trans_rvf.c.inc
> > @@ -24,9 +24,43 @@
> >   return false; \
> >   } while (0)
> >   
> > -#define REQUIRE_ZFINX_OR_F(ctx) do {\
> > -if (!ctx->cfg_ptr->ext_zfinx) { \
> > -REQUIRE_EXT(ctx, RVF); \
> > +#ifndef CONFIG_USER_ONLY
> > +static inline bool smstateen_check(DisasContext *ctx, int index)
> > +{
> > +CPUState *cpu = ctx->cs;
> > +CPURISCVState *env = cpu->env_ptr;
> > +uint64_t stateen = env->mstateen[index];
> > +
> > +if (!ctx->cfg_ptr->ext_smstateen || env->priv == PRV_M) {
> > +return true;
> > +}
> > +
> > +if (ctx->virt_enabled) {
> > +stateen &= env->hstateen[index];
> > +}
> > +
> > +if (env->priv == PRV_U && has_ext(ctx, RVS)) {
> > +stateen &= env->sstateen[index];
> > +}
> > +
> > +if (!(stateen & SMSTATEEN0_FCSR)) {
> > +return false;
> > +}
> > +
> > +return true;
> > +}
> > +#else
> > +#define smstateen_check(ctx, index) (true)
> > +#endif
> > +
> > +#define REQUIRE_ZFINX_OR_F(ctx) do { \
> > +if (!has_ext(ctx, RVF)) { \
> > +if (!ctx->cfg_ptr->ext_zfinx) { \
> > +return false; \
> > +} \
> > +if (!smstateen_check(ctx, 0)) { \
> > +return false; \
> > +} \
> >   } \
> >   } while (0)
> 
> I think the potential exception triggered by smstateen_check is not 
> correct here:
> 
> "return false"  can only trigger illegal instruction exception.
> 
> However, smstateen_check  is for accessing fcsr CSR, It may trigger 
> illegal or  virtual instruction exception
> 
> based on the privilege mode and Xstateen CSRs.
> 
> Regards,
> 
> Weiwei Li

Ok, I need to check how to do that.
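
One possible shape for that fix, sketched here for illustration only (it
assumes the generate_exception() helper from target/riscv/translate.c, and
the precise rule for when the virtual instruction trap applies still needs
checking against the spec):

/* Sketch, not the actual patch: raise the exception type mandated by
 * Smstateen instead of returning false and falling back to the generic
 * illegal-instruction path. A trans_* caller would return true after
 * this reports failure, since the exception is already generated.
 */
static bool smstateen_fcsr_check(DisasContext *ctx, int index)
{
    CPUState *cpu = ctx->cs;
    CPURISCVState *env = cpu->env_ptr;
    uint64_t stateen = env->mstateen[index];

    if (!ctx->cfg_ptr->ext_smstateen || env->priv == PRV_M) {
        return true;
    }
    if (ctx->virt_enabled) {
        stateen &= env->hstateen[index];
    }
    if (env->priv == PRV_U && has_ext(ctx, RVS)) {
        stateen &= env->sstateen[index];
    }
    if (!(stateen & SMSTATEEN0_FCSR)) {
        /* Blocked by hstateen while mstateen allows the access: the
         * guest must see a virtual instruction trap; otherwise raise
         * an illegal instruction exception. */
        if (ctx->virt_enabled &&
            (env->mstateen[index] & SMSTATEEN0_FCSR)) {
            generate_exception(ctx, RISCV_EXCP_VIRT_INSTRUCTION_FAULT);
        } else {
            generate_exception(ctx, RISCV_EXCP_ILLEGAL_INST);
        }
        return false;
    }
    return true;
}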
> 
> >   
> > diff --git 

Re: [PATCH v2 19/23] target/i386: Use gen_jmp_rel for gen_jcc

2022-10-01 Thread Richard Henderson

On 10/1/22 00:03, Paolo Bonzini wrote:

On Sat, Oct 1, 2022 at 3:04 AM Richard Henderson
 wrote:


On 9/21/22 06:09, Paolo Bonzini wrote:

On Tue, Sep 6, 2022 at 12:09 PM Richard Henderson
 wrote:

+gen_jcc1(s, b, l1);
+gen_jmp_rel(s, ot, 0, 1);
+gen_set_label(l1);
+gen_jmp_rel(s, ot, diff, 0);


Might be worth a comment that jumps with 16-bit operand size truncate
EIP even if the jump is not taken.


Hmm.  But is that correct?  That's not reflected by the pseudocode for Jcc.


No, it's not:

int main() {
 asm("clc; data16 jc 1f; 1:");
}

does not crash (it does with stc) on real hardware, but it does with
this series applied.  So the various occurrences of gen_jmp_rel(s, ot,
0, 1) or gen_jmp_rel(s, MO_32, 0, 1) should stay as gen_jmp_tb(s,
s->pc - s->cs_base, 1).


Nice test.  I had an idea this would be the case, so I had already added a helper to 
perform the jump with truncation to the "current code size".  It turned out that I needed 
that in other places too, like rep.
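
Such a helper can be tiny; for example (a sketch — the name
gen_jmp_rel_csize and the CODE32() test are assumptions about the shape,
not a quote from the new series):

/* Sketch: jump by an immediate offset, truncating the target to the
 * current code size instead of the instruction's operand size. */
static void gen_jmp_rel_csize(DisasContext *s, int diff, int tb_num)
{
    gen_jmp_rel(s, CODE32(s) ? MO_32 : MO_16, diff, tb_num);
}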


New patch set coming up.


r~



Re: [PATCH v9 1/4] target/riscv: Add smstateen support

2022-10-01 Thread mchitale
On Thu, 2022-09-29 at 11:43 +1000, Alistair Francis wrote:
> On Thu, Sep 29, 2022 at 10:58 AM weiwei  wrote:
> > 
> > On 2022/9/19 14:29, Mayuresh Chitale wrote:
> > > The Smstateen extension specifies a mechanism to close
> > > potential covert channels that could cause security issues.
> > > 
> > > This patch adds the CSRs defined in the specification and
> > > the corresponding predicates and read/write functions.
> > > 
> > > Signed-off-by: Mayuresh Chitale 
> > > ---
> > >   target/riscv/cpu.h  |   4 +
> > >   target/riscv/cpu_bits.h |  37 
> > >   target/riscv/csr.c  | 373
> > > 
> > >   target/riscv/machine.c  |  21 +++
> > >   4 files changed, 435 insertions(+)
> > > 
> > > diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> > > index 06751e1e3e..e407abbf93 100644
> > > --- a/target/riscv/cpu.h
> > > +++ b/target/riscv/cpu.h
> > > @@ -362,6 +362,9 @@ struct CPUArchState {
> > > 
> > >   /* CSRs for execution enviornment configuration */
> > >   uint64_t menvcfg;
> > > +uint64_t mstateen[SMSTATEEN_MAX_COUNT];
> > > +uint64_t hstateen[SMSTATEEN_MAX_COUNT];
> > > +uint64_t sstateen[SMSTATEEN_MAX_COUNT];
> > >   target_ulong senvcfg;
> > >   uint64_t henvcfg;
> > >   #endif
> > > @@ -437,6 +440,7 @@ struct RISCVCPUConfig {
> > >   bool ext_ifencei;
> > >   bool ext_icsr;
> > >   bool ext_zihintpause;
> > > +bool ext_smstateen;
> > >   bool ext_sstc;
> > >   bool ext_svinval;
> > >   bool ext_svnapot;
> > > diff --git a/target/riscv/cpu_bits.h b/target/riscv/cpu_bits.h
> > > index 7be12cac2e..9a3321e27c 100644
> > > --- a/target/riscv/cpu_bits.h
> > > +++ b/target/riscv/cpu_bits.h
> > > @@ -199,6 +199,12 @@
> > >   /* Supervisor Configuration CSRs */
> > >   #define CSR_SENVCFG 0x10A
> > > 
> > > +/* Supervisor state CSRs */
> > > +#define CSR_SSTATEEN0   0x10C
> > > +#define CSR_SSTATEEN1   0x10D
> > > +#define CSR_SSTATEEN2   0x10E
> > > +#define CSR_SSTATEEN3   0x10F
> > > +
> > >   /* Supervisor Trap Handling */
> > >   #define CSR_SSCRATCH0x140
> > >   #define CSR_SEPC0x141
> > > @@ -246,6 +252,16 @@
> > >   #define CSR_HENVCFG 0x60A
> > >   #define CSR_HENVCFGH0x61A
> > > 
> > > +/* Hypervisor state CSRs */
> > > +#define CSR_HSTATEEN0   0x60C
> > > +#define CSR_HSTATEEN0H  0x61C
> > > +#define CSR_HSTATEEN1   0x60D
> > > +#define CSR_HSTATEEN1H  0x61D
> > > +#define CSR_HSTATEEN2   0x60E
> > > +#define CSR_HSTATEEN2H  0x61E
> > > +#define CSR_HSTATEEN3   0x60F
> > > +#define CSR_HSTATEEN3H  0x61F
> > > +
> > >   /* Virtual CSRs */
> > >   #define CSR_VSSTATUS0x200
> > >   #define CSR_VSIE0x204
> > > @@ -291,6 +307,27 @@
> > >   #define CSR_MENVCFG 0x30A
> > >   #define CSR_MENVCFGH0x31A
> > > 
> > > +/* Machine state CSRs */
> > > +#define CSR_MSTATEEN0   0x30C
> > > +#define CSR_MSTATEEN0H  0x31C
> > > +#define CSR_MSTATEEN1   0x30D
> > > +#define CSR_MSTATEEN1H  0x31D
> > > +#define CSR_MSTATEEN2   0x30E
> > > +#define CSR_MSTATEEN2H  0x31E
> > > +#define CSR_MSTATEEN3   0x30F
> > > +#define CSR_MSTATEEN3H  0x31F
> > > +
> > > +/* Common defines for all smstateen */
> > > +#define SMSTATEEN_MAX_COUNT 4
> > > +#define SMSTATEEN0_CS   (1ULL << 0)
> > > +#define SMSTATEEN0_FCSR (1ULL << 1)
> > > +#define SMSTATEEN0_HSCONTXT (1ULL << 57)
> > > +#define SMSTATEEN0_IMSIC(1ULL << 58)
> > > +#define SMSTATEEN0_AIA  (1ULL << 59)
> > > +#define SMSTATEEN0_SVSLCT   (1ULL << 60)
> > > +#define SMSTATEEN0_HSENVCFG (1ULL << 62)
> > > +#define SMSTATEEN_STATEEN   (1ULL << 63)
> > > +
> > >   /* Enhanced Physical Memory Protection (ePMP) */
> > >   #define CSR_MSECCFG 0x747
> > >   #define CSR_MSECCFGH0x757
> > > diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> > > index b96db1b62b..bbfdd49abd 100644
> > > --- a/target/riscv/csr.c
> > > +++ b/target/riscv/csr.c
> > > @@ -278,6 +278,72 @@ static RISCVException umode32(CPURISCVState
> > > *env, int csrno)
> > >   return umode(env, csrno);
> > >   }
> > > 
> > > +static RISCVException mstateen(CPURISCVState *env, int csrno)
> > > +{
> > > +CPUState *cs = env_cpu(env);
> > > +RISCVCPU *cpu = RISCV_CPU(cs);
> > > +
> > > +if (!cpu->cfg.ext_smstateen) {
> > > +return RISCV_EXCP_ILLEGAL_INST;
> > > +}
> > > +
> > > +return any(env, csrno);
> > > +}
> > > +
> > > +static RISCVException hstateen_pred(CPURISCVState *env, int
> > > csrno, int base)
> > > +{
> > > +CPUState *cs = env_cpu(env);
> > > +RISCVCPU *cpu = RISCV_CPU(cs);
> > > +
> > > +if (!cpu->cfg.ext_smstateen) {
> > > +return RISCV_EXCP_ILLEGAL_INST;
> > > +}
> > > +
> > > +if (env->priv < PRV_M) {
> > > +if (!(env->mstateen[csrno - base] & SMSTATEEN_STATEEN))
> > > {
> > > +return RISCV_EXCP_ILLEGAL_INST;
> > > +}
> 

Re: [PATCH 5/6] rx: re-randomize rng-seed on reboot

2022-10-01 Thread Yoshinori Sato
On Fri, 30 Sep 2022 08:23:38 +0900,
Jason A. Donenfeld wrote:
> 
> When the system reboots, the rng-seed that the FDT has should be
> re-randomized, so that the new boot gets a new seed. Since the FDT is in
> the ROM region at this point, we add a hook right after the ROM has been
> added, so that we have a pointer to that copy of the FDT.
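
For reference, the hook registered in the diff below boils down to walking
the FDT and refreshing every "rng-seed" property in place; a sketch of
such a helper (the actual qemu_fdt_randomize_seeds() body may differ):

/* Needs <libfdt.h> and "qemu/guest-random.h". Refresh every
 * "rng-seed" property in the given FDT blob in place. */
void qemu_fdt_randomize_seeds(void *fdt)
{
    int noffset, poffset, len;
    const char *name;
    uint8_t *data;

    for (noffset = fdt_next_node(fdt, 0, NULL);
         noffset >= 0;
         noffset = fdt_next_node(fdt, noffset, NULL)) {
        for (poffset = fdt_first_property_offset(fdt, noffset);
             poffset >= 0;
             poffset = fdt_next_property_offset(fdt, poffset)) {
            data = (uint8_t *)fdt_getprop_by_offset(fdt, poffset,
                                                    &name, &len);
            if (!data || strcmp(name, "rng-seed") != 0) {
                continue;
            }
            qemu_guest_getrandom_nofail(data, len);
        }
    }
}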
> 
> Cc: Yoshinori Sato 
> Signed-off-by: Jason A. Donenfeld 
> ---
>  hw/rx/rx-gdbsim.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/hw/rx/rx-gdbsim.c b/hw/rx/rx-gdbsim.c
> index 8ffe1b8035..198d048964 100644
> --- a/hw/rx/rx-gdbsim.c
> +++ b/hw/rx/rx-gdbsim.c
> @@ -25,6 +25,7 @@
>  #include "hw/rx/rx62n.h"
>  #include "sysemu/qtest.h"
>  #include "sysemu/device_tree.h"
> +#include "sysemu/reset.h"
>  #include "hw/boards.h"
>  #include "qom/object.h"
>  
> @@ -148,6 +149,8 @@ static void rx_gdbsim_init(MachineState *machine)
>  dtb_offset = ROUND_DOWN(machine->ram_size - dtb_size, 16);
>  rom_add_blob_fixed("dtb", dtb, dtb_size,
> SDRAM_BASE + dtb_offset);
> +qemu_register_reset(qemu_fdt_randomize_seeds,
> +rom_ptr(SDRAM_BASE + dtb_offset, dtb_size));
>  /* Set dtb address to R1 */
>  RX_CPU(first_cpu)->env.regs[1] = SDRAM_BASE + dtb_offset;
>  }
> -- 
> 2.37.3
> 

Reviewed-by: Yoshinori Sato 

-- 
Yoshinori Sato



Re: [PATCH v4 26/54] fsdev/virtfs-proxy-helper: Use g_mkdir()

2022-10-01 Thread Christian Schoenebeck
On Saturday, 1 October 2022 05:48:18 CEST Bin Meng wrote:
> Hi Christian,
> 
> On Tue, Sep 27, 2022 at 7:07 PM Bin Meng  wrote:
> > From: Bin Meng 
> > 
> > Use g_mkdir() to create a directory on all platforms.
> > 
> > Signed-off-by: Bin Meng 
> > Reviewed-by: Christian Schoenebeck 
> > ---
> > 
> > (no changes since v2)
> > 
> > Changes in v2:
> > - Change to use g_mkdir()
> > 
> >  fsdev/virtfs-proxy-helper.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
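
(The change is presumably a one-liner of this shape — a sketch, not the
actual hunk from the patch; "path" and "mode" are illustrative names:)

#include <glib/gstdio.h>

/* Before (POSIX only):  mkdir(path, mode);
 * After  (portable):    g_mkdir(path, mode);
 * GLib maps g_mkdir() to the native mkdir on each platform; on
 * Windows the mode argument is ignored. */
if (g_mkdir(path, mode) < 0) {
    return -errno;   /* illustrative error handling */
}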
> 
> Would you pick up this patch in your queue?
> 
> Regards,
> Bin

Queued on 9p.next:
https://github.com/cschoenebeck/qemu/commits/9p.next

Thanks!

Note that I currently don't have much in my queue yet, so it will probably
take at least a week or two before I send the next PR.

Also note that I plan more refactoring of the 9p tests in the coming days,
so if you have any 9p test changes planned, it's better to wait for my next
PR to avoid conflicts.

Best regards,
Christian Schoenebeck





Re: [PATCH 11/12] audio: fix sw->buf size for audio recording

2022-10-01 Thread Volker Rümelin

On 27.09.22 at 13:54, Marc-André Lureau wrote:



On Fri, Sep 23, 2022 at 10:48 PM Volker Rümelin  
wrote:


The calculation of the buffer size needed to store audio samples
after resampling is wrong for audio recording. For audio recording
sw->ratio is calculated as

sw->ratio = frontend sample rate / backend sample rate.

From this follows

frontend samples = frontend sample rate / backend sample rate
 * backend samples
frontend samples = sw->ratio * backend samples

In 2 of 3 places in the audio recording code where sw->ratio
is used in a calculation to get the number of frontend frames,
the calculation is wrong. Fix this. The 3rd formula in
audio_pcm_sw_read() is correct.
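
Spelled out as code, the recording-direction conversion looks like this
(an illustrative sketch; the names and the rounding choice are not the
actual QEMU internals, which work with a fixed-point ratio):

#include <stddef.h>
#include <stdint.h>

/* With ratio = frontend_rate / backend_rate, the number of frontend
 * frames produced from captured backend frames is
 * backend_frames * ratio. Round up so the destination buffer is
 * never too small. */
static size_t frontend_frames(size_t backend_frames,
                              unsigned frontend_rate,
                              unsigned backend_rate)
{
    return ((uint64_t)backend_frames * frontend_rate + backend_rate - 1)
           / backend_rate;
}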

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/71
Signed-off-by: Volker Rümelin 


Would you mind adding the test to qtest?

lgtm
Acked-by: Marc-André Lureau 



Hi Marc-André,

I will give it a try, but it will be a separate patch: the test case from
issue #71 now triggers the error reported at
https://lists.nongnu.org/archive/html/qemu-devel/2022-09/msg02347.html
rather than the original issue #71 error.


With best regards,
Volker



Re: [PATCH v2] tests/9p: split virtio-9p-test.c into tests and 9p client part

2022-10-01 Thread Christian Schoenebeck
On Thursday, 29 September 2022 13:41:06 CEST Christian Schoenebeck wrote:
> This patch is pure refactoring, it does not change behaviour.
> 
> virtio-9p-test.c grew to 1657 lines. Let's split this file up between
> actual 9p test cases vs. 9p test client, to make it easier to
> concentrate on the actual 9p tests.
> 
> Move the 9p test client code to a new unit virtio-9p-client.c, which
> are basically all functions and types prefixed with v9fs_* already.
> 
> Note that some client wrapper functions (do_*) are preserved in
> virtio-9p-test.c, simply because these wrapper functions are going to
> be wiped with subsequent patches anyway.
> 
> As the global QGuestAllocator variable is moved to virtio-9p-client.c,
> add a new function v9fs_set_allocator() to be used by virtio-9p-test.c
> instead of fiddling with a global variable across units and libraries.
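
(The new setter is presumably as small as this — a sketch, not the actual
hunk:)

/* In virtio-9p-client.c -- sketch of the allocator setter. */
static QGuestAllocator *alloc;

void v9fs_set_allocator(QGuestAllocator *new_alloc)
{
    alloc = new_alloc;
}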
> 
> Signed-off-by: Christian Schoenebeck 
> Reviewed-by: Greg Kurz 
> ---

Queued on 9p.next:
https://github.com/cschoenebeck/qemu/commits/9p.next

Thanks!

Best regards,
Christian Schoenebeck

> 
> v1 -> v2:
>   - Move osdep.h include from virtio-9p-client.h to virtio-9p-client.c.
> 
>  tests/qtest/libqos/meson.build|   1 +
>  tests/qtest/libqos/virtio-9p-client.c | 684 +++
>  tests/qtest/libqos/virtio-9p-client.h | 138 +
>  tests/qtest/virtio-9p-test.c  | 770 +-
>  4 files changed, 849 insertions(+), 744 deletions(-)
>  create mode 100644 tests/qtest/libqos/virtio-9p-client.c
>  create mode 100644 tests/qtest/libqos/virtio-9p-client.h






Re: [PATCH v2 19/23] target/i386: Use gen_jmp_rel for gen_jcc

2022-10-01 Thread Paolo Bonzini
On Sat, Oct 1, 2022 at 3:04 AM Richard Henderson
 wrote:
>
> On 9/21/22 06:09, Paolo Bonzini wrote:
> > On Tue, Sep 6, 2022 at 12:09 PM Richard Henderson
> >  wrote:
> > > +gen_jcc1(s, b, l1);
> > > +gen_jmp_rel(s, ot, 0, 1);
> > > +gen_set_label(l1);
> > > +gen_jmp_rel(s, ot, diff, 0);
> >
> > Might be worth a comment that jumps with 16-bit operand size truncate
> > EIP even if the jump is not taken.
>
> Hmm.  But is that correct?  That's not reflected by the pseudocode for Jcc.

No, it's not:

int main() {
asm("clc; data16 jc 1f; 1:");
}

does not crash (it does with stc) on real hardware, but it does with
this series applied.  So the various occurrences of gen_jmp_rel(s, ot,
0, 1) or gen_jmp_rel(s, MO_32, 0, 1) should stay as gen_jmp_tb(s,
s->pc - s->cs_base, 1).

Paolo




Re: [PATCH v2 17/23] target/i386: Create gen_jmp_rel

2022-10-01 Thread Paolo Bonzini
On Sat, Oct 1, 2022 at 2:53 AM Richard Henderson
 wrote:
> I believe it really should be s->dflag, which makes all users of the function 
> pass dflag
> (the manual consistently talks about "operand size").  At which point this 
> parameter goes
> away and gen_jmp_rel grabs the operand size from DisasContext.
>
> Also, pre-existing bug vs CODE64 here -- operand size is always 64-bits for 
> near jumps.

Yes, sounds good.

Paolo
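
To make the CODE64 point above concrete: with the operand size taken from
DisasContext, the truncation inside gen_jmp_rel() could look like this (a
sketch under the assumptions discussed, not the committed patch):

/* Compute the target EIP, truncating per the effective operand size;
 * in 64-bit mode near jumps always use the full 64-bit target. */
static void gen_jmp_rel(DisasContext *s, MemOp ot, int diff, int tb_num)
{
    target_ulong dest = s->pc - s->cs_base + diff;

    if (!CODE64(s)) {
        if (ot == MO_16) {
            dest &= 0xffff;
        } else {
            dest &= 0xffffffff;
        }
    }
    /* ... then store cpu_eip and goto_tb/exit as appropriate ... */
}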