Re: [PATCH 4/9] target/arm: Support migration when FPSR/FPCR won't fit in the FPSCR

2024-06-28 Thread Peter Maydell
On Fri, 28 Jun 2024 at 17:01, Richard Henderson wrote:
>
> On 6/28/24 07:23, Peter Maydell wrote:
> > To support FPSR and FPCR bits that don't exist in the AArch32 FPSCR
> > view of floating point control and status (such as the FEAT_AFP ones),
> > we need to make sure those bits can be migrated. This commit allows
> > that, whilst maintaining backwards and forwards migration compatibility
> > for CPUs where there are no such bits:
> >
> > On sending:
> >   * If either the FPCR or the FPSR include set bits that are not
> > visible in the AArch32 FPSCR view of floating point control/status
> > then we send the FPCR and FPSR as two separate fields in a new
> > cpu/vfp/fpcr_fpsr subsection, and we send a 0 for the old
> > FPSCR field in cpu/vfp
> >   * Otherwise, we don't send the fpcr_fpsr subsection, and we send
> > an FPSCR-format value in cpu/vfp as we did previously
> >
> > On receiving:
> >   * if we see a non-zero FPSCR field, that is the right information
> >   * if we see a fpcr_fpsr subsection then that has the information
> >   * if we see neither, then FPSCR/FPCR/FPSR are all zero on the source;
> > cpu_pre_load() ensures the CPU state defaults to that
> >   * if we see both, then the migration source is buggy or malicious;
> > either the fpcr_fpsr or the FPSCR will "win" depending which
> > is first in the migration stream; we don't care which that is
> >
> > We make the new FPCR and FPSR on-the-wire data be 64 bits, because
> > architecturally these registers are that wide, and this avoids the
> > need to engage in further migration-compatibility contortions in
> > future if some new architecture revision defines bits in the high
> > half of either register.
> >
> > (We won't ever send the new migration subsection until we add support
> > for a CPU feature which enables setting overlapping FPCR bits, like
> > FEAT_AFP.)
> >
> > Signed-off-by: Peter Maydell
> > ---
> >   target/arm/machine.c | 134 ++-
> >   1 file changed, 132 insertions(+), 2 deletions(-)
>
> Reviewed-by: Richard Henderson 
>
> Not ideal, as vfp_get_{fpcr,fpsr} are called 3 or 4 times during migration.
> But unless we have separate 'fp*r_migrate' fields in cpu state, initialized
> in pre_save, there's no getting around it.  And I suppose migration isn't
> exactly performance critical.

Yeah, we could have done it that way, but I am assuming that
the time taken for this is pretty minuscule in the general
scheme of how long migration takes, so I preferred the
way that doesn't clutter up the CPU state struct with
migration-only fields.

If somebody cares about migration downtime performance (which
does actually matter for some workload/use cases AIUI) they
can do some benchmarking and tell us what the actually
slow parts are :-)

thanks
-- PMM



Re: [PATCH 4/9] target/arm: Support migration when FPSR/FPCR won't fit in the FPSCR

2024-06-28 Thread Richard Henderson

On 6/28/24 07:23, Peter Maydell wrote:

> To support FPSR and FPCR bits that don't exist in the AArch32 FPSCR
> view of floating point control and status (such as the FEAT_AFP ones),
> we need to make sure those bits can be migrated. This commit allows
> that, whilst maintaining backwards and forwards migration compatibility
> for CPUs where there are no such bits:
>
> On sending:
>   * If either the FPCR or the FPSR include set bits that are not
>     visible in the AArch32 FPSCR view of floating point control/status
>     then we send the FPCR and FPSR as two separate fields in a new
>     cpu/vfp/fpcr_fpsr subsection, and we send a 0 for the old
>     FPSCR field in cpu/vfp
>   * Otherwise, we don't send the fpcr_fpsr subsection, and we send
>     an FPSCR-format value in cpu/vfp as we did previously
>
> On receiving:
>   * if we see a non-zero FPSCR field, that is the right information
>   * if we see a fpcr_fpsr subsection then that has the information
>   * if we see neither, then FPSCR/FPCR/FPSR are all zero on the source;
>     cpu_pre_load() ensures the CPU state defaults to that
>   * if we see both, then the migration source is buggy or malicious;
>     either the fpcr_fpsr or the FPSCR will "win" depending which
>     is first in the migration stream; we don't care which that is
>
> We make the new FPCR and FPSR on-the-wire data be 64 bits, because
> architecturally these registers are that wide, and this avoids the
> need to engage in further migration-compatibility contortions in
> future if some new architecture revision defines bits in the high
> half of either register.
>
> (We won't ever send the new migration subsection until we add support
> for a CPU feature which enables setting overlapping FPCR bits, like
> FEAT_AFP.)
>
> Signed-off-by: Peter Maydell
> ---
>   target/arm/machine.c | 134 ++-
>   1 file changed, 132 insertions(+), 2 deletions(-)


Reviewed-by: Richard Henderson 

Not ideal, as vfp_get_{fpcr,fpsr} are called 3 or 4 times during migration.  But unless we 
have separate 'fp*r_migrate' fields in cpu state, initialized in pre_save, there's no 
getting around it.  And I suppose migration isn't exactly performance critical.



r~



[PATCH 4/9] target/arm: Support migration when FPSR/FPCR won't fit in the FPSCR

2024-06-28 Thread Peter Maydell
To support FPSR and FPCR bits that don't exist in the AArch32 FPSCR
view of floating point control and status (such as the FEAT_AFP ones),
we need to make sure those bits can be migrated. This commit allows
that, whilst maintaining backwards and forwards migration compatibility
for CPUs where there are no such bits:

On sending:
 * If either the FPCR or the FPSR include set bits that are not
   visible in the AArch32 FPSCR view of floating point control/status
   then we send the FPCR and FPSR as two separate fields in a new
   cpu/vfp/fpcr_fpsr subsection, and we send a 0 for the old
   FPSCR field in cpu/vfp
 * Otherwise, we don't send the fpcr_fpsr subsection, and we send
   an FPSCR-format value in cpu/vfp as we did previously

On receiving:
 * if we see a non-zero FPSCR field, that is the right information
 * if we see a fpcr_fpsr subsection then that has the information
 * if we see neither, then FPSCR/FPCR/FPSR are all zero on the source;
   cpu_pre_load() ensures the CPU state defaults to that
 * if we see both, then the migration source is buggy or malicious;
   either the fpcr_fpsr or the FPSCR will "win" depending which
   is first in the migration stream; we don't care which that is

We make the new FPCR and FPSR on-the-wire data be 64 bits, because
architecturally these registers are that wide, and this avoids the
need to engage in further migration-compatibility contortions in
future if some new architecture revision defines bits in the high
half of either register.

(We won't ever send the new migration subsection until we add support
for a CPU feature which enables setting overlapping FPCR bits, like
FEAT_AFP.)

Signed-off-by: Peter Maydell 
---
 target/arm/machine.c | 134 ++-
 1 file changed, 132 insertions(+), 2 deletions(-)
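
For readers who want the sending/receiving rules from the commit message
in one place, here is a rough standalone model of them. It is only a
sketch and not the code in this patch: the demo_* names, the
demo_stream struct and the DEMO_*_MASK values are all made up for
illustration (the real masks are FPCR_MASK and FPSR_MASK in the QEMU
sources), and the "both present" case is resolved in favour of the
subsection here purely for simplicity, whereas the real result depends
on stream order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative masks only: bits assumed visible via the AArch32 FPSCR. */
#define DEMO_FPCR_MASK 0x07ff9f00u
#define DEMO_FPSR_MASK 0xf800009fu

/* Hypothetical model of the CPU's FP control/status state. */
struct demo_fp_state {
    uint64_t fpcr;
    uint64_t fpsr;
};

/* Hypothetical model of what ends up in the migration stream. */
struct demo_stream {
    uint32_t fpscr;       /* legacy cpu/vfp FPSCR field, always sent */
    bool has_subsection;  /* true if cpu/vfp/fpcr_fpsr was sent */
    uint64_t fpcr;        /* only meaningful if has_subsection */
    uint64_t fpsr;        /* only meaningful if has_subsection */
};

/* True if some set bit cannot be expressed in the FPSCR format. */
static bool demo_subsection_needed(const struct demo_fp_state *s)
{
    return (s->fpcr & ~(uint64_t)DEMO_FPCR_MASK) ||
           (s->fpsr & ~(uint64_t)DEMO_FPSR_MASK);
}

/* Sending side: either the subsection plus a zero FPSCR, or FPSCR only. */
static struct demo_stream demo_send(const struct demo_fp_state *s)
{
    struct demo_stream out = { 0, false, 0, 0 };

    if (demo_subsection_needed(s)) {
        out.has_subsection = true;
        out.fpcr = s->fpcr;
        out.fpsr = s->fpsr;
    } else {
        /* All set bits fit, so the two registers merge losslessly. */
        out.fpscr = (uint32_t)(s->fpcr | s->fpsr);
    }
    return out;
}

/* Receiving side: start from zero, then apply whatever was sent. */
static struct demo_fp_state demo_receive(const struct demo_stream *in)
{
    struct demo_fp_state s = { 0, 0 };  /* stands in for the pre-load default */

    if (in->fpscr) {
        s.fpcr = in->fpscr & DEMO_FPCR_MASK;
        s.fpsr = in->fpscr & DEMO_FPSR_MASK;
    }
    if (in->has_subsection) {
        s.fpcr = in->fpcr;
        s.fpsr = in->fpsr;
    }
    return s;
}
```

Feeding demo_send() a state whose bits all fit in the FPSCR view gives
a stream with no subsection, which is what an older QEMU expects;
setting any bit outside the masks flips it to the subsection form with
a zero FPSCR field.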

diff --git a/target/arm/machine.c b/target/arm/machine.c
index 0a722ca7e75..8c820955d95 100644
--- a/target/arm/machine.c
+++ b/target/arm/machine.c
@@ -18,6 +18,34 @@ static bool vfp_needed(void *opaque)
 : cpu_isar_feature(aa32_vfp_simd, cpu));
 }
 
+static bool vfp_fpcr_fpsr_needed(void *opaque)
+{
+/*
+ * If either the FPCR or the FPSR include set bits that are not
+ * visible in the AArch32 FPSCR view of floating point control/status
+ * then we must send the FPCR and FPSR as two separate fields in the
+ * cpu/vfp/fpcr_fpsr subsection, and we will send a 0 for the old
+ * FPSCR field in cpu/vfp.
+ *
+ * If all the set bits are representable in an AArch32 FPSCR then we
+ * send that value as the cpu/vfp FPSCR field, and don't send the
+ * cpu/vfp/fpcr_fpsr subsection.
+ *
+ * On incoming migration, if the cpu/vfp FPSCR field is non-zero we
+ * use it, and if the fpcr_fpsr subsection is present we use that.
+ * (The subsection will never be present with a non-zero FPSCR field,
+ * and if FPSCR is zero and the subsection is not present that means
+ * that FPSCR/FPSR/FPCR are zero.)
+ *
+ * This preserves migration compatibility with older QEMU versions,
+ * in both directions.
+ */
+ARMCPU *cpu = opaque;
+CPUARMState *env = &cpu->env;
+
+return (vfp_get_fpcr(env) & ~FPCR_MASK) || (vfp_get_fpsr(env) & ~FPSR_MASK);
+}
+
 static int get_fpscr(QEMUFile *f, void *opaque, size_t size,
  const VMStateField *field)
 {
@@ -25,7 +53,10 @@ static int get_fpscr(QEMUFile *f, void *opaque, size_t size,
 CPUARMState *env = &cpu->env;
 uint32_t val = qemu_get_be32(f);
 
-vfp_set_fpscr(env, val);
+if (val) {
+/* 0 means we might have the data in the fpcr_fpsr subsection */
+vfp_set_fpscr(env, val);
+}
 return 0;
 }
 
@@ -34,8 +65,9 @@ static int put_fpscr(QEMUFile *f, void *opaque, size_t size,
 {
 ARMCPU *cpu = opaque;
 CPUARMState *env = &cpu->env;
+uint32_t fpscr = vfp_fpcr_fpsr_needed(opaque) ? 0 : vfp_get_fpscr(env);
 
-qemu_put_be32(f, vfp_get_fpscr(env));
+qemu_put_be32(f, fpscr);
 return 0;
 }
 
@@ -45,6 +77,86 @@ static const VMStateInfo vmstate_fpscr = {
 .put = put_fpscr,
 };
 
+static int get_fpcr(QEMUFile *f, void *opaque, size_t size,
+ const VMStateField *field)
+{
+ARMCPU *cpu = opaque;
+CPUARMState *env = &cpu->env;
+uint64_t val = qemu_get_be64(f);
+
+vfp_set_fpcr(env, val);
+return 0;
+}
+
+static int put_fpcr(QEMUFile *f, void *opaque, size_t size,
+ const VMStateField *field, JSONWriter *vmdesc)
+{
+ARMCPU *cpu = opaque;
+CPUARMState *env = &cpu->env;
+
+qemu_put_be64(f, vfp_get_fpcr(env));
+return 0;
+}
+
+static const VMStateInfo vmstate_fpcr = {
+.name = "fpcr",
+.get = get_fpcr,
+.put = put_fpcr,
+};
+
+static int get_fpsr(QEMUFile *f, void *opaque, size_t size,
+ const VMStateField *field)
+{
+ARMCPU *cpu = opaque;
+CPUARMState *env = &cpu->env;
+uint64_t val = qemu_get_be64(f);
+
+vfp_set_fpsr(env, val);
+return 0;
+}
+
+static