Re: [Qemu-devel] [RFC PATCH 1/6] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access

2018-12-11 Thread Mark Cave-Ayland
On 10/12/2018 18:43, Richard Henderson wrote:

> On 12/7/18 2:56 AM, Mark Cave-Ayland wrote:
>> -gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env, 
>>   \
>> - cpu_fpr[rA(ctx->opcode)],  
>>   \
>> - cpu_fpr[rC(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);   
>>   \
>> +get_fpr(t0, rA(ctx->opcode));   
>>   \
>> +get_fpr(t1, rC(ctx->opcode));   
>>   \
>> +get_fpr(t2, rB(ctx->opcode));   
>>   \
>> +gen_helper_f##op(t3, cpu_env, t0, t1, t2);  
>>   \
>> +set_fpr(rD(ctx->opcode), t3);   
>>   \
>>  if (isfloat) {  
>>   \
>> -gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,  
>>   \
>> -cpu_fpr[rD(ctx->opcode)]);  
>>   \
>> +get_fpr(t0, rD(ctx->opcode));   
>>   \
>> +gen_helper_frsp(t3, cpu_env, t0);   
>>   \
>> +set_fpr(rD(ctx->opcode), t3);   
>>   \
>>  }   
>>   \
> 
> This is an accurate conversion, but the writeback to the rD register need not
> happen until after helper_frsp.  Just move it below the isfloat block.

Okay - I'll fix this up in the next iteration.

> I do see that helper_frsp can raise an exception for invalid_op for SNaN.  If
> that code were actually reachable, this would have been an existing bug, in
> that we should not have written back to rD after the first operation.  
> However,
> any SNaN will already have been eliminated by the first operation (via
> squashing to QNaN or by exiting via exception).
> 
> Similarly in GEN_FLOAT_AB.

Noted.

>> +get_fpr(t0, rB(ctx->opcode));
>> +gen_helper_frsqrte(t1, cpu_env, t0);
>> +set_fpr(rD(ctx->opcode), t1);
>> +gen_helper_frsp(t1, cpu_env, t1);
>> +gen_compute_fprf_float64(t1);
> 
> gen_frsqrtes has the set_fpr in the wrong place.  Likewise gen_fsqrts.

Ooops. I remember having a think a bit harder about these two functions, so 
I'll go
and take another look.


ATB,

Mark.



Re: [Qemu-devel] [RFC PATCH 1/6] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access

2018-12-11 Thread Mark Cave-Ayland
On 10/12/2018 05:17, David Gibson wrote:

> On Fri, Dec 07, 2018 at 08:56:30AM +, Mark Cave-Ayland wrote:
>> These helpers allow us to move FP register values to/from the specified 
>> TCGv_i64
>> argument.
>>
>> To prevent FP helpers accessing the cpu_fpr array directly, add extra TCG
>> temporaries as required.
> 
> It's not obvious to me why that's a desirable thing.  I'm assuming
> it's somehow necessary for the stuff later in the series, but I think
> we need a brief rationale here to explain why this isn't just adding
> extra reg copies for the sake of it.

Ah okay. It's because the VSX registers extend the set of VMX and FPR registers 
(see
the cpu_vsrh() and cpu_vsrl() helpers at the top of
target/ppc/translate/vsx-impl.inc.c). So in order to remove the static TCGv_i64
arrays then these helpers must also be converted.


ATB,

Mark.



Re: [Qemu-devel] [RFC PATCH 1/6] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access

2018-12-10 Thread David Gibson
On Mon, Dec 10, 2018 at 12:25:26PM -0600, Richard Henderson wrote:
> On 12/9/18 11:17 PM, David Gibson wrote:
> > On Fri, Dec 07, 2018 at 08:56:30AM +, Mark Cave-Ayland wrote:
> >> These helpers allow us to move FP register values to/from the specified 
> >> TCGv_i64
> >> argument.
> >>
> >> To prevent FP helpers accessing the cpu_fpr array directly, add extra TCG
> >> temporaries as required.
> > 
> > It's not obvious to me why that's a desirable thing.  I'm assuming
> > it's somehow necessary for the stuff later in the series, but I think
> > we need a brief rationale here to explain why this isn't just adding
> > extra reg copies for the sake of it.
> 
> Note that while this introduces extra opcodes, in many cases it does not 
> change
> the number of machine instructions that are generated.  Recall that accessing
> cpu_fpr[N] implies a load from env.  This change makes the load explicit.

I realised that a bit later in looking at the series.  I think a
paraphrasing of the above in the commit message would still be
helpful.

> The change does currently prevent caching cpu_fpr[N] in a host register.  That
> can and will be fixed by optimizing on memory operations instead.  (There is a
> patch that has been outstanding for 13 months to do this.  I intend to finally
> get around to merging it during the 4.0 cycle.)
> 
> 
> r~
> 

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


signature.asc
Description: PGP signature


Re: [Qemu-devel] [RFC PATCH 1/6] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access

2018-12-10 Thread Richard Henderson
On 12/7/18 2:56 AM, Mark Cave-Ayland wrote:
> -gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env,  
>  \
> - cpu_fpr[rA(ctx->opcode)],   
>  \
> - cpu_fpr[rC(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);
>  \
> +get_fpr(t0, rA(ctx->opcode));
>  \
> +get_fpr(t1, rC(ctx->opcode));
>  \
> +get_fpr(t2, rB(ctx->opcode));
>  \
> +gen_helper_f##op(t3, cpu_env, t0, t1, t2);   
>  \
> +set_fpr(rD(ctx->opcode), t3);
>  \
>  if (isfloat) {   
>  \
> -gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,   
>  \
> -cpu_fpr[rD(ctx->opcode)]);   
>  \
> +get_fpr(t0, rD(ctx->opcode));
>  \
> +gen_helper_frsp(t3, cpu_env, t0);
>  \
> +set_fpr(rD(ctx->opcode), t3);
>  \
>  }
>  \

This is an accurate conversion, but the writeback to the rD register need not
happen until after helper_frsp.  Just move it below the isfloat block.

I do see that helper_frsp can raise an exception for invalid_op for SNaN.  If
that code were actually reachable, this would have been an existing bug, in
that we should not have written back to rD after the first operation.  However,
any SNaN will already have been eliminated by the first operation (via
squashing to QNaN or by exiting via exception).

Similarly in GEN_FLOAT_AB.

> +get_fpr(t0, rB(ctx->opcode));
> +gen_helper_frsqrte(t1, cpu_env, t0);
> +set_fpr(rD(ctx->opcode), t1);
> +gen_helper_frsp(t1, cpu_env, t1);
> +gen_compute_fprf_float64(t1);

gen_frsqrtes has the set_fpr in the wrong place.  Likewise gen_fsqrts.


r~



Re: [Qemu-devel] [RFC PATCH 1/6] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access

2018-12-10 Thread Richard Henderson
On 12/9/18 11:17 PM, David Gibson wrote:
> On Fri, Dec 07, 2018 at 08:56:30AM +, Mark Cave-Ayland wrote:
>> These helpers allow us to move FP register values to/from the specified 
>> TCGv_i64
>> argument.
>>
>> To prevent FP helpers accessing the cpu_fpr array directly, add extra TCG
>> temporaries as required.
> 
> It's not obvious to me why that's a desirable thing.  I'm assuming
> it's somehow necessary for the stuff later in the series, but I think
> we need a brief rationale here to explain why this isn't just adding
> extra reg copies for the sake of it.

Note that while this introduces extra opcodes, in many cases it does not change
the number of machine instructions that are generated.  Recall that accessing
cpu_fpr[N] implies a load from env.  This change makes the load explicit.

The change does currently prevent caching cpu_fpr[N] in a host register.  That
can and will be fixed by optimizing on memory operations instead.  (There is a
patch that has been outstanding for 13 months to do this.  I intend to finally
get around to merging it during the 4.0 cycle.)


r~



Re: [Qemu-devel] [RFC PATCH 1/6] target/ppc: introduce get_fpr() and set_fpr() helpers for FP register access

2018-12-09 Thread David Gibson
On Fri, Dec 07, 2018 at 08:56:30AM +, Mark Cave-Ayland wrote:
> These helpers allow us to move FP register values to/from the specified 
> TCGv_i64
> argument.
> 
> To prevent FP helpers accessing the cpu_fpr array directly, add extra TCG
> temporaries as required.

It's not obvious to me why that's a desirable thing.  I'm assuming
it's somehow necessary for the stuff later in the series, but I think
we need a brief rationale here to explain why this isn't just adding
extra reg copies for the sake of it.

> 
> Signed-off-by: Mark Cave-Ayland 
> ---
>  target/ppc/translate.c |  10 +
>  target/ppc/translate/fp-impl.inc.c | 492 
> -
>  2 files changed, 392 insertions(+), 110 deletions(-)
> 
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 2b37910248..1d4bf624a3 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -6694,6 +6694,16 @@ static inline void gen_##name(DisasContext *ctx)   
> \
>  GEN_TM_PRIV_NOOP(treclaim);
>  GEN_TM_PRIV_NOOP(trechkpt);
>  
> +static inline void get_fpr(TCGv_i64 dst, int regno)
> +{
> +tcg_gen_mov_i64(dst, cpu_fpr[regno]);
> +}
> +
> +static inline void set_fpr(int regno, TCGv_i64 src)
> +{
> +tcg_gen_mov_i64(cpu_fpr[regno], src);
> +}
> +
>  #include "translate/fp-impl.inc.c"
>  
>  #include "translate/vmx-impl.inc.c"
> diff --git a/target/ppc/translate/fp-impl.inc.c 
> b/target/ppc/translate/fp-impl.inc.c
> index 08770ba9f5..923fb7550f 100644
> --- a/target/ppc/translate/fp-impl.inc.c
> +++ b/target/ppc/translate/fp-impl.inc.c
> @@ -34,24 +34,39 @@ static void gen_set_cr1_from_fpscr(DisasContext *ctx)
>  #define _GEN_FLOAT_ACB(name, op, op1, op2, isfloat, set_fprf, type)  
>  \
>  static void gen_f##name(DisasContext *ctx)   
>  \
>  {
>  \
> +TCGv_i64 t0; 
>  \
> +TCGv_i64 t1; 
>  \
> +TCGv_i64 t2; 
>  \
> +TCGv_i64 t3; 
>  \
>  if (unlikely(!ctx->fpu_enabled)) {   
>  \
>  gen_exception(ctx, POWERPC_EXCP_FPU);
>  \
>  return;  
>  \
>  }
>  \
> +t0 = tcg_temp_new_i64(); 
>  \
> +t1 = tcg_temp_new_i64(); 
>  \
> +t2 = tcg_temp_new_i64(); 
>  \
> +t3 = tcg_temp_new_i64(); 
>  \
>  gen_reset_fpstatus();
>  \
> -gen_helper_f##op(cpu_fpr[rD(ctx->opcode)], cpu_env,  
>  \
> - cpu_fpr[rA(ctx->opcode)],   
>  \
> - cpu_fpr[rC(ctx->opcode)], cpu_fpr[rB(ctx->opcode)]);
>  \
> +get_fpr(t0, rA(ctx->opcode));
>  \
> +get_fpr(t1, rC(ctx->opcode));
>  \
> +get_fpr(t2, rB(ctx->opcode));
>  \
> +gen_helper_f##op(t3, cpu_env, t0, t1, t2);   
>  \
> +set_fpr(rD(ctx->opcode), t3);
>  \
>  if (isfloat) {   
>  \
> -gen_helper_frsp(cpu_fpr[rD(ctx->opcode)], cpu_env,   
>  \
> -cpu_fpr[rD(ctx->opcode)]);   
>  \
> +get_fpr(t0, rD(ctx->opcode));
>  \
> +gen_helper_frsp(t3, cpu_env, t0);
>  \
> +set_fpr(rD(ctx->opcode), t3);
>  \
>  }
>  \
>  if (set_fprf) {  
>  \
> -gen_compute_fprf_float64(cpu_fpr[rD(ctx->opcode)]);  
>  \
> +gen_compute_fprf_float64(t3);
>  \
>  }
>  \
>  if (unlikely(Rc(ctx->opcode) != 0)) {
>  \
>  gen_set_cr1_from_fpscr(ctx); 
>  \
>  }
>  \
> +tcg_temp_free_i64(t0);