That's what I suggested:
we keep a cache of the float computations,
typedef struct FpRecord {
  uint8_t op;
  float32 A;
  float32 B;
}  FpRecord;
FpRecord fp_cache[1024];
int fp_cache_length;
uint32_t fp_exceptions;

1. For each new fp operation, we push a record onto fp_cache.
2. When fp_exceptions is read, we re-compute it by re-running the
   recorded FpRecord sequence, then reset fp_cache_length to 0.
3. When fp_exceptions is cleared, we set fp_cache_length to 0 and
   clear fp_exceptions.
4. When fp_cache is full, we re-compute fp_exceptions by re-running
   the recorded FpRecord sequence, then reset fp_cache_length to 0.
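
A minimal sketch of the four steps above, in plain C, might look like the
following. It is only an illustration under several assumptions: the opcode
and flag names (FP_OP_*, FP_FLAG_*) and the helper names are made up for this
example, plain float stands in for float32, and fp_record_flags() is a toy
stand-in for the softfloat re-run that models only x/0 and 0/0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits and opcodes, for illustration only. */
enum { FP_FLAG_INVALID = 1u << 0, FP_FLAG_DIVBYZERO = 1u << 1 };
enum { FP_OP_ADD, FP_OP_DIV };

#define FP_CACHE_SIZE 1024

typedef struct FpRecord {
    uint8_t op;
    float a;            /* stand-in for float32 A */
    float b;            /* stand-in for float32 B */
} FpRecord;

static FpRecord fp_cache[FP_CACHE_SIZE];
static int fp_cache_length;
static uint32_t fp_exceptions;

/* Toy stand-in for re-running one record in softfloat: returns the
 * flags that operation would raise. Only division is modelled. */
static uint32_t fp_record_flags(const FpRecord *r)
{
    if (r->op == FP_OP_DIV && r->b == 0.0f) {
        if (r->a == 0.0f) {
            return FP_FLAG_INVALID;          /* 0/0 */
        }
        if (r->a - r->a == 0.0f) {
            return FP_FLAG_DIVBYZERO;        /* finite, non-zero / 0 */
        }
    }
    return 0;
}

/* Re-run every recorded operation, accumulate flags, empty the cache. */
static void fp_cache_flush(void)
{
    for (int i = 0; i < fp_cache_length; i++) {
        fp_exceptions |= fp_record_flags(&fp_cache[i]);
    }
    fp_cache_length = 0;
}

/* Step 1 (and step 4 when the cache is already full). */
static void fp_cache_push(uint8_t op, float a, float b)
{
    if (fp_cache_length == FP_CACHE_SIZE) {
        fp_cache_flush();
    }
    assert(fp_cache_length < FP_CACHE_SIZE);
    fp_cache[fp_cache_length++] = (FpRecord){ op, a, b };
}

/* Step 2: a read forces the deferred flag computation. */
static uint32_t fp_exceptions_read(void)
{
    fp_cache_flush();
    return fp_exceptions;
}

/* Step 3: a clear discards pending records without re-running them. */
static void fp_exceptions_clear(void)
{
    fp_cache_length = 0;
    fp_exceptions = 0;
}
```

In this sketch the fast path (fp_cache_push) only stores three fields, and all
of the flag work is deferred to fp_exceptions_read() or to a flush when the
cache is full.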

Now the key point is how to track reads and writes of the FPSCR register.
The current code is:
    cpu_fpscr = tcg_global_mem_new(cpu_env,
                                   offsetof(CPUPPCState, fpscr), "fpscr");

On Fri, May 1, 2020 at 9:59 AM Programmingkid <programmingk...@gmail.com>
wrote:

>
> > On Apr 30, 2020, at 12:34 PM, Dino Papararo <skizzat...@msn.com> wrote:
> >
> > Maybe the fastest way to implement hardfloat for ppc would be to run it
> by default until some FPU instruction asks for the FPSCR register.
> > At that point we probably want to check for some exception, so QEMU
> could go back to the last FPU instruction executed and re-execute it in
> softfloat, this time taking care of the FPSCR flags, then continue in hardfloat
> until another instruction reads the FPSCR register, and so on.
> >
> > Dino
>
> That sounds like a good idea.
>
> > -----Original Message-----
> > From: BALATON Zoltan <bala...@eik.bme.hu>
> > Sent: Thursday, April 30, 2020 17:36
> > To: 罗勇刚(Yonggang Luo) <luoyongg...@gmail.com>
> > Cc: Richard Henderson <richard.hender...@linaro.org>; Dino Papararo <
> skizzat...@msn.com>; qemu-devel@nongnu.org; Programmingkid <
> programmingk...@gmail.com>; qemu-...@nongnu.org; Howard Spoelstra <
> hsp.c...@gmail.com>; Alex Bennée <alex.ben...@linaro.org>
> > Subject: Re: R: R: About hardfloat in ppc
> >
> > On Thu, 30 Apr 2020, 罗勇刚(Yonggang Luo) wrote:
> >> I propose a new way of computing the float flags. We preserve a float
> >> computing cache:
> >> typedef struct FpRecord {
> >>   uint8_t op;
> >>   float32 A;
> >>   float32 B;
> >> }  FpRecord;
> >> FpRecord fp_cache[1024];
> >> int fp_cache_length;
> >> uint32_t fp_exceptions;
> >>
> >> 1. For each new fp operation we push it to the fp_cache.
> >> 2. Once we read the fp_exceptions, we re-compute the fp_exceptions by
> >> re-running the FpRecord sequence, and clear fp_cache_length.
> >> 3. If we clear the fp_exceptions, then we set fp_cache_length to 0
> >> and clear fp_exceptions.
> >> 4. If the fp_cache is full, then we re-compute the fp_exceptions by
> >> re-running the FpRecord sequence.
> >>
> >> Would this be a general method to use hard-float?
> >> The time consumed should be 2 * hard_float.
> >> Considering that reads of fp_exceptions are rare, the amortized time
> >> complexity would be 1 * hard_float.
> >
> > It's hard to guess what the hit rate of such a cache would be, and if it's
> low then managing the cache is probably more expensive than running with
> softfloat. So to evaluate any proposed patch we also need some benchmarks
> that we can experiment with to tell whether the results are good or not;
> otherwise we're just guessing. Are there some existing tests and benchmarks
> that we can use? Alex mentioned fp-bench, I think, and for evaluating the
> correctness of the FP implementation I've seen this other conversation:
> >
> > https://lists.nongnu.org/archive/html/qemu-devel/2020-04/msg05107.html
> > https://lists.nongnu.org/archive/html/qemu-devel/2020-04/msg05126.html
> >
> > Is that something we can use for PPC as well to check the correctness?
> >
> > So I think before implementing any potential solution that came up in
> this brainstorming, the first step would be to get and compile (or write, if
> not available) some tests and benchmarks:
> >
> > 1. Testing host behaviour for inexact and comparing it across different
> archs.
> > 2. Some FP tests that can be used to compare results between QEMU and a
> real CPU to check correctness of emulation (if these check for inexact
> differences then they could be used instead of 1.).
> > 3. Some benchmarks to evaluate QEMU performance (these could be the same
> as the FP tests or some real-world FP-heavy applications).
> >
> > Then we can see if the proposed solution is faster and still correct.
> >
> > Regards,
> > BALATON Zoltan
>
>

-- 
Yours sincerely,
罗勇刚 (Yonggang Luo)
