That's what I suggested. We keep a cache of floating-point operations:

    typedef struct FpRecord {
        uint8_t op;
        float32 A;
        float32 B;
    } FpRecord;

    FpRecord fp_cache[1024];
    int fp_cache_length;
    uint32_t fp_exceptions;

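For illustration, here is a minimal, self-contained sketch of how the cache and the four rules that follow could fit together. It is not QEMU code: fp_exec, fp_replay, fp_read_exceptions, fp_clear_exceptions, fp_slow_flags and the FP_OP_* / FP_FLAG_* constants are hypothetical names, plain float stands in for float32, and fp_slow_flags is a toy replacement for the softfloat path that only detects inexact and divide-by-zero for finite, in-range operands. It also assumes that the replay in rule 4 empties the cache.

```c
#include <assert.h>
#include <stdint.h>

/* All names below are hypothetical; none are QEMU or softfloat APIs. */
enum { FP_OP_MUL, FP_OP_DIV };
enum { FP_FLAG_INEXACT = 1u << 0, FP_FLAG_DIVBYZERO = 1u << 1 };
#define FP_CACHE_SIZE 1024

typedef struct FpRecord {
    uint8_t op;
    float A;   /* stand-in for float32 */
    float B;
} FpRecord;

FpRecord fp_cache[FP_CACHE_SIZE];
int fp_cache_length;
uint32_t fp_exceptions;

/* Toy stand-in for the softfloat slow path: returns the flags that one
 * recorded operation would raise. Only finite, in-range operands are
 * modeled; invalid (0/0), overflow and underflow are not. */
uint32_t fp_slow_flags(const FpRecord *r)
{
    if (r->op == FP_OP_DIV) {
        if (r->B == 0.0f) {
            return r->A != 0.0f ? FP_FLAG_DIVBYZERO : 0;
        }
        float q = (float)((double)r->A / (double)r->B);
        /* q * B is exact in double, so a mismatch means A / B was rounded. */
        return (double)q * (double)r->B != (double)r->A ? FP_FLAG_INEXACT : 0;
    }
    /* The product of two floats is exact in double. */
    double exact = (double)r->A * (double)r->B;
    return (double)(float)exact != exact ? FP_FLAG_INEXACT : 0;
}

/* Rules 2 and 4: re-run the recorded operations, fold their flags into
 * fp_exceptions, and empty the cache. */
void fp_replay(void)
{
    for (int i = 0; i < fp_cache_length; i++) {
        fp_exceptions |= fp_slow_flags(&fp_cache[i]);
    }
    fp_cache_length = 0;
}

/* Rule 1: the fast path records its operands and computes on the host. */
float fp_exec(uint8_t op, float a, float b)
{
    if (fp_cache_length == FP_CACHE_SIZE) {
        fp_replay();               /* rule 4: the cache is full */
    }
    assert(fp_cache_length < FP_CACHE_SIZE);
    fp_cache[fp_cache_length++] = (FpRecord){ op, a, b };
    return op == FP_OP_DIV ? a / b : a * b;
}

/* Rule 2: reading the flags forces the deferred work. */
uint32_t fp_read_exceptions(void)
{
    fp_replay();
    return fp_exceptions;
}

/* Rule 3: clearing the flags also discards the pending records. */
void fp_clear_exceptions(void)
{
    fp_cache_length = 0;
    fp_exceptions = 0;
}
```

With this sketch, fp_exec(FP_OP_DIV, 1.0f, 3.0f) followed by fp_read_exceptions() reports FP_FLAG_INEXACT, and the flag stays set until fp_clear_exceptions() is called. The cost of the slow path is only paid when the flags are read or the cache fills up.
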
1. For each new FP operation, we push a record onto fp_cache.
2. When fp_exceptions is read, we recompute it by re-running the recorded
   FpRecord sequence, and then reset fp_cache_length to 0.
3. When fp_exceptions is cleared, we set fp_cache_length to 0 and clear
   fp_exceptions.
4. When fp_cache is full, we recompute fp_exceptions by re-running the
   recorded FpRecord sequence.

Now the key point is how to track reads and writes of the FPSCR register.
The current code is:

    cpu_fpscr = tcg_global_mem_new(cpu_env,
                                   offsetof(CPUPPCState, fpscr), "fpscr");

On Fri, May 1, 2020 at 9:59 AM Programmingkid <programmingk...@gmail.com> wrote:
>
> > On Apr 30, 2020, at 12:34 PM, Dino Papararo <skizzat...@msn.com> wrote:
> >
> > Maybe the fastest way to implement hardfloats for ppc could be to run
> > them by default until some FPU instruction requests the FPSCR register.
> > At that point we probably want to check for some exception, so QEMU
> > could go back to the last FPU instruction executed and re-execute it in
> > softfloat, taking care of the FPSCR flags this time, then continue in
> > hardfloat until another instruction looks at the FPSCR register, and
> > so on.
> >
> > Dino
>
> That sounds like a good idea.
>
> > -----Original Message-----
> > From: BALATON Zoltan <bala...@eik.bme.hu>
> > Sent: Thursday, 30 April 2020 17:36
> > To: 罗勇刚(Yonggang Luo) <luoyongg...@gmail.com>
> > Cc: Richard Henderson <richard.hender...@linaro.org>; Dino Papararo
> > <skizzat...@msn.com>; qemu-devel@nongnu.org; Programmingkid
> > <programmingk...@gmail.com>; qemu-...@nongnu.org; Howard Spoelstra
> > <hsp.c...@gmail.com>; Alex Bennée <alex.ben...@linaro.org>
> > Subject: Re: R: R: About hardfloat in ppc
> >
> > On Thu, 30 Apr 2020, 罗勇刚(Yonggang Luo) wrote:
> >> I propose a new way of computing the float flags. We preserve a float
> >> computing cache:
> >> typedef struct FpRecord {
> >>     uint8_t op;
> >>     float32 A;
> >>     float32 B;
> >> } FpRecord;
> >> FpRecord fp_cache[1024];
> >> int fp_cache_length;
> >> uint32_t fp_exceptions;
> >>
> >> 1. For each new fp operation we push it to the fp_cache.
> >> 2. Once we read the fp_exceptions, we re-compute the fp_exceptions
> >>    by re-running the FpRecord sequence, and clear fp_cache_length.
> >> 3. If we clear the fp_exceptions, then we set fp_cache_length to 0
> >>    and clear fp_exceptions.
> >> 4. If the fp_cache is full, then we re-compute the fp_exceptions by
> >>    re-running the FpRecord sequence.
> >>
> >> Would this be a general method to use hard-float?
> >> The consumed time should be 2 * hard_float.
> >> Considering that reads of fp_exceptions are rare, the amortized time
> >> complexity would be 1 * hard_float.
> >
> > It's hard to guess what the hit rate of such a cache would be, and if
> > it's low then managing the cache is probably more expensive than running
> > with softfloat. So to evaluate any proposed patch we also need some
> > benchmarks which we can experiment with to tell if the results are good
> > or not; otherwise we're just guessing. Are there some existing tests and
> > benchmarks that we can use?
> > Alex mentioned fp-bench, I think, and to evaluate the correctness of
> > the FP implementation I've seen this other conversation:
> >
> > https://lists.nongnu.org/archive/html/qemu-devel/2020-04/msg05107.html
> > https://lists.nongnu.org/archive/html/qemu-devel/2020-04/msg05126.html
> >
> > Is that something we can use for PPC as well to check the correctness?
> >
> > So I think before implementing any potential solution that came up in
> > this brainstorming, the first step would be to get and compile (or
> > write, if not available) some tests and benchmarks:
> >
> > 1. testing host behaviour for inexact and comparing that across
> >    different archs
> > 2. some FP tests that can be used to compare results between QEMU and a
> >    real CPU to check correctness of emulation (if these check for
> >    inexact differences then they could be used instead of 1.)
> > 3. some benchmarks to evaluate QEMU performance (these could be the
> >    same as the FP tests or some real-world FP-heavy applications).
> >
> > Then we can see if the proposed solution is faster and still correct.
> >
> > Regards,
> > BALATON Zoltan

--
罗勇刚
Yours sincerely,
Yonggang Luo