R: R: R: About hardfloat in ppc
Maybe the fastest way to implement hardfloat for PPC would be to run in hardfloat by default, until some FPU instruction reads the FPSCR register. At that point we probably want to check for an exception, so QEMU could go back to the last FPU instruction executed and re-execute it in softfloat, this time taking care of the FPSCR flags, then continue in hardfloat until another instruction reads the FPSCR register, and so on.

Dino

-----Original Message-----
From: BALATON Zoltan
Sent: Thursday, 30 April 2020 17:36
To: 罗勇刚(Yonggang Luo)
Cc: Richard Henderson; Dino Papararo; qemu-devel@nongnu.org; Programmingkid; qemu-...@nongnu.org; Howard Spoelstra; Alex Bennée
Subject: Re: R: R: About hardfloat in ppc

On Thu, 30 Apr 2020, 罗勇刚(Yonggang Luo) wrote:
> I propose a new way of computing the float flags. We preserve a float
> computing cache:
>
> typedef struct FpRecord {
>     uint8_t op;
>     float32 A;
>     float32 B;
> } FpRecord;
> FpRecord fp_cache[1024];
> int fp_cache_length;
> uint32_t fp_exceptions;
>
> 1. For each new fp operation we push it to the fp_cache.
> 2. Once we read fp_exceptions, we re-compute fp_exceptions by
>    re-running the FpRecord sequence, and clear fp_cache_length.
> 3. If we clear fp_exceptions, then we set fp_cache_length to 0
>    and clear fp_exceptions.
> 4. If the fp_cache is full, then we re-compute fp_exceptions by
>    re-running the FpRecord sequence.
>
> Would this be a general method to use hard-float?
> The consumed time should be 2 * hard_float.
> Considering that reads of fp_exceptions are rare, the amortized time
> complexity would be 1 * hard_float.

It's hard to guess what the hit rate of such a cache would be, and if it's low then managing the cache is probably more expensive than running with softfloat. So to evaluate any proposed patch we also need some benchmarks which we can experiment with to tell if the results are good or not, otherwise we're just guessing. Are there some existing tests and benchmarks that we can use?

Alex mentioned fp-bench, I think, and to evaluate the correctness of the FP implementation I've seen this other conversation:

https://lists.nongnu.org/archive/html/qemu-devel/2020-04/msg05107.html
https://lists.nongnu.org/archive/html/qemu-devel/2020-04/msg05126.html

Is that something we can use for PPC as well to check the correctness?

So I think before implementing any potential solution that came up in this brainstorming, the first step would be to get and compile (or write, if not available) some tests and benchmarks:

1. testing host behaviour for inexact and comparing that for different archs
2. some FP tests that can be used to compare results between QEMU and a real CPU to check correctness of emulation (if these check for inexact differences then they could be used instead of 1.)
3. some benchmarks to evaluate QEMU performance (these could be the same as the FP tests or some real world FP heavy applications).

Then we can see if the proposed solution is faster and still correct.

Regards,
BALATON Zoltan
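
The four rules of the quoted FpRecord proposal can be sketched in C as below. This is only a minimal illustration under stated assumptions, not QEMU code: host `float` stands in for `float32`, the names `FpCache`, `fp_cache_push`, `fp_cache_read_flags`, `fp_cache_clear_flags`, `fp_slow_flags`, the opcode enum and `FP_FLAG_INEXACT` are hypothetical, flags are treated as sticky, and the "slow path" is a small stand-in that detects only inexact results for finite operands whose result stays in float range, where the proposal would presumably re-run the operation in softfloat.

```c
#include <assert.h>
#include <stdint.h>

#define FP_CACHE_CAP    1024   /* size taken from the proposal */
#define FP_FLAG_INEXACT 0x1u   /* hypothetical flag bit for this sketch */

enum { FP_OP_ADD, FP_OP_MUL }; /* hypothetical opcode numbering */

typedef struct FpRecord {
    uint8_t op;
    float a;                   /* host float stands in for float32 */
    float b;
} FpRecord;

typedef struct FpCache {
    FpRecord records[FP_CACHE_CAP];
    int length;
    uint32_t exceptions;       /* flags accumulated so far (assumed sticky) */
} FpCache;

/* Stand-in for the slow path: reports only "inexact", finite inputs only. */
static uint32_t fp_slow_flags(const FpRecord *r)
{
    if (r->op == FP_OP_MUL) {
        /* Two 24-bit significands give at most 48 bits: exact in a double. */
        double exact = (double)r->a * (double)r->b;
        float rounded = (float)exact;
        return ((double)rounded != exact) ? FP_FLAG_INEXACT : 0;
    }
    /*
     * FP_OP_ADD: subtracting the larger operand back out is exact in double
     * arithmetic, so a mismatch exposes the rounding error of the float sum.
     */
    float abs_a = r->a < 0 ? -r->a : r->a;
    float abs_b = r->b < 0 ? -r->b : r->b;
    float big = abs_a >= abs_b ? r->a : r->b;
    float small = abs_a >= abs_b ? r->b : r->a;
    float sum = big + small;
    return ((double)sum - (double)big != (double)small) ? FP_FLAG_INEXACT : 0;
}

/* Rules 2 and 4: re-run every cached operation, then empty the cache. */
static void fp_cache_replay(FpCache *c)
{
    for (int i = 0; i < c->length; i++) {
        c->exceptions |= fp_slow_flags(&c->records[i]);
    }
    c->length = 0;
}

/* Rule 1: record the operands; the result itself comes from the fast path. */
static void fp_cache_push(FpCache *c, uint8_t op, float a, float b)
{
    if (c->length == FP_CACHE_CAP) {
        fp_cache_replay(c);    /* rule 4: cache full */
    }
    assert(c->length < FP_CACHE_CAP);
    c->records[c->length++] = (FpRecord){ op, a, b };
}

/* Rule 2: a guest read of the flags forces the deferred work. */
static uint32_t fp_cache_read_flags(FpCache *c)
{
    fp_cache_replay(c);
    return c->exceptions;
}

/* Rule 3: clearing the flags discards the pending records as well. */
static void fp_cache_clear_flags(FpCache *c)
{
    c->length = 0;
    c->exceptions = 0;
}
```

In this sketch every operation costs one record store on the fast path and one replay at most, which matches the "2 * hard_float" worst case claimed in the proposal only if the replay is about as cheap as the original operation. Whether that holds for a real softfloat replay is exactly what the benchmarks asked for above would have to show.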
R: R: About hardfloat in ppc
Hi Alex,

maybe some pseudocode can show better what I mean:

    if (ppc_fpu_instruction == USE_FPSCR)
        /* the instruction has a dot '.', so FPSCR will be updated and we need to take care of it */
        soft_decode (ppc_fpu_instruction)
    else
        /* the instruction has no dot '.', FPSCR will never be updated and we don't need to take care of it -> max speed */
        hard_decode (ppc_fpu_instruction)

In PPC assembly, all instructions that need to take care of the inexact flag and/or exception flags are processed before the test instructions. Look at the following exception handling example:

    fadd.   f0,f1,f2     # f1 + f2 = f0. CR1 contains except.summary
    bta     4,error      # if bit 0 of CR1 is set, go to error
                         # bit 0 is set if any exception occurs
    .                    # if clear, continue operation
    .
    .
    error:
    mcrfs   2,1          # copy FPSCR bits 4-7 to CR field 2
                         # now CR1 and CR2 (bits 6 through 10)
                         # contain all exception bits from FPSCR
    bta     6,invalid    # CR bit 6 signals invalid
    bta     7,overflow   # CR bit 7 signals overflow
    bta     8,underflow  # CR bit 8 signals underflow
    bta     9,divbyzero  # CR bit 9 signals divide-by-zero
    bta     10,inexact   # CR bit 10 signals inexact
    invalid:
    mcrfs   2,2          # copy FPSCR bits 8-11 to CR field 2
    mcrfs   3,3          # copy FPSCR bits 12-15 to CR field 3
    mcrfs   4,5          # copy FPSCR bits 20-23 to CR field 4
                         # invalid bits are now CR bits 11-16 and bit 23
                         # now do exception handling based on which invalid bit
                         # is set
    overflow:
                         # do exception handling for overflow exception
    underflow:
                         # do exception handling for underflow exception
    divbyzero:
                         # do exception handling for the divide-by-zero exception
    inexact:
                         # do exception handling for the inexact exception

In this way you can know as soon as possible whether you can go with hardfloat or not.
I leave it to you TCG experts how it works and how to implement it; I'm only trying to explain a possible fast way to go (if at all possible). The large majority of software doesn't check for exceptions at all, and if I really want to pursue maximum precision I'll go for a software multiprecision library like GMP or MPFR. So hardfloat 'should' be the first choice, and only if an instruction requires a precision/error check should it be processed in softfloat.

I hope to have added some new ideas to the discussion. Thanks a lot, Alex!

Dino

-----Original Message-----
From: Alex Bennée
Sent: Wednesday, 29 April 2020 13:57
To: Dino Papararo
Cc: luoyongg...@gmail.com; BALATON Zoltan; Mark Cave-Ayland; Programmingkid; Howard Spoelstra; qemu-...@nongnu.org; qemu-devel@nongnu.org
Subject: Re: R: About hardfloat in ppc

Dino Papararo writes:

> Hello,
> about handling of PPC fpu exceptions and Hard Floats support we could
> consider a different approach for different instructions.
> i.e. not all fpu instructions take care about inexact or exceptions bits: if
> I take a simple fadd f0,f1,f2 I'll copy value derived from adding f1+f2 into
> f1 register and no one will check about inexact or exception bits raised into
> FPSCR register.
> Instead if I'll take fadd. f0,f1,f2 the dot following the add instructions
> means I want take inexact or exceptions bits into account.
> So I could use hard floats for first case and softfloats for second case.
> Could this be a fast solution to start implement hard floats for PPC??

While it may be true that normal software practice is not to read the exception registers for every operation, we can't base our emulation on that. We must always be able to re-create the state of the exception registers whenever they may be read by the program. There are 3 cases in which this may happen:

  - a direct read of the inexact register
  - checking the sigcontext of a synchronous exception (e.g. fault)
  - checking the sigcontext of an asynchronous exception (e.g. timer/IPI)

Given the way the translator works we can simplify the asynchronous case because we know they are only ever delivered at the start of translated blocks. We must have a fully rectified system state at the end of every block. So let's consider some cases:

  fpOpA
  clear flags
  fpOpB
  clear flags
  fpOpC
  read flags

Assuming we know the fpOps can't generate exceptions, we know that only fpOpC will ever generate user visible floating point flags, so we can indeed use hardfloat for fpOpA and fpOpB.

However if we see the pattern:

  fpOpA
  ld/st
  clear flags
  fpOpB
  read flags

we must have the fully rectified version of the flags because the ld/st may fault. However it's not guaranteed it will fault, so we could defer the flag calculation for fpOpA until such time as we need it. The easiest way would be to save the values going into the operation and then re-run it in softfloat when required (hopefully never ;-).

A lot will depend on the behaviour of the architecture. For example:

  fpOpA
  fpOpB
  read flag
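
The dot-based rule from the pseudocode at the top of this message amounts to testing the Rc ("record") bit, which is the least significant bit of the 32-bit instruction word for FP arithmetic instructions such as fadd (primary opcodes 59 and 63). The C sketch below only illustrates that decision; `FpPath`, `ppc_insn_rc` and `choose_fp_path` are hypothetical names, and it assumes the caller has already decoded the word as an FP arithmetic instruction that has an Rc bit. As the quoted reply argues, the flags must stay recoverable whenever they may be read later, so this rule alone would not be sufficient.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { FP_PATH_HARD, FP_PATH_SOFT } FpPath;   /* hypothetical names */

/* Rc is the least significant bit of the 32-bit instruction word. */
static bool ppc_insn_rc(uint32_t insn)
{
    return (insn & 1u) != 0;
}

/* The proposed rule: dotted form -> softfloat, plain form -> hardfloat. */
static FpPath choose_fp_path(uint32_t insn)
{
    uint32_t primary = insn >> 26;          /* top 6 bits: primary opcode */

    /* Assumption of this sketch: only FP arithmetic words are passed in. */
    assert(primary == 59 || primary == 63);
    return ppc_insn_rc(insn) ? FP_PATH_SOFT : FP_PATH_HARD;
}
```

For example, fadd f0,f1,f2 encodes as 0xFC01102A and would take the hardfloat path here, while fadd. f0,f1,f2 encodes as 0xFC01102B and would take the softfloat path.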
R: About hardfloat in ppc
Typo correction:

"if I take a simple fadd f0,f1,f2 I'll copy value derived from adding f1+f2 into f0 register"

-----Original Message-----
From: Qemu-ppc On behalf of Dino Papararo
Sent: Wednesday, 29 April 2020 12:18
To: Alex Bennée; luoyongg...@gmail.com; BALATON Zoltan; Mark Cave-Ayland; Programmingkid; Howard Spoelstra
Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org
Subject: R: About hardfloat in ppc
R: About hardfloat in ppc
Hello,

about the handling of PPC FPU exceptions and hardfloat support, we could consider a different approach for different instructions. That is, not all FPU instructions care about the inexact or exception bits: if I take a simple fadd f0,f1,f2 I'll copy the value derived from adding f1+f2 into the f1 register, and no one will check the inexact or exception bits raised in the FPSCR register. If instead I take fadd. f0,f1,f2, the dot following the add instruction means I want to take the inexact or exception bits into account. So I could use hardfloat for the first case and softfloat for the second case.

Could this be a fast solution to start implementing hardfloat for PPC?

A little documentation here:
http://mirror.informatimago.com/next/developer.apple.com/documentation/mac/PPCNumerics/PPCNumerics-154.html

Regards,
Dino Papararo

-----Original Message-----
From: Qemu-devel On behalf of Alex Bennée
Sent: Tuesday, 28 April 2020 10:37
To: luoyongg...@gmail.com
Cc: qemu-...@nongnu.org; qemu-devel@nongnu.org
Subject: Re: About hardfloat in ppc

罗勇刚(Yonggang Luo) writes:

> I am confused about why we can use hard-float only when inexact is set.

The inexact behaviour of the host hardware may be different from the guest architecture we are trying to emulate, and the host hardware may not be configurable to emulate the guest mode.

Have a look in softfloat.c and see all the places where float_flag_inexact is set. Can you convince yourself that the host hardware will do the same?

> And PPC always clears the inexact flag before calling the soft-float
> functions, so we can not optimize it with hard-float.
> I need some resources about the inexact flag and why inexact is always
> cleared in the PPC FP simulation.

Because that is the behaviour of the PPC floating point unit. The inexact flag will represent the last operation done.

> I am looking for two possible solutions:
> 1. do not clear the inexact flag in the PPC simulation
> 2. even if inexact is cleared, we can still use an alternative hard-float.
>
> But now I am a beginner and have no clue about all these things.

Well, you'll need to learn about floating point because these are rather fundamental aspects of its behaviour. In the old days QEMU used to use the host floating point processor with its template based translation. However this led to lots of weird bugs because the floating point answers under QEMU were different from the target it was trying to emulate. It was for this reason that softfloat was introduced. The hardfloat optimisation can only be done when we are confident that we will get the exact same answer as the target we are trying to emulate - a "faster but incorrect" mode is just going to cause confusion, as discussed in the previous thread. Have you read that yet?

> On Mon, Apr 27, 2020 at 7:10 PM Alex Bennée wrote:
>
>> BALATON Zoltan writes:
>>
>> > On Mon, 27 Apr 2020, Alex Bennée wrote:
>> >> 罗勇刚(Yonggang Luo) writes:
>> >>> Because ppc fpu-helper are always clearing float_flag_inexact, So
>> >>> is that possible to optimize the performance when float_flag_inexact
>> >>> are cleared?
>> >>
>> >> There was some discussion about this in the last thread about
>> >> enabling hardfloat for PPC. See the thread:
>> >>
>> >>   Subject: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC
>> >>   Date: Tue, 18 Feb 2020 18:10:16 +0100
>> >>   Message-Id: <20200218171702.979f0746...@zero.eik.bme.hu>
>> >
>> > I've answered this already with link to that thread here:
>> >
>> > On Fri, 10 Apr 2020, BALATON Zoltan wrote:
>> > : Date: Fri, 10 Apr 2020 20:04:53 +0200 (CEST)
>> > : From: BALATON Zoltan
>> > : To: "罗勇刚(Yonggang Luo)"
>> > : Cc: qemu-devel@nongnu.org, Mark Cave-Ayland, John Arbuckle,
>> > :     qemu-...@nongnu.org, Paul Clarke, Howard Spoelstra, David Gibson
>> > : Subject: Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC
>> > :
>> > : On Fri, 10 Apr 2020, 罗勇刚(Yonggang Luo) wrote:
>> > :> Are this stable now? I'd like to see hard float to be landed:)
>> > :
>> > : If you want to see hardfloat for PPC then you should read the replies to
>> > : this patch which can be found here:
>> > :
>> > : http://patchwork.ozlabs.org/patch/1240235/
>> > :
>> > : to understand what's needed then try to implement the solution with FP
>> > : exceptions cached in a global that maybe could work. I won't be able to
>> > : do that as said here:
>> > :
>> > : https://lists.nongnu.org/archive/html/qemu-ppc/2020-03/msg6.html
>> > :
>> > : because I don't have time t
R: R: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC
I think we all agree that the best solution is to resolve the PowerPC issues with the current hardfloat implementation.
I also think PowerPC is an important branch of QEMU, for historical, present and (why not?) future reasons, and it must NOT be left behind.
So I would invite the most skilled programmers of the QEMU community to work on this and solve the issue, maybe in a few days.
The same group who worked on the recent AltiVec optimizations is able to make a good patch for this too.

As a fallback, I'd still like to implement hardfloat support for PowerPC anyway, warning users about the inaccuracy of results/flags and letting them choose.
Of course I understand, and in part agree with, all your objections.
I simply prefer to always have a choice.

Best Regards,
Dino Papararo

-----Original Message-----
From: Aleksandar Markovic
Sent: Wednesday, 26 February 2020 18:27
To: G 3
Cc: Alex Bennée; Dino Papararo; QEMU Developers; qemu-...@nongnu.org; Howard Spoelstra; luigi burdo; David Gibson
Subject: Re: R: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC

On Wed, Feb 26, 2020 at 6:04 PM G 3 wrote:
>
> Accuracy is an important part of the IEEE 754 floating point standard. The
> whole purpose of this standard is to ensure floating point calculations are
> consistent across multiple CPUs. I believe referring to this patch as
> inaccurate is itself inaccurate. That gives the impression that this patch
> produces calculations that are not inline with established standards. This is
> not true. The only part of this patch that will produce incorrect values are
> the flags. There *may* be a program or two out there that depend on these
> flags, but for the majority of programs that only care about basic floating
> point arithmetic this patch will produce correct values. Currently the
> emulated PowerPC's FPU already produces wrong values for the flags. This
> patch does set the Inexact flag (which I don't like), but since I have never
> encountered any source code that cares for this flag, I can let it go. I
> think giving the user the ability to decide which option to use is the best
> thing to do.
>

From the experiments described above, the patch in question changes the behavior of applications (for example, sound is different with and without the patch), which is in contradiction with your claim that you "never encountered any source code that cares for this flag" and that "the only part of this patch that will produce incorrect values are the flags".

In other words, and playing further with them:

The claim that "referring to this patch as inaccurate is itself inaccurate" is itself inaccurate.

Best regards,
Aleksandar

> On Wed, Feb 26, 2020 at 10:51 AM Aleksandar Markovic wrote:
>>
>> On Wed, Feb 26, 2020 at 3:29 PM Alex Bennée wrote:
>> >
>> > Dino Papararo writes:
>> >
>> > > Please let's go with hardfloat ppc support, it's really a good feature
>> > > to implement.
>> > > Even if in a first step it could lead to inaccuracy results,
>> > > later it could solved with other patches.
>> >
>> > That's the wrong way around. We have regression tests for a reason.
>>
>> I tend to agree with Alex here, and additionally want to expand more
>> on this topic.
>>
>> In my view: (that I think is at least very close to the community
>> consensus)
>>
>> This is *not* a ppc-specific issue. There exist a principle across
>> all targets that QEMU FPU calculation must be accurate - exactly as
>> specified in any applicable particular ISA document. Any discrepancy is an
>> outright bug.
>>
>> We even recently had several patches for FPU in ppc target that
>> handled some fairly obscure cases of inaccuracies, I believe they
>> were authored by Paul Clarke, so there are people in ppc community
>> that care about FPU accuracy (as I guess is the case for any target).
>>
>> There shouldn't be a target that decides by itself and within itself
>> "ok, we don't need accuracy, let's trade it for speed". This violates
>> the architecture of QEMU. Please allow that for any given software
>> project, there is an architecture that should be respected.
>>
>> This doesn't mean that anybody's experimentation is discouraged.
>> No-one can stop anybody from forking from QEMU upstream tree and do
>> whatever is wanted.
>>
>> But, this doesn't mean such experimentation will be upstreamed. QEMU
>> upstream should be collecting place for the best ideas and
>> implementations, not for arbitrary experimentations.
>>
>> Best regards,
>> Aleksandar
>>
>> > I'll happily acc
R: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC
Please, let's go with hardfloat ppc support; it's really a good feature to implement.
Even if at first it could lead to inaccurate results, that could be solved later with other patches.
I think it's important for QEMU to be as global as possible and not target only recent hardware.

Regards,
Dino Papararo

From: Qemu-ppc On behalf of luigi burdo
Sent: Wednesday, 26 February 2020 14:01
To: BALATON Zoltan; Programmingkid
Cc: David Gibson; qemu-...@nongnu.org; qemu-devel; Howard Spoelstra
Subject: R: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC

Hi Zoltan,
I can say MacOS Leopard uses multiple cores on the PowerMac G5 Quad. Most of the apps made for Panther/Tiger/Leopard use 2 cores for sure in SMP; only apps made for Tiger/Leopard use more than 2 cores.

Ciao and thanks
Luigi

From: Qemu-ppc <qemu-ppc-bounces+intermediadc=hotmail@nongnu.org> on behalf of BALATON Zoltan <bala...@eik.bme.hu>
Sent: Wednesday, 26 February 2020 12:28
To: Programmingkid <programmingk...@gmail.com>
Cc: Howard Spoelstra <hsp.c...@gmail.com>; qemu-...@nongnu.org; qemu-devel <qemu-devel@nongnu.org>; David Gibson <da...@gibson.dropbear.id.au>
Subject: Re: [RFC PATCH v2] target/ppc: Enable hardfloat for PPC

On Wed, 26 Feb 2020, Programmingkid wrote:
> I think a timeout takes place and that is why audio stops playing. It is
> probably an USB OHCI issue. The other USB controller seems to work
> better.

Which other USB controller? Maybe you could try enabling some usb_ohci* traces and see if they reveal anything.

>> The Amiga like OSes I'm interested in don't use multiple cores so I'm
>> mainly interested in improving single core performance. Also I'm not
>> sure if (part of) your problem is slow FPU preventing fast enough audio
>> decoding then having multiple CPUs with slow FPU would help as this may
>> use a single thread anyway.
>
> Good point. MTTCG might be the option that really helps with speed
> improvements.

Only if you have a multithreaded workload in the guest, because AFAIK MTTCG only runs different vcpus in parallel; it won't make a single emulated CPU faster in any way. OSX can probably benefit from having multiple cores emulated, but I don't think MacOS would use them, apart from some apps maybe.

Regards,
BALATON Zoltan
[Qemu-devel] R: [PATCH v5 8/8] target/ppc: remove various HOST_WORDS_BIGENDIAN hacks in int_helper.c
Hello Mark,

I have a question about improving speed by manually unrolling loops like this one. Assuming ARRAY_SIZE(r->u8) is always a multiple of 4, you can manually improve the loop in this way; on modern CPUs non-sequential instructions can be computed nearly for free:

>  {
>      int i, j = (sh & 0xf);
>
> -    VECTOR_FOR_INORDER_I(i, u8) {
> -        r->u8[i] = j++;
> +    for (i = 0; i < ARRAY_SIZE(r->u8); i+=4,j+=4) {
> +        r->VsrB(i) = j;
> +        r->VsrB(i+1) = j+1;
> +        r->VsrB(i+2) = j+2;
> +        r->VsrB(i+3) = j+3;
>      }
>  }

In this patch there are a lot of functions that can benefit from loop unrolling, with a huge speed improvement.
Maybe the compiler could do it itself, but aren't humans still better?

Best Regards,
Dino Papararo

-----Original Message-----
From: Qemu-devel On behalf of Mark Cave-Ayland
Sent: Wednesday, 30 January 2019 21:37
To: qemu-devel@nongnu.org; qemu-...@nongnu.org; richard.hender...@linaro.org; da...@gibson.dropbear.id.au
Subject: [Qemu-devel] [PATCH v5 8/8] target/ppc: remove various HOST_WORDS_BIGENDIAN hacks in int_helper.c

Following on from the previous work, there are numerous endian-related hacks in int_helper.c that can now be replaced with Vsr* macros.

There are also a few places where the VECTOR_FOR_INORDER_I macro can be replaced with a normal iterator since the processing order is irrelevant.

Signed-off-by: Mark Cave-Ayland
Reviewed-by: Richard Henderson
---
 target/ppc/int_helper.c | 155 ++--
 1 file changed, 45 insertions(+), 110 deletions(-)

diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 916d10c25b..8efc283388 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -443,8 +443,8 @@ void helper_lvsl(ppc_avr_t *r, target_ulong sh)
 {
     int i, j = (sh & 0xf);

-    VECTOR_FOR_INORDER_I(i, u8) {
-        r->u8[i] = j++;
+    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
+        r->VsrB(i) = j++;
     }
 }

@@ -452,18 +452,14 @@ void helper_lvsr(ppc_avr_t *r, target_ulong sh)
 {
     int i, j = 0x10 - (sh & 0xf);

-    VECTOR_FOR_INORDER_I(i, u8) {
-        r->u8[i] = j++;
+    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
+        r->VsrB(i) = j++;
     }
 }

 void helper_mtvscr(CPUPPCState *env, ppc_avr_t *r)
 {
-#if defined(HOST_WORDS_BIGENDIAN)
-    env->vscr = r->u32[3];
-#else
-    env->vscr = r->u32[0];
-#endif
+    env->vscr = r->VsrW(3);
     set_flush_to_zero(vscr_nj, &env->vec_status);
 }

@@ -870,8 +866,8 @@ target_ulong helper_vclzlsbb(ppc_avr_t *r)
 {
     target_ulong count = 0;
     int i;
-    VECTOR_FOR_INORDER_I(i, u8) {
-        if (r->u8[i] & 0x01) {
+    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
+        if (r->VsrB(i) & 0x01) {
             break;
         }
         count++;
@@ -883,12 +879,8 @@ target_ulong helper_vctzlsbb(ppc_avr_t *r)
 {
     target_ulong count = 0;
     int i;
-#if defined(HOST_WORDS_BIGENDIAN)
     for (i = ARRAY_SIZE(r->u8) - 1; i >= 0; i--) {
-#else
-    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
-#endif
-        if (r->u8[i] & 0x01) {
+        if (r->VsrB(i) & 0x01) {
             break;
         }
         count++;
@@ -1137,18 +1129,14 @@ void helper_vperm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
     ppc_avr_t result;
     int i;

-    VECTOR_FOR_INORDER_I(i, u8) {
-        int s = c->u8[i] & 0x1f;
-#if defined(HOST_WORDS_BIGENDIAN)
+    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
+        int s = c->VsrB(i) & 0x1f;
         int index = s & 0xf;
-#else
-        int index = 15 - (s & 0xf);
-#endif

         if (s & 0x10) {
-            result.u8[i] = b->u8[index];
+            result.VsrB(i) = b->VsrB(index);
         } else {
-            result.u8[i] = a->u8[index];
+            result.VsrB(i) = a->VsrB(index);
         }
     }
     *r = result;
@@ -1160,18 +1148,14 @@ void helper_vpermr(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b,
     ppc_avr_t result;
     int i;

-    VECTOR_FOR_INORDER_I(i, u8) {
-        int s = c->u8[i] & 0x1f;
-#if defined(HOST_WORDS_BIGENDIAN)
+    for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
+        int s = c->VsrB(i) & 0x1f;
         int index = 15 - (s & 0xf);
-#else
-        int index = s & 0xf;
-#endif

         if (s & 0x10) {
-            result.u8[i] = a->u8[index];
+            result.VsrB(i) = a->VsrB(index);
         } else {
-            result.u8[i] = b->u8[index];
+            result.VsrB(i) = b->VsrB(index);
         }
     }
     *r = result;
@@ -1868,25 +1852,14 @@ void helper_vsldoi(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, uint32_t shift)
     int i;
     ppc_avr_t result;
-#if defined(HOST_WORDS_BIGENDIAN)
     for (i = 0; i < ARRAY_SIZE(r->u8); i++) {
         int index = sh + i;
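
The unrolling question at the top of this message can be checked for correctness, though not for speed, with a small standalone C sketch. It is only an illustration under stated assumptions: a plain `uint8_t` array stands in for `ppc_avr_t` and the `VsrB()` accessor, and the names `VEC_BYTES`, `lvsl_rolled` and `lvsl_unrolled` are hypothetical. It shows that the rolled loop from the patch and the hand-unrolled variant fill the vector identically; whether the unrolled form is actually faster than what the compiler generates on its own can only be settled by a benchmark.

```c
#include <assert.h>
#include <stdint.h>

#define VEC_BYTES 16   /* a 128-bit vector register holds 16 bytes */

/* Rolled form, mirroring the loop in the patch. */
static void lvsl_rolled(uint8_t out[VEC_BYTES], unsigned sh)
{
    unsigned j = sh & 0xf;

    for (int i = 0; i < VEC_BYTES; i++) {
        out[i] = (uint8_t)j++;
    }
}

/* Manually unrolled by four, as suggested in the question. */
static void lvsl_unrolled(uint8_t out[VEC_BYTES], unsigned sh)
{
    unsigned j = sh & 0xf;

    /* The unrolled body is only valid if the size is a multiple of 4. */
    assert(VEC_BYTES % 4 == 0);
    for (int i = 0; i < VEC_BYTES; i += 4, j += 4) {
        out[i]     = (uint8_t)j;
        out[i + 1] = (uint8_t)(j + 1);
        out[i + 2] = (uint8_t)(j + 2);
        out[i + 3] = (uint8_t)(j + 3);
    }
}
```

Comparing the two outputs byte by byte for every shift value is a cheap regression check to run before measuring anything.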