The way to handle those situations is to have an arch decomposition pass that converts MULPS into a VZERO + MULADD. For bonus points, you can extend the arch peephole code to fuse MULPS + ADDPS.
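To make the idea concrete, here is a minimal sketch of such a decomposition on a toy IR. It assumes the pass runs before register allocation, so the zero operand can live in a fresh virtual register that cannot alias D, S1 or S2. The Inst struct, the opcode enum, next_vreg and the small interpreter are all hypothetical stand-ins written for this example, not Mono's MonoInst or its real opcodes.

```c
#include <assert.h>
#include <string.h>

/* Toy model only: hypothetical stand-ins, not Mono's MonoInst or opcodes. */
enum { OP_MULPS, OP_XZERO, OP_MULADD };

typedef struct {
    int op;
    int dreg, sreg1, sreg2, sreg3; /* virtual register numbers; -1 if unused */
} Inst;

#define LANES 4
#define MAX_VREGS 16

int next_vreg = 3; /* assumption: vregs 0..2 are already in use */

/* Rewrite "MULPS d <= s1, s2" into "XZERO t; MULADD d <= s1, s2, t".
 * t is a fresh virtual register, so it never aliases d, s1 or s2.
 * Returns the number of instructions written to out. */
int decompose_mulps(const Inst *in, Inst out[2])
{
    if (in->op != OP_MULPS) {
        out[0] = *in;
        return 1;
    }
    int tmp = next_vreg++;
    out[0] = (Inst){ OP_XZERO, tmp, -1, -1, -1 };
    out[1] = (Inst){ OP_MULADD, in->dreg, in->sreg1, in->sreg2, tmp };
    return 2;
}

/* Minimal interpreter, used only to check the rewrite. */
void run(const Inst *code, int n, float regs[MAX_VREGS][LANES])
{
    for (int i = 0; i < n; i++) {
        const Inst *ins = &code[i];
        float r[LANES];
        assert(ins->dreg >= 0 && ins->dreg < MAX_VREGS);
        for (int l = 0; l < LANES; l++) {
            switch (ins->op) {
            case OP_MULPS:
                r[l] = regs[ins->sreg1][l] * regs[ins->sreg2][l];
                break;
            case OP_MULADD:
                r[l] = regs[ins->sreg1][l] * regs[ins->sreg2][l]
                     + regs[ins->sreg3][l];
                break;
            default: /* OP_XZERO */
                r[l] = 0.0f;
                break;
            }
        }
        /* Sources are fully read before the destination is written. */
        memcpy(regs[ins->dreg], r, sizeof r);
    }
}

/* Returns 1 if the decomposed form of "MULPS v0 <= v0, v1" (D == S1)
 * produces the lane-wise product in v0. */
int check_aliased_mulps(void)
{
    float regs[MAX_VREGS][LANES] = {
        { 1, 2, 3, 4 }, /* v0: S1 and also D */
        { 5, 6, 7, 8 }, /* v1: S2 */
    };
    const float want[LANES] = { 5, 12, 21, 32 };
    Inst mul = { OP_MULPS, 0, 0, 1, -1 };
    Inst out[2];
    int n = decompose_mulps(&mul, out);
    run(out, n, regs);
    return n == 2 && memcmp(regs[0], want, sizeof want) == 0;
}
```

Calling check_aliased_mulps() from a main function exercises the D == S1 case from the question below. Because the zero goes into its own register, neither of the two workarounds proposed there is needed in this model.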
For an example of that, take a look at mini-x86.c / mono_arch_decompose_opts.

Rodrigo

On Tue, Feb 9, 2010 at 11:57 AM, Sergei Dyshel <qyron.priv...@gmail.com> wrote:

> Hi,
> Now I'm stuck with another problem on PPC. For multiplication of floats
> AltiVec has only a fused multiply-add instruction which does a*b+c. So in
> order to implement OP_MULPS I need to ensure c==0. The only solution which
> comes to mind is:
> XZERO D
> MULADD D <= S1, S2, D
>
> Where MULADD is the instruction and D, S1, S2 are ins->dreg, sreg1, sreg2.
> But this solution won't work in cases where S1=D or S2=D, since D would
> be zeroed before use. So 2 possibilities remain:
> 1) Make sure that D <> S1 and D <> S2, and then the previously mentioned
> solution will work.
> 2) Allocate an additional (vector) register for MULPS and somehow store it
> inside the MonoInst structure.
>
> What is the traditional way to do such things? I really need to solve this
> problem, any help will be greatly appreciated!
>
> Thanks,
> Sergei
>
>
> On Thu, Feb 4, 2010 at 02:59, Rodrigo Kumpera <kump...@gmail.com> wrote:
>
>> Hi Sergei,
>>
>> On Tue, Feb 2, 2010 at 6:59 AM, Sergei Dyshel <qyron.priv...@gmail.com> wrote:
>>
>>> Hello all,
>>>
>>> I'm currently working on a PowerPC port of Mono which utilizes AltiVec
>>> SIMD instructions. During the development I've encountered an alignment
>>> problem:
>>>
>>> As far as I understood from running Mono's JIT, stack-allocated
>>> Mono.Simd.Vector* types are always aligned on a 16-byte boundary, but
>>> global ones aren't (such as static class members). This is not a problem
>>> for SSE, which has unaligned loads/stores, but AltiVec doesn't have them.
>>> Instead of implementing misaligned loads/stores for AltiVec I think it's
>>> better to force alignment in global variables, as is done in the case of
>>> the stack.
>>>
>>
>> No, the JIT doesn't align all Vector types to 16 bytes. There are places,
>> like the spill code, that still don't do it correctly. Not a lot of work
>> to get there, but still not done.
>>
>>
>> If by global variables you mean statics, then making them properly aligned
>> is possible with some trickery.
>> The only alignment issue we can't currently fix is heap objects, due to
>> how our GC works.
>> Our new GC might eventually gain the ability to properly align such
>> objects, but this is something for the far future.
>>
>>
>>
>>> Can somebody help me with that (e.g. point at relevant places in
>>> 'mini-ppc.c')?
>>>
>>
>> To fix the alignment of stack variables you need to mess with a bunch of
>> places:
>>
>> -The spill code from mini-codegen.c
>> -The var allocation code in mono_allocate_stack_slots (mini.c)
>>
>> To fix the static storage alignment you need to change the code that
>> allocates the statics area to use the proper alignment.
>>
>> This is the same problem as with objects, as it uses a GC routine to
>> allocate the memory blob.
>> Fixing this requires going deep into the GC, which is not something
>> simple.
>>
>>
>>
>
> _______________________________________________
> Mono-devel-list mailing list
> Mono-devel-list@lists.ximian.com
> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>
>