On 12/31/18 7:51 AM, Nick Renieris wrote:
> The PS4's APU doesn't support AVX2 or AVX-512 so I'd be fine if I
> didn't have enough time to implement them.

Fair enough.  A goal like this is a good thing.

>> The tcg-op-gvec.h infrastructure allows for the different modes that avx+mmx
>> allows:
>>
>>   (1) 64-bit operations,
>>   (2) 128-bit operations, modifying only the low 128 bits,
>>   (3) 128-bit operations, zeroing bits beyond the first 128,
>>   (4) N*128-bit operations, zeroing bits beyond the first N*128.
>
> I assume you mean 256-bit ops on (2) and (3), and N*256 on (4)? Low
> 128 bits of a 128-bit number is just the number.

No, I mean

  0FFCC8          paddb   %mm0, %mm1              (1)
  660FFCC8        paddb   %xmm0, %xmm1            (2)
  C5F1FCC8        vpaddb  %xmm0, %xmm1, %xmm1     (3)
  C5F5FCC8        vpaddb  %ymm0, %ymm1, %ymm1     (4)
  62F17548FCC8    vpaddb  %zmm0, %zmm1, %zmm1     (4)

On a system that supports AVX, (2) and (3), while computing 128-bit inputs and
producing a 128-bit output, have different effects on the rest of the 256-bit
register.

> So, I would need to implement every SSE instruction that isn't
> SSE_SPECIAL at the moment, using tcg-op-gvec.h? Or more instructions
> than that?

You'd want to do all of the SSE instructions, SSE_SPECIAL and otherwise.

I believe that we want to eliminate sse_op_table* and implement all insns
within a switch statement, like SSE_SPECIAL.  Note that this does not mean one
gigantic 5000 line function; appropriate use of helper functions should make
the code for each switch entry fairly small.

You'd want to re-organize the code generated by ops_sse.h using the
(ptr, ptr, ..., desc) signature of gen_helper_gvec_{2,2i,3,...} and expand
them using tcg_gen_gvec_{2,2i,3,...}_ool.  Examples of these are in
accel/tcg/tcg-runtime-gvec.c and target/arm/vec_helper.c.

Use simd_oprsz to find out how much data should be operated upon.  The
clear_high function should be moved somewhere that it can be shared.

Once all of this has been done for SSE, then AVX is implemented simply by
adjusting the oprsz and maxsz arguments to tcg_gen_gvec_*.

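To make the oprsz/maxsz distinction concrete, here is a rough standalone model
of a byte-add helper.  It is only a sketch and not QEMU code: every model_*
name is hypothetical, and the descriptor layout (oprsz in the low 16 bits,
maxsz in the high 16 bits, both in bytes) is an assumption made for
illustration, not the real simd_desc encoding.  model_oprsz, model_maxsz and
model_clear_high merely stand in for simd_oprsz, simd_maxsz and clear_high.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor: low 16 bits = oprsz, high 16 bits = maxsz,
 * both in bytes.  This is NOT QEMU's actual encoding. */
static uint32_t model_desc(uint32_t oprsz, uint32_t maxsz)
{
    return (maxsz << 16) | oprsz;
}

static size_t model_oprsz(uint32_t desc) { return desc & 0xffff; }
static size_t model_maxsz(uint32_t desc) { return desc >> 16; }

/* Stand-in for clear_high: zero the bytes in [oprsz, maxsz). */
static void model_clear_high(void *d, size_t oprsz, uint32_t desc)
{
    size_t maxsz = model_maxsz(desc);

    if (maxsz > oprsz) {
        memset((uint8_t *)d + oprsz, 0, maxsz - oprsz);
    }
}

/* Helper with the (ptr, ptr, ptr, desc) shape: d = a + b, per byte.
 * d may be the same pointer as a or b, as in "paddb %xmm0, %xmm1". */
static void model_gvec_add8(void *d, const void *a, const void *b,
                            uint32_t desc)
{
    size_t oprsz = model_oprsz(desc);
    uint8_t *dp = d;
    const uint8_t *ap = a;
    const uint8_t *bp = b;

    assert(oprsz <= model_maxsz(desc));

    for (size_t i = 0; i < oprsz; i++) {
        dp[i] = (uint8_t)(ap[i] + bp[i]);
    }
    /* Bytes beyond oprsz are zeroed only if maxsz says the register
     * extends past the operated-upon part. */
    model_clear_high(d, oprsz, desc);
}
```

With a 256-bit register file, case (2) would correspond to
model_desc(16, 16), which leaves the upper 128 bits alone; case (3) to
model_desc(16, 32), which zeroes them; and the ymm form of case (4) to
model_desc(32, 32).  The helper body is the same in all three cases; only the
two sizes change.
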
> Assuming I do this for SSE and AVX, I would not need to touch anything
> else like the TCG back-end, as every gvec/vec op is already
> implemented for i386, correct?

Correct.


r~