On Sat, 21 Jan 2017, Richard Henderson wrote:

> On 01/19/2017 08:54 AM, Kirill Batuzov wrote:
> >
> > Wrappers issue emulation code instead of operation if it is not supported by
> > host.
> >
> > tcg_gen_add_i32x4 looks like this:
> >
> > if (TCG_TARGET_HAS_add_i32x4) {
> >     tcg_gen_op3_v128(INDEX_op_add_i32x4, args[0], args[1], args[2]);
> > } else {
> >     for (i = 0; i < 4; i++) {
> >         tcg_gen_ld_i32(...);
> >         tcg_gen_ld_i32(...);
> >         tcg_gen_add_i32(...);
> >         tcg_gen_st_i32(...);
> >     }
> > }
>
> To me that begs the question of why you wouldn't issue 4 adds on 4 i32
> registers instead.
>

Because 4 adds on 4 i32 registers work well only when the size of the
vector elements matches the size of the scalar variables we use to
represent the vector. add_i16x8 will not be that great if we use 4 i32
variables: each will need to be split into two values, processed
independently, and merged back afterwards. And when we create a
variable, we do not know which operations will be performed on it.

Scalar variables lack primitives to work with them as vectors of
shorter values. This is one of the reasons I added a v64 type instead of
using i64 for 64-bit vector operations. It is also the reason I'm so
opposed to using scalar variables to represent vector types when vector
registers are not supported by the host. Handling vector operations
whose element size does not match the representation will be
complicated, may require special handling for different operations, and
will produce a lot of ifs in the code.

The method I'm proposing can handle any operation regardless of
representation. This includes the situation where the host supports
vector registers but does not support the required operation (for
example, SSE/AVX does not support multiplication of vectors of 8-bit
values).

--
Kirill
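
To illustrate the add_i16x8 point, here is a minimal standalone C sketch.
It is not TCG code and not taken from the patch series: the type
vec_as_i32x4 and both function names are hypothetical, chosen only to
contrast the two representations. The first function keeps the vector in
four 32-bit scalars and has to split, add, and merge the 16-bit halves;
the second treats the vector as bytes in memory and uses the same
load/add/store pattern for any element width.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: a 128-bit vector represented as four 32-bit scalars. */
typedef struct { uint32_t w[4]; } vec_as_i32x4;

/*
 * add_i16x8 on the i32 representation. Each 32-bit word holds two
 * 16-bit lanes, so every word is split into halves, the halves are
 * added independently (wrapping modulo 2^16), and the results are
 * merged back into one word.
 */
static vec_as_i32x4 add_i16x8_on_i32(vec_as_i32x4 a, vec_as_i32x4 b)
{
    vec_as_i32x4 r;
    for (int i = 0; i < 4; i++) {
        uint32_t lo = (a.w[i] & 0xffffu) + (b.w[i] & 0xffffu);
        uint32_t hi = (a.w[i] >> 16) + (b.w[i] >> 16);
        r.w[i] = (lo & 0xffffu) | ((hi & 0xffffu) << 16);
    }
    return r;
}

/*
 * add_i16x8 on a memory representation. Each lane is loaded, added,
 * and stored at its own width; no splitting or merging is needed, and
 * the same pattern works for 8-, 16-, 32- or 64-bit lanes.
 */
static void add_i16x8_in_memory(uint8_t *d, const uint8_t *a,
                                const uint8_t *b)
{
    for (int i = 0; i < 8; i++) {
        uint16_t x, y, s;
        memcpy(&x, a + 2 * i, sizeof x);   /* load lane i of a */
        memcpy(&y, b + 2 * i, sizeof y);   /* load lane i of b */
        s = (uint16_t)(x + y);             /* 16-bit wrapping add */
        memcpy(d + 2 * i, &s, sizeof s);   /* store lane i of result */
    }
}
```

Both functions compute the same result; the difference is that the
scalar-backed version needs lane-specific masking and shifting that
would have to be rewritten for every element size and operation.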