Am 18.01.2018 um 20:26 schrieb Kyriazis, George: >> On Jan 18, 2018, at 1:10 PM, Roland Scheidegger <[email protected] >> <mailto:[email protected]>> wrote: >> >> Am 17.01.2018 um 23:33 schrieb George Kyriazis: >>> The texture swizzle was not doing the right thing for avx512-style >>> 16-wide loads. >>> >>> Special-case the post-load swizzle operations for avx512 so that we move >>> the xyzw components correctly to the outputs. >>> >>> cc: Jose Fonseca <[email protected] <mailto:[email protected]>> >>> --- >>> src/gallium/auxiliary/gallivm/lp_bld_pack.c | 40 >>> +++++++++++++++++++++++++++-- >>> 1 file changed, 38 insertions(+), 2 deletions(-) >>> >>> diff --git a/src/gallium/auxiliary/gallivm/lp_bld_pack.c >>> b/src/gallium/auxiliary/gallivm/lp_bld_pack.c >>> index e8d4fcd..7879826 100644 >>> --- a/src/gallium/auxiliary/gallivm/lp_bld_pack.c >>> +++ b/src/gallium/auxiliary/gallivm/lp_bld_pack.c >>> @@ -129,6 +129,31 @@ lp_build_const_unpack_shuffle_half(struct >>> gallivm_state *gallivm, >>> } >>> >>> /** >>> + * Similar to lp_build_const_unpack_shuffle_half, but for AVX512 >>> + * See comment above lp_build_interleave2_half for more details. >>> + */ >>> +static LLVMValueRef >>> +lp_build_const_unpack_shuffle_16wide(struct gallivm_state *gallivm, >>> + unsigned lo_hi) >>> +{ >>> + LLVMValueRef elems[LP_MAX_VECTOR_LENGTH]; >>> + unsigned i, j; >>> + >>> + assert(lo_hi < 2); >>> + >>> + // for the following lo_hi setting, convert 0 -> f to: >>> + // 0: 0 16 4 20 8 24 12 28 1 17 5 21 9 25 13 29 >>> + // 1: 2 18 6 22 10 26 14 30 3 19 7 23 11 27 15 31 >>> + for (i = 0; i < 16; i++) { >>> + j = ((i&0x06)<<1) + ((i&1)<<4) + (i>>3) + (lo_hi<<1); >>> + >>> + elems[i] = lp_build_const_int32(gallivm, j); >>> + } >>> + >>> + return LLVMConstVector(elems, 16); >>> +} >>> + >>> +/** >>> * Build shuffle vectors that match PACKxx (SSE) instructions or >>> * VPERM (Altivec). >>> */ >>> @@ -325,8 +350,8 @@ lp_build_interleave2(struct gallivm_state *gallivm, >>> } >>> >>> /** >>> - * Interleave vector elements but with 256 bit, >>> - * treats it as interleave with 2 concatenated 128 bit vectors. >>> + * Interleave vector elements but with 256 (or 512) bit, >>> + * treats it as interleave with 2 concatenated 128 (or 256) bit vectors. >>> * >>> * This differs to lp_build_interleave2 as that function would do the >>> following (for lo): >>> * a0 b0 a1 b1 a2 b2 a3 b3, and this does not compile into an AVX >>> unpack instruction. >>> @@ -343,6 +368,14 @@ lp_build_interleave2(struct gallivm_state *gallivm, >>> * >>> * And interleave-hi would result in: >>> * a2 b2 a3 b3 a6 b6 a7 b7 >>> + * >>> + * For 512 bits, the following are true: >>> + * >>> + * Interleave-lo would result in (capital letters denote hex indices): >>> + * a0 b0 a1 b1 a4 b4 a5 b5 a8 b8 a9 b9 aC bC aD bD >>> + * >>> + * Interleave-hi would result in: >>> + * a2 b2 a3 b3 a6 b6 a7 b7 aA bA aB bB aE bE aF bF >>> */ >>> LLVMValueRef >>> lp_build_interleave2_half(struct gallivm_state *gallivm, >>> @@ -354,6 +387,9 @@ lp_build_interleave2_half(struct gallivm_state >>> *gallivm, >>> if (type.length * type.width == 256) { >>> LLVMValueRef shuffle = >>> lp_build_const_unpack_shuffle_half(gallivm, type.length, lo_hi); >>> return LLVMBuildShuffleVector(gallivm->builder, a, b, shuffle, ""); >>> + } else if ((type.length == 16) && (type.width == 32)) { >>> + LLVMValueRef shuffle = >>> lp_build_const_unpack_shuffle_16wide(gallivm, lo_hi); >>> + return LLVMBuildShuffleVector(gallivm->builder, a, b, shuffle, >>> ""); >> This is not really "interleave_half", more like "interleave_quarter"... >> That said, avx512 certainly follows the same rules as avx256, so 128bit >> pieces are treated independently. So maybe this should be renamed like >> "interleave_native" or something like that. >> Also, I believe it is definitely a mistake to restrict this to dword >> interleaves here. You should handle all type widths, just like the >> 256bit case can handle all widths. >> And I'm not sure through which paths you reach this, but I'm not sure >> why you don't need the corresponding unpack2_native and pack2_native >> adjustments - it should not really be a special case, avx512 should >> generally handle things like this (if you'd want to extend the gallivm >> code to use avx512...). For that matter, the commit log and shortlog is >> confusing, because this isn't directly related to texture fetching. >> > > Thanks Roland, > > This check-in is a prerequisite for some avx-512 changes I am about to > check in to the swr driver. I’ve hit this for piglit test > vs-texelfetch-isampler1d and similar ones that try to fetch from a > texture from a VS. You are right, there are probably other cases where > this gets hit, but that was the place where I’ve hit it first. If you > could recommend some tests that hit the other width cases, it would be > great to fix them too; since didn’t know where those cases would be hit, > I decided to make this “surgical” change. Ok, so it is hit from the transpose_aos path. That should work alright, since this only ever deals with dword elements. I guess to see it hit from other paths you'd need some more proper support for 512bit vectors, e.g. pack/unpack_native.
> > My follow-on changes depend on this change for correctness, and I’d like > to get the the other changes in before the mesa 18.0 date (tomorrow). > AVX512 support is an ongoing effort on our side, so they will > definitely be additional check-ins along the same lines. > > Thoughts on checking in as-is for 18.0, with upcoming changes slowly > creeping in to fix additional issues, and make things more general? I can live with that, it's your driver which might break, llvmpipe shouldn't hit it :-). As long as you fix up the commit message (should really mention the transpose), Reviewed-by: Roland Scheidegger <[email protected]> > Thanks, > > George > > >> Roland >> >> >> >> >> >>> } else { >>> return lp_build_interleave2(gallivm, type, a, b, lo_hi); >>> } >>> >> >> _______________________________________________ >> mesa-dev mailing list >> [email protected] <mailto:[email protected]> >> https://lists.freedesktop.org/mailman/listinfo/mesa-dev >> <https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.freedesktop.org_mailman_listinfo_mesa-2Ddev&d=DwMGaQ&c=uilaK90D4TOVoH58JNXRgQ&r=_QIjpv-UJ77xEQY8fIYoQtr5qv8wKrPJc7v7_-CYAb0&m=WSTeuSUPvmSGYKAu9XgzHNSXvuo4qh2MJ2USAEQygWQ&s=4yOW4nVoM0041Ves_N6YfrGtHDSNgprm6JdNpLmzMKI&e=> > _______________________________________________ mesa-dev mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/mesa-dev
