[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |NEW
           Assignee|ubizjak at gmail dot com     |unassigned at gcc dot gnu.org

--- Comment #9 from Uroš Bizjak ---
Oh well ... it looks like the implementation wandered into areas of the compiler I'm not familiar with ... Unassigning myself, considering that, at the end of the day, the prototype patch looks more like a band-aid for some different problem.
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

--- Comment #8 from Jakub Jelinek ---
That does something different though. But there is in C:

      if (targetm.calls.promote_prototypes (fundecl ? TREE_TYPE (fundecl) : 0)
          && INTEGRAL_TYPE_P (type)
          && (TYPE_PRECISION (type) < TYPE_PRECISION (integer_type_node)))
        parmval = default_conversion (parmval);

and in C++:

      else if (targetm.calls.promote_prototypes (type)
               && INTEGRAL_TYPE_P (type)
               && COMPLETE_TYPE_P (type)
               && tree_int_cst_lt (TYPE_SIZE (type), TYPE_SIZE (integer_type_node)))
        type = integer_type_node;

and

      else if (targetm.calls.promote_prototypes (type)
               && INTEGRAL_TYPE_P (type)
               && COMPLETE_TYPE_P (type)
               && tree_int_cst_lt (TYPE_SIZE (type), TYPE_SIZE (integer_type_node)))
        val = cp_perform_integral_promotions (val, complain);

This shows a clear inconsistency between C and C++: C passes the FUNCTION_TYPE, while C++ passes the argument type. If all the FEs passed at least the FUNCTION_TYPE/METHOD_TYPE, then the i386 target hook could decide, say, based on some custom attribute you'd use on those builtins. Or we could change the target hook further and pass a fndecl (if known) and type to the target hook.
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

--- Comment #7 from Uroš Bizjak ---
Another idea is to add a "nopromote" attribute to the builtin decl (in ix86_add_new_builtins), detect this attribute in TARGET_PROMOTE_PROTOTYPES, and disable promotion in this case. If this approach works, we can perhaps selectively add the "nopromote" attribute to masked builtins.
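
The attribute check proposed in comment #7 could be pictured roughly as follows. This is only a toy model in plain C, not GCC source: struct toy_decl, toy_has_attr and toy_promote_prototypes are hypothetical names, and the sketch assumes the hook can see the function declaration, which (as comment #8 points out) is not what the front ends currently pass.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy declaration record; NOT GCC's tree representation.  */
struct toy_decl
{
  const char *name;
  const char *const *attrs;	/* NULL-terminated list of attribute names.  */
};

/* Return true if declaration D carries attribute ATTR.  */
static bool
toy_has_attr (const struct toy_decl *d, const char *attr)
{
  if (d == NULL || d->attrs == NULL)
    return false;
  for (const char *const *p = d->attrs; *p != NULL; p++)
    if (strcmp (*p, attr) == 0)
      return true;
  return false;
}

/* Hypothetical stand-in for a promote-prototypes hook: promote narrow
   integer arguments by default, but not for declarations marked
   "nopromote".  An unknown declaration (NULL) keeps the default.  */
bool
toy_promote_prototypes (const struct toy_decl *fndecl)
{
  return !toy_has_attr (fndecl, "nopromote");
}
```

With this shape, only the builtins explicitly tagged "nopromote" would change behaviour; ordinary calls and untagged builtins would still be promoted as before.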
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

--- Comment #6 from Marek Polacek ---
There is a "type generic" attribute which, I think, disables certain promotions (or at least float -> double).
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

--- Comment #5 from Uroš Bizjak ---
Also,

    #include <immintrin.h>

    __mmask16 m;
    __m512i zzz;

    __m512i foo (__m512i x, __m512i y, int a)
    {
      zzz = _mm512_mask_slli_epi32 (y, m, x, a);
      return _mm512_mask_srai_epi32 (y, m, x, a);
    }

defeats the proposed prototype patch, resulting in:

        vmovd       %edi, %xmm2
        vmovdqa64   %zmm1, %zmm3
        movzwl      m(%rip), %eax
        vmovdqa64   %zmm1, %zmm4
        kmovw       %eax, %k1
        vpslld      %xmm2, %zmm0, %zmm3{%k1}
        vpsrad      %xmm2, %zmm0, %zmm4{%k1}
        vmovdqa64   %zmm3, zzz(%rip)
        vmovdqa64   %zmm4, %zmm0
        ret
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

--- Comment #4 from Uroš Bizjak ---
(In reply to Jakub Jelinek from comment #3)
> Because builtins are treated like any other function calls, and if
> short/char args are promoted for normal calls, they are promoted for
> builtins too.

Indeed. Disabling TARGET_PROMOTE_PROTOTYPES gets us the direct move from memory.

So... do we really need to promote all these builtins that usually result in a well-known instruction? Is there a way to mark (some of?) them with a "nopromote" flag?
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

Jakub Jelinek changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek ---
Because builtins are treated like any other function calls, and if short/char args are promoted for normal calls, they are promoted for builtins too.
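
As a small illustration of the promotion described in comment #3, the following C sketch models what happens at the call site. It is not GCC source, and use_mask_promoted and call_with_promotion are hypothetical names: the caller widens the 16-bit mask to int before the call (the "_2 = (int) m.1_1" statement in the tree dump from comment #2), and the callee only ever consumes the low 16 bits.

```c
#include <stdint.h>

/* Hypothetical stand-in for a builtin whose 16-bit mask parameter has
   been promoted to int.  Only the low 16 bits are meaningful.  */
static uint16_t
use_mask_promoted (int mask)
{
  return (uint16_t) mask;
}

/* What the caller effectively does: load the 16-bit value, widen it
   to int (a zero-extension, since the source type is unsigned), then
   pass the widened value.  */
uint16_t
call_with_promotion (uint16_t m)
{
  int widened = (int) m;
  return use_mask_promoted (widened);
}
```

The widening never changes the bits the callee looks at, which is why the explicit zero-extension before the mask-register move is redundant for these builtins.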
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

--- Comment #2 from Uroš Bizjak ---
Hm, why does the middle-end convert to integer in the first place?

The .optimized tree dump reads:

    foo (__m512i x, __m512i y, int a)
    {
      short unsigned int m.1_1;
      int _2;
      vector(16) int _7;
      vector(16) int _8;
      vector(16) int _9;
      vector(8) long long int _10;

      <bb 2> [local count: 1]:
      m.1_1 = m;
      _2 = (int) m.1_1;
      _7 = VIEW_CONVERT_EXPR<vector(16) int>(y_5(D));
      _8 = VIEW_CONVERT_EXPR<vector(16) int>(x_6(D));
      _9 = __builtin_ia32_psradi512_mask (_8, a_3(D), _7, _2);
      _10 = VIEW_CONVERT_EXPR<__m512i>(_9);
      return _10;

    }
[Bug target/82897] Unnecessary zero-extension when loading mask register from memory
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82897

Uroš Bizjak changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
   Last reconfirmed|                            |2017-11-08
           Assignee|unassigned at gcc dot gnu.org |ubizjak at gmail dot com
     Ever confirmed|0                           |1

--- Comment #1 from Uroš Bizjak ---
Created attachment 42561
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=42561&action=edit
Prototype patch

This patch changes the predicate for the mask register to nonimmediate operand. This way, combine is able to simplify sequences of zero-extend and subreg operators to a simple move.