On Wed, 6 Nov 2024, Jakub Jelinek wrote:
> Hi!
>
> encode_tree_to_bitpos uses the more expensive sub_byte_op_p mode in which
> it has to allocate a buffer and do various extra work like shifting the bits
> etc. if bitlen or bitpos aren't multiples of BITS_PER_UNIT, or if bitlen
> doesn't have a corresponding integer mode.
> The last case is explained later in the comments:
> /* The native_encode_expr machinery uses TYPE_MODE to determine how many
> bytes to write. This means it can write more than
> ROUND_UP (bitlen, BITS_PER_UNIT) / BITS_PER_UNIT bytes (for example
> write 8 bytes for a bitlen of 40). Skip the bytes that are not within
> bitlen and zero out the bits that are not relevant as well (that may
> contain a sign bit due to sign-extension). */
> Now, we've later added empty_ctor_p support, either {} CONSTRUCTOR
> or {CLOBBER}, which doesn't use native_encode_expr at all, just memset,
> so that case doesn't need those fancy games unless bitlen or bitpos
> aren't multiples of BITS_PER_UNIT (unlikely, but let's pretend it is
> possible).
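
To make the "8 bytes for a bitlen of 40" remark in the quoted comment concrete, here is a minimal standalone sketch of the arithmetic. It is not GCC code: bytes_in_bitlen, bytes_written_by_mode and the assumed_mode_bits list (8/16/32/64-bit integer modes) are hypothetical names and assumptions made only for illustration.

```cpp
// Illustration only; the mode list below is an assumption, not GCC's tables.
static const unsigned BITS_PER_UNIT = 8;
static const unsigned assumed_mode_bits[] = { 8, 16, 32, 64 };

// Bytes that really belong to a BITLEN-bit value, i.e.
// ROUND_UP (bitlen, BITS_PER_UNIT) / BITS_PER_UNIT.
unsigned
bytes_in_bitlen (unsigned bitlen)
{
  return (bitlen + BITS_PER_UNIT - 1) / BITS_PER_UNIT;
}

// Hypothetical model of an encoder that sizes its output by the type's mode:
// returns the byte size of the narrowest assumed integer mode that holds
// BITLEN bits, or 0 if none of the assumed modes is wide enough.
unsigned
bytes_written_by_mode (unsigned bitlen)
{
  for (unsigned bits : assumed_mode_bits)
    if (bits >= bitlen)
      return bits / BITS_PER_UNIT;
  return 0;
}
```

Under these assumptions a 40-bit value needs 5 bytes while a mode-sized write covers 8, which is the excess the slow path has to skip and mask. A memset of the exact byte count has no such excess.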
>
> The following patch makes us use the fast path even for empty_ctor_p cases
> which occupy full bytes; we can just memset the provided buffer and don't
> need to XALLOCAVEC another buffer.
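
For readers who want to compare the old and new conditions side by side, the following is a small standalone model of the predicate, not the GCC function itself. has_exact_int_mode is a hypothetical stand-in for int_mode_for_size (bitlen, 0).exists () and assumes exact integer modes of 8, 16, 32 and 64 bits; the unit size of 8 bits is likewise an assumption.

```cpp
// Standalone model of the sub_byte_op_p condition before and after the patch.
static const unsigned UNIT_BITS = 8;  // assumed value of BITS_PER_UNIT

// Hypothetical stand-in for int_mode_for_size (bitlen, 0).exists ();
// assumes only 8-, 16-, 32- and 64-bit integer modes exist.
bool
has_exact_int_mode (unsigned long bitlen)
{
  return bitlen == 8 || bitlen == 16 || bitlen == 32 || bitlen == 64;
}

// Condition before the patch: a missing integer mode always forces
// the slow path.
bool
sub_byte_op_old (unsigned long bitlen, unsigned long bitpos, bool empty_ctor_p)
{
  (void) empty_ctor_p;  // not consulted by the old condition
  return (bitlen % UNIT_BITS) != 0
	 || (bitpos % UNIT_BITS) != 0
	 || !has_exact_int_mode (bitlen);
}

// Condition after the patch: a missing integer mode only matters when
// the value is not an empty constructor.
bool
sub_byte_op_new (unsigned long bitlen, unsigned long bitpos, bool empty_ctor_p)
{
  return (bitlen % UNIT_BITS) != 0
	 || (bitpos % UNIT_BITS) != 0
	 || (!has_exact_int_mode (bitlen) && !empty_ctor_p);
}
```

In this model a byte-aligned 24-byte empty constructor (bitlen 192) takes the slow path under the old condition and the fast path under the new one, while a non-empty 192-bit value or an unaligned empty constructor still takes the slow path in both.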
>
> This patch in itself fixes the testcase from the PR (which was about using
> a huge XALLOCAVEC), but I want to do some other changes, to be posted in a
> follow-up patch.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
OK.
Thanks,
Richard.
> 2024-11-06 Jakub Jelinek <[email protected]>
>
> PR tree-optimization/117439
> * gimple-ssa-store-merging.cc (encode_tree_to_bitpos): For
> empty_ctor_p use !sub_byte_op_p even if bitlen doesn't have an
> integral mode.
>
> --- gcc/gimple-ssa-store-merging.cc.jj 2024-10-25 10:00:29.467767871 +0200
> +++ gcc/gimple-ssa-store-merging.cc 2024-11-04 18:40:14.667260621 +0100
> @@ -1934,14 +1934,15 @@ encode_tree_to_bitpos (tree expr, unsign
> unsigned int total_bytes)
> {
> unsigned int first_byte = bitpos / BITS_PER_UNIT;
> - bool sub_byte_op_p = ((bitlen % BITS_PER_UNIT)
> - || (bitpos % BITS_PER_UNIT)
> - || !int_mode_for_size (bitlen, 0).exists ());
> bool empty_ctor_p
> = (TREE_CODE (expr) == CONSTRUCTOR
> && CONSTRUCTOR_NELTS (expr) == 0
> && TYPE_SIZE_UNIT (TREE_TYPE (expr))
> - && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr))));
> + && tree_fits_uhwi_p (TYPE_SIZE_UNIT (TREE_TYPE (expr))));
> + bool sub_byte_op_p = ((bitlen % BITS_PER_UNIT)
> + || (bitpos % BITS_PER_UNIT)
> + || (!int_mode_for_size (bitlen, 0).exists ()
> + && !empty_ctor_p));
>
> if (!sub_byte_op_p)
> {
>
> Jakub
>
>
--
Richard Biener <[email protected]>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)