Re: [PATCH] fold strlen() of aggregate members (PR 77357)
On 07/11/2018 07:50 AM, Andre Vieira (lists) wrote: On 11/07/18 11:00, Andre Vieira (lists) wrote: On 09/07/18 22:44, Martin Sebor wrote: On 07/09/2018 06:40 AM, Richard Biener wrote: On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor wrote: On 07/06/2018 09:52 AM, Richard Biener wrote: On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor wrote: GCC folds accesses to members of constant aggregates except for character arrays/strings. For example, the strlen() call below is not folded: const char a[][4] = { "1", "12" }; int f (void) { retturn strlen (a[1]); } The attached change set enhances the string_constant() function to make it possible to extract string constants from aggregate initializers (CONSTRUCTORS). The initial solution was much simpler but as is often the case, MEM_REF made it fail to fold things like: int f (void) { retturn strlen (a[1] + 1); } Handling those made the project a bit more interesting and the final solution somewhat more involved. To handle offsets into aggregate string members the patch also extends the fold_ctor_reference() function to extract entire string array initializers even if the offset points past the beginning of the string and even though the size and exact type of the reference are not known (there isn't enough information in a MEM_REF to determine that). Tested along with the patch for PR 86415 on x86_64-linux. + if (TREE_CODE (init) == CONSTRUCTOR) + { + tree type; + if (TREE_CODE (arg) == ARRAY_REF + || TREE_CODE (arg) == MEM_REF) + type = TREE_TYPE (arg); + else if (TREE_CODE (arg) == COMPONENT_REF) + { + tree field = TREE_OPERAND (arg, 1); + type = TREE_TYPE (field); + } + else + return NULL_TREE; what's wrong with just type = TREE_TYPE (field); In response to your comment below abut size I simplified things further so determining the type a priori is no longer necessary. ? + base_off *= BITS_PER_UNIT; poly_uint64 isn't enough for "bits", with wide-int you'd use offset_int, for poly you'd then use poly_offset? Okay, I tried to avoid the overflow. (Converting between all these flavors of wide int types is a monumental PITA.) You extend fold_ctor_reference to treat size == 0 specially but then bother to compute a size here - that looks unneeded? Yes, well spotted, thanks! I simplified the code so this isn't necessary, and neither is the type. While the offset of the reference determines the first field in the CONSTRUCTOR, how do you know the access doesn't touch adjacent ones? STRING_CSTs do not have to be '\0' terminated, so consider char x[2][4] = { "abcd", "abcd" }; and MEM[&x] with a char[8] type? memcpy "inlining" will create such MEMs for example. The code is only used to find string constants in initializer expressions where I don't think the size of the access comes into play. If a memcpy() call results in a MEM_REF[char[8], &x, 8] that's fine. It's a valid reference and we can still get the underlying character sequence (which is represented as two STRING_CSTs with the two string literals). I might be missing the point of your question. Maybe irrelevant for strlen folding depending on what you do for missing '\0' termination. @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor, tree byte_offset = DECL_FIELD_OFFSET (cfield); tree field_offset = DECL_FIELD_BIT_OFFSET (cfield); tree field_size = DECL_SIZE (cfield); - offset_int bitoffset; - offset_int bitoffset_end, access_end; + + if (!field_size && TREE_CODE (cval) == STRING_CST) + { + /* Determine the size of the flexible array member from +the size of the string initializer provided for it. */ + unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval); + tree eltype = TREE_TYPE (TREE_TYPE (cval)); + len *= tree_to_uhwi (TYPE_SIZE (eltype)); + field_size = build_int_cst (size_type_node, len); + } Why does this only apply to STRING_CST initializers and not CONSTRUCTORS, say, for struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } }; I can't think of a use for it. Do you have something in mind? Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset which is useful in other parts of the compiler. So I don't see why it shouldn't work for general flex-arrays. ? And why not use simply field_size = TYPE_SIZE (TREE_TYPE (cval)); like you do in c_strlen? Yes, that's simpler, thanks. Otherwise looks reasonable. Attached is an updated patch. I also enhanced the handling of non-constant indices. They were already handled before to a smaller extent. (There may be other opportunities here.) Please don't do functional changes to a patch in review, without exactly pointing out the change. It makes review inefficent for me. Looks like it might be the NULL type argument handling?
Re: [PATCH] fold strlen() of aggregate members (PR 77357)
On 11/07/18 11:00, Andre Vieira (lists) wrote: > On 09/07/18 22:44, Martin Sebor wrote: >> On 07/09/2018 06:40 AM, Richard Biener wrote: >>> On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor wrote: On 07/06/2018 09:52 AM, Richard Biener wrote: > On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor wrote: >> >> GCC folds accesses to members of constant aggregates except >> for character arrays/strings. For example, the strlen() call >> below is not folded: >> >>const char a[][4] = { "1", "12" }; >> >>int f (void) { retturn strlen (a[1]); } >> >> The attached change set enhances the string_constant() function >> to make it possible to extract string constants from aggregate >> initializers (CONSTRUCTORS). >> >> The initial solution was much simpler but as is often the case, >> MEM_REF made it fail to fold things like: >> >>int f (void) { retturn strlen (a[1] + 1); } >> >> Handling those made the project a bit more interesting and >> the final solution somewhat more involved. >> >> To handle offsets into aggregate string members the patch also >> extends the fold_ctor_reference() function to extract entire >> string array initializers even if the offset points past >> the beginning of the string and even though the size and >> exact type of the reference are not known (there isn't enough >> information in a MEM_REF to determine that). >> >> Tested along with the patch for PR 86415 on x86_64-linux. > > + if (TREE_CODE (init) == CONSTRUCTOR) > + { > + tree type; > + if (TREE_CODE (arg) == ARRAY_REF > + || TREE_CODE (arg) == MEM_REF) > + type = TREE_TYPE (arg); > + else if (TREE_CODE (arg) == COMPONENT_REF) > + { > + tree field = TREE_OPERAND (arg, 1); > + type = TREE_TYPE (field); > + } > + else > + return NULL_TREE; > > what's wrong with just > > type = TREE_TYPE (field); In response to your comment below abut size I simplified things further so determining the type a priori is no longer necessary. > ? > > + base_off *= BITS_PER_UNIT; > > poly_uint64 isn't enough for "bits", with wide-int you'd use > offset_int, > for poly you'd then use poly_offset? Okay, I tried to avoid the overflow. (Converting between all these flavors of wide int types is a monumental PITA.) > > You extend fold_ctor_reference to treat size == 0 specially but then > bother to compute a size here - that looks unneeded? Yes, well spotted, thanks! I simplified the code so this isn't necessary, and neither is the type. > > While the offset of the reference determines the first field in the > CONSTRUCTOR, how do you know the access doesn't touch > adjacent ones? STRING_CSTs do not have to be '\0' terminated, > so consider > > char x[2][4] = { "abcd", "abcd" }; > > and MEM[&x] with a char[8] type? memcpy "inlining" will create > such MEMs for example. The code is only used to find string constants in initializer expressions where I don't think the size of the access comes into play. If a memcpy() call results in a MEM_REF[char[8], &x, 8] that's fine. It's a valid reference and we can still get the underlying character sequence (which is represented as two STRING_CSTs with the two string literals). I might be missing the point of your question. >>> >>> Maybe irrelevant for strlen folding depending on what you do >>> for missing '\0' termination. >>> > > @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree > ctor, >tree byte_offset = DECL_FIELD_OFFSET (cfield); >tree field_offset = DECL_FIELD_BIT_OFFSET (cfield); >tree field_size = DECL_SIZE (cfield); > - offset_int bitoffset; > - offset_int bitoffset_end, access_end; > + > + if (!field_size && TREE_CODE (cval) == STRING_CST) > + { > + /* Determine the size of the flexible array member from > +the size of the string initializer provided for it. */ > + unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval); > + tree eltype = TREE_TYPE (TREE_TYPE (cval)); > + len *= tree_to_uhwi (TYPE_SIZE (eltype)); > + field_size = build_int_cst (size_type_node, len); > + } > > Why does this only apply to STRING_CST initializers and not > CONSTRUCTORS, > say, for > > struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } }; I can't think of a use for it. Do you have something in mind? >>> >>> Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset >>> which is useful in
Re: [PATCH] fold strlen() of aggregate members (PR 77357)
On 09/07/18 22:44, Martin Sebor wrote: > On 07/09/2018 06:40 AM, Richard Biener wrote: >> On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor wrote: >>> >>> On 07/06/2018 09:52 AM, Richard Biener wrote: On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor wrote: > > GCC folds accesses to members of constant aggregates except > for character arrays/strings. For example, the strlen() call > below is not folded: > >const char a[][4] = { "1", "12" }; > >int f (void) { retturn strlen (a[1]); } > > The attached change set enhances the string_constant() function > to make it possible to extract string constants from aggregate > initializers (CONSTRUCTORS). > > The initial solution was much simpler but as is often the case, > MEM_REF made it fail to fold things like: > >int f (void) { retturn strlen (a[1] + 1); } > > Handling those made the project a bit more interesting and > the final solution somewhat more involved. > > To handle offsets into aggregate string members the patch also > extends the fold_ctor_reference() function to extract entire > string array initializers even if the offset points past > the beginning of the string and even though the size and > exact type of the reference are not known (there isn't enough > information in a MEM_REF to determine that). > > Tested along with the patch for PR 86415 on x86_64-linux. + if (TREE_CODE (init) == CONSTRUCTOR) + { + tree type; + if (TREE_CODE (arg) == ARRAY_REF + || TREE_CODE (arg) == MEM_REF) + type = TREE_TYPE (arg); + else if (TREE_CODE (arg) == COMPONENT_REF) + { + tree field = TREE_OPERAND (arg, 1); + type = TREE_TYPE (field); + } + else + return NULL_TREE; what's wrong with just type = TREE_TYPE (field); >>> >>> In response to your comment below abut size I simplified things >>> further so determining the type a priori is no longer necessary. >>> ? + base_off *= BITS_PER_UNIT; poly_uint64 isn't enough for "bits", with wide-int you'd use offset_int, for poly you'd then use poly_offset? >>> >>> Okay, I tried to avoid the overflow. (Converting between all >>> these flavors of wide int types is a monumental PITA.) >>> You extend fold_ctor_reference to treat size == 0 specially but then bother to compute a size here - that looks unneeded? >>> >>> Yes, well spotted, thanks! I simplified the code so this isn't >>> necessary, and neither is the type. >>> While the offset of the reference determines the first field in the CONSTRUCTOR, how do you know the access doesn't touch adjacent ones? STRING_CSTs do not have to be '\0' terminated, so consider char x[2][4] = { "abcd", "abcd" }; and MEM[&x] with a char[8] type? memcpy "inlining" will create such MEMs for example. >>> >>> The code is only used to find string constants in initializer >>> expressions where I don't think the size of the access comes >>> into play. If a memcpy() call results in a MEM_REF[char[8], >>> &x, 8] that's fine. It's a valid reference and we can still >>> get the underlying character sequence (which is represented >>> as two STRING_CSTs with the two string literals). I might >>> be missing the point of your question. >> >> Maybe irrelevant for strlen folding depending on what you do >> for missing '\0' termination. >> @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor, tree byte_offset = DECL_FIELD_OFFSET (cfield); tree field_offset = DECL_FIELD_BIT_OFFSET (cfield); tree field_size = DECL_SIZE (cfield); - offset_int bitoffset; - offset_int bitoffset_end, access_end; + + if (!field_size && TREE_CODE (cval) == STRING_CST) + { + /* Determine the size of the flexible array member from +the size of the string initializer provided for it. */ + unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval); + tree eltype = TREE_TYPE (TREE_TYPE (cval)); + len *= tree_to_uhwi (TYPE_SIZE (eltype)); + field_size = build_int_cst (size_type_node, len); + } Why does this only apply to STRING_CST initializers and not CONSTRUCTORS, say, for struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } }; >>> >>> I can't think of a use for it. Do you have something in mind? >> >> Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset >> which is useful in other parts of the compiler. So I don't see why >> it shouldn't work for general flex-arrays. >> ? And why not use simply field_size = TYPE_SIZE (TRE
Re: [PATCH] fold strlen() of aggregate members (PR 77357)
On 07/09/2018 06:40 AM, Richard Biener wrote: On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor wrote: On 07/06/2018 09:52 AM, Richard Biener wrote: On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor wrote: GCC folds accesses to members of constant aggregates except for character arrays/strings. For example, the strlen() call below is not folded: const char a[][4] = { "1", "12" }; int f (void) { retturn strlen (a[1]); } The attached change set enhances the string_constant() function to make it possible to extract string constants from aggregate initializers (CONSTRUCTORS). The initial solution was much simpler but as is often the case, MEM_REF made it fail to fold things like: int f (void) { retturn strlen (a[1] + 1); } Handling those made the project a bit more interesting and the final solution somewhat more involved. To handle offsets into aggregate string members the patch also extends the fold_ctor_reference() function to extract entire string array initializers even if the offset points past the beginning of the string and even though the size and exact type of the reference are not known (there isn't enough information in a MEM_REF to determine that). Tested along with the patch for PR 86415 on x86_64-linux. + if (TREE_CODE (init) == CONSTRUCTOR) + { + tree type; + if (TREE_CODE (arg) == ARRAY_REF + || TREE_CODE (arg) == MEM_REF) + type = TREE_TYPE (arg); + else if (TREE_CODE (arg) == COMPONENT_REF) + { + tree field = TREE_OPERAND (arg, 1); + type = TREE_TYPE (field); + } + else + return NULL_TREE; what's wrong with just type = TREE_TYPE (field); In response to your comment below abut size I simplified things further so determining the type a priori is no longer necessary. ? + base_off *= BITS_PER_UNIT; poly_uint64 isn't enough for "bits", with wide-int you'd use offset_int, for poly you'd then use poly_offset? Okay, I tried to avoid the overflow. (Converting between all these flavors of wide int types is a monumental PITA.) You extend fold_ctor_reference to treat size == 0 specially but then bother to compute a size here - that looks unneeded? Yes, well spotted, thanks! I simplified the code so this isn't necessary, and neither is the type. While the offset of the reference determines the first field in the CONSTRUCTOR, how do you know the access doesn't touch adjacent ones? STRING_CSTs do not have to be '\0' terminated, so consider char x[2][4] = { "abcd", "abcd" }; and MEM[&x] with a char[8] type? memcpy "inlining" will create such MEMs for example. The code is only used to find string constants in initializer expressions where I don't think the size of the access comes into play. If a memcpy() call results in a MEM_REF[char[8], &x, 8] that's fine. It's a valid reference and we can still get the underlying character sequence (which is represented as two STRING_CSTs with the two string literals). I might be missing the point of your question. Maybe irrelevant for strlen folding depending on what you do for missing '\0' termination. @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor, tree byte_offset = DECL_FIELD_OFFSET (cfield); tree field_offset = DECL_FIELD_BIT_OFFSET (cfield); tree field_size = DECL_SIZE (cfield); - offset_int bitoffset; - offset_int bitoffset_end, access_end; + + if (!field_size && TREE_CODE (cval) == STRING_CST) + { + /* Determine the size of the flexible array member from +the size of the string initializer provided for it. */ + unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval); + tree eltype = TREE_TYPE (TREE_TYPE (cval)); + len *= tree_to_uhwi (TYPE_SIZE (eltype)); + field_size = build_int_cst (size_type_node, len); + } Why does this only apply to STRING_CST initializers and not CONSTRUCTORS, say, for struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } }; I can't think of a use for it. Do you have something in mind? Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset which is useful in other parts of the compiler. So I don't see why it shouldn't work for general flex-arrays. ? And why not use simply field_size = TYPE_SIZE (TREE_TYPE (cval)); like you do in c_strlen? Yes, that's simpler, thanks. Otherwise looks reasonable. Attached is an updated patch. I also enhanced the handling of non-constant indices. They were already handled before to a smaller extent. (There may be other opportunities here.) Please don't do functional changes to a patch in review, without exactly pointing out the change. It makes review inefficent for me. Looks like it might be the NULL type argument handling? Sorry. The change I was referring to is the addition and handling of the varidx variable to string_constant. It was necessary for parity
Re: [PATCH] fold strlen() of aggregate members (PR 77357)
On Sun, Jul 8, 2018 at 4:56 AM Martin Sebor wrote: > > On 07/06/2018 09:52 AM, Richard Biener wrote: > > On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor wrote: > >> > >> GCC folds accesses to members of constant aggregates except > >> for character arrays/strings. For example, the strlen() call > >> below is not folded: > >> > >>const char a[][4] = { "1", "12" }; > >> > >>int f (void) { retturn strlen (a[1]); } > >> > >> The attached change set enhances the string_constant() function > >> to make it possible to extract string constants from aggregate > >> initializers (CONSTRUCTORS). > >> > >> The initial solution was much simpler but as is often the case, > >> MEM_REF made it fail to fold things like: > >> > >>int f (void) { retturn strlen (a[1] + 1); } > >> > >> Handling those made the project a bit more interesting and > >> the final solution somewhat more involved. > >> > >> To handle offsets into aggregate string members the patch also > >> extends the fold_ctor_reference() function to extract entire > >> string array initializers even if the offset points past > >> the beginning of the string and even though the size and > >> exact type of the reference are not known (there isn't enough > >> information in a MEM_REF to determine that). > >> > >> Tested along with the patch for PR 86415 on x86_64-linux. > > > > + if (TREE_CODE (init) == CONSTRUCTOR) > > + { > > + tree type; > > + if (TREE_CODE (arg) == ARRAY_REF > > + || TREE_CODE (arg) == MEM_REF) > > + type = TREE_TYPE (arg); > > + else if (TREE_CODE (arg) == COMPONENT_REF) > > + { > > + tree field = TREE_OPERAND (arg, 1); > > + type = TREE_TYPE (field); > > + } > > + else > > + return NULL_TREE; > > > > what's wrong with just > > > > type = TREE_TYPE (field); > > In response to your comment below abut size I simplified things > further so determining the type a priori is no longer necessary. > > > ? > > > > + base_off *= BITS_PER_UNIT; > > > > poly_uint64 isn't enough for "bits", with wide-int you'd use offset_int, > > for poly you'd then use poly_offset? > > Okay, I tried to avoid the overflow. (Converting between all > these flavors of wide int types is a monumental PITA.) > > > > > You extend fold_ctor_reference to treat size == 0 specially but then > > bother to compute a size here - that looks unneeded? > > Yes, well spotted, thanks! I simplified the code so this isn't > necessary, and neither is the type. > > > > > While the offset of the reference determines the first field in the > > CONSTRUCTOR, how do you know the access doesn't touch > > adjacent ones? STRING_CSTs do not have to be '\0' terminated, > > so consider > > > > char x[2][4] = { "abcd", "abcd" }; > > > > and MEM[&x] with a char[8] type? memcpy "inlining" will create > > such MEMs for example. > > The code is only used to find string constants in initializer > expressions where I don't think the size of the access comes > into play. If a memcpy() call results in a MEM_REF[char[8], > &x, 8] that's fine. It's a valid reference and we can still > get the underlying character sequence (which is represented > as two STRING_CSTs with the two string literals). I might > be missing the point of your question. Maybe irrelevant for strlen folding depending on what you do for missing '\0' termination. > > > > @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor, > >tree byte_offset = DECL_FIELD_OFFSET (cfield); > >tree field_offset = DECL_FIELD_BIT_OFFSET (cfield); > >tree field_size = DECL_SIZE (cfield); > > - offset_int bitoffset; > > - offset_int bitoffset_end, access_end; > > + > > + if (!field_size && TREE_CODE (cval) == STRING_CST) > > + { > > + /* Determine the size of the flexible array member from > > +the size of the string initializer provided for it. */ > > + unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval); > > + tree eltype = TREE_TYPE (TREE_TYPE (cval)); > > + len *= tree_to_uhwi (TYPE_SIZE (eltype)); > > + field_size = build_int_cst (size_type_node, len); > > + } > > > > Why does this only apply to STRING_CST initializers and not CONSTRUCTORS, > > say, for > > > > struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } }; > > I can't think of a use for it. Do you have something in mind? Well, you basically implemented a get-CONSTRUCTOR-elt-at-offset which is useful in other parts of the compiler. So I don't see why it shouldn't work for general flex-arrays. > > > > ? And why not use simply > > > > field_size = TYPE_SIZE (TREE_TYPE (cval)); > > > > like you do in c_strlen? > > Yes, that's simpler, thanks. > > > > > Otherwise looks reasonable. > > Attached is an updated patch. I also enhanced the handling > of non-constant indices. They were already handled before > to a smaller extent.
Re: [PATCH] fold strlen() of aggregate members (PR 77357)
On 07/06/2018 09:52 AM, Richard Biener wrote: On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor wrote: GCC folds accesses to members of constant aggregates except for character arrays/strings. For example, the strlen() call below is not folded: const char a[][4] = { "1", "12" }; int f (void) { retturn strlen (a[1]); } The attached change set enhances the string_constant() function to make it possible to extract string constants from aggregate initializers (CONSTRUCTORS). The initial solution was much simpler but as is often the case, MEM_REF made it fail to fold things like: int f (void) { retturn strlen (a[1] + 1); } Handling those made the project a bit more interesting and the final solution somewhat more involved. To handle offsets into aggregate string members the patch also extends the fold_ctor_reference() function to extract entire string array initializers even if the offset points past the beginning of the string and even though the size and exact type of the reference are not known (there isn't enough information in a MEM_REF to determine that). Tested along with the patch for PR 86415 on x86_64-linux. + if (TREE_CODE (init) == CONSTRUCTOR) + { + tree type; + if (TREE_CODE (arg) == ARRAY_REF + || TREE_CODE (arg) == MEM_REF) + type = TREE_TYPE (arg); + else if (TREE_CODE (arg) == COMPONENT_REF) + { + tree field = TREE_OPERAND (arg, 1); + type = TREE_TYPE (field); + } + else + return NULL_TREE; what's wrong with just type = TREE_TYPE (field); In response to your comment below abut size I simplified things further so determining the type a priori is no longer necessary. ? + base_off *= BITS_PER_UNIT; poly_uint64 isn't enough for "bits", with wide-int you'd use offset_int, for poly you'd then use poly_offset? Okay, I tried to avoid the overflow. (Converting between all these flavors of wide int types is a monumental PITA.) You extend fold_ctor_reference to treat size == 0 specially but then bother to compute a size here - that looks unneeded? Yes, well spotted, thanks! I simplified the code so this isn't necessary, and neither is the type. While the offset of the reference determines the first field in the CONSTRUCTOR, how do you know the access doesn't touch adjacent ones? STRING_CSTs do not have to be '\0' terminated, so consider char x[2][4] = { "abcd", "abcd" }; and MEM[&x] with a char[8] type? memcpy "inlining" will create such MEMs for example. The code is only used to find string constants in initializer expressions where I don't think the size of the access comes into play. If a memcpy() call results in a MEM_REF[char[8], &x, 8] that's fine. It's a valid reference and we can still get the underlying character sequence (which is represented as two STRING_CSTs with the two string literals). I might be missing the point of your question. @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor, tree byte_offset = DECL_FIELD_OFFSET (cfield); tree field_offset = DECL_FIELD_BIT_OFFSET (cfield); tree field_size = DECL_SIZE (cfield); - offset_int bitoffset; - offset_int bitoffset_end, access_end; + + if (!field_size && TREE_CODE (cval) == STRING_CST) + { + /* Determine the size of the flexible array member from +the size of the string initializer provided for it. */ + unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval); + tree eltype = TREE_TYPE (TREE_TYPE (cval)); + len *= tree_to_uhwi (TYPE_SIZE (eltype)); + field_size = build_int_cst (size_type_node, len); + } Why does this only apply to STRING_CST initializers and not CONSTRUCTORS, say, for struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } }; I can't think of a use for it. Do you have something in mind? ? And why not use simply field_size = TYPE_SIZE (TREE_TYPE (cval)); like you do in c_strlen? Yes, that's simpler, thanks. Otherwise looks reasonable. Attached is an updated patch. I also enhanced the handling of non-constant indices. They were already handled before to a smaller extent. (There may be other opportunities here.) Martin PR middle-end/77357 - strlen of constant strings not folded gcc/ChangeLog: * builtins.c (c_strlen): Avoid out-of-bounds warnings when accessing implicitly initialized array elements. * expr.c (string_constant): Handle string initializers of character arrays within aggregates. * gimple-fold.c (fold_array_ctor_reference): Add argument. Store element offset. As a special case, handle zero size. (fold_nonarray_ctor_reference): Same. (fold_ctor_reference): Add argument. Store subobject offset. * gimple-fold.h (fold_ctor_reference): Add argument. gcc/testsuite/ChangeLog: PR middle-end/77357 * gcc.dg/strlenopt-49.c: New test. * gcc.dg/strlenopt-50.c: New test. * gcc.dg/strleno
Re: [PATCH] fold strlen() of aggregate members (PR 77357)
On Fri, Jul 6, 2018 at 1:54 AM Martin Sebor wrote: > > GCC folds accesses to members of constant aggregates except > for character arrays/strings. For example, the strlen() call > below is not folded: > >const char a[][4] = { "1", "12" }; > >int f (void) { retturn strlen (a[1]); } > > The attached change set enhances the string_constant() function > to make it possible to extract string constants from aggregate > initializers (CONSTRUCTORS). > > The initial solution was much simpler but as is often the case, > MEM_REF made it fail to fold things like: > >int f (void) { retturn strlen (a[1] + 1); } > > Handling those made the project a bit more interesting and > the final solution somewhat more involved. > > To handle offsets into aggregate string members the patch also > extends the fold_ctor_reference() function to extract entire > string array initializers even if the offset points past > the beginning of the string and even though the size and > exact type of the reference are not known (there isn't enough > information in a MEM_REF to determine that). > > Tested along with the patch for PR 86415 on x86_64-linux. + if (TREE_CODE (init) == CONSTRUCTOR) + { + tree type; + if (TREE_CODE (arg) == ARRAY_REF + || TREE_CODE (arg) == MEM_REF) + type = TREE_TYPE (arg); + else if (TREE_CODE (arg) == COMPONENT_REF) + { + tree field = TREE_OPERAND (arg, 1); + type = TREE_TYPE (field); + } + else + return NULL_TREE; what's wrong with just type = TREE_TYPE (field); ? + base_off *= BITS_PER_UNIT; poly_uint64 isn't enough for "bits", with wide-int you'd use offset_int, for poly you'd then use poly_offset? You extend fold_ctor_reference to treat size == 0 specially but then bother to compute a size here - that looks unneeded? While the offset of the reference determines the first field in the CONSTRUCTOR, how do you know the access doesn't touch adjacent ones? STRING_CSTs do not have to be '\0' terminated, so consider char x[2][4] = { "abcd", "abcd" }; and MEM[&x] with a char[8] type? memcpy "inlining" will create such MEMs for example. @@ -6554,8 +6577,16 @@ fold_nonarray_ctor_reference (tree type, tree ctor, tree byte_offset = DECL_FIELD_OFFSET (cfield); tree field_offset = DECL_FIELD_BIT_OFFSET (cfield); tree field_size = DECL_SIZE (cfield); - offset_int bitoffset; - offset_int bitoffset_end, access_end; + + if (!field_size && TREE_CODE (cval) == STRING_CST) + { + /* Determine the size of the flexible array member from +the size of the string initializer provided for it. */ + unsigned HOST_WIDE_INT len = TREE_STRING_LENGTH (cval); + tree eltype = TREE_TYPE (TREE_TYPE (cval)); + len *= tree_to_uhwi (TYPE_SIZE (eltype)); + field_size = build_int_cst (size_type_node, len); + } Why does this only apply to STRING_CST initializers and not CONSTRUCTORS, say, for struct S { int i; int a[]; } s = { 1, { 2, 3, 4, 5, 6 } }; ? And why not use simply field_size = TYPE_SIZE (TREE_TYPE (cval)); like you do in c_strlen? Otherwise looks reasonable. Thanks, Richard. > Martin
[PATCH] fold strlen() of aggregate members (PR 77357)
GCC folds accesses to members of constant aggregates except for character arrays/strings. For example, the strlen() call below is not folded: const char a[][4] = { "1", "12" }; int f (void) { retturn strlen (a[1]); } The attached change set enhances the string_constant() function to make it possible to extract string constants from aggregate initializers (CONSTRUCTORS). The initial solution was much simpler but as is often the case, MEM_REF made it fail to fold things like: int f (void) { retturn strlen (a[1] + 1); } Handling those made the project a bit more interesting and the final solution somewhat more involved. To handle offsets into aggregate string members the patch also extends the fold_ctor_reference() function to extract entire string array initializers even if the offset points past the beginning of the string and even though the size and exact type of the reference are not known (there isn't enough information in a MEM_REF to determine that). Tested along with the patch for PR 86415 on x86_64-linux. Martin PR middle-end/77357 - strlen of constant strings not folded gcc/ChangeLog: * builtins.c (c_strlen): Avoid out-of-bounds warnings when accessing implicitly initialized array elements. * expr.c (string_constant): Handle string initializers of character arrays within aggregates. * gimple-fold.c (fold_array_ctor_reference): Add argument. Store element offset. As a special case, handle zero size. (fold_nonarray_ctor_reference): Same. (fold_ctor_reference): Add argument. Store subobject offset. * gimple-fold.h (fold_ctor_reference): Add argument. gcc/testsuite/ChangeLog: PR middle-end/77357 * gcc.dg/strlenopt-49.c: New test. * gcc.dg/strlenopt-50.c: New test. diff --git a/gcc/builtins.c b/gcc/builtins.c index 91658e8..2f9d5d7 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -602,8 +602,15 @@ c_strlen (tree src, int only_value) = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (src; /* Set MAXELTS to sizeof (SRC) / sizeof (*SRC) - 1, the maximum possible - length of SRC. */ - unsigned maxelts = TREE_STRING_LENGTH (src) / eltsize - 1; + length of SRC. Prefer TYPE_SIZE() to TREE_STRING_LENGTH() if possible + in case the latter is less than the size of the array. */ + HOST_WIDE_INT maxelts = TREE_STRING_LENGTH (src); + tree type = TREE_TYPE (src); + if (tree size = TYPE_SIZE_UNIT (type)) +if (tree_fits_shwi_p (size)) + maxelts = tree_to_uhwi (size); + + maxelts = maxelts / eltsize - 1; /* PTR can point to the byte representation of any string type, including char* and wchar_t*. */ diff --git a/gcc/expr.c b/gcc/expr.c index 56751df..be3ab93 100644 --- a/gcc/expr.c +++ b/gcc/expr.c @@ -54,11 +54,13 @@ along with GCC; see the file COPYING3. If not see #include "reload.h" #include "langhooks.h" #include "common/common-target.h" +#include "tree-dfa.h" #include "tree-ssa-live.h" #include "tree-outof-ssa.h" #include "tree-ssa-address.h" #include "builtins.h" #include "ccmp.h" +#include "gimple-fold.h" #include "rtx-vector-builder.h" @@ -11274,54 +11276,20 @@ is_aligning_offset (const_tree offset, const_tree exp) tree string_constant (tree arg, tree *ptr_offset) { - tree array, offset, lower_bound; + tree array; STRIP_NOPS (arg); + poly_int64 base_off = 0; + if (TREE_CODE (arg) == ADDR_EXPR) { - if (TREE_CODE (TREE_OPERAND (arg, 0)) == STRING_CST) - { - *ptr_offset = size_zero_node; - return TREE_OPERAND (arg, 0); - } - else if (TREE_CODE (TREE_OPERAND (arg, 0)) == VAR_DECL) - { - array = TREE_OPERAND (arg, 0); - offset = size_zero_node; - } - else if (TREE_CODE (TREE_OPERAND (arg, 0)) == ARRAY_REF) - { - array = TREE_OPERAND (TREE_OPERAND (arg, 0), 0); - offset = TREE_OPERAND (TREE_OPERAND (arg, 0), 1); - if (TREE_CODE (array) != STRING_CST && !VAR_P (array)) - return 0; - - /* Check if the array has a nonzero lower bound. */ - lower_bound = array_ref_low_bound (TREE_OPERAND (arg, 0)); - if (!integer_zerop (lower_bound)) - { - /* If the offset and base aren't both constants, return 0. */ - if (TREE_CODE (lower_bound) != INTEGER_CST) - return 0; - if (TREE_CODE (offset) != INTEGER_CST) - return 0; - /* Adjust offset by the lower bound. */ - offset = size_diffop (fold_convert (sizetype, offset), -fold_convert (sizetype, lower_bound)); - } - } - else if (TREE_CODE (TREE_OPERAND (arg, 0)) == MEM_REF) - { - array = TREE_OPERAND (TREE_OPERAND (arg, 0), 0); - offset = TREE_OPERAND (TREE_OPERAND (arg, 0), 1); - if (TREE_CODE (array) != ADDR_EXPR) - return 0; - array = TREE_OPERAND (array, 0); - if (TREE_CODE (array) != STRING_CST && !VAR_P (array)) - return 0; - } - else - return 0; + arg = TREE_OPERAND (arg, 0); + array = get_addr_base_and_unit_offset (arg, &base_off); + if (!array + || (TREE_CODE (array) != VAR_DECL + && TREE_CODE (array) !=