On 1/5/19 1:31 PM, Jakub Jelinek wrote: > On Sat, Jan 05, 2019 at 12:20:24PM +0000, Bernd Edlinger wrote: >> co-incidentally my "Make strlen range computations more conservative" patch >> contained a fix on the same spot, I just did not have a test case for it: >> >> @@ -3184,7 +3146,10 @@ get_min_string_length (tree rhs, bool *f >> && TREE_READONLY (rhs)) >> rhs = DECL_INITIAL (rhs); >> >> - if (rhs && TREE_CODE (rhs) == STRING_CST) >> + if (rhs && TREE_CODE (rhs) == STRING_CST >> + && tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (rhs)))) == 1 >> + && TREE_STRING_LENGTH (rhs) > 0 >> + && TREE_STRING_POINTER (rhs) [TREE_STRING_LENGTH (rhs) - 1] == '\0') >> { >> *full_string_p = true; >> return strlen (TREE_STRING_POINTER (rhs)); >> >> >> additionally to your patch this tests the string is in fact a single-byte >> string. >> since strlen returns garbage otherwise. > > Multi-byte STRING_CSTs seem to be stored in the target byte order, e.g. > L"abcde" in a x86_64-linux -> powerpc64-linux cross is still > "\000\000\000a\000\000\000b\000\000\000c\000\000\000d\000\000\000\000" > so I think strlen is exactly what we want. The tree-ssa-strlen.c pass > doesn't track string lengths in whatever units it has, it tracks number of > non-zero bytes followed by zero byte known at certain address, or, if there > is no zero byte known, the minimum amount of non-zero bytes known at certain > address. >
Well, yes it works, but this can only optimize invalid code like strlen((char*)L"wide"). However, optimizing invalid stuff away does both defeat warnings in later passes, and is also not helpful for debugging the resulting code in general. > And, your patch has also the bad effect that it won't return any length for > the strings that aren't zero terminated. We do want to return 9 for the > "abcdefghi" string without zero terminator at the end, just need to set > *full_string_p to false. And, for "abcd\000fghi" string without zero > termination at the end, we do want to return 4 and set *full_string_p to > true. > Right, I did not consider the nul in the middle. Bernd.