On 08/25/18 19:32, Jeff Law wrote: > On 08/25/2018 12:32 AM, Bernd Edlinger wrote: >> On 08/25/18 01:54, Jeff Law wrote: >>> On 08/24/2018 11:26 AM, Bernd Edlinger wrote: >>>> On 08/24/18 18:51, Jeff Law wrote: >>>>>> Well, this is broken for wide character strings. >>>>>> but I hope we can get rid of STRING_CST which are >>>>>> not explicitly null terminated. >>>> >>>> I am afraid that is not going to happen. >>>> Maybe we can get STRING_CST that are never longer >>>> than the TYPE_UNIT_SIZE, but c_strlen and c_getstr >>>> need to take care that the string is zero-terminated. >>>> >>>> string_constant, should not promise the string is zero terminated. >>>> But instead it can promise that: >>>> 1) the STRING_CST is valid up to TREE_STRING_LENGTH >>>> 2) mem_size is >= TREE_STRING_LENGTH >>>> 3) memory between TREE_STRING_LENGTH and mem_size is ZERO. >>>> >>>> It will not guarantee anything about zero termination any more. >>> Interesting because those conditions would be sufficient to deal with a >>> regression I stumbled over after fixing Martin's patch to not assume >>> that all STRING_CSTs are NUL terminated. >>> >>> But I need to think about this a bit more. Essentially the question >>> we'd need to ask is whether or not these are sufficient in general or >>> just in specific cases. >>> >>> I tend to think they're not sufficient in general. If a string returned >>> by string_constant that didn't have a terminating NUL, but which did >>> pass the tests above were ultimately passed to the runtime's str* >>> routines, then the call may run off the end of the string. We'd like to >>> be able to warn for that. >>> >>> So ISTM those rules are only valid in contexts where we know the result >>> isn't going to be passed to str* and friends within the C library. >>> >>> I do think they're sufficient to avoid problems with the >>> tree-ssa-forwprop code we've looked at. So what may make the most sense >>> is to have that routine indicate it's willing to accept unterminated >>> strings, then check the conditions above before optimizing the code. >>> >> >> There are not too many callers of string_constant. >> Not all need zero termination. > Right. And in retrospect we probably should have avoided default > parameter overloads and just fixed the callers. But that can be a > follow-up. > >> >> But I think if the are interested in zero-termination >> they should simply call c_strlen or c_getstr. > Perhaps. > > >> >> >>>> >>>> In the end, the best approach might be to either merge my patch >>>> with Martins, or step-wise, first fixing wrong code, and then >>>> implementing warnings without fixing wrong code. >>> Unsure at this time. I've been working with both. I suspect that if we >>> went with yours that we'd then turn around and layer Martin's on top of >>> it because of the desire to signal to callers that we have an >>> unterminated string and have the callers take appropriate action. Which >>> begs the question of whether or not we just go with Martin's -- ie, is >>> there really any value in using both. I haven't seen indications there >>> is value in that approach, but I'm still poking at things. >>> >> >> Well, ya call it "layer one patch over the other" >> I call it "incremental improvements". > It is (of course) a case by case basis. The way I try to look at these > things is to ask whether or not the first patch under consideration > would have any value/purpose after the second patch was installed. If > so, then it may make sense to include both. If not, then we really just > want one patch. >
Agreed. I think the question is which of the possible STRING_CST semantics we want to have in the end (the middle-end). Everything builds on top of the semantic properties of STRING_CSTs. My first attempt of fix the STRING_CST semantic was trying to make string_constant happy. My second attempt is trying to make Richard happy. And when I look at both patches, I think the second one is better, and more simple. BTW I need to correct on statement in my last e-mail: On 08/24/18 23:55, Bernd Edlinger wrote:> > here I quote form martins 1/6 patch: >> + /* Compute the lower bound number of elements (not bytes) in the array >> + that the string is used to initialize. The actual size of the array >> + will be may be greater if the string is shorter, but the important >> + data point is whether the literal, including the terminating nul, >> + fits in the array. */ >> + unsigned HOST_WIDE_INT array_elts >> + = tree_to_uhwi (TYPE_SIZE_UNIT (TREE_TYPE (init))) / charsize; >> + >> + /* Compute the string length in (wide) characters. */ > > So prior to my STRING_CST patch this will ICE on a flexible array member, > because those have TYPE_SIZE_UNIT = NULL, and tree_to_uhwi will ICE. > > I used: > > compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)), > TREE_STRING_LENGTH (init)) > > and this will not ICE with NULL, but consider it like infinity, > and return 1. > > So my version did not ICE in that case. > Oooops, actually both versions will likely ICE on TYPE_SIZE_UNIT == NULL. I actually tried to test that case, but have done something wrong. I hope we can get rid if the incomplete types in the middle-end. So maybe just inject "if (!tree_fits_uhwi_p (...)) return NULL_TREE;" here? Or maybe just defer until we have clarity about the semantics. Bernd.