On 08/24/2018 11:26 AM, Bernd Edlinger wrote:
> On 08/24/18 18:51, Jeff Law wrote:
>>> Well, this is broken for wide character strings.
>>> but I hope we can get rid of STRING_CST which are
>>> not explicitly null terminated.
> 
> I am afraid that is not going to happen.
> Maybe we can get STRING_CST that are never longer
> than the TYPE_UNIT_SIZE, but c_strlen and c_getstr
> need to take care that the string is zero-terminated.
> 
> string_constant should not promise the string is zero-terminated.
> But instead it can promise that:
> 1) the STRING_CST is valid up to TREE_STRING_LENGTH
> 2) mem_size is >= TREE_STRING_LENGTH
> 3) memory between TREE_STRING_LENGTH and mem_size is ZERO.
> 
> It will not guarantee anything about zero termination any more.
Interesting, because those conditions would be sufficient to deal with a
regression I stumbled over after fixing Martin's patch to not assume
that all STRING_CSTs are NUL terminated.

But I need to think about this a bit more.  Essentially the question
we'd need to ask is whether or not these are sufficient in general or
just in specific cases.

I tend to think they're not sufficient in general.  If a string returned
by string_constant lacked a terminating NUL but still passed the tests
above, and was ultimately passed to the runtime's str* routines, then
the call may run off the end of the string.  We'd like to be able to
warn for that.

So ISTM those rules are only valid in contexts where we know the result
isn't going to be passed to str* and friends within the C library.
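
As a rough illustration of that gap, here is a standalone model (not GCC
code: struct str_info and both function names are made up for this
example).  A constant such as char a[4] = "abcd" meets all three
conditions, yet there is no NUL anywhere inside the object, so handing it
to strlen would read past the end:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for what string_constant might hand back.
   This is not GCC's tree representation.  */
struct str_info
{
  const char *data;   /* bytes of the constant */
  size_t length;      /* valid bytes in DATA (TREE_STRING_LENGTH analogue) */
  size_t mem_size;    /* size of the object the constant initializes */
};

/* Conditions 1 and 2 from the list above.  Condition 3 (the bytes in
   [length, mem_size) are zero) is implicit in this model: those bytes
   are not stored, they are simply assumed to be zero.  */
bool
meets_relaxed_contract (const struct str_info *s)
{
  return s->data != NULL && s->mem_size >= s->length;
}

/* Return true if a NUL is guaranteed to exist inside the object, so a
   str* routine would stop before running off the end.  */
bool
nul_inside_object_p (const struct str_info *s)
{
  assert (meets_relaxed_contract (s));

  /* An explicit NUL among the valid bytes.  */
  if (memchr (s->data, '\0', s->length) != NULL)
    return true;

  /* Otherwise only the implicit zero padding can terminate it.  */
  return s->mem_size > s->length;
}
```

With { "abcd", 4, 4 } the second function returns false even though the
relaxed contract holds, which is the case we'd want to diagnose.  With
{ "abcd", 4, 6 } it returns true because of the zero padding.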

I do think they're sufficient to avoid problems with the
tree-ssa-forwprop code we've looked at.  So what may make the most sense
is to have that routine indicate it's willing to accept unterminated
strings, then check the conditions above before optimizing the code.
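
A minimal sketch of that opt-in shape, again as a standalone model rather
than the actual forwprop code.  The struct, the function name fold_read
and the allow_unterminated flag are all assumptions made for
illustration.  The caller says whether it can live with an unterminated
constant, and the helper checks the conditions before materializing a
fixed-size read:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a string constant; not GCC's API.  */
struct str_const
{
  const char *data;   /* bytes of the constant */
  size_t length;      /* valid bytes in DATA */
  size_t mem_size;    /* size of the containing object */
};

/* Copy N bytes starting at OFFSET into DST, the way a folder of a
   fixed-size memory read might.  Return false when the read cannot be
   folded safely.  */
bool
fold_read (const struct str_const *s, size_t offset, size_t n,
           bool allow_unterminated, char *dst)
{
  assert (s != NULL && dst != NULL);

  /* Conditions 1 and 2.  */
  if (s->data == NULL || s->mem_size < s->length)
    return false;

  /* The read must stay inside the object.  */
  if (offset > s->mem_size || n > s->mem_size - offset)
    return false;

  /* Callers that need a real C string must see a NUL in the object.  */
  if (!allow_unterminated
      && s->mem_size == s->length
      && memchr (s->data, '\0', s->length) == NULL)
    return false;

  /* Bytes below LENGTH come from DATA; by condition 3 the rest of the
     object is zero.  */
  size_t avail = offset < s->length ? s->length - offset : 0;
  size_t from_data = n < avail ? n : avail;
  if (from_data > 0)
    memcpy (dst, s->data + offset, from_data);
  memset (dst + from_data, 0, n - from_data);
  return true;
}
```

A caller that only copies bytes would pass allow_unterminated as true.
Anything that wants to reason about strlen would pass false and get a
refusal for a constant like { "abcd", 4, 4 }.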

> 
> In the end, the best approach might be to either merge my patch
> with Martin's, or go step-wise, first fixing wrong code, and then
> implementing warnings without fixing wrong code.
Unsure at this time.  I've been working with both.  I suspect that if we
went with yours that we'd then turn around and layer Martin's on top of
it because of the desire to signal to callers that we have an
unterminated string and have the callers take appropriate action.  Which
raises the question of whether or not we just go with Martin's, i.e., is
there really any value in using both.  I haven't seen indications there
is value in that approach, but I'm still poking at things.

Jeff
