On Fri, Aug 3, 2018 at 9:19 AM Jeff Law <l...@redhat.com> wrote: > > On 07/25/2018 01:23 AM, Richard Biener wrote: > > On Tue, 24 Jul 2018, Bernd Edlinger wrote: > > > >> On 07/24/18 23:46, Jeff Law wrote: > >>> On 07/24/2018 01:59 AM, Bernd Edlinger wrote: > >>>> Hi! > >>>> > >>>> This patch makes strlen range computations more conservative. > >>>> > >>>> Firstly if there is a visible type cast from type A to B before passing > >>>> then value to strlen, don't expect the type layout of B to restrict the > >>>> possible return value range of strlen. > >>> Why do you think this is the right thing to do? ie, is there language > >>> in the standards that makes you think the code as it stands today is > >>> incorrect from a conformance standpoint? Is there a significant body of > >>> code that is affected in an adverse way by the current code? If so, > >>> what code? > >>> > >>> > >> > >> I think if you have an object, of an effective type A say char[100], then > >> you can cast the address of A to B, say typedef char (*B)[2] for instance > >> and then to const char *, say for use in strlen. I may be wrong, but I > >> think > >> that we should at least try not to pick up char[2] from B, but instead > >> use A for strlen ranges, or leave this range open. Currently the range > >> info for strlen is [0..1] in this case, even if we see the type cast > >> in the generic tree. > > > > You raise a valid point - namely that the middle-end allows > > any object (including storage with a declared type) to change > > its dynamic type (even of a piece of it). So unless you can > > prove that the dynamic type of the thing you are looking at > > matches your idea of that type you may not derive any string > > lengths (or ranges) from it. > > > > BUT - for the string_constant and c_strlen functions we are, > > in all cases we return something interesting, able to look > > at an initializer which then determines that type. Hopefully. > > I think the strlen() folding code when it sets SSA ranges > > now looks at types ...? > I'm leaning towards a similar conclusion, namely that we can only rely > on type information for the pointer that actually gets passed to strlen, > which 99.9% of the time is (char *), potentially with const qualifiers.
It's 100% (char *) because the C standard says arguments are converted to the argument type. > It's tempting to look back through the cast to find a cast from a char > array but I'm more and more concerned that it's not safe unless we can > walk back to an initializer. > > What this might argue is that we need to distinguish between a known > range and a likely range. I really dislike doing that again. We may > have to see more real world cases where the likely range allows us to > improve the precision of the sprintf warnings (since that's really the > goal of improved string length ranges). > > > > > > > Consider > > > > struct X { int i; char c[4]; int j;}; > > struct Y { char c[16]; }; > > > > void foo (struct X *p, struct Y *q) > > { > > memcpy (p, q, sizeof (struct Y)); > > if (strlen ((char *)(struct Y *)p + 4) < 7) > > abort (); > > } > > > > here the GIMPLE IL looks like > > > > const char * _1; > > > > <bb 2> [local count: 1073741825]: > > _5 = MEM[(char * {ref-all})q_4(D)]; > > MEM[(char * {ref-all})p_6(D)] = _5; > > _1 = p_6(D) + 4; > > _2 = __builtin_strlen (_1); > > > > and I guess Martin would argue that since p is of type struct X > > + 4 gets you to c[4] and thus strlen of that cannot be larger > > than 3. > But _1 is of type const char * and that's what's passed to strlen. The > type of P and Q are irrelevant ISTM. > > Jeff > >