On Wed, 1 Aug 2018, Bernd Edlinger wrote: > >> > The change to have all STRING_CSTs NUL terminated (but that NUL > >> > termination not necessarily inclided in STRING_LENGTH) is a good > >> > one. > >> > > >> > I'm not sure how we can reliably verify NUL termination after the > >> > fact though and build_string already makes sure to NUL terminate > >> > STRING_CSTs. So if GO strings are not NUL terminated then > >> > the STRING_CSTs still are. > >> > >> The advantage is that there are less variations how string literals look > >> like in the middle end. We will have a simple way to determine if > >> a string literal is NUL terminated or not. And checking that property > >> in varasm.c is exactly the right thing to do. > >> > >> String literals always have an array_type which may be shorter > >> than TREE_STRING_LENGTH, but that chops off only exactly > >> one wide character nul. Otherwise if the array_type is equal or larger, > >> we know for sure the value is nul terminated. In the middle-end > >> we can easily determine if a string is not NUL terminated by: > >> > >> compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)), > >> TREE_STRING_LENGTH (init)) < 0 > >> > >> I did use that already in my patch for pr86711. > > > > Hmm. How does that tell you whether the string is NUL terminated? > > TREE_STRING_LENGTH includes the NUL termination and if it is > > a significant char then TYPE_SIZE_UNIT should as well. > > I debugged that code a lot recently. > static const char x[2] = "ab" > gives a TREE_STRING_LENGTH of 3, the TREE_TYPE of that > beast is an array type for char[2]. and TYPE_SIZE_UNIT = 2.
Hmm. I think it would be nice if TREE_STRING_LENGTH would match char[2] and TYPE_SIZE_UNIT even if that is inconvenient for your check above. Because the '\0' doesn't belong to the string. Then build_string internally appends a '\0' outside of TREE_STRING_LENGTH. > An ordinary C string "ab" has TYPE_SIZE_UNIT(TREE_TYPE(x)) = 3. > > Of course with wide caracher strings the TREE_STING_LENGTH > and TYPE_SIZE_UNIT of the ARRAY_TYPE are multiple of > the used wide character type. Sure. > So I would like to be able to assume that the STRING_CST objects > are internally always generated properly by the front end. Yeah, I guess we need to define what "properly" is ;) > And that the ARRAY_TYPE of the string literal either has the > same length than the TREE_STRING_LENGTH or if it is shorter, > this is always exactly one (wide) character size less than TREE_STRING_LENGTH I think it should be always the same... > The idea is to use this property of string literals where needed, > and check rigorously in varasm.c. > > Does that make sense? So if it is not the same then the excess character needs to be a (wide) NUL in your model? ISTR your varasm.c patch didn't verify that. > > > > Isn't a proper test to look at TREE_STRING_POINTER[TREE_STRING_LENGTH - 1] > > (for HOST_CHAR_BITS strings)? > > > > There are also wide character strings, and all those test are broken > everywhere > for wide characters right now. > > Therefore checking the string constants at varasm.c revealed a lot of > intersting > things. > I will post several patches in the afternoon. > > > Relying on the type here looks somewhat fragile to me. > > > > Abstracting a string_cst_nul_terminated_p () helper would be a good > > idea I guess. > > Yes. indeed. > > > I realize using strlen(TREE_STRING_POINTER) doesn't work because > > of embedded NULs. > > > >> Additionally not having oversize string constants produced > >> by the front ends, where the extra characters are good for nothing, > >> also helps to improve correctness. > >> > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)