On Mon, 20 Aug 2018, Bernd Edlinger wrote: > On 08/20/18 13:01, Richard Biener wrote: > > On Wed, Aug 1, 2018 at 3:05 PM Bernd Edlinger <bernd.edlin...@hotmail.de> > > wrote: > >> > >> > >> > >> On 08/01/18 11:29, Richard Biener wrote: > >>> > >>> Hmm. I think it would be nice if TREE_STRING_LENGTH would > >>> match char[2] and TYPE_SIZE_UNIT even if that is inconvenient > >>> for your check above. Because the '\0' doesn't belong to the > >>> string. Then build_string internally appends a '\0' outside > >>> of TREE_STRING_LENGTH. > >>> > >> > >> Hmm. Yes, but the outside-0 byte is just one byte, not a wide > >> character. > > > > That could be fixed though (a wide 0 is just N 0s). Add a elsz = 1 > > parameter to build_string and allocate as many extra 0s as needed. > > > > There are STRING_CSTs which are not string literals, > >> for instance attribute tags, Pragmas, asm constrants, etc. > >> They use the '\0' outside, and have probably no TREE_TYPE. > >> > >>> > >>>> So I would like to be able to assume that the STRING_CST objects > >>>> are internally always generated properly by the front end. > >>> > >>> Yeah, I guess we need to define what "properly" is ;) > >>> > >> Yes. > >> > >>>> And that the ARRAY_TYPE of the string literal either has the > >>>> same length than the TREE_STRING_LENGTH or if it is shorter, > >>>> this is always exactly one (wide) character size less than > >>>> TREE_STRING_LENGTH > >>> > >>> I think it should be always the same... > >>> > >> > >> One could not differentiate between "\0" without zero-termination > >> and "" with zero-termination, theoretically. > > > > Is that important? Doesn't the C standard say how to parse string literals? > > > >> We also have char x[100] = "ab"; > >> that is TREE_STRING_LENGTH=3, and TYPE_SIZE_UNIT(TREE_TYPE(x)) = 100. > >> Of course one could create it with a TREE_STRING_LENGTH = 100, > >> but imagine char x[100000000000] = "ab" > > > > The question is more about TYPE_SIZE_UNIT (TREE_TYPE ("ab")) which I > > hope matches "ab" and not 'x'. If it matches 'x' then I'd rather have it > > unconditionally be [], thus incomplete (because the literals "size" depends > > on the context/LHS it is used on). > > > > Sorry, but I must say, it is not at all like that. > > If I compile x.c: > const char x[100] = "ab"; > > and set a breakpoint at output_constant: > > Breakpoint 1, output_constant (exp=0x7ffff6ff9dc8, size=100, align=256, > reverse=false) at ../../gcc-trunk/gcc/varasm.c:4821 > 4821 if (size == 0 || flag_syntax_only) > (gdb) p size > $1 = 100 > (gdb) call debug(exp) > "ab" > (gdb) p *exp > $2 = {base = {code = STRING_CST, side_effects_flag = 0, constant_flag = 1, > (gdb) p exp->typed.type->type_common.size_unit > $5 = (tree) 0x7ffff6ff9d80 > (gdb) call debug(exp->typed.type->type_common.size_unit) > 100 > (gdb) p exp->string.length > $6 = 3 > (gdb) p exp->string.str[0] > $8 = 97 'a' > (gdb) p exp->string.str[1] > $9 = 98 'b' > (gdb) p exp->string.str[2] > $10 = 0 '\000' > (gdb) p exp->string.str[3] > $11 = 0 '\000' > > > This is an important property of string_cst objects, that is used in c_strlen: > > It folds c_strlen(&x[4]) directly to 0, because every byte beyond > TREE_STRING_LENGTH > is guaranteed to be zero up to the type size. > > I would not have spent one thought on implementing an optimization like that, > but that's how it is right now.
Huh. So somebody interpreted STRING_CSTs similar to CONSTRUCTORs aka they have zero-padding up to its type size. I don't see this documented anywhere and it would suggest to "optimize" "ab\0\0\0\0" to "ab\0" with appropriate TYPE_SIZE. This is also a relatively new thing on trunk (coming out of the added mem_size parameter of string_constant). That this looks at the STRING_CST type like if (TREE_CODE (array) == STRING_CST) { *ptr_offset = fold_convert (sizetype, offset); if (mem_size) *mem_size = TYPE_SIZE_UNIT (TREE_TYPE (array)); return array; I'd call simply a bug. As said, interpretation of STRING_CSTs should depend on the context. And for char a[4] = "abc"; char b[5] = "abc"; we should better be able to share STRING_CSTs [you can see LTO sharing the nodes if you do b[4] but not when b[5] I suppose]. > All I want to do, is make sure that all string constants have the same look > and feel > in the middle-end, and restrict the variations that are allowed by the current > implementation. Sure, I understand that. But I'd like to simplify things and not add complications like looking at TYPE_SIZE vs. TREE_STRING_LENGTH to decide whether sth is 0-terminated. Richard. > > Bernd. > > > >>>> The idea is to use this property of string literals where needed, > >>>> and check rigorously in varasm.c. > >>>> > >>>> Does that make sense? > >>> > >>> So if it is not the same then the excess character needs to be > >>> a (wide) NUL in your model? ISTR your varasm.c patch didn't verify > >>> that. > >>> > >> > >> I think it does. > >> > >> > >> Bernd. > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)