On 08/17/18 15:38, Richard Biener wrote:
> On Fri, 17 Aug 2018, Bernd Edlinger wrote:
> 
>> On 08/17/18 14:19, Richard Biener wrote:
>>> On Fri, 17 Aug 2018, Bernd Edlinger wrote:
>>>
>>>> Richard Biener wrote:
>>>>> +embedded @code{NUL} characters.  However, the
>>>>> +@code{TREE_STRING_LENGTH} always includes a trailing @code{NUL} that
>>>>> +is not part of the language string literal but appended by the front end.
>>>>> +If the string shall not be @code{NUL}-terminated the @code{TREE_TYPE}
>>>>> +is one character shorter than @code{TREE_STRING_LENGTH}.
>>>>> +Excess caracters other than one trailing @code{NUL} character are not
>>>
>>> characters btw.
>>>
>>
>> thanks, updated.
>>
>>> I read the above that the string literal for
>>>
>>> char x[2] = "1";
>>>
>>> is actually "1\0\0" - there's one NUL that is not part of the language
>>> string literal.  The second sentence then suggests that both \0
>>> are removed because 2 is less than 3?
>>>
>>
>> Maybe 2 is a bad example; let's consider:
>> char x[2000000000] = "1";
>>
>> That is a string_cst with STRING_LENGTH = 2, content = "1\0\0".
>> The array_type is used on both x and the string_cst;
>> I was assuming that both tree objects refer to the same type object,
>> which is char[0..2000000000-1].
> 
> Oh, didn't realize we use char[2000000000] for the STRING_CST.  Makes
> my suggestion to use char[] instead not (very) much worse than the
> existing practice then.
> 
>> varasm assembles the bytes that are given by STRING_LENGTH
>> and appends zeros as appropriate.
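For reference, the language-level behaviour that varasm has to reproduce can be checked in plain C. This small sketch (the array and helper names are made up for illustration) shows an array larger than its literal being zero-filled, and an array exactly as long as the literal's characters. C accepts the latter, and the array then holds no terminating NUL; this is the situation the proposed documentation describes as the type being one character shorter than TREE_STRING_LENGTH.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Storage larger than the literal: bytes past "1" and its NUL are zero. */
char padded[8] = "1";

/* Storage exactly as long as the literal's characters: C stores no
   terminating NUL here (valid C; C++ rejects this initializer). */
char unterminated[6] = "123456";

/* Returns 1 if buf[from..size-1] are all zero bytes, 0 otherwise. */
int all_zero_from(const char *buf, size_t size, size_t from)
{
  assert(buf != NULL && from <= size);
  for (size_t i = from; i < size; i++)
    if (buf[i] != '\0')
      return 0;
  return 1;
}
```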
>>
>>> As said, having this extra semantics of a STRING_CST tied to
>>> another tree node (its TREE_TYPE) looks ugly.
>>>
>>>>> +permitted.
>>>>>
>>>>> I find this very confusing and oppose that change.  Can we get
>>>>> back to the drawing board please?  If we want an easy way to
>>>>> see whether a string is "properly" terminated then maybe we can
>>>>> simply use a flag that gets set by build_string?
>>>>>
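A minimal sketch of the flag idea, assuming a simplified stand-in for a STRING_CST. The struct and constructor names are hypothetical and do not correspond to GCC's tree accessors. The flag is computed once from the last byte inside the length, so consumers would not need to consult the TREE_TYPE to learn whether the constant is NUL-terminated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a STRING_CST: LENGTH bytes of data plus a
   flag recording whether the last byte inside LENGTH is a NUL. */
struct string_const {
  size_t length;
  char *data;          /* LENGTH bytes, followed by one extra NUL */
  bool nul_terminated; /* computed once, when the constant is built */
};

/* Sketch of a build_string-like constructor that also sets the flag. */
struct string_const make_string_const(size_t length, const char *str)
{
  struct string_const s;
  assert(str != NULL || length == 0);
  s.length = length;
  s.data = malloc(length + 1);
  if (s.data == NULL)
    abort();
  if (length > 0)
    memcpy(s.data, str, length);
  s.data[length] = '\0';
  s.nul_terminated = length > 0 && str[length - 1] == '\0';
  return s;
}
```

Callers own `data` and release it with `free`.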
>>>>
>>>> What I mean by that is a case like
>>>> char x[2] = "123456";
>>>>
>>>> which is build_string(7, "123456"), but with a type char[2],
>>>> so varasm throws away "3456\0".
>>>
>>> I think varasm throws away chars not because of the type of
>>> the STRING_CST but because of the available storage in x.
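Richard's point can be modelled in a few lines. The function below is a hypothetical illustration, not GCC's varasm code. It only assumes that a constant is a (pointer, length) pair and that the destination object's size alone decides how many bytes survive, with any shortfall zero-filled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: emit a string constant of STR_LEN bytes into an
   object of OBJ_SIZE bytes.  Bytes that do not fit are dropped, missing
   bytes are zero-filled.  Returns how many bytes were discarded. */
size_t emit_string_init(char *out, size_t obj_size,
                        const char *str, size_t str_len)
{
  assert(out != NULL && str != NULL);
  size_t copied = str_len < obj_size ? str_len : obj_size;
  memcpy(out, str, copied);
  memset(out + copied, 0, obj_size - copied);
  return str_len - copied;
}
```

With `char x[2] = "123456";` this keeps "12" and discards the 5 bytes "3456\0", regardless of which type is attached to the constant.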
>>>
>>
>> But at other places we look at the type of the string_cst, don't we?
>> Shouldn't those be the same?
> 
> I think most (all?) places look at TREE_TYPE (TREE_TYPE (string))
> only.  I'm not aware of users of the array domain of the array type
> of a string - but I'm far from knowing GCC inside-out ;)
> 

Yes, I know, that happens to me as well on the first day after my holidays ;)

>>>> I want to say that this is not okay; the excess length
>>>> should only be used to strip the NUL termination, in cases
>>>> where the string is intended to be assembled as a non-NUL-terminated
>>>> string.  But maybe the wording could be improved?
>>>
>>> ISTR we always assemble a NUL in .strings to get string merging
>>> working.
>>>
>>
>> String merging is not working when the string is not explicitly
>> NUL terminated, my followup patch here tries to fix that:
>>
>> [PATCH] Handle not explicitly zero terminated strings in merge sections
>> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00481.html
> 
> I'd have expected sth as simple as
> 
>    if (merge_strings && str[thissize - 1] != '\0')
>      thissize++;
> 
> being appended in output_constant.
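Spelled out as a standalone function, the suggested rule looks like this. The function name and signature are hypothetical; in GCC the check would sit inside output_constant. The only addition is a guard for an empty constant, where str[thissize - 1] would read out of bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper mirroring the suggested check: when emitting into
   a string-merge section, reserve one extra byte for a NUL if the
   constant does not already end in one. */
size_t merged_output_size(const char *str, size_t thissize,
                          bool merge_strings)
{
  assert(str != NULL || thissize == 0);
  /* thissize == 0 is tested first so str[thissize - 1] is never read
     out of bounds. */
  if (merge_strings && (thissize == 0 || str[thissize - 1] != '\0'))
    thissize++;
  return thissize;
}
```

As Bernd's reply notes, the `merge_strings` condition is what keeps this padding out of ordinary structure initializers.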
> 

Yes, but that can only be done in the .merge.str section;
otherwise it would happen in structure initializers as well.
And I would like to undo the case where Ada programs do

Process ("ABCD" & Ascii.NUL);

but not for an embedded NUL in the string constant, like:

Process ("ABCD" & Ascii.NUL & "EFGH");
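The distinction between the two Ada cases can be expressed as a check on where the first NUL falls. This is a hypothetical sketch, not code from the referenced patch. A constant with no NUL, or with a NUL only as its final byte (the "ABCD" & Ascii.NUL case), can be treated as a C string of the returned length. An embedded NUL ("ABCD" & Ascii.NUL & "EFGH") rules that out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: a constant of LEN bytes can be emitted as a C
   string (and so merged) if it contains no NUL except possibly as its
   final byte.  Returns the C-string length, or -1 if a NUL is embedded. */
long mergeable_c_string_length(const char *str, size_t len)
{
  assert(str != NULL || len == 0);
  const char *nul = len ? memchr(str, '\0', len) : NULL;
  if (nul == NULL)
    return (long) len;            /* no NUL: a terminator would be appended */
  if ((size_t) (nul - str) == len - 1)
    return (long) (len - 1);      /* only a trailing, explicit NUL */
  return -1;                      /* embedded NUL: not a plain C string */
}
```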


Bernd.
