On Mon, 15 Jul 2024, Jakub Jelinek wrote:

> On Mon, Jul 15, 2024 at 09:52:18AM +0200, Richard Biener wrote:
> > >         .string "k"
> > >         .string ""
> > >         .string ""
> > >         .string "\375"
> > >         .string ""
> > >         .string ""
> > >         .string ""
> > >         .string ""
> > >         .string ""
> > >         .string ""
> > > I think that is simply binary rubbish.
> > 
> > OK, so the "fix" for this would be to have .w8string .w16string and
> > .w32string (or similar), or even allow .string U"Žluťoučký" directly.
> 
> Maybe, but we'd also need to know that we are actually dealing with sensible
> readable data and not binary stuff, we just don't differentiate between
> that, even without the #embed stuff we have STRING_CST which can be binary data
> (e.g. the LTO sections, or user provided unsigned char data[] = { 0x83,
> 0x35, 0x9a, ... };
> or it can be readable stuff.  And whether it was parsed as a string literal
> or array of constants doesn't tell.
> And for UTF-16/UTF-32 the endianness also matters.

I think varasm could know from the type of the variable that's
initialized.  We have TYPE_STRING_FLAG and we could possibly have
a flag on the STRING_CST itself to tell whether it was lexed from
a string literal or not.  Of course it's somewhat pointless to
"optimize" assembler output for readability.
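
To illustrate why the bytes alone don't tell, here is a minimal sketch in
plain C (not GCC code; the names from_literal, from_list, same_bytes,
first_byte and check_examples are hypothetical).  It shows an array
initialized from a string literal and one initialized from a list of
constants ending up with identical contents, and that the byte layout of
a 16-bit code unit depends on the target:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The same five bytes, written once as a string literal and once as a
   brace-enclosed list of constants (hypothetical example data).  */
const unsigned char from_literal[] = "k\0\0\375";
const unsigned char from_list[] = { 'k', 0, 0, 0375, 0 };

/* Return 1 if the two objects are byte-for-byte identical.  */
int
same_bytes (const void *a, size_t na, const void *b, size_t nb)
{
  return na == nb && memcmp (a, b, na) == 0;
}

/* Return the byte stored at the lowest address of a 16-bit code unit.  */
unsigned char
first_byte (uint16_t unit)
{
  unsigned char bytes[sizeof unit];
  memcpy (bytes, &unit, sizeof unit);
  return bytes[0];
}

/* Check both observations.  */
void
check_examples (void)
{
  /* Once initialized, nothing distinguishes the literal from the list.  */
  assert (same_bytes (from_literal, sizeof from_literal,
                      from_list, sizeof from_list));

  /* The code unit for 'A' starts with 0x41 on little-endian targets
     and with 0x00 on big-endian ones.  */
  unsigned char b = first_byte (0x0041);
  assert (b == 0x41 || b == 0x00);
}
```

Calling check_examples () from a test program passes on either byte
order, which is the point: any extra information has to come from the
type or from a flag recorded at parse time, not from the data.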

Richard.
