On Mon, 15 Jul 2024, Jakub Jelinek wrote:
> On Mon, Jul 15, 2024 at 09:52:18AM +0200, Richard Biener wrote:
> > > 	.string	"k"
> > > 	.string	""
> > > 	.string	""
> > > 	.string	"\375"
> > > 	.string	""
> > > 	.string	""
> > > 	.string	""
> > > 	.string	""
> > > 	.string	""
> > > 	.string	""
> > > I think that is simply binary rubbish.
> > 
> > OK, so the "fix" for this would be to have .w8string, .w16string and
> > .w32string (or similar), or even allow .string U"Žluťoučký" directly.
> 
> Maybe, but we'd also need to know that we are actually dealing with
> sensible, readable data and not binary stuff; we just don't differentiate
> between the two.  Even without the #embed stuff we have STRING_CST, which
> can be binary data (e.g. the LTO sections, or user-provided
> unsigned char data[] = { 0x83, 0x35, 0x9a, ... };) or it can be readable
> stuff.  And whether it was parsed as a string literal or as an array of
> constants doesn't tell us.
> And for UTF-16/UTF-32 the endianness also matters.
I think varasm could know from the type of the variable being initialized.
We have TYPE_STRING_FLAG, and we could possibly add a flag on the
STRING_CST itself to tell whether it was lexed from a string literal or
not.

Of course it's somewhat pointless to "optimize" assembler output for
readability.

Richard.