On Mon, Jul 15, 2024 at 09:52:18AM +0200, Richard Biener wrote:
> >         .string "k"
> >         .string ""
> >         .string ""
> >         .string "\375"
> >         .string ""
> >         .string ""
> >         .string ""
> >         .string ""
> >         .string ""
> >         .string ""
> > I think that is simply binary rubbish.
> 
> OK, so the "fix" for this would be to have .w8string .w16string and
> .w32string (or similar), or even allow .string U"Žluťoučký" directly.

Maybe, but we'd also need to know that we are actually dealing with sensible
readable data and not binary stuff; we just don't differentiate between
those.  Even without the #embed stuff, a STRING_CST can hold binary data
(e.g. the LTO sections, or user-provided unsigned char data[] = { 0x83,
0x35, 0x9a, ... };)
or readable text, and whether it was parsed as a string literal
or as an array of constants doesn't tell us which.
And for UTF-16/UTF-32 the endianness also matters.
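As a quick illustration of the endianness point (a Python sketch, just to
demonstrate why a hypothetical .w16string-style directive would need to know
the target byte order; this is not GCC code):

```python
# U+017D LATIN CAPITAL LETTER Z WITH CARON -- the "Ž" from the example above.
# The same UTF-16 code unit is emitted in a different byte order depending
# on the target's endianness, so the directive cannot be target-agnostic.
s = "\u017d"
print(s.encode("utf-16-le").hex())  # little-endian byte order: 7d01
print(s.encode("utf-16-be").hex())  # big-endian byte order:    017d
```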

> So the overhead is at most '\t.string ""' which is 11 chars, you
> save the \0.  If you make the limit 8 chars does the size get any

I think that is the most important point: binary data typically contains
tons of zero bytes, and 11 chars per zero byte is simply too much.  Perhaps
we could save some size even without a .base64 directive by emitting
        .zero   10
instead of
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
        .string ""
Though it isn't just that: the 4 characters needed for every byte >= 0x7f
and for most of the 0x01..0x1f bytes (the exceptions being the 2-character
escapes \b, \t, \n, \f, \r) are significant too, while base64 is just
4/3 characters per byte.
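To make the overhead concrete, here is a rough back-of-the-envelope sketch
(Python; the escape cost model below is my approximation of the assembler
output described above, not GCC's actual emission code):

```python
import base64

# Bytes with 2-character escapes: \b \t \n \f \r \" \\
TWO_CHAR = {0x08, 0x09, 0x0a, 0x0c, 0x0d, 0x22, 0x5c}

def string_escape_len(data: bytes) -> int:
    """Approximate character count for emitting `data` inside .string
    literals: printable ASCII costs 1 char, the escapes above cost 2,
    everything else costs 4 (an octal escape like \\375)."""
    n = 0
    for b in data:
        if 0x20 <= b < 0x7f and b not in (0x22, 0x5c):
            n += 1        # printable ASCII, emitted as-is
        elif b in TWO_CHAR:
            n += 2        # short escape sequence
        else:
            n += 4        # octal escape, e.g. \375
    return n

data = bytes(range(256))  # mostly non-printable, like typical binary data
print(string_escape_len(data))      # 731 escape characters
print(len(base64.b64encode(data)))  # 344 characters, i.e. 4/3 per byte
```

This counts only the escape characters themselves; it doesn't even include
the 11-character '\t.string ""' framing that each NUL byte costs on top.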

> worse when encoding cc1plus?  How many extra strings get visible
> that way and how many of those are gibberish?

With either setting, it comes down to where exactly relative to the ~256
byte chunk boundaries the ASCII chars appear or don't; if they fall in the
middle of those chunks, they won't be readable anyway.  And I didn't want to
make the decisions too expensive.  So the intent is just to differentiate
clearly binary data from clearly readable data; for everything in between it
is unclear how it will be emitted.

> > I think best way to read partially binary data is objdump or similar,
> > but if people insist otherwise, -fno-base64-data-asm is possible.
> 
> We'll see if it pops up as a request.  I do find it quite nice to
> have the actual strings better visually separated from binary data
> which is worse in the current way with only .string

Ok.

        Jakub
