> On 07.07.2024 at 17:14, Jakub Jelinek <ja...@redhat.com> wrote:
>
> On Sun, Jul 07, 2024 at 09:02:57AM +0200, Richard Biener wrote:
>> I see. I was wondering because PCH includes are not resolved. That said,
>> it sounds like #embed is sadly defined on the preprocessor side rather
>> than in the language where it would have been easy to constrain uses to
>> those that make sense…
>
> I think there were big discussions on this, and at some stage it was
> a builtin, etc.
>
>> Yeah, I wondered whether, where the raw data survives, we can make it always
>> wrapped by a CONSTRUCTOR and add a RANGE_TARGET_BYTES element. This may
>> be useful to encode large initializers more efficiently during/after
>> parsing.
>
> We definitely should eventually try to improve the handling of large
> initializers even when they do not use #embed; where to approach it depends
> on where the large overheads are.
> It could be handled in the preprocessor: say, after we see 128 (or however
> many) CPP_NUMBERs from 0-255 alternating with CPP_COMMA, do some lookahead
> and construct a CPP_EMBED. Or it could be done during parsing of the
> initializer: similarly, after seeing a certain number of initializers of a
> CHAR_BIT array, use the C FE raw token lexing to look ahead and create a
> RAW_DATA_CST out of that if beneficial, etc.
> It really depends on where the biggest overhead is, whether it is in
> creation of the millions of CPP_NUMBER/CPP_COMMA tokens, or primarily
> when creating the large CONSTRUCTOR (the INTEGER_CSTs for the values should
> be shared, at most 256 of them, but the indexes are not).
There’s a very old PR about the regression for very large static initializers
compared to the time when we wrote those directly to asm_out.
Richard
> Jakub
>
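
For illustration only, the preprocessor-side idea discussed above (spotting a
long run of 0-255 numbers alternating with commas and collapsing it into one
raw-data blob) could be sketched roughly as below. This is not GCC code: the
token struct, the TOK_* kinds, MIN_RUN and both function names are hypothetical
stand-ins, and the threshold of 128 is just the figure floated in the mail.

```c
#include <stddef.h>

/* Toy token kinds: hypothetical stand-ins for libcpp's CPP_NUMBER and
   CPP_COMMA, not the real libcpp types.  */
enum tok_kind { TOK_NUMBER, TOK_COMMA, TOK_OTHER };

struct tok
{
  enum tok_kind kind;
  long value;			/* Only meaningful for TOK_NUMBER.  */
};

/* Assumed threshold below which packing is not considered worthwhile.  */
#define MIN_RUN 128

/* Count the byte-valued numbers in the run NUMBER , NUMBER , ... that
   starts at TOKS[START].  Returns 0 if TOKS[START] is not a number in
   the range 0-255.  */
static size_t
byte_run_length (const struct tok *toks, size_t ntoks, size_t start)
{
  size_t i = start, count = 0;

  while (i < ntoks
	 && toks[i].kind == TOK_NUMBER
	 && toks[i].value >= 0 && toks[i].value <= 255)
    {
      count++;
      i++;
      /* Keep going only if a comma and at least one more token follow.  */
      if (i + 1 < ntoks && toks[i].kind == TOK_COMMA)
	i++;
      else
	break;
    }
  return count;
}

/* If the run at TOKS[START] has at least MIN_RUN values and fits in OUT,
   copy the values into OUT and return their count; otherwise return 0
   and leave the tokens to be handled one by one.  */
static size_t
pack_byte_run (const struct tok *toks, size_t ntoks, size_t start,
	       unsigned char *out, size_t out_cap)
{
  size_t n = byte_run_length (toks, ntoks, start);

  if (n < MIN_RUN || n > out_cap)
    return 0;
  /* Values sit at every second token: number, comma, number, ...  */
  for (size_t k = 0; k < n; k++)
    out[k] = (unsigned char) toks[start + 2 * k].value;
  return n;
}
```

Whether something along these lines pays off would still depend on where the
overhead actually is: if token creation dominates, the scan comes too late to
help much, whereas if building the large CONSTRUCTOR dominates, packing the
run before that point is where the saving would be.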