Re: [RFC WIP] RAW_DATA_CST for #embed optimization

Jakub Jelinek Sun, 07 Jul 2024 08:13:44 -0700

On Sun, Jul 07, 2024 at 09:02:57AM +0200, Richard Biener wrote:
> I see.  I was wondering because PCH includes are not resolved.  That said,
> it sounds like #embed is sadly defined on the preprocessor side rather
> than in the language where it would have been easy to constrain uses to
> those that make sense…


I think there were big discussions on this, and at some stage it was
a builtin, etc.

> Yeah, I wondered if where the raw data survives we can make it always
> wrapped by a CONSTRUCTOR and add a RANGE_TARGET_BYTES element.  This may
> be useful to encode large initializers more efficiently during/after
> parsing.

We definitely should eventually try to improve the handling of large
initializers which do not use #embed as well; where to approach it depends
on where the large overheads are.
It could be handled in the preprocessor: say, after we see 128 or however
many CPP_NUMBERs in the range 0-255 alternating with CPP_COMMA, do some
lookahead and construct a CPP_EMBED.  Or it could be done similarly during
parsing of the initializer: after seeing a certain number of initializers
of a char array, use the C FE raw token lexing to look ahead and create a
RAW_DATA_CST out of that if beneficial, etc.
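
To make the preprocessor-side variant concrete, here is a minimal sketch
of the run detection.  Everything in it is hypothetical: struct tok,
byte_run_length, pack_byte_run and RUN_THRESHOLD are made up for
illustration and are not libcpp data structures or APIs; it only shows
the "numbers 0-255 alternating with commas" scan and the payload that
a CPP_EMBED-like token could carry.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified token kinds; real libcpp tokens carry
   much more state than this.  */
enum tok_kind { TOK_NUMBER, TOK_COMMA, TOK_OTHER };

struct tok
{
  enum tok_kind kind;
  long value;			/* Meaningful only for TOK_NUMBER.  */
};

/* Assumed threshold, following the "128 or however many" above.  */
#define RUN_THRESHOLD 128

/* Return how many numbers in the range 0-255, separated by single
   commas, start at TOKS[POS].  Scanning stops at the first token
   that breaks the pattern.  */
size_t
byte_run_length (const struct tok *toks, size_t ntoks, size_t pos)
{
  size_t count = 0;
  while (pos < ntoks
	 && toks[pos].kind == TOK_NUMBER
	 && toks[pos].value >= 0
	 && toks[pos].value <= 255)
    {
      count++;
      pos++;
      /* Step over a comma only if some token follows it.  */
      if (pos + 1 < ntoks && toks[pos].kind == TOK_COMMA)
	pos++;
      else
	break;
    }
  return count;
}

/* Whether a run is long enough to be worth collapsing.  */
bool
worth_collapsing (size_t run_length)
{
  return run_length >= RUN_THRESHOLD;
}

/* Copy the COUNT values of the run starting at TOKS[POS] into OUT,
   which must have room for COUNT bytes.  This stands in for the
   payload of a CPP_EMBED-like token.  */
void
pack_byte_run (const struct tok *toks, size_t pos, size_t count,
	       unsigned char *out)
{
  for (size_t i = 0; i < count; i++)
    out[i] = (unsigned char) toks[pos + 2 * i].value;
}
```

A caller would check worth_collapsing (byte_run_length (...)) and only
then replace the 2 * count - 1 tokens with a single packed one; shorter
runs would be left alone.
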
It really depends on where the biggest overhead is, whether it is in
creation of the millions of CPP_NUMBER/CPP_COMMA tokens, or primarily
when creating the large CONSTRUCTOR (the INTEGER_CSTs for the values should
be shared, at most 256 of them, but the indexes are not).
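
As a rough illustration of that last point, the following toy model
counts how many distinct value nodes and how many index nodes a byte
initializer would need if values are shared and indexes are not.  It is
only a counting sketch under that assumption; struct node_counts and
count_initializer_nodes are hypothetical names and this is not how GCC
represents a CONSTRUCTOR.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tally of nodes needed for one byte-array initializer.  */
struct node_counts
{
  size_t value_nodes;		/* Distinct values, shared: at most 256.  */
  size_t index_nodes;		/* One per element, not shared.  */
};

/* Count nodes for the LEN bytes in DATA, assuming values are shared
   between elements and each element needs its own index.  */
struct node_counts
count_initializer_nodes (const unsigned char *data, size_t len)
{
  bool seen[256] = { false };
  struct node_counts c = { 0, 0 };
  for (size_t i = 0; i < len; i++)
    {
      if (!seen[data[i]])
	{
	  seen[data[i]] = true;
	  c.value_nodes++;
	}
      c.index_nodes++;
    }
  return c;
}
```

In this model the value side is capped at 256 no matter how large the
array gets, while the index side grows linearly with the number of
elements.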

        Jakub
