On Sun, Jul 07, 2024 at 09:02:57AM +0200, Richard Biener wrote:
> I see.  I was wondering because PCH includes are not resolved.  That said,
> it sounds like #embed is sadly defined on the preprocessor side rather
> than in the language where it would have been easy to constrain uses to
> those that make sense...

I think there were big discussions on this, and at some stage it was a
builtin etc.

> Yeah, I wondered if where the raw data survives we can make it always
> wrapped by a CONSTRUCTOR and add a RANGE_TARGET_BYTES element.  This may
> be useful to encode large initializers more efficiently during/after
> parsing.

We definitely should eventually try to improve the handling of large
initializers which do not use #embed as well; where to approach it depends
on where the large overheads are.
It could be handled in the preprocessor: say, after we see 128 or however
many CPP_NUMBERs from 0-255 alternating with CPP_COMMA, do some look ahead
and construct a CPP_EMBED.  Or it could be done during parsing of the
initializer: similarly, after seeing a certain number of initializers of a
CHAR_BIT array, use the C FE raw token lexing to look ahead and create a
RAW_DATA_CST out of that if beneficial, etc.
It really depends on where the biggest overhead is, whether it is in the
creation of the millions of CPP_NUMBER/CPP_COMMA tokens, or primarily in
creating the large CONSTRUCTOR (the INTEGER_CSTs for the values should be
shared, at most 256 of them, but the indexes are not).

	Jakub
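
For illustration only, here is a minimal C sketch of the preprocessor-side
heuristic described above: scan a token stream for a run of byte-valued
numbers alternating with commas, and collapse the run into a byte buffer
once it reaches a threshold.  This is not GCC code.  The token type, the
names byte_run_length and collapse_byte_run, and the MIN_RUN value are all
hypothetical stand-ins; real cpplib tokens (CPP_NUMBER, CPP_COMMA,
CPP_EMBED) are represented differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy token kinds: hypothetical stand-ins for CPP_NUMBER / CPP_COMMA.  */
enum tok_kind { TOK_NUMBER, TOK_COMMA, TOK_OTHER };

struct tok
{
  enum tok_kind kind;
  long value;			/* Meaningful only for TOK_NUMBER.  */
};

/* Hypothetical threshold: fewest byte values worth collapsing.  */
#define MIN_RUN 128

/* Starting at TOKS[START], count how many numbers in the range 0..255
   appear in the pattern NUMBER , NUMBER , ... NUMBER.  Returns 0 if
   TOKS[START] is not such a number.  */
static size_t
byte_run_length (const struct tok *toks, size_t n, size_t start)
{
  assert (toks != NULL || n == 0);
  size_t count = 0;
  size_t i = start;
  while (i < n
	 && toks[i].kind == TOK_NUMBER
	 && toks[i].value >= 0
	 && toks[i].value <= 255)
    {
      count++;
      i++;
      /* The run continues only across a separating comma.  */
      if (i < n && toks[i].kind == TOK_COMMA)
	i++;
      else
	break;
    }
  return count;
}

/* If a run of at least MIN_RUN byte values starts at TOKS[START] and fits
   in OUT, copy the values into OUT and return their count (the run then
   spans 2 * count - 1 tokens).  Otherwise return 0 and leave the tokens
   to be handled one by one.  */
static size_t
collapse_byte_run (const struct tok *toks, size_t n, size_t start,
		   unsigned char *out, size_t out_cap)
{
  size_t len = byte_run_length (toks, n, start);
  if (len < MIN_RUN || len > out_cap)
    return 0;
  for (size_t k = 0; k < len; k++)
    out[k] = (unsigned char) toks[start + 2 * k].value;
  return len;
}
```

A caller would try collapse_byte_run at each candidate position and fall
back to ordinary token-by-token handling whenever it returns 0.  The sketch
only shows the look-ahead itself; whether doing this in the preprocessor or
in the parser pays off depends on where the overhead actually is.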