https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863
--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
I think we should add a new tree next to STRING_CST for use inside of CONSTRUCTORs.
STRING_CST in theory could be used e.g. inside of a constructor_elt, with say RANGE_EXPR for the index and STRING_CST for the element. But the problem is that the STRING_CST owns the whole string.
If somebody does:
const char arr[] = {
1, 2, 3, 42,
#embed "large_data.bin"
0, [100000] = 15, [200000] = 32 };
it would be nice if we could start with say that STRING_CST covering all of 50MB or how much of data, but then the designated initializer overrides mean either that we need to expand it all to the huge number of INTEGER_CSTs, or if we had some tree that can extract some substring from larger STRING_CST use that (3 operands, the STRING_CST and offset from start and length), we could just create two such small trees and build one INTEGER_CST in between. Because if we just have STRING_CST, we'd need to create 2 new huge STRING_CSTs when splitting something into halves.
SUBSTRING_CST?

For what to do with -E, if the amount of data is really small (dunno, 64 or 128 bytes or user parameter?), we should expand it in the preprocessed source as integer tokens with commas in between, so
124,231,0,15,24,86
but for larger I'd go with what I've proposed in the LLVM pull request, i.e. emit in the preprocessed source
#embed "." __gnu__::__base64__("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVlciBhZGlwaXNjaW5nIGVsaXQuIE5hbSBzZWQgdGVsbHVzIGlkIG1hZ25hIGVsZW1lbnR1bSB0aW5jaWR1bnQuIEFsaXF1YW0gaWQgZG9sb3IuIFV0IHRlbXB1cyBwdXJ1cyBhdCBsb3JlbS4uLgo=")
or so, where the embed data would be base64 decoded from the string instead of read from some other file.
Because #embed_base64 or what has been proposed there would be something to diagnose with -pedantic-errors as invalid, while I think recognized #embed implementation-defined parameters aren't strictly invalid (while unrecognized are).
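To make the splitting argument concrete, here is a rough standalone sketch of the idea, not GCC code. `struct seg`, `seg_override` and `seg_read` are hypothetical names invented for illustration; a `SEG_SUBSTRING` segment plays the role of the proposed three-operand tree (backing string, offset, length) and `SEG_INTEGER` plays the role of a single INTEGER_CST. Overriding one element only creates small descriptors and never copies the backing data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not GCC's tree representation. */
enum seg_kind { SEG_SUBSTRING, SEG_INTEGER };

struct seg {
    enum seg_kind kind;
    const unsigned char *base; /* shared backing data (the big "STRING_CST") */
    size_t off;                /* start offset into base (SEG_SUBSTRING only) */
    size_t len;                /* number of elements covered; 1 for SEG_INTEGER */
    int value;                 /* element value (SEG_INTEGER only) */
};

/* Override element idx (relative to the start of `in`) with `value`.
   Writes at most three segments to `out`: left view, integer, right view.
   Both views still point into in.base, so no bytes are copied. */
size_t seg_override(struct seg in, size_t idx, int value, struct seg out[3])
{
    size_t n = 0;

    assert(in.kind == SEG_SUBSTRING && idx < in.len);
    if (idx > 0)
        out[n++] = (struct seg){ .kind = SEG_SUBSTRING, .base = in.base,
                                 .off = in.off, .len = idx };
    out[n++] = (struct seg){ .kind = SEG_INTEGER, .len = 1, .value = value };
    if (idx + 1 < in.len)
        out[n++] = (struct seg){ .kind = SEG_SUBSTRING, .base = in.base,
                                 .off = in.off + idx + 1,
                                 .len = in.len - idx - 1 };
    return n;
}

/* Read element i from a list of segments; returns -1 if i is out of range. */
int seg_read(const struct seg *segs, size_t nsegs, size_t i)
{
    for (size_t k = 0; k < nsegs; k++) {
        if (i < segs[k].len)
            return segs[k].kind == SEG_INTEGER
                       ? segs[k].value
                       : segs[k].base[segs[k].off + i];
        i -= segs[k].len;
    }
    return -1;
}
```

A second override such as `[200000] = 32` would apply `seg_override` again to whichever view covers that index, so the cost stays proportional to the number of overrides rather than to the size of the embedded data.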
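For the -E side, the choice between the two output forms could look roughly like the sketch below. This is only an illustration of the proposal in the comment, not what any compiler is claimed to do: `emit_embed`, `base64_encode` and `SMALL_EMBED_LIMIT` are made-up names, the limit of 64 is just one of the values floated above, and the directive spelling is copied from the comment. The encoder is plain base64 with `=` padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumption: 64 is one of the thresholds floated in the comment. */
#define SMALL_EMBED_LIMIT 64

static const char b64tab[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Standard base64 with '=' padding. `out` needs 4 * ((len + 2) / 3) + 1
   bytes. Returns the number of characters written, excluding the NUL. */
size_t base64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i += 3) {
        unsigned v = (unsigned)in[i] << 16;
        if (i + 1 < len) v |= (unsigned)in[i + 1] << 8;
        if (i + 2 < len) v |= (unsigned)in[i + 2];
        out[o++] = b64tab[(v >> 18) & 63];
        out[o++] = b64tab[(v >> 12) & 63];
        out[o++] = i + 1 < len ? b64tab[(v >> 6) & 63] : '=';
        out[o++] = i + 2 < len ? b64tab[v & 63] : '=';
    }
    out[o] = '\0';
    return o;
}

/* Build the preprocessed form of the data in `out` (capacity `cap`).
   Small data becomes comma-separated integer tokens; larger data becomes
   the base64 directive from the comment. Returns the text length, or
   (size_t)-1 if `out` is too small. */
size_t emit_embed(const unsigned char *data, size_t len, char *out, size_t cap)
{
    static const char prefix[] = "#embed \".\" __gnu__::__base64__(\"";
    size_t o = 0;

    if (cap == 0)
        return (size_t)-1;
    out[0] = '\0';

    if (len <= SMALL_EMBED_LIMIT) {
        for (size_t i = 0; i < len; i++) {
            int w = snprintf(out + o, cap - o, i ? ",%u" : "%u",
                             (unsigned)data[i]);
            if (w < 0 || (size_t)w >= cap - o)
                return (size_t)-1; /* output would be truncated */
            o += (size_t)w;
        }
        return o;
    }

    /* prefix + base64 text + closing quote and parenthesis + NUL */
    if (sizeof prefix - 1 + 4 * ((len + 2) / 3) + 2 + 1 > cap)
        return (size_t)-1;
    memcpy(out, prefix, sizeof prefix - 1);
    o = sizeof prefix - 1;
    o += base64_encode(data, len, out + o);
    memcpy(out + o, "\")", 3); /* copies the terminating NUL too */
    return o + 2;
}
```

Calling `emit_embed` on the six bytes 124, 231, 0, 15, 24, 86 gives the token list shown in the comment, while anything longer than the limit comes back as a single directive whose payload a reader of the preprocessed file could decode without access to the original resource.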