[Bug c/105863] RFE: C23 #embed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

Jakub Jelinek changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
             Status|NEW                            |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #12 from Jakub Jelinek ---
Working on this now.
[Bug c/105863] RFE: C23 #embed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

--- Comment #11 from Joseph S. Myers ---
It makes the changes more complicated (everything that handles CONSTRUCTORs, whether to output them to assembly or to extract values for optimization etc., needs to handle the new tree), but yes, having a new tree would allow more efficient handling of additional cases that wouldn't be so efficient with STRING_CST.
[Bug c/105863] RFE: C23 #embed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

--- Comment #10 from Jakub Jelinek ---
I think we should add a new tree next to STRING_CST for use inside of CONSTRUCTORs. In theory, STRING_CST could be used inside a constructor_elt, e.g. with a RANGE_EXPR for the index and a STRING_CST for the element. The problem is that the STRING_CST owns the whole string. If somebody does:

  const char arr[] = {
    1, 2, 3, 42,
  #embed "large_data.bin"
    , 0, [10] = 15, [20] = 32
  };

it would be nice if we could start with a STRING_CST covering all of the 50MB (or however much data there is). But then the designated initializer overrides mean that we either need to expand it all into a huge number of INTEGER_CSTs, or, if we had some tree that can refer to a substring of a larger STRING_CST (3 operands: the STRING_CST, the offset from its start, and the length), we could just create two such small trees and build one INTEGER_CST in between. If we only have STRING_CST, we'd need to create 2 new huge STRING_CSTs whenever we split something into halves. SUBSTRING_CST?

As for what to do with -E: if the amount of data is really small (dunno, 64 or 128 bytes, or a user parameter?), we should expand it in the preprocessed source as integer tokens with commas in between, e.g.

  124,231,0,15,24,86

but for larger data I'd go with what I've proposed in the LLVM pull request, i.e. emit in the preprocessed source

  #embed "." __gnu__::__base64__("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVlciBhZGlwaXNjaW5nIGVsaXQuIE5hbSBzZWQgdGVsbHVzIGlkIG1hZ25hIGVsZW1lbnR1bSB0aW5jaWR1bnQuIEFsaXF1YW0gaWQgZG9sb3IuIFV0IHRlbXB1cyBwdXJ1cyBhdCBsb3JlbS4uLgo=")

or so, where the embed data would be base64-decoded from the string instead of read from some other file. A new directive such as #embed_base64 (or whatever has been proposed there) would be something to diagnose as invalid with -pedantic-errors, while I think recognized implementation-defined #embed parameters aren't strictly invalid (unrecognized ones are).
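The splitting argument can be made concrete with a small C sketch. This is not GCC code. `struct string_cst`, `struct substring_cst`, `struct ctor_elt` and `override_in_run` are hypothetical stand-ins for the trees discussed above. The sketch assumes that a designated override landing inside the embedded data replaces one view with at most three elements: a view of the bytes before the override, one integer, and a view of the bytes after it. All the views point into the same buffer, so nothing is copied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a STRING_CST: it owns the whole buffer.  */
struct string_cst
{
  const unsigned char *data;
  size_t len;
};

/* Hypothetical stand-in for the proposed SUBSTRING_CST: three operands,
   i.e. the string, an offset from its start and a length.  It points
   into the string and copies nothing.  */
struct substring_cst
{
  const struct string_cst *str;
  size_t off;
  size_t len;
};

/* Simplified constructor element: either a run of embedded bytes or a
   single integer constant (standing in for an INTEGER_CST).  */
struct ctor_elt
{
  int is_run;
  struct substring_cst run;   /* Meaningful only when is_run.  */
  long value;                 /* Meaningful only when !is_run.  */
};

/* Apply a designated override "[IDX] = VALUE", where IDX is relative to
   the start of RUN.  RUN is replaced by at most three elements written
   to OUT: the bytes before IDX, the new integer, and the bytes after it.
   Returns the number of elements written.  */
static size_t
override_in_run (struct substring_cst run, size_t idx, long value,
                 struct ctor_elt out[3])
{
  assert (idx < run.len);
  size_t n = 0;
  if (idx > 0)
    out[n++] = (struct ctor_elt) { 1, { run.str, run.off, idx }, 0 };
  out[n++] = (struct ctor_elt) { 0, { NULL, 0, 0 }, value };
  if (idx + 1 < run.len)
    out[n++] = (struct ctor_elt) { 1, { run.str, run.off + idx + 1,
                                        run.len - idx - 1 }, 0 };
  return n;
}
```

In the `arr` example, the embedded data starts at array index 4, so `[10] = 15` corresponds to `idx = 6` within the run. The later `[20] = 32` would then be applied to the trailing view in the same way. With only an owning STRING_CST, each split would instead need fresh copies of the bytes on both sides.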
[Bug c/105863] RFE: C23 #embed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

--- Comment #9 from Joseph S. Myers ---
The most straightforward and most important case to optimize is the one where the #embed expansion lies entirely inside a single character array initializer (possibly with some integer constants before or after it in the initializer, whether coming from the prefix and suffix parameters to #embed or otherwise). In that case the initializer can be converted internally to a STRING_CST. Cases that aren't within a character array initializer like that are harder to optimize (they might require additional internal representation beyond the front end), and are probably also less important to optimize initially.
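Below is a rough C sketch of the check this conversion implies. It rests on simplifying assumptions: the initializer has already been reduced to a flat list of integer constants with no designators, the target is an `unsigned char` array, and `pack_char_initializer` is a hypothetical helper, not a GCC function. If every value fits in a byte, the whole list can be stored as one byte string, which is the role STRING_CST would play. Otherwise the caller keeps the element-by-element form.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Try to store the N integer constants in ELTS as N bytes in BUF, as a
   single string constant would hold them.  ELTS may mix values from the
   #embed data itself, from its prefix/suffix parameters and from
   ordinary initializer elements: by now they are all just integer
   constants.  Returns false if any value is out of range for unsigned
   char; BUF is then partially written and should be ignored.  */
static bool
pack_char_initializer (const long *elts, size_t n, unsigned char *buf)
{
  assert (n == 0 || (elts != NULL && buf != NULL));
  for (size_t i = 0; i < n; i++)
    {
      if (elts[i] < 0 || elts[i] > UCHAR_MAX)
        return false;
      buf[i] = (unsigned char) elts[i];
    }
  return true;
}
```

A plain `char` array on a target where `char` is signed would need a different range check. The sketch leaves that case out.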
[Bug c/105863] RFE: C23 #embed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

--- Comment #8 from H. Peter Anvin ---
Well, _Embed() would be an extension, and it doesn't seem unreasonable to say that _Embed() would be expanded after token pasting. After all, as has been discussed in the C committee, if #embed cannot be short-circuited its value is significantly reduced.

That being said, what you said makes sense. Both #pragma and #embed, as well as some other use cases, really call for real procedural support in cpp. I have an idea for that which I would like to present to the C committee; I don't think it is really in scope for this request, though.
[Bug c/105863] RFE: C23 #embed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

Jakub Jelinek changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
                 CC|                               |jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek ---
_Embed opens a Pandora's box: what should happen when you stringify it, or try to token-paste it together with something else, etc.?

Anyway, for the GCC implementation of what C23 specifies, I wonder if we shouldn't implement it in separate steps. The first step would be a dumb approach that always expands it into preprocessor tokens; that path could perhaps be kept afterwards for smaller sizes, where it isn't worth doing something smart. Only in a second step would we try to add optimizations. I guess for C those could be easier than for C++, because C doesn't try to tokenize everything first. So for C, when we peek at a large embed token outside of the language contexts where we know how to handle it (most importantly, aggregate initializers), we could simply replace the token with its expanded form. Say one uses

  void foo (...);
  void bar ()
  {
    foo (
  #embed "foo_arguments"
    );
  }

then it would work without having to bother too much about that specific case.

The current LLVM pull request for this is https://github.com/llvm/llvm-project/pull/68620; I think we should try to use the same options where reasonable.
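The "dumb" first step amounts to replacing the directive with the embedded bytes, spelled as comma-separated integer tokens. Below is a minimal C sketch of producing that text. `expand_embed_tokens` is a hypothetical helper, not part of libcpp. In the `foo` example, this expansion would simply give `foo` one argument per byte of the file.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Render LEN bytes as comma-separated decimal integer tokens, e.g.
   "124,231,0", which is the fully expanded form of an #embed.  Returns a
   malloc'ed NUL-terminated string (caller frees), or NULL if allocation
   fails.  */
static char *
expand_embed_tokens (const unsigned char *data, size_t len)
{
  /* Each byte needs at most 3 digits plus a comma; +1 for the NUL.  */
  size_t cap = len * 4 + 1;
  char *out = malloc (cap);
  if (out == NULL)
    return NULL;
  size_t pos = 0;
  for (size_t i = 0; i < len; i++)
    {
      /* No leading comma before the first byte.  */
      int w = snprintf (out + pos, cap - pos, i ? ",%u" : "%u",
                        (unsigned) data[i]);
      assert (w > 0 && (size_t) w < cap - pos);
      pos += (size_t) w;
    }
  out[pos] = '\0';
  return out;
}
```

The sketch deliberately leaves out reading the file, handling the standard `limit`, `prefix`, `suffix` and `if_empty` parameters, and tracking source locations.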
[Bug c/105863] RFE: C23 #embed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

--- Comment #6 from joseph at codesourcery dot com ---
The latest version should be taken to be what's in the working draft N3096, plus the editorial fixes from CD2 comments GB-081 through GB-084.
[Bug c/105863] RFE: C23 #embed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

--- Comment #5 from Marek Polacek ---
Latest rev: https://open-std.org/JTC1/SC22/WG14/www/docs/n3017.htm
[Bug c/105863] RFE: C23 #embed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

--- Comment #4 from H. Peter Anvin ---
So I'm updating this to be C23 #embed, since that is a bit more general than the typical incbin (at least conceptually, it operates at the preprocessor syntactic level; it does not, of course, preclude a shortcut between the preprocessor and the compiler).

However, C23 #embed has a *huge* problem: it has exactly the same problem that necessitated augmenting #pragma with _Pragma(). Therefore, I believe that an equivalent construct (_Embed()) is needed for #embed as well. I have given this feedback to members of the C committee, but, not surprisingly, it was too late for C23. I hope it will be considered for C2y, and I believe it would be a highly desirable extension in the meantime.
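Some background on the _Pragma precedent: a preprocessing directive is never recognized in the result of macro expansion, so before C99 a macro had no way to emit a #pragma. C99 added the _Pragma operator for exactly that purpose. #embed is in the same position in C23, and _Embed() is only a proposal, so the C sketch below shows just the _Pragma half of the analogy. `DO_PRAGMA` follows the pattern given in the GCC preprocessor manual. `DECLARE_QUIETLY` is a made-up example macro.

```c
#include <assert.h>

/* _Pragma takes a string literal, so stringize the macro argument.  */
#define DO_PRAGMA(x) _Pragma (#x)

/* A macro that expands to a declaration wrapped in diagnostic pragmas.
   Writing "#pragma GCC diagnostic push" in the replacement list instead
   would not work: inside a function-like macro '#' is the stringizing
   operator (and must be followed by a parameter), and in any case a
   directive is never recognized in the result of macro expansion.  C23
   has no corresponding operator for #embed.  */
#define DECLARE_QUIETLY(type, name, init)                    \
  DO_PRAGMA (GCC diagnostic push)                            \
  DO_PRAGMA (GCC diagnostic ignored "-Wunused-variable")     \
  type name = (init);                                        \
  DO_PRAGMA (GCC diagnostic pop)
```

Inside a function, `DECLARE_QUIETLY (int, answer, 42)` declares `answer` with the three pragmas around it. A hypothetical `_Embed ("file.bin")` would play the same role for binary data, letting a macro produce what only a directive line can produce today.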