[Bug c/105863] RFE: C23 #embed

2024-06-11 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

Jakub Jelinek  changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
             Status|NEW                            |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #12 from Jakub Jelinek  ---
Working on this now.

[Bug c/105863] RFE: C23 #embed

2024-05-20 Thread jsm28 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

--- Comment #11 from Joseph S. Myers  ---
It makes the changes more complicated (everything that handles CONSTRUCTORs,
whether to output them to assembly or to extract values for optimization etc.,
needs to handle the new tree), but yes, having a new tree would allow more
efficient handling of additional cases that wouldn't be so efficient with
STRING_CST.

[Bug c/105863] RFE: C23 #embed

2024-05-17 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

--- Comment #10 from Jakub Jelinek  ---
I think we should add a new tree next to STRING_CST for use inside of
CONSTRUCTORs.
STRING_CST could in theory be used e.g. inside of a constructor_elt, with say
RANGE_EXPR for the index and STRING_CST for the element.  But the problem is
that the STRING_CST owns the whole string.  If somebody does:
const char arr[] = { 1, 2, 3, 42,
#embed "large_data.bin"
, 0, [10] = 15, [20] = 32 };
it would be nice if we could start with, say, a STRING_CST covering all 50MB
(or however much data there is).  But then the designated initializer overrides
mean that we either need to expand it all into a huge number of INTEGER_CSTs or,
if we had some tree that can extract a substring from a larger STRING_CST
(3 operands: the STRING_CST, the offset from its start, and the length), we
could just create two such small trees and build one INTEGER_CST in between.
With only STRING_CST, we'd need to create 2 new huge STRING_CSTs whenever we
split something into halves.
SUBSTRING_CST?
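
A minimal sketch of the splitting idea above, written in plain C rather than against GCC's tree API. The `segment` struct and the `make_slice`, `make_byte`, and `split_slice` helpers are hypothetical names used for illustration only; none of them exist in GCC. A slice records just an offset and a length into shared data, so overriding one element produces two small slices plus one single value, without copying the data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: an initializer is a list of segments.  A SEG_SLICE
   refers to a range of a shared blob (like the proposed substring tree);
   a SEG_BYTE holds one explicit value (like an INTEGER_CST).  */
enum seg_kind { SEG_SLICE, SEG_BYTE };

struct segment {
    enum seg_kind kind;
    size_t index;        /* first array index this segment initializes */
    size_t offset;       /* SEG_SLICE: start offset inside the blob */
    size_t length;       /* number of array elements covered */
    unsigned char value; /* SEG_BYTE: the explicit value */
};

static struct segment make_slice(size_t index, size_t offset, size_t length)
{
    struct segment s;
    s.kind = SEG_SLICE;
    s.index = index;
    s.offset = offset;
    s.length = length;
    s.value = 0;
    return s;
}

static struct segment make_byte(size_t index, unsigned char value)
{
    struct segment s;
    s.kind = SEG_BYTE;
    s.index = index;
    s.offset = 0;
    s.length = 1;
    s.value = value;
    return s;
}

/* Override array element `at` inside slice `s` with `value`.
   Precondition: s is a SEG_SLICE and s.index <= at < s.index + s.length.
   Writes 1 to 3 segments to `out` and returns how many were written.
   The blob itself is never copied; only offsets and lengths change.  */
static size_t split_slice(struct segment s, size_t at, unsigned char value,
                          struct segment out[3])
{
    size_t n = 0;
    size_t rel = at - s.index;  /* position of the override within the slice */

    if (rel > 0)
        out[n++] = make_slice(s.index, s.offset, rel);
    out[n++] = make_byte(at, value);
    if (rel + 1 < s.length)
        out[n++] = make_slice(at + 1, s.offset + rel + 1, s.length - rel - 1);
    return n;
}
```

For the initializer above, the embedded data would start at array index 4, so `[10] = 15` corresponds to `split_slice(make_slice(4, 0, data_len), 10, 15, out)`.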

For what to do with -E, if the amount of data is really small (dunno, 64 or 128
bytes or user parameter?), we should expand it in the preprocessed source as
integer tokens with commas in between, so
 124,231,0,15,24,86
but for larger amounts I'd go with what I've proposed in the LLVM pull request,
i.e. emit in the preprocessed source
#embed "." __gnu__::__base64__("TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVlciBhZGlwaXNjaW5nIGVsaXQuIE5hbSBzZWQgdGVsbHVzIGlkIG1hZ25hIGVsZW1lbnR1bSB0aW5jaWR1bnQuIEFsaXF1YW0gaWQgZG9sb3IuIFV0IHRlbXB1cyBwdXJ1cyBhdCBsb3JlbS4uLgo=")
or so, where the embed data would be base64 decoded from the string instead of
read from some other file.
The reason is that a new directive such as
#embed_base64
(or whatever has been proposed there) would be something to diagnose as
invalid with -pedantic-errors, while I think recognized implementation-defined
#embed parameters aren't strictly invalid (unrecognized ones are).
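
A minimal sketch of the decoding step such a parameter would imply: turning the base64 text back into the embedded bytes. This is a generic base64 decoder written for illustration, not GCC or LLVM code; `b64_value` and `b64_decode` are hypothetical names, and the code assumes an ASCII-compatible execution character set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Map one base64 character to its 6-bit value, or -1 if it is not part of
   the standard alphabet.  Assumes letters and digits are contiguous (ASCII). */
static int b64_value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decode '='-padded base64 `text` into `out` (capacity `cap` bytes).
   Returns the number of bytes written, or -1 on malformed input or if
   `out` is too small.  */
static long b64_decode(const char *text, unsigned char *out, size_t cap)
{
    size_t len = strlen(text);
    size_t used = 0;

    if (len % 4 != 0)
        return -1;
    for (size_t i = 0; i < len; i += 4) {
        unsigned long group = 0;  /* 24 bits collected from 4 characters */
        int pad = 0;

        for (int j = 0; j < 4; j++) {
            char c = text[i + j];
            int v = 0;
            if (c == '=') {
                /* Padding is only valid in the last two slots of the
                   final group.  */
                if (i + 4 != len || j < 2)
                    return -1;
                pad++;
            } else {
                if (pad)
                    return -1;  /* data after padding */
                v = b64_value(c);
                if (v < 0)
                    return -1;
            }
            group = (group << 6) | (unsigned long)v;
        }

        int nbytes = 3 - pad;
        if (used + (size_t)nbytes > cap)
            return -1;
        for (int k = 0; k < nbytes; k++)
            out[used++] = (unsigned char)(group >> (16 - 8 * k));
    }
    return (long)used;
}
```

For instance, the first eight characters of the string above with the final `g` replaced by padding, `"TG9yZW0="`, decode to the five bytes of `Lorem`.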

[Bug c/105863] RFE: C23 #embed

2024-05-16 Thread jsm28 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

--- Comment #9 from Joseph S. Myers  ---
The most straightforward and most important case to optimize is the one where
the #embed expansion lies entirely inside a single character array initializer
(possibly with some integer constants before or after it in the initializer -
whether coming from the prefix and suffix parameters to #embed or otherwise) -
in which case the initializer can be converted internally to a STRING_CST.
Cases that aren't within a character array initializer like that are harder to
optimize (might require additional internal representation beyond the front
end), and probably also less important to optimize initially.
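
A minimal sketch of the case described above: an #embed that sits entirely inside one character array initializer, with extra constants supplied through the `prefix` and `suffix` parameters. To stay self-contained it embeds the first four bytes of its own source file via `__FILE__`, which is an illustration trick rather than a typical use. The `HAVE_SELF_EMBED` macro and the hard-coded fallback bytes are assumptions added so the example still compiles where #embed is unavailable.

```c
#include <assert.h>
#include <stddef.h>

/* Use #embed only if the compiler provides __has_embed and can actually
   find this source file; otherwise fall back to hard-coded bytes.  */
#if defined(__has_embed)
#  if __has_embed(__FILE__ limit(4)) == __STDC_EMBED_FOUND__
#    define HAVE_SELF_EMBED 1
#  endif
#endif

/* One character array initializer: a prefix constant, four embedded bytes,
   and a suffix constant, six elements in total either way.  */
static const unsigned char blob[] = {
#ifdef HAVE_SELF_EMBED
#embed __FILE__ limit(4) prefix(0xAA, ) suffix(, 0x55)
#else
    /* Fallback with the same shape; the middle bytes are placeholders.  */
    0xAA, '#', 'i', 'n', 'c', 0x55
#endif
};
```

Because the whole expansion, including prefix and suffix, lands inside a single `unsigned char` array initializer, this is the shape that could be turned into one string-like constant internally.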

[Bug c/105863] RFE: C23 #embed

2024-05-15 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

--- Comment #8 from H. Peter Anvin  ---
Well, _Embed() would be an extension, and it doesn't seem unreasonable to say
that _Embed() would be expanded after token pasting. After all, as has been
discussed in the C committee, if #embed cannot be short-circuited, its value
is significantly reduced.

That being said, what you said makes sense.

Both the #pragma and #embed, as well as some other use cases, really call for
real procedural support in cpp. I have an idea for that that I would like to
present to the C committee; I don't think it is really in scope for this
request though.

[Bug c/105863] RFE: C23 #embed

2024-05-15 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #7 from Jakub Jelinek  ---
_Embed opens the Pandora's box of what should happen when you stringify it or
try to token paste it together with something else etc.

Anyway, for the GCC implementation of what C23 specifies, I wonder if we
shouldn't implement it in separate steps: first in a dumb way, always expanding
it into preprocessor tokens, a path that could perhaps then be kept for the
smaller sizes when it isn't worth doing something smart.
Only in the second step would we try to add optimizations to it. I guess for C
those could be easier than for C++, because C doesn't try to tokenize
everything first. So for C, when we peek at the large embed token outside of
the language contexts where we know how to handle it (most importantly inside
of aggregate initializers), we could simply replace the token with its expanded
form. Say, if one uses
void foo (...);
void bar ()
{
  foo (
#embed "foo_arguments"
  );
}
it would work without having to bother too much about that specific case.
The LLVM current pull request for this is
https://github.com/llvm/llvm-project/pull/68620
I think we should try to use the same options where reasonable.
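
A minimal sketch of the "dumb" first step described above: turning the bytes of a resource into a comma-separated list of integer tokens. This is plain C for illustration, not libcpp code; `expand_bytes` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write the bytes in `data` as decimal integers separated by commas, with
   no trailing comma, into `out` (capacity `cap`, including the NUL).
   Returns the number of characters written, or -1 if `out` is too small.  */
static int expand_bytes(const unsigned char *data, size_t len,
                        char *out, size_t cap)
{
    size_t used = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* Emit a separator before every element except the first.  */
        int n = snprintf(out + used, cap - used, "%s%u",
                         i ? "," : "", (unsigned)data[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

A six-byte resource holding 124, 231, 0, 15, 24, 86 would come out as `124,231,0,15,24,86`, which is the form that works in any context where a comma-separated list is valid, including the function call above.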

[Bug c/105863] RFE: C23 #embed

2023-06-22 Thread joseph at codesourcery dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

--- Comment #6 from joseph at codesourcery dot com  ---
The latest version should be taken to be what's in the working draft 
N3096, plus the editorial fixes from CD2 comments GB-081 through GB-084.

[Bug c/105863] RFE: C23 #embed

2023-06-22 Thread mpolacek at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

--- Comment #5 from Marek Polacek  ---
Latest rev: https://open-std.org/JTC1/SC22/WG14/www/docs/n3017.htm

[Bug c/105863] RFE: C23 #embed

2023-06-05 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

--- Comment #4 from H. Peter Anvin  ---
So I'm updating this to be C23 #embed, since that is a bit more general than
the typical incbin: at least conceptually it operates on the preprocessor
syntactic level, though that of course does not preclude a shortcut between the
preprocessor and the compiler.

However, C23 #embed has a *huge* problem: it has exactly the same problem that
required #pragma to be augmented with _Pragma(), namely that a directive cannot
be produced by macro expansion. Therefore, I believe that an equivalent
construct (_Embed()) is needed for #embed as well.
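
A minimal sketch of what _Pragma() makes possible and #pragma does not: emitting a pragma from inside a macro body. It relies on `pack`, a widely supported but implementation-defined pragma (accepted by GCC and Clang, among others); `PACKED_STRUCT` and `wire_header` are hypothetical names. No `_Embed()` appears in the code because it is only a proposal in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* A #pragma line cannot appear inside a macro definition, but the _Pragma
   operator can, so one macro can wrap a declaration in push/pop pragmas.  */
#define PACKED_STRUCT(name, members) \
    _Pragma("pack(push, 1)")         \
    struct name { members };         \
    _Pragma("pack(pop)")

/* Expands to a struct declared with 1-byte packing, so no padding is
   inserted between `tag` and `length`.  */
PACKED_STRUCT(wire_header, unsigned char tag; unsigned int length;)
```

The analogous use for an operator form of #embed would be a macro that expands to embedded data, which the directive form cannot express.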

I have given this feedback to members of the C committee but, not
surprisingly, it was too late for C23; I hope it will be considered for C2y,
and I believe it would be a highly desirable extension in the meantime.