[Bug c/116302] transparent union does not work with {integral type, bitfield struct}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116302 --- Comment #5 from H. Peter Anvin --- That's really too bad. It would be a very nice feature to have to migrate a code base from shift and mask to using bitfields.
[Bug c/116302] transparent union does not work with {integral type, bitfield struct}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116302 --- Comment #2 from H. Peter Anvin --- Created attachment 58881 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58881&action=edit Error output Generated with: gcc -std=gnu17 -ggdb3 -O2 -Wattributes -Wall -Wextra -c transp.c
[Bug c/116302] transparent union does not work with {integral type, bitfield struct}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116302 --- Comment #1 from H. Peter Anvin --- Created attachment 58880 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58880&action=edit Reproducer (preprocessed)
[Bug c/116302] New: transparent union does not work with {integral type, bitfield struct}
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116302

Bug ID: 116302
Summary: transparent union does not work with {integral type, bitfield struct}
Product: gcc
Version: 14.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c
Assignee: unassigned at gcc dot gnu.org
Reporter: hpa at zytor dot com
Target Milestone: ---

Created attachment 58879 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58879&action=edit
Reproducer (C source)

Bitfields are not portable, and some compilers generate truly awful code when using them. Furthermore, it is fairly common to want to be able to do bit operations that cross bitfields. However, because bitfields are not portable in general, using a union type for this is not safe.

As such, for both debugging and portability, it is desirable to be able to treat an integer type of some kind as a union with a bitfield on gcc, while letting inferior compilers define the type as a simple integral type.

Unfortunately, this does not appear to work, even when the "union cannot be made transparent" error does not appear:

transp.c:30:22: warning: missing braces around initializer [-Wmissing-braces]
   30 | opflags_t hello[2] = {
      |                      ^
transp.c: In function ‘antiop’:
transp.c:37:12: error: wrong type argument to bit-complement
   37 |     return ~op;
      |            ^
transp.c:38:1: warning: control reaches end of non-void function [-Wreturn-type]
   38 | }
      | ^

Reproducer included. Tested on gcc 14.2.1, Fedora 40, x86-64:

gcc version 14.2.1 20240801 (Red Hat 14.2.1-1) (GCC)
gcc-14.2.1-1.fc40.x86_64
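For readers without the attachment, the following is a minimal sketch of the situation, not the attached reproducer. The field names and widths in `struct opflags_bits` and the member names `raw` and `bits` are hypothetical. It shows the explicit workaround that compiles today: because `op` has union type, `~op` is rejected, but the same operation on the integral member is fine.

```c
#include <stdint.h>

/* Hypothetical bitfield layout, for illustration only. The exact bit
 * positions are implementation-defined, which is the portability
 * problem described in the report. */
struct opflags_bits {
    unsigned int size : 8;
    unsigned int kind : 8;
    unsigned int misc : 16;
};

typedef union {
    uint32_t raw;               /* integral view: operations across fields */
    struct opflags_bits bits;   /* bitfield view: readable in a debugger */
} opflags_t;

/* "return ~op;" fails with "wrong type argument to bit-complement"
 * because op is a union; going through the integral member works. */
static opflags_t antiop(opflags_t op)
{
    opflags_t r;
    r.raw = ~op.raw;
    return r;
}
```

The cost of this form is that every use site has to spell out `.raw`, which is exactly what the request hopes to avoid when migrating shift-and-mask code.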
[Bug target/103503] RFE: no save registers attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103503 --- Comment #7 from H. Peter Anvin --- Note: this is now implemented for x86, but it affects other targets as well.
[Bug c/105863] RFE: C23 #embed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863 --- Comment #8 from H. Peter Anvin --- Well, _Embed() would be an extension, and it doesn't seem unreasonable to say that _Embed() would be expanded after token pasting. After all, as has been discussed in the C committee, if #embed cannot be short-circuited its value is significantly reduced. That being said, what you said makes sense. Both #pragma and #embed, as well as some other use cases, really call for real procedural support in cpp. I have an idea for that which I would like to present to the C committee; I don't think it is really in scope for this request, though.
[Bug target/113686] [RISC-V] TLS (Local Exec) relaxation on structures (LE)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113686 --- Comment #2 from H. Peter Anvin --- The intermediate alignment for lui is known, so if an object is known to fit *entirely* within its natural alignment then it can be safely CSE'd, but this is typically not the case with structures or arrays. It would be nice to fix this in the architecture (as a new standard extension). Unfortunately, due to the decision to allocate 3/4 of the instruction encoding space to 16-bit instructions, only 3 completely reserved 5-bit opcodes remain - and that is only what is currently in the ISA document. Adding "auipc1" and "lui1" would consume two of those (they would most naturally fall at op = 1X10011). There are some less desirable options, like reclaiming part of the encoding space for longer instructions (requiring another 1 bit in the prefix for long instructions would provide the additional two encodings); on RV32 *only*, the RV64 encoding space op = 0X11011 could be used, but I doubt it would be much appreciated to have this capability on RV32 and not RV64. (Not to mention hacks like only having part of the register space accessible, which wouldn't necessarily be horrendous, though, as these instructions would belong in some pretty fixed patterns.) One could also say that auipc1 would be "good enough" (combined with PC-relative relocations for TLS, at least for the TE model) and would still be valuable enough to occupy a full opcode. (Obviously, the idea would be that these instructions really are pseudo-instructions carrying one more immediate bit, and that the linker would apply those bits using "HI21" relocations.) I think this is something where the ISA architects could really use input from you compiler developers; since I'm not really familiar with where the pain points in the compiler lie, it is hard for me to speak authoritatively.
[Bug target/113686] New: [RISC-V] TLS (Local Exec) relaxation on structures (LE)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113686

Bug ID: 113686
Summary: [RISC-V] TLS (Local Exec) relaxation on structures (LE)
Product: gcc
Version: 13.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: hpa at zytor dot com
Target Milestone: ---

When the Local Exec TLS model is in use, gcc generates inefficient code for accessing the member of a structure:

    struct foobar {
        int alpha;
        int beta;
    };
    _Thread_local struct foobar foo;

    void func(int bar)
    {
        foo.beta = bar;
    }

    # Version 1
        lui   a1,%tprel_hi(foo)
        add   a1,a1,tp,%tprel_add(foo)
        addi  a1,a1,%tprel_lo(foo)
        sw    a0,4(a1)

However, in this case it could be generated as:

    # Version 2
        lui   a1,%tprel_hi(sym+4)
        add   a1,a1,tp,%tprel_add(sym+4)
        sw    a0,%tprel_lo(sym+4)(a1)

... which, if %tprel_hi(sym+4) == 0, as it often is for small embedded software, the linker can relax to a simple (tp) reference:

    # Version 2a (post-relaxation with small .tbss)
        sw    a0,%tprel_lo(sym+4)(tp)

The linker will *not* relax version 1 all the way, leaving an unnecessary mv:

    # Version 1a (post-relaxation with small .tbss)
        mv    a1,tp
        sw    a0,%tprel_lo(sym+4)(tp)

It is of course trickier for the case of multiple subsequent references to the structure if the structure is not aligned, as gcc can't know a priori where the 4K breaks are[*]. The version 1 code is more efficient in that case (3 instructions + 1 instruction/field as opposed to 3 instructions/field.) However, if the structure *is* aligned, gcc will still not optimize 1 into 2.

There are at least a few options I see:

1. gcc option: gcc can generate version 2 code for a single field reference, or if the alignment is such that all fields are guaranteed to fall inside the same 4K window.

2. gcc and optional ABI option: introduce a "TLS TE-tiny" model for deep embedded use, where the combined size of the TLS area is limited to 4K, equivalent to the way direct gp references [or zero, if the global pointer is 0] work. Thus, direct (tp) references can be used.

   NOTE: With the current binutils, this will error unless .option norelax is in effect. It might be desirable to instead have a new relocation type, which would require binutils support. Alternatively, ld should recognize that the TLS offset is within +/- 2K and suppress the warning in that case (since at that point the address is available to the linker.) The linker could be further optimized by allowing the TLS to offset; presumably equivalently to the __global_pointer$ symbol.

3. binutils option: teach ld to relax these kinds of chained pointer references.

[*] Rant: in my opinion, the lui/auipc instructions are fundamentally misdesigned by not having an overlap bit to guarantee a sizable window.
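The "4K breaks" remark can be made concrete with a little arithmetic. The sketch below assumes the usual RISC-V hi/lo split (a signed 12-bit low part, with the upper part rounded to compensate), which is also what the %tprel_hi/%tprel_lo pair computes. The helper names `lo12`, `hi20` and `same_window` are hypothetical and are not part of gcc or binutils.

```c
#include <stdint.h>

/* Signed 12-bit low part, as placed in a load/store/addi immediate. */
static int32_t lo12(int32_t off)
{
    return ((off & 0xfff) ^ 0x800) - 0x800;
}

/* Upper part, as placed in lui; off == hi20(off) * 4096 + lo12(off). */
static int32_t hi20(int32_t off)
{
    return (off - lo12(off)) / 4096;
}

/* True if a field at base+field can reuse the lui computed for base,
 * i.e. both addresses fall in the same 4K window. */
static int same_window(int32_t base, int32_t field)
{
    return hi20(base) == hi20(base + field);
}
```

With a structure at TLS offset 0x7fc, the member at +4 lands at 0x800, whose high part is 1 rather than 0, so a single lui cannot serve both fields. Since the compiler does not know the final offset, it cannot rule this out unless the alignment guarantees it.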
[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312 --- Comment #21 from H. Peter Anvin --- I think this could be a really useful performance improvement in general. The Linux exception and syscall paths have a fair number of tail calls on the primary path, and this would make it possible to avoid the register save and restores for each of the functions in the tail called path. I have asked Xin Li (FRED maintainer) to try this out when he has the opportunity, although right now the Linux kernel merge window is open and so that is necessarily his first priority.
[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312 --- Comment #19 from H. Peter Anvin --- I'm away for the long weekend, but I'll try it out on Tuesday.
[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312 --- Comment #15 from H. Peter Anvin --- That should be fine for this use case, obviously. I should add the following: the reason the assembly stub isn't a problem for FRED whereas it is a bit of a nuisance for IDT-style delivery is that with FRED, vector dispatch is done in software, not hardware. This is exactly because *most* operating systems do need some amount of common entry/exit code anyway, and having to duplicate it is a severe nuisance. In the specific case of Linux, the full register set, including saved registers, is a required part of the exception frame in order for things like ptrace() and fork() to work correctly, so relying on the compiler to save the "saved" registers doesn't work for us anyway. Since only *one* instance of the assembly stub is needed, there is no need for a whole separate stub for every handler.
[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312 --- Comment #13 from H. Peter Anvin --- No, it will not. Most OS flows will want to merge the kernel and user flows at some point for some handlers, so it isn't clear that that makes sense.
[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312 --- Comment #10 from H. Peter Anvin --- Right, is there such an attribute (that's what I'm asking for in bug 103503)? All I see in the gcc documentation is no_calle*R*_saved_registers, which, again, is the exact opposite.
[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312 --- Comment #6 from H. Peter Anvin --- Of course. That's not what we want in the Linux kernel specifically, though. It's really up to the OS.
[Bug target/113321] x86-64: Make __attribute__((interrupt)) more robust
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113321 --- Comment #2 from H. Peter Anvin --- Right. The only thing I'm suggesting is that for the cost of one extra instruction we can make it robust against the programmer picking the wrong type, or wanting to use the same handler. It isn't a necessary thing by any means.
[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312 --- Comment #4 from H. Peter Anvin ---
(In reply to H.J. Lu from comment #2)
> (In reply to H. Peter Anvin from comment #1)
> > This is actually a specific use case of the feature requested in bug 103503.
>
> This covers #1. Should FRED handler take a pointer to the event info as the
> function argument?

This can be passed in from the assembly stub (normally in %rdi).
[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312 --- Comment #3 from H. Peter Anvin --- Created attachment 57032 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57032&action=edit FRED assembly entry stub (example, slightly modified from the Linux kernel)
[Bug target/113321] New: x86-64: Make __attribute__((interrupt)) more robust
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113321 Bug ID: 113321 Summary: x86-64: Make __attribute__((interrupt)) more robust Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hpa at zytor dot com Target Milestone: --- __attribute__((interrupt)) on x86 has two prototypes, and picking the wrong type "probably will cause a system crash." It turns out that this is unavoidable on i386, but on x86-64 we can do better: - On x86-64, an exception/interrupt carries an error code if and only if the stack is 16-byte aligned (specifically, RSP[3] = 0) on exception entry. The proper stack pointer for use with IRET is therefore always given by: RSP |= 8 ... and the error code, if present, will be located at offset -8 from this address.
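The rule above is plain address arithmetic, sketched here in C on integer values rather than on a live stack. The function names are hypothetical; only the RSP[3] test, the `RSP |= 8` step and the -8 offset come from the report.

```c
#include <stdbool.h>
#include <stdint.h>

/* RSP[3] == 0 on entry (16-byte aligned) means an error code was pushed. */
static bool has_error_code(uint64_t entry_rsp)
{
    return (entry_rsp & 8) == 0;
}

/* Stack pointer to use with IRET, with or without an error code. */
static uint64_t iret_frame(uint64_t entry_rsp)
{
    return entry_rsp | 8;
}

/* Address of the error code; only meaningful if has_error_code(). */
static uint64_t error_code_slot(uint64_t entry_rsp)
{
    return iret_frame(entry_rsp) - 8;
}
```

For an aligned entry RSP such as 0x7000 the error code sits at 0x7000 and the IRET frame starts at 0x7008. For an entry RSP of 0x7008 there is no error code and the frame already starts there, so the OR is the "one extra instruction" that covers both cases.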
[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312 --- Comment #1 from H. Peter Anvin --- This is actually a specific use case of the feature requested in bug 103503.
[Bug c++/113298] RFE: allow suppressing warnings for void * conversions with -fpermissive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113298 --- Comment #2 from H. Peter Anvin --- You're not wrong per se. Arguably the problem (and many others) would be better solved by allowing user-specified conversions that are not member functions. In that case one could do:

    // Set the properties/types for which
    // we allow pointer conversions from
    // void *
    template<typename T>
    concept void_pointer_convertible = ...

    template<typename V>
    concept some_void_type = std::is_void_v<V>;

    template<void_pointer_convertible T, some_void_type V>
    operator T * (V *ptr) { return static_cast<T *>(ptr); }

... and now the programmer has full control over exactly what they wish to permit.
[Bug c++/113298] New: RFE: allow suppressing warnings for void * conversions with -fpermissive
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113298 Bug ID: 113298 Summary: RFE: allow suppressing warnings for void * conversions with -fpermissive Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hpa at zytor dot com Target Milestone: --- -fpermissive downgrades some errors to warnings, but there doesn't seem to be any -W options to suppress those warnings, and -fpermissive itself is a fairly wide switch. Having individual -W options for the various -fpermissive events would allow the programmer to pick what extensions to allow unconditionally, which to warn for, and which to error out for (-Werror=). Allowing void * to be converted to a pointer to a POD structure is particularly useful in the process of moving a project from C to C++.
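As a reminder of what is being migrated, here is a minimal C sketch of the implicit conversion in question. The type `struct point` and the function `make_point` are hypothetical. The assignment from `malloc()` is valid C as written; compiled as C++ it is rejected unless a cast is added or -fpermissive is used.

```c
#include <stdlib.h>

struct point {   /* hypothetical POD structure */
    int x, y;
};

static struct point *make_point(int x, int y)
{
    /* malloc() returns void *, which converts implicitly in C.
     * C++ requires static_cast<struct point *>(...) here. */
    struct point *p = malloc(sizeof *p);
    if (p) {
        p->x = x;
        p->y = y;
    }
    return p;
}
```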
[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020 --- Comment #5 from H. Peter Anvin --- I don't think source code modifications are a huge problem, but at this point they require tracking down each individual bit.

As far as trapping implementations are concerned:

1. In deeply embedded implementations, it is entirely possible that firmware/microcode might be *more* expensive than logic. Although memory arrays are, of course, very dense, they are still extremely general and RISC-V isn't a very sparse instruction set.

2. It seems like it almost would require an implementation-specific performance model.

Now, one can validly argue that by setting the cost of unimplemented instructions to a (near-)infinite value such instructions should never be generated even if they are "enabled". That might also be a possible avenue for achieving this.

As far as an explosion of subsets, yes, this is really what this means. Bloating a tiny on-chip control processor both in area and timing to implement instructions that never actually appear in the code is at best painful.

That being said, I do intend to submit a proposal to the RISC-V ISA folks to subset the Zbb subset. It is worth noting that there are overlaps between the Zb* and Zbk* subsets, but the individual intersection sets do not have their own names.

The Zbb instruction set is particularly noxious (and this is indeed an ISA definition problem), because it implements multiple things that are, from an implementation point of view, completely separate and require separate code paths in the ALU:

§ 1.2.1 Logical with negate
- minimal cost; in fact in some implementations it might have zero or even negative cost due to decoder simplification.
- Extremely common in embedded operations.

§ 1.2.2 Count leading/trailing zero bits
- Requires dedicated logic.
- ctz and clz have very different uses.
- Typically clz and ctz will not be able to share logic, either, requiring *two* dedicated units.

§ 1.2.3 Count population
- Requires dedicated logic.
- May be useless depending on what the processor needs.

§ 1.2.4 Integer minimum/maximum
- May be cheap or expensive, depending on if an existing comparator can be leveraged.
- Quite possibly free or almost free if the AMO instruction set is already supported in its entirety, as that requires max/min already.

§ 1.2.5 Sign- and zero-extension

§ 1.2.6 Bitwise rotation
- May be very cheap or quite expensive, depending on the implementation of the shift instructions.

§ 1.2.7 OR combine
- Requires dedicated logic.
- Virtually useless in control processors that do not process text.

§ 1.2.8 Byte-reverse
- Requires dedicated logic.
- These, and some other instructions, are special cases of a bit swap extension proposed in the original bitmanip proposal, which was not included even as a separate set.
- Virtually useless in control processors that do not need to interface with cross-endian data.

These 8 groups really ought to be given separate names.

Is this going to happen again? Quite likely. It seems unlikely, as you say, that the public ISA would be chopped to pieces to support every single use case. It really comes down to: out of multiple suboptimal cases (forced hardware bloat, custom subsets, extremely fine grained public subsets, vendor-hacked trees that lag behind and/or diverge from upstream), which option has the least amount of badness?
[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020 --- Comment #2 from H. Peter Anvin --- Named subsets are, inherently, designed to make sense toward mass-produced products where the hardware and software are designed (mostly) independently. However, what I mean with "very deep embedded use" is hardware and software being co-designed. The RISC-V ISA policy is that those are considered vendor-specific subsets and are to be given an X* name; however, gcc obviously needs to be able to understand the meaning of this X* name. At this point there is no way to do so without changing the source code in nontrivial ways. Regardless of whether it is done in source code or at runtime, implementing a fine-grained, preferably table-driven, approach to subsets in gcc would make it very simple for a hardware implementor to define their custom X-subsets without a lot of surgery to the code, *and* would make it possible to take it one step further and allow custom (or newly defined! - there have been multiple instances already of new subsets of existing instructions defined a posteriori) instruction subsets to be defined in a configuration file.
[Bug c/96952] __builtin_thread_pointer support cannot be probed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96952 H. Peter Anvin changed: What|Removed |Added CC||hpa at zytor dot com --- Comment #10 from H. Peter Anvin --- Is this bug still relevant? RISC-V doesn't even seem to support disabling tls support, and __builtin_thread_pointer() appears to be properly supported. So it would presumably be up to any remaining target that doesn't have __builtin_thread_pointer() (or not in all configurations) to verify that __has_builtin(__builtin_thread_pointer) evaluates to false?
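The probe under discussion would look roughly like the sketch below. The macro `HAVE_THREAD_POINTER` and the wrapper `thread_pointer()` are hypothetical names. It assumes that `__has_builtin` reports the builtin accurately for the target, which is precisely what the bug is about.

```c
#include <stddef.h>

/* Compilers without __has_builtin: treat every builtin as absent. */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

#if __has_builtin(__builtin_thread_pointer)
#define HAVE_THREAD_POINTER 1
static void *thread_pointer(void)
{
    return __builtin_thread_pointer();
}
#else
#define HAVE_THREAD_POINTER 0
/* Fallback stub; a real project would use target-specific code here. */
static void *thread_pointer(void)
{
    return NULL;
}
#endif
```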
[Bug target/111020] New: RFE: RISC-V: ability to cherry-pick additional instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020 Bug ID: 111020 Summary: RFE: RISC-V: ability to cherry-pick additional instructions Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hpa at zytor dot com Target Milestone: --- For very deeply embedded use, it is sometimes highly desirable to control the instruction set on a very fine grained basis. For example, the Zbb extension contains a mixture of things that most likely requires separate functional units. However, as an example, the ctz instruction is highly useful to speed up interrupt latency in designs that do not have vectorized interrupt handling (which is, in its most basic form, a dedicated ctz unit.) It would be massive hardware bloat to require the full Zbb set to add this one instruction. Once the instruction is added, though, one would like to be able to use it as fully as possible. This, obviously, creates binaries that are specifically tuned toward a single processor implementation, but that is pretty much the essence of deeply embedded, where in the normal case the entire software stack from the OS to application is linked together in a single binary, or at the very least compiled together, often from a single source tree. As far as object code compatibility is concerned, this is very much a "programmer beware" situation. There is no need for heroics in terms of tagging objects with the exact instruction set, for example.
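The interrupt-dispatch use of ctz mentioned above can be sketched as follows. The function name `next_irq` and the convention that bit 0 is the highest-priority source are assumptions for illustration. `__builtin_ctz` is the existing GCC builtin; whether it becomes a single ctz instruction or a software sequence depends on which instructions the compiler is told are available, which is the point of the request.

```c
#include <stdint.h>

/* Returns the lowest-numbered pending interrupt source, or -1 if the
 * pending mask is empty. */
static int next_irq(uint32_t pending)
{
    if (pending == 0)
        return -1;                  /* __builtin_ctz(0) is undefined */
    return __builtin_ctz(pending);  /* index of the lowest set bit */
}
```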
[Bug c++/106486] C++ warning for -Wmissing-prototypes is pure nuisance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106486 --- Comment #5 from H. Peter Anvin --- Yes, exactly.
[Bug c/105863] RFE: C23 #embed
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863 --- Comment #4 from H. Peter Anvin --- So I'm updating this to be C23 #embed, since that is a bit more general than the typical incbin (at least conceptually it operates on the preprocessor syntactic level; it does not of course preclude a shortcut between the preprocessor and the compiler.) However, C23 #embed has a *huge* problem; specifically it has exactly the same problem that necessitated #pragma to be augmented with _Pragma(). Therefore, I believe that an equivalent construct (_Embed()) is needed for #embed as well. I have given this feedback to members of the C committee, but it was not surprisingly too late for C23; I hope it will be considered for C2y and I believe it would be a highly desirable extension in the meantime.
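The #pragma limitation referred to here is that a directive cannot be produced by a macro expansion, whereas the _Pragma() operator can. A minimal sketch of the well-known idiom follows; the macro names `DO_PRAGMA` and `QUIET_UNUSED` and the function `demo` are hypothetical. The proposed _Embed() would play the same role for #embed and is not shown, since it does not exist.

```c
/* Stringify the argument and hand it to the _Pragma operator. */
#define DO_PRAGMA(x) _Pragma(#x)

/* Wrap one declaration in push/ignore/pop. This cannot be written
 * with #pragma, because directives are not allowed in a macro body. */
#define QUIET_UNUSED(decl)                                \
    DO_PRAGMA(GCC diagnostic push)                        \
    DO_PRAGMA(GCC diagnostic ignored "-Wunused-variable") \
    decl;                                                 \
    DO_PRAGMA(GCC diagnostic pop)

static int demo(void)
{
    QUIET_UNUSED(int scratch = 0)   /* no -Wunused-variable warning here */
    return 0;
}
```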
[Bug c/96054] RFE: __attribute__((fatal))
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96054 --- Comment #2 from H. Peter Anvin --- I agree, my naming was very poor. Perhaps "panic" or "abort" would work; those are classic names in software use for this. Another case of a function that could be so attributed would be the function typically called __assert_failed(). It is probably worth noting that all the ones I can think of should be noreturn functions. I don't know if that is truly inherent, but personally I cannot think of a case where it would not be.
[Bug middle-end/56314] Please allow per-function specification of register conventions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56314 --- Comment #6 from H. Peter Anvin --- Unfortunately that's not really possible given the way the kernel does runtime patching (which isn't going to change, sorry.) At the very least we would need a *lot* more compiler support to give LTO all the information that it needs; as a *very* minimum, LTO would need to support ORC metadata.
[Bug c/59850] Support sparse-style pointer address spaces (type attributes)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59850 --- Comment #37 from H. Peter Anvin --- One would assume that there would be __foo__ aliases for the attribute names like all the other ones.
[Bug tree-optimization/107006] Missing optimization: common idiom for external data
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006 --- Comment #11 from H. Peter Anvin --- If you look at the output, you see that the loops are already fully unrolled (at considerable code size cost.) Unfortunately, since the issue at hand is dealing with code written to be portable, adding gcc-specific hacks is not really a reasonable option.
[Bug rtl-optimization/107006] Missing optimization: common idiom for external data
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006 --- Comment #9 from H. Peter Anvin --- To clarify: the C test case produces the same output regardless if it is compiled as C or C++. Only the C++ wrapped class definition detects the additional case of a 32-bit bigendian load.
[Bug rtl-optimization/107006] Missing optimization: common idiom for external data
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006 --- Comment #8 from H. Peter Anvin --- Created attachment 53610 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53610&action=edit C++ test case object code
[Bug rtl-optimization/107006] Missing optimization: common idiom for external data
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006 --- Comment #7 from H. Peter Anvin --- Created attachment 53609 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53609&action=edit C++ test case assembly output
[Bug rtl-optimization/107006] Missing optimization: common idiom for external data
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006 --- Comment #6 from H. Peter Anvin --- Created attachment 53608 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53608&action=edit C++ test case preprocessed source
[Bug rtl-optimization/107006] Missing optimization: common idiom for external data
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006 --- Comment #5 from H. Peter Anvin --- Created attachment 53607 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53607&action=edit C++ test case main file
[Bug rtl-optimization/107006] Missing optimization: common idiom for external data
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006 --- Comment #4 from H. Peter Anvin --- Created attachment 53606 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53606&action=edit C++ test case class definition header file
[Bug rtl-optimization/107006] Missing optimization: common idiom for external data
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006 --- Comment #3 from H. Peter Anvin --- Created attachment 53605 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53605&action=edit C test case object code
[Bug rtl-optimization/107006] Missing optimization: common idiom for external data
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006 --- Comment #2 from H. Peter Anvin --- Created attachment 53604 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53604&action=edit C test case assembly output
[Bug rtl-optimization/107006] Missing optimization: common idiom for external data
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006 --- Comment #1 from H. Peter Anvin --- Created attachment 53603 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53603&action=edit C test case preprocessed source
[Bug rtl-optimization/107006] New: Missing optimization: common idiom for external data
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006 Bug ID: 107006 Summary: Missing optimization: common idiom for external data Product: gcc Version: 12.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: hpa at zytor dot com Target Milestone: --- Created attachment 53602 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53602&action=edit C test case source The only *portable* way in C to deal with external data structures containing data of specific endianness, possibly unaligned, is to operate on them as byte (char) arrays. At least on x86 (which supports arbitrarily aligned loads), gcc *sometimes* recognizes these as single loads, but sometimes not. In the included test cases, there is a plain C implementation and an implementation wrapped in a C++ class. Compiling the former with: gcc -std=c2x -g -O3 -W -Wall -[cSE] -o bswap.[osi] bswap.c ... recognizes the load idiom for 16-bit numbers but not for 32- or 64-bit numbers. Compiling the latter with: gcc -std=c++20 -g -O3 -E -Wall -[cSE] -o bswapcc.[osi] bswapcc.cc ... *additionally* recognizes the 32-bit load, *but only in the bigendian case* (that is, it generates a load and a bswap instruction); whereas in the littleendian -- native -- case, this does not happen! I am familiar with the use of packed arrays and __builtin_bswap*() for these accesses, but unfortunately these are gcc-specific.
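For reference, the portable byte-array idiom in question looks roughly like this. It is a sketch and not the attached test case (which, judging by the later comments, is written with loops); the function names `load_le32` and `load_be32` are hypothetical.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value from a possibly unaligned buffer. */
static uint32_t load_le32(const unsigned char *p)
{
    return  (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Read a 32-bit big-endian value from a possibly unaligned buffer. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

On x86 the hoped-for result is a single load for the first function and a load plus bswap for the second, independent of how the source spells the byte accesses.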
[Bug c++/106486] New: C++ warning for -Wmissing-prototypes is pure nuisance
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106486 Bug ID: 106486 Summary: C++ warning for -Wmissing-prototypes is pure nuisance Product: gcc Version: 12.1.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: hpa at zytor dot com Target Milestone: --- Since upgrading to gcc 12.1.1, I keep getting the following warning through various projects: cc1plus: warning: command-line option ‘-Wmissing-prototypes’ is valid for C/ObjC but not for C++ This warning is pure nuisance. In a mixed-language project it is *extra* important that the right prototypes are used, and it is far easier to enable -Wmissing-prototypes project wide. This warning implies that one would have to conditionalize the -W options based on the language of an input file, which is often painful to do without structural Makefile changes. Note that there doesn't seem to be any way to squelch this warning, either (e.g. a -Wno-warning-not-applicable option or similar.) cc1plus: warning: command-line option ‘-Wmissing-prototypes’ is valid for C/ObjC but not for C++
[Bug target/103503] RFE: no save registers attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103503 --- Comment #4 from H. Peter Anvin --- The interrupt attribute typically does two things: 1. It changes the return instruction; 2. It marks all registers as saved. 2 is exactly the *opposite* of what I want; I would like to improve performance by the fact that the compiler-invisible entry flow has already saved all registers, whether or not they are saved in the ABI. Thus, I would like it to treat all (non-fixed) registers as *clobbered*, not *saved*. Ideally, "nosaved" and "interrupt" should be possible to use together, to get effect #1 of the interrupt attribute, but that is usually less important.
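As a later comment on this bug notes, this has since been implemented for x86; in GCC 14 the attribute is spelled `no_callee_saved_registers`. The sketch below is guarded with `__has_attribute` so it still compiles where the attribute is missing. The macro `NOSAVE`, the counter `tick_count` and the handler `timer_tick` are hypothetical, and the assembly entry stub that would save the full register set is assumed rather than shown.

```c
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

#if __has_attribute(no_callee_saved_registers)
/* The function may clobber every register without saving it. */
#define NOSAVE __attribute__((no_callee_saved_registers))
#else
#define NOSAVE   /* attribute unavailable: ordinary calling convention */
#endif

static volatile unsigned long tick_count;

NOSAVE void timer_tick(void);

/* Hypothetical handler body, intended to be called from an entry stub
 * that has already saved all registers. */
NOSAVE void timer_tick(void)
{
    tick_count++;
}
```

Where the attribute is honoured, a C caller that sees this declaration treats every register as clobbered across the call, so the function also remains callable from ordinary code.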
[Bug c/105863] New: RFE: __attribute__((incbin("file"))) or __builtin_incbin("file")
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863 Bug ID: 105863 Summary: RFE: __attribute__((incbin("file"))) or __builtin_incbin("file") Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: hpa at zytor dot com Target Milestone: --- It is a *very* common operation to want to include a preexisting binary object into a compiled project. There are a number of ways to do this, but they all suffer significant shortcomings. The most common ways are to use a preprocessor to convert the input binary object to textual source, or to wrap an object in assembly code and use .incbin. The former is *extremely* inefficient, the latter has a number of pitfalls, including the one described in bug 66871, the need for platform-specific coding, sizeof() not being functional, etc. I would like to suggest a variable attribute __attribute__((incbin("file"))) which statically initializes a variable to the contents of the given binary file, or a __builtin_incbin("file") which would expand to an initializer; the end result would look either like: char foobar[] __attribute__((incbin("foobar.bin"))); char foobar[] = __builtin_incbin("foobar.bin");
[Bug middle-end/85751] RFE: option to align code using breakpoint instructions when unreachable
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85751 --- Comment #2 from H. Peter Anvin --- Goodness... I missed the question here. The intent was to just take advantage of existing padding: the execution flow should not go there.
[Bug target/103503] New: RFE: no save registers attribute
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103503 Bug ID: 103503 Summary: RFE: no save registers attribute Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: target Assignee: unassigned at gcc dot gnu.org Reporter: hpa at zytor dot com Target Milestone: --- Target: multiple When a common assembly interrupt entry code or an equivalent hardware engine is used to handle register saves in an interrupt routine, it may be completely unnecessary to save and restore registers in the interrupt handler itself, even if they would normally be clobbered. Unfortunately [-f]call-used-reg is not supported by either __attribute__((optimize)) or _Pragma("GCC optimize"); otherwise that would be a very valid solution. Putting all interrupt handlers in a separate compilation unit is awkward at the very best. AVR has __attribute__((OS_{task,main})) for this purpose, but being able to do it in general would improve interrupt/trap latency. See also bug 38534. Note: not saving the registers in the assembly wrapper is generally not an option; similarly, whether or not a hardware engine can do it practically depends on both the hardware implementation and the ABI. The RISC-V ABI, for example, scatters clobbered and saved registers all over the register map, which makes doing such a thing in hardware difficult.
[Bug c/102266] New: RFE: x86: print operand with optional (%rip) suffix
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102266 Bug ID: 102266 Summary: RFE: x86: print operand with optional (%rip) suffix Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: hpa at zytor dot com Target Milestone: --- Target: x86 In the Linux kernel we reasonably frequently use extended asm operand modifiers like %P[]/%p[] for encoding a memory operand that *must not* use register-indirect forms. On x86-64, they can sometimes be encoded as %rip-relative; however, there is currently no convenient way of doing so without also making the assembly code x86-64 specific, whereas it otherwise would be perfectly fine dual mode. I would therefore like to request one of the following, in order of preference: 1. A modifier to emit a memory immediate operand (i.e. a constant sans $) with a (%rip) suffix assuming it can be so encoded. 2. A simple macro (like %=) that emits (%rip) on x86-64 but nothing on i386. The priority of this is quite low, but it is probably simple to implement.