[Bug c/116302] transparent union does not work with {integral type, bitfield struct}

2024-08-19 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116302

--- Comment #5 from H. Peter Anvin  ---
That's really too bad. It would be a very nice feature to have to migrate a
code base from shift and mask to using bitfields.

[Bug c/116302] transparent union does not work with {integral type, bitfield struct}

2024-08-08 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116302

--- Comment #2 from H. Peter Anvin  ---
Created attachment 58881
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58881&action=edit
Error output

Generated with:

gcc -std=gnu17 -ggdb3 -O2 -Wattributes -Wall -Wextra -c transp.c

[Bug c/116302] transparent union does not work with {integral type, bitfield struct}

2024-08-08 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116302

--- Comment #1 from H. Peter Anvin  ---
Created attachment 58880
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58880&action=edit
Reproducer (preprocessed)

[Bug c/116302] New: transparent union does not work with {integral type, bitfield struct}

2024-08-08 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116302

Bug ID: 116302
   Summary: transparent union does not work with {integral type,
bitfield struct}
   Product: gcc
   Version: 14.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hpa at zytor dot com
  Target Milestone: ---

Created attachment 58879
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58879&action=edit
Reproducer (C source)

Bitfields are not portable, and some compilers generate truly awful code when
using them. Furthermore, it is fairly common to want to be able to do bit
operations that cross bitfields.

However, because bitfields are not portable in general, using a union type for
this is not safe.

As such, for both debugging and portability, it is desirable to be able to
treat an integer type of some kind as a union with a bitfield on gcc, while
letting inferior compilers define the type as a simple integral type.

Unfortunately, this does not appear to work, even when the "union cannot be
made transparent" error does not appear:

transp.c:30:22: warning: missing braces around initializer [-Wmissing-braces]
   30 | opflags_t hello[2] = {
      |                      ^
transp.c: In function ‘antiop’:
transp.c:37:12: error: wrong type argument to bit-complement
   37 |     return ~op;
      |            ^
transp.c:38:1: warning: control reaches end of non-void function [-Wreturn-type]
   38 | }
      | ^

Reproducer included.

Tested on gcc 14.2.1, Fedora 40, x86-64:

gcc version 14.2.1 20240801 (Red Hat 14.2.1-1) (GCC)
gcc-14.2.1-1.fc40.x86_64
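
For illustration only, the kind of type being described might look like the
sketch below. This is not the attached reproducer: the USE_BITFIELD_VIEW
switch, the field layout and the OPFLAGS_RAW() helper are hypothetical, and the
ordering of bit-fields within the storage unit is implementation-defined.

```c
#include <stdint.h>

/* Hypothetical switch: define USE_BITFIELD_VIEW to get the union view. */
#ifdef USE_BITFIELD_VIEW
typedef union {
    uint32_t raw;                 /* first member determines how it is passed */
    struct {
        uint32_t type  : 4;       /* hypothetical layout */
        uint32_t size  : 4;
        uint32_t flags : 24;
    } f;
} __attribute__((transparent_union)) opflags_t;
#define OPFLAGS_RAW(op) ((op).raw)
#else
typedef uint32_t opflags_t;       /* fallback: a simple integral type */
#define OPFLAGS_RAW(op) (op)
#endif

/* Applying ~ to the raw member works in both configurations; applying
   it to the union itself is what produces the reported error. */
static inline uint32_t antiop(opflags_t op)
{
    return ~OPFLAGS_RAW(op);
}
```

Having to spell out the raw member on every whole-word operation is exactly
the friction the report would like to avoid.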

[Bug target/103503] RFE: no save registers attribute

2024-05-15 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103503

--- Comment #7 from H. Peter Anvin  ---
Note: this is now implemented for x86, but it affects other targets as well.

[Bug c/105863] RFE: C23 #embed

2024-05-15 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

--- Comment #8 from H. Peter Anvin  ---
Well, _Embed() would be an extension and it doesn't seem unreasonable to say
that _Embed() would be expanded after token pasting. After all, as has been
discussed in the C committee, if #embed cannot be short-circuited its value is
significantly reduced.

That being said, it makes sense what you said.

Both the #pragma and #embed, as well as some other use cases, really call for
real procedural support in cpp. I have an idea for that that I would like to
present to the C committee; I don't think it is really in scope for this
request though.

[Bug target/113686] [RISC-V] TLS (Local Exec) relaxation on structures (LE)

2024-02-01 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113686

--- Comment #2 from H. Peter Anvin  ---
The intermediate alignment for lui is known, so if an object is known to fit
*entirely* within its natural alignment then it can be safely CSE'd, but this
is typically not the case with structures or arrays.

It would be nice to fix this in the architecture (as a new standard extension).

Unfortunately, due to the decision to allocate 3/4 of the instruction encoding
space to 16-bit instructions only 3 completely reserved 5-bit opcodes remain -
and that is only what is currently in the ISA document. Adding "auipc1" and
"lui1" would consume two of those (they would most naturally fall at op =
1X10011)

There are some less desirable options, like reclaiming part of the encoding
space for longer instructions (requiring another 1 bit in the prefix for long
instructions would provide the additional two encodings); on RV32 *only* the
RV64 encoding space op = 0X11011 could be used, but I doubt it would be much
appreciated to have this capability on RV32 and not RV64.

(Not to mention hacks like only having part of the register space accessible,
which wouldn't necessarily be horrendous, though, as these instructions would
belong in some pretty fixed patterns.)

One could also say that auipc1 would be "good enough" (combined with
PC-relative relocations for TLS, at least for the TE model) and would still be
valuable enough to occupy a full opcode.

(Obviously, the idea would be that these instructions really are
pseudo-instructions carrying one more immediate bit, and that the linker would
apply those bits using "HI21" relocations.)

I think this is something where the ISA architects could really use to hear
from you compiler developers; since I'm not really familiar with where the pain
points in the compiler lie, it is hard for me to speak authoritatively.

[Bug target/113686] New: [RISC-V] TLS (Local Exec) relaxation on structures (LE)

2024-01-31 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113686

Bug ID: 113686
   Summary: [RISC-V] TLS (Local Exec) relaxation on structures
(LE)
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hpa at zytor dot com
  Target Milestone: ---

When the Local Exec TLS model is in use, gcc generates inefficient code for
accessing the member of a structure:

struct foobar {
   int alpha;
   int beta;
};

_Thread_local struct foobar foo;

void func(int bar)
{
foo.beta = bar;
}

# Version 1
lui    a1,%tprel_hi(foo)
add    a1,a1,tp,%tprel_add(foo)
addi   a1,a1,%tprel_lo(foo)
sw     a0,4(a1)

However, in this case it could be generated as:

# Version 2
lui    a1,%tprel_hi(sym+4)
add    a1,a1,tp,%tprel_add(sym+4)
sw     a0,%tprel_lo(sym+4)(a1)

... which, if %tprel_hi(sym+4) == 0, as it often is for small embedded
software, the linker can relax to a simple (tp) reference:

# Version 2a (post-relaxation with small .tbss)
sw a0,%tprel_lo(sym+4)(tp)

The linker will *not* relax version 1 all the way, leaving an unnecessary mv:

# Version 1a (post-relaxation with small .tbss)
mv a1,tp
sw a0,%tprel_lo(sym+4)(tp)

It is of course trickier for the case of multiple subsequent references to the
structure if the structure is not aligned, as gcc can't know a priori where the
4K breaks are[*]. The version 1 code is more efficient in that case (3
instructions + 1 instruction/field as opposed to 3 instructions/field.)
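
The "4K breaks" follow from how an offset is split between lui and a 12-bit
signed immediate. A rough sketch of that arithmetic, assuming offsets well
inside the 32-bit range (the helper names are made up for this example):

```c
#include <stdbool.h>
#include <stdint.h>

/* Low 12 bits, sign-extended, as used by addi/lw/sw immediates. */
static inline int32_t lo12(int32_t off)
{
    return (int32_t)(((uint32_t)off & 0xfffu) ^ 0x800u) - 0x800;
}

/* Upper 20 bits as loaded by lui, adjusted for the signed low part. */
static inline int32_t hi20(int32_t off)
{
    return (off - lo12(off)) / 4096;
}

/* True if every byte of [off, off + size) shares a single lui value,
   i.e. one lui can serve all fields of the object. */
static inline bool same_window(int32_t off, int32_t size)
{
    return hi20(off) == hi20(off + size - 1);
}
```

For instance, an 8-byte structure at offset 0x7f8 stays inside one window,
whereas the same structure at offset 0x7fc straddles a break, which is why the
compiler cannot share the upper part without knowing the alignment.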

However, if the structure *is* aligned, gcc will still not optimize 1 into 2.

There are at least a few options I see:

1. gcc option: gcc can generate version 2 code for a single field reference, or
if the alignment is such that all fields are guaranteed to fall inside the same
4K window.

2. gcc and optional ABI option: introduce a "TLS TE-tiny" model for deep
embedded use, where the combined size of the TLS area is limited to 4K,
equivalent to the way direct gp references [or zero, if the global pointer is
0] work. Thus, direct (tp) references can be used.

NOTE: With the current binutils, this will error unless .option norelax is in
effect. It might be desirable to instead have a new relocation type, which
would require binutils support. Alternatively, ld should recognize that the TLS
offset is within +/- 2K and suppress the warning in that case (since at that
point the address is available to the linker.)

The linker could be further optimized by allowing the TLS to be offset,
presumably equivalently to the __global_pointer$ symbol.

3. binutils option: teach ld to relax these kinds of chained pointer
references.



[*] Rant: in my opinion, the lui/auipc instructions are fundamentally
misdesigned by not having an overlap bit to guarantee a sizable window.

[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED

2024-01-13 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312

--- Comment #21 from H. Peter Anvin  ---
I think this could be a really useful performance improvement in general. The
Linux exception and syscall paths have a fair number of tail calls on the
primary path, and this would make it possible to avoid the register save and
restores for each of the functions in the tail called path.

I have asked Xin Li (FRED maintainer) to try this out when he has the
opportunity, although right now the Linux kernel merge window is open and so
that is necessarily his first priority.

[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED

2024-01-13 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312

--- Comment #19 from H. Peter Anvin  ---
I'm away for the long weekend, but I'll try it out on Tuesday.

[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED

2024-01-11 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312

--- Comment #15 from H. Peter Anvin  ---
That should be fine for this use case, obviously.

I should add the following: the reason the assembly stub isn't a problem for
FRED whereas it is a bit of a nuisance for IDT-style delivery is that with
FRED, vector dispatch is done in software, not hardware. This is exactly
because *most* operating systems do need some amount of common entry/exit code
anyway, and having to duplicate it is a severe nuisance.

In the specific case of Linux, the full register set, including saved
registers, are a required part of the exception frame in order for things like
ptrace() and fork() to work correctly, so relying on the compiler to save the
"saved" registers doesn't work for us anyway.

So since there is only *one* instance of the assembly stub needed, it means
there isn't a whole separate stub needed for every handler.

[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED

2024-01-11 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312

--- Comment #13 from H. Peter Anvin  ---
No, it will not. Most OS flows will want to merge the kernel and user flows
at some point for some handlers, so it isn't clear that that makes sense.

[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED

2024-01-10 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312

--- Comment #10 from H. Peter Anvin  ---
Right, is there such an attribute (that's what I'm asking for in bug 103503)?

All I see in the gcc documentation is no_calle*R*_saved_registers, which,
again, is the exact opposite.

[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED

2024-01-10 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312

--- Comment #6 from H. Peter Anvin  ---
Of course. That's not what we want in the Linux kernel specifically, though.
It's really up to the OS.

[Bug target/113321] x86-64: Make __attribute__((interrupt)) more robust

2024-01-10 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113321

--- Comment #2 from H. Peter Anvin  ---
Right. The only thing I'm suggesting is that for the cost of one extra
instruction we can make it robust against the programmer picking the wrong
type, or wanting to use the same handler.

It isn't a necessary thing by any means.

[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED

2024-01-10 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312

--- Comment #4 from H. Peter Anvin  ---
(In reply to H.J. Lu from comment #2)
> (In reply to H. Peter Anvin from comment #1)
> > This is actually a specific use case of the feature requested in bug 103503.
> 
> This covers #1.  Should FRED handler take a pointer to the event info as the
> function argument?

This can be passed in from the assembly stub (normally in %rdi).

[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED

2024-01-10 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312

--- Comment #3 from H. Peter Anvin  ---
Created attachment 57032
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57032&action=edit
FRED assembly entry stub (example, slightly modified from the Linux kernel)

[Bug target/113321] New: x86-64: Make __attribute__((interrupt)) more robust

2024-01-10 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113321

Bug ID: 113321
   Summary: x86-64: Make __attribute__((interrupt)) more robust
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hpa at zytor dot com
  Target Milestone: ---

__attribute__((interrupt)) on x86 has two prototypes, and picking the wrong
type "probably will cause a system crash." It turns out that this is
unavoidable on i386, but on x86-64 we can do better:

- On x86-64, an exception/interrupt carries an error code if and only if the
stack is 16-byte aligned (specifically, RSP[3] = 0) on exception entry.

The proper stack pointer for using with IRET is therefore always given by:

RSP |= 8

... and the error code, if present, will be located at offset -8 from this
address.
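
As a sketch of the arithmetic only (not of what the compiler would emit), the
rule can be written as below, where rsp is assumed to be the stack pointer
value on handler entry and the function names are invented for this example:

```c
#include <stdbool.h>
#include <stdint.h>

/* RSP[3] = 0 on entry means the stack is 16-byte aligned, which per
   the rule above happens only when an error code was pushed. */
static inline bool has_error_code(uintptr_t rsp)
{
    return (rsp & 8) == 0;
}

/* Stack pointer to use with IRET: the address of the saved RIP. */
static inline uintptr_t iret_frame(uintptr_t rsp)
{
    return rsp | 8;
}

/* Location of the error code; only meaningful if has_error_code(). */
static inline uintptr_t error_code_slot(uintptr_t rsp)
{
    return iret_frame(rsp) - 8;
}
```

The single OR is the "one extra instruction": it is a no-op when no error code
is present and skips over the error code when there is one.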

[Bug target/113312] Update __attribute__((interrupt)) for Intel FRED

2024-01-10 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113312

--- Comment #1 from H. Peter Anvin  ---
This is actually a specific use case of the feature requested in bug 103503.

[Bug c++/113298] RFE: allow suppressing warnings for void * conversions with -fpermissive

2024-01-09 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113298

--- Comment #2 from H. Peter Anvin  ---
You're not wrong per se. Arguably the problem (and many others) would be better
solved by allowing user-specified conversions that are not member functions.
In that case one could do:

// Set the properties/types for which
// we allow pointer conversions from
// void *
template <typename T>
concept void_pointer_convertible = ...

template <typename V>
concept some_void_type = std::is_void_v<V>;

template <void_pointer_convertible T, some_void_type V>
operator T * (V *ptr) { return static_cast<T *>(ptr); }

... and now the programmer has full control over exactly what they wish to
permit.

[Bug c++/113298] New: RFE: allow suppressing warnings for void * conversions with -fpermissive

2024-01-09 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113298

Bug ID: 113298
   Summary: RFE: allow suppressing warnings for void * conversions
with -fpermissive
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hpa at zytor dot com
  Target Milestone: ---

-fpermissive downgrades some errors to warnings, but there doesn't seem to be
any -W options to suppress those warnings, and -fpermissive itself is a fairly
wide switch.

Having individual -W options for the various -fpermissive events would allow
the programmer to pick what extensions to allow unconditionally, which to warn
for, and which to error out for (-Werror=).

Allowing void * to be converted to a pointer to a POD structure is particularly
useful in the process of moving a project from C to C++.

[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions

2023-08-14 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

--- Comment #5 from H. Peter Anvin  ---
I don't think source code modifications are a huge problem, but at this point
they require tracking down each individual bit.

As far as trapping implementations are concerned:

1. In deeply embedded implementations, it is entirely possible that
firmware/microcode might be *more* expensive than logic. Although memory arrays
are, of course, very dense, they are still extremely general and RISC-V isn't a
very sparse instruction set.

2. It seems like it almost would require an implementation-specific performance
model. Now, one can validly argue that by setting the cost of unimplemented
instructions to a (near-)infinite value such instructions should never be
generated even if they are "enabled". That might also be a possible avenue for
achieving this.

As far as an explosion of subsets, yes, this is really what this means.
Bloating a tiny on-chip control processor both in area and timing to implement
instructions that never actually appear in the code is at best painful.

That being said, I do intend to submit a proposal to the RISC-V ISA folks to
subset the Zbb subset. It is worth noting that there are overlaps between the
Zb* and Zbk* subsets, but the individual intersection sets do not have their
own names.

The Zbb instruction set is particularly noxious (and this is indeed an ISA
definition problem), because it implements multiple things that are, from an
implementation point of view, completely separate and require separate code
paths in the ALU:

§ 1.2.1 Logical with negate
- minimal cost; in fact in some implementations it might have zero or
even negative cost due to decoder simplification.
- Extremely common in embedded operations.

§ 1.2.2 Count leading/trailing zero bits
- Requires dedicated logic.
- ctz and clz have very different uses.
- Typically clz and ctz will not be able to share logic, either,
requiring *two* dedicated units.

§ 1.2.3 Count population
- Requires dedicated logic.
- May be useless depending on what the processor needs.

§ 1.2.4 Integer minimum/maximum
- May be cheap or expensive, depending on if an existing comparator can
be leveraged.
- Quite possibly free or almost free if the AMO instruction set is
already supported in its entirety, as that requires max/min already.

§ 1.2.5 Sign- and zero-extension
§ 1.2.6 Bitwise rotation
- May be very cheap or quite expensive, depending on the implementation
of the shift instructions.

§ 1.2.7 OR combine
- Requires dedicated logic.
- Virtually useless in control processors that do not process text.

§ 1.2.8 Byte-reverse
- Requires dedicated logic.
- These, and some other instructions, are special cases of a bit swap
extension proposed in the original bitmanip proposal, which was not included
even as a separate set.
- Virtually useless in control processors that do not need to
interface with cross-endian data.


These 8 groups really ought to be given separate names.

Is this going to happen again? Quite likely.

It seems, as you say, unlikely that the public ISA would be chopped to pieces
to support every single use case.

It really comes down to: out of multiple suboptimal cases (forced hardware
bloat, custom subsets, extremely fine grained public subsets, vendor-hacked
trees that lag behind and/or diverge from upstream), what option is the least
amount of badness?

[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions

2023-08-14 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

--- Comment #2 from H. Peter Anvin  ---
Named subsets are, inherently, designed to make sense toward mass-produced
products where the hardware and software are designed (mostly) independently.
However, what I mean with "very deep embedded use" is hardware and software
being co-designed.

The RISC-V ISA policy is that those are considered vendor-specific subsets and
are to be given an X* name; however, gcc obviously needs to be able to
understand the meaning of this X* name. At this point there is no way to do so
without changing the source code in nontrivial ways.

Regardless of whether it is done in source code or at runtime, implementing a
fine-grained, preferably table-driven, approach to subsets in gcc would make it
very simple for a hardware implementor to define their custom X-subsets
without a lot of surgery to the code, *and* it would make it possible to take
it one step further and allow custom (or newly defined! - there have been
multiple instances already of new subsets of existing instructions defined a
posteriori) instruction subsets to be defined in a configuration file.

[Bug c/96952] __builtin_thread_pointer support cannot be probed

2023-08-14 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96952

H. Peter Anvin  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hpa at zytor dot com

--- Comment #10 from H. Peter Anvin  ---
Is this bug still relevant? RISC-V doesn't even seem to support disabling tls
support, and __builtin_thread_pointer() appears to be properly supported. So it
would presumably be up to any remaining target that doesn't have
__builtin_thread_pointer() (or not in all configurations) to verify that
__has_builtin(__builtin_thread_pointer) evaluates to false?
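
A probe along these lines is presumably what a user would write. It is only a
sketch: HAVE_BUILTIN_THREAD_POINTER and thread_pointer_or_null() are names
invented here, and the whole approach is only as reliable as the answer
__has_builtin gives on the target in question.

```c
#include <stddef.h>

#ifndef __has_builtin
# define __has_builtin(x) 0     /* compilers without __has_builtin */
#endif

#if __has_builtin(__builtin_thread_pointer)
# define HAVE_BUILTIN_THREAD_POINTER 1
#else
# define HAVE_BUILTIN_THREAD_POINTER 0
#endif

/* Return the thread pointer if the builtin is reported as available,
   otherwise NULL so that callers can fall back to something else. */
static inline void *thread_pointer_or_null(void)
{
#if HAVE_BUILTIN_THREAD_POINTER
    return __builtin_thread_pointer();
#else
    return NULL;
#endif
}
```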

[Bug target/111020] New: RFE: RISC-V: ability to cherry-pick additional instructions

2023-08-14 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

Bug ID: 111020
   Summary: RFE: RISC-V: ability to cherry-pick additional
instructions
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hpa at zytor dot com
  Target Milestone: ---

For very deeply embedded use, it is sometimes highly desirable to control the
instruction set on a very fine grained basis. For example, the Zbb extension
contains a mixture of things that most likely require separate functional
units. However, as an example, the ctz instruction is highly useful to speed up
interrupt latency in designs that do not have vectorized interrupt handling
(which is, in its most basic form, a dedicated ctz unit.) It would be massive
hardware bloat to require the full Zbb set to add this one instruction.
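
To make the ctz use case concrete, a software dispatcher might pick the next
interrupt to service roughly as follows. This is a sketch: the function name
is hypothetical, and it assumes that lower bit numbers have higher priority.

```c
#include <stdint.h>

/* Return the lowest-numbered pending interrupt, or -1 if none are
   pending. __builtin_ctz() is undefined for 0, hence the check. */
static inline int next_pending_irq(uint32_t pending)
{
    return pending ? __builtin_ctz(pending) : -1;
}
```

With a ctz instruction available to the compiler this should reduce to a
handful of instructions; without one, the builtin has to be expanded into a
loop or a library call, which is where the latency goes.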

Once the instruction is added, though, one would like to be able to use it as
fully as possible.

This, obviously, creates binaries that are specifically tuned toward a single
processor implementation, but that is pretty much the essence of deeply
embedded, where in the normal case the entire software stack from the OS to
application is linked together in a single binary, or at the very least
compiled together, often from a single source tree.

As far as object code compatibility is concerned, this is very much a
"programmer beware" situation. There is no need for heroics in terms of tagging
objects with the exact instruction set, for example.

[Bug c++/106486] C++ warning for -Wmissing-prototypes is pure nuisance

2023-06-05 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106486

--- Comment #5 from H. Peter Anvin  ---
Yes, exactly.

[Bug c/105863] RFE: C23 #embed

2023-06-05 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

--- Comment #4 from H. Peter Anvin  ---
So I'm updating this to be C23 #embed, since that is a bit more general than
the typical incbin (at least conceptually it operates on the preprocessor
syntactic level; it does not of course preclude a shortcut between the
preprocessor and the compiler.)

However, C23 #embed has a *huge* problem; specifically it has exactly the same
problem that necessitated #pragma to be augmented with _Pragma(). Therefore, I
believe that an equivalent construct (_Embed()) is needed for #embed as well.

I have given this feedback to members of the C committee, but, not
surprisingly, it was too late for C23; I hope it will be considered for C2y and
I believe it would be a highly desirable extension in the meantime.
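
For reference, this is the existing #pragma / _Pragma() situation that the
comparison is drawn from. The macro names are invented for this sketch, and
_Embed() itself remains a proposal with no syntax shown here.

```c
/* A directive cannot be produced by a macro, so this cannot work:
 *   #define IGNORE(w) #pragma GCC diagnostic ignored w
 * _Pragma() is an operator, so it can appear in a macro body. */
#define DO_PRAGMA(x) _Pragma(#x)
#define DIAG_PUSH_IGNORE(w)            \
    DO_PRAGMA(GCC diagnostic push)     \
    DO_PRAGMA(GCC diagnostic ignored w)
#define DIAG_POP DO_PRAGMA(GCC diagnostic pop)

static inline int narrow(long v)
{
    DIAG_PUSH_IGNORE("-Wconversion")
    int r = v;                  /* would warn under -Wconversion */
    DIAG_POP
    return r;
}
```

An #embed directive has the same limitation as #pragma: it cannot be wrapped
in a macro like DIAG_PUSH_IGNORE() above.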

[Bug c/96054] RFE: __attribute__((fatal))

2022-11-14 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96054

--- Comment #2 from H. Peter Anvin  ---
I agree, my naming was very poor.
Perhaps "panic" or "abort" would work; those are classic names in software use
for this.

Another case of a function that could be so attributed would be the function
typically called __assert_failed().

It is probably worth noting that all the ones I can think of should be noreturn
functions. I don't know if that is truly inherent, but personally I cannot
think of a case where it would not be.

[Bug middle-end/56314] Please allow per-function specification of register conventions

2022-10-03 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56314

--- Comment #6 from H. Peter Anvin  ---
Unfortunately that's not really possible given the way the kernel does
runtime patching (which isn't going to change, sorry.) At the very least we
would need a *lot* more compiler support to give LTO all the information that
it needs; at a *very* minimum, LTO would need to support ORC metadata.

[Bug c/59850] Support sparse-style pointer address spaces (type attributes)

2022-10-03 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59850

--- Comment #37 from H. Peter Anvin  ---
One would assume that there would be __foo__ aliases for the attribute names
like all the other ones.

[Bug tree-optimization/107006] Missing optimization: common idiom for external data

2022-09-22 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006

--- Comment #11 from H. Peter Anvin  ---
If you look at the output, you see that the loops are already fully unrolled
(at considerable code size cost.)

Unfortunately, since the issue at hand is dealing with code written to be
portable, adding gcc-specific hacks is not really a reasonable option.

[Bug rtl-optimization/107006] Missing optimization: common idiom for external data

2022-09-21 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006

--- Comment #9 from H. Peter Anvin  ---
To clarify: the C test case produces the same output regardless of whether it
is compiled as C or C++. Only the C++ wrapped class definition detects the
additional case of a 32-bit bigendian load.

[Bug rtl-optimization/107006] Missing optimization: common idiom for external data

2022-09-21 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006

--- Comment #8 from H. Peter Anvin  ---
Created attachment 53610
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53610&action=edit
C++ test case object code

[Bug rtl-optimization/107006] Missing optimization: common idiom for external data

2022-09-21 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006

--- Comment #7 from H. Peter Anvin  ---
Created attachment 53609
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53609&action=edit
C++ test case assembly output

[Bug rtl-optimization/107006] Missing optimization: common idiom for external data

2022-09-21 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006

--- Comment #6 from H. Peter Anvin  ---
Created attachment 53608
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53608&action=edit
C++ test case preprocessed source

[Bug rtl-optimization/107006] Missing optimization: common idiom for external data

2022-09-21 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006

--- Comment #5 from H. Peter Anvin  ---
Created attachment 53607
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53607&action=edit
C++ test case main file

[Bug rtl-optimization/107006] Missing optimization: common idiom for external data

2022-09-21 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006

--- Comment #4 from H. Peter Anvin  ---
Created attachment 53606
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53606&action=edit
C++ test case class definition header file

[Bug rtl-optimization/107006] Missing optimization: common idiom for external data

2022-09-21 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006

--- Comment #3 from H. Peter Anvin  ---
Created attachment 53605
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53605&action=edit
C test case object code

[Bug rtl-optimization/107006] Missing optimization: common idiom for external data

2022-09-21 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006

--- Comment #2 from H. Peter Anvin  ---
Created attachment 53604
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53604&action=edit
C test case assembly output

[Bug rtl-optimization/107006] Missing optimization: common idiom for external data

2022-09-21 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006

--- Comment #1 from H. Peter Anvin  ---
Created attachment 53603
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53603&action=edit
C test case preprocessed source

[Bug rtl-optimization/107006] New: Missing optimization: common idiom for external data

2022-09-21 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107006

Bug ID: 107006
   Summary: Missing optimization: common idiom for external data
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hpa at zytor dot com
  Target Milestone: ---

Created attachment 53602
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53602&action=edit
C test case source

The only *portable* way in C to deal with external data structures containing
data of specific endianness, possibly unaligned, is to operate on them as byte
(char) arrays.

At least on x86 (which supports arbitrarily aligned loads), gcc *sometimes*
recognizes these as single loads, but sometimes not.
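
The idiom in question looks roughly like this. It is a generic sketch, not the
attached test case, and the function names are chosen for this example:

```c
#include <stdint.h>

/* Assemble a 32-bit little-endian value one byte at a time. */
static inline uint32_t load_le32(const unsigned char *p)
{
    return ((uint32_t)p[0])       | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Same for big-endian data: first byte is most significant. */
static inline uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[3])       | ((uint32_t)p[2] << 8) |
           ((uint32_t)p[1] << 16) | ((uint32_t)p[0] << 24);
}
```

On a little-endian target with unaligned loads, load_le32() could ideally
become a single load, and load_be32() a load followed by a byte swap.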

In the included test cases, there is a plain C implementation and an
implementation wrapped in a C++ class.

Compiling the former with:

gcc -std=c2x -g -O3 -W -Wall -[cSE] -o bswap.[osi] bswap.c

... recognizes the load idiom for 16-bit numbers but not for 32- or 64-bit
numbers.

Compiling the latter with:

gcc -std=c++20 -g -O3 -E -Wall -[cSE] -o bswapcc.[osi] bswapcc.cc

... *additionally* recognizes the 32-bit load, *but only in the bigendian case*
(that is, it generates a load and a bswap instruction); whereas in the
littleendian -- native -- case, this does not happen!

I am familiar with the use of packed arrays and __builtin_bswap*() for these
accesses, but unfortunately these are gcc-specific.

[Bug c++/106486] New: C++ warning for -Wmissing-prototypes is pure nuisance

2022-07-30 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106486

Bug ID: 106486
   Summary: C++ warning for -Wmissing-prototypes is pure nuisance
   Product: gcc
   Version: 12.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hpa at zytor dot com
  Target Milestone: ---

Since upgrading to gcc 12.1.1, I keep getting the following warning through
various projects:

cc1plus: warning: command-line option ‘-Wmissing-prototypes’ is valid for
C/ObjC but not for C++

This warning is pure nuisance. In a mixed-language project it is *extra*
important that the right prototypes are used, and it is far easier to enable
-Wmissing-prototypes project wide. This warning implies that one would have to
conditionalize the -W options based on the language of an input file, which is
often painful to do without structural Makefile changes.

Note that there doesn't seem to be any way to squelch this warning, either
(e.g. a -Wno-warning-not-applicable option or similar.)

[Bug target/103503] RFE: no save registers attribute

2022-06-06 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103503

--- Comment #4 from H. Peter Anvin  ---
The interrupt attribute typically does two things:

1. It changes the return instruction;
2. It marks all registers as saved.

2 is exactly the *opposite* of what I want; I would like to improve performance
by exploiting the fact that the compiler-invisible entry flow has already saved
all registers, whether or not they are saved in the ABI. Thus, I would like it
to treat all (non-fixed) registers as *clobbered*, not *saved*.

Ideally, "nosaved" and "interrupt" should be possible to use together, to get
effect #1 of the interrupt attribute, but that is usually less important.

[Bug c/105863] New: RFE: __attribute__((incbin("file"))) or __builtin_incbin("file")

2022-06-06 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105863

Bug ID: 105863
   Summary: RFE: __attribute__((incbin("file"))) or
__builtin_incbin("file")
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hpa at zytor dot com
  Target Milestone: ---

It is a *very* common operation to want to include a preexisting binary object
into a compiled project. There are a number of ways to do this, but they all
suffer significant shortcomings.

The most common ways are to use a preprocessor to convert the input binary
object to textual source, or to wrap an object in assembly code and use
.incbin.  The former is *extremely* inefficient, the latter has a number of
pitfalls, including the one described in bug 66871, the need for
platform-specific coding, sizeof() not being functional, etc.
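
To illustrate the first approach: a converter such as "xxd -i foobar.bin"
emits source along the lines of the sketch below. The byte values here are
made up for illustration, and foobar_size() is a hypothetical helper added
only to show that sizeof works with this method. The cost is that every input
byte becomes several characters of source text that the compiler has to parse.

```c
#include <stddef.h>

/* Shape of the output of a binary-to-source converter such as
   `xxd -i foobar.bin`.  The eight bytes are hypothetical sample data. */
unsigned char foobar_bin[] = {
    0x7f, 0x45, 0x4c, 0x46, 0x02, 0x01, 0x01, 0x00
};
unsigned int foobar_bin_len = 8;

/* Hypothetical helper: the array has a complete type here, so sizeof
   yields the object size, unlike with the assembly .incbin wrapper. */
size_t foobar_size(void)
{
    return sizeof foobar_bin;
}
```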

I would like to suggest a variable attribute __attribute__((incbin("file")))
which statically initializes a variable to the contents of the given binary
file, or a __builtin_incbin("file") which would expand to an initializer; the
end result would look like one of:

char foobar[] __attribute__((incbin("foobar.bin")));
char foobar[] = __builtin_incbin("foobar.bin");

[Bug middle-end/85751] RFE: option to align code using breakpoint instructions when unreachable

2022-06-06 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85751

--- Comment #2 from H. Peter Anvin  ---
Goodness... I missed the question here.
The intent was to just take advantage of existing padding: the execution flow
should not go there.

[Bug target/103503] New: RFE: no save registers attribute

2021-11-30 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103503

Bug ID: 103503
   Summary: RFE: no save registers attribute
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hpa at zytor dot com
  Target Milestone: ---
Target: multiple

When a common assembly interrupt entry code or an equivalent hardware engine is
used to handle register saves in an interrupt routine, it may be completely
unnecessary to save and restore registers in the interrupt handler itself, even
if they would normally be clobbered.

Unfortunately [-f]call-used-reg is not supported by either
__attribute__((optimize)) or _Pragma("GCC optimize"); otherwise that would be
a very valid solution.

Putting all interrupt handlers in a separate compilation unit is awkward at the
very best.

AVR has __attribute__((OS_{task,main})) for this purpose, but being able to do
it in general would improve interrupt/trap latency.

See also bug 38534.

Note: not saving the registers in the assembly wrapper is generally not an
option; similarly, whether or not a hardware engine can do it practically
depends on both the hardware implementation and the ABI. The RISC-V ABI, for
example, scatters clobbered and saved registers all over the register map,
which makes doing such a thing in hardware difficult.

[Bug c/102266] New: RFE: x86: print operand with optional (%rip) suffix

2021-09-09 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102266

Bug ID: 102266
   Summary: RFE: x86: print operand with optional (%rip) suffix
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: hpa at zytor dot com
  Target Milestone: ---
Target: x86

In the Linux kernel we reasonably frequently use extended asm operand modifiers
like %P[]/%p[] for encoding a memory operand that *must not* use
register-indirect forms. On x86-64, such operands can sometimes be encoded as
%rip-relative; however, there is currently no convenient way to do so without
also making the assembly code x86-64 specific, whereas it would otherwise be
perfectly fine in both modes.

I would therefore like to request one of the following, in order of preference:

1. A modifier to emit a memory immediate operand (i.e. a constant sans $) with
a (%rip) suffix assuming it can be so encoded.

2. A simple macro (like %=) that emits (%rip) on x86-64 but nothing on i386.

The priority of this is quite low, but it is probably simple to implement.