https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63304
--- Comment #33 from Wilco <wdijkstr at arm dot com> ---
(In reply to Evandro from comment #32)
> (In reply to Ramana Radhakrishnan from comment #31)
> > (In reply to Evandro from comment #30)
> > > The performance impact of always referring to constants as if they were
> > > far away is significant on targets which do not fuse ADRP and LDR
> > > together.
> >
> > What happens if you split them up and schedule them appropriately? I didn't
> > see any significant impact in my benchmarking on implementations that did
> > not implement such fusion. Where people want performance in these cases they
> > can well use -mpc-relative-literal-loads or -mcmodel=tiny - it's in there
> > already.
>
> Because of side effects of the Haifa scheduler, the loads now pile up, and
> the ADRPs may affect the load issue rate rather badly if not fused. At least
> on our processor.

ADRP latency to load-address should be zero on any OoO core - ADRP is basically
a move-immediate, so it can execute early and hide any latency.

> Which brings another point, shouldn't there be just one ADRP per BB or,
> ideally, per function? Or am I missing something?

That's not possible in this case as the section is mergeable. An alternative
implementation using anchors may be feasible, but GCC is extremely bad at using
anchors efficiently - functions using several global variables also end up with
a large number of ADRPs when you'd expect a single ADRP.
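
For anyone who wants to look at this themselves, here is a minimal C sketch of
the two cases discussed above: a function reading several globals, and a
function using a floating-point constant that the compiler would normally place
in a constant section. All names (counter_a, counter_b, counter_c, sum_globals,
scale) are hypothetical and not taken from this bug report. Compiling it for an
AArch64 target with something like "gcc -O2 -S" and counting the ADRP
instructions in each function should show how many address computations are
emitted. The exact output depends on the GCC version, code model and options
(for example -mpc-relative-literal-loads or -mcmodel=tiny), so no expected
assembly is given here.

```c
/* Hypothetical globals, for illustration only. They are non-static and
   non-const so the compiler has to load them from memory. */
int counter_a = 1;
int counter_b = 2;
int counter_c = 3;

/* Reads three globals. One might expect a single ADRP (or a single anchor
   address) to cover all three, but each access may get its own ADRP. */
int sum_globals(void)
{
    return counter_a + counter_b + counter_c;
}

/* Uses a double constant that cannot be encoded as an FMOV immediate, so it
   has to be loaded from memory (assumption: via ADRP+LDR by default, or a
   PC-relative literal load with -mpc-relative-literal-loads). */
double scale(double x)
{
    return x * 3.141592653589793;
}
```

Comparing the output with and without -mpc-relative-literal-loads should make
the difference in how the constant is addressed visible on a given compiler
build.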