> On 20.11.2025 at 16:53, Jeff Law <[email protected]> wrote:
>
>
>
> On 11/19/25 9:42 AM, Andi Kleen wrote:
>>> I know I was pushing for it to be enabled more widely as it's painfully hard
>>> to forward from a narrow store to a wider load. But based on earlier
>>> discussions I've backed off that position.
>> FWIW I would expect any slightly better OOO core aimed at general
>> purpose code to have some form of hardware support for a subset of the
>> cases.
> The narrow store to wide load is the problem space, even for OOO cores. I
> fully expect any modern performance core to forward when the load can get all
> of its data from a single prior store.
>
>
>> The rules can be very complicated. As an example see the diagram
>> in https://chipsandcheese.com/p/a-peek-at-sapphire-rapids
>> https://substackcdn.com/image/fetch/$s_!rESw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F69b17f38-0631-424d-8e05-7988f9b174f6_2559x1214.png
> They don't look significantly more complex than I expected. Essentially if
> the load is contained within the store, then it's forwarded, with a possible
> penalty if there isn't a perfect start match, but it's still forwarded.
>
> If there's a partial overlap then no store to load forwarding occurs and you
> take that full 19c penalty.
There’s also the strategy of increasing the issue distance between the store
and the load. Some OOO implementations now try to anticipate the conflict and
delay the load. The compiler could do its own thing here during scheduling
(which usually has the opposite goal of delaying stores and issuing loads as
early as possible).
On GIMPLE we’re trying to aggressively elide problematic loads, but the worst
case is when the forwarding issue is not obviously visible. The profiling idea
here sounds interesting, but identifying a problematic load exactly is
challenging.
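
To make the case concrete, here is a minimal C sketch of the rule as Jeff
summarized it above: a load contained in a single prior store forwards, a
partial overlap does not. The names stlf_classify and narrow_store_wide_load
are made up for illustration, and real cores apply further conditions
(alignment, cache-line crossing) that this ignores.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical classification of a store followed by a load, using only
   the containment rule discussed above.  Not a model of any real core.  */
enum stlf_result { STLF_NO_OVERLAP, STLF_FORWARD, STLF_FAIL };

enum stlf_result
stlf_classify (uintptr_t st_addr, unsigned st_size,
               uintptr_t ld_addr, unsigned ld_size)
{
  uintptr_t st_end = st_addr + st_size;
  uintptr_t ld_end = ld_addr + ld_size;

  /* Disjoint byte ranges: the load does not depend on this store.  */
  if (ld_end <= st_addr || st_end <= ld_addr)
    return STLF_NO_OVERLAP;

  /* Load fully contained in the store: forwarding is possible.  */
  if (ld_addr >= st_addr && ld_end <= st_end)
    return STLF_FORWARD;

  /* Partial overlap, e.g. a narrow store feeding a wider load.  */
  return STLF_FAIL;
}

/* Two 4-byte stores followed by an 8-byte load of the same bytes.
   Neither store alone covers the load, so each pair classifies as
   STLF_FAIL under the rule above.  */
uint64_t
narrow_store_wide_load (unsigned char *buf, uint32_t lo, uint32_t hi)
{
  uint64_t v;

  memcpy (buf, &lo, 4);
  memcpy (buf + 4, &hi, 4);
  memcpy (&v, buf, 8);
  return v;
}
```

Here the stores and the load sit next to each other, so the pattern is
visible; the hard cases are the ones where they end up in different
functions or behind pointers we cannot disambiguate.
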
Richard
>
> jeff