On 3/21/24 8:36 AM, Vineet Gupta wrote:


On 3/18/24 21:41, Jeff Law wrote:

On 3/16/24 11:35 AM, Vineet Gupta wrote:
Hi,

This set of patches (for gcc-15) helps improve stack/array accesses
by improving constant materialization. Details are in the respective
patches.

The first patch is the main change which improves SPEC cactu by 10%.
Just to confirm.  Yup, 10% reduction in icounts and about a 3.5%
improvement in cycles on our target.  Which is great!

This also makes me wonder if cactu is the benchmark that was sensitive
to flushing the pending queue in the scheduler.  Jivan's data would tend
to indicate that is the case as several routines seem to flush the
pending queue often.  In particular:

ML_BSSN_RHS_Body
ML_BSSN_Advect_Body
ML_BSSN_constraints_Body

All have a high number of dynamic instructions as well as lots of
flushes of the pending queue.

Vineet, you might want to look and see if cranking up the
max-pending-list-length parameter helps drive down spilling.  I think
its default value is 32 insns.  I've seen it cranked up to 128 and 256
insns without significant ill effects on compile time.
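
For reference, that knob can be adjusted from the command line without
rebuilding the compiler; something along these lines should do it (the
-O2 and the source file are just placeholders, not the actual Cactu
build flags):

  gcc -O2 --param max-pending-list-length=128 -c foo.c
  gcc -O2 --param max-pending-list-length=256 -c foo.c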

My recollection (it's been like 3 years) of the key loop was that it had
a few hundred instructions and we'd flush the pending list about 50
cycles into the loop as there just wasn't enough issue bandwidth to the
FP units to dispatch all the FP instructions as their inputs became
ready.  So you'd be looking for flushes in a big loop.

Here are the dynamic instruction counts for Cactu on top of the new
splitter changes, varying max-pending-list-length:

default (32) : 2,565,319,368,591
128          : 2,509,741,035,068
256          : 2,527,817,813,612

I haven't probed deeper into the generated code itself, but it's likely
helping with spilling.
Actually, I read that as not being important for this issue.  While it
is ~50B instructions, I would be looking for something with perhaps an
order of magnitude bigger impact.  Ultimately I think it means we still
don't have a good handle on what's causing the spilling.  Oh well.

So if we go back to Robin's observation that scheduling dramatically
increases the instruction count, perhaps we should try a run with
-fno-schedule-insns -fno-schedule-insns2 and see how the instruction
counts compare.
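
Something along these lines for the comparison run (again, -O2 and the
source file are placeholders; everything else would stay the same as
the baseline build):

  gcc -O2 -fno-schedule-insns -fno-schedule-insns2 -c foo.c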


Jeff
