On 3/16/24 11:35 AM, Vineet Gupta wrote:
> Hi,
>
> This set of patches (for gcc-15) help improve stack/array accesses
> by improving constant materialization. Details are in respective
> patches.
>
> The first patch is the main change which improves SPEC cactu by 10%.
Just to confirm. Yup, 10% reduction in icounts and about a 3.5% improvement in cycles on our target. Which is great!

This also makes me wonder if cactu is the benchmark that was sensitive to flushing the pending queue in the scheduler. Jivan's data would tend to indicate that is the case as several routines seem to flush the pending queue often. In particular:

ML_BSSN_RHS_Body
ML_BSSN_Advect_Body
ML_BSSN_constraints_Body

All have a high number of dynamic instructions as well as lots of flushes of the pending queue.

Vineet, you might want to look and see if cranking up the max-pending-list-length parameter helps drive down spilling. I think its default value is 32 insns. I've seen it cranked up to 128 and 256 insns without significant ill effects on compile time.
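
For anyone wanting to try that experiment, here is a rough sketch of a sweep over the parameter. It only builds the command lines and does not run anything. The compiler name, the -O2 baseline, the source file name and the output names are all placeholders I made up for illustration; substitute whatever your cross toolchain and the benchmark build actually use.

```python
import shlex

# Values to try. 32 is the default mentioned above; the rest are the
# larger settings suggested for the experiment.
PENDING_LIST_LENGTHS = [32, 64, 128, 256]


def build_commands(source, base_flags=("-O2",), compiler="gcc"):
    """Return one compile command (as an argv list) per parameter value.

    `source`, `base_flags` and `compiler` are hypothetical placeholders,
    not taken from the benchmark's real build setup.
    """
    commands = []
    for length in PENDING_LIST_LENGTHS:
        argv = [
            compiler,
            *base_flags,
            f"--param=max-pending-list-length={length}",
            "-S",                      # stop after generating assembly
            source,
            "-o", f"out-{length}.s",   # one assembly file per setting
        ]
        commands.append(argv)
    return commands


if __name__ == "__main__":
    # "kernel.c" is a stand-in for whichever file holds the hot routine.
    for argv in build_commands("kernel.c"):
        print(shlex.join(argv))
```

Comparing the resulting assembly files for spill code in the hot loop should show whether the larger limits make any difference.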

My recollection (it's been like 3 years) of the key loop was that it had a few hundred instructions and we'd flush the pending list about 50 cycles into the loop as there just wasn't enough issue bandwidth to the FP units to dispatch all the FP instructions as their inputs became ready. So you'd be looking for flushes in a big loop.


Jeff

