On 3/21/24 07:45, Jeff Law wrote:
>>>> The first patch is the main change which improves SPEC cactu by 10%.
>>> Just to confirm.  Yup, 10% reduction in icounts and about a 3.5%
>>> improvement in cycles on our target.  Which is great!
>>>
>>> This also makes me wonder if cactu is the benchmark that was sensitive
>>> to flushing the pending queue in the scheduler.  Jivan's data would tend
>>> to indicate that is the case as several routines seem to flush the
>>> pending queue often.  In particular:
>>>
>>> ML_BSSN_RHS_Body
>>> ML_BSSN_Advect_Body
>>> ML_BSSN_constraints_Body
>>>
>>> All have a high number of dynamic instructions as well as lots of
>>> flushes of the pending queue.
>>>
>>> Vineet, you might want to look and see if cranking up the
>>> max-pending-list-length parameter helps drive down spilling.   I think
>>> its default value is 32 insns.  I've seen it cranked up to 128 and 256
>>> insns without significant ill effects on compile time.
>>>
>>> My recollection (it's been like 3 years) of the key loop was that it had
>>> a few hundred instructions and we'd flush the pending list about 50
>>> cycles into the loop as there just wasn't enough issue bandwidth to the
>>> FP units to dispatch all the FP instructions as their inputs became
>>> ready.  So you'd be looking for flushes in a big loop.
>> Here are the dynamic instruction counts for Cactu on top of the new
>> splitter changes, with max-pending-list-length at its default, 128 and 256:
>>
>> default : 2,565,319,368,591
>> 128     : 2,509,741,035,068
>> 256     : 2,527,817,813,612
>>
>> I haven't probed deeper into the generated code itself, but it is likely
>> helping with spilling.
> Actually, I read that as not important for this issue.  While it is 50b 
> instructions, I would be looking for something that had perhaps an order 
> of magnitude bigger impact.    Ultimately I think it means we still 
> don't have a good handle on what's causing the spilling.  Oh well.
>
> So if we go back to Robin's observation that scheduling dramatically 
> increases the instruction count, perhaps we try a run with 
> -fno-schedule-insns -fno-schedule-insns2 and see how the instruction 
> counts compare.

Oh yeah! Robin hinted at this in Tuesday's patchwork meeting too.

default     : 2,565,319,368,591
128         : 2,509,741,035,068
256         : 2,527,817,813,612
no-sched{,2}: 1,295,520,567,376
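To put the four runs side by side, here is a small sketch that computes each configuration's change relative to the default build, using the counts above. It assumes each configuration differs from the default only by the listed GCC options (`--param max-pending-list-length=N`, or `-fno-schedule-insns -fno-schedule-insns2`). The rest of the build setup (target, `-march`, other flags) isn't stated here and is left out. The `RESULTS` table and the helper names are illustrative only.

```python
# Dynamic instruction counts for cactu, as reported above.
# Flags are the extra GCC options assumed on top of the (unstated) baseline build.
RESULTS = {
    "default": ([], 2_565_319_368_591),
    "128": (["--param", "max-pending-list-length=128"], 2_509_741_035_068),
    "256": (["--param", "max-pending-list-length=256"], 2_527_817_813_612),
    "no-sched{,2}": (["-fno-schedule-insns", "-fno-schedule-insns2"],
                     1_295_520_567_376),
}


def relative_change(baseline: int, value: int) -> float:
    """Percentage change of value vs. baseline (negative means fewer insns)."""
    return 100.0 * (value - baseline) / baseline


def summarize(results):
    """Return (name, flags, count, delta, percent) rows relative to 'default'."""
    baseline = results["default"][1]
    rows = []
    for name, (flags, count) in results.items():
        rows.append((name, " ".join(flags) or "(none)", count,
                     count - baseline, relative_change(baseline, count)))
    return rows


if __name__ == "__main__":
    for name, flags, count, delta, pct in summarize(RESULTS):
        print(f"{name:<13} {count:>19,} {delta:>+19,} {pct:+7.2f}%  {flags}")
```

With these numbers, a longer pending list trims roughly 2.2% (128) and 1.5% (256) of the instructions. Disabling both scheduling passes cuts the count by about 49.5%, close to half.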

-Vineet
