Hi Philipp,
On 20/11/14 10:47, Dr. Philipp Tomsich wrote:
Kyrill,
I don't mind it being in config/arm if you plan to wire it up later, good to
know.
Another comment inline….
I’ll clean up the missing xgene1_ and the mistyped xgene_ prefix and resubmit.
+(define_insn_reservation "div" 2
+ (and (eq_attr "tune" "xgene1")
+ (eq_attr "type" "sdiv,udiv"))
+ "xgene1_decode1op,xgene_divide")
The dangerous part was the reservation duration (the xgene_divide*<large
number>).
The latency number (2 in this version, 66 in the previous) is not harmful to
the automaton size
and can be as high as needed (if this operation is high latency)....
It doesn’t really matter for any workload we’ve encountered, as the hardware is
better at dealing with ‘div’-latencies than the scheduler (especially, as ‘div’
is variable latency and any guess we have will be wrong… we’ll likely add
scheduling hook function in the future).
The more important thing is to keep the cost of divides high enough in the
cost-model.
In other words: 66 would be the worst case and will normally not be correct
anyway. Furthermore, it’s rather unplausible, that we find 264 instructions
(for this worst-case scenario) to fill the scheduling bubble between the
div-insn and its result usage.
Ok, makes sense. I just thought that 2 is a bit too low but if your
benchmarking showed it to be reasonable I won't complain ;)
Kyrill
Best,
Philipp.