Hi Philipp,

On 20/11/14 10:47, Dr. Philipp Tomsich wrote:
Kyrill,

I don't mind it being in config/arm if you plan to wire it up later, good to 
know.
Another comment inline….
I’ll clean up the missing xgene1_ and the mistyped xgene_ prefix and resubmit.

+(define_insn_reservation "div" 2
+  (and (eq_attr "tune" "xgene1")
+       (eq_attr "type" "sdiv,udiv"))
+  "xgene1_decode1op,xgene_divide")
The dangerous part was the reservation duration (the xgene_divide*<large 
number>).
The latency number (2 in this version, 66 in the previous) is not harmful to 
the automaton size
and can be as high as needed (if this operation is high latency)....
It doesn’t really matter for any workload we’ve encountered, as the hardware is 
better at dealing with ‘div’-latencies than the scheduler (especially, as ‘div’ 
is variable latency and any guess we have will be wrong… we’ll likely add 
scheduling hook function in the future).
The more important thing is to keep the cost of divides high enough in the 
cost-model.

In other words: 66 would be the worst case and will normally not be correct 
anyway. Furthermore, it’s rather unplausible, that we find 264 instructions 
(for this worst-case scenario) to fill the scheduling bubble between the 
div-insn and its result usage.

Ok, makes sense. I just thought that 2 is a bit too low but if your benchmarking showed it to be reasonable I won't complain ;)

Kyrill


Best,
Philipp.



Reply via email to