Re: [PATCH] Atom: Scheduler improvements for better imul placement

Andrey Belevantsev Fri, 13 Apr 2012 00:16:28 -0700

On 12.04.2012 18:22, Richard Guenther wrote:

2012/4/12 Andrey Belevantsev<a...@ispras.ru>:

On 12.04.2012 17:54, Richard Guenther wrote:


2012/4/12 Andrey Belevantsev<a...@ispras.ru>:


On 12.04.2012 16:38, Richard Guenther wrote:



On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin<izamya...@gmail.com>
  wrote:



On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther
<richard.guent...@gmail.com>      wrote:



On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov<amona...@ispras.ru>
  wrote:

Can atom execute two IMUL in parallel?  Or what exactly is the
pipeline
behavior?




As I understand from Intel's optimization reference manual, the
behavior is as
follows: if the instruction immediately following IMUL has shorter
latency,
execution is stalled for 4 cycles (which is IMUL's latency);
otherwise,
a
4-or-more cycles latency instruction can be issued after IMUL without
a
stall.
In other words, IMUL is pipelined with respect to other long-latency
instructions, but not to short-latency instructions.




It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
                    "atom-imul-1, atom-imul-2, atom-imul-3,
atom-imul-4,
                     atom-port-0")

;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")


The main idea is quite simple:

If we are going to schedule IMUL instruction (it is on the top of
ready list) we try to find out producer of other (independent) IMUL
instruction that is in ready list too. The goal is try to schedule
such a producer to get another IMUL in ready list and get scheduling
of 2 successive IMUL instructions.




Why does that not happen without your patch?  Does it never happen
without
your patch or does it merely not happen for one EEMBC benchmark (can
you provide a testcase?)?




It does not happen because the scheduler by itself does not do such
specific
reordering.  That said, it is easy to imagine the cases where this patch
will make things worse rather than better.



That surprises me.  What is so specific about this reordering?



I mean that the scheduler does things like "sort the ready list according to
a number of heuristics and to the insn priority, then choose the insn that
would allow the maximum of ready insns to be issued on the current cycle".
  The heuristics are rather general.  The scheduler does not do things like
"if some random insn is ready, then choose some other random insn from the
ready list and schedule it" (which is what the patch does). This is what
scheduler hooks are for, to account for some target-specific heuristic.

The problem is that this particular implementation looks somewhat like an
overkill and also motivated by a single benchmark.  Testing on a wider set
of benchmarks and checking compile-time hit would make the motivation more
convincing.


Yeah, and especially looking _why_ the generic heuristics are not working
and if they could be improved.  After all the issue seems to be properly
modeled in the DFA.

It is, but the DFA only models the actual stalls that you get when an imulis scheduled. What you need here is to increase priority for ready insnsthat have imuls in their critical paths, and those imuls should be closeenough to fill in the gap (actual consumers in the case of the patch).

The lookahead done in max_issue can help with this "dynamic" prioritychange in general, but it considers just the ready insns, not theirimmediate consumers. You need to make the deeper lookahead if you want todo this in a target independent way -- no-no from compile time POV. Anumber of already existing hooks can help:

- sched_reorder used in this patch just sorts the ready list in any way thetarget wishes;

- adjust_priority/adjust_cost helps to boost or to lower priority in suchsubtle cases;

- is_costly_dependence (now rs6000 only) can guide the early queue insnremoval and its addition to the ready list;

- dfa_lookahead_guard (now ia64 only) can ban some insns from being issuedon the current try.

Using sched_reorder is existing practice, like in s390, spu, mips, etc.Usually though an implementation again only considers the actual readyinsns and not the successors. I'd say that trying more tests would help,or other hooks can be tried, too, but after all this is Atom specific, soit's the decision of i386 maintainers.


Andrey


Richard.

Andrey

Igor, why not try different subtler mechanisms like adjust_priority,
which
is get called when an insn is added to the ready list?  E.g. increase the
producer's priority.

The patch as is misses checks for NONDEBUG_INSN_P.  Also, why bail out
when
you have more than one imul in the ready list?  Don't you want to bump
the
priority of the other imul found?

Andrey

And MD allows us only prefer scheduling of successive IMUL
instructions,
i.e.
If IMUL was just scheduled and ready list contains another IMUL
instruction then it will be chosen as the best candidate for
scheduling.

at least from my very limited guessing of what the above does.  So,
did
you
analyze why the scheduler does not properly schedule things for you?

Richard.


  From reading the patch, I could not understand the link between
pipeline
behavior and what the patch appears to do.

Hope that helps.

Alexander

Re: [PATCH] Atom: Scheduler improvements for better imul placement

Reply via email to