On 09/12/2015 04:38 AM, Ajit Kumar Agarwal wrote:


-----Original Message----- From: Jeff Law [mailto:l...@redhat.com]
Sent: Thursday, September 10, 2015 3:10 AM To: Ajit Kumar Agarwal;
Richard Biener Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta;
Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re:
[Patch,tree-optimization]: Add new path Splitting pass on tree ssa
representation

On 08/26/2015 11:29 PM, Ajit Kumar Agarwal wrote:

Thanks. The following testcase testsuite/gcc.dg/tree-ssa/ifc-5.c

void dct_unquantize_h263_inter_c (short *block, int n, int qscale,
int nCoeffs) { int i, level, qmul, qadd;

qadd = (qscale - 1) | 1; qmul = qscale << 1;

for (i = 0; i <= nCoeffs; i++) { level = block[i]; if (level < 0)
level = level * qmul - qadd; else level = level * qmul + qadd;
block[i] = level; } }

The above Loop is a candidate of path splitting as the IF block
merges at the latch of the Loop and the path splitting duplicates
The latch of the loop which is the statement block[i] = level into
the predecessors THEN and ELSE block.
So coming back to this patch -- which IMHO is a candidate for including given it was posted well before stage1 close.

I wonder if as an initial heuristic, we could avoid path splitting when the THEN/ELSE blocks are relatively small and have a well defined structure. If we look at ifc-5.c we have this in the ivcanon dumps for the two key blocks:

;;   basic block 6, loop depth 1, count 0, freq 2457, maybe hot
;;    prev block 5, next block 7, flags: (NEW, REACHABLE)
;;    pred:       5 [27.0%]  (TRUE_VALUE,EXECUTABLE)
  _16 = qmul_7 * level_15;
  level_17 = _16 - qadd_6;
  goto <bb 8>;
;;    succ:       8 [100.0%]  (FALLTHRU,EXECUTABLE)

;;   basic block 7, loop depth 1, count 0, freq 6643, maybe hot
;;    prev block 6, next block 8, flags: (NEW, REACHABLE)
;;    pred:       5 [73.0%]  (FALSE_VALUE,EXECUTABLE)
  _18 = qmul_7 * level_15;
  level_19 = qadd_6 + _18;
;;    succ:       8 [100.0%]  (FALLTHRU,EXECUTABLE)


There's a lot of commonality there.

Alternately, we could look at those blocks and the merge point;


;;   basic block 8, loop depth 1, count 0, freq 9100, maybe hot
;;    prev block 7, next block 9, flags: (NEW, REACHABLE)
;;    pred:       6 [100.0%]  (FALLTHRU,EXECUTABLE)
;;                7 [100.0%]  (FALLTHRU,EXECUTABLE)
  # level_2 = PHI <level_17(6), level_19(7)>
  _20 = (short int) level_2;
  *_13 = _20;
  i_22 = i_24 + 1;
  if (nCoeffs_9(D) >= i_22)
    goto <bb 10>;
  else
    goto <bb 9>;
;;    succ:       10 [91.0%]  (TRUE_VALUE,EXECUTABLE)


And make a guess that there's not going to be any DCE/CSE opportunities if we path split. It might help to see some sample codes where path splitting helps to see if there's patterns in the code that we can look for.




Due to above path splitting,  the IF conversion is disabled and
the above IF-THEN-ELSE is not IF-converted and the test case
fails.
So I think the question then becomes which of the two styles
generally results in better code?  The path-split version or the
older if-converted version.

If the latter, then this may suggest that we've got the path
splitting code at the wrong stage in the optimizer pipeline or
that we need better heuristics for >>when to avoid applying path
splitting.

The code generated by the Path Splitting is useful when it exposes
the DCE, PRE,CCP candidates. Whereas the IF-conversion is useful When
the if-conversion exposes the vectorization candidates.
So perhaps two path splitting passes. One early in the pipeline that runs when we're not vectorizing, and one late that runs if we are vectorizing.

I don't generally like conditionalizing passes like that, but it might be a reasonable way to go in this case if we can't come up with a good heuristic around when to split paths or leave them alone.

Thoughts?


The suggestion for keeping the path splitting later in the pipeline
after the if-conversion and the vectorization is useful as it doesn't
break the Existing Deja GNU tests. Also useful to keep the path
splitting later in the pipeline after the if-conversion and
vectorization is that path splitting Can always duplicate the merge
node into its predecessor after the if-conversion and vectorization
pass, if the if-conversion and vectorization Is not applicable to the
Loops. But this suppresses the CCP, PRE candidates which are earlier
in the optimization pipeline.
Just to be clear, I'm less concerned about specific tests in the dejagnu suite than I am overall performance.

Anyway, I'm going to drop the patch into my local tree and try to get it updated to work with the trunk so I can poke around a bit.



Jeff

Reply via email to