On 09/12/2015 04:38 AM, Ajit Kumar Agarwal wrote:
-----Original Message----- From: Jeff Law [mailto:l...@redhat.com]
Sent: Thursday, September 10, 2015 3:10 AM To: Ajit Kumar Agarwal;
Richard Biener Cc: GCC Patches; Vinod Kathail; Shail Aditya Gupta;
Vidhumouli Hunsigida; Nagaraju Mekala Subject: Re:
[Patch,tree-optimization]: Add new path Splitting pass on tree ssa
representation
On 08/26/2015 11:29 PM, Ajit Kumar Agarwal wrote:
Thanks. The following testcase testsuite/gcc.dg/tree-ssa/ifc-5.c
void dct_unquantize_h263_inter_c (short *block, int n, int qscale,
int nCoeffs) { int i, level, qmul, qadd;
qadd = (qscale - 1) | 1; qmul = qscale << 1;
for (i = 0; i <= nCoeffs; i++) { level = block[i]; if (level < 0)
level = level * qmul - qadd; else level = level * qmul + qadd;
block[i] = level; } }
The above Loop is a candidate of path splitting as the IF block
merges at the latch of the Loop and the path splitting duplicates
The latch of the loop which is the statement block[i] = level into
the predecessors THEN and ELSE block.
So coming back to this patch -- which IMHO is a candidate for including
given it was posted well before stage1 close.
I wonder if as an initial heuristic, we could avoid path splitting when
the THEN/ELSE blocks are relatively small and have a well defined
structure. If we look at ifc-5.c we have this in the ivcanon dumps for
the two key blocks:
;; basic block 6, loop depth 1, count 0, freq 2457, maybe hot
;; prev block 5, next block 7, flags: (NEW, REACHABLE)
;; pred: 5 [27.0%] (TRUE_VALUE,EXECUTABLE)
_16 = qmul_7 * level_15;
level_17 = _16 - qadd_6;
goto <bb 8>;
;; succ: 8 [100.0%] (FALLTHRU,EXECUTABLE)
;; basic block 7, loop depth 1, count 0, freq 6643, maybe hot
;; prev block 6, next block 8, flags: (NEW, REACHABLE)
;; pred: 5 [73.0%] (FALSE_VALUE,EXECUTABLE)
_18 = qmul_7 * level_15;
level_19 = qadd_6 + _18;
;; succ: 8 [100.0%] (FALLTHRU,EXECUTABLE)
There's a lot of commonality there.
Alternately, we could look at those blocks and the merge point;
;; basic block 8, loop depth 1, count 0, freq 9100, maybe hot
;; prev block 7, next block 9, flags: (NEW, REACHABLE)
;; pred: 6 [100.0%] (FALLTHRU,EXECUTABLE)
;; 7 [100.0%] (FALLTHRU,EXECUTABLE)
# level_2 = PHI <level_17(6), level_19(7)>
_20 = (short int) level_2;
*_13 = _20;
i_22 = i_24 + 1;
if (nCoeffs_9(D) >= i_22)
goto <bb 10>;
else
goto <bb 9>;
;; succ: 10 [91.0%] (TRUE_VALUE,EXECUTABLE)
And make a guess that there's not going to be any DCE/CSE opportunities
if we path split. It might help to see some sample codes where path
splitting helps to see if there's patterns in the code that we can look for.
Due to above path splitting, the IF conversion is disabled and
the above IF-THEN-ELSE is not IF-converted and the test case
fails.
So I think the question then becomes which of the two styles
generally results in better code? The path-split version or the
older if-converted version.
If the latter, then this may suggest that we've got the path
splitting code at the wrong stage in the optimizer pipeline or
that we need better heuristics for >>when to avoid applying path
splitting.
The code generated by the Path Splitting is useful when it exposes
the DCE, PRE,CCP candidates. Whereas the IF-conversion is useful When
the if-conversion exposes the vectorization candidates.
So perhaps two path splitting passes. One early in the pipeline that
runs when we're not vectorizing, and one late that runs if we are
vectorizing.
I don't generally like conditionalizing passes like that, but it might
be a reasonable way to go in this case if we can't come up with a good
heuristic around when to split paths or leave them alone.
Thoughts?
The suggestion for keeping the path splitting later in the pipeline
after the if-conversion and the vectorization is useful as it doesn't
break the Existing Deja GNU tests. Also useful to keep the path
splitting later in the pipeline after the if-conversion and
vectorization is that path splitting Can always duplicate the merge
node into its predecessor after the if-conversion and vectorization
pass, if the if-conversion and vectorization Is not applicable to the
Loops. But this suppresses the CCP, PRE candidates which are earlier
in the optimization pipeline.
Just to be clear, I'm less concerned about specific tests in the dejagnu
suite than I am overall performance.
Anyway, I'm going to drop the patch into my local tree and try to get it
updated to work with the trunk so I can poke around a bit.
Jeff