Hi!

On 2021-03-17T08:46:16+0100, Richard Biener <richard.guent...@gmail.com> wrote:
> On Wed, Mar 17, 2021 at 12:25 AM Thomas Schwinge
> <tho...@codesourcery.com> wrote:
>> This is a prototype/"initial hack" for a very simple implementation of
>> <https://gcc.gnu.org/PR90591> "Avoid unnecessary data transfer out of OMP
>> construct".  (... to be completed and posted later.)  So simple that it
>> will easily fail (gracefully, of course), but yet is effective for a lot
>> of real-world code:
>>
>>     subroutine [...]
>>     [...]
>>     real([...])   xx        !temporary variable, for distance calculation
>>     [...]
>>     !$acc kernels pcopyin(x, zii) reduction(+:eva) ! implicit 'copy(xx)' for 
>> scalar used inside region; established during gimplification
>>           do 100 i=0,n-1
>>             evx=0.0d0
>>             do 90 j=0,n-1
>>               xx=abs(x(1,i)-x(1,j))
>>     [...]
>>     !$acc end kernels
>>     [...]
>>     ['xx' never used here]
>>     end subroutine [...]
>>
>> Inside 'kernels', we'd like to automatically parallelize loops (we've
>> basically got that working; analysis by Graphite etc.), but the problem
>> is that given "implicit 'copy(xx)' for scalar used inside region", when
>> Graphite later looks at the outlined 'kernels' region's function, it must
>> assume that 'xx' is still live after the OpenACC 'kernels' construct --
>> and thus cannot treat it as a thread-private temporary, cannot
>> parallelize.
>>
>> Now, walking each function backwards (!), I'm taking note of any
>> variables' uses, and if I reach an 'kernels' construct, but have not seen
>> a use of 'xx', I may then optimize 'copy(xx)' -> 'firstprivate(xx)',
>> enabling Graphite to do its thing.  (Other such optimizations may be
>> added later.)  (This is inspired by Jakub's commit
>> 1a80d6b87d81c3f336ab199a901cf80ae349c335 "re PR tree-optimization/68128
>> (A huge regression in Parboil v2.5 OpenMP CUTCP test (2.5 times lower
>> performance))".)
>>
>> I've now got a simple 'callback_op', which for '!is_lhs' looks at
>> 'get_base_address ([op])', and if that 'var' is contained in the set of
>> current candidates (initialized per containg 'bind's, which we enter
>> first, even if walking a 'gimple_seq' backwards), removes that 'var' as a
>> candidate for such optimization.  (Plus some "details", of couse.)  This
>> seems to work fine, as far as I can tell.  :-)
>
> It might then still fail for x = a[var] when you are interested in 'var'.

That already works as expected:

    gimple_assign <array_ref, x, a[var], NULL, NULL>

..., and in the debug log ('OP' is 'get_base_address (OPERAND)'), I see:

    ##### LOOKING INTO OPERAND:  <array_ref 0x7ffff67af3b8>
    ###### OP:  <var_decl 0x7ffff7feeb40 a>
    ####### WASN'T CANDIDATE
    ##### LOOKING INTO OPERAND:  <var_decl 0x7ffff7feeb40 a>
    ###### OP:  <var_decl 0x7ffff7feeb40 a>
    ####### WASN'T CANDIDATE
    ##### LOOKING INTO OPERAND:  <var_decl 0x7ffff7feebd0 var>
    ###### OP:  <var_decl 0x7ffff7feebd0 var>
    ####### NO LONGER CANDIDATE
    ##### LOOKING INTO OPERAND:  <var_decl 0x7ffff7feec60 x>
    ###### IGNORED: IS_LHS

Notice 'var' "NO LONGER CANDIDATE".

Well, I cross-checked, and I'm getting this behavior (deep scanning)
because I'm (intentionally) using (default) '*walk_subtrees = 1' in my
'callback_op'.  Indeed if I specify '*walk_subtrees = 0', the debug log
changes as follows:

     ##### LOOKING INTO OPERAND:  <array_ref 0x7ffff67af3b8>
     ###### OP:  <var_decl 0x7ffff7feeb40 a>
     ####### WASN'T CANDIDATE
    -##### LOOKING INTO OPERAND:  <var_decl 0x7ffff7feeb40 a>
    -###### OP:  <var_decl 0x7ffff7feeb40 a>
    -####### WASN'T CANDIDATE
    -##### LOOKING INTO OPERAND:  <var_decl 0x7ffff7feebd0 var>
    -###### OP:  <var_decl 0x7ffff7feebd0 var>
    -####### NO LONGER CANDIDATE
     ##### LOOKING INTO OPERAND:  <var_decl 0x7ffff7feec60 x>
     ###### IGNORED: IS_LHS

..., and your projected mis-optimization:

    +##### OPTIMIZING map(force_tofrom:var [len: 4]) -> firstprivate(var)

If I continue to use (default) '*walk_subtrees = 1', then I might not
actually need to use 'get_base_address', because we get all the
individual sub operands visited.

> I think you want to use walk_gimple_stmt and provide walk_tree_fn which
> will recurse into the complex tree operands (also making get_base_address
> unnecessary).

I'm already using 'walk_gimple_stmt' via 'walk_gimple_seq', and I'll
(later) look into removing 'get_base_address', and also look up
'walk_tree_fn'.  Ah, wait, what you suggested with 'walk_tree_fn' is
actually what I've already got; in my last email I said "I've now got a
simple 'callback_op' [...]".  Seems we're converging!  ;-)


Grüße
 Thomas


>> Of course, the eventual IPA-based solution (see PR90591, PR94693, etc.)
>> will be much better -- but we need something now.
>>
>>
>> Grüße
>>  Thomas
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf

Reply via email to