------- Comment #3 from roger at eyesopen dot com  2007-01-11 16:56 -------
Ok, I've now spent some time reading the code, I understand what's going wrong
and what needs to be done to fix it.  The problem resolves around the
"nest_flag" argument to gfc_trans_nested_forall_loop.  This argument only ever
has the value zero when called  from gfc_trans_forall_1 when calculating the
mask for a nested forall.  Unfortunately, this use is incorrect.  Conceptually
in a nested forall the mask needs to have one element for each
execution/iteration of the assignment statement.  So in Paul's reduced testcase
in comment #5, the inner mask (translated as needs temp.5) to help in an array
of size 4!  This is required because the body of the loop may potentially
affect either the inner or outer mask condition.

So there's good news and bad news.

The bad news is that in the general case, we can't determine how many times the
 loop body will be iterated at compile-time.  When the loop bounds are
constant, we can simply multiply together innersize * outersize.  But often,
the shape of the iteration may be irregular, such as triangular operations:

  forall(i=1:10,mask1[i])
    forall(j=1:i,mask2[i,j])
      foo(i,j)

or completely irregular such as

  forall(i=1:10,mask1[i])
    forall(j=1:bound[i],mask2[i,j])
      bar(i,j)

The good news is that the current expansion of forall loops can be restructured
to get the fortran9x semantics correct.  In the worst case, we need to generate
another nested set of loops to determine the mask size, at run-time, before
allocating and then populating the array.  One nice aspect of this is that the
inner mask can take into account how sparse the outermask is.  i.e. for bar()
above we can do better than sum(bound[1:10]).  The bad news is this form of
conservative implementation is likely to be less efficient than the current
incorrect (and poor) implementation.  The good news is that in follow-up
patches, we should be able to significantly optimize FORALL much like we now do
with WHERE and generate much more efficient code for the common case, with mask
elimination, loop fusion etc...  For Paul's example in comment #5, it should be
possible to implement this loop without any mask arrays, as a single
doubly-nested loop.  However, the most immediate goal is rewriting gfortran's
forall expansion to produce the correct result.

Hopefully, I'll have the first in a series of patches to rewrite the basics of
nested FORALL expansion soon.  Unfortunately, its taking slightly longer than
I anticipated, though I've now confirmed that this isn't a simple few-line fix,
but a more fundamental design issue in nested forall expansion.

Paul, Steve, Please let me know if you see an issue with the above analysis.
Hopefully, the three-loop strategy of (i) determine mask size, (ii) populate
mask and (iii) conditionally execute loop makes sense?


-- 

roger at eyesopen dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |roger at eyesopen dot com
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2007-01-08 11:23:23         |2007-01-11 16:56:25
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30404

Reply via email to