------- Comment #3 from roger at eyesopen dot com 2007-01-11 16:56 ------- Ok, I've now spent some time reading the code, I understand what's going wrong and what needs to be done to fix it. The problem resolves around the "nest_flag" argument to gfc_trans_nested_forall_loop. This argument only ever has the value zero when called from gfc_trans_forall_1 when calculating the mask for a nested forall. Unfortunately, this use is incorrect. Conceptually in a nested forall the mask needs to have one element for each execution/iteration of the assignment statement. So in Paul's reduced testcase in comment #5, the inner mask (translated as needs temp.5) to help in an array of size 4! This is required because the body of the loop may potentially affect either the inner or outer mask condition.
So there's good news and bad news. The bad news is that in the general case, we can't determine how many times the loop body will be iterated at compile-time. When the loop bounds are constant, we can simply multiply together innersize * outersize. But often, the shape of the iteration may be irregular, such as triangular operations: forall(i=1:10,mask1[i]) forall(j=1:i,mask2[i,j]) foo(i,j) or completely irregular such as forall(i=1:10,mask1[i]) forall(j=1:bound[i],mask2[i,j]) bar(i,j) The good news is that the current expansion of forall loops can be restructured to get the fortran9x semantics correct. In the worst case, we need to generate another nested set of loops to determine the mask size, at run-time, before allocating and then populating the array. One nice aspect of this is that the inner mask can take into account how sparse the outermask is. i.e. for bar() above we can do better than sum(bound[1:10]). The bad news is this form of conservative implementation is likely to be less efficient than the current incorrect (and poor) implementation. The good news is that in follow-up patches, we should be able to significantly optimize FORALL much like we now do with WHERE and generate much more efficient code for the common case, with mask elimination, loop fusion etc... For Paul's example in comment #5, it should be possible to implement this loop without any mask arrays, as a single doubly-nested loop. However, the most immediate goal is rewriting gfortran's forall expansion to produce the correct result. Hopefully, I'll have the first in a series of patches to rewrite the basics of nested FORALL expansion soon. Unfortunately, its taking slightly longer than I anticipated, though I've now confirmed that this isn't a simple few-line fix, but a more fundamental design issue in nested forall expansion. Paul, Steve, Please let me know if you see an issue with the above analysis. Hopefully, the three-loop strategy of (i) determine mask size, (ii) populate mask and (iii) conditionally execute loop makes sense? -- roger at eyesopen dot com changed: What |Removed |Added ---------------------------------------------------------------------------- AssignedTo|unassigned at gcc dot gnu |roger at eyesopen dot com |dot org | Status|NEW |ASSIGNED Last reconfirmed|2007-01-08 11:23:23 |2007-01-11 16:56:25 date| | http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30404