loopacino wrote: > Please create separate PRs for Clang and MLIR. These are separate > implementations don't need to be reviewed together. I am focusing on Clang in > this PR. > > None of the added test are actually requiring the intratile loops to be > canonical. That is, something where an surrounding loop-associated directives > affects a grid/floor loop as well as a intratile loop at the same time. For > instance: > > ```c++ > #pragma omp for collapse(2) > #pragma omp tile size(2) > for (int i = 0; i < n; ++i) > ... > ``` > > or two-level tiling: > > ```c++ > #pragma omp tile sizes(3, 5) > #pragma omp tile size(2) > for (int i = 0; i < n; ++i) > ... > ``` > > `IntraTileMustBeCanonical` is an ok-ish first heuristic, a refinement would > be to which of the tile loops are actually affected and thus would need to be > canonical. For instance: > > ```c++ > #pragma omp for collapse(3) > #pragma omp tile size(2, 3) > for (int i = 0; i < n; ++i) > for (int j = 0; j < n; ++j) > ... > ``` > > here, the intratile-loop for `j` does not need to be canonical, but `i` does. > Because as you found out emitting the canonical version of the loop inhibits > vectorization, it would be quite important to eventually have this. > > Because Clang implements loop-transformations using AST transforms, I think > it would be easier and more accurate if we always emit the canonical form, > but emit a marker that allows to optimize the canonical form to the > non-canonical one later, as suggested > [here](https://github.com/llvm/llvm-project/pull/191114#issuecomment-4313291974). > For instance, it could also emit an alternatve AST depending on the > situation we are in, like > `OMPCanonicalLoopNestTransformationDirective::getTransformed()` does. Imagine > such an AST: > > ``` > OMPLoopAlternative > `- getCanonical(): ForStmt // Predicated version when canonical version is > needed > `- for (int i = 0; i < n; ++) > `- if (i < .floor.iv+tilesize) > `- getOptimized(): ForStmt // Vectorizable version when not used as an > affected loop > `- for (int i = 0; i < min(n, .floor.iv+tilesize; ++) > ``` > > Storing both versions at the same time might not be the best solution due to > the overhead of storing both. Instance, it could just add a marker (such as > `AttributedStmt`) to the loop which says that the ForStmt can be emitted as > the optimized one at CodeGen, i.e.: > > ``` > for (int i = 0; i < n; ++) > `- AttributedStmt __attribute__((__invariant_pred_bound)) // States that the > RHS is invariant to the loop using the LHS as IV > `- if (i < .floor.iv+tilesize) > `- ... > ``` > > is emitted as > > ``` > for (int i = 0; i < min(n, .floor.iv+tilesize); ++) > `- ... > ``` > > I think both approaches are viable, but we don't need to start with a > `IntraTileMustBeCanonical` if eventually we are going to do the second. Which > approach would you prefer? > > When I try this out, I get 3 failing tests: > > ``` > Failed Tests (3): > libomp :: transform/tile/foreach.cpp > libomp :: transform/tile/intfor.c > libomp :: transform/tile/iterfor.cpp > ``` > > Test updates in this PR are adding more iterator evaluations, which assume is > because mostly because fewer invariant values are precomputed? Ideally, If > `IntraTileMustBeCanonical` is false it generates the same code. This should > actually be easier to do with an `IntraTileMustBeCanonical` heursitic than > with the marker approach.
1. Reverted MLIR changes 2. As suggested, went with the marker-based approach. 3. All tests pass now. https://github.com/llvm/llvm-project/pull/191114 _______________________________________________ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
