Issue: 90870
Summary: Loop-pipelining invalid hoisting out
Labels: new issue
Assignees:
Reporter: fotiskoun
Loop pipelining may hoist operations out of the loop without checking whether this is legal, i.e. without making sure that the hoisted iterations do not exceed the original loop bounds. This is incorrect, and it can also lead to unexpected behaviour (out-of-bounds accesses) when the accessed structure has fewer elements than the number of hoisted iterations.

Consider the following example:

```
func.func @f(%arg0: memref<4x16xf32>, %arg1: vector<16xf32>) -> vector<16xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %cst = arith.constant 0.000000e+00 : f32
  %c2 = arith.constant 2 : index
  %0 = scf.for %arg2 = %c0 to %c2 step %c1 iter_args(%arg3 = %arg1) -> (vector<16xf32>) {
    %1 = vector.transfer_read %arg0[%arg2, %c0], %cst {in_bounds = [true]} : memref<4x16xf32>, vector<16xf32>
    %2 = arith.addf %1, %arg3 : vector<16xf32>
    scf.yield %2 : vector<16xf32>
  }
  return %0 : vector<16xf32>
}
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
    %0 = transform.structured.match ops{["scf.for"]} in %arg1 : (!transform.any_op) -> !transform.op<"scf.for">
    %1 = transform.loop.pipeline %0 {iteration_interval = 1 : i64, read_latency = 5 : i64, scheduling_type = "full-loops"} : (!transform.op<"scf.for">) -> !transform.any_op
    transform.yield
  }
}
```

The output for this example is:

```
module {
  func.func @f(%arg0: memref<4x16xf32>, %arg1: vector<16xf32>) -> vector<16xf32> {
    %c1 = arith.constant 1 : index
    %c0 = arith.constant 0 : index
    %c4 = arith.constant 4 : index
    %c3 = arith.constant 3 : index
    %c2 = arith.constant 2 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = vector.transfer_read %arg0[%c0, %c0], %cst {in_bounds = [true]} : memref<4x16xf32>, vector<16xf32>
    %1 = vector.transfer_read %arg0[%c1, %c0], %cst {in_bounds = [true]} : memref<4x16xf32>, vector<16xf32>
    %2 = vector.transfer_read %arg0[%c2, %c0], %cst {in_bounds = [true]} : memref<4x16xf32>, vector<16xf32>
    %3 = vector.transfer_read %arg0[%c3, %c0], %cst {in_bounds = [true]} : memref<4x16xf32>, vector<16xf32>
    %4 = vector.transfer_read %arg0[%c4, %c0], %cst {in_bounds = [true]} : memref<4x16xf32>, vector<16xf32>
    %5 = vector.transfer_read %arg0[%c1, %c0], %cst {in_bounds = [true]} : memref<4x16xf32>, vector<16xf32>
    %6 = arith.addf %0, %arg1 : vector<16xf32>
    %7 = arith.addf %1, %6 : vector<16xf32>
    %8 = arith.addf %2, %7 : vector<16xf32>
    %9 = arith.addf %3, %8 : vector<16xf32>
    %10 = arith.addf %4, %9 : vector<16xf32>
    %11 = arith.addf %5, %10 : vector<16xf32>
    return %11 : vector<16xf32>
  }
  module attributes {transform.with_named_sequence} {
    transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
      %0 = transform.structured.match ops{["scf.for"]} in %arg0 : (!transform.any_op) -> !transform.op<"scf.for">
      %1 = transform.loop.pipeline %0 {read_latency = 5 : i64} : (!transform.op<"scf.for">) -> !transform.any_op
      transform.yield
    }
  }
}
```
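The result is incorrect: the original loop only reads `%arg0[0]` and `%arg0[1]`, but after pipelining and unrolling the reads go up to `%arg0[4]`. Row 4 lies outside the `memref<4x16xf32>` (valid rows are 0 to 3), so that read is out of bounds, and the returned sum also includes rows 2 to 4, which the original loop never reads (row 1 is even added twice).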
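To make the mismatch concrete, the small script below lists the row indices of the six `vector.transfer_read` ops in the pipelined output by hand and compares them with the rows the original `scf.for` reads and with the rows that `memref<4x16xf32>` actually has. It is only an illustrative check written for this report; the helper names `original_reads` and `classify` are hypothetical and not part of MLIR.

```python
# Hypothetical check: compare the rows read by the pipelined IR with the rows
# read by the original loop and the rows that memref<4x16xf32> actually has.

# Original loop: scf.for %arg2 = %c0 to %c2 step %c1 reads %arg0[%arg2, %c0].
LB, UB, STEP = 0, 2, 1
NUM_ROWS = 4  # leading dimension of memref<4x16xf32>; valid rows are 0..3

# Row indices of the six vector.transfer_read ops (%0..%5) in the pipelined output.
PIPELINED_READS = [0, 1, 2, 3, 4, 1]


def original_reads(lb: int, ub: int, step: int) -> list[int]:
    """Rows read by the original loop, one per iteration."""
    return list(range(lb, ub, step))


def classify(reads: list[int], lb: int, ub: int, step: int, num_rows: int):
    """Return (rows the original loop never reads, rows outside the memref)."""
    allowed = set(original_reads(lb, ub, step))
    extra = [r for r in reads if r not in allowed]
    out_of_bounds = [r for r in reads if not 0 <= r < num_rows]
    return extra, out_of_bounds


if __name__ == "__main__":
    extra, oob = classify(PIPELINED_READS, LB, UB, STEP, NUM_ROWS)
    print("original loop reads rows:", original_reads(LB, UB, STEP))  # [0, 1]
    print("rows read only after pipelining:", extra)  # [2, 3, 4]
    print("rows outside memref<4x16xf32>:", oob)  # [4]
```

Three of the six reads fall outside the original iteration space, and one of them is also outside the memref itself.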

The proposed solution checks the loop bounds and limits the unrolling to the minimum of the number of iterations allowed by those bounds and the provided unrolling argument.
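The clamping arithmetic can be sketched as follows, assuming constant loop bounds and a positive step as in the example (when the bounds are not constants the trip count is not known statically, and this sketch does not cover that case). `trip_count` and `clamped_unroll` are hypothetical names used only for illustration; `requested` stands for whatever unrolling amount the transform derives from its arguments, and passing 5 (matching `read_latency = 5`) is an assumption about that mapping, not a statement of how the actual patch computes it.

```python
# Hypothetical helpers illustrating the clamp: never unroll/hoist more
# iterations than the loop actually executes. Assumes constant bounds.


def trip_count(lb: int, ub: int, step: int) -> int:
    """Iterations executed by `scf.for %i = lb to ub step step`."""
    if step <= 0:
        raise ValueError("step must be positive")
    if ub <= lb:
        return 0
    return (ub - lb + step - 1) // step  # ceil((ub - lb) / step)


def clamped_unroll(lb: int, ub: int, step: int, requested: int) -> int:
    """Unroll no further than the loop actually iterates."""
    return min(trip_count(lb, ub, step), requested)


if __name__ == "__main__":
    # Loop from the example: %c0 to %c2 step %c1, with a requested amount of 5.
    print(clamped_unroll(0, 2, 1, 5))  # 2: only rows 0 and 1 may be read
    # A longer loop keeps the requested amount.
    print(clamped_unroll(0, 16, 1, 5))  # 5
```

With the example's bounds the clamp yields 2, so only `%arg0[0]` and `%arg0[1]` would be read, matching the original loop.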
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
