[llvm-bugs] [Bug 90870] Loop-pipelining invalid hoisting out

LLVM Bugs via llvm-bugs Thu, 02 May 2024 09:05:45 -0700

Issue	90870
Summary	Loop-pipelining invalid hoisting out
Labels	new issue
Assignees
Reporter	fotiskoun

Loop pipelining may hoist some operations out of the loop, but it does not check that doing so is legal, i.e. that the hoisted operations stay within the original loop bounds. This is incorrect and can lead to unexpected behaviour, such as out-of-bounds accesses, when the accessed structure has fewer elements than the number of hoisted iterations.


Consider the following example:

```
func.func @f(%arg0: memref<4x16xf32>, %arg1: vector<16xf32>) -> vector<16xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %cst = arith.constant 0.000000e+00 : f32
  %c2 = arith.constant 2 : index
  %0 = scf.for %arg2 = %c0 to %c2 step %c1 iter_args(%arg3 = %arg1) -> (vector<16xf32>) {
    %1 = vector.transfer_read %arg0[%arg2, %c0], %cst {in_bounds = [true]} : memref<4x16xf32>, vector<16xf32>
    %2 = arith.addf %1, %arg3 : vector<16xf32>
    scf.yield %2 : vector<16xf32>
  }
  return %0 : vector<16xf32>
}
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%arg1: !transform.any_op {transform.readonly}) {
    %0 = transform.structured.match ops{["scf.for"]} in %arg1 : (!transform.any_op) -> !transform.op<"scf.for">
    %1 = transform.loop.pipeline %0 {iteration_interval = 1 : i64, read_latency = 5 : i64,  scheduling_type = "full-loops"} : (!transform.op<"scf.for">) -> !transform.any_op
    transform.yield
  }
}
```

The output for this example is:

```
module {
  func.func @f(%arg0: memref<4x16xf32>, %arg1: vector<16xf32>) -> vector<16xf32> {
    %c1 = arith.constant 1 : index
    %c0 = arith.constant 0 : index
    %c4 = arith.constant 4 : index
    %c3 = arith.constant 3 : index
    %c2 = arith.constant 2 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = vector.transfer_read %arg0[%c0, %c0], %cst {in_bounds = [true]} : memref<4x16xf32>, vector<16xf32>
    %1 = vector.transfer_read %arg0[%c1, %c0], %cst {in_bounds = [true]} : memref<4x16xf32>, vector<16xf32>
    %2 = vector.transfer_read %arg0[%c2, %c0], %cst {in_bounds = [true]} : memref<4x16xf32>, vector<16xf32>
    %3 = vector.transfer_read %arg0[%c3, %c0], %cst {in_bounds = [true]} : memref<4x16xf32>, vector<16xf32>
    %4 = vector.transfer_read %arg0[%c4, %c0], %cst {in_bounds = [true]} : memref<4x16xf32>, vector<16xf32>
    %5 = vector.transfer_read %arg0[%c1, %c0], %cst {in_bounds = [true]} : memref<4x16xf32>, vector<16xf32>
    %6 = arith.addf %0, %arg1 : vector<16xf32>
    %7 = arith.addf %1, %6 : vector<16xf32>
    %8 = arith.addf %2, %7 : vector<16xf32>
    %9 = arith.addf %3, %8 : vector<16xf32>
    %10 = arith.addf %4, %9 : vector<16xf32>
    %11 = arith.addf %5, %10 : vector<16xf32>
    return %11 : vector<16xf32>
  }
  module attributes {transform.with_named_sequence} {
    transform.named_sequence @__transform_main(%arg0: !transform.any_op {transform.readonly}) {
      %0 = transform.structured.match ops{["scf.for"]} in %arg0 : (!transform.any_op) -> !transform.op<"scf.for">
      %1 = transform.loop.pipeline %0 {read_latency = 5 : i64} : (!transform.op<"scf.for">) -> !transform.any_op
      transform.yield
    }
  }
}
```
The result is incorrect: the original loop reads only `%arg0[0]` and `%arg0[1]`, but after pipelining and unrolling the reads go up to `%arg0[4]`, which is past the last row of a `memref<4x16xf32>`.

The proposed solution checks the loop bounds and limits the unrolling to the minimum of the loop trip count and the provided unrolling argument.
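
The clamping could be sketched roughly as follows. This is only an illustration of the idea, not the actual patch: `constantTripCount` and `clampHoistedIterations` are hypothetical helper names, not MLIR APIs, and the sketch assumes the loop bounds and step are compile-time constants (as they are in the example above).

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>

// Hypothetical helper (not an MLIR API): number of iterations executed by
// `for (i = lb; i < ub; i += step)` when lb, ub and step are constants.
std::optional<int64_t> constantTripCount(int64_t lb, int64_t ub, int64_t step) {
  if (step <= 0)
    return std::nullopt; // scf.for requires a positive step
  if (ub <= lb)
    return 0; // the loop body never runs
  return (ub - lb + step - 1) / step; // ceil((ub - lb) / step)
}

// Hypothetical helper: limit how many iterations may be started ahead of the
// loop so that no hoisted operation uses an induction value outside [lb, ub).
int64_t clampHoistedIterations(int64_t requested,
                               std::optional<int64_t> tripCount) {
  if (!tripCount)
    return 0; // unknown trip count: conservatively hoist nothing
  return std::min(requested, *tripCount);
}
```

For the example above (`%c0` to `%c2` with step `%c1`, `read_latency = 5`), the trip count is 2, so a request for 5 hoisted reads would be clamped to 2 and only rows 0 and 1 would be read.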

_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
