Issue 91555
Summary [flang] Debug information with MLIR inlining
Labels flang
Reporter vzakhari
    This affects the design for debug information generation in Flang:

Right now, the `AddDebugInfo` pass is run pretty late in the pipeline.  The MLIR inliner pass runs before it.  Even though the MLIR inlining is not fully functional, there is a way to force inlining for some simple functions and demonstrate the problem.

     1  subroutine test(x, y)
     2    real :: x, y
     3    x = x + y ! may alias
     4    call inner(x, y)
     5  contains
     6    subroutine inner(x, y)
     7      real :: x, y
     8      x = x + y ! may not alias
     9    end subroutine inner
    10  end subroutine test

Compile with: `flang-new -g -O3 alias.f90 -c -mmlir -mlir-print-ir-after-all -mmlir -inline-all=true -mmlir -mlir-print-debuginfo -v -mllvm -print-after-all -mllvm -print-module-scope`

Here is the location information that is attached to the operations after the MLIR inliner pass:
#loc1 = loc("/path/alias.f90":1:1)
module attributes {dlti.dl_spec = #dlti.dl_spec<#dlti.dl_entry<i32, dense<32> : vector<2xi64>>, #dlti.dl_entry<f64, dense<64> : vector<2xi64>>, #dlti.dl_entry<f128, dense<128> : vector<2xi64>>, #dlti.dl_entry<!llvm.ptr<270>, dense<32> : vector<4xi64>>, #dlti.dl_entry<f16, dense<16> : vector<2xi64>>, #dlti.dl_entry<i1, dense<8> : vector<2xi64>>, #dlti.dl_entry<!llvm.ptr, dense<64> : vector<4xi64>>, #dlti.dl_entry<i16, dense<16> : vector<2xi64>>, #dlti.dl_entry<i8, dense<8> : vector<2xi64>>, #dlti.dl_entry<i128, dense<128> : vector<2xi64>>, #dlti.dl_entry<f80, dense<128> : vector<2xi64>>, #dlti.dl_entry<!llvm.ptr<272>, dense<64> : vector<4xi64>>, #dlti.dl_entry<!llvm.ptr<271>, dense<32> : vector<4xi64>>, #dlti.dl_entry<i64, dense<64> : vector<2xi64>>, #dlti.dl_entry<"dlti.endianness", "little">, #dlti.dl_entry<"dlti.stack_alignment", 128 : i64>>, fir.defaultkind = "a1c4d8i4l4r4", fir.kindmap = "", fir.target_cpu = "x86-64", llvm.data_layout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128", llvm.target_triple = "x86_64-unknown-linux-gnu"} {
  func.func @_QPtest(%arg0: !fir.ref<f32> {fir.bindc_name = "x"} loc("/path/alias.f90":1:1), %arg1: !fir.ref<f32> {fir.bindc_name = "y"} loc("/path/alias.f90":1:1)) {
    %0 = fir.dummy_scope : !fir.dscope loc(#loc1)
    %1 = fir.declare %arg0 dummy_scope %0 {uniq_name = "_QFtestEx"} : (!fir.ref<f32>, !fir.dscope) -> !fir.ref<f32> loc(#loc2)
    %2 = fir.declare %arg1 dummy_scope %0 {uniq_name = "_QFtestEy"} : (!fir.ref<f32>, !fir.dscope) -> !fir.ref<f32> loc(#loc3)
    %3 = fir.load %1 : !fir.ref<f32> loc(#loc4)
    %4 = fir.load %2 : !fir.ref<f32> loc(#loc4)
    %5 = arith.addf %3, %4 fastmath<contract> : f32 loc(#loc4)
    fir.store %5 to %1 : !fir.ref<f32> loc(#loc4)
    %6 = fir.dummy_scope : !fir.dscope loc(#loc11)
    %7 = fir.declare %1 dummy_scope %6 {uniq_name = "_QFtestFinnerEx"} : (!fir.ref<f32>, !fir.dscope) -> !fir.ref<f32> loc(#loc12)
    %8 = fir.declare %2 dummy_scope %6 {uniq_name = "_QFtestFinnerEy"} : (!fir.ref<f32>, !fir.dscope) -> !fir.ref<f32> loc(#loc13)
    %9 = fir.load %7 : !fir.ref<f32> loc(#loc14)
    %10 = fir.load %8 : !fir.ref<f32> loc(#loc14)
    %11 = arith.addf %9, %10 fastmath<contract> : f32 loc(#loc14)
    fir.store %11 to %7 : !fir.ref<f32> loc(#loc14)
    return loc(#loc10)
  } loc(#loc1)
} loc(#loc)
#loc = loc("/path/alias.f90":0:0)
#loc2 = loc("/path/alias.f90":2:11)
#loc3 = loc("/path/alias.f90":2:14)
#loc4 = loc("/path/alias.f90":3:3)
#loc5 = loc("/path/alias.f90":6:3)
#loc6 = loc("/path/alias.f90":4:3)
#loc7 = loc("/path/alias.f90":7:13)
#loc8 = loc("/path/alias.f90":7:16)
#loc9 = loc("/path/alias.f90":8:5)
#loc10 = loc("/path/alias.f90":10:1)
#loc11 = loc(callsite(#loc5 at #loc6))
#loc12 = loc(callsite(#loc7 at #loc6))
#loc13 = loc(callsite(#loc8 at #loc6))
#loc14 = loc(callsite(#loc9 at #loc6))
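
To read the callsite locations in the dump above, it can help to expand the `#loc` aliases mechanically. The following is a minimal Python sketch that only looks at a few alias lines copied from the dump; the names `ALIASES`, `parse` and `resolve` are illustrative and are not part of any MLIR tooling.

```python
import re

# Location aliases copied from the dump above (only the ones needed here).
ALIASES = """
#loc5 = loc("/path/alias.f90":6:3)
#loc6 = loc("/path/alias.f90":4:3)
#loc9 = loc("/path/alias.f90":8:5)
#loc11 = loc(callsite(#loc5 at #loc6))
#loc14 = loc(callsite(#loc9 at #loc6))
"""

FILE_LOC = re.compile(r'(#loc\d+) = loc\("([^"]+)":(\d+):(\d+)\)')
CALLSITE = re.compile(r'(#loc\d+) = loc\(callsite\((#loc\d+) at (#loc\d+)\)\)')


def parse(text):
    """Build a table: alias -> ('file', line, col) or ('callsite', callee, caller)."""
    table = {}
    for line in text.splitlines():
        m = FILE_LOC.match(line)
        if m:
            table[m[1]] = ("file", int(m[3]), int(m[4]))
            continue
        m = CALLSITE.match(line)
        if m:
            table[m[1]] = ("callsite", m[2], m[3])
    return table


def resolve(table, name):
    """Expand an alias to (line, col), or to a (callee, caller) pair for callsites."""
    entry = table[name]
    if entry[0] == "file":
        return entry[1:]
    return resolve(table, entry[1]), resolve(table, entry[2])


table = parse(ALIASES)
callee, caller = resolve(table, "#loc14")
print(f"#loc14: callee at {callee}, inlined at {caller}")
```

For `#loc14` the callee part is line 8 of `alias.f90` (the assignment inside `inner`) and the caller part is line 4 (the `call inner(x, y)` statement), so the callee line is still present in the MLIR location at this point.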

The MLIR inliner implementation produces a callsite location for each operation inlined from `inner`.  After conversion to LLVM IR it looks like this:
define void @test_(ptr %0, ptr %1) #0 !dbg !5 {
  %3 = load float, ptr %0, align 4, !dbg !9, !tbaa !10
  %4 = load float, ptr %1, align 4, !dbg !9, !tbaa !16
  %5 = fadd contract float %3, %4, !dbg !9
  store float %5, ptr %0, align 4, !dbg !9, !tbaa !10
  %6 = load float, ptr %0, align 4, !dbg !18, !tbaa !10
  %7 = load float, ptr %1, align 4, !dbg !18, !tbaa !16
  %8 = fadd contract float %6, %7, !dbg !18
  store float %8, ptr %0, align 4, !dbg !18, !tbaa !10
  ret void, !dbg !19
}
!3 = distinct !DICompileUnit(language: DW_LANG_Fortran95, file: !4, producer: "flang version 19.0.0 (ssh:// e08863ed78aa96096f04f0856069da8645944ab1)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug)
!4 = !DIFile(filename: "alias.f90", directory: "/path")
!5 = distinct !DISubprogram(name: "test_", linkageName: "test_", scope: !4, file: !4, line: 1, type: !6, scopeLine: 1, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !3)
!6 = !DISubroutineType(cc: DW_CC_normal, types: !7)
!7 = !{!8, !8}
!8 = !DIBasicType(name: "real", size: 32, encoding: DW_ATE_float)
!9 = !DILocation(line: 3, column: 3, scope: !5)
!18 = !DILocation(line: 4, column: 3, scope: !5)

So all instructions inlined from `inner` end up on the same line as the call site.  This is a problem for debugging. 
Compare it with LLVM inlining in this example: the inlined instructions inherit their line number information from the callee (look for the last instance of `; *** IR Dump After InlinerPass on (test) ***`).
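
The line assignment can be checked directly against the LLVM IR excerpt. This is a small Python sketch that maps each `!dbg` attachment to the `line:` of its `!DILocation`; it only understands the two textual patterns shown above, and `source_lines` is a hypothetical helper name.

```python
import re

# Instructions and DILocation entries copied from the LLVM IR above.
IR = """
%3 = load float, ptr %0, align 4, !dbg !9, !tbaa !10
%4 = load float, ptr %1, align 4, !dbg !9, !tbaa !16
%5 = fadd contract float %3, %4, !dbg !9
store float %5, ptr %0, align 4, !dbg !9, !tbaa !10
%6 = load float, ptr %0, align 4, !dbg !18, !tbaa !10
%7 = load float, ptr %1, align 4, !dbg !18, !tbaa !16
%8 = fadd contract float %6, %7, !dbg !18
store float %8, ptr %0, align 4, !dbg !18, !tbaa !10
!9 = !DILocation(line: 3, column: 3, scope: !5)
!18 = !DILocation(line: 4, column: 3, scope: !5)
"""

DI_LOC = re.compile(r"(!\d+) = !DILocation\(line: (\d+), column: (\d+)")
DBG_USE = re.compile(r"(.*?), !dbg (!\d+)")


def source_lines(ir):
    """Return (instruction, source line) pairs for every instruction with !dbg."""
    lines = [l.strip() for l in ir.splitlines() if l.strip()]
    # First pass: collect metadata id -> source line.
    locs = {}
    for l in lines:
        m = DI_LOC.match(l)
        if m:
            locs[m[1]] = int(m[2])
    # Second pass: attach the source line to each instruction.
    rows = []
    for l in lines:
        m = DBG_USE.match(l)
        if m and m[2] in locs:
            rows.append((m[1], locs[m[2]]))
    return rows


for instruction, line in source_lines(IR):
    print(f"line {line}: {instruction}")
```

The second group of four instructions, which comes from `inner`, is reported on line 4 (the call statement); line 8, where the assignment in `inner` actually lives, does not appear at all.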

I believe the problem is that the MLIR Locations attached to the operations of `inner` *before* MLIR inlining do not have a proper `LLVM::DILocalScopeAttr` attached to them.  So there is no scope attribute on the callee Locations in the callsite Locations created by the MLIR inliner.  One can trace the call chain from to, which results in the callsite Locations being translated to LLVM debug metadata that corresponds to the call operation location.  I think that, instead, the callee Locations inside the callsite Locations should follow the call chain from to, so that the callee's Locations are used for the cloned operations with the proper scope.

This seems to mean that, before the MLIR inliner runs, we need Locations with a proper scope (e.g. `LLVM::DISubprogramAttr`) attached to them. This cannot be achieved with the current ordering of the MLIR inliner and `AddDebugInfo` passes.
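
The effect of the missing scope can be illustrated with a toy model. The Python sketch below is not the MLIR translation code; it only mimics the behaviour described above, under the assumption that a callee Location without a scope cannot be turned into debug metadata and that the translation then falls back to the caller Location. All class and function names (`FileLoc`, `CallSite`, `DILocation`, `translate`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class FileLoc:
    line: int
    col: int
    # Stands in for an attached LLVM::DILocalScopeAttr (assumption of this model).
    scope: Optional[str] = None


@dataclass(frozen=True)
class CallSite:
    callee: "Loc"
    caller: "Loc"


Loc = Union[FileLoc, CallSite]


@dataclass(frozen=True)
class DILocation:
    line: int
    col: int
    scope: str
    inlined_at: Optional["DILocation"] = None


def translate(loc, scope=None, inlined_at=None):
    """Toy translation of a Location into a DILocation (or None without a scope)."""
    if isinstance(loc, CallSite):
        # The caller is translated in the enclosing function's scope ...
        caller = translate(loc.caller, scope, inlined_at)
        # ... and the callee must bring its own scope to be usable.
        callee = translate(loc.callee, None, caller)
        # Fallback modelled here: ignore a callee that has no scope.
        return callee if callee is not None else caller
    scope = loc.scope or scope
    if scope is None:
        return None
    return DILocation(loc.line, loc.col, scope, inlined_at)


call = FileLoc(4, 3)  # call inner(x, y)

# Current ordering: the inliner runs first, so the callee Location has no scope.
today = translate(CallSite(FileLoc(8, 5), call), scope="test")

# Hypothetical ordering: scopes are attached before inlining.
wanted = translate(CallSite(FileLoc(8, 5, scope="inner"), call), scope="test")

print(today)
print(wanted)
```

In this model the first case yields a location on line 4 in the scope of `test`, matching `!18` in the LLVM IR above, while the second case yields line 8 in the scope of `inner` with an inlined-at location pointing at line 4, which is the shape of metadata that LLVM inlining produces.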

@kiranchandramohan, @abidh, @jeanPerier, @VijayKandiah, FYI.
