[clang] [llvm] Add option to generate additional debug info for expression dereferencing pointer to pointers. (PR #81545)

William Junda Huang via cfe-commits Mon, 01 Apr 2024 17:06:32 -0700

huangjd wrote:

> > > Reading LLVM IR lit CHECK lines from clang codegen is a bit difficult - 
> > > could you include some simple examples (perhaps from the new clang tests 
> > > in this patch) showing the DWARF output just as comments in this review 
> > > for something more easily glanceable?
> > 
> > 
> > Attached is the output of the following command
> > `clang ~/llvm-project/clang/test/CodeGenCXX/debug-info-ptr-to-ptr.cpp 
> > -fdebug-info-for-pointer-type -g2 -S -O3 -o /tmp/debug-info-ptr-to-ptr.txt`
> > [debug-info-ptr-to-ptr.txt](https://github.com/llvm/llvm-project/files/14659111/debug-info-ptr-to-ptr.txt)
> 
> Thanks - OK, so this only applies to intermediate structures and arrays (is 
> it useful for arrays? You can't really reorder them - learning that the hot 
> part of an array is the 5th-10th element might be of limited (or at least 
> sufficiently different from the struct layout stuff) value - and it's a more 
> dynamic property/has a runtime parameter component that might be harder to 
> use?)
> 
> Do you have size impact numbers for this? I wouldn't put this under a 
> (possibly cc1) flag for now unless we've got some pretty compelling data that 
> this doesn't substantially change debug info size, which would be a bit 
> surprising to me - I'd assume this would be quite expensive, but it's just a 
> guess.


To clarify, LLVM already supports this case:
`int foo(A* a) { return a->i; }` Here the type of variable `a` is emitted, and 
for the instruction `mov 8(%rdi), %rax` (assuming the field offset is 8), the 
debug info emitted for this instruction associates `%rdi` with `a`. By looking 
up the data layout in `A`'s type info, we know which field is accessed.

This patch handles two cases that were not previously supported.
1. `int foo(void* a) { return ((A*)a)->i; }` Previously only the debug info for 
the void pointer `a` was emitted. It is still associated with `%rdi`, so we 
could not deduce what that instruction accesses. With this patch, a pseudo 
variable is emitted in addition; it is also associated with `%rdi` and carries 
the correct type info for traversing the member expression.

2. `int foo(B* b) { return b->a->i; }` Previously only the debug info for `b` 
was emitted, not for the intermediate value `b->a`, so the memory operand of 
the second `mov` instruction could not be associated with any variable. With 
this patch, a pseudo variable is emitted for an intermediate value when it is 
used as the pointer operand of a member expression.
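
For anyone who wants to compile the three access patterns side by side, here is 
a minimal sketch. The struct layouts and function names are hypothetical and 
are not taken from the patch's test file; `pad` is only there so that `i` 
lands at offset 8 on typical targets, matching the `mov 8(%rdi), ...` example 
above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical types for illustration only.
struct A { std::int64_t pad; int i; };  // `i` is at offset 8 on typical targets
struct B { A* a; };

// Already supported: `a` has type A*, so the field can be looked up directly.
int load_direct(A* a) { return a->i; }

// Case 1: the variable is void*; only the cast expression has the useful type.
int load_cast(void* a) { return ((A*)a)->i; }

// Case 2: the intermediate value `b->a` is the pointer operand of the access.
int load_chain(B* b) { return b->a->i; }

// Small sanity check that all three functions read the same field.
void check_examples() {
    A x{0, 42};
    B y{&x};
    assert(load_direct(&x) == 42);
    assert(load_cast(&x) == 42);
    assert(load_chain(&y) == 42);
}
```

Compiling a file like this with the command quoted earlier in the thread 
should make it possible to compare the debug info for each function.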

It should also apply to arrays, as long as the expression ends up in an 
instruction like `mov 8(%rdi), %rax`. I have test cases for this, and the 
assembly dump shows that the memory operand is correlated with the pseudo 
variable. Note that in most use cases the presence of an array is irrelevant, 
because we do not cast the array element itself (`((A&)foo[i]).member` is 
generally invalid); instead we cast the pointer (`((A*) foo[i])->member`). In 
that case it does not matter what is being cast, because it is just case 1.
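
A minimal sketch of the array variant, again with hypothetical names and a 
hypothetical struct layout: the array element is an untyped pointer, and the 
cast is applied to that pointer before the member access, so this is the same 
shape as case 1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type for illustration only.
struct A { std::int64_t pad; int member; };

// `foo` is an array of untyped pointers; the element, not the array, is cast.
int load_from_slot(void** foo, int i) {
    return ((A*)foo[i])->member;
}

// Small sanity check for the access above.
void check_array_example() {
    A x{0, 7};
    void* slots[2] = {nullptr, &x};
    assert(load_from_slot(slots, 1) == 7);
}
```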

As for the size impact, I believe @namhyung did some measurements when building 
the Linux kernel, and the impact was not significant.


https://github.com/llvm/llvm-project/pull/81545
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
