Re: [PATCH] Output DIEs for outlined OpenMP functions in correct lexical scope
On Wed, 10 May 2017 17:24:27 +0200
Jakub Jelinek wrote:

> What I don't like is that the patch is inconsistent, it sets DECL_CONTEXT
> of the child function for all kinds of outlined functions, but then you just
> choose one of the many places and add it into the BLOCK tree.  Any reason
> why the DECL_CONTEXT change can't be done in a helper function together
> with all the changes you've added into omp-expand.c, and then call it from
> expand_omp_parallel (with the child_fn and entry_stmt arguments) so that
> you can call it easily also for other constructs, either now or later on?

I've worked out a way to do the DECL_CONTEXT and the scope change
together.  The helper function should be usable for other constructs,
though I have not tested this yet.

> Also, is there any rationale on appending the FUNCTION_DECL to BLOCK_VARS
> instead of prepending it there (which is cheaper)?  Does the debugger
> care about relative order of those artificial functions vs. other
> variables in the lexical scope?

To the best of my knowledge, the debugger doesn't care about the
order.  I've changed the code to prepend the FUNCTION_DECL to
BLOCK_VARS instead.

How does this new version (below) look?

I've done a "make bootstrap" and "make -k check".  No regressions
found for x86_64.

gcc/ChangeLog:

    * omp-expand.c (adjust_context_and_scope): New function.
    (expand_parallel_call): Call adjust_context_and_scope.
---
 gcc/omp-expand.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index d6755cd..9eb0a89 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -498,6 +498,42 @@ parallel_needs_hsa_kernel_p (struct omp_region *region)
   return false;
 }
 
+/* Change DECL_CONTEXT of CHILD_FNDECL to that of the parent function.
+   Add CHILD_FNDECL to decl chain of the supercontext of the block
+   ENTRY_BLOCK - this is the block which originally contained the
+   code from which CHILD_FNDECL was created.
+
+   Together, these actions ensure that the debug info for the outlined
+   function will be emitted with the correct lexical scope.  */
+
+static void
+adjust_context_and_scope (tree entry_block, tree child_fndecl)
+{
+  if (entry_block != NULL_TREE && TREE_CODE (entry_block) == BLOCK)
+    {
+      tree b = BLOCK_SUPERCONTEXT (entry_block);
+
+      if (TREE_CODE (b) == BLOCK)
+        {
+          tree parent_fndecl;
+
+          /* Follow supercontext chain until the parent fndecl
+             is found.  */
+          for (parent_fndecl = BLOCK_SUPERCONTEXT (b);
+               TREE_CODE (parent_fndecl) == BLOCK;
+               parent_fndecl = BLOCK_SUPERCONTEXT (parent_fndecl))
+            ;
+
+          gcc_assert (TREE_CODE (parent_fndecl) == FUNCTION_DECL);
+
+          DECL_CONTEXT (child_fndecl) = parent_fndecl;
+
+          DECL_CHAIN (child_fndecl) = BLOCK_VARS (b);
+          BLOCK_VARS (b) = child_fndecl;
+        }
+    }
+}
+
 /* Build the function calls to GOMP_parallel_start etc to actually
    generate the parallel operation.  REGION is the parallel region
    being expanded.  BB is the block where to insert the code.  WS_ARGS
@@ -667,6 +703,8 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
   tree child_fndecl = gimple_omp_parallel_child_fn (entry_stmt);
   t2 = build_fold_addr_expr (child_fndecl);
 
+  adjust_context_and_scope (gimple_block (entry_stmt), child_fndecl);
+
   vec_alloc (args, 4 + vec_safe_length (ws_args));
   args->quick_push (t2);
   args->quick_push (t1);
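
For readers without a GCC tree at hand, the two steps the helper performs (walk the supercontext chain up to the enclosing function, then prepend the outlined function to the block's decl chain) can be modelled in plain C.  This is only a toy sketch: struct node, its fields, toy_adjust_context_and_scope and toy_demo are hypothetical stand-ins invented for illustration, not GCC's tree API, and the extra NULL checks are a choice of the sketch rather than part of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for GCC tree nodes.  Every name here is hypothetical.  */
enum node_kind { NODE_BLOCK, NODE_FUNCTION_DECL, NODE_VAR_DECL };

struct node
{
  enum node_kind kind;
  struct node *supercontext;  /* models BLOCK_SUPERCONTEXT */
  struct node *vars;          /* models BLOCK_VARS */
  struct node *chain;         /* models DECL_CHAIN */
  struct node *context;       /* models DECL_CONTEXT */
};

/* Returns 1 if CHILD was attached, 0 if the guards rejected the input.  */
int
toy_adjust_context_and_scope (struct node *entry_block, struct node *child)
{
  if (entry_block == NULL || entry_block->kind != NODE_BLOCK)
    return 0;

  struct node *b = entry_block->supercontext;
  if (b == NULL || b->kind != NODE_BLOCK)
    return 0;

  /* Follow the supercontext chain until the enclosing function is found.  */
  struct node *parent = b->supercontext;
  while (parent != NULL && parent->kind == NODE_BLOCK)
    parent = parent->supercontext;
  assert (parent != NULL && parent->kind == NODE_FUNCTION_DECL);

  child->context = parent;

  /* Prepend: constant time, no walk to the end of the chain.  */
  child->chain = b->vars;
  b->vars = child;
  return 1;
}

/* Builds function -> outer block -> loop body -> region block, then
   attaches an outlined function.  Returns 1 if the result has the
   expected shape.  */
int
toy_demo (void)
{
  struct node fn = { NODE_FUNCTION_DECL, NULL, NULL, NULL, NULL };
  struct node outer = { NODE_BLOCK, &fn, NULL, NULL, NULL };
  struct node pass = { NODE_VAR_DECL, NULL, NULL, NULL, &fn };
  struct node body = { NODE_BLOCK, &outer, &pass, NULL, NULL };
  struct node region = { NODE_BLOCK, &body, NULL, NULL, NULL };
  struct node child = { NODE_FUNCTION_DECL, NULL, NULL, NULL, NULL };

  if (!toy_adjust_context_and_scope (&region, &child))
    return 0;

  return child.context == &fn      /* context is the parent function */
         && body.vars == &child    /* child now heads the block's chain */
         && child.chain == &pass;  /* the existing decl follows it */
}
```

Calling toy_demo () from a small main is enough to exercise it.  Note that, as in the patch, nothing is attached when the entry block's supercontext is the function itself rather than another block.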
Re: [PATCH] Output DIEs for outlined OpenMP functions in correct lexical scope
On Wed, May 10, 2017 at 5:24 PM, Jakub Jelinek wrote:
> On Fri, May 05, 2017 at 10:23:59AM -0700, Kevin Buettner wrote:
>> On Fri, 5 May 2017 14:23:14 +0300 (MSK)
>> Alexander Monakov wrote:
>>
>> > On Thu, 4 May 2017, Kevin Buettner wrote:
>> > > diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
>> > > index 5c48b78..7029951 100644
>> > > --- a/gcc/omp-expand.c
>> > > +++ b/gcc/omp-expand.c
>> > > @@ -667,6 +667,25 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
>> >
>> > Outlined functions are also used for 'omp task' and 'omp target' regions, but
>> > here only 'omp parallel' is handled.  Will this code need to be duplicated for
>> > those region types?
>>
>> For 'omp task' and 'omp target', I think it's possible or even likely
>> that the original context which started these parallel tasks will no
>> longer exist.  So, it might not make sense to do something equivalent
>> for 'task' and 'target'.
>
> It depends.  E.g. for #pragma omp taskloop without nogroup clause, it acts the
> same as #pragma omp parallel in the nesting regard, the GOMP_taskloop*
> function will not return until all the tasks finished.  Or if you have
> #pragma omp task and #pragma omp taskwait on the next line, or
> #pragma omp taskgroup around it, or #pragma omp target without nowait
> clause, it will behave the same.
> Then there are cases where the encountering function will still be around,
> but already not all the lexical scopes (or inline functions), e.g. if there
> is #pragma omp taskwait or taskgroup etc. outside of the innermost lexical
> scope(s), but still somewhere in the function.  What the debugger should do
> in that case is that it should figure out that the spot the task has been
> created in has passed, so not show vars in the lexical scopes already left,
> but still show others?  Then of course if there is nothing waiting for the
> task or async target in the current function, the function's frame could be
> left, perhaps multiple callers too.
>
>> > >   tree child_fndecl = gimple_omp_parallel_child_fn (entry_stmt);
>> > >   t2 = build_fold_addr_expr (child_fndecl);
>> > >
>> > > +  if (gimple_block (entry_stmt) != NULL_TREE
>> > > +      && TREE_CODE (gimple_block (entry_stmt)) == BLOCK)
>> >
>> > Here and also below, ...
>> >
>> > > +    {
>> > > +      tree b = BLOCK_SUPERCONTEXT (gimple_block (entry_stmt));
>> > > +
>> > > +      /* Add child_fndecl to var chain of the supercontext of the
>> > > +         block corresponding to entry_stmt.  This ensures that debug
>> > > +         info for the outlined function will be emitted for the correct
>> > > +         lexical scope.  */
>> > > +      if (b != NULL_TREE && TREE_CODE (b) == BLOCK)
>> >
>> > ... here, I'm curious why the conditionals are necessary -- I don't see why the
>> > conditions can be sometimes true and sometimes false.  Sorry if I'm missing
>> > something obvious.
>
> gimple_block can be NULL.  And, most calls of gimple_block that want to
> ensure it is a BLOCK actually do verify it is a BLOCK, while it is unlikely
> and it is usually just LTO that screws things up, I'd keep it.

gimple_block should always be either NULL or a BLOCK.  It's *_CONTEXT that
can be types, decls or blocks.

>> > > +      if (b != NULL_TREE && TREE_CODE (b) == BLOCK)
>>
>> I check to make sure that b is a block so that I can later refer to
>> BLOCK_VARS (b).
>
> I believe BLOCK_SUPERCONTEXT of a BLOCK should always be non-NULL, either
> another BLOCK, or FUNCTION_DECL.  Thus I think b != NULL_TREE && is
> redundant here.

Yes.

> What I don't like is that the patch is inconsistent, it sets DECL_CONTEXT
> of the child function for all kinds of outlined functions, but then you just
> choose one of the many places and add it into the BLOCK tree.  Any reason
> why the DECL_CONTEXT change can't be done in a helper function together
> with all the changes you've added into omp-expand.c, and then call it from
> expand_omp_parallel (with the child_fn and entry_stmt arguments) so that
> you can call it easily also for other constructs, either now or later on?
> Also, is there any rationale on appending the FUNCTION_DECL to BLOCK_VARS
> instead of prepending it there (which is cheaper)?  Does the debugger
> care about relative order of those artificial functions vs. other
> variables in the lexical scope?
>
> Jakub
Re: [PATCH] Output DIEs for outlined OpenMP functions in correct lexical scope
On Fri, May 05, 2017 at 10:23:59AM -0700, Kevin Buettner wrote:
> On Fri, 5 May 2017 14:23:14 +0300 (MSK)
> Alexander Monakov wrote:
>
> > On Thu, 4 May 2017, Kevin Buettner wrote:
> > > diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
> > > index 5c48b78..7029951 100644
> > > --- a/gcc/omp-expand.c
> > > +++ b/gcc/omp-expand.c
> > > @@ -667,6 +667,25 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
> >
> > Outlined functions are also used for 'omp task' and 'omp target' regions, but
> > here only 'omp parallel' is handled.  Will this code need to be duplicated for
> > those region types?
>
> For 'omp task' and 'omp target', I think it's possible or even likely
> that the original context which started these parallel tasks will no
> longer exist.  So, it might not make sense to do something equivalent
> for 'task' and 'target'.

It depends.  E.g. for #pragma omp taskloop without nogroup clause, it acts the
same as #pragma omp parallel in the nesting regard, the GOMP_taskloop*
function will not return until all the tasks finished.  Or if you have
#pragma omp task and #pragma omp taskwait on the next line, or
#pragma omp taskgroup around it, or #pragma omp target without nowait
clause, it will behave the same.
Then there are cases where the encountering function will still be around,
but already not all the lexical scopes (or inline functions), e.g. if there
is #pragma omp taskwait or taskgroup etc. outside of the innermost lexical
scope(s), but still somewhere in the function.  What the debugger should do
in that case is that it should figure out that the spot the task has been
created in has passed, so not show vars in the lexical scopes already left,
but still show others?  Then of course if there is nothing waiting for the
task or async target in the current function, the function's frame could be
left, perhaps multiple callers too.

> > >   tree child_fndecl = gimple_omp_parallel_child_fn (entry_stmt);
> > >   t2 = build_fold_addr_expr (child_fndecl);
> > >
> > > +  if (gimple_block (entry_stmt) != NULL_TREE
> > > +      && TREE_CODE (gimple_block (entry_stmt)) == BLOCK)
> >
> > Here and also below, ...
> >
> > > +    {
> > > +      tree b = BLOCK_SUPERCONTEXT (gimple_block (entry_stmt));
> > > +
> > > +      /* Add child_fndecl to var chain of the supercontext of the
> > > +         block corresponding to entry_stmt.  This ensures that debug
> > > +         info for the outlined function will be emitted for the correct
> > > +         lexical scope.  */
> > > +      if (b != NULL_TREE && TREE_CODE (b) == BLOCK)
> >
> > ... here, I'm curious why the conditionals are necessary -- I don't see why the
> > conditions can be sometimes true and sometimes false.  Sorry if I'm missing
> > something obvious.

gimple_block can be NULL.  And, most calls of gimple_block that want to
ensure it is a BLOCK actually do verify it is a BLOCK, while it is unlikely
and it is usually just LTO that screws things up, I'd keep it.

> > > +      if (b != NULL_TREE && TREE_CODE (b) == BLOCK)
>
> I check to make sure that b is a block so that I can later refer to
> BLOCK_VARS (b).

I believe BLOCK_SUPERCONTEXT of a BLOCK should always be non-NULL, either
another BLOCK, or FUNCTION_DECL.  Thus I think b != NULL_TREE && is
redundant here.

What I don't like is that the patch is inconsistent, it sets DECL_CONTEXT
of the child function for all kinds of outlined functions, but then you just
choose one of the many places and add it into the BLOCK tree.  Any reason
why the DECL_CONTEXT change can't be done in a helper function together
with all the changes you've added into omp-expand.c, and then call it from
expand_omp_parallel (with the child_fn and entry_stmt arguments) so that
you can call it easily also for other constructs, either now or later on?
Also, is there any rationale on appending the FUNCTION_DECL to BLOCK_VARS
instead of prepending it there (which is cheaper)?  Does the debugger
care about relative order of those artificial functions vs. other
variables in the lexical scope?

	Jakub
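
The three situations described in this message can be made concrete with a small C sketch.  The function names and the global `sink` are made up for illustration and come from nowhere in the patch; the comments state what OpenMP task semantics imply about which scopes are still live, not what GCC or GDB currently do.  Built without -fopenmp the pragmas are ignored and everything simply runs serially.

```c
/* Global, so a task that runs late never touches a dead stack frame.  */
static int sink;

/* Case 1: the task is waited for inside the scope that created it, so
   every enclosing lexical scope is still live while the task runs.  */
int
wait_in_same_scope (int n)
{
  int result = 0;
  {
    int local = 2 * n;
#pragma omp task shared (result) firstprivate (local)
    result = local + 1;
#pragma omp taskwait
  }
  return result;
}

/* Case 2: the wait sits outside the innermost scope.  The function's
   frame is still live when the task runs, but the scope holding
   `local` may already have been left.  */
int
wait_in_outer_scope (int n)
{
  int result = 0;
  {
    int local = 2 * n;
#pragma omp task shared (result) firstprivate (local)
    result = local + 1;
  }
#pragma omp taskwait
  return result;
}

/* Case 3: nothing in this function waits for the task, so the frame
   itself may be gone by the time the task runs.  */
void
no_wait_here (int n)
{
#pragma omp task firstprivate (n)
  sink = n + 1;
}

/* The caller waits instead; the task created in no_wait_here is still
   a child of the current task, so taskwait covers it.  */
int
caller_waits (int n)
{
  sink = 0;
  no_wait_here (n);
#pragma omp taskwait
  return sink;
}
```

Only in the first case is the full lexical context guaranteed to exist for the lifetime of the outlined function, which is the property the 'omp parallel' handling relies on.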
Re: [PATCH] Output DIEs for outlined OpenMP functions in correct lexical scope
On Fri, 5 May 2017 14:23:14 +0300 (MSK)
Alexander Monakov wrote:

> On Thu, 4 May 2017, Kevin Buettner wrote:
> > diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
> > index 5c48b78..7029951 100644
> > --- a/gcc/omp-expand.c
> > +++ b/gcc/omp-expand.c
> > @@ -667,6 +667,25 @@ expand_parallel_call (struct omp_region *region, basic_block bb,
>
> Outlined functions are also used for 'omp task' and 'omp target' regions, but
> here only 'omp parallel' is handled.  Will this code need to be duplicated for
> those region types?

For 'omp task' and 'omp target', I think it's possible or even likely
that the original context which started these parallel tasks will no
longer exist.  So, it might not make sense to do something equivalent
for 'task' and 'target'.

That said, I have not yet given the matter much study.  There may be
cases where having scoped debug info might still prove useful.

The short answer is, "I don't know."

> >   tree child_fndecl = gimple_omp_parallel_child_fn (entry_stmt);
> >   t2 = build_fold_addr_expr (child_fndecl);
> >
> > +  if (gimple_block (entry_stmt) != NULL_TREE
> > +      && TREE_CODE (gimple_block (entry_stmt)) == BLOCK)
>
> Here and also below, ...
>
> > +    {
> > +      tree b = BLOCK_SUPERCONTEXT (gimple_block (entry_stmt));
> > +
> > +      /* Add child_fndecl to var chain of the supercontext of the
> > +         block corresponding to entry_stmt.  This ensures that debug
> > +         info for the outlined function will be emitted for the correct
> > +         lexical scope.  */
> > +      if (b != NULL_TREE && TREE_CODE (b) == BLOCK)
>
> ... here, I'm curious why the conditionals are necessary -- I don't see why the
> conditions can be sometimes true and sometimes false.  Sorry if I'm missing
> something obvious.

I'm not especially knowledgeable about gcc internals.  It may be the
case that the conditionals you noted are unnecessary and might
better be handled via the use of an assert.

I will note that when I originally coded it, I had fewer tests.  The
code still worked for the cases that I tried.  Later, when I reviewed
it for posting here, I decided to add some more checks.  I'll explain
my reasoning for each of them...

> > +  if (gimple_block (entry_stmt) != NULL_TREE

If we have NULL_TREE here, dereferencing gimple_block (entry_stmt)
further won't work.

> > +      && TREE_CODE (gimple_block (entry_stmt)) == BLOCK)

It seemed to me that having a BLOCK is necessary in order to later
use BLOCK_SUPERCONTEXT.

> > +      if (b != NULL_TREE && TREE_CODE (b) == BLOCK)

I check to make sure that b is a block so that I can later refer to
BLOCK_VARS (b).

Again, it may be the case that these should always evaluate to true.
If so, then use of an assert might be better here.

Kevin
Re: [PATCH] Output DIEs for outlined OpenMP functions in correct lexical scope
On Thu, 4 May 2017, Kevin Buettner wrote:
> diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
> index 5c48b78..7029951 100644
> --- a/gcc/omp-expand.c
> +++ b/gcc/omp-expand.c
> @@ -667,6 +667,25 @@ expand_parallel_call (struct omp_region *region, basic_block bb,

Outlined functions are also used for 'omp task' and 'omp target' regions, but
here only 'omp parallel' is handled.  Will this code need to be duplicated for
those region types?

>   tree child_fndecl = gimple_omp_parallel_child_fn (entry_stmt);
>   t2 = build_fold_addr_expr (child_fndecl);
>
> +  if (gimple_block (entry_stmt) != NULL_TREE
> +      && TREE_CODE (gimple_block (entry_stmt)) == BLOCK)

Here and also below, ...

> +    {
> +      tree b = BLOCK_SUPERCONTEXT (gimple_block (entry_stmt));
> +
> +      /* Add child_fndecl to var chain of the supercontext of the
> +         block corresponding to entry_stmt.  This ensures that debug
> +         info for the outlined function will be emitted for the correct
> +         lexical scope.  */
> +      if (b != NULL_TREE && TREE_CODE (b) == BLOCK)

... here, I'm curious why the conditionals are necessary -- I don't see why the
conditions can be sometimes true and sometimes false.  Sorry if I'm missing
something obvious.

Thanks.
Alexander
Re: [PATCH] Output DIEs for outlined OpenMP functions in correct lexical scope
Ahem...

I forgot to note that:

I have bootstrapped and regression tested my patch on x86_64-pc-linux-gnu.

Kevin

On Thu, 4 May 2017 17:45:51 -0700
Kevin Buettner wrote:

> Consider the following OpenMP program:
>
> void foo (int a1) {}
>
> int
> main (void)
> {
>   static int s1 = -41;
>   int i1 = 11, i2;
>
>   for (i2 = 1; i2 <= 2; i2++)
>     {
>       int pass = i2;
> #pragma omp parallel num_threads (2) firstprivate (i1)
>       {
>         foo (i1);
>       }
>       foo(pass);
>     }
>   foo (s1); foo (i2);
> }
>
> At the moment, when debugging such a program, GDB is not able to find
> and print the values of s1, i2, and pass.
>
> My changes to omp-low.c and omp-expand.c, in conjunction with several
> other patches, allow GDB to find and print these values.
>
> This is the current behavior when debugging in GDB:
>
> (gdb) b 14
> Breakpoint 1 at 0x400617: file ex3.c, line 14.
> (gdb) run
> Starting program: /mesquite2/.ironwood2/omp-tests/k/ex3-trunk
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x773ca700 (LWP 32628)]
>
> Thread 1 "ex3-trunk" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
> 14        foo (i1);
> (gdb) p s1
> No symbol "s1" in current context.
> (gdb) p i1
> $1 = 11
> (gdb) p i2
> No symbol "i2" in current context.
> (gdb) p pass
> No symbol "pass" in current context.
> (gdb) c
> Continuing.
> [Switching to Thread 0x773ca700 (LWP 32628)]
>
> Thread 2 "ex3-trunk" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
> 14        foo (i1);
> (gdb) p s1
> No symbol "s1" in current context.
> (gdb) p i1
> $2 = 11
> (gdb) p i2
> No symbol "i2" in current context.
> (gdb) p pass
> No symbol "pass" in current context.
> (gdb) bt
> #0  main._omp_fn.0 () at ex3.c:14
> #1  0x77bc4926 in gomp_thread_start (xdata=)
>     at gcc/libgomp/team.c:122
> #2  0x7799761a in start_thread () from /lib64/libpthread.so.0
> #3  0x776d159d in clone () from /lib64/libc.so.6
>
> Note that GDB is unable to find s1, i2, or pass for either thread.
>
> I show the backtrace for thread 2 because it's the more difficult case
> to handle due to the stack trace stopping at clone().  The stack frame
> for main(), which is where the variables of interest reside, is not a
> part of this stack.
>
> When we run this example using the patches associated with this change
> along with several other patches, GDB's behavior looks like this:
>
> (gdb) b 14
> Breakpoint 1 at 0x400617: file ex3.c, line 14.
> (gdb) run
> Starting program: /mesquite2/.ironwood2/omp-tests/k/ex3-new
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> [New Thread 0x773ca700 (LWP 32643)]
>
> Thread 1 "ex3-new" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
> 14        foo (i1);
> (gdb) p s1
> $1 = -41
> (gdb) p i1
> $2 = 11
> (gdb) p i2
> $3 = 1
> (gdb) p pass
> $4 = 1
> (gdb) c
> Continuing.
> [Switching to Thread 0x773ca700 (LWP 32643)]
>
> Thread 2 "ex3-new" hit Breakpoint 1, main._omp_fn.0 () at ex3.c:14
> 14        foo (i1);
> (gdb) p s1
> $5 = -41
> (gdb) p i1
> $6 = 11
> (gdb) p i2
> $7 = 1
> (gdb) p pass
> $8 = 1
>
> I didn't show the stack here.  It's the same as before.  (I would,
> however, like to be able to make GDB display a unified stack.)
>
> Note that GDB is now able to find and print values for s1, i2, and
> pass.
>
> GCC constructs a new function for executing the parallel code.
> The debugging information entry for this function is presently
> placed at the same level as that of the function in which the
> "#pragma omp parallel" directive appeared.
>
> This is partial output from "readelf -v" for the (non-working) example
> shown above:
>
>  <1><2d>: Abbrev Number: 2 (DW_TAG_subprogram)
>     <2e>   DW_AT_external    : 1
>     <2e>   DW_AT_name        : (indirect string, offset: 0x80): main
>     <32>   DW_AT_decl_file   : 1
>     <33>   DW_AT_decl_line   : 4
>     <34>   DW_AT_prototyped  : 1
>     <34>   DW_AT_type        : <0x9d>
>     <38>   DW_AT_low_pc      : 0x400591
>     <40>   DW_AT_high_pc     : 0x71
>     <48>   DW_AT_frame_base  : 1 byte block: 9c (DW_OP_call_frame_cfa)
>     <4a>   DW_AT_GNU_all_tail_call_sites: 1
>     <4a>   DW_AT_sibling     : <0x9d>
>  <2><4e>: Abbrev Number: 3 (DW_TAG_variable)
>     <4f>   DW_AT_name        : s1
>     <52>   DW_AT_decl_file   : 1
>     <53>   DW_AT_decl_line   : 6
>     <54>   DW_AT_type        : <0x9d>
>     <58>   DW_AT_location    : 9 byte block: 3 30 10 60 0 0 0 0 0 (DW_OP_addr: 601030)
> ...
>  <2><7c>: Abbrev Number: 4