On Mon, 2024-06-03 at 08:29 +0200, Richard Biener wrote: > On Fri, May 31, 2024 at 11:23 PM Qing Zhao <[email protected]> > wrote: > > > > > > > > > On May 23, 2024, at 07:46, Richard Biener > > > <[email protected]> wrote: > > > > > > On Wed, May 22, 2024 at 8:53 PM Qing Zhao <[email protected]> > > > wrote: > > > > > > > > > > > > > > > > > On May 22, 2024, at 03:38, Richard Biener > > > > > <[email protected]> wrote: > > > > > > > > > > On Tue, May 21, 2024 at 11:36 PM David Malcolm > > > > > <[email protected]> wrote: > > > > > > > > > > > > On Tue, 2024-05-21 at 15:13 +0000, Qing Zhao wrote: > > > > > > > Thanks for the comments and suggestions. > > > > > > > > > > > > > > > On May 15, 2024, at 10:00, David Malcolm > > > > > > > > <[email protected]> > > > > > > > > wrote: > > > > > > > > > > > > > > > > On Tue, 2024-05-14 at 15:08 +0200, Richard Biener > > > > > > > > wrote: > > > > > > > > > On Mon, 13 May 2024, Qing Zhao wrote: > > > > > > > > > > > > > > > > > > > -Warray-bounds is an important option to enable > > > > > > > > > > linux kernal to > > > > > > > > > > keep > > > > > > > > > > the array out-of-bound errors out of the source > > > > > > > > > > tree. > > > > > > > > > > > > > > > > > > > > However, due to the false positive warnings > > > > > > > > > > reported in > > > > > > > > > > PR109071 > > > > > > > > > > (-Warray-bounds false positive warnings due to code > > > > > > > > > > duplication > > > > > > > > > > from > > > > > > > > > > jump threading), -Warray-bounds=1 cannot be added > > > > > > > > > > on by > > > > > > > > > > default. > > > > > > > > > > > > > > > > > > > > Although it's impossible to elinimate all the false > > > > > > > > > > positive > > > > > > > > > > warnings > > > > > > > > > > from -Warray-bounds=1 (See PR104355 Misleading - > > > > > > > > > > Warray-bounds > > > > > > > > > > documentation says "always out of bounds"), we > > > > > > > > > > should minimize > > > > > > > > > > the > > > > > > > > > > false positive warnings in -Warray-bounds=1. > > > > > > > > > > > > > > > > > > > > The root reason for the false positive warnings > > > > > > > > > > reported in > > > > > > > > > > PR109071 is: > > > > > > > > > > > > > > > > > > > > When the thread jump optimization tries to reduce > > > > > > > > > > the # of > > > > > > > > > > branches > > > > > > > > > > inside the routine, sometimes it needs to duplicate > > > > > > > > > > the code > > > > > > > > > > and > > > > > > > > > > split into two conditional pathes. for example: > > > > > > > > > > > > > > > > > > > > The original code: > > > > > > > > > > > > > > > > > > > > void sparx5_set (int * ptr, struct nums * sg, int > > > > > > > > > > index) > > > > > > > > > > { > > > > > > > > > > if (index >= 4) > > > > > > > > > > warn (); > > > > > > > > > > *ptr = 0; > > > > > > > > > > *val = sg->vals[index]; > > > > > > > > > > if (index >= 4) > > > > > > > > > > warn (); > > > > > > > > > > *ptr = *val; > > > > > > > > > > > > > > > > > > > > return; > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > With the thread jump, the above becomes: > > > > > > > > > > > > > > > > > > > > void sparx5_set (int * ptr, struct nums * sg, int > > > > > > > > > > index) > > > > > > > > > > { > > > > > > > > > > if (index >= 4) > > > > > > > > > > { > > > > > > > > > > warn (); > > > > > > > > > > *ptr = 0; // Code duplications since > > > > > > > > > > "warn" does > > > > > > > > > > return; > > > > > > > > > > *val = sg->vals[index]; // same this line. > > > > > > > > > > // In this path, > > > > > > > > > > since it's > > > > > > > > > > under > > > > > > > > > > the condition > > > > > > > > > > // "index >= 4", the > > > > > > > > > > compiler > > > > > > > > > > knows > > > > > > > > > > the value > > > > > > > > > > // of "index" is > > > > > > > > > > larger then 4, > > > > > > > > > > therefore the > > > > > > > > > > // out-of-bound > > > > > > > > > > warning. > > > > > > > > > > warn (); > > > > > > > > > > } > > > > > > > > > > else > > > > > > > > > > { > > > > > > > > > > *ptr = 0; > > > > > > > > > > *val = sg->vals[index]; > > > > > > > > > > } > > > > > > > > > > *ptr = *val; > > > > > > > > > > return; > > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > We can see, after the thread jump optimization, the > > > > > > > > > > # of > > > > > > > > > > branches > > > > > > > > > > inside > > > > > > > > > > the routine "sparx5_set" is reduced from 2 to 1, > > > > > > > > > > however, due > > > > > > > > > > to > > > > > > > > > > the > > > > > > > > > > code duplication (which is needed for the > > > > > > > > > > correctness of the > > > > > > > > > > code), > > > > > > > > > > we > > > > > > > > > > got a false positive out-of-bound warning. > > > > > > > > > > > > > > > > > > > > In order to eliminate such false positive out-of- > > > > > > > > > > bound warning, > > > > > > > > > > > > > > > > > > > > A. Add one more flag for GIMPLE: is_splitted. > > > > > > > > > > B. During the thread jump optimization, when the > > > > > > > > > > basic blocks > > > > > > > > > > are > > > > > > > > > > duplicated, mark all the STMTs inside the original > > > > > > > > > > and > > > > > > > > > > duplicated > > > > > > > > > > basic blocks as "is_splitted"; > > > > > > > > > > C. Inside the array bound checker, add the > > > > > > > > > > following new > > > > > > > > > > heuristic: > > > > > > > > > > > > > > > > > > > > If > > > > > > > > > > 1. the stmt is duplicated and splitted into two > > > > > > > > > > conditional > > > > > > > > > > paths; > > > > > > > > > > + 2. the warning level < 2; > > > > > > > > > > + 3. the current block is not dominating the exit > > > > > > > > > > block > > > > > > > > > > Then not report the warning. > > > > > > > > > > > > > > > > > > > > The false positive warnings are moved from -Warray- > > > > > > > > > > bounds=1 to > > > > > > > > > > -Warray-bounds=2 now. > > > > > > > > > > > > > > > > > > > > Bootstrapped and regression tested on both x86 and > > > > > > > > > > aarch64. > > > > > > > > > > adjusted > > > > > > > > > > -Warray-bounds-61.c due to the false positive > > > > > > > > > > warnings. > > > > > > > > > > > > > > > > > > > > Let me know if you have any comments and > > > > > > > > > > suggestions. > > > > > > > > > > > > > > > > > > At the last Cauldron I talked with David Malcolm > > > > > > > > > about these kind > > > > > > > > > of > > > > > > > > > issues and thought of instead of suppressing > > > > > > > > > diagnostics to > > > > > > > > > record > > > > > > > > > how a block was duplicated. For jump threading my > > > > > > > > > idea was to > > > > > > > > > record > > > > > > > > > the condition that was proved true when entering the > > > > > > > > > path and do > > > > > > > > > this > > > > > > > > > by recording the corresponding locations > > > > > > > > > > > > > > Is only recording the location for the TRUE path enough? > > > > > > > We might need to record the corresponding locations for > > > > > > > both TRUE and > > > > > > > FALSE paths since the VRP might be more accurate on both > > > > > > > paths. > > > > > > > Is only recording the location is enough? > > > > > > > Do we need to record the pointer to the original > > > > > > > condition stmt? > > > > > > > > > > > > Just to be clear: I don't plan to work on this myself (I > > > > > > have far too > > > > > > much already to work on...); I'm assuming Richard Biener is > > > > > > hoping/planning to implement this himself. > > > > > > > > > > While I think some of this might be an improvement to those > > > > > vast > > > > > number of "false positive" diagnostics we have from (too) > > > > > late diagnostic > > > > > passes I do not have the cycles to work on this. > > > > > > > > I can study a little bit more on how to improve the diagnostics > > > > for PR 109071 first. > > > > > > > > FYI, I just updated PR109071’s description as: Need more > > > > context for -Warray-bounds warnings due to code duplication > > > > from jump threading. > > > > > > > > As we already agreed, this is NOT a false positive. It caught a > > > > real bug in linux kernel that need to be patched in the kernel > > > > source. (See > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109071#c11) > > > > > > > > In order to add more context to the diagnostics to help the end > > > > user locate the bug accurately in their source code, compiler > > > > needs: > > > > > > > > 1. During jump threading phase, record the following > > > > information for each duplicated STMT: > > > > A. A pointer to the COND stmt; > > > > B. True/false for such COND > > > > 2. During array out-of-bound checking phase, whenever locate an > > > > out-of-bound case, > > > > A. Use a rich_location for the diagnostic; > > > > B. Create an instance of diagnositic_path, add events to > > > > this diagnostic_path based on the COND stmt, True/False info > > > > recorded in the STMT; > > > > C. Call richloc.set_path() to associate the path with > > > > the rich_location; > > > > D. Then issue warning with this rich_location instead of > > > > the current regular location. > > > > > > > > Any comment and suggestion to the above? > > > > > > I was originally thinking of using location_adhoc_data to store > > > 1.A > > > and 1.B as a common bit to associate to each > > > copied stmt. IIRC location_adhoc_data->data is the stmts > > > associated > > > lexical block so we'd need to stuff another > > > 'data' field into this struct, like a "copy history" where we > > > could > > > then even record copied-of-copy-of-X. > > > locataion_adhoc_data seems imperfectly packed right now, with a > > > 32bit > > > hole before 'data' and 32bit unused > > > at its end, so we might get away without memory use by re- > > > ordering > > > discriminator before 'data' ... > > > > Like this? > > > > diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h > > index e6e2b0897572..ee344f91333b 100644 > > --- a/libcpp/include/line-map.h > > +++ b/libcpp/include/line-map.h > > @@ -761,8 +761,9 @@ struct GTY(()) maps_info_macro { > > struct GTY(()) location_adhoc_data { > > location_t locus; > > source_range src_range; > > - void * GTY((skip)) data; > > unsigned discriminator; > > + void * GTY((skip)) data; > > + void * GTY((skip)) copy_history; > > }; > > struct htab; > > Yes. > > > How about the copy_history? Do we need a new data structure (like > > the following, or other suggestion) for this? Where should I add > > this new data structure? > > As it needs to be managed by libcpp it should be in this very same > file. > > > struct copy_history { > > location_t condition; > > Bool is_true_path; > > } > > I think we want a pointer to the previous copy-of state as well in > case a stmt > was copied twice. We'll see whether a single (condition) location > plus edge flag > is sufficient. I'd say we should plan for an enum to indicate the > duplication > reason at least (jump threading, unswitching, unrolling come to my > mind). For > jump threading being able to say "when <condition> is true/false" is > probably > good enough, though it might not always be easy to identify a single > condition > here given a threading path starts at an incoming edge to a CFG merge > and > will usually end with the outgoing edge of a condition that we can > statically > evaluate. The condition controlling the path entry might not be > determined > fully by a single condition location. > > Possibly building a full "diagnostic path" object at threading time > might be > the only way to record all the facts, but that's also going to be > more > expensive.
Note that a diagnostic_path represents a path through some kind of graph, whereas it sounds like you want to be storing the *nodes* in the graph, and later generating the diagnostic_path from that graph when we need it (which is trivial if the graph is actually just a tree: just follow the parent links backwards, then reverse it). Dave > I can think of having a -fdiagnostic-try-to-explain-harder option to > clarify confusing > diagnostics people could use on-demand? > > Richard. > > > Qing > > > > > > Richard. > > > > > > > Thanks. > > > > > > > > Qing > > > > > > > > > > > > > > > > > > Richard. > > > > > > > > > > > But feel free to ask me any questions about the > > > > > > diagnostic_path > > > > > > machinery within the diagnostics subsystem. > > > > > > > > > > > > [...snip...] > > > > > > > > > > > > Regards > > > > > > Dave > > > > >
