Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

Richard Biener Wed, 05 Jun 2024 10:07:27 -0700

> Am 05.06.2024 um 18:39 schrieb Qing Zhao <[email protected]>:
> 
> 
> 
>> On Jun 5, 2024, at 03:26, Richard Biener <[email protected]> wrote:
>> 
>>> On Tue, Jun 4, 2024 at 10:31 PM Qing Zhao <[email protected]> wrote:
>>> 
>>> 
>>> 
>>>> On Jun 4, 2024, at 03:43, Richard Biener <[email protected]> 
>>>> wrote:
>>>> 
>>>> On Mon, Jun 3, 2024 at 4:48 PM David Malcolm <[email protected]> wrote:
>>>>> 
>>>>> On Mon, 2024-06-03 at 08:29 +0200, Richard Biener wrote:
>>>>>> On Fri, May 31, 2024 at 11:23 PM Qing Zhao <[email protected]>
>>>>>> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On May 23, 2024, at 07:46, Richard Biener
>>>>>>>> <[email protected]> wrote:
>>>>>>>> 
>>>>>>>> On Wed, May 22, 2024 at 8:53 PM Qing Zhao <[email protected]>
>>>>>>>> wrote:
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On May 22, 2024, at 03:38, Richard Biener
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> On Tue, May 21, 2024 at 11:36 PM David Malcolm
>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, 2024-05-21 at 15:13 +0000, Qing Zhao wrote:
>>>>>>>>>>>> Thanks for the comments and suggestions.
>>>>>>>>>>>> 
>>>>>>>>>>>>> On May 15, 2024, at 10:00, David Malcolm
>>>>>>>>>>>>> <[email protected]>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, 2024-05-14 at 15:08 +0200, Richard Biener
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> On Mon, 13 May 2024, Qing Zhao wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> -Warray-bounds is an important option to enable
>>>>>>>>>>>>>>> linux kernal to
>>>>>>>>>>>>>>> keep
>>>>>>>>>>>>>>> the array out-of-bound errors out of the source
>>>>>>>>>>>>>>> tree.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> However, due to the false positive warnings
>>>>>>>>>>>>>>> reported in
>>>>>>>>>>>>>>> PR109071
>>>>>>>>>>>>>>> (-Warray-bounds false positive warnings due to code
>>>>>>>>>>>>>>> duplication
>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>> jump threading), -Warray-bounds=1 cannot be added
>>>>>>>>>>>>>>> on by
>>>>>>>>>>>>>>> default.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Although it's impossible to elinimate all the false
>>>>>>>>>>>>>>> positive
>>>>>>>>>>>>>>> warnings
>>>>>>>>>>>>>>> from -Warray-bounds=1 (See PR104355 Misleading -
>>>>>>>>>>>>>>> Warray-bounds
>>>>>>>>>>>>>>> documentation says "always out of bounds"), we
>>>>>>>>>>>>>>> should minimize
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> false positive warnings in -Warray-bounds=1.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The root reason for the false positive warnings
>>>>>>>>>>>>>>> reported in
>>>>>>>>>>>>>>> PR109071 is:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> When the thread jump optimization tries to reduce
>>>>>>>>>>>>>>> the # of
>>>>>>>>>>>>>>> branches
>>>>>>>>>>>>>>> inside the routine, sometimes it needs to duplicate
>>>>>>>>>>>>>>> the code
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> split into two conditional pathes. for example:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The original code:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> void sparx5_set (int * ptr, struct nums * sg, int
>>>>>>>>>>>>>>> index)
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>> if (index >= 4)
>>>>>>>>>>>>>>> warn ();
>>>>>>>>>>>>>>> *ptr = 0;
>>>>>>>>>>>>>>> *val = sg->vals[index];
>>>>>>>>>>>>>>> if (index >= 4)
>>>>>>>>>>>>>>> warn ();
>>>>>>>>>>>>>>> *ptr = *val;
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> return;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> With the thread jump, the above becomes:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> void sparx5_set (int * ptr, struct nums * sg, int
>>>>>>>>>>>>>>> index)
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>> if (index >= 4)
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>  warn ();
>>>>>>>>>>>>>>>  *ptr = 0;         // Code duplications since
>>>>>>>>>>>>>>> "warn" does
>>>>>>>>>>>>>>> return;
>>>>>>>>>>>>>>>  *val = sg->vals[index];   // same this line.
>>>>>>>>>>>>>>>                            // In this path,
>>>>>>>>>>>>>>> since it's
>>>>>>>>>>>>>>> under
>>>>>>>>>>>>>>> the condition
>>>>>>>>>>>>>>>                            // "index >= 4", the
>>>>>>>>>>>>>>> compiler
>>>>>>>>>>>>>>> knows
>>>>>>>>>>>>>>> the value
>>>>>>>>>>>>>>>                            // of "index" is
>>>>>>>>>>>>>>> larger then 4,
>>>>>>>>>>>>>>> therefore the
>>>>>>>>>>>>>>>                            // out-of-bound
>>>>>>>>>>>>>>> warning.
>>>>>>>>>>>>>>>  warn ();
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> else
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>  *ptr = 0;
>>>>>>>>>>>>>>>  *val = sg->vals[index];
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> *ptr = *val;
>>>>>>>>>>>>>>> return;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> We can see, after the thread jump optimization, the
>>>>>>>>>>>>>>> # of
>>>>>>>>>>>>>>> branches
>>>>>>>>>>>>>>> inside
>>>>>>>>>>>>>>> the routine "sparx5_set" is reduced from 2 to 1,
>>>>>>>>>>>>>>> however,  due
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> code duplication (which is needed for the
>>>>>>>>>>>>>>> correctness of the
>>>>>>>>>>>>>>> code),
>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>> got a false positive out-of-bound warning.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> In order to eliminate such false positive out-of-
>>>>>>>>>>>>>>> bound warning,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> A. Add one more flag for GIMPLE: is_splitted.
>>>>>>>>>>>>>>> B. During the thread jump optimization, when the
>>>>>>>>>>>>>>> basic blocks
>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>> duplicated, mark all the STMTs inside the original
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> duplicated
>>>>>>>>>>>>>>> basic blocks as "is_splitted";
>>>>>>>>>>>>>>> C. Inside the array bound checker, add the
>>>>>>>>>>>>>>> following new
>>>>>>>>>>>>>>> heuristic:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> If
>>>>>>>>>>>>>>> 1. the stmt is duplicated and splitted into two
>>>>>>>>>>>>>>> conditional
>>>>>>>>>>>>>>> paths;
>>>>>>>>>>>>>>> +  2. the warning level < 2;
>>>>>>>>>>>>>>> +  3. the current block is not dominating the exit
>>>>>>>>>>>>>>> block
>>>>>>>>>>>>>>> Then not report the warning.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The false positive warnings are moved from -Warray-
>>>>>>>>>>>>>>> bounds=1 to
>>>>>>>>>>>>>>> -Warray-bounds=2 now.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Bootstrapped and regression tested on both x86 and
>>>>>>>>>>>>>>> aarch64.
>>>>>>>>>>>>>>> adjusted
>>>>>>>>>>>>>>> -Warray-bounds-61.c due to the false positive
>>>>>>>>>>>>>>> warnings.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Let me know if you have any comments and
>>>>>>>>>>>>>>> suggestions.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> At the last Cauldron I talked with David Malcolm
>>>>>>>>>>>>>> about these kind
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>> issues and thought of instead of suppressing
>>>>>>>>>>>>>> diagnostics to
>>>>>>>>>>>>>> record
>>>>>>>>>>>>>> how a block was duplicated.  For jump threading my
>>>>>>>>>>>>>> idea was to
>>>>>>>>>>>>>> record
>>>>>>>>>>>>>> the condition that was proved true when entering the
>>>>>>>>>>>>>> path and do
>>>>>>>>>>>>>> this
>>>>>>>>>>>>>> by recording the corresponding locations
>>>>>>>>>>>> 
>>>>>>>>>>>> Is only recording the location for the TRUE path  enough?
>>>>>>>>>>>> We might need to record the corresponding locations for
>>>>>>>>>>>> both TRUE and
>>>>>>>>>>>> FALSE paths since the VRP might be more accurate on both
>>>>>>>>>>>> paths.
>>>>>>>>>>>> Is only recording the location is enough?
>>>>>>>>>>>> Do we need to record the pointer to the original
>>>>>>>>>>>> condition stmt?
>>>>>>>>>>> 
>>>>>>>>>>> Just to be clear: I don't plan to work on this myself (I
>>>>>>>>>>> have far too
>>>>>>>>>>> much already to work on...); I'm assuming Richard Biener is
>>>>>>>>>>> hoping/planning to implement this himself.
>>>>>>>>>> 
>>>>>>>>>> While I think some of this might be an improvement to those
>>>>>>>>>> vast
>>>>>>>>>> number of "false positive" diagnostics we have from (too)
>>>>>>>>>> late diagnostic
>>>>>>>>>> passes I do not have the cycles to work on this.
>>>>>>>>> 
>>>>>>>>> I can study a little bit more on how to improve the diagnostics
>>>>>>>>> for PR 109071 first.
>>>>>>>>> 
>>>>>>>>> FYI, I just updated PR109071’s description as: Need more
>>>>>>>>> context for -Warray-bounds warnings due to code duplication
>>>>>>>>> from jump threading.
>>>>>>>>> 
>>>>>>>>> As we already agreed, this is NOT a false positive. It caught a
>>>>>>>>> real bug in linux kernel that need to be patched in the kernel
>>>>>>>>> source. (See
>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109071#c11)
>>>>>>>>> 
>>>>>>>>> In order to add more context to the diagnostics to help the end
>>>>>>>>> user locate the bug accurately in their source code, compiler
>>>>>>>>> needs:
>>>>>>>>> 
>>>>>>>>> 1. During jump threading phase, record the following
>>>>>>>>> information for each duplicated STMT:
>>>>>>>>>     A. A pointer to the COND stmt;
>>>>>>>>>     B. True/false for such COND
>>>>>>>>> 2. During array out-of-bound checking phase, whenever locate an
>>>>>>>>> out-of-bound case,
>>>>>>>>>     A. Use a rich_location for the diagnostic;
>>>>>>>>>     B. Create an instance of diagnositic_path, add events to
>>>>>>>>> this diagnostic_path based on the COND stmt, True/False info
>>>>>>>>> recorded in the STMT;
>>>>>>>>>     C. Call richloc.set_path() to associate the path with
>>>>>>>>> the rich_location;
>>>>>>>>>     D. Then issue warning with this rich_location instead of
>>>>>>>>> the current regular location.
>>>>>>>>> 
>>>>>>>>> Any comment and suggestion to the above?
>>>>>>>> 
>>>>>>>> I was originally thinking of using location_adhoc_data to store
>>>>>>>> 1.A
>>>>>>>> and 1.B as a common bit to associate to each
>>>>>>>> copied stmt.  IIRC location_adhoc_data->data is the stmts
>>>>>>>> associated
>>>>>>>> lexical block so we'd need to stuff another
>>>>>>>> 'data' field into this struct, like a "copy history" where we
>>>>>>>> could
>>>>>>>> then even record copied-of-copy-of-X.
>>>>>>>> locataion_adhoc_data seems imperfectly packed right now, with a
>>>>>>>> 32bit
>>>>>>>> hole before 'data' and 32bit unused
>>>>>>>> at its end, so we might get away without memory use by re-
>>>>>>>> ordering
>>>>>>>> discriminator before 'data' ...
>>>>>>> 
>>>>>>> Like this?
>>>>>>> 
>>>>>>> diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
>>>>>>> index e6e2b0897572..ee344f91333b 100644
>>>>>>> --- a/libcpp/include/line-map.h
>>>>>>> +++ b/libcpp/include/line-map.h
>>>>>>> @@ -761,8 +761,9 @@ struct GTY(()) maps_info_macro {
>>>>>>> struct GTY(()) location_adhoc_data {
>>>>>>> location_t locus;
>>>>>>> source_range src_range;
>>>>>>> -  void * GTY((skip)) data;
>>>>>>> unsigned discriminator;
>>>>>>> +  void * GTY((skip)) data;
>>>>>>> +  void * GTY((skip)) copy_history;
>>>>>>> };
>>>>>>> struct htab;
>>>>>> 
>>>>>> Yes.
>>>>>> 
>>>>>>> How about the copy_history? Do we need a new data structure (like
>>>>>>> the following, or other suggestion) for this? Where should I add
>>>>>>> this new data structure?
>>>>>> 
>>>>>> As it needs to be managed by libcpp it should be in this very same
>>>>>> file.
>>>>>> 
>>>>>>> struct copy_history {
>>>>>>> location_t condition;
>>>>>>> Bool is_true_path;
>>>>>>> }
>>>>>> 
>>>>>> I think we want a pointer to the previous copy-of state as well in
>>>>>> case a stmt
>>>>>> was copied twice.  We'll see whether a single (condition) location
>>>>>> plus edge flag
>>>>>> is sufficient.  I'd say we should plan for an enum to indicate the
>>>>>> duplication
>>>>>> reason at least (jump threading, unswitching, unrolling come to my
>>>>>> mind).  For
>>>>>> jump threading being able to say "when <condition> is true/false" is
>>>>>> probably
>>>>>> good enough, though it might not always be easy to identify a single
>>>>>> condition
>>>>>> here given a threading path starts at an incoming edge to a CFG merge
>>>>>> and
>>>>>> will usually end with the outgoing edge of a condition that we can
>>>>>> statically
>>>>>> evaluate.  The condition controlling the path entry might not be
>>>>>> determined
>>>>>> fully by a single condition location.
>>>>>> 
>>>>>> Possibly building a full "diagnostic path" object at threading time
>>>>>> might be
>>>>>> the only way to record all the facts, but that's also going to be
>>>>>> more
>>>>>> expensive.
>>>>> 
>>>>> Note that a diagnostic_path represents a path through some kind of
>>>>> graph, whereas it sounds like you want to be storing the *nodes* in the
>>>>> graph, and later generating the diagnostic_path from that graph when we
>>>>> need it (which is trivial if the graph is actually just a tree: just
>>>>> follow the parent links backwards, then reverse it).
>>>> 
>>>> I think we are mixing two things - one is that a single transform like jump
>>>> threading produces a stmt copy and when we emit a diagnostic on that
>>>> copied statement we want to tell the user the condition under which the
>>>> copy is executed.  That "condition" can be actually a sequence of
>>>> conditionals.  I wanted to point out that a diagnostic_path instance could
>>>> be used to describe such complex condition.
>>>> 
>>>> But then the other thing I wanted to address with the link to a previous
>>>> copy_history - that's when a statement gets copied twice, for example
>>>> by two distinct jump threading optimizations.  Like when dumping
>>>> the inlining decisions for diagnostics we could dump the logical "and"
>>>> of the conditions of the two threadings.  Since we have a single
>>>> location per GIMPLE stmt we'd have to keep a "stack" of copy events
>>>> associated with it.  That's the linked list (I think a linked list should
>>>> work fine here).
>>> Yes, the linked list to keep the “stack” of copy events should be good 
>>> enough
>>> to form the sequence of conditionals event for the diagnostic_path instance.
>>>> 
>>>> I realize things may get a bit "fat", but eventually we are not duplicating
>>>> statements that much.  I do hope we can share for example a big
>>>> diagnostic_path when we duplicate a basic block during threading
>>>> and use one instance for all stmts in such block copy (IIRC we never
>>>> release locations or their ad-hoc data, we just make sure to never
>>>> use locations with ad-hoc data pointing to BLOCKs that we released,
>>>> but the linemap data will still have pointers in "dead" location entries,
>>>> right?)
>>> Are you still suggesting to add two artificial stmts in the beginning and 
>>> the
>>> end of the duplicated block to carry the copy history information for all 
>>> the
>>> stmts in the block to save space?
>>> 
>>> Compared with the approach to carry such information to each duplicated 
>>> stmts (which I preferred),
>>> The major concerns with the approach are:
>>>  1. Optimizations might move stmts out of these two artificial stmts, 
>>> therefore we need add
>>>       Some memory barrier on these two artificial stmts to prevent such 
>>> movements.
>>>      This might prevent good optimization from happening and result in 
>>> runtime performance loss;
>>>  2. Every time when query whether the stmt is copied and get  its copy 
>>> history, we have to search backward or
>>>       Forward through the stmt chain to get to the artificial stmts that 
>>> carry the copy history, compilation time will
>>>      be slower.
>>> 
>>> I still think that carrying the copy history info to each duplicated stmts 
>>> might be better. We can limit the length of
>>> the history to control the space, what do you think?
>> 
>> Copying to each stmt is definitely easier so I think it's better to
>> start with that and only resort
>> to other things when this fails.
> Okay.  Will try this approach first.
>> 
>> Note we could, instead of putting a pointer to data into the ad-hoc
>> info, put in an index
>> and have the actual data be maintained elsewhere.  That data could
>> even only keep
>> the "last" 1000 contexts and simply discard "older" ones.  Before IPA
>> info can still be
>> transfered between functions, but after IPA where most of the
>> threadings happen the
>> extra info could be discarded once we get a function to the last pass
>> emitting diagnostics
>> we want to amend.  Doing an index lookup makes it possible to not
>> worry about "stale"
>> entries.
> 
> Are you suggesting to use a global array with upper-bound “1000” to hold the 
> copy-history info?
> Then for each copied stmt, put in an index to this array to refer the 
> copy-history linked list for it?
> Is “1000” enough for one function? We can do some experiments to decide this 
> number.
> 
> What do you mean by “stale” entries? The entries for the previous function?
> 
> Is a function-level array enough for this purpose? i.e, for each function, we 
> use an array to hold the copy-history info, and free this array after this 
> function is done with compilation?
> A little confused here….

I was thinking of a hash map and noting when we create an entry after IPA so we 
can scrap it when done with the function 

> Thanks for clarification.
> 
> Qing
> 
> 
>> 
>> Richard.
>> 
>>> 
>>> Qing
>>> 
>>>> 
>>>> Richard.
>>>> 
>>>>> 
>>>>> Dave
>>>>> 
>>>>>> I can think of having a -fdiagnostic-try-to-explain-harder option to
>>>>>> clarify confusing
>>>>>> diagnostics people could use on-demand?
>>>>>> 
>>>>>> Richard.
>>>>>> 
>>>>>>> Qing
>>>>>>>> 
>>>>>>>> Richard.
>>>>>>>> 
>>>>>>>>> Thanks.
>>>>>>>>> 
>>>>>>>>> Qing
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> Richard.
>>>>>>>>>> 
>>>>>>>>>>> But feel free to ask me any questions about the
>>>>>>>>>>> diagnostic_path
>>>>>>>>>>> machinery within the diagnostics subsystem.
>>>>>>>>>>> 
>>>>>>>>>>> [...snip...]
>>>>>>>>>>> 
>>>>>>>>>>> Regards
>>>>>>>>>>> Dave
> 
>
Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

Reply via email to