Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

Qing Zhao Wed, 05 Jun 2024 10:59:14 -0700


> On Jun 5, 2024, at 13:07, Richard Biener <richard.guent...@gmail.com> wrote:
> 
> 
> 
>> Am 05.06.2024 um 18:39 schrieb Qing Zhao <qing.z...@oracle.com>:
>> 
>> 
>> 
>>> On Jun 5, 2024, at 03:26, Richard Biener <richard.guent...@gmail.com> wrote:
>>> 
>>>> On Tue, Jun 4, 2024 at 10:31 PM Qing Zhao <qing.z...@oracle.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Jun 4, 2024, at 03:43, Richard Biener <richard.guent...@gmail.com> 
>>>>> wrote:
>>>>> 
>>>>> On Mon, Jun 3, 2024 at 4:48 PM David Malcolm <dmalc...@redhat.com> wrote:
>>>>>> 
>>>>>> On Mon, 2024-06-03 at 08:29 +0200, Richard Biener wrote:
>>>>>>> On Fri, May 31, 2024 at 11:23 PM Qing Zhao <qing.z...@oracle.com>
>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On May 23, 2024, at 07:46, Richard Biener
>>>>>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> On Wed, May 22, 2024 at 8:53 PM Qing Zhao <qing.z...@oracle.com>
>>>>>>>>> wrote:
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> On May 22, 2024, at 03:38, Richard Biener
>>>>>>>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, May 21, 2024 at 11:36 PM David Malcolm
>>>>>>>>>>> <dmalc...@redhat.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, 2024-05-21 at 15:13 +0000, Qing Zhao wrote:
>>>>>>>>>>>>> Thanks for the comments and suggestions.
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On May 15, 2024, at 10:00, David Malcolm
>>>>>>>>>>>>>> <dmalc...@redhat.com>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, 2024-05-14 at 15:08 +0200, Richard Biener
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> On Mon, 13 May 2024, Qing Zhao wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> -Warray-bounds is an important option to enable
>>>>>>>>>>>>>>>> linux kernal to
>>>>>>>>>>>>>>>> keep
>>>>>>>>>>>>>>>> the array out-of-bound errors out of the source
>>>>>>>>>>>>>>>> tree.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> However, due to the false positive warnings
>>>>>>>>>>>>>>>> reported in
>>>>>>>>>>>>>>>> PR109071
>>>>>>>>>>>>>>>> (-Warray-bounds false positive warnings due to code
>>>>>>>>>>>>>>>> duplication
>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>> jump threading), -Warray-bounds=1 cannot be added
>>>>>>>>>>>>>>>> on by
>>>>>>>>>>>>>>>> default.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Although it's impossible to elinimate all the false
>>>>>>>>>>>>>>>> positive
>>>>>>>>>>>>>>>> warnings
>>>>>>>>>>>>>>>> from -Warray-bounds=1 (See PR104355 Misleading -
>>>>>>>>>>>>>>>> Warray-bounds
>>>>>>>>>>>>>>>> documentation says "always out of bounds"), we
>>>>>>>>>>>>>>>> should minimize
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> false positive warnings in -Warray-bounds=1.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The root reason for the false positive warnings
>>>>>>>>>>>>>>>> reported in
>>>>>>>>>>>>>>>> PR109071 is:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> When the thread jump optimization tries to reduce
>>>>>>>>>>>>>>>> the # of
>>>>>>>>>>>>>>>> branches
>>>>>>>>>>>>>>>> inside the routine, sometimes it needs to duplicate
>>>>>>>>>>>>>>>> the code
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> split into two conditional pathes. for example:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The original code:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> void sparx5_set (int * ptr, struct nums * sg, int
>>>>>>>>>>>>>>>> index)
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>> if (index >= 4)
>>>>>>>>>>>>>>>> warn ();
>>>>>>>>>>>>>>>> *ptr = 0;
>>>>>>>>>>>>>>>> *val = sg->vals[index];
>>>>>>>>>>>>>>>> if (index >= 4)
>>>>>>>>>>>>>>>> warn ();
>>>>>>>>>>>>>>>> *ptr = *val;
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> return;
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> With the thread jump, the above becomes:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> void sparx5_set (int * ptr, struct nums * sg, int
>>>>>>>>>>>>>>>> index)
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>> if (index >= 4)
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>> warn ();
>>>>>>>>>>>>>>>> *ptr = 0;         // Code duplications since
>>>>>>>>>>>>>>>> "warn" does
>>>>>>>>>>>>>>>> return;
>>>>>>>>>>>>>>>> *val = sg->vals[index];   // same this line.
>>>>>>>>>>>>>>>>                           // In this path,
>>>>>>>>>>>>>>>> since it's
>>>>>>>>>>>>>>>> under
>>>>>>>>>>>>>>>> the condition
>>>>>>>>>>>>>>>>                           // "index >= 4", the
>>>>>>>>>>>>>>>> compiler
>>>>>>>>>>>>>>>> knows
>>>>>>>>>>>>>>>> the value
>>>>>>>>>>>>>>>>                           // of "index" is
>>>>>>>>>>>>>>>> larger then 4,
>>>>>>>>>>>>>>>> therefore the
>>>>>>>>>>>>>>>>                           // out-of-bound
>>>>>>>>>>>>>>>> warning.
>>>>>>>>>>>>>>>> warn ();
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> else
>>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>>> *ptr = 0;
>>>>>>>>>>>>>>>> *val = sg->vals[index];
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> *ptr = *val;
>>>>>>>>>>>>>>>> return;
>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> We can see, after the thread jump optimization, the
>>>>>>>>>>>>>>>> # of
>>>>>>>>>>>>>>>> branches
>>>>>>>>>>>>>>>> inside
>>>>>>>>>>>>>>>> the routine "sparx5_set" is reduced from 2 to 1,
>>>>>>>>>>>>>>>> however,  due
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> code duplication (which is needed for the
>>>>>>>>>>>>>>>> correctness of the
>>>>>>>>>>>>>>>> code),
>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>> got a false positive out-of-bound warning.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> In order to eliminate such false positive out-of-
>>>>>>>>>>>>>>>> bound warning,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> A. Add one more flag for GIMPLE: is_splitted.
>>>>>>>>>>>>>>>> B. During the thread jump optimization, when the
>>>>>>>>>>>>>>>> basic blocks
>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>> duplicated, mark all the STMTs inside the original
>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>> duplicated
>>>>>>>>>>>>>>>> basic blocks as "is_splitted";
>>>>>>>>>>>>>>>> C. Inside the array bound checker, add the
>>>>>>>>>>>>>>>> following new
>>>>>>>>>>>>>>>> heuristic:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> If
>>>>>>>>>>>>>>>> 1. the stmt is duplicated and splitted into two
>>>>>>>>>>>>>>>> conditional
>>>>>>>>>>>>>>>> paths;
>>>>>>>>>>>>>>>> +  2. the warning level < 2;
>>>>>>>>>>>>>>>> +  3. the current block is not dominating the exit
>>>>>>>>>>>>>>>> block
>>>>>>>>>>>>>>>> Then not report the warning.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> The false positive warnings are moved from -Warray-
>>>>>>>>>>>>>>>> bounds=1 to
>>>>>>>>>>>>>>>> -Warray-bounds=2 now.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Bootstrapped and regression tested on both x86 and
>>>>>>>>>>>>>>>> aarch64.
>>>>>>>>>>>>>>>> adjusted
>>>>>>>>>>>>>>>> -Warray-bounds-61.c due to the false positive
>>>>>>>>>>>>>>>> warnings.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Let me know if you have any comments and
>>>>>>>>>>>>>>>> suggestions.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> At the last Cauldron I talked with David Malcolm
>>>>>>>>>>>>>>> about these kind
>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>> issues and thought of instead of suppressing
>>>>>>>>>>>>>>> diagnostics to
>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>> how a block was duplicated.  For jump threading my
>>>>>>>>>>>>>>> idea was to
>>>>>>>>>>>>>>> record
>>>>>>>>>>>>>>> the condition that was proved true when entering the
>>>>>>>>>>>>>>> path and do
>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>> by recording the corresponding locations
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Is only recording the location for the TRUE path  enough?
>>>>>>>>>>>>> We might need to record the corresponding locations for
>>>>>>>>>>>>> both TRUE and
>>>>>>>>>>>>> FALSE paths since the VRP might be more accurate on both
>>>>>>>>>>>>> paths.
>>>>>>>>>>>>> Is only recording the location is enough?
>>>>>>>>>>>>> Do we need to record the pointer to the original
>>>>>>>>>>>>> condition stmt?
>>>>>>>>>>>> 
>>>>>>>>>>>> Just to be clear: I don't plan to work on this myself (I
>>>>>>>>>>>> have far too
>>>>>>>>>>>> much already to work on...); I'm assuming Richard Biener is
>>>>>>>>>>>> hoping/planning to implement this himself.
>>>>>>>>>>> 
>>>>>>>>>>> While I think some of this might be an improvement to those
>>>>>>>>>>> vast
>>>>>>>>>>> number of "false positive" diagnostics we have from (too)
>>>>>>>>>>> late diagnostic
>>>>>>>>>>> passes I do not have the cycles to work on this.
>>>>>>>>>> 
>>>>>>>>>> I can study a little bit more on how to improve the diagnostics
>>>>>>>>>> for PR 109071 first.
>>>>>>>>>> 
>>>>>>>>>> FYI, I just updated PR109071’s description as: Need more
>>>>>>>>>> context for -Warray-bounds warnings due to code duplication
>>>>>>>>>> from jump threading.
>>>>>>>>>> 
>>>>>>>>>> As we already agreed, this is NOT a false positive. It caught a
>>>>>>>>>> real bug in linux kernel that need to be patched in the kernel
>>>>>>>>>> source. (See
>>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109071#c11)
>>>>>>>>>> 
>>>>>>>>>> In order to add more context to the diagnostics to help the end
>>>>>>>>>> user locate the bug accurately in their source code, compiler
>>>>>>>>>> needs:
>>>>>>>>>> 
>>>>>>>>>> 1. During jump threading phase, record the following
>>>>>>>>>> information for each duplicated STMT:
>>>>>>>>>>    A. A pointer to the COND stmt;
>>>>>>>>>>    B. True/false for such COND
>>>>>>>>>> 2. During array out-of-bound checking phase, whenever locate an
>>>>>>>>>> out-of-bound case,
>>>>>>>>>>    A. Use a rich_location for the diagnostic;
>>>>>>>>>>    B. Create an instance of diagnositic_path, add events to
>>>>>>>>>> this diagnostic_path based on the COND stmt, True/False info
>>>>>>>>>> recorded in the STMT;
>>>>>>>>>>    C. Call richloc.set_path() to associate the path with
>>>>>>>>>> the rich_location;
>>>>>>>>>>    D. Then issue warning with this rich_location instead of
>>>>>>>>>> the current regular location.
>>>>>>>>>> 
>>>>>>>>>> Any comment and suggestion to the above?
>>>>>>>>> 
>>>>>>>>> I was originally thinking of using location_adhoc_data to store
>>>>>>>>> 1.A
>>>>>>>>> and 1.B as a common bit to associate to each
>>>>>>>>> copied stmt.  IIRC location_adhoc_data->data is the stmts
>>>>>>>>> associated
>>>>>>>>> lexical block so we'd need to stuff another
>>>>>>>>> 'data' field into this struct, like a "copy history" where we
>>>>>>>>> could
>>>>>>>>> then even record copied-of-copy-of-X.
>>>>>>>>> locataion_adhoc_data seems imperfectly packed right now, with a
>>>>>>>>> 32bit
>>>>>>>>> hole before 'data' and 32bit unused
>>>>>>>>> at its end, so we might get away without memory use by re-
>>>>>>>>> ordering
>>>>>>>>> discriminator before 'data' ...
>>>>>>>> 
>>>>>>>> Like this?
>>>>>>>> 
>>>>>>>> diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
>>>>>>>> index e6e2b0897572..ee344f91333b 100644
>>>>>>>> --- a/libcpp/include/line-map.h
>>>>>>>> +++ b/libcpp/include/line-map.h
>>>>>>>> @@ -761,8 +761,9 @@ struct GTY(()) maps_info_macro {
>>>>>>>> struct GTY(()) location_adhoc_data {
>>>>>>>> location_t locus;
>>>>>>>> source_range src_range;
>>>>>>>> -  void * GTY((skip)) data;
>>>>>>>> unsigned discriminator;
>>>>>>>> +  void * GTY((skip)) data;
>>>>>>>> +  void * GTY((skip)) copy_history;
>>>>>>>> };
>>>>>>>> struct htab;
>>>>>>> 
>>>>>>> Yes.
>>>>>>> 
>>>>>>>> How about the copy_history? Do we need a new data structure (like
>>>>>>>> the following, or other suggestion) for this? Where should I add
>>>>>>>> this new data structure?
>>>>>>> 
>>>>>>> As it needs to be managed by libcpp it should be in this very same
>>>>>>> file.
>>>>>>> 
>>>>>>>> struct copy_history {
>>>>>>>> location_t condition;
>>>>>>>> Bool is_true_path;
>>>>>>>> }
>>>>>>> 
>>>>>>> I think we want a pointer to the previous copy-of state as well in
>>>>>>> case a stmt
>>>>>>> was copied twice.  We'll see whether a single (condition) location
>>>>>>> plus edge flag
>>>>>>> is sufficient.  I'd say we should plan for an enum to indicate the
>>>>>>> duplication
>>>>>>> reason at least (jump threading, unswitching, unrolling come to my
>>>>>>> mind).  For
>>>>>>> jump threading being able to say "when <condition> is true/false" is
>>>>>>> probably
>>>>>>> good enough, though it might not always be easy to identify a single
>>>>>>> condition
>>>>>>> here given a threading path starts at an incoming edge to a CFG merge
>>>>>>> and
>>>>>>> will usually end with the outgoing edge of a condition that we can
>>>>>>> statically
>>>>>>> evaluate.  The condition controlling the path entry might not be
>>>>>>> determined
>>>>>>> fully by a single condition location.
>>>>>>> 
>>>>>>> Possibly building a full "diagnostic path" object at threading time
>>>>>>> might be
>>>>>>> the only way to record all the facts, but that's also going to be
>>>>>>> more
>>>>>>> expensive.
>>>>>> 
>>>>>> Note that a diagnostic_path represents a path through some kind of
>>>>>> graph, whereas it sounds like you want to be storing the *nodes* in the
>>>>>> graph, and later generating the diagnostic_path from that graph when we
>>>>>> need it (which is trivial if the graph is actually just a tree: just
>>>>>> follow the parent links backwards, then reverse it).
>>>>> 
>>>>> I think we are mixing two things - one is that a single transform like 
>>>>> jump
>>>>> threading produces a stmt copy and when we emit a diagnostic on that
>>>>> copied statement we want to tell the user the condition under which the
>>>>> copy is executed.  That "condition" can be actually a sequence of
>>>>> conditionals.  I wanted to point out that a diagnostic_path instance could
>>>>> be used to describe such complex condition.
>>>>> 
>>>>> But then the other thing I wanted to address with the link to a previous
>>>>> copy_history - that's when a statement gets copied twice, for example
>>>>> by two distinct jump threading optimizations.  Like when dumping
>>>>> the inlining decisions for diagnostics we could dump the logical "and"
>>>>> of the conditions of the two threadings.  Since we have a single
>>>>> location per GIMPLE stmt we'd have to keep a "stack" of copy events
>>>>> associated with it.  That's the linked list (I think a linked list should
>>>>> work fine here).
>>>> Yes, the linked list to keep the “stack” of copy events should be good 
>>>> enough
>>>> to form the sequence of conditionals event for the diagnostic_path 
>>>> instance.
>>>>> 
>>>>> I realize things may get a bit "fat", but eventually we are not 
>>>>> duplicating
>>>>> statements that much.  I do hope we can share for example a big
>>>>> diagnostic_path when we duplicate a basic block during threading
>>>>> and use one instance for all stmts in such block copy (IIRC we never
>>>>> release locations or their ad-hoc data, we just make sure to never
>>>>> use locations with ad-hoc data pointing to BLOCKs that we released,
>>>>> but the linemap data will still have pointers in "dead" location entries,
>>>>> right?)
>>>> Are you still suggesting to add two artificial stmts in the beginning and 
>>>> the
>>>> end of the duplicated block to carry the copy history information for all 
>>>> the
>>>> stmts in the block to save space?
>>>> 
>>>> Compared with the approach to carry such information to each duplicated 
>>>> stmts (which I preferred),
>>>> The major concerns with the approach are:
>>>> 1. Optimizations might move stmts out of these two artificial stmts, 
>>>> therefore we need add
>>>>      Some memory barrier on these two artificial stmts to prevent such 
>>>> movements.
>>>>     This might prevent good optimization from happening and result in 
>>>> runtime performance loss;
>>>> 2. Every time when query whether the stmt is copied and get  its copy 
>>>> history, we have to search backward or
>>>>      Forward through the stmt chain to get to the artificial stmts that 
>>>> carry the copy history, compilation time will
>>>>     be slower.
>>>> 
>>>> I still think that carrying the copy history info to each duplicated stmts 
>>>> might be better. We can limit the length of
>>>> the history to control the space, what do you think?
>>> 
>>> Copying to each stmt is definitely easier so I think it's better to
>>> start with that and only resort
>>> to other things when this fails.
>> Okay.  Will try this approach first.
>>> 
>>> Note we could, instead of putting a pointer to data into the ad-hoc
>>> info, put in an index
>>> and have the actual data be maintained elsewhere.  That data could
>>> even only keep
>>> the "last" 1000 contexts and simply discard "older" ones.  Before IPA
>>> info can still be
>>> transfered between functions, but after IPA where most of the
>>> threadings happen the
>>> extra info could be discarded once we get a function to the last pass
>>> emitting diagnostics
>>> we want to amend.  Doing an index lookup makes it possible to not
>>> worry about "stale"
>>> entries.
>> 
>> Are you suggesting to use a global array with upper-bound “1000” to hold the 
>> copy-history info?
>> Then for each copied stmt, put in an index to this array to refer the 
>> copy-history linked list for it?
>> Is “1000” enough for one function? We can do some experiments to decide this 
>> number.
>> 
>> What do you mean by “stale” entries? The entries for the previous function?
>> 
>> Is a function-level array enough for this purpose? i.e, for each function, 
>> we use an array to hold the copy-history info, and free this array after 
>> this function is done with compilation?
>> A little confused here….
> 
> I was thinking of a hash map and noting when we create an entry after IPA so 
> we can scrap it when done with the function 
Okay. I see. 
then where should this hash map be declared? diagnostic-spec.h?


Qing

> 
>> Thanks for clarification.
>> 
>> Qing
>> 
>> 
>>> 
>>> Richard.
>>> 
>>>> 
>>>> Qing
>>>> 
>>>>> 
>>>>> Richard.
>>>>> 
>>>>>> 
>>>>>> Dave
>>>>>> 
>>>>>>> I can think of having a -fdiagnostic-try-to-explain-harder option to
>>>>>>> clarify confusing
>>>>>>> diagnostics people could use on-demand?
>>>>>>> 
>>>>>>> Richard.
>>>>>>> 
>>>>>>>> Qing
>>>>>>>>> 
>>>>>>>>> Richard.
>>>>>>>>> 
>>>>>>>>>> Thanks.
>>>>>>>>>> 
>>>>>>>>>> Qing
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Richard.
>>>>>>>>>>> 
>>>>>>>>>>>> But feel free to ask me any questions about the
>>>>>>>>>>>> diagnostic_path
>>>>>>>>>>>> machinery within the diagnostics subsystem.
>>>>>>>>>>>> 
>>>>>>>>>>>> [...snip...]
>>>>>>>>>>>> 
>>>>>>>>>>>> Regards
>>>>>>>>>>>> Dave

Re: [RFC][PATCH] PR tree-optimization/109071 - -Warray-bounds false positive warnings due to code duplication from jump threading

Reply via email to