> Am 05.06.2024 um 18:39 schrieb Qing Zhao <qing.z...@oracle.com>:
>
>
>
>> On Jun 5, 2024, at 03:26, Richard Biener <richard.guent...@gmail.com> wrote:
>>
>>> On Tue, Jun 4, 2024 at 10:31 PM Qing Zhao <qing.z...@oracle.com> wrote:
>>>
>>>
>>>
>>>> On Jun 4, 2024, at 03:43, Richard Biener <richard.guent...@gmail.com>
>>>> wrote:
>>>>
>>>> On Mon, Jun 3, 2024 at 4:48 PM David Malcolm <dmalc...@redhat.com> wrote:
>>>>>
>>>>> On Mon, 2024-06-03 at 08:29 +0200, Richard Biener wrote:
>>>>>> On Fri, May 31, 2024 at 11:23 PM Qing Zhao <qing.z...@oracle.com>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> On May 23, 2024, at 07:46, Richard Biener
>>>>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>>>
>>>>>>>> On Wed, May 22, 2024 at 8:53 PM Qing Zhao <qing.z...@oracle.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> On May 22, 2024, at 03:38, Richard Biener
>>>>>>>>>> <richard.guent...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> On Tue, May 21, 2024 at 11:36 PM David Malcolm
>>>>>>>>>> <dmalc...@redhat.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Tue, 2024-05-21 at 15:13 +0000, Qing Zhao wrote:
>>>>>>>>>>>> Thanks for the comments and suggestions.
>>>>>>>>>>>>
>>>>>>>>>>>>> On May 15, 2024, at 10:00, David Malcolm
>>>>>>>>>>>>> <dmalc...@redhat.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, 2024-05-14 at 15:08 +0200, Richard Biener
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> On Mon, 13 May 2024, Qing Zhao wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Warray-bounds is an important option to enable
>>>>>>>>>>>>>>> linux kernal to
>>>>>>>>>>>>>>> keep
>>>>>>>>>>>>>>> the array out-of-bound errors out of the source
>>>>>>>>>>>>>>> tree.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However, due to the false positive warnings
>>>>>>>>>>>>>>> reported in
>>>>>>>>>>>>>>> PR109071
>>>>>>>>>>>>>>> (-Warray-bounds false positive warnings due to code
>>>>>>>>>>>>>>> duplication
>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>> jump threading), -Warray-bounds=1 cannot be added
>>>>>>>>>>>>>>> on by
>>>>>>>>>>>>>>> default.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Although it's impossible to elinimate all the false
>>>>>>>>>>>>>>> positive
>>>>>>>>>>>>>>> warnings
>>>>>>>>>>>>>>> from -Warray-bounds=1 (See PR104355 Misleading -
>>>>>>>>>>>>>>> Warray-bounds
>>>>>>>>>>>>>>> documentation says "always out of bounds"), we
>>>>>>>>>>>>>>> should minimize
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> false positive warnings in -Warray-bounds=1.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The root reason for the false positive warnings
>>>>>>>>>>>>>>> reported in
>>>>>>>>>>>>>>> PR109071 is:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> When the thread jump optimization tries to reduce
>>>>>>>>>>>>>>> the # of
>>>>>>>>>>>>>>> branches
>>>>>>>>>>>>>>> inside the routine, sometimes it needs to duplicate
>>>>>>>>>>>>>>> the code
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> split into two conditional pathes. for example:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The original code:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> void sparx5_set (int * ptr, struct nums * sg, int
>>>>>>>>>>>>>>> index)
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>> if (index >= 4)
>>>>>>>>>>>>>>> warn ();
>>>>>>>>>>>>>>> *ptr = 0;
>>>>>>>>>>>>>>> *val = sg->vals[index];
>>>>>>>>>>>>>>> if (index >= 4)
>>>>>>>>>>>>>>> warn ();
>>>>>>>>>>>>>>> *ptr = *val;
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> return;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> With the thread jump, the above becomes:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> void sparx5_set (int * ptr, struct nums * sg, int
>>>>>>>>>>>>>>> index)
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>> if (index >= 4)
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>> warn ();
>>>>>>>>>>>>>>> *ptr = 0; // Code duplications since
>>>>>>>>>>>>>>> "warn" does
>>>>>>>>>>>>>>> return;
>>>>>>>>>>>>>>> *val = sg->vals[index]; // same this line.
>>>>>>>>>>>>>>> // In this path,
>>>>>>>>>>>>>>> since it's
>>>>>>>>>>>>>>> under
>>>>>>>>>>>>>>> the condition
>>>>>>>>>>>>>>> // "index >= 4", the
>>>>>>>>>>>>>>> compiler
>>>>>>>>>>>>>>> knows
>>>>>>>>>>>>>>> the value
>>>>>>>>>>>>>>> // of "index" is
>>>>>>>>>>>>>>> larger then 4,
>>>>>>>>>>>>>>> therefore the
>>>>>>>>>>>>>>> // out-of-bound
>>>>>>>>>>>>>>> warning.
>>>>>>>>>>>>>>> warn ();
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> else
>>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>> *ptr = 0;
>>>>>>>>>>>>>>> *val = sg->vals[index];
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> *ptr = *val;
>>>>>>>>>>>>>>> return;
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We can see, after the thread jump optimization, the
>>>>>>>>>>>>>>> # of
>>>>>>>>>>>>>>> branches
>>>>>>>>>>>>>>> inside
>>>>>>>>>>>>>>> the routine "sparx5_set" is reduced from 2 to 1,
>>>>>>>>>>>>>>> however, due
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> code duplication (which is needed for the
>>>>>>>>>>>>>>> correctness of the
>>>>>>>>>>>>>>> code),
>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>> got a false positive out-of-bound warning.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In order to eliminate such false positive out-of-
>>>>>>>>>>>>>>> bound warning,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> A. Add one more flag for GIMPLE: is_splitted.
>>>>>>>>>>>>>>> B. During the thread jump optimization, when the
>>>>>>>>>>>>>>> basic blocks
>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>> duplicated, mark all the STMTs inside the original
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>> duplicated
>>>>>>>>>>>>>>> basic blocks as "is_splitted";
>>>>>>>>>>>>>>> C. Inside the array bound checker, add the
>>>>>>>>>>>>>>> following new
>>>>>>>>>>>>>>> heuristic:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If
>>>>>>>>>>>>>>> 1. the stmt is duplicated and splitted into two
>>>>>>>>>>>>>>> conditional
>>>>>>>>>>>>>>> paths;
>>>>>>>>>>>>>>> + 2. the warning level < 2;
>>>>>>>>>>>>>>> + 3. the current block is not dominating the exit
>>>>>>>>>>>>>>> block
>>>>>>>>>>>>>>> Then not report the warning.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The false positive warnings are moved from -Warray-
>>>>>>>>>>>>>>> bounds=1 to
>>>>>>>>>>>>>>> -Warray-bounds=2 now.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Bootstrapped and regression tested on both x86 and
>>>>>>>>>>>>>>> aarch64.
>>>>>>>>>>>>>>> adjusted
>>>>>>>>>>>>>>> -Warray-bounds-61.c due to the false positive
>>>>>>>>>>>>>>> warnings.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Let me know if you have any comments and
>>>>>>>>>>>>>>> suggestions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At the last Cauldron I talked with David Malcolm
>>>>>>>>>>>>>> about these kind
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>> issues and thought of instead of suppressing
>>>>>>>>>>>>>> diagnostics to
>>>>>>>>>>>>>> record
>>>>>>>>>>>>>> how a block was duplicated. For jump threading my
>>>>>>>>>>>>>> idea was to
>>>>>>>>>>>>>> record
>>>>>>>>>>>>>> the condition that was proved true when entering the
>>>>>>>>>>>>>> path and do
>>>>>>>>>>>>>> this
>>>>>>>>>>>>>> by recording the corresponding locations
>>>>>>>>>>>>
>>>>>>>>>>>> Is only recording the location for the TRUE path enough?
>>>>>>>>>>>> We might need to record the corresponding locations for
>>>>>>>>>>>> both TRUE and
>>>>>>>>>>>> FALSE paths since the VRP might be more accurate on both
>>>>>>>>>>>> paths.
>>>>>>>>>>>> Is only recording the location is enough?
>>>>>>>>>>>> Do we need to record the pointer to the original
>>>>>>>>>>>> condition stmt?
>>>>>>>>>>>
>>>>>>>>>>> Just to be clear: I don't plan to work on this myself (I
>>>>>>>>>>> have far too
>>>>>>>>>>> much already to work on...); I'm assuming Richard Biener is
>>>>>>>>>>> hoping/planning to implement this himself.
>>>>>>>>>>
>>>>>>>>>> While I think some of this might be an improvement to those
>>>>>>>>>> vast
>>>>>>>>>> number of "false positive" diagnostics we have from (too)
>>>>>>>>>> late diagnostic
>>>>>>>>>> passes I do not have the cycles to work on this.
>>>>>>>>>
>>>>>>>>> I can study a little bit more on how to improve the diagnostics
>>>>>>>>> for PR 109071 first.
>>>>>>>>>
>>>>>>>>> FYI, I just updated PR109071’s description as: Need more
>>>>>>>>> context for -Warray-bounds warnings due to code duplication
>>>>>>>>> from jump threading.
>>>>>>>>>
>>>>>>>>> As we already agreed, this is NOT a false positive. It caught a
>>>>>>>>> real bug in linux kernel that need to be patched in the kernel
>>>>>>>>> source. (See
>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109071#c11)
>>>>>>>>>
>>>>>>>>> In order to add more context to the diagnostics to help the end
>>>>>>>>> user locate the bug accurately in their source code, compiler
>>>>>>>>> needs:
>>>>>>>>>
>>>>>>>>> 1. During jump threading phase, record the following
>>>>>>>>> information for each duplicated STMT:
>>>>>>>>> A. A pointer to the COND stmt;
>>>>>>>>> B. True/false for such COND
>>>>>>>>> 2. During array out-of-bound checking phase, whenever locate an
>>>>>>>>> out-of-bound case,
>>>>>>>>> A. Use a rich_location for the diagnostic;
>>>>>>>>> B. Create an instance of diagnositic_path, add events to
>>>>>>>>> this diagnostic_path based on the COND stmt, True/False info
>>>>>>>>> recorded in the STMT;
>>>>>>>>> C. Call richloc.set_path() to associate the path with
>>>>>>>>> the rich_location;
>>>>>>>>> D. Then issue warning with this rich_location instead of
>>>>>>>>> the current regular location.
>>>>>>>>>
>>>>>>>>> Any comment and suggestion to the above?
>>>>>>>>
>>>>>>>> I was originally thinking of using location_adhoc_data to store
>>>>>>>> 1.A
>>>>>>>> and 1.B as a common bit to associate to each
>>>>>>>> copied stmt. IIRC location_adhoc_data->data is the stmts
>>>>>>>> associated
>>>>>>>> lexical block so we'd need to stuff another
>>>>>>>> 'data' field into this struct, like a "copy history" where we
>>>>>>>> could
>>>>>>>> then even record copied-of-copy-of-X.
>>>>>>>> locataion_adhoc_data seems imperfectly packed right now, with a
>>>>>>>> 32bit
>>>>>>>> hole before 'data' and 32bit unused
>>>>>>>> at its end, so we might get away without memory use by re-
>>>>>>>> ordering
>>>>>>>> discriminator before 'data' ...
>>>>>>>
>>>>>>> Like this?
>>>>>>>
>>>>>>> diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
>>>>>>> index e6e2b0897572..ee344f91333b 100644
>>>>>>> --- a/libcpp/include/line-map.h
>>>>>>> +++ b/libcpp/include/line-map.h
>>>>>>> @@ -761,8 +761,9 @@ struct GTY(()) maps_info_macro {
>>>>>>> struct GTY(()) location_adhoc_data {
>>>>>>> location_t locus;
>>>>>>> source_range src_range;
>>>>>>> - void * GTY((skip)) data;
>>>>>>> unsigned discriminator;
>>>>>>> + void * GTY((skip)) data;
>>>>>>> + void * GTY((skip)) copy_history;
>>>>>>> };
>>>>>>> struct htab;
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>> How about the copy_history? Do we need a new data structure (like
>>>>>>> the following, or other suggestion) for this? Where should I add
>>>>>>> this new data structure?
>>>>>>
>>>>>> As it needs to be managed by libcpp it should be in this very same
>>>>>> file.
>>>>>>
>>>>>>> struct copy_history {
>>>>>>> location_t condition;
>>>>>>> Bool is_true_path;
>>>>>>> }
>>>>>>
>>>>>> I think we want a pointer to the previous copy-of state as well in
>>>>>> case a stmt
>>>>>> was copied twice. We'll see whether a single (condition) location
>>>>>> plus edge flag
>>>>>> is sufficient. I'd say we should plan for an enum to indicate the
>>>>>> duplication
>>>>>> reason at least (jump threading, unswitching, unrolling come to my
>>>>>> mind). For
>>>>>> jump threading being able to say "when <condition> is true/false" is
>>>>>> probably
>>>>>> good enough, though it might not always be easy to identify a single
>>>>>> condition
>>>>>> here given a threading path starts at an incoming edge to a CFG merge
>>>>>> and
>>>>>> will usually end with the outgoing edge of a condition that we can
>>>>>> statically
>>>>>> evaluate. The condition controlling the path entry might not be
>>>>>> determined
>>>>>> fully by a single condition location.
>>>>>>
>>>>>> Possibly building a full "diagnostic path" object at threading time
>>>>>> might be
>>>>>> the only way to record all the facts, but that's also going to be
>>>>>> more
>>>>>> expensive.
>>>>>
>>>>> Note that a diagnostic_path represents a path through some kind of
>>>>> graph, whereas it sounds like you want to be storing the *nodes* in the
>>>>> graph, and later generating the diagnostic_path from that graph when we
>>>>> need it (which is trivial if the graph is actually just a tree: just
>>>>> follow the parent links backwards, then reverse it).
>>>>
>>>> I think we are mixing two things - one is that a single transform like jump
>>>> threading produces a stmt copy and when we emit a diagnostic on that
>>>> copied statement we want to tell the user the condition under which the
>>>> copy is executed. That "condition" can be actually a sequence of
>>>> conditionals. I wanted to point out that a diagnostic_path instance could
>>>> be used to describe such complex condition.
>>>>
>>>> But then the other thing I wanted to address with the link to a previous
>>>> copy_history - that's when a statement gets copied twice, for example
>>>> by two distinct jump threading optimizations. Like when dumping
>>>> the inlining decisions for diagnostics we could dump the logical "and"
>>>> of the conditions of the two threadings. Since we have a single
>>>> location per GIMPLE stmt we'd have to keep a "stack" of copy events
>>>> associated with it. That's the linked list (I think a linked list should
>>>> work fine here).
>>> Yes, the linked list to keep the “stack” of copy events should be good
>>> enough
>>> to form the sequence of conditionals event for the diagnostic_path instance.
>>>>
>>>> I realize things may get a bit "fat", but eventually we are not duplicating
>>>> statements that much. I do hope we can share for example a big
>>>> diagnostic_path when we duplicate a basic block during threading
>>>> and use one instance for all stmts in such block copy (IIRC we never
>>>> release locations or their ad-hoc data, we just make sure to never
>>>> use locations with ad-hoc data pointing to BLOCKs that we released,
>>>> but the linemap data will still have pointers in "dead" location entries,
>>>> right?)
>>> Are you still suggesting to add two artificial stmts in the beginning and
>>> the
>>> end of the duplicated block to carry the copy history information for all
>>> the
>>> stmts in the block to save space?
>>>
>>> Compared with the approach to carry such information to each duplicated
>>> stmts (which I preferred),
>>> The major concerns with the approach are:
>>> 1. Optimizations might move stmts out of these two artificial stmts,
>>> therefore we need add
>>> Some memory barrier on these two artificial stmts to prevent such
>>> movements.
>>> This might prevent good optimization from happening and result in
>>> runtime performance loss;
>>> 2. Every time when query whether the stmt is copied and get its copy
>>> history, we have to search backward or
>>> Forward through the stmt chain to get to the artificial stmts that
>>> carry the copy history, compilation time will
>>> be slower.
>>>
>>> I still think that carrying the copy history info to each duplicated stmts
>>> might be better. We can limit the length of
>>> the history to control the space, what do you think?
>>
>> Copying to each stmt is definitely easier so I think it's better to
>> start with that and only resort
>> to other things when this fails.
> Okay. Will try this approach first.
>>
>> Note we could, instead of putting a pointer to data into the ad-hoc
>> info, put in an index
>> and have the actual data be maintained elsewhere. That data could
>> even only keep
>> the "last" 1000 contexts and simply discard "older" ones. Before IPA
>> info can still be
>> transfered between functions, but after IPA where most of the
>> threadings happen the
>> extra info could be discarded once we get a function to the last pass
>> emitting diagnostics
>> we want to amend. Doing an index lookup makes it possible to not
>> worry about "stale"
>> entries.
>
> Are you suggesting to use a global array with upper-bound “1000” to hold the
> copy-history info?
> Then for each copied stmt, put in an index to this array to refer the
> copy-history linked list for it?
> Is “1000” enough for one function? We can do some experiments to decide this
> number.
>
> What do you mean by “stale” entries? The entries for the previous function?
>
> Is a function-level array enough for this purpose? i.e, for each function, we
> use an array to hold the copy-history info, and free this array after this
> function is done with compilation?
> A little confused here….
I was thinking of a hash map and noting when we create an entry after IPA so we
can scrap it when done with the function
> Thanks for clarification.
>
> Qing
>
>
>>
>> Richard.
>>
>>>
>>> Qing
>>>
>>>>
>>>> Richard.
>>>>
>>>>>
>>>>> Dave
>>>>>
>>>>>> I can think of having a -fdiagnostic-try-to-explain-harder option to
>>>>>> clarify confusing
>>>>>> diagnostics people could use on-demand?
>>>>>>
>>>>>> Richard.
>>>>>>
>>>>>>> Qing
>>>>>>>>
>>>>>>>> Richard.
>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>> Qing
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Richard.
>>>>>>>>>>
>>>>>>>>>>> But feel free to ask me any questions about the
>>>>>>>>>>> diagnostic_path
>>>>>>>>>>> machinery within the diagnostics subsystem.
>>>>>>>>>>>
>>>>>>>>>>> [...snip...]
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> Dave
>
>