> On Jun 5, 2024, at 13:07, Richard Biener <richard.guent...@gmail.com> wrote: > > > >> Am 05.06.2024 um 18:39 schrieb Qing Zhao <qing.z...@oracle.com>: >> >> >> >>> On Jun 5, 2024, at 03:26, Richard Biener <richard.guent...@gmail.com> wrote: >>> >>>> On Tue, Jun 4, 2024 at 10:31 PM Qing Zhao <qing.z...@oracle.com> wrote: >>>> >>>> >>>> >>>>> On Jun 4, 2024, at 03:43, Richard Biener <richard.guent...@gmail.com> >>>>> wrote: >>>>> >>>>> On Mon, Jun 3, 2024 at 4:48 PM David Malcolm <dmalc...@redhat.com> wrote: >>>>>> >>>>>> On Mon, 2024-06-03 at 08:29 +0200, Richard Biener wrote: >>>>>>> On Fri, May 31, 2024 at 11:23 PM Qing Zhao <qing.z...@oracle.com> >>>>>>> wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> On May 23, 2024, at 07:46, Richard Biener >>>>>>>>> <richard.guent...@gmail.com> wrote: >>>>>>>>> >>>>>>>>> On Wed, May 22, 2024 at 8:53 PM Qing Zhao <qing.z...@oracle.com> >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> On May 22, 2024, at 03:38, Richard Biener >>>>>>>>>>> <richard.guent...@gmail.com> wrote: >>>>>>>>>>> >>>>>>>>>>> On Tue, May 21, 2024 at 11:36 PM David Malcolm >>>>>>>>>>> <dmalc...@redhat.com> wrote: >>>>>>>>>>>> >>>>>>>>>>>> On Tue, 2024-05-21 at 15:13 +0000, Qing Zhao wrote: >>>>>>>>>>>>> Thanks for the comments and suggestions. >>>>>>>>>>>>> >>>>>>>>>>>>>> On May 15, 2024, at 10:00, David Malcolm >>>>>>>>>>>>>> <dmalc...@redhat.com> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, 2024-05-14 at 15:08 +0200, Richard Biener >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>> On Mon, 13 May 2024, Qing Zhao wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -Warray-bounds is an important option to enable >>>>>>>>>>>>>>>> linux kernal to >>>>>>>>>>>>>>>> keep >>>>>>>>>>>>>>>> the array out-of-bound errors out of the source >>>>>>>>>>>>>>>> tree. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> However, due to the false positive warnings >>>>>>>>>>>>>>>> reported in >>>>>>>>>>>>>>>> PR109071 >>>>>>>>>>>>>>>> (-Warray-bounds false positive warnings due to code >>>>>>>>>>>>>>>> duplication >>>>>>>>>>>>>>>> from >>>>>>>>>>>>>>>> jump threading), -Warray-bounds=1 cannot be added >>>>>>>>>>>>>>>> on by >>>>>>>>>>>>>>>> default. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Although it's impossible to elinimate all the false >>>>>>>>>>>>>>>> positive >>>>>>>>>>>>>>>> warnings >>>>>>>>>>>>>>>> from -Warray-bounds=1 (See PR104355 Misleading - >>>>>>>>>>>>>>>> Warray-bounds >>>>>>>>>>>>>>>> documentation says "always out of bounds"), we >>>>>>>>>>>>>>>> should minimize >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> false positive warnings in -Warray-bounds=1. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The root reason for the false positive warnings >>>>>>>>>>>>>>>> reported in >>>>>>>>>>>>>>>> PR109071 is: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> When the thread jump optimization tries to reduce >>>>>>>>>>>>>>>> the # of >>>>>>>>>>>>>>>> branches >>>>>>>>>>>>>>>> inside the routine, sometimes it needs to duplicate >>>>>>>>>>>>>>>> the code >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> split into two conditional pathes. for example: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The original code: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> void sparx5_set (int * ptr, struct nums * sg, int >>>>>>>>>>>>>>>> index) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> if (index >= 4) >>>>>>>>>>>>>>>> warn (); >>>>>>>>>>>>>>>> *ptr = 0; >>>>>>>>>>>>>>>> *val = sg->vals[index]; >>>>>>>>>>>>>>>> if (index >= 4) >>>>>>>>>>>>>>>> warn (); >>>>>>>>>>>>>>>> *ptr = *val; >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> return; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> With the thread jump, the above becomes: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> void sparx5_set (int * ptr, struct nums * sg, int >>>>>>>>>>>>>>>> index) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> if (index >= 4) >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> warn (); >>>>>>>>>>>>>>>> *ptr = 0; // Code duplications since >>>>>>>>>>>>>>>> "warn" does >>>>>>>>>>>>>>>> return; >>>>>>>>>>>>>>>> *val = sg->vals[index]; // same this line. >>>>>>>>>>>>>>>> // In this path, >>>>>>>>>>>>>>>> since it's >>>>>>>>>>>>>>>> under >>>>>>>>>>>>>>>> the condition >>>>>>>>>>>>>>>> // "index >= 4", the >>>>>>>>>>>>>>>> compiler >>>>>>>>>>>>>>>> knows >>>>>>>>>>>>>>>> the value >>>>>>>>>>>>>>>> // of "index" is >>>>>>>>>>>>>>>> larger then 4, >>>>>>>>>>>>>>>> therefore the >>>>>>>>>>>>>>>> // out-of-bound >>>>>>>>>>>>>>>> warning. >>>>>>>>>>>>>>>> warn (); >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> else >>>>>>>>>>>>>>>> { >>>>>>>>>>>>>>>> *ptr = 0; >>>>>>>>>>>>>>>> *val = sg->vals[index]; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> *ptr = *val; >>>>>>>>>>>>>>>> return; >>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> We can see, after the thread jump optimization, the >>>>>>>>>>>>>>>> # of >>>>>>>>>>>>>>>> branches >>>>>>>>>>>>>>>> inside >>>>>>>>>>>>>>>> the routine "sparx5_set" is reduced from 2 to 1, >>>>>>>>>>>>>>>> however, due >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> code duplication (which is needed for the >>>>>>>>>>>>>>>> correctness of the >>>>>>>>>>>>>>>> code), >>>>>>>>>>>>>>>> we >>>>>>>>>>>>>>>> got a false positive out-of-bound warning. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In order to eliminate such false positive out-of- >>>>>>>>>>>>>>>> bound warning, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> A. Add one more flag for GIMPLE: is_splitted. >>>>>>>>>>>>>>>> B. During the thread jump optimization, when the >>>>>>>>>>>>>>>> basic blocks >>>>>>>>>>>>>>>> are >>>>>>>>>>>>>>>> duplicated, mark all the STMTs inside the original >>>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> duplicated >>>>>>>>>>>>>>>> basic blocks as "is_splitted"; >>>>>>>>>>>>>>>> C. Inside the array bound checker, add the >>>>>>>>>>>>>>>> following new >>>>>>>>>>>>>>>> heuristic: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> If >>>>>>>>>>>>>>>> 1. the stmt is duplicated and splitted into two >>>>>>>>>>>>>>>> conditional >>>>>>>>>>>>>>>> paths; >>>>>>>>>>>>>>>> + 2. the warning level < 2; >>>>>>>>>>>>>>>> + 3. the current block is not dominating the exit >>>>>>>>>>>>>>>> block >>>>>>>>>>>>>>>> Then not report the warning. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> The false positive warnings are moved from -Warray- >>>>>>>>>>>>>>>> bounds=1 to >>>>>>>>>>>>>>>> -Warray-bounds=2 now. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Bootstrapped and regression tested on both x86 and >>>>>>>>>>>>>>>> aarch64. >>>>>>>>>>>>>>>> adjusted >>>>>>>>>>>>>>>> -Warray-bounds-61.c due to the false positive >>>>>>>>>>>>>>>> warnings. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Let me know if you have any comments and >>>>>>>>>>>>>>>> suggestions. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> At the last Cauldron I talked with David Malcolm >>>>>>>>>>>>>>> about these kind >>>>>>>>>>>>>>> of >>>>>>>>>>>>>>> issues and thought of instead of suppressing >>>>>>>>>>>>>>> diagnostics to >>>>>>>>>>>>>>> record >>>>>>>>>>>>>>> how a block was duplicated. For jump threading my >>>>>>>>>>>>>>> idea was to >>>>>>>>>>>>>>> record >>>>>>>>>>>>>>> the condition that was proved true when entering the >>>>>>>>>>>>>>> path and do >>>>>>>>>>>>>>> this >>>>>>>>>>>>>>> by recording the corresponding locations >>>>>>>>>>>>> >>>>>>>>>>>>> Is only recording the location for the TRUE path enough? >>>>>>>>>>>>> We might need to record the corresponding locations for >>>>>>>>>>>>> both TRUE and >>>>>>>>>>>>> FALSE paths since the VRP might be more accurate on both >>>>>>>>>>>>> paths. >>>>>>>>>>>>> Is only recording the location is enough? >>>>>>>>>>>>> Do we need to record the pointer to the original >>>>>>>>>>>>> condition stmt? >>>>>>>>>>>> >>>>>>>>>>>> Just to be clear: I don't plan to work on this myself (I >>>>>>>>>>>> have far too >>>>>>>>>>>> much already to work on...); I'm assuming Richard Biener is >>>>>>>>>>>> hoping/planning to implement this himself. >>>>>>>>>>> >>>>>>>>>>> While I think some of this might be an improvement to those >>>>>>>>>>> vast >>>>>>>>>>> number of "false positive" diagnostics we have from (too) >>>>>>>>>>> late diagnostic >>>>>>>>>>> passes I do not have the cycles to work on this. >>>>>>>>>> >>>>>>>>>> I can study a little bit more on how to improve the diagnostics >>>>>>>>>> for PR 109071 first. >>>>>>>>>> >>>>>>>>>> FYI, I just updated PR109071’s description as: Need more >>>>>>>>>> context for -Warray-bounds warnings due to code duplication >>>>>>>>>> from jump threading. >>>>>>>>>> >>>>>>>>>> As we already agreed, this is NOT a false positive. It caught a >>>>>>>>>> real bug in linux kernel that need to be patched in the kernel >>>>>>>>>> source. (See >>>>>>>>>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109071#c11) >>>>>>>>>> >>>>>>>>>> In order to add more context to the diagnostics to help the end >>>>>>>>>> user locate the bug accurately in their source code, compiler >>>>>>>>>> needs: >>>>>>>>>> >>>>>>>>>> 1. During jump threading phase, record the following >>>>>>>>>> information for each duplicated STMT: >>>>>>>>>> A. A pointer to the COND stmt; >>>>>>>>>> B. True/false for such COND >>>>>>>>>> 2. During array out-of-bound checking phase, whenever locate an >>>>>>>>>> out-of-bound case, >>>>>>>>>> A. Use a rich_location for the diagnostic; >>>>>>>>>> B. Create an instance of diagnositic_path, add events to >>>>>>>>>> this diagnostic_path based on the COND stmt, True/False info >>>>>>>>>> recorded in the STMT; >>>>>>>>>> C. Call richloc.set_path() to associate the path with >>>>>>>>>> the rich_location; >>>>>>>>>> D. Then issue warning with this rich_location instead of >>>>>>>>>> the current regular location. >>>>>>>>>> >>>>>>>>>> Any comment and suggestion to the above? >>>>>>>>> >>>>>>>>> I was originally thinking of using location_adhoc_data to store >>>>>>>>> 1.A >>>>>>>>> and 1.B as a common bit to associate to each >>>>>>>>> copied stmt. IIRC location_adhoc_data->data is the stmts >>>>>>>>> associated >>>>>>>>> lexical block so we'd need to stuff another >>>>>>>>> 'data' field into this struct, like a "copy history" where we >>>>>>>>> could >>>>>>>>> then even record copied-of-copy-of-X. >>>>>>>>> locataion_adhoc_data seems imperfectly packed right now, with a >>>>>>>>> 32bit >>>>>>>>> hole before 'data' and 32bit unused >>>>>>>>> at its end, so we might get away without memory use by re- >>>>>>>>> ordering >>>>>>>>> discriminator before 'data' ... >>>>>>>> >>>>>>>> Like this? >>>>>>>> >>>>>>>> diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h >>>>>>>> index e6e2b0897572..ee344f91333b 100644 >>>>>>>> --- a/libcpp/include/line-map.h >>>>>>>> +++ b/libcpp/include/line-map.h >>>>>>>> @@ -761,8 +761,9 @@ struct GTY(()) maps_info_macro { >>>>>>>> struct GTY(()) location_adhoc_data { >>>>>>>> location_t locus; >>>>>>>> source_range src_range; >>>>>>>> - void * GTY((skip)) data; >>>>>>>> unsigned discriminator; >>>>>>>> + void * GTY((skip)) data; >>>>>>>> + void * GTY((skip)) copy_history; >>>>>>>> }; >>>>>>>> struct htab; >>>>>>> >>>>>>> Yes. >>>>>>> >>>>>>>> How about the copy_history? Do we need a new data structure (like >>>>>>>> the following, or other suggestion) for this? Where should I add >>>>>>>> this new data structure? >>>>>>> >>>>>>> As it needs to be managed by libcpp it should be in this very same >>>>>>> file. >>>>>>> >>>>>>>> struct copy_history { >>>>>>>> location_t condition; >>>>>>>> Bool is_true_path; >>>>>>>> } >>>>>>> >>>>>>> I think we want a pointer to the previous copy-of state as well in >>>>>>> case a stmt >>>>>>> was copied twice. We'll see whether a single (condition) location >>>>>>> plus edge flag >>>>>>> is sufficient. I'd say we should plan for an enum to indicate the >>>>>>> duplication >>>>>>> reason at least (jump threading, unswitching, unrolling come to my >>>>>>> mind). For >>>>>>> jump threading being able to say "when <condition> is true/false" is >>>>>>> probably >>>>>>> good enough, though it might not always be easy to identify a single >>>>>>> condition >>>>>>> here given a threading path starts at an incoming edge to a CFG merge >>>>>>> and >>>>>>> will usually end with the outgoing edge of a condition that we can >>>>>>> statically >>>>>>> evaluate. The condition controlling the path entry might not be >>>>>>> determined >>>>>>> fully by a single condition location. >>>>>>> >>>>>>> Possibly building a full "diagnostic path" object at threading time >>>>>>> might be >>>>>>> the only way to record all the facts, but that's also going to be >>>>>>> more >>>>>>> expensive. >>>>>> >>>>>> Note that a diagnostic_path represents a path through some kind of >>>>>> graph, whereas it sounds like you want to be storing the *nodes* in the >>>>>> graph, and later generating the diagnostic_path from that graph when we >>>>>> need it (which is trivial if the graph is actually just a tree: just >>>>>> follow the parent links backwards, then reverse it). >>>>> >>>>> I think we are mixing two things - one is that a single transform like >>>>> jump >>>>> threading produces a stmt copy and when we emit a diagnostic on that >>>>> copied statement we want to tell the user the condition under which the >>>>> copy is executed. That "condition" can be actually a sequence of >>>>> conditionals. I wanted to point out that a diagnostic_path instance could >>>>> be used to describe such complex condition. >>>>> >>>>> But then the other thing I wanted to address with the link to a previous >>>>> copy_history - that's when a statement gets copied twice, for example >>>>> by two distinct jump threading optimizations. Like when dumping >>>>> the inlining decisions for diagnostics we could dump the logical "and" >>>>> of the conditions of the two threadings. Since we have a single >>>>> location per GIMPLE stmt we'd have to keep a "stack" of copy events >>>>> associated with it. That's the linked list (I think a linked list should >>>>> work fine here). >>>> Yes, the linked list to keep the “stack” of copy events should be good >>>> enough >>>> to form the sequence of conditionals event for the diagnostic_path >>>> instance. >>>>> >>>>> I realize things may get a bit "fat", but eventually we are not >>>>> duplicating >>>>> statements that much. I do hope we can share for example a big >>>>> diagnostic_path when we duplicate a basic block during threading >>>>> and use one instance for all stmts in such block copy (IIRC we never >>>>> release locations or their ad-hoc data, we just make sure to never >>>>> use locations with ad-hoc data pointing to BLOCKs that we released, >>>>> but the linemap data will still have pointers in "dead" location entries, >>>>> right?) >>>> Are you still suggesting to add two artificial stmts in the beginning and >>>> the >>>> end of the duplicated block to carry the copy history information for all >>>> the >>>> stmts in the block to save space? >>>> >>>> Compared with the approach to carry such information to each duplicated >>>> stmts (which I preferred), >>>> The major concerns with the approach are: >>>> 1. Optimizations might move stmts out of these two artificial stmts, >>>> therefore we need add >>>> Some memory barrier on these two artificial stmts to prevent such >>>> movements. >>>> This might prevent good optimization from happening and result in >>>> runtime performance loss; >>>> 2. Every time when query whether the stmt is copied and get its copy >>>> history, we have to search backward or >>>> Forward through the stmt chain to get to the artificial stmts that >>>> carry the copy history, compilation time will >>>> be slower. >>>> >>>> I still think that carrying the copy history info to each duplicated stmts >>>> might be better. We can limit the length of >>>> the history to control the space, what do you think? >>> >>> Copying to each stmt is definitely easier so I think it's better to >>> start with that and only resort >>> to other things when this fails. >> Okay. Will try this approach first. >>> >>> Note we could, instead of putting a pointer to data into the ad-hoc >>> info, put in an index >>> and have the actual data be maintained elsewhere. That data could >>> even only keep >>> the "last" 1000 contexts and simply discard "older" ones. Before IPA >>> info can still be >>> transfered between functions, but after IPA where most of the >>> threadings happen the >>> extra info could be discarded once we get a function to the last pass >>> emitting diagnostics >>> we want to amend. Doing an index lookup makes it possible to not >>> worry about "stale" >>> entries. >> >> Are you suggesting to use a global array with upper-bound “1000” to hold the >> copy-history info? >> Then for each copied stmt, put in an index to this array to refer the >> copy-history linked list for it? >> Is “1000” enough for one function? We can do some experiments to decide this >> number. >> >> What do you mean by “stale” entries? The entries for the previous function? >> >> Is a function-level array enough for this purpose? i.e, for each function, >> we use an array to hold the copy-history info, and free this array after >> this function is done with compilation? >> A little confused here…. > > I was thinking of a hash map and noting when we create an entry after IPA so > we can scrap it when done with the function Okay. I see. then where should this hash map be declared? diagnostic-spec.h?
Qing > >> Thanks for clarification. >> >> Qing >> >> >>> >>> Richard. >>> >>>> >>>> Qing >>>> >>>>> >>>>> Richard. >>>>> >>>>>> >>>>>> Dave >>>>>> >>>>>>> I can think of having a -fdiagnostic-try-to-explain-harder option to >>>>>>> clarify confusing >>>>>>> diagnostics people could use on-demand? >>>>>>> >>>>>>> Richard. >>>>>>> >>>>>>>> Qing >>>>>>>>> >>>>>>>>> Richard. >>>>>>>>> >>>>>>>>>> Thanks. >>>>>>>>>> >>>>>>>>>> Qing >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Richard. >>>>>>>>>>> >>>>>>>>>>>> But feel free to ask me any questions about the >>>>>>>>>>>> diagnostic_path >>>>>>>>>>>> machinery within the diagnostics subsystem. >>>>>>>>>>>> >>>>>>>>>>>> [...snip...] >>>>>>>>>>>> >>>>>>>>>>>> Regards >>>>>>>>>>>> Dave