Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.

2021-10-12 Thread Martin Liška

On 10/11/21 13:05, Maxim Kuvyrkov wrote:

On 8 Oct 2021, at 13:22, Martin Jambor  wrote:

Hi,

On Fri, Oct 01 2021, Gerald Pfeifer wrote:

On Wed, 29 Sep 2021, Maxim Kuvyrkov via Gcc wrote:

Configurations that track master branches have 3-day intervals.
Configurations that track release branches — 6 days.  If a regression is
detected it is narrowed down to component first — binutils, gcc or glibc
— and then the commit range of the component is bisected down to a
specific commit.  All.  Done.  Automatically.

I will make a presentation on this CI at the next GNU Tools Cauldron.


Yes, please! :-)

On Fri, 1 Oct 2021, Maxim Kuvyrkov via Gcc wrote:

It’s our next big improvement — to provide a dashboard with current
performance numbers and historical stats.


Awesome. And then we can even link from gcc.gnu.org.



You all are aware of the openSUSE LNT periodic SPEC benchmarker, right?
Martin may explain better how to move around it, but the two most
interesting result pages are:

- https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report and
- https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/branch



Hi Martin,

The novel part of TCWG CI is that it bisects “regressions” down to a single 
commit, thus pin-pointing the interesting commit, and can send out 
notifications to patch authors.


Hello Maxim.



We do generate a fair amount of benchmarking data for AArch64 and AArch32, and 
I want to have them plotted somewhere.  I have started to put together an LNT 
instance to do that, but after a couple of days I couldn't figure out the 
setup.  Could you share the configuration of your LNT instance?  Or, perhaps, 
make it open to the community so that others can upload the results?


Sure, I would be more than happy to share our LNT configuration. Note that we 
don't use the vanilla version, because it does not support git revisions (so 
we use $timestamp.$hash), and our modified LNT GUI can interpret that.
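
A git hash on its own carries no ordering, which is presumably why a sortable prefix is needed. Here is a minimal Python sketch of one way to build a `$timestamp.$hash` key. The timestamp layout (UTC, `YYYYMMDDHHMMSS`), the 12-character hash abbreviation and the function name `order_key` are assumptions made for illustration, not necessarily the format the openSUSE instance uses.

```python
from datetime import datetime, timedelta, timezone


def order_key(commit_time: datetime, commit_hash: str, hash_len: int = 12) -> str:
    """Build a sortable '<timestamp>.<hash>' revision key (hypothetical layout)."""
    # Normalise to UTC so keys from different time zones compare correctly.
    stamp = commit_time.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")
    return f"{stamp}.{commit_hash[:hash_len]}"


# Commit date and hash taken from the CI report quoted in this thread.
when = datetime(2021, 9, 23, 10, 59, 24, tzinfo=timezone(timedelta(hours=2)))
key = order_key(when, "4a960d548b7d7d942f316c5295f6d849b74214f5")
print(key)
```

Because the timestamp has a fixed width, plain string sorting of such keys matches chronological order.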

As Martin mentioned, the useful page latest_runs_report is upstreamed by me:
https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report

and these pages:
https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/branch
https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/options
https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/tuning

rely on a special naming scheme for Machines, e.g.:
benzen.spec2006.gcc-10.Ofast_generic

and a custom modification of LNT generates the pages. I can share it with you 
as well.
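
To illustrate the naming scheme, here is a small Python sketch that splits such a machine name into fields. The field names (`host`, `suite`, `branch`, `options`, `tuning`) are guesses based on the example above and on the report page names; the actual LNT modification may interpret the parts differently.

```python
from typing import NamedTuple


class MachineName(NamedTuple):
    host: str     # assumed: benchmarking machine, e.g. "benzen"
    suite: str    # assumed: benchmark suite, e.g. "spec2006"
    branch: str   # assumed: compiler branch, e.g. "gcc-10"
    options: str  # assumed: optimisation options, e.g. "Ofast"
    tuning: str   # assumed: tuning target, e.g. "generic"


def parse_machine_name(name: str) -> MachineName:
    """Split 'host.suite.branch.options_tuning' into its parts."""
    parts = name.split(".")
    if len(parts) != 4:
        raise ValueError(f"unexpected machine name: {name!r}")
    host, suite, branch, flags = parts
    # Everything before the first underscore is treated as the options,
    # the rest as the tuning; tuning is empty if there is no underscore.
    options, _, tuning = flags.partition("_")
    return MachineName(host, suite, branch, options, tuning)


machine = parse_machine_name("benzen.spec2006.gcc-10.Ofast_generic")
print(machine)
```

Grouping runs by one of these fields is what would let a report page compare branches, options or tunings side by side.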

@Maxim: Please write me a private email and I can share all the details you 
need.

About the public LNT instance, we are likely not willing to share it right now.

Cheers,
Martin



Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org



___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
https://lists.linaro.org/mailman/listinfo/linaro-toolchain


Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.

2021-10-11 Thread Maxim Kuvyrkov
> On 8 Oct 2021, at 13:22, Martin Jambor  wrote:
> 
> Hi,
> 
> On Fri, Oct 01 2021, Gerald Pfeifer wrote:
>> On Wed, 29 Sep 2021, Maxim Kuvyrkov via Gcc wrote:
>>> Configurations that track master branches have 3-day intervals.  
>>> Configurations that track release branches — 6 days.  If a regression is 
>>> detected it is narrowed down to component first — binutils, gcc or glibc 
>>> — and then the commit range of the component is bisected down to a 
>>> specific commit.  All.  Done.  Automatically.
>>> 
>>> I will make a presentation on this CI at the next GNU Tools Cauldron.
>> 
>> Yes, please! :-)
>> 
>> On Fri, 1 Oct 2021, Maxim Kuvyrkov via Gcc wrote:
>>> It’s our next big improvement — to provide a dashboard with current 
>>> performance numbers and historical stats.
>> 
>> Awesome. And then we can even link from gcc.gnu.org.
>> 
> 
> You all are aware of the openSUSE LNT periodic SPEC benchmarker, right?
> Martin may explain better how to move around it, but the two most
> interesting result pages are:
> 
> - https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report and
> - https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/branch
> 

Hi Martin,

The novel part of TCWG CI is that it bisects “regressions” down to a single 
commit, thus pin-pointing the interesting commit, and can send out 
notifications to patch authors.

We do generate a fair amount of benchmarking data for AArch64 and AArch32, and 
I want to have them plotted somewhere.  I have started to put together an LNT 
instance to do that, but after a couple of days I couldn't figure out the 
setup.  Could you share the configuration of your LNT instance?  Or, perhaps, 
make it open to the community so that others can upload the results?

Thanks,

--
Maxim Kuvyrkov
https://www.linaro.org



Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.

2021-10-11 Thread Martin Jambor
Hi,

On Fri, Oct 01 2021, Gerald Pfeifer wrote:
> On Wed, 29 Sep 2021, Maxim Kuvyrkov via Gcc wrote:
>> Configurations that track master branches have 3-day intervals.  
>> Configurations that track release branches — 6 days.  If a regression is 
>> detected it is narrowed down to component first — binutils, gcc or glibc 
>> — and then the commit range of the component is bisected down to a 
>> specific commit.  All.  Done.  Automatically.
>> 
>> I will make a presentation on this CI at the next GNU Tools Cauldron.
>
> Yes, please! :-)
>
> On Fri, 1 Oct 2021, Maxim Kuvyrkov via Gcc wrote:
>> It’s our next big improvement — to provide a dashboard with current 
>> performance numbers and historical stats.
>
> Awesome. And then we can even link from gcc.gnu.org.
>

You all are aware of the openSUSE LNT periodic SPEC benchmarker, right?
Martin may explain better how to move around it, but the two most
interesting result pages are:

- https://lnt.opensuse.org/db_default/v4/SPEC/latest_runs_report and
- https://lnt.opensuse.org/db_default/v4/SPEC/spec_report/branch

Martin



Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.

2021-10-01 Thread Jeff Law



On 9/27/2021 7:52 AM, Aldy Hernandez wrote:

[CCing Jeff and list for broader audience]

On 9/27/21 2:53 PM, Maxim Kuvyrkov wrote:

Hi Aldy,

Your patch seems to slow down 471.omnetpp by 8% at -O3.  Could you 
please take a look if this is something that could be easily fixed?


First of all, thanks for chasing this down.  It's incredibly useful to 
have these types of bug reports.


Jeff and I have been discussing the repercussions of adjusting the 
loop crossing restrictions in the various threaders.  He's seen some 
regressions in embedded targets when disallowing certain corner cases 
of loop crossing threads causes all sorts of grief.


Out of curiosity, does the attached (untested) patch fix the regression?
And just a note, that patch doesn't seem to fix the regressions on 
visium or rl78.    I haven't checked any of the other regressing targets 
yet.


jeff



Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.

2021-10-01 Thread Gerald Pfeifer
On Wed, 29 Sep 2021, Maxim Kuvyrkov via Gcc wrote:
> Configurations that track master branches have 3-day intervals.  
> Configurations that track release branches — 6 days.  If a regression is 
> detected it is narrowed down to component first — binutils, gcc or glibc 
> — and then the commit range of the component is bisected down to a 
> specific commit.  All.  Done.  Automatically.
> 
> I will make a presentation on this CI at the next GNU Tools Cauldron.

Yes, please! :-)

On Fri, 1 Oct 2021, Maxim Kuvyrkov via Gcc wrote:
> It’s our next big improvement — to provide a dashboard with current 
> performance numbers and historical stats.

Awesome. And then we can even link from gcc.gnu.org.

Gerald


Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.

2021-10-01 Thread Maxim Kuvyrkov
> On 29 Sep 2021, at 21:21, Andrew MacLeod  wrote:
> 
> On 9/29/21 7:59 AM, Maxim Kuvyrkov wrote:
>> 
>>>   Does it run like once a day/some-time-period, and if you note a 
>>> regression, narrow it down?
>> Configurations that track master branches have 3-day intervals.  
>> Configurations that track release branches — 6 days.  If a regression is 
>> detected it is narrowed down to component first — binutils, gcc or glibc — 
>> and then the commit range of the component is bisected down to a specific 
>> commit.  All.  Done.  Automatically.
>> 
>> I will make a presentation on this CI at the next GNU Tools Cauldron.
>> 
>>>  Regardless, I think it could be very useful to be able to see the results 
>>> of anything you do run at whatever frequency it happens.
>> Thanks!
>> 
>> --
> 
> One more follow-on question: is this information/summary of the results 
> from every 3-day run of master published anywhere, i.e. on a web page or 
> posted somewhere?  That seems like it could be useful, especially with a +/- 
> differential from the previous run (which you obviously calculate to 
> determine if there is a regression).

It’s our next big improvement — to provide a dashboard with current performance 
numbers and historical stats.  Performance summary information is publicly 
available as artifacts in jenkins jobs (e.g., [1]), but one needs to know 
exactly where to look.

We plan to implement the dashboard before the end of the year.

We also have raw perf.data files and benchmark executables stashed for detailed 
inspection.  I /think/, we can publish these for SPEC CPU2xxx benchmarks — they 
are all based on open-source software.  For other benchmarks (EEMBC, CoreMark 
Pro) we can’t publish much beyond time/size metrics.

[1] 
https://ci.linaro.org/view/tcwg_bmk_ci_gnu/job/tcwg_bmk_ci_gnu-build-tcwg_bmk_tx1-gnu-master-aarch64-spec2k6-O2/237/artifact/artifacts/11-check_regression/results.csv/*view*/

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org



Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.

2021-09-30 Thread Andrew MacLeod

On 9/27/21 11:39 AM, Maxim Kuvyrkov via Gcc wrote:

On 27 Sep 2021, at 16:52, Aldy Hernandez  wrote:

[CCing Jeff and list for broader audience]

On 9/27/21 2:53 PM, Maxim Kuvyrkov wrote:

Hi Aldy,
Your patch seems to slow down 471.omnetpp by 8% at -O3.  Could you please take 
a look if this is something that could be easily fixed?

First of all, thanks for chasing this down.  It's incredibly useful to have 
these types of bug reports.

Thanks, Aldy, this is music to my ears :-).

We have built this automated benchmarking CI that bisects code-speed and 
code-size regressions down to a single commit.  It is still work-in-progress, 
and I’m forwarding these reports to patch authors, whose patches caused 
regressions.  If GCC community finds these useful, we can also setup posting to 
one of GCC’s mailing lists.


I second that this sort of thing is incredibly useful.   I don't suppose 
it's easy to do the reverse?... let patch authors know when they've 
caused a significant improvement? :-)  That would be much less common I 
suspect, so perhaps not worth it :-)


It's certainly very useful when we are making a wholesale change to a 
pass which we think is beneficial, but aren't sure.


And a followup question...  Sometimes we have no good way of determining 
the widespread run-time effects of a change.  You seem to be running 
SPEC/other things continuously then?   Does it run like once a 
day/some-time-period, and if you note a regression, narrow it down?  
Regardless, I think it could be very useful to be able to see the 
results of anything you do run at whatever frequency it happens.






Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.

2021-09-30 Thread Maxim Kuvyrkov
> On 27 Sep 2021, at 16:52, Aldy Hernandez  wrote:
> 
> [CCing Jeff and list for broader audience]
> 
> On 9/27/21 2:53 PM, Maxim Kuvyrkov wrote:
>> Hi Aldy,
>> Your patch seems to slow down 471.omnetpp by 8% at -O3.  Could you please 
>> take a look if this is something that could be easily fixed?
> 
> First of all, thanks for chasing this down.  It's incredibly useful to have 
> these types of bug reports.

Thanks, Aldy, this is music to my ears :-).

We have built this automated benchmarking CI that bisects code-speed and 
code-size regressions down to a single commit.  It is still work-in-progress, 
and I’m forwarding these reports to the authors whose patches caused 
regressions.  If the GCC community finds these useful, we can also set up 
posting to one of GCC’s mailing lists.

> 
> Jeff and I have been discussing the repercussions of adjusting the loop 
> crossing restrictions in the various threaders.  He's seen some regressions 
> in embedded targets when disallowing certain corner cases of loop crossing 
> threads causes all sorts of grief.
> 
> Out of curiosity, does the attached (untested) patch fix the regression?

I’ll test the patch and will follow up.

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org


> 
> Aldy
> 
>> Regards,
>> --
>> Maxim Kuvyrkov
>> https://www.linaro.org
>>> On 27 Sep 2021, at 02:52, ci_not...@linaro.org wrote:
>>> 
>>> After gcc commit 4a960d548b7d7d942f316c5295f6d849b74214f5
>>> Author: Aldy Hernandez 
>>> 
>>>    Avoid invalid loop transformations in jump threading registry.
>>> 
>>> the following benchmarks slowed down by more than 2%:
>>> - 471.omnetpp slowed down by 8% from 6348 to 6828 perf samples
>>> 
>>> Below reproducer instructions can be used to re-build both "first_bad" and 
>>> "last_good" cross-toolchains used in this bisection.  Naturally, the 
>>> scripts will fail when triggering benchmarking jobs if you don't have 
>>> access to Linaro TCWG CI.
>>> 
>>> For your convenience, we have uploaded tarballs with pre-processed source 
>>> and assembly files at:
>>> - First_bad save-temps: 
>>> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-4a960d548b7d7d942f316c5295f6d849b74214f5/save-temps/
>>> - Last_good save-temps: 
>>> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-29c92857039d0a105281be61c10c9e851aaeea4a/save-temps/
>>> - Baseline save-temps: 
>>> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-baseline/save-temps/
>>> 
>>> Configuration:
>>> - Benchmark: SPEC CPU2006
>>> - Toolchain: GCC + Glibc + GNU Linker
>>> - Version: all components were built from their tip of trunk
>>> - Target: arm-linux-gnueabihf
>>> - Compiler flags: -O3 -marm
>>> - Hardware: NVidia TK1 4x Cortex-A15
>>> 
>>> This benchmarking CI is work-in-progress, and we welcome feedback and 
>>> suggestions at linaro-toolchain@lists.linaro.org .  In our improvement 
>>> plans is to add support for SPEC CPU2017 benchmarks and provide "perf 
>>> report/annotate" data behind these reports.
>>> 
>>> THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, 
>>> REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
>>> 
>>> This commit has regressed these CI configurations:
>>> - tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O3
>>> 
>>> First_bad build: 
>>> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-4a960d548b7d7d942f316c5295f6d849b74214f5/
>>> Last_good build: 
>>> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-29c92857039d0a105281be61c10c9e851aaeea4a/
>>> Baseline build: 
>>> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-baseline/
>>> Even more details: 
>>> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/
>>> 
>>> Reproduce builds:
>>> 
>>> mkdir investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
>>> cd investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
>>> 
>>> # Fetch scripts
>>> git clone https://git.linaro.org/toolchain/jenkins-scripts
>>> 
>>> # Fetch manifests and test.sh script
>>> mkdir -p artifacts/manifests
>>> curl -o artifacts/manifests/build-baseline.sh \
>>>   https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/manifests/build-baseline.sh \
>>>   --fail
>>> curl -o artifacts/manifests/build-parameters.sh \
>>>   https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/manifests/build-parameters.sh \
>>>   --fail
>>> curl -o artifacts/test.sh \
>>>   https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/test.sh \
>>>   --fail
>>> chmod +x 

Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.

2021-09-30 Thread Maxim Kuvyrkov
> On 27 Sep 2021, at 19:02, Andrew MacLeod  wrote:
> 
> On 9/27/21 11:39 AM, Maxim Kuvyrkov via Gcc wrote:
>>> On 27 Sep 2021, at 16:52, Aldy Hernandez  wrote:
>>> 
>>> [CCing Jeff and list for broader audience]
>>> 
>>> On 9/27/21 2:53 PM, Maxim Kuvyrkov wrote:
>>>> Hi Aldy,
>>>> Your patch seems to slow down 471.omnetpp by 8% at -O3.  Could you please 
>>>> take a look if this is something that could be easily fixed?
>>> First of all, thanks for chasing this down.  It's incredibly useful to have 
>>> these types of bug reports.
>> Thanks, Aldy, this is music to my ears :-).
>> 
>> We have built this automated benchmarking CI that bisects code-speed and 
>> code-size regressions down to a single commit.  It is still 
>> work-in-progress, and I’m forwarding these reports to patch authors, whose 
>> patches caused regressions.  If GCC community finds these useful, we can 
>> also setup posting to one of GCC’s mailing lists.
> 
> I second that this sort of thing is incredibly useful.   I don't suppose it's 
> easy to do the reverse?... let patch authors know when they've caused a 
> significant improvement? :-)  That would be much less common I suspect, so 
> perhaps not worth it :-)

We do this occasionally, when identifying a regression in a patch revert commit 
:-).  Seriously, though, it’s an easy enough code-change to the metric, but we 
are maxing out our benchmarking capacity with current configuration matrix.

> 
> It's certainly very useful when we are making a wholesale change to a pass 
> which we think is beneficial, but aren't sure.
> 
> And a followup question...  Sometimes we have no good way of determining the 
> widespread run-time effects of a change.  You seem to be running SPEC/other 
> things continuously then?

We continuously run SPEC CPU2006 on {arm,aarch64}-{-Os/-O2/-O3}-{no LTO/LTO} 
matrix for GNU and LLVM toolchains.
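
As a rough illustration of how quickly this matrix grows, the following Python sketch enumerates it. The tuple layout and the use of `-flto` to mark the LTO variants are assumptions for illustration; the real CI job names and flag sets may differ.

```python
from itertools import product

TOOLCHAINS = ("gnu", "llvm")
TARGETS = ("arm", "aarch64")
OPT_LEVELS = ("-Os", "-O2", "-O3")
LTO = (False, True)


def configurations():
    """Yield (toolchain, target, flags) for every cell of the matrix."""
    for toolchain, target, opt, lto in product(TOOLCHAINS, TARGETS, OPT_LEVELS, LTO):
        # Assumption: LTO variants simply add -flto to the optimisation level.
        flags = opt + (" -flto" if lto else "")
        yield toolchain, target, flags


configs = list(configurations())
print(len(configs), "configurations")
for cfg in configs[:3]:
    print(cfg)
```

Two toolchains, two targets, three optimisation levels and two LTO settings already give 24 configurations, before counting master versus release branches.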

In the GNU toolchain we track master branches and latest-release branches of 
Binutils, GCC and Glibc — and detect code-speed and code-size regressions 
across all toolchain components.

>   Does it run like once a day/some-time-period, and if you note a regression, 
> narrow it down?

Configurations that track master branches have 3-day intervals.  Configurations 
that track release branches — 6 days.  If a regression is detected it is 
narrowed down to component first — binutils, gcc or glibc — and then the commit 
range of the component is bisected down to a specific commit.  All.  Done.  
Automatically.
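
The two-stage narrowing can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the `is_bad` and `regresses_when_updated` callbacks, and the requirement that a regression persists in every later commit are hypothetical, and the actual jenkins-scripts implementation may work quite differently.

```python
from typing import Callable, Optional, Sequence


def find_regressed_component(
    components: Sequence[str],
    regresses_when_updated: Callable[[str], bool],
) -> Optional[str]:
    """Stage 1: find the component whose update alone reproduces the regression."""
    for component in components:
        if regresses_when_updated(component):
            return component
    return None


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Stage 2: binary search over commits ordered oldest to newest.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    every commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: last known good, hi: first known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Toy usage: ten fake commits, with the regression introduced at "c6".
commits = [f"c{i}" for i in range(10)]
component = find_regressed_component(
    ["binutils", "gcc", "glibc"], lambda name: name == "gcc"
)
culprit = first_bad_commit(commits, lambda c: int(c[1:]) >= 6)
print(component, culprit)
```

Each `is_bad` call stands for a full build and benchmark run, so the logarithmic number of steps is what makes bisecting a 3-day commit range affordable.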

I will make a presentation on this CI at the next GNU Tools Cauldron.

>  Regardless, I think it could be very useful to be able to see the results of 
> anything you do run at whatever frequency it happens.

Thanks!

--
Maxim Kuvyrkov
https://www.linaro.org



Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.

2021-09-30 Thread Maxim Kuvyrkov
Also, it slightly increases code size of 450.soplex at -Os -flto: 

https://lists.linaro.org/pipermail/linaro-toolchain/2021-September/007883.html

--
Maxim Kuvyrkov
https://www.linaro.org

> On 27 Sep 2021, at 15:53, Maxim Kuvyrkov  wrote:
> 
> Hi Aldy,
> 
> Your patch seems to slow down 471.omnetpp by 8% at -O3.  Could you please 
> take a look if this is something that could be easily fixed?
> 
> Regards,
> 
> --
> Maxim Kuvyrkov
> https://www.linaro.org
> 
>> On 27 Sep 2021, at 02:52, ci_not...@linaro.org wrote:
>> 
>> After gcc commit 4a960d548b7d7d942f316c5295f6d849b74214f5
>> Author: Aldy Hernandez 
>> 
>>   Avoid invalid loop transformations in jump threading registry.
>> 
>> the following benchmarks slowed down by more than 2%:
>> - 471.omnetpp slowed down by 8% from 6348 to 6828 perf samples
>> 
>> Below reproducer instructions can be used to re-build both "first_bad" and 
>> "last_good" cross-toolchains used in this bisection.  Naturally, the scripts 
>> will fail when triggering benchmarking jobs if you don't have access to 
>> Linaro TCWG CI.
>> 
>> For your convenience, we have uploaded tarballs with pre-processed source 
>> and assembly files at:
>> - First_bad save-temps: 
>> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-4a960d548b7d7d942f316c5295f6d849b74214f5/save-temps/
>> - Last_good save-temps: 
>> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-29c92857039d0a105281be61c10c9e851aaeea4a/save-temps/
>> - Baseline save-temps: 
>> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-baseline/save-temps/
>> 
>> Configuration:
>> - Benchmark: SPEC CPU2006
>> - Toolchain: GCC + Glibc + GNU Linker
>> - Version: all components were built from their tip of trunk
>> - Target: arm-linux-gnueabihf
>> - Compiler flags: -O3 -marm
>> - Hardware: NVidia TK1 4x Cortex-A15
>> 
>> This benchmarking CI is work-in-progress, and we welcome feedback and 
>> suggestions at linaro-toolchain@lists.linaro.org .  In our improvement plans 
>> is to add support for SPEC CPU2017 benchmarks and provide "perf 
>> report/annotate" data behind these reports.
>> 
>> THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, 
>> REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
>> 
>> This commit has regressed these CI configurations:
>> - tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O3
>> 
>> First_bad build: 
>> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-4a960d548b7d7d942f316c5295f6d849b74214f5/
>> Last_good build: 
>> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-29c92857039d0a105281be61c10c9e851aaeea4a/
>> Baseline build: 
>> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-baseline/
>> Even more details: 
>> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/
>> 
>> Reproduce builds:
>> 
>> mkdir investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
>> cd investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
>> 
>> # Fetch scripts
>> git clone https://git.linaro.org/toolchain/jenkins-scripts
>> 
>> # Fetch manifests and test.sh script
>> mkdir -p artifacts/manifests
>> curl -o artifacts/manifests/build-baseline.sh \
>>   https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/manifests/build-baseline.sh \
>>   --fail
>> curl -o artifacts/manifests/build-parameters.sh \
>>   https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/manifests/build-parameters.sh \
>>   --fail
>> curl -o artifacts/test.sh \
>>   https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/test.sh \
>>   --fail
>> chmod +x artifacts/test.sh
>> 
>> # Reproduce the baseline build (build all pre-requisites)
>> ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
>> 
>> # Save baseline build state (which is then restored in artifacts/test.sh)
>> mkdir -p ./bisect
>> rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ \
>>   --exclude /gcc/ ./ ./bisect/baseline/
>> 
>> cd gcc
>> 
>> # Reproduce first_bad build
>> git checkout --detach 4a960d548b7d7d942f316c5295f6d849b74214f5
>> ../artifacts/test.sh
>> 
>> # Reproduce last_good build
>> git checkout --detach 29c92857039d0a105281be61c10c9e851aaeea4a
>> ../artifacts/test.sh
>> 
>> cd ..
>> 
>> 
>> Full commit (up to 1000 lines):
>> 
>> commit 4a960d548b7d7d942f316c5295f6d849b74214f5
>> Author: Aldy Hernandez 
>> Date:   Thu Sep 23 10:59:24 2021 +0200
>> 
>>   Avoid invalid loop transformations in jump threading registry.
>> 
>>   My upcoming 

Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.

2021-09-30 Thread Maxim Kuvyrkov
Hi Aldy,

Your patch seems to slow down 471.omnetpp by 8% at -O3.  Could you please take 
a look if this is something that could be easily fixed?

Regards,

--
Maxim Kuvyrkov
https://www.linaro.org

> On 27 Sep 2021, at 02:52, ci_not...@linaro.org wrote:
> 
> After gcc commit 4a960d548b7d7d942f316c5295f6d849b74214f5
> Author: Aldy Hernandez 
> 
>     Avoid invalid loop transformations in jump threading registry.
> 
> the following benchmarks slowed down by more than 2%:
> - 471.omnetpp slowed down by 8% from 6348 to 6828 perf samples
> 
> Below reproducer instructions can be used to re-build both "first_bad" and 
> "last_good" cross-toolchains used in this bisection.  Naturally, the scripts 
> will fail when triggering benchmarking jobs if you don't have access to 
> Linaro TCWG CI.
> 
> For your convenience, we have uploaded tarballs with pre-processed source and 
> assembly files at:
> - First_bad save-temps: 
> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-4a960d548b7d7d942f316c5295f6d849b74214f5/save-temps/
> - Last_good save-temps: 
> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-29c92857039d0a105281be61c10c9e851aaeea4a/save-temps/
> - Baseline save-temps: 
> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-baseline/save-temps/
> 
> Configuration:
> - Benchmark: SPEC CPU2006
> - Toolchain: GCC + Glibc + GNU Linker
> - Version: all components were built from their tip of trunk
> - Target: arm-linux-gnueabihf
> - Compiler flags: -O3 -marm
> - Hardware: NVidia TK1 4x Cortex-A15
> 
> This benchmarking CI is work-in-progress, and we welcome feedback and 
> suggestions at linaro-toolchain@lists.linaro.org .  In our improvement plans 
> is to add support for SPEC CPU2017 benchmarks and provide "perf 
> report/annotate" data behind these reports.
> 
> THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, 
> REPRODUCTION INSTRUCTIONS, AND THE RAW COMMIT.
> 
> This commit has regressed these CI configurations:
> - tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O3
> 
> First_bad build: 
> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-4a960d548b7d7d942f316c5295f6d849b74214f5/
> Last_good build: 
> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-29c92857039d0a105281be61c10c9e851aaeea4a/
> Baseline build: 
> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-baseline/
> Even more details: 
> https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/
> 
> Reproduce builds:
> 
> mkdir investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
> cd investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
> 
> # Fetch scripts
> git clone https://git.linaro.org/toolchain/jenkins-scripts
> 
> # Fetch manifests and test.sh script
> mkdir -p artifacts/manifests
> curl -o artifacts/manifests/build-baseline.sh \
>   https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/manifests/build-baseline.sh \
>   --fail
> curl -o artifacts/manifests/build-parameters.sh \
>   https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/manifests/build-parameters.sh \
>   --fail
> curl -o artifacts/test.sh \
>   https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/test.sh \
>   --fail
> chmod +x artifacts/test.sh
> 
> # Reproduce the baseline build (build all pre-requisites)
> ./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh
> 
> # Save baseline build state (which is then restored in artifacts/test.sh)
> mkdir -p ./bisect
> rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ \
>   --exclude /gcc/ ./ ./bisect/baseline/
> 
> cd gcc
> 
> # Reproduce first_bad build
> git checkout --detach 4a960d548b7d7d942f316c5295f6d849b74214f5
> ../artifacts/test.sh
> 
> # Reproduce last_good build
> git checkout --detach 29c92857039d0a105281be61c10c9e851aaeea4a
> ../artifacts/test.sh
> 
> cd ..
> 
> 
> Full commit (up to 1000 lines):
> 
> commit 4a960d548b7d7d942f316c5295f6d849b74214f5
> Author: Aldy Hernandez 
> Date:   Thu Sep 23 10:59:24 2021 +0200
> 
>     Avoid invalid loop transformations in jump threading registry.
> 
>     My upcoming improvements to the forward jump threader make it thread
>     more aggressively.  In investigating some "regressions", I noticed
>     that it has always allowed threading through empty latches and across
>     loop boundaries.  As we have discussed recently, this should be avoided
>     until after loop optimizations have run their course.
> 
>     Note that this wasn't much 

Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.

2021-09-30 Thread Andrew MacLeod

On 9/29/21 7:59 AM, Maxim Kuvyrkov wrote:



   Does it run like once a day/some-time-period, and if you note a regression, 
narrow it down?

Configurations that track master branches have 3-day intervals.  Configurations 
that track release branches — 6 days.  If a regression is detected it is 
narrowed down to component first — binutils, gcc or glibc — and then the commit 
range of the component is bisected down to a specific commit.  All.  Done.  
Automatically.

I will make a presentation on this CI at the next GNU Tools Cauldron.


  Regardless, I think it could be very useful to be able to see the results of 
anything you do run at whatever frequency it happens.

Thanks!

--


One more follow-on question: is a summary of the results from each 3-day 
run on master published anywhere, i.e., on a web page or posted somewhere?  
That seems like it could be useful, especially with a +/- differential 
from the previous run (which you obviously calculate to determine if 
there is a regression).


Anyway, I like it!

Andrew



Re: [TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.

2021-09-30 Thread Aldy Hernandez

[CCing Jeff and list for broader audience]

On 9/27/21 2:53 PM, Maxim Kuvyrkov wrote:

Hi Aldy,

Your patch seems to slow down 471.omnetpp by 8% at -O3.  Could you please take 
a look if this is something that could be easily fixed?


First of all, thanks for chasing this down.  It's incredibly useful to 
have these types of bug reports.


Jeff and I have been discussing the repercussions of adjusting the loop 
crossing restrictions in the various threaders.  He's seen some 
regressions on embedded targets, where disallowing certain corner cases 
of loop crossing threads causes all sorts of grief.


Out of curiosity, does the attached (untested) patch fix the regression?

Aldy



Regards,

--
Maxim Kuvyrkov
https://www.linaro.org


On 27 Sep 2021, at 02:52, ci_not...@linaro.org wrote:

After gcc commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez 

Avoid invalid loop transformations in jump threading registry.

the following benchmarks slowed down by more than 2%:
- 471.omnetpp slowed down by 8% from 6348 to 6828 perf samples

The reproducer instructions below can be used to re-build both the "first_bad" and 
"last_good" cross-toolchains used in this bisection.  Naturally, the scripts will fail 
when triggering benchmarking jobs if you don't have access to Linaro TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and 
assembly files at:
- First_bad save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-4a960d548b7d7d942f316c5295f6d849b74214f5/save-temps/
- Last_good save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-29c92857039d0a105281be61c10c9e851aaeea4a/save-temps/
- Baseline save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-baseline/save-temps/

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: arm-linux-gnueabihf
- Compiler flags: -O3 -marm
- Hardware: NVidia TK1 4x Cortex-A15

This benchmarking CI is work-in-progress, and we welcome feedback and suggestions at 
linaro-toolchain@lists.linaro.org .  Our improvement plans include adding support for SPEC 
CPU2017 benchmarks and providing "perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION 
INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
- tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O3

First_bad build: 
https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-4a960d548b7d7d942f316c5295f6d849b74214f5/
Last_good build: 
https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-29c92857039d0a105281be61c10c9e851aaeea4a/
Baseline build: 
https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-baseline/
Even more details: 
https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/

Reproduce builds:

mkdir investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
cd investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh \
  https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/manifests/build-baseline.sh \
  --fail
curl -o artifacts/manifests/build-parameters.sh \
  https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/manifests/build-parameters.sh \
  --fail
curl -o artifacts/test.sh \
  https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/test.sh \
  --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 4a960d548b7d7d942f316c5295f6d849b74214f5
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 29c92857039d0a105281be61c10c9e851aaeea4a
../artifacts/test.sh

cd ..


Full commit (up to 1000 lines):

commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez 
Date:   Thu Sep 23 10:59:24 2021 +0200

Avoid invalid loop transformations in jump threading registry.

My upcoming improvements to the forward jump 

[TCWG CI] 471.omnetpp slowed down by 8% after gcc: Avoid invalid loop transformations in jump threading registry.

2021-09-26 Thread ci_notify
After gcc commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez 

Avoid invalid loop transformations in jump threading registry.

the following benchmarks slowed down by more than 2%:
- 471.omnetpp slowed down by 8% from 6348 to 6828 perf samples
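
As a quick check of the numbers above: going from 6348 to 6828 perf samples is an increase of 480 samples, or about 7.56%, which matches the reported 8% if the figure is rounded to the nearest whole percent. The sketch below reproduces that arithmetic. It is an illustration only: `slowdown_pct` and `exceeds_threshold` are hypothetical helper names, and the rounding rule is an assumption, not something the report states.

```shell
#!/bin/sh
# Hypothetical helpers; not taken from the TCWG CI scripts.

# slowdown_pct BASE NEW: percentage increase from BASE to NEW, rounded to
# the nearest whole percent (assumes NEW >= BASE, i.e. a slowdown).
slowdown_pct() (
    base=$1
    new=$2
    echo $(( ((new - base) * 100 + base / 2) / base ))
)

# exceeds_threshold BASE NEW PCT: succeeds if the unrounded increase is
# strictly greater than PCT percent. Integer math avoids floating point.
exceeds_threshold() (
    base=$1
    new=$2
    pct=$3
    [ $(( (new - base) * 100 )) -gt $(( pct * base )) ]
)

slowdown_pct 6348 6828    # prints 8
if exceeds_threshold 6348 6828 2; then
    echo "471.omnetpp: slowdown exceeds the 2% threshold"
fi
```

Comparing the unrounded increase against the threshold avoids flagging or missing a benchmark purely because of rounding.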

The reproducer instructions below can be used to re-build both the "first_bad" and 
"last_good" cross-toolchains used in this bisection.  Naturally, the scripts 
will fail when triggering benchmarking jobs if you don't have access to Linaro 
TCWG CI.

For your convenience, we have uploaded tarballs with pre-processed source and 
assembly files at:
- First_bad save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-4a960d548b7d7d942f316c5295f6d849b74214f5/save-temps/
- Last_good save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-29c92857039d0a105281be61c10c9e851aaeea4a/save-temps/
- Baseline save-temps: 
https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-baseline/save-temps/

Configuration:
- Benchmark: SPEC CPU2006
- Toolchain: GCC + Glibc + GNU Linker
- Version: all components were built from their tip of trunk
- Target: arm-linux-gnueabihf
- Compiler flags: -O3 -marm
- Hardware: NVidia TK1 4x Cortex-A15

This benchmarking CI is work-in-progress, and we welcome feedback and 
suggestions at linaro-toolchain@lists.linaro.org .  Our improvement plans 
include adding support for SPEC CPU2017 benchmarks and providing 
"perf report/annotate" data behind these reports.

THIS IS THE END OF INTERESTING STUFF.  BELOW ARE LINKS TO BUILDS, REPRODUCTION 
INSTRUCTIONS, AND THE RAW COMMIT.

This commit has regressed these CI configurations:
 - tcwg_bmk_gnu_tk1/gnu-master-arm-spec2k6-O3

First_bad build: 
https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-4a960d548b7d7d942f316c5295f6d849b74214f5/
Last_good build: 
https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-29c92857039d0a105281be61c10c9e851aaeea4a/
Baseline build: 
https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/build-baseline/
Even more details: 
https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/

Reproduce builds:

mkdir investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5
cd investigate-gcc-4a960d548b7d7d942f316c5295f6d849b74214f5

# Fetch scripts
git clone https://git.linaro.org/toolchain/jenkins-scripts

# Fetch manifests and test.sh script
mkdir -p artifacts/manifests
curl -o artifacts/manifests/build-baseline.sh \
  https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/manifests/build-baseline.sh \
  --fail
curl -o artifacts/manifests/build-parameters.sh \
  https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/manifests/build-parameters.sh \
  --fail
curl -o artifacts/test.sh \
  https://ci.linaro.org/job/tcwg_bmk_ci_gnu-bisect-tcwg_bmk_tk1-gnu-master-arm-spec2k6-O3/40/artifact/artifacts/test.sh \
  --fail
chmod +x artifacts/test.sh

# Reproduce the baseline build (build all pre-requisites)
./jenkins-scripts/tcwg_bmk-build.sh @@ artifacts/manifests/build-baseline.sh

# Save baseline build state (which is then restored in artifacts/test.sh)
mkdir -p ./bisect
rsync -a --del --delete-excluded --exclude /bisect/ --exclude /artifacts/ --exclude /gcc/ ./ ./bisect/baseline/

cd gcc

# Reproduce first_bad build
git checkout --detach 4a960d548b7d7d942f316c5295f6d849b74214f5
../artifacts/test.sh

# Reproduce last_good build
git checkout --detach 29c92857039d0a105281be61c10c9e851aaeea4a
../artifacts/test.sh

cd ..


Full commit (up to 1000 lines):

commit 4a960d548b7d7d942f316c5295f6d849b74214f5
Author: Aldy Hernandez 
Date:   Thu Sep 23 10:59:24 2021 +0200

Avoid invalid loop transformations in jump threading registry.

My upcoming improvements to the forward jump threader make it thread
more aggressively.  In investigating some "regressions", I noticed
that it has always allowed threading through empty latches and across
loop boundaries.  As we have discussed recently, this should be avoided
until after loop optimizations have run their course.

Note that this wasn't much of a problem before because DOM/VRP
couldn't find these opportunities, but with a smarter solver, we trip
over them more easily.

Because the forward threader doesn't have an independent localized cost
model like the new threader (profitable_path_p), it is difficult to
catch these things at discovery.  However, we can catch them at
registration time, with the added benefit that all the threaders
(forward and backward)