> What about the case where the rebase *lessens* the improvement? That
is, you're expecting these 10 cases to improve, but after a rebase, only
1 improves. That's news! But a blanket "accept improvements" won't tell you.
I don't think that scenario currently triggers a CI failure. So this
wouldn't …
Yes, this is exactly one of the issues that marge might run into as well:
the aggregate ends up performing differently from the individual ones. Now
we have marge to ensure that at least the aggregate builds together, which
is the whole point of these merge trains - not to end up in a situation
where …
What about the case where the rebase *lessens* the improvement? That is, you're
expecting these 10 cases to improve, but after a rebase, only 1 improves.
That's news! But a blanket "accept improvements" won't tell you.
I'm not hard against this proposal, because I know precise tracking has its o…
After the idea of letting marge accept unexpected perf improvements, and
after looking at https://gitlab.haskell.org/ghc/ghc/-/merge_requests/4759,
which failed because a single test, for a single build flavour, crossed the
improvement threshold (where CI fails after rebasing), I wondered:
when would acc…
Simon Peyton Jones via ghc-devs writes:
> > We need to do something about this, and I'd advocate for just not making
> > stats fail with marge.
>
> Generally I agree. One point you don’t mention is that our perf tests
> (which CI forces us to look at assiduously) are often pretty weird
> cases.
Karel Gardas writes:
> On 3/17/21 4:16 PM, Andreas Klebinger wrote:
>> Now that isn't really an issue anyway, I think. The question is rather: is
>> 2% a large enough regression to worry about? 5%? 10%?
>
> 5-10% is still around system noise even on a lightly loaded workstation.
> Not sure if CI is n…
My guess is most of the "noise" is not run time, but the compiled code
changing in hard-to-predict ways.
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/1776/diffs for
example was a very small PR that took *months* of on-off work to get
passing metrics tests. In the end, binding `is_boot` …
I left the wiggle room for things like longer wall time causing more timer
events in the IO Manager/RTS, which can be a thermal/HW issue.
They're small and indirect, though.
-davean
On Thu, Mar 18, 2021 at 1:37 PM Sebastian Graf wrote:
> To be clear: All performance tests that run as part of CI mea…
To be clear: All performance tests that run as part of CI measure
allocations only. No wall clock time.
Those measurements are (mostly) deterministic and reproducible between
compiles of the same worktree and not impacted by thermal issues/hardware
at all.
On Thu, 18 Mar 2021 at 18:09, … wrote:
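The acceptance window the thread keeps circling around can be sketched as a simple relative-tolerance check on the allocation metric. This is only an illustration; the function name and the 1% window are invented here, not GHC's actual perf-test tooling:

```python
# Illustrative sketch of a CI perf check with a *symmetric* tolerance
# window: unexpected regressions and unexpected improvements both fail,
# so the recorded baseline never silently drifts.
# All names and the window size are hypothetical.

def check_metric(baseline: int, measured: int, window_pct: float = 1.0) -> str:
    """Classify a measured allocation count against its baseline."""
    delta_pct = (measured - baseline) / baseline * 100.0
    if delta_pct > window_pct:
        return "unexpected increase"   # regression: CI fails
    if delta_pct < -window_pct:
        return "unexpected decrease"   # improvement: CI also fails,
                                       # prompting a baseline update
    return "within window"
```

The symmetry is the point of contention above: a surprise improvement fails too, which is exactly the signal a blanket "accept improvements" policy would discard.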
That really shouldn't be near system noise for a well-constructed
performance test. You might be seeing things like thermal issues, etc.,
though - good benchmarking is a serious subject.
Also, we're not talking wall-clock tests; we're talking specific metrics.
The machines do tend to be bare metal, bu…
On 3/17/21 4:16 PM, Andreas Klebinger wrote:
> Now that isn't really an issue anyway, I think. The question is rather: is
> 2% a large enough regression to worry about? 5%? 10%?
5-10% is still around system noise even on a lightly loaded workstation.
Not sure if CI is not run on some shared cloud resources…
On 17 Mar 2021, at 16:16, Andreas Klebinger wrote:
>
> While I fully agree with this, we should *always* want to know if a small
> synthetic benchmark regresses by a lot.
> Or in other words: we don't want CI to accept such a regression for us ever;
> the developer of a patch should need to e…
> I'd be quite happy to accept a 25% regression on T9872c if it yielded
> a 1% improvement on compiling Cabal. T9872 is very very very strange!
> (Maybe if *all* the T9872 tests regressed, I'd be more worried.)
While I fully agree with this, we should *always* want to know if a
small synthetic benchma…
Yes, I think the counter-point of "automating what Ben does", so people
besides Ben can do it, is very important. In this case, I think a good
thing we could do is asynchronously build more of master post-merge,
such as using the perf stats to automatically bisect anything that is
fishy, including …
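The "automatically bisect anything fishy" idea could look roughly like this. Purely a sketch: it assumes per-commit allocation numbers are already recorded for a test, and that the metric changed in a single step (the same assumption `git bisect` makes about a single breaking commit):

```python
# Hypothetical sketch: binary-search recorded per-commit metrics for the
# first commit whose allocations jumped past a threshold relative to the
# oldest measurement. Assumes oldest-first input and a single step change.

def first_regression(metrics, threshold_pct):
    """metrics: list of (commit, allocations), oldest first.
    Returns the first offending commit, or None if the tip is clean."""
    base = metrics[0][1]

    def bad(i):
        return (metrics[i][1] - base) / base * 100.0 > threshold_pct

    if not bad(len(metrics) - 1):
        return None                  # nothing fishy at the tip
    lo, hi = 0, len(metrics) - 1
    while lo < hi:                   # invariant: bad(hi) holds, bad(lo-1) doesn't
        mid = (lo + hi) // 2
        if bad(mid):
            hi = mid
        else:
            lo = mid + 1
    return metrics[lo][0]
```

For example, with allocations `100, 100, 101, 130, 131` across commits `a..e` and a 10% threshold, the search lands on the commit that introduced the jump rather than testing every commit.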
Re: Performance drift: I opened
https://gitlab.haskell.org/ghc/ghc/-/issues/17658 a while ago with an idea
of how to measure drift a bit better.
It's basically an automatically checked version of "Ben stares at
performance reports every two weeks and sees that T9872 has regressed by
10% since 9.0"
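The drift problem #17658 describes can be illustrated with a second, long-term baseline: each commit passes its own per-commit window, yet the compounded movement against a fixed release baseline is what the biweekly stare-at-reports ritual actually catches. The thresholds and names below are invented for illustration:

```python
# Toy illustration of perf drift: ten successive +1% changes each pass
# a 2% per-commit window, but compound to over 10% against a fixed
# baseline (e.g. "since 9.0"), which an automated drift check would flag.
# Both thresholds are made up for this example.

PER_COMMIT_WINDOW = 2.0   # % each individual commit may move the metric
DRIFT_BUDGET = 10.0       # % total movement allowed since the release

def passes_commit_check(prev, cur, window_pct=PER_COMMIT_WINDOW):
    return abs(cur - prev) / prev * 100.0 <= window_pct

def drifted(release_baseline, cur, budget_pct=DRIFT_BUDGET):
    return (cur - release_baseline) / release_baseline * 100.0 > budget_pct

vals = [100.0]
for _ in range(10):
    vals.append(vals[-1] * 1.01)   # each commit: +1%, well within window
```

Every adjacent pair passes the per-commit check, yet the final value sits about 10.5% above the release baseline, so only the fixed-baseline comparison notices.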
> On Mar 17, 2021, at 6:18 AM, Moritz Angermann
> wrote:
>
> But what do we expect of patch authors? Right now if five people write
> patches to GHC, and each of them eventually manages to get their MRs green
> after a long review, they finally see it assigned to marge, and then it
> starts …
I am not advocating dropping perf tests during merge requests; I just want
them to not be fatal for marge batches. Yes, this means that a bunch of
unrelated merge requests could all be fine wrt the perf checks per merge
request, but the aggregate might fail perf. And then subsequently the next
MR …
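A tiny worked example of that failure mode (the 2% window is hypothetical): two MRs that each pass their own perf check can still sink the batch that contains both.

```python
# Toy numbers: each MR adds +1.8% allocations, inside a hypothetical 2%
# per-MR window, but the marge batch carrying both lands at ~+3.6% over
# the shared baseline and the whole train fails.

WINDOW_PCT = 2.0

def within_window(baseline, measured, window_pct=WINDOW_PCT):
    return abs(measured - baseline) / baseline * 100.0 <= window_pct

base = 1_000_000
after_a = base * 1.018            # MR A alone: +1.8%, passes
after_b = base * 1.018            # MR B alone: +1.8%, passes
batch = base * 1.018 * 1.018      # A and B batched: ~+3.63%, fails
```

Neither author sees a red pipeline on their own MR; the failure only materialises in the aggregate, which is why it lands on whoever tends the merge train.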
Ah, so it was really two identical pipelines (one for the branch where
Margebot batches commits, and one for the MR that Margebot creates before
merging). That's indeed a non-trivial amount of purely wasted
computer-hours.
Taking a step back, I am inclined to agree with the proposal of not
checking…
We need to do something about this, and I'd advocate for just not making stats
fail with marge.
Generally I agree. One point you don’t mention is that our perf tests (which
CI forces us to look at assiduously) are often pretty weird cases. So there is
at least a danger that these more exotic …
*why* is a very good question. The MR fixing it is here:
https://gitlab.haskell.org/ghc/ghc/-/merge_requests/5275
On Wed, Mar 17, 2021 at 4:26 PM Spiwack, Arnaud
wrote:
> Then I have a question: why are there two pipelines running on each merge
> batch?
>
> On Wed, Mar 17, 2021 at 9:22 AM Moritz …
Then I have a question: why are there two pipelines running on each merge
batch?
On Wed, Mar 17, 2021 at 9:22 AM Moritz Angermann
wrote:
> No it wasn't. It was about the stat failures described in the next
> paragraph. I could have been more clear about that. My apologies!
>
> On Wed, Mar 17, 20…
No it wasn't. It was about the stat failures described in the next
paragraph. I could have been more clear about that. My apologies!
On Wed, Mar 17, 2021 at 4:14 PM Spiwack, Arnaud
wrote:
>
> and if either or both (see below) failed, marge's merge would fail as well.
>>
>
> Re: “see below” is this referring to a missing part of your email?
> and if either or both (see below) failed, marge's merge would fail as well.
>
Re: “see below” is this referring to a missing part of your email?
___
ghc-devs mailing list
ghc-devs@haskell.org
http://mail.haskell.org/cgi-bin/mailman/listinfo/ghc-devs
I'm not opposed to some effort going into this, but I would
strongly oppose putting all our effort there. Incremental CI can
cut multiple hours down to mere minutes, especially with the test
suite being …
… I think we need to do less compiling - hence incremental CI.
Simon
From: ghc-devs On Behalf Of John Ericson
Sent: 22 February 2021 05:53
To: ghc-devs
Subject: Re: On CI
I'm not opposed to some effort going into this, but I would strongly oppose
putting all our effort there. Incr…
…the aggressive caching scheme.
Just my 2p
Josef
From: ghc-devs on behalf of Simon Peyton Jones via ghc-devs
Sent: F…
es" and restart building the libraries. That way we can
> validate that the build failure was a true build failure and not just due to
> the aggressive caching scheme.
>
> Just my 2p
>
> Josef
>
> From: ghc-devs <mailto:ghc-devs-boun...@haskell.org>> on
> From: ghc-devs on behalf of Simon Peyton Jones via ghc-devs
> Sent: Friday, February 19, 2021 8:57 AM
> To: John Ericson; ghc-devs@haskell.org
> Subject: RE: On CI
>
> 1. Building and testing happen together. When test…
…Sent: Friday, February 19, 2021 8:57 AM
To: John Ericson; ghc-devs
Subject: RE: On CI
1. Building and testing happen together. When tests fail spuriously, we
also have to rebuild GHC in addition to re-running the tests. That's pure
waste.
https://gitlab.haskell.org/ghc/ghc/-/i
Simon Peyton Jones via ghc-devs writes:
>> 1. Building and testing happen together. When tests fail
>> spuriously, we also have to rebuild GHC in addition to re-running
>> the tests. That's pure waste.
>> https://gitlab.haskell.org/ghc/ghc/-/issues/13897 tracks this more
>> or less.
…Simon
From: ghc-devs On Behalf Of John Ericson
Sent: 19 February 2021 03:19
To: ghc-devs
Subject: Re: On CI
I am also wary of deferring checking whole platforms and whatnot. I
think that's just kicking the can down the road, and will result in more
variance and uncertainty. It might …
I am also wary of deferring checking whole platforms and whatnot.
I think that's just kicking the can down the road, and will result in
more variance and uncertainty. It might be alright for those authoring
PRs, but it will make Ben's job of keeping the system running even more
grueling.
Moritz Angermann writes:
> At this point I believe we have ample Linux build capacity. Darwin looks
> pretty good as well; the ~4 M1s we have should in principle also be able to
> build x86_64-darwin at acceptable speeds, although on Big Sur only.
>
> The aarch64-Linux story is a bit constrained by …
Apologies for the latency here. This thread has required a fair amount of
reflection.
Sebastian Graf writes:
> Hi Moritz,
>
> I, too, had my gripes with CI turnaround times in the past. Here's a
> somewhat radical proposal:
>
> - Run "full-build" stage builds only on Marge MRs. Then we can assign …
I'm glad to report that my math was off. But it was off only because I
assumed that we'd successfully build all
windows configurations, which we of course don't. Thus some builds fail
faster.
Sylvain also provided a windows machine temporarily, until it expired.
This led to a slew of new windows w…
At this point I believe we have ample Linux build capacity. Darwin looks
pretty good as well; the ~4 M1s we have should in principle also be able to
build x86_64-darwin at acceptable speeds, although on Big Sur only.
The aarch64-Linux story is a bit constrained by powerful and fast CI
machines, but p…
Hi Moritz,
I, too, had my gripes with CI turnaround times in the past. Here's a
somewhat radical proposal:
- Run "full-build" stage builds only on Marge MRs. Then we can assign to
Marge much earlier, but probably have to do a bit more of (manual)
bisecting of spoiled Marge batches.