Re: Machine learning is used in gerrit build

2023-08-22 Thread Baole Fang
Hi Thorsten,

I've modified the condition logic a little bit. Originally, gerrit_master
would run on SUCCESS or if "jenkins:all" was set, but gerrit_master_seq would
run on anything other than SUCCESS, so both could run when "jenkins:all" was
set.

Now, I've added another condition ("jenkins:all" is not set) for
gerrit_master_seq, so that only one of them will run. Hope it makes sense.
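
For illustration, the before/after conditions can be sketched in Python. This
is not the actual Jenkins configuration: the function names and the `result`
and `topic` parameters are assumptions, standing in for whatever status the
job conditions test and for the gerrit topic of the patch.

```python
FORCE_ALL_TOPIC = "jenkins:all"


def jobs_before(result: str, topic: str) -> list:
    """Original conditions: the two jobs are not mutually exclusive."""
    jobs = []
    if result == "SUCCESS" or topic == FORCE_ALL_TOPIC:
        jobs.append("gerrit_master")
    if result != "SUCCESS":
        jobs.append("gerrit_master_seq")
    return jobs


def jobs_after(result: str, topic: str) -> list:
    """Revised conditions: gerrit_master_seq also requires the topic to be unset."""
    force_all = topic == FORCE_ALL_TOPIC
    jobs = []
    if result == "SUCCESS" or force_all:
        jobs.append("gerrit_master")
    if result != "SUCCESS" and not force_all:
        jobs.append("gerrit_master_seq")
    return jobs
```

Under these assumptions, jobs_before("FAILURE", "jenkins:all") returns both
jobs, while jobs_after("FAILURE", "jenkins:all") returns only gerrit_master.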

Best,
Baole Fang

On Tue, Aug 22, 2023 at 8:25 AM Thorsten Behrens wrote:

> Hi Baole,
>
> Baole Fang wrote:
> > Hi Thorsten,
> >
> > After quickly checking the gerrit_master_ml and gerrit_master_seq
> > configuration, I didn't notice the change. Can you indicate the change
> > for me?
> >
> Sure - look into
> https://ci.libreoffice.org/view/Gerrit/job/gerrit_master_ml/configure
> , and search for GERRIT_TOPIC. There's a short-circuit check to divert
> into "normal" multijob, if that matches jenkins:all.
>
> Cheers,
>
> -- Thorsten


Re: Machine learning is used in gerrit build

2023-08-22 Thread Thorsten Behrens
Hi Baole,

Baole Fang wrote:
> Hi Thorsten,
> 
> After quickly checking the gerrit_master_ml and gerrit_master_seq
> configuration, I didn't notice the change. Can you indicate the change for
> me?
> 
Sure - look into
https://ci.libreoffice.org/view/Gerrit/job/gerrit_master_ml/configure
, and search for GERRIT_TOPIC. There's a short-circuit check to divert
into "normal" multijob, if that matches jenkins:all.

Cheers,

-- Thorsten



Re: Machine learning is used in gerrit build

2023-08-22 Thread Baole Fang
Hi Thorsten,

After quickly checking the gerrit_master_ml and gerrit_master_seq
configuration, I didn't notice the change. Can you indicate the change for
me?

I've also looked into gerrit_master_android and have a rough idea of how
to integrate "jenkins:all" into gerrit_master_ml.

Best,
Baole Fang

On Tue, Aug 22, 2023 at 5:06 AM Thorsten Behrens wrote:

> Hi Noel,
>
> I wrote:
> > I can try & sit down tomorrow (with Baole, and perhaps you can join
> > too?), and tweak things a bit.
> >
> Added a tweak to short-circuit the ML builder, to unconditionally run
> the old 'normal' setup, if the patch has this gerrit topic set (minus
> the quotes): "jenkins:all".
>
> Pondering to make that an alias for "android:all", so with either of
> the two, one would get all Android platforms built, plus have all
> builds queued in parallel from the start.
>
> Cheers,
>
> -- Thorsten


Re: Machine learning is used in gerrit build

2023-08-22 Thread Thorsten Behrens
Hi Noel,

I wrote:
> I can try & sit down tomorrow (with Baole, and perhaps you can join
> too?), and tweak things a bit.
> 
Added a tweak to short-circuit the ML builder, to unconditionally run
the old 'normal' setup, if the patch has this gerrit topic set (minus
the quotes): "jenkins:all".

Pondering to make that an alias for "android:all", so with either of
the two, one would get all Android platforms built, plus have all
builds queued in parallel from the start.

Cheers,

-- Thorsten



Re: Machine learning is used in gerrit build

2023-08-21 Thread Baole Fang
Hi Thorsten,

I'm still traveling tomorrow. What time would you like to meet? I'll see if
I can make it.

Best,
Baole Fang


On Mon, Aug 21, 2023 at 4:52 PM Thorsten Behrens wrote:

> Hi Noel,
>
> Noel Grandin wrote:
> > Even in the good case, in the unlikely event of builds passing on the
> > first run, we have doubled the latency from submitting to getting a
> > jenkins pass, since we are running the two slowest build sub-paths in
> > sequence.
> >
> The idea is to run a fast & reliable build first as a canary (but only
> if there's a high probability for failure, predicted by the ML).
>
> Granted, at the moment linux_clang_dbgutil is anything but.
>
> I can try & sit down tomorrow (with Baole, and perhaps you can join
> too?), and tweak things a bit.
>
> Cheers,
>
> -- Thorsten
>


Re: Machine learning is used in gerrit build

2023-08-21 Thread Thorsten Behrens
Hi Noel,

Noel Grandin wrote:
> Even in the good case, in the unlikely event of builds passing on the first
> run, we have doubled the latency from submitting to getting a jenkins pass,
> since we are running the two slowest build sub-paths in sequence.
> 
The idea is to run a fast & reliable build first as a canary (but only
if there's a high probability for failure, predicted by the ML).

Granted, at the moment linux_clang_dbgutil is anything but.

I can try & sit down tomorrow (with Baole, and perhaps you can join
too?), and tweak things a bit.

Cheers,

-- Thorsten




Re: Machine learning is used in gerrit build

2023-08-20 Thread Noel Grandin
Hi

I don't see how this can be an improvement for devs, compared to the
expedient of simply getting more hardware (and it is not like TDF is
hurting for the necessary cash right now).

Even in the good case, in the unlikely event of builds passing on the first
run, we have doubled the latency from submitting to getting a jenkins pass,
since we are running the two slowest build sub-paths in sequence.

I am sorry, but this is one of those cases where we are trying to be too
clever.

Regards, Noel.


Re: Machine learning is used in gerrit build

2023-08-20 Thread Baole Fang
Hi Kohei,

There are two models. One model predicts the probability of each unit
test failing, and the other predicts the overall probability of the patch
failing any unit test. The first has a failure recall of 95% (meaning 95%
of failures are captured) and a success recall of 85% (meaning the number
of unit tests is reduced by 85%), but this model is not directly used in
deciding whether to run the fast or normal track. The second model is far
less accurate, so smart inference is performed in jenkins, which has 91%
failure recall and 57% success recall.
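
As a small sketch of how the two recall figures relate to a confusion matrix:
the counts below are hypothetical, chosen only so that they reproduce the
first model's percentages, and the variable names are illustrative.

```python
def recall(correct: int, missed: int) -> float:
    """Share of one class's actual cases that the model labelled correctly."""
    return correct / (correct + missed)


# Hypothetical counts, picked only to reproduce the first model's percentages.
failing_predicted_fail = 95    # failing tests the model flags
failing_predicted_pass = 5     # failing tests the model misses
passing_predicted_pass = 850   # passing tests the model predicts will pass
passing_predicted_fail = 150   # passing tests the model flags anyway

failure_recall = recall(failing_predicted_fail, failing_predicted_pass)
success_recall = recall(passing_predicted_pass, passing_predicted_fail)
print(f"failure recall: {failure_recall:.0%}, success recall: {success_recall:.0%}")
# prints "failure recall: 95%, success recall: 85%"
```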

The confusion matrix for the first model is posted here.

Best,
Baole Fang

On Fri, Aug 18, 2023 at 9:45 AM Kohei Yoshida wrote:

> Hello,
>
> On 09.08.2023 10:57, Baole Fang wrote:
>
> > Feel free to contact me if you have any questions!
>
> Just out of curiosity, what is the overall accuracy of your model?  Do
> you have a confusion matrix or something similar that shows the
> performance of your model?
>
> Thanks,
> Kohei
>


Re: Machine learning is used in gerrit build

2023-08-20 Thread Thorsten Behrens
Hi y'all,

Xisco Fauli wrote:
> On 18/8/23 18:48, Noel Grandin wrote:
> > Just recently I have had to resume my builds 5 or 6 times to get past
> > the ML stage, only then to discover that I made a mistake that affected
> > some other platform, and then having to change things again, and restart
> > the process.
>
> Maybe having a way to force a build on all platforms as we do with android
> could help.
> 
Yup, that's how we should address this. Sorry, was (mostly) away from
the keyboard the last few days, would have piped up earlier.

Thx Xisco for pointing to that earlier modification.

That said, the new ML-assisted build control is clearly there to make
the overall Jenkins experience better, not worse, for devs. The idea
was to catch likely-fail cases without causing load on the more
expensive/more time-consuming builders (like Windows and Mac), and
to keep the queue length for those builders close to zero (so
for normal patches, people would get a build node ~immediately).

Looking at the stats [1], we're still sometimes experiencing queue
lengths north of 35 jobs. The time waiting in the build queue though
dropped recently [2].

What's live now is a first cut, so let's try to fine-tune it. Most
obvious knobs to tweak: threshold to divert build into canary-first
mode, which 'features' to use from any given gerrit submission, and
which build configuration to use for the canary run (might even be a
new setup, with a different compiler/distro-config combination?).

[1] https://ci.libreoffice.org/monitoring/nodes?part=graph&graph=buildQueueLength
[2] https://ci.libreoffice.org/monitoring/nodes?part=graph&graph=buildQueueWaiting

Cheers,

-- Thorsten




Re: Machine learning is used in gerrit build

2023-08-18 Thread Xisco Fauli

Hello,

On 18/8/23 18:48, Noel Grandin wrote:
> Just recently I have had to resume my builds 5 or 6 times to get past the
> ML stage, only then to discover that I made a mistake that affected
> some other platform, and then having to change things again, and
> restart the process.

Maybe having a way to force a build on all platforms as we do with
android could help.

See
https://lists.freedesktop.org/archives/libreoffice/2023-August/090731.html


--
Xisco Faulí
LibreOffice QA Team
IRC: x1sc0


Re: Machine learning is used in gerrit build

2023-08-18 Thread Khaled Hosny



> On 18 Aug 2023, at 6:21 PM, Baole Fang wrote:
> 
> Hi Noel,
> 
> The reason to choose it is that it is most likely to fail among all the 
> builds. If it fails, then there is no need to run others.

This assumes it is failing because of something in the change being built, but
it often fails for random reasons, so it is not doing the developer any service
by not running the other jobs. Choosing a more reliable job would at least
increase the chance of catching actual issues in the change quicker.

Regards,
Khaled

Re: Machine learning is used in gerrit build

2023-08-18 Thread Noel Grandin
Hi

My problem with this design is that it assumes the only thing we are
optimising for is reducing the load on the build servers.

But we are not a build-service organisation. Jenkins is a service that is
subordinate to the needs of developers.

And what developers need is feedback about which platforms their changes
are working or not working on.

Just recently I have had to resume my builds 5 or 6 times to get past the ML
stage, only then to discover that I made a mistake that affected some other
platform, and then having to change things again, and restart the process.

I am really sorry you went to all this effort, but I do not think that this
ML plugin for jenkins is a good idea.

If I had known that this was in progress, I would have spoken up sooner.

If we need better jenkins throughput, TDF should just buy some more/better
hardware.

Regards, Noel Grandin


Re: Machine learning is used in gerrit build

2023-08-18 Thread Kohei Yoshida

Hello,

On 09.08.2023 10:57, Baole Fang wrote:
> Feel free to contact me if you have any questions!

Just out of curiosity, what is the overall accuracy of your model?  Do
you have a confusion matrix or something similar that shows the
performance of your model?


Thanks,
Kohei


Re: Machine learning is used in gerrit build

2023-08-18 Thread Baole Fang
Hi Noel,

The reason to choose it is that it is most likely to fail among all the
builds. If it fails, then there is no need to run others. If we choose a
job that is stable and likely to pass every patch, then all other builds
also need to be run, saving no computation.
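
To put rough numbers on that trade-off, here is a minimal sketch. All costs
and probabilities are hypothetical, the function name is illustrative, and
the model only counts computation: it ignores latency and failures that are
unrelated to the patch.

```python
def expected_cost(canary_cost: float, rest_cost: float, p_canary_fails: float) -> float:
    """Expected build cost when the remaining builds run only after the canary passes."""
    return canary_cost + (1.0 - p_canary_fails) * rest_cost


CANARY_COST = 1.0   # hypothetical cost of the first build, in arbitrary units
REST_COST = 4.0     # hypothetical combined cost of all other builds

for p in (0.0, 0.25, 0.5):
    saved = (CANARY_COST + REST_COST) - expected_cost(CANARY_COST, REST_COST, p)
    print(f"P(canary fails) = {p:.2f}: expected saving = {saved:.2f}")
```

In this model the expected saving is p * REST_COST, so a first job that
never fails (p = 0) saves nothing.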

I'm open to all advice on the build design.

Best,
Baole Fang

On Fri, Aug 18, 2023, 4:26 AM Noel Grandin wrote:

> Hi
>
> Why are we fronting the ml jenkins job with the least reliable subjob?
>
> Surely we should be using the gcc job - which is faster and more reliable.
>
> Regards, Noel Grandin
>
>


Re: Machine learning is used in gerrit build

2023-08-18 Thread Noel Grandin
Hi

Why are we fronting the ml jenkins job with the least reliable subjob?

Surely we should be using the gcc job - which is faster and more reliable.

Regards, Noel Grandin