Re: jamais-vu can now ignore renumbering of source lines in dg output (Re: GCC Buildbot Update)

2018-01-29 Thread Paulo Matos


On 29/01/18 15:19, David Malcolm wrote:
>>
>> Hi,
>>
>> I am looking at this today and I noticed that having the source files
>> for all recent GCC revisions is costly in terms of time (if we wish to
>> compress them) and space (for storage). I was instead thinking that jv
>> could calculate the differences offline using pysvn and the old and new
>> revision numbers.
> 
> Note that access to the source files is optional - jv doesn't need
> them, it just helps for the particular situation described above.
> 

I understand, but it would be great to have line-number filtering.

>> I have started implementing this in my port. Would you consider
>> merging it?
> 
> Sounds reasonable - though bear in mind that gcc might be switching to
> git at some point.
> 

Yes, I know... but... if we wait for that to happen to implement
something... :)

> Send a pull request (I've turned on Travis CI on the GitHub repository,
> so pull requests now automatically get tested on a bunch of different
> Python 3 versions).
> 

Sure.

-- 
Paulo Matos


Re: jamais-vu can now ignore renumbering of source lines in dg output (Re: GCC Buildbot Update)

2018-01-29 Thread David Malcolm
On Mon, 2018-01-29 at 14:55 +0100, Paulo Matos wrote:
> 
> On 24/01/18 20:20, David Malcolm wrote:
> > 
> > I've added a new feature to jamais-vu (as of
> > 77849e2809ca9a049d5683571e27ebe190977fa8): it can now ignore test
> > results that merely changed line number.  
> > 
> > For example, if the old .sum file has a:
> > 
> >   PASS: g++.dg/diagnostic/param-type-mismatch.C  -std=gnu++11  (test for errors, line 106)
> > 
> > and the new .sum file has a:
> > 
> >   PASS: g++.dg/diagnostic/param-type-mismatch.C  -std=gnu++11  (test for errors, line 103)
> > 
> > and diffing the source trees reveals that line 106 became line 103,
> > the
> > change won't be reported by "jv compare".
> > 
> > It also does it for dg-{begin|end}-multiline-output.
> > 
> > It will report them if the outcome changed (e.g. from PASS to
> > FAIL).
> > 
> > To do this filtering, jv needs access to the old and new source
> > trees,
> > so it can diff the pertinent source files, so "jv compare" has
> > gained
> > the optional arguments
> >   --old-source-path=
> > and
> >   --new-source-path=
> > See the example in the jv Makefile for more info.  If they're not
> > present, it should work as before (without being able to do the
> > above
> > filtering).
> 
> 
> Hi,
> 
> I am looking at this today and I noticed that having the source files
> for all recent GCC revisions is costly in terms of time (if we wish to
> compress them) and space (for storage). I was instead thinking that jv
> could calculate the differences offline using pysvn and the old and new
> revision numbers.

Note that access to the source files is optional - jv doesn't need
them, it just helps for the particular situation described above.

> I have started implementing this in my port. Would you consider
> merging it?

Sounds reasonable - though bear in mind that gcc might be switching to
git at some point.

Send a pull request (I've turned on Travis CI on the GitHub repository,
so pull requests now automatically get tested on a bunch of different
Python 3 versions).

Thanks
Dave


Re: jamais-vu can now ignore renumbering of source lines in dg output (Re: GCC Buildbot Update)

2018-01-29 Thread Paulo Matos


On 24/01/18 20:20, David Malcolm wrote:
> 
> I've added a new feature to jamais-vu (as of
> 77849e2809ca9a049d5683571e27ebe190977fa8): it can now ignore test
> results that merely changed line number.  
> 
> For example, if the old .sum file has a:
> 
>   PASS: g++.dg/diagnostic/param-type-mismatch.C  -std=gnu++11  (test for 
> errors, line 106)
> 
> and the new .sum file has a:
> 
>   PASS: g++.dg/diagnostic/param-type-mismatch.C  -std=gnu++11  (test for 
> errors, line 103)
> 
> and diffing the source trees reveals that line 106 became line 103, the
> change won't be reported by "jv compare".
> 
> It also does it for dg-{begin|end}-multiline-output.
> 
> It will report them if the outcome changed (e.g. from PASS to FAIL).
> 
> To do this filtering, jv needs access to the old and new source trees,
> so it can diff the pertinent source files, so "jv compare" has gained
> the optional arguments
>   --old-source-path=
> and
>   --new-source-path=
> See the example in the jv Makefile for more info.  If they're not
> present, it should work as before (without being able to do the above
> filtering).


Hi,

I am looking at this today and I noticed that having the source files
for all recent GCC revisions is costly in terms of time (if we wish to
compress them) and space (for storage). I was instead thinking that jv
could calculate the differences offline using pysvn and the old and new
revision numbers.
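
Roughly along these lines (a sketch only, not the code in my branch; the
pysvn calls, URL and revision numbers are just illustrative):

  import pysvn

  def diff_revisions(url, old_rev, new_rev):
      """Fetch a unified diff between two svn revisions of a URL."""
      client = pysvn.Client()
      return client.diff(
          '/tmp/jv-diff',  # scratch directory that pysvn requires
          url,
          revision1=pysvn.Revision(pysvn.opt_revision_kind.number, old_rev),
          revision2=pysvn.Revision(pysvn.opt_revision_kind.number, new_rev))

  # e.g. diff the testsuite between the two revisions under comparison:
  patch = diff_revisions('svn://gcc.gnu.org/svn/gcc/trunk/gcc/testsuite',
                         255658, 255659)

jv could then feed such a diff to its line-renumbering logic instead of
diffing two locally stored source trees.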

I have started implementing this in my port. Would you consider merging it?

-- 
Paulo Matos


Re: jamais-vu can now ignore renumbering of source lines in dg output (Re: GCC Buildbot Update)

2018-01-24 Thread Paulo Matos


On 24/01/18 20:20, David Malcolm wrote:
> 
> I've added a new feature to jamais-vu (as of
> 77849e2809ca9a049d5683571e27ebe190977fa8): it can now ignore test
> results that merely changed line number.  
> 
> For example, if the old .sum file has a:
> 
>   PASS: g++.dg/diagnostic/param-type-mismatch.C  -std=gnu++11  (test for 
> errors, line 106)
> 
> and the new .sum file has a:
> 
>   PASS: g++.dg/diagnostic/param-type-mismatch.C  -std=gnu++11  (test for 
> errors, line 103)
> 
> and diffing the source trees reveals that line 106 became line 103, the
> change won't be reported by "jv compare".
> 
> It also does it for dg-{begin|end}-multiline-output.
> 
> It will report them if the outcome changed (e.g. from PASS to FAIL).
> 
> To do this filtering, jv needs access to the old and new source trees,
> so it can diff the pertinent source files, so "jv compare" has gained
> the optional arguments
>   --old-source-path=
> and
>   --new-source-path=
> See the example in the jv Makefile for more info.  If they're not
> present, it should work as before (without being able to do the above
> filtering).
> 
> Is this something that the buildbot can use?
> 

Hi David,

Thanks for the amazing improvements.
I will take a look at them on Monday. I have a lot of work at the moment,
so I have decided to dedicate one fifth of my week (usually Monday) to the
buildbot. I will definitely get it integrated on Monday and hopefully have
something to report afterwards.

Thanks for keeping me up-to-date with these changes.

-- 
Paulo Matos


jamais-vu can now ignore renumbering of source lines in dg output (Re: GCC Buildbot Update)

2018-01-24 Thread David Malcolm
On Sat, 2017-12-16 at 12:06 +0100, Paulo Matos wrote:
> 
> On 15/12/17 15:29, David Malcolm wrote:
> > On Fri, 2017-12-15 at 10:16 +0100, Paulo Matos wrote:
> > > 
> > > On 14/12/17 12:39, David Malcolm wrote:
> > 
> > [...]
> > 
> > > > It looks like you're capturing the textual output from "jv
> > > > compare"
> > > > and
> > > > using the exit code.  Would you prefer to import "jv" as a
> > > > python
> > > > module and use some kind of API?  Or a different output format?
> > > > 
> > > 
> > > Well, I am using a fork of it which I converted to Python3. Would
> > > you
> > > be
> > > open to convert yours to Python3? The reason I am doing this is
> > > because
> > > all other Python software I have and the buildbot use Python3.
> > 
> > Done.
> > 
> > I found and fixed some more bugs, also (introduced during my
> > refactoring, sigh...)
> > 
> 
> That's great. Thank you very much for this work.
> 
> > > I would also prefer to have some json format or something but
> > > when I
> > > looked at it, the software was just printing to stdout and I
> > > didn't
> > > want
> > > to spend too much time implementing it, so I thought parsing the
> > > output
> > > was just easier.
> > 
> > I can add JSON output (or whatever), but I need to get back to gcc
> > 8
> > work, so if the stdout output is good enough for now, let's defer
> > output changes.
> > 
> 
> Agree, for now I can use what I already have to read the output of
> jv.
> I think I can now delete my fork and just use upstream jv as a
> submodule.

I've added a new feature to jamais-vu (as of
77849e2809ca9a049d5683571e27ebe190977fa8): it can now ignore test
results that merely changed line number.  

For example, if the old .sum file has a:

  PASS: g++.dg/diagnostic/param-type-mismatch.C  -std=gnu++11  (test for 
errors, line 106)

and the new .sum file has a:

  PASS: g++.dg/diagnostic/param-type-mismatch.C  -std=gnu++11  (test for 
errors, line 103)

and diffing the source trees reveals that line 106 became line 103, the
change won't be reported by "jv compare".

It also does it for dg-{begin|end}-multiline-output.

It will report them if the outcome changed (e.g. from PASS to FAIL).

To do this filtering, jv needs access to the old and new source trees,
so it can diff the pertinent source files, so "jv compare" has gained
the optional arguments
  --old-source-path=
and
  --new-source-path=
See the example in the jv Makefile for more info.  If they're not
present, it should work as before (without being able to do the above
filtering).
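
The gist of the line-number remapping is something like the following
sketch (illustrative only, not the actual jv code; the file paths are
made up):

  import difflib
  import re

  def build_line_map(old_lines, new_lines):
      """Map 1-based line numbers in the old file to the new file."""
      line_map = {}
      matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines)
      for tag, i1, i2, j1, j2 in matcher.get_opcodes():
          if tag == 'equal':
              for k in range(i2 - i1):
                  line_map[i1 + k + 1] = j1 + k + 1
      return line_map

  def renumber(test_name, line_map):
      """Canonicalize 'line N' in a test name using the map."""
      def repl(m):
          old = int(m.group(1))
          return 'line %d' % line_map.get(old, old)
      return re.sub(r'line (\d+)', repl, test_name)

  old = open('old/param-type-mismatch.C').readlines()
  new = open('new/param-type-mismatch.C').readlines()
  name = ('g++.dg/diagnostic/param-type-mismatch.C  -std=gnu++11'
          '  (test for errors, line 106)')
  # Prints '... line 103' if the diff shows line 106 moved to 103:
  print(renumber(name, build_line_map(old, new)))

Two results are then treated as the same test if their canonicalized
names match, and are only reported if the outcome changed.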

Is this something that the buildbot can use?

Dave


Re: GCC Buildbot Update

2017-12-20 Thread Paulo Matos


On 20/12/17 12:48, James Greenhalgh wrote:
> On Wed, Dec 20, 2017 at 10:02:45AM +, Paulo Matos wrote:
>>
>>
>> On 20/12/17 10:51, Christophe Lyon wrote:
>>>
>>> The recent fix changed the Makefile and configure script in libatomic.
>>> I guess that if your incremental build does not run configure, it's
>>> still using old Makefiles, and old options.
>>>
>>>
>> You're right. I guess incremental builds should always call configure,
>> just in case.
> 
> For my personal bisect scripts I try an incremental build, with a
> full rebuild as a fallback on failure.
> 
> That gives me the benefits of an incremental build most of the time (I
> don't have stats on how often) with an automated approach to keeping things
> going where there are issues.
> 
> Note that there are rare cases where dependencies are missed in the toolchain
> and an incremental build will give you a toolchain with undefined
> behaviour, as one compilation unit takes a new definition of a
> struct/interface and the other sits on an outdated compile from the
> previous build.
> 
> I don't have a good way to detect these.
> 

That's definitely a shortcoming of incremental builds. Unfortunately we
cannot cope with full builds for each commit (even for incremental
builds we'll need an alternative soon), so I think I will implement the
same strategy: a full build if the incremental one fails.
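
In outline, something like this sketch (illustrative; the configure and
make invocations mirror the ones quoted elsewhere in the thread):

  import os
  import shutil
  import subprocess

  def make(builddir):
      return subprocess.call(['make', '-j4', 'all'], cwd=builddir)

  def build(srcdir, builddir):
      """Try an incremental build; fall back to a full rebuild on failure."""
      if os.path.isdir(builddir) and make(builddir) == 0:
          return 'incremental'
      # Fallback: wipe the build tree, re-run configure, build from scratch.
      shutil.rmtree(builddir, ignore_errors=True)
      os.makedirs(builddir)
      subprocess.check_call([os.path.join(srcdir, 'configure'),
                             '--disable-multilib'], cwd=builddir)
      subprocess.check_call(['make', '-j4', 'all'], cwd=builddir)
      return 'full'

Starting the fallback from an empty build directory also covers the
cases where only an "rm -rf $builddir" helps.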

With regard to incremental builds with undefined behaviour, that
probably means that dependencies are incorrectly calculated. It would be
great to sort these out. If we could detect that there are issues with
the incremental build, we could then try to understand which
dependencies were not properly calculated. That is just a guess, though;
implementing this might take a while and would obviously need a lot more
resources than we have available now.

-- 
Paulo Matos


Re: GCC Buildbot Update

2017-12-20 Thread James Greenhalgh
On Wed, Dec 20, 2017 at 10:02:45AM +, Paulo Matos wrote:
> 
> 
> On 20/12/17 10:51, Christophe Lyon wrote:
> > 
> > The recent fix changed the Makefile and configure script in libatomic.
> > I guess that if your incremental build does not run configure, it's
> > still using old Makefiles, and old options.
> > 
> > 
> You're right. I guess incremental builds should always call configure,
> just in case.

For my personal bisect scripts I try an incremental build, with a
full rebuild as a fallback on failure.

That gives me the benefits of an incremental build most of the time (I
don't have stats on how often) with an automated approach to keeping things
going where there are issues.

Note that there are rare cases where dependencies are missed in the toolchain
and an incremental build will give you a toolchain with undefined
behaviour, as one compilation unit takes a new definition of a
struct/interface and the other sits on an outdated compile from the
previous build.

I don't have a good way to detect these.

Thanks,
James



Re: GCC Buildbot Update

2017-12-20 Thread Christophe Lyon
On 20 December 2017 at 11:02, Paulo Matos  wrote:
>
>
> On 20/12/17 10:51, Christophe Lyon wrote:
>>
>> The recent fix changed the Makefile and configure script in libatomic.
>> I guess that if your incremental build does not run configure, it's
>> still using old Makefiles, and old options.
>>
>>
> You're right. I guess incremental builds should always call configure,
> just in case.
>

Maybe, but this does not always work. Sometimes, I have to rm -rf $builddir


> Thanks,
> --
> Paulo Matos


Re: GCC Buildbot Update

2017-12-20 Thread Paulo Matos


On 20/12/17 10:51, Christophe Lyon wrote:
> 
> The recent fix changed the Makefile and configure script in libatomic.
> I guess that if your incremental build does not run configure, it's
> still using old Makefiles, and old options.
> 
> 
You're right. I guess incremental builds should always call configure,
just in case.

Thanks,
-- 
Paulo Matos


Re: GCC Buildbot Update

2017-12-20 Thread Christophe Lyon
On 20 December 2017 at 09:31, Paulo Matos  wrote:
>
>
> On 15/12/17 10:21, Christophe Lyon wrote:
>> On 15 December 2017 at 10:19, Paulo Matos  wrote:
>>>
>>>
>>> On 14/12/17 21:32, Christophe Lyon wrote:
 Great, I thought the CF machines were reserved for developers.
 Good news you could add builders on them.

>>>
>>> Oh. I have seen similar things happening on CF machines so I thought it
>>> was not a problem. I have never specifically asked for permission.
>>>
> pmatos@gcc115:~/gcc-8-20171203_BUILD$ as -march=armv8.1-a
> Assembler messages:
> Error: unknown architecture `armv8.1-a'
>
> Error: unrecognized option -march=armv8.1-a
>
> However, if I run a compiler build manually with just:
>
> $ configure --disable-multilib
> $ nice -n 19 make -j4 all
>
> This compiles just fine. So I am at the moment attempting to investigate
> what might cause the difference between what buildbot does and what I do
> through ssh.
>
 I suspect you are hitting a bug introduced recently, and fixed by:
 https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00434.html

>>>
>>> Wow, that's really useful. Thanks for letting me know.
>>>
>> And the patch was committed last night (r255659), so maybe your builds now 
>> work?
>>
>
> On some machines, in incremental builds I am still seeing this:
> Assembler messages:
> Error: unknown architectural extension `lse'
> Error: unrecognized option -march=armv8-a+lse
> make[4]: *** [load_1_1_.lo] Error 1
> make[4]: *** Waiting for unfinished jobs
>
> Looks related... the only strange thing happening is that this doesn't
> happen in full builds.
>

The recent fix changed the Makefile and configure script in libatomic.
I guess that if your incremental build does not run configure, it's
still using old Makefiles, and old options.


> --
> Paulo Matos


Re: GCC Buildbot Update

2017-12-20 Thread Paulo Matos


On 15/12/17 10:21, Christophe Lyon wrote:
> On 15 December 2017 at 10:19, Paulo Matos  wrote:
>>
>>
>> On 14/12/17 21:32, Christophe Lyon wrote:
>>> Great, I thought the CF machines were reserved for developers.
>>> Good news you could add builders on them.
>>>
>>
>> Oh. I have seen similar things happening on CF machines so I thought it
>> was not a problem. I have never specifically asked for permission.
>>
 pmatos@gcc115:~/gcc-8-20171203_BUILD$ as -march=armv8.1-a
 Assembler messages:
 Error: unknown architecture `armv8.1-a'

 Error: unrecognized option -march=armv8.1-a

 However, if I run a compiler build manually with just:

 $ configure --disable-multilib
 $ nice -n 19 make -j4 all

 This compiles just fine. So I am at the moment attempting to investigate
 what might cause the difference between what buildbot does and what I do
 through ssh.

>>> I suspect you are hitting a bug introduced recently, and fixed by:
>>> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00434.html
>>>
>>
>> Wow, that's really useful. Thanks for letting me know.
>>
> And the patch was committed last night (r255659), so maybe your builds now 
> work?
> 

On some machines, in incremental builds I am still seeing this:
Assembler messages:
Error: unknown architectural extension `lse'
Error: unrecognized option -march=armv8-a+lse
make[4]: *** [load_1_1_.lo] Error 1
make[4]: *** Waiting for unfinished jobs

Looks related... the only strange thing happening is that this doesn't
happen in full builds.

-- 
Paulo Matos


Re: GCC Buildbot Update

2017-12-16 Thread Paulo Matos


On 15/12/17 18:05, Segher Boessenkool wrote:
> All the cfarm machines are shared resources.  Benchmarking on them will
> not work no matter what.  And being a shared resource means all users
> have to share and be mindful of others.
> 

Yes, we'll definitely need better machines for benchmarking. Something I
haven't thought of yet.

>> So it would be good if there was a strict separation of machines used
>> for bots and machines used by humans. In other words bots should only
>> run on dedicated machines.
> 
> The aarch64 builds should probably not use all of gcc113..gcc116.
>
> We do not have enough resources to dedicate machines to bots.
>

I have disabled gcc116.

Thanks,
-- 
Paulo Matos


Re: GCC Buildbot Update

2017-12-16 Thread Paulo Matos


On 15/12/17 15:29, David Malcolm wrote:
> On Fri, 2017-12-15 at 10:16 +0100, Paulo Matos wrote:
>>
>> On 14/12/17 12:39, David Malcolm wrote:
> 
> [...]
> 
>>> It looks like you're capturing the textual output from "jv compare"
>>> and
>>> using the exit code.  Would you prefer to import "jv" as a python
>>> module and use some kind of API?  Or a different output format?
>>>
>>
>> Well, I am using a fork of it which I converted to Python3. Would you
>> be
>> open to convert yours to Python3? The reason I am doing this is
>> because
>> all other Python software I have and the buildbot use Python3.
> 
> Done.
> 
> I found and fixed some more bugs, also (introduced during my
> refactoring, sigh...)
> 

That's great. Thank you very much for this work.

>> I would also prefer to have some json format or something but when I
>> looked at it, the software was just printing to stdout and I didn't
>> want
>> to spend too much time implementing it, so I thought parsing the
>> output
>> was just easier.
> 
> I can add JSON output (or whatever), but I need to get back to gcc 8
> work, so if the stdout output is good enough for now, let's defer
> output changes.
> 

Agree, for now I can use what I already have to read the output of jv.
I think I can now delete my fork and just use upstream jv as a submodule.

-- 
Paulo Matos


Re: GCC Buildbot Update

2017-12-15 Thread Segher Boessenkool
On Fri, Dec 15, 2017 at 08:42:18AM +0100, Markus Trippelsdorf wrote:
> On 2017.12.14 at 21:32 +0100, Christophe Lyon wrote:
> > On 14 December 2017 at 09:56, Paulo Matos  wrote:
> > > I got an email suggesting I add some aarch64 workers so I did:
> > > 4 workers from CF (gcc113, gcc114, gcc115 and gcc116);
> > >
> > Great, I thought the CF machines were reserved for developers.
> > Good news you could add builders on them.
> 
> I don't think this is good news at all. 
> 
> Once a buildbot runs on a CF machine it immediately becomes impossible
> to do any meaningful measurement on that machine. That is mainly because
> of the random I/O (untar, rm -fr, etc.) of the bot. As a result variance
> goes through the roof and all measurements drown in noise.

Automated runs should not use an unreasonable amount of resources (and
neither should manual runs, but the bar for automated things lies much
lower, since they are more annoying).

All the cfarm machines are shared resources.  Benchmarking on them will
not work no matter what.  And being a shared resource means all users
have to share and be mindful of others.

> So it would be good if there was a strict separation of machines used
> for bots and machines used by humans. In other words bots should only
> run on dedicated machines.

The aarch64 builds should probably not use all of gcc113..gcc116.

We do not have enough resources to dedicate machines to bots.


Segher


Re: GCC Buildbot Update

2017-12-15 Thread David Malcolm
On Fri, 2017-12-15 at 10:16 +0100, Paulo Matos wrote:
> 
> On 14/12/17 12:39, David Malcolm wrote:

[...]

> > It looks like you're capturing the textual output from "jv compare"
> > and
> > using the exit code.  Would you prefer to import "jv" as a python
> > module and use some kind of API?  Or a different output format?
> > 
> 
> Well, I am using a fork of it which I converted to Python3. Would you
> be
> open to convert yours to Python3? The reason I am doing this is
> because
> all other Python software I have and the buildbot use Python3.

Done.

I found and fixed some more bugs, also (introduced during my
refactoring, sigh...)

> I would also prefer to have some json format or something but when I
> looked at it, the software was just printing to stdout and I didn't
> want
> to spend too much time implementing it, so I thought parsing the
> output
> was just easier.

I can add JSON output (or whatever), but I need to get back to gcc 8
work, so if the stdout output is good enough for now, let's defer
output changes.

> > If you file pull request(s) for the changes you've made in your
> > copy of
> > jamais-vu, I can take a look at merging them.
> > 
> 
> Happy to do so...
> Will merge your changes into my fork first then.
> 
> Kind regards,


Re: GCC Buildbot Update

2017-12-15 Thread Markus Trippelsdorf
On 2017.12.15 at 10:21 +0100, Paulo Matos wrote:
> 
> 
> On 15/12/17 08:42, Markus Trippelsdorf wrote:
> > 
> > I don't think this is good news at all. 
> > 
> 
> As I pointed out in a reply to Chris, I haven't sought permission but I
> am pretty sure something similar runs in the CF machines from other
> projects.
> 
> The downside is that if we can't use the CF, I have no extra machines to
> run the buildbot on.
> 
> > Once a buildbot runs on a CF machine it immediately becomes impossible
> > to do any meaningful measurement on that machine. That is mainly because
> > of the random I/O (untar, rm -fr, etc.) of the bot. As a result variance
> > goes through the roof and all measurements drown in noise.
> > 
> > So it would be good if there was a strict separation of machines used
> > for bots and machines used by humans. In other words bots should only
> > run on dedicated machines.
> > 
> 
> I understand your concern though. Do you know who this issue could be
> raised with? FSF?

I think the best place would be the CF user mailing list
.
(All admins and users should be subscribed.)

-- 
Markus


Re: GCC Buildbot Update

2017-12-15 Thread Paulo Matos


On 15/12/17 10:21, Christophe Lyon wrote:
> And the patch was committed last night (r255659), so maybe your builds now 
> work?
> 

Forgot to mention that. Yes, it built!
https://gcc-buildbot.linki.tools/#/builders/5

-- 
Paulo Matos


Re: GCC Buildbot Update

2017-12-15 Thread Paulo Matos


On 15/12/17 08:42, Markus Trippelsdorf wrote:
> 
> I don't think this is good news at all. 
> 

As I pointed out in a reply to Chris, I haven't sought permission but I
am pretty sure something similar runs in the CF machines from other
projects.

The downside is that if we can't use the CF, I have no extra machines to
run the buildbot on.

> Once a buildbot runs on a CF machine it immediately becomes impossible
> to do any meaningful measurement on that machine. That is mainly because
> of the random I/O (untar, rm -fr, etc.) of the bot. As a result variance
> goes through the roof and all measurements drown in noise.
> 
> So it would be good if there was a strict separation of machines used
> for bots and machines used by humans. In other words bots should only
> run on dedicated machines.
> 

I understand your concern though. Do you know who this issue could be
raised with? FSF?

-- 
Paulo Matos


Re: GCC Buildbot Update

2017-12-15 Thread Christophe Lyon
On 15 December 2017 at 10:19, Paulo Matos  wrote:
>
>
> On 14/12/17 21:32, Christophe Lyon wrote:
>> Great, I thought the CF machines were reserved for developers.
>> Good news you could add builders on them.
>>
>
> Oh. I have seen similar things happening on CF machines so I thought it
> was not a problem. I have never specifically asked for permission.
>
>>> pmatos@gcc115:~/gcc-8-20171203_BUILD$ as -march=armv8.1-a
>>> Assembler messages:
>>> Error: unknown architecture `armv8.1-a'
>>>
>>> Error: unrecognized option -march=armv8.1-a
>>>
>>> However, if I run a compiler build manually with just:
>>>
>>> $ configure --disable-multilib
>>> $ nice -n 19 make -j4 all
>>>
>>> This compiles just fine. So I am at the moment attempting to investigate
>>> what might cause the difference between what buildbot does and what I do
>>> through ssh.
>>>
>> I suspect you are hitting a bug introduced recently, and fixed by:
>> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00434.html
>>
>
> Wow, that's really useful. Thanks for letting me know.
>
And the patch was committed last night (r255659), so maybe your builds now work?

> --
> Paulo Matos


Re: GCC Buildbot Update

2017-12-15 Thread Paulo Matos


On 14/12/17 21:32, Christophe Lyon wrote:
> Great, I thought the CF machines were reserved for developers.
> Good news you could add builders on them.
> 

Oh. I have seen similar things happening on CF machines so I thought it
was not a problem. I have never specifically asked for permission.

>> pmatos@gcc115:~/gcc-8-20171203_BUILD$ as -march=armv8.1-a
>> Assembler messages:
>> Error: unknown architecture `armv8.1-a'
>>
>> Error: unrecognized option -march=armv8.1-a
>>
>> However, if I run a compiler build manually with just:
>>
>> $ configure --disable-multilib
>> $ nice -n 19 make -j4 all
>>
>> This compiles just fine. So I am at the moment attempting to investigate
>> what might cause the difference between what buildbot does and what I do
>> through ssh.
>>
> I suspect you are hitting a bug introduced recently, and fixed by:
> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00434.html
> 

Wow, that's really useful. Thanks for letting me know.

-- 
Paulo Matos


Re: GCC Buildbot Update

2017-12-15 Thread Paulo Matos


On 14/12/17 12:39, David Malcolm wrote:
> 
> Looking at some of the red blobs in e.g. the grid view there seem to be
> a few failures in the initial "update gcc trunk repo" step of the form:
> 
> svn: Working copy '.' locked
> svn: run 'svn cleanup' to remove locks (type 'svn help cleanup' for
> details)
> 

Yes, that's a big annoyance and a reason I have thought about moving to
the git mirror; however, that would probably bring other issues, so I am
holding off. I need to add a reporter so that if it fails I am notified
by email and mobile phone.

This happens when there's a timeout from the server _during_ a
checkout/update (the svn repo unfortunately times out way too often). I
thought about doing an svn cleanup before each checkout, but read that
it's not good practice. If you have any suggestions on this, please let
me know.
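
One option would be to run the cleanup only when an update actually hits
a lock, e.g. this sketch (the error-string match is a guess based on the
message above):

  import subprocess

  def update_checkout(wc_dir):
      """Run 'svn update'; on a lock error, 'svn cleanup' and retry once."""
      proc = subprocess.run(['svn', 'update'], cwd=wc_dir,
                            stderr=subprocess.PIPE, universal_newlines=True)
      if proc.returncode != 0 and 'locked' in proc.stderr:
          subprocess.run(['svn', 'cleanup'], cwd=wc_dir, check=True)
          subprocess.run(['svn', 'update'], cwd=wc_dir, check=True)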

> https://gcc-lnt.linki.tools/#/builders/3/builds/388/steps/0/logs/stdio
> 

Apologies, https://gcc-lnt.linki.tools is currently incorrectly
forwarding you to https://gcc-buildbot.linki.tools. I meant to have it
return an error until I open that up.

> Is there a bug-tracking location for the buildbot?
> Presumably:
>   https://github.com/LinkiTools/gcc-buildbot/issues
> ?
> 

That's correct.

> I actually found a serious bug in jamais-vu yesterday - it got confused
> by multiple .sum lines for the same source line (e.g. from multiple
> "dg-" directives that all specify a particular line).  For example,
> when testing one of my patches, of the 3 tests reporting as
>   "c-c++-common/pr83059.c  -std=c++11  (test for warnings, line 7)"
> one of the 3 PASS results became a FAIL.  jv correctly reported that
> new FAILs had occurred, but wouldn't identify them, and mistakenly
> reported that new PASSes had occurred also.
> 
> I've fixed that now; to do so I've done some refactoring and added a
> testsuite.
>

Perfect, thank you very much for this work.

> It looks like you're capturing the textual output from "jv compare" and
> using the exit code.  Would you prefer to import "jv" as a python
> module and use some kind of API?  Or a different output format?
> 

Well, I am using a fork of it which I converted to Python3. Would you be
open to convert yours to Python3? The reason I am doing this is because
all other Python software I have and the buildbot use Python3.

I would also prefer to have some json format or something but when I
looked at it, the software was just printing to stdout and I didn't want
to spend too much time implementing it, so I thought parsing the output
was just easier.
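
For reference, the consuming side boils down to something like this
sketch (it assumes a jv executable on PATH, illustrative .sum paths, and
that a non-zero exit code signals differences):

  import subprocess

  result = subprocess.run(['jv', 'compare', 'old/gcc.sum', 'new/gcc.sum'],
                          stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                          universal_newlines=True)
  print(result.stdout)                # the human-readable report
  regressed = result.returncode != 0  # drives the build step's status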

> If you file pull request(s) for the changes you've made in your copy of
> jamais-vu, I can take a look at merging them.
>

Happy to do so...
Will merge your changes into my fork first then.

Kind regards,
-- 
Paulo Matos


Re: GCC Buildbot Update

2017-12-14 Thread Markus Trippelsdorf
On 2017.12.14 at 21:32 +0100, Christophe Lyon wrote:
> On 14 December 2017 at 09:56, Paulo Matos  wrote:
> > I got an email suggesting I add some aarch64 workers so I did:
> > 4 workers from CF (gcc113, gcc114, gcc115 and gcc116);
> >
> Great, I thought the CF machines were reserved for developers.
> Good news you could add builders on them.

I don't think this is good news at all. 

Once a buildbot runs on a CF machine it immediately becomes impossible
to do any meaningful measurement on that machine. That is mainly because
of the random I/O (untar, rm -fr, etc.) of the bot. As a result variance
goes through the roof and all measurements drown in noise.

So it would be good if there was a strict separation of machines used
for bots and machines used by humans. In other words bots should only
run on dedicated machines.

-- 
Markus


Re: GCC Buildbot Update

2017-12-14 Thread Christophe Lyon
On 14 December 2017 at 09:56, Paulo Matos  wrote:
> Hello,
>
> Apologies for the delay on the update. It was my plan to do an update on
> a monthly basis but it slipped by a couple of weeks.
>
Hi,

Thanks for the update!


> The current status is:
>
> *Workers:*
>
> - x86_64
>
> 2 workers from CF (gcc16 and gcc20) up and running;
> 1 worker from my farm (jupiter-F26) up and running;
>
> 2 broken CF workers (gcc75 and gcc76) - the machines work well, but all
> outgoing ports except the git port (9418, if I am not mistaken) are
> closed. This means that not only can we not svn co gcc, but we also
> can't connect a worker to the master through port 9918. I have
> contacted the CF admin, but the reply was that nothing can be done as
> they don't really own the machines. They seem to have relayed the
> request to the machine owners.
>
> - aarch64
>
> I got an email suggesting I add some aarch64 workers so I did:
> 4 workers from CF (gcc113, gcc114, gcc115 and gcc116);
>
Great, I thought the CF machines were reserved for developers.
Good news you could add builders on them.

> *Builds:*
>
> As before we have the full build and the incremental build. Both enabled
> for x86_64 and aarch64, except they are currently failing for aarch64
> (more on that later).
>
> The full build is triggered on Daily bump commit and the incremental
> build is triggered for each commit.
>
> The problem with this setup is that the incremental builder takes too
> long to run the tests. Around 1h30m on CF machines for x86_64.
>
> Segher Boessenkool sent me a patch to disable guality and prettyprinters
> which coupled with --disable-gomp at configure time was supposed to make
> things much faster. I have added this as the Fast builder, except this
> is failing during the test runs:
> unable to alloc 389376 bytes
> /bin/bash: line 21: 32472 Aborted `if [ -f
> ${srcdir}/../dejagnu/runtest ] ; then echo ${srcdir}/../dejagnu/runtest
> ; else echo runtest; fi` --tool gcc
> /bin/bash: fork: Cannot allocate memory
> make[3]: [check-parallel-gcc] Error 254 (ignored)
> make[3]: execvp: /bin/bash: Cannot allocate memory
> make[3]: [check-parallel-gcc_1] Error 127 (ignored)
> make[3]: execvp: /bin/bash: Cannot allocate memory
> make[3]: [check-parallel-gcc_1] Error 127 (ignored)
> make[3]: execvp: /bin/bash: Cannot allocate memory
> make[3]: *** [check-parallel-gcc_1] Error 127
>
>
> However, something interesting is happening here since the munin
> interface for gcc16 doesn't show the machine running out of memory:
> https://cfarm.tetaneutral.net/munin/gccfarm/gcc16/memory.html
> (something confirmed by the cf admins)
>
> The aarch64 build is failing as mentioned earlier. If you check the logs:
> https://gcc-buildbot.linki.tools/#/builders/5/builds/10
> the problem seems to be the assembler issuing:
> Assembler messages:
> Error: unknown architecture `armv8.1-a'
> Error: unrecognized option -march=armv8.1-a
>
>
> If I go to the machines and check the versions I get:
> pmatos@gcc115:~/gcc-8-20171203_BUILD$ as --version
> GNU assembler (GNU Binutils for Ubuntu) 2.24
> Copyright 2013 Free Software Foundation, Inc.
> This program is free software; you may redistribute it under the terms of
> the GNU General Public License version 3 or later.
> This program has absolutely no warranty.
> This assembler was configured for a target of `aarch64-linux-gnu'.
>
> pmatos@gcc115:~/gcc-8-20171203_BUILD$ gcc --version
> gcc (Ubuntu/Linaro 4.8.4-2ubuntu1~14.04.3) 4.8.4
> Copyright (C) 2013 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> pmatos@gcc115:~/gcc-8-20171203_BUILD$ as -march=armv8.1-a
> Assembler messages:
> Error: unknown architecture `armv8.1-a'
>
> Error: unrecognized option -march=armv8.1-a
>
> However, if I run a compiler build manually with just:
>
> $ configure --disable-multilib
> $ nice -n 19 make -j4 all
>
> This compiles just fine. So I am at the moment attempting to investigate
> what might cause the difference between what buildbot does and what I do
> through ssh.
>
I suspect you are hitting a bug introduced recently, and fixed by:
https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00434.html

> *Reporters:*
>
> There is a single reporter, an IRC bot which is currently silent.
>
> *Regression analysis:*
>
> This is one of the most important issues to tackle and I have a solution
> in a branch regression-testing :
> https://github.com/LinkiTools/gcc-buildbot/tree/regression-testing
>
> using jamais-vu from David Malcolm to analyze the regressions.
> It needs some more testing and I should be able to get it working still
> this year.
>
Great

> *LNT:*
>
> I had mentioned that I wanted to set up an interface which would allow
> easy visibility of test failures, time taken to build/test, etc.
> Initially I thought a stack of influx+grafana would be a good idea, but
> was pointed to LNT, as presented by James Greenhalgh at the GNU Cauldron.

Re: GCC Buildbot Update

2017-12-14 Thread David Malcolm
On Thu, 2017-12-14 at 09:56 +0100, Paulo Matos wrote:
> Hello,
> 
> Apologies for the delay on the update. It was my plan to do an update
> on
> a monthly basis but it slipped by a couple of weeks.

Thanks for working on this.

> The current status is:
> 
> *Workers:*

[...snip...]

> *Builds:*

[...snip...]

Looking at some of the red blobs in e.g. the grid view there seem to be
a few failures in the initial "update gcc trunk repo" step of the form:

svn: Working copy '.' locked
svn: run 'svn cleanup' to remove locks (type 'svn help cleanup' for
details)

https://gcc-lnt.linki.tools/#/builders/3/builds/388/steps/0/logs/stdio

Is there a bug-tracking location for the buildbot?
Presumably:
  https://github.com/LinkiTools/gcc-buildbot/issues
?

> *Reporters:*
> 
> There is a single reporter, an IRC bot which is currently silent.
> 
> *Regression analysis:*
> 
> This is one of the most important issues to tackle and I have a
> solution
> in a branch regression-testing :
> https://github.com/LinkiTools/gcc-buildbot/tree/regression-testing
> 
> using jamais-vu from David Malcolm to analyze the regressions.
> It needs some more testing and I should be able to get it working
> still
> this year.

I actually found a serious bug in jamais-vu yesterday - it got confused
by multiple .sum lines for the same source line (e.g. from multiple
"dg-" directives that all specify a particular line).  For example,
when testing one of my patches, of the 3 tests reporting as
  "c-c++-common/pr83059.c  -std=c++11  (test for warnings, line 7)"
one of the 3 PASS results became a FAIL.  jv correctly reported that
new FAILs had occurred, but wouldn't identify them, and mistakenly
reported that new PASSes had occurred also.

I've fixed that now; to do so I've done some refactoring and added a
testsuite.

It looks like you're capturing the textual output from "jv compare" and
using the exit code.  Would you prefer to import "jv" as a python
module and use some kind of API?  Or a different output format?

If you file pull request(s) for the changes you've made in your copy of
jamais-vu, I can take a look at merging them.

[...]

> I hope to send another update in about a months time.
> 
> Kind regards,

Thanks again for your work on this
Dave


GCC Buildbot Update

2017-12-14 Thread Paulo Matos
Hello,

Apologies for the delay on the update. It was my plan to do an update on
a monthly basis but it slipped by a couple of weeks.

The current status is:

*Workers:*

- x86_64

2 workers from CF (gcc16 and gcc20) up and running;
1 worker from my farm (jupiter-F26) up and running;

2 broken CF workers (gcc75 and gcc76) - the machines work well, but all
outgoing ports except the git port (9418, if I am not mistaken) are
closed. This means that not only can we not svn co gcc, but we also
can't connect a worker to the master through port 9918. I have contacted
the CF admin, but the reply was that nothing can be done as they don't
really own the machines. They seem to have relayed the request to the
machine owners.

- aarch64

I got an email suggesting I add some aarch64 workers so I did:
4 workers from CF (gcc113, gcc114, gcc115 and gcc116);

*Builds:*

As before we have the full build and the incremental build. Both enabled
for x86_64 and aarch64, except they are currently failing for aarch64
(more on that later).

The full build is triggered on Daily bump commit and the incremental
build is triggered for each commit.
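
In buildbot terms the triggering amounts to two schedulers, roughly like
this sketch (scheduler and builder names are illustrative, not the
actual gcc-buildbot configuration):

  from buildbot.plugins import schedulers, util

  c = BuildmasterConfig = {}  # as in a buildbot master.cfg

  def is_daily_bump(change):
      # GCC's automated datestamp commit carries this message.
      return change.comments.startswith('Daily bump.')

  c['schedulers'] = [
      schedulers.SingleBranchScheduler(
          name='full', builderNames=['full-build'],
          change_filter=util.ChangeFilter(filter_fn=is_daily_bump)),
      schedulers.SingleBranchScheduler(
          name='incremental', builderNames=['incremental-build'],
          change_filter=util.ChangeFilter(branch='trunk')),
  ]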

The problem with this setup is that the incremental builder takes too
long to run the tests. Around 1h30m on CF machines for x86_64.

Segher Boessenkool sent me a patch to disable guality and prettyprinters
which coupled with --disable-gomp at configure time was supposed to make
things much faster. I have added this as the Fast builder, except this
is failing during the test runs:
unable to alloc 389376 bytes
/bin/bash: line 21: 32472 Aborted `if [ -f
${srcdir}/../dejagnu/runtest ] ; then echo ${srcdir}/../dejagnu/runtest
; else echo runtest; fi` --tool gcc
/bin/bash: fork: Cannot allocate memory
make[3]: [check-parallel-gcc] Error 254 (ignored)
make[3]: execvp: /bin/bash: Cannot allocate memory
make[3]: [check-parallel-gcc_1] Error 127 (ignored)
make[3]: execvp: /bin/bash: Cannot allocate memory
make[3]: [check-parallel-gcc_1] Error 127 (ignored)
make[3]: execvp: /bin/bash: Cannot allocate memory
make[3]: *** [check-parallel-gcc_1] Error 127


However, something interesting is happening here since the munin
interface for gcc16 doesn't show the machine running out of memory:
https://cfarm.tetaneutral.net/munin/gccfarm/gcc16/memory.html
(something confirmed by the cf admins)

The aarch64 build is failing as mentioned earlier. If you check the logs:
https://gcc-buildbot.linki.tools/#/builders/5/builds/10
the problem seems to be the assembler issuing:
Assembler messages:
Error: unknown architecture `armv8.1-a'
Error: unrecognized option -march=armv8.1-a


If I go to the machines and check the versions I get:
pmatos@gcc115:~/gcc-8-20171203_BUILD$ as --version
GNU assembler (GNU Binutils for Ubuntu) 2.24
Copyright 2013 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or later.
This program has absolutely no warranty.
This assembler was configured for a target of `aarch64-linux-gnu'.

pmatos@gcc115:~/gcc-8-20171203_BUILD$ gcc --version
gcc (Ubuntu/Linaro 4.8.4-2ubuntu1~14.04.3) 4.8.4
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

pmatos@gcc115:~/gcc-8-20171203_BUILD$ as -march=armv8.1-a
Assembler messages:
Error: unknown architecture `armv8.1-a'

Error: unrecognized option -march=armv8.1-a

However, if I run a compiler build manually with just:

$ configure --disable-multilib
$ nice -n 19 make -j4 all

This compiles just fine. So I am at the moment attempting to investigate
what might cause the difference between what buildbot does and what I do
through ssh.

*Reporters:*

There is a single reporter, an IRC bot which is currently silent.

*Regression analysis:*

This is one of the most important issues to tackle and I have a solution
in a branch regression-testing :
https://github.com/LinkiTools/gcc-buildbot/tree/regression-testing

using jamais-vu from David Malcolm to analyze the regressions.
It needs some more testing and I should be able to get it working still
this year.

*LNT:*

I had mentioned that I wanted to set up an interface which would allow
easy visibility of test failures, time taken to build/test, etc.
Initially I thought a stack of influx+grafana would be a good idea, but
was pointed to LNT, as presented by James Greenhalgh at the GNU Cauldron.
I have set up LNT (soon to be available under
https://gcc-lnt.linki.tools) and contacted James to learn more about his
setup. As it turns out, James is using it only for benchmarking results,
and out of the box it only seems to support the LLVM testing
infrastructure, so getting GCC results in there might take a bit more
scripting and plumbing.

I will probably take the same route and set it up first for the
benchmarking results and then try to get the gcc te

Re: GCC Buildbot Update - Definition of regression

2017-10-13 Thread David Malcolm
On Wed, 2017-10-11 at 16:17 +0200, Marc Glisse wrote:
> On Wed, 11 Oct 2017, David Malcolm wrote:
> 
> > On Wed, 2017-10-11 at 11:18 +0200, Paulo Matos wrote:
> > > 
> > > On 11/10/17 11:15, Christophe Lyon wrote:
> > > > 
> > > > You can have a look at
> > > > https://git.linaro.org/toolchain/gcc-compare-results.git/
> > > > where compare_tests is a patched version of the contrib/
> > > > script,
> > > > it calls the main perl script (which is not the prettiest thing
> > > > :-)
> > > > 
> > > 
> > > Thanks, that's useful. I will take a look.
> > 
> > You may also want to look at this script I wrote:
> > 
> >  https://github.com/davidmalcolm/jamais-vu
> > 
> > (it has Python classes for working with DejaGnu output)
> 
> By the way, David, how do you handle comparisons for the jit
> testsuite? jv 
> gives
> 
> Tests that went away in build/gcc/testsuite/jit/jit.sum: 81
> ---
> 
>   PASS:  t
>   PASS:  test-
>   PASS:  test-arith-overflow.c
>   PASS:  test-arith-overflow.c.exe iteration 1 of 5: verify_uint_over
>   PASS:  test-arith-overflow.c.exe iteration 2 of 5: verify_uint_o
>   PASS:  test-arith-overflow.c.exe iteration 3 of 5: verify
> [...]
> 
> Tests appeared in build/gcc/testsuite/jit/jit.sum: 78
> -
> 
>   PASS:  test-arith-overflow.c.exe iteration 1
>   PASS:  test-arith-overflow.c.exe iteration 2 of
>   PASS:  test-arith-overflow.c.exe iteration 4 of 5: verify_u
>   PASS:  test-combination.
>   PASS:  test-combination.c.exe it
> [...]
> 
> The issue is more likely in the testsuite, but I assume you have a 
> workflow that allows working around the issue?

I believe the issue here is PR jit/69435 ("Truncated lines in
jit.log").
I suspect that the attachment in comment #2 there ought to fix it
(sorry that this issue stalled; in the meantime I've been simply
verifying the absence of FAILs and checking the number of PASSes in
jit.sum).

Dave


Re: GCC Buildbot Update - Definition of regression

2017-10-11 Thread Hans-Peter Nilsson
On Tue, 10 Oct 2017, Paulo Matos wrote:

> This is a suggestion. I am keen to have corrections from people who use
> this on a daily basis and/or have a better understanding of each status.

Your not mentioning them (oddly, I don't see anyone else mentioning them
either) makes me think you've not looked there, so allow me to point out:
consider re-using Geoff Keating's regression tester scripts.
They're all in your nearest gcc checkout, in contrib/regression.
I suggest using whatever definition those scripts define.
They've worked for my regression testing (though my local
automated tester is not active at the moment).  Just remember to
always use the option --add-passes-despite-regression or else
btest-gcc.sh requires a clean bill before adding new PASSes to
the list of PASSing tests considered for regression.  (A clean
bill happens too rarely for non-primary targets, for long times,
for reasons beyond port maintainer powers.)

Also, you may have to fight release maintainers for the
"regression" definition.  Previous arguments have been along the
line of "it's not a regression if there hasn't been a release
with the test for that functionality passing".

brgds, H-P


Re: GCC Buildbot Update - Definition of regression

2017-10-11 Thread Joseph Myers
On Wed, 11 Oct 2017, Martin Sebor wrote:

> I don't have a strong opinion on the definition of a Regression
> in this context but I would very much like to see status changes
> highlighted in the test results to indicate that something that

There are lots of things that are useful *if* you have someone actively 
reviewing them for every test run (maybe several times a day) and alerting 
people / filing bugs in Bugzilla if there are problems.  Some of those 
things, however, are likely to have too many false positives for a display 
people can quickly look at to see if the build is red or green, or for 
automatically telling people their patch broke something.

If we can clean up results for each system the bot runs tests on - 
XFAILing and filing bugs in Bugzilla for failures where there isn't a 
reasonably simple and obvious fix - we can make green mean "no FAILs or 
ERRORs" (remembering the possibility that with very broken testing, 
sometimes an ERROR might only be in the .log not the .sum).  Other 
differences (such as PASS -> UNSUPPORTED) can then be reviewed manually by 
someone who takes responsibility for doing so, resulting in bugs being 
filed if appropriate, without affecting the basic red/green status.

(Variants such as green meaning "no FAILs or ERRORs, except for failing 
guality tests where there should be no regressions" are possible as well, 
for cases like that where PASS/FAIL status depends on non-GCC components 
and meaningfully selective XFAILing is hard.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: GCC Buildbot Update - Definition of regression

2017-10-11 Thread Andreas Schwab
On Okt 10 2017, Joseph Myers  wrote:

> Anything else -> FAIL and new FAILing tests aren't regressions at the 
> individual test level, but may be treated as such at the whole testsuite 
> level.

An ICE FAIL is a regression, but this is always a new test.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: GCC Buildbot Update - Definition of regression

2017-10-11 Thread Martin Sebor

PASS -> ANY ; Test moves away from PASS


No, only a regression if the destination result is FAIL (if it's
UNRESOLVED then there might be a separate regression - execution test
becoming UNRESOLVED should be accompanied by compilation becoming FAIL).
If it's XFAIL, it might formally be a regression, but one already being
tracked in another way (presumably Bugzilla) which should not turn the bot
red.  If it's XPASS, that simply means XFAILing conditions slightly wider
than necessary in order to mark failure in another configuration as
expected.

My suggestion is:

PASS -> FAIL is an unambiguous regression.

Anything else -> FAIL and new FAILing tests aren't regressions at the
individual test level, but may be treated as such at the whole testsuite
level.


I don't have a strong opinion on the definition of a Regression
in this context but I would very much like to see status changes
highlighted in the test results to indicate that something that
worked before no longer works as well, to help us spot the kinds
of problems I've run into and had trouble with.  (Showing the
SVN revision number along with each transition would be great.)
Here are a couple of examples.

A recent change of mine caused a test in the target_supports.exp
file to fail to detect attribute ifunc support.  That in turn
prevented regression tests for the attribute from being compiled
(changed them from PASS to UNSUPPORTED) which ultimately masked
a bug my change had introduced.

My script that looks for regressions in my own test results would
normally catch this before I commit such a change.  Unfortunately,
the script ignores results with the UNSUPPORTED status, so this
bug slipped in unnoticed.

Regardless of whether or not these types of errors are considered
Regressions, highlighting them perhaps in different colors would
be helpful.


Any transition where the destination result is not FAIL is not a
regression.

ERRORs in the .sum or .log files should be watched out for as well,
however, as sometimes they may indicate broken Tcl syntax in the
testsuite, which may cause many tests not to be run.


Yes, please.  I had a problem happen with a test with a bad DejaGnu
directive.  The test failed in an non-obvious way (I think it caused
an ERROR in the log) which caused a small number of tests that ran
after it to fail.  Because of parallel make (I run tests with make
-j96) the failing tests changed from one run of the test suite to
the next and the whole problem ended up being quite hard to debug.
(The ultimate root cause was a stray backslash in a dj-warning
directive introduced by copying and pasting between an Emacs session
in one terminal and a via session in another.  The backslash was in
column 80 and so virtually impossible to see.)

Martin


Re: GCC Buildbot Update - Definition of regression

2017-10-11 Thread Marc Glisse

On Wed, 11 Oct 2017, David Malcolm wrote:


On Wed, 2017-10-11 at 11:18 +0200, Paulo Matos wrote:


On 11/10/17 11:15, Christophe Lyon wrote:


You can have a look at
https://git.linaro.org/toolchain/gcc-compare-results.git/
where compare_tests is a patched version of the contrib/ script,
it calls the main perl script (which is not the prettiest thing :-)



Thanks, that's useful. I will take a look.


You may also want to look at this script I wrote:

 https://github.com/davidmalcolm/jamais-vu

(it has Python classes for working with DejaGnu output)


By the way, David, how do you handle comparisons for the jit testsuite? jv 
gives


Tests that went away in build/gcc/testsuite/jit/jit.sum: 81
---

 PASS:  t
 PASS:  test-
 PASS:  test-arith-overflow.c
 PASS:  test-arith-overflow.c.exe iteration 1 of 5: verify_uint_over
 PASS:  test-arith-overflow.c.exe iteration 2 of 5: verify_uint_o
 PASS:  test-arith-overflow.c.exe iteration 3 of 5: verify
[...]

Tests appeared in build/gcc/testsuite/jit/jit.sum: 78
-

 PASS:  test-arith-overflow.c.exe iteration 1
 PASS:  test-arith-overflow.c.exe iteration 2 of
 PASS:  test-arith-overflow.c.exe iteration 4 of 5: verify_u
 PASS:  test-combination.
 PASS:  test-combination.c.exe it
[...]

The issue is more likely in the testsuite, but I assume you have a 
workflow that allows working around the issue?


--
Marc Glisse


Re: GCC Buildbot Update - Definition of regression

2017-10-11 Thread Joseph Myers
On Wed, 11 Oct 2017, Christophe Lyon wrote:

> * {PASS,UNSUPPORTED,UNTESTED,UNRESOLVED}-> XPASS

I don't think any of these should be considered regressions.  It's good if 
someone manually checks anything that's *consistently* XPASSing, to see if 
the XFAIL should be removed or restricted to narrower conditions, but if 
the result of a test has become any kind of pass, it cannot possibly be 
considered a regression.  (You might have a flaky test XFAILed because it 
passes or fails at random, though I think that random variation is more 
common for GDB than for GCC.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: GCC Buildbot Update - Definition of regression

2017-10-11 Thread Joseph Myers
On Wed, 11 Oct 2017, Paulo Matos wrote:

> On 10/10/17 23:25, Joseph Myers wrote:
> > On Tue, 10 Oct 2017, Paulo Matos wrote:
> > 
> >> new test -> FAIL; New test starts as fail
> > 
> > No, that's not a regression, but you might want to treat it as one (in the 
> > sense that it's a regression at the higher level of "testsuite run should 
> > have no unexpected failures", even if the test in question would have 
> > failed all along if added earlier and so the underlying compiler bug, if 
> > any, is not a regression).  It should have human attention to classify it 
> > and either fix the test or XFAIL it (with issue filed in Bugzilla if a 
> > bug), but it's not a regression.  (Exception: where a test failing results 
> > in its name changing, e.g. through adding "(internal compiler error)".)
> > 
> 
> When someone adds a new test to the testsuite, isn't it supposed to not
> FAIL? If it does FAIL, shouldn't this be considered a regression?

Only a regression at the whole-testsuite level (in that "no FAILs" is the 
desired state).  Not a regression in the sense of a regression bug in GCC 
that might be relevant for release management (something user-visible that 
worked in a previous GCC version but no longer works).  And if e.g. 
someone added a dg-require-effective-target (for example) line to a 
testcase, so incrementing all the line numbers in that test, every PASS / 
FAIL assertion in that test will have its line number increase by 1, so 
being renamed, so resulting in spurious detection of a regression if you 
consider new FAILs as regressions (even at the whole-testsuite level, an 
increased line number on an existing FAIL is not meaningfully a 
regression).

> For this reason all of these issues need to be taken care of straight away

Well, I think it *does* make sense to do sufficient analysis on existing 
FAILs to decide if they are testsuite issues or compiler bugs, fix if they 
are testsuite issues and XFAIL with reference to a bug in Bugzilla if 
compiler bugs.  That is, try to get to the point where no-FAILs is the 
normal expected testsuite state and it's Bugzilla, not 
expected-FAILs-not-marked-as-XFAIL, that is used to track regressions and 
other bugs.

> By not being unique, you mean between languages?

Yes (e.g. c-c++-common tests in both gcc and g++ tests might have the same 
name in both .sum files, but should still be counted as different tests).

> I assume that two gcc.sum from different builds will always refer to the
> same test/configuration when referring to (for example):
> PASS: gcc.c-torture/compile/2105-1.c   -O1  (test for excess errors)

The problem is when e.g. multiple diagnostics are being tested for on the 
same line but the "test name" field in the dg-* directive is an empty 
string for all of them.  One possible approach is to automatically (in 
your regression checking scripts) append a serial number to the first, 
second, third etc. cases of any given repeated test name in a .sum file.  
Or you could count such duplicates as being errors that automatically 
result in red test results, and get fixes for them into GCC as soon as 
possible.
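
The serial-number approach is simple to implement; a sketch (the suffix
format is arbitrary):

  from collections import defaultdict

  def uniquify(test_names):
      """Append an occurrence counter to repeated .sum test names."""
      seen = defaultdict(int)
      out = []
      for name in test_names:
          seen[name] += 1
          out.append(name if seen[name] == 1
                     else '%s (occurrence %d)' % (name, seen[name]))
      return out

  # Three identical entries become three distinct keys:
  print(uniquify(['gcc.dg/Werror-13.c  (test for errors, line )'] * 3))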

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: GCC Buildbot Update - Definition of regression

2017-10-11 Thread David Malcolm
On Wed, 2017-10-11 at 11:18 +0200, Paulo Matos wrote:
> 
> On 11/10/17 11:15, Christophe Lyon wrote:
> > 
> > You can have a look at
> > https://git.linaro.org/toolchain/gcc-compare-results.git/
> > where compare_tests is a patched version of the contrib/ script,
> > it calls the main perl script (which is not the prettiest thing :-)
> > 
> 
> Thanks, that's useful. I will take a look.

You may also want to look at this script I wrote:

  https://github.com/davidmalcolm/jamais-vu

(it has Python classes for working with DejaGnu output)

Dave


Re: GCC Buildbot Update - Definition of regression

2017-10-11 Thread Jonathan Wakely
On 11 October 2017 at 07:34, Paulo Matos wrote:
> When someone adds a new test to the testsuite, isn't it supposed to not
> FAIL?

Yes, but sometimes it FAILs because the test is using a new feature
that only works on some targets, and the new test was missing the
right directives to make it UNSUPPORTED on other targets.

> If it does FAIL, shouldn't this be considered a regression?

No, it's not a regression, because it's not something that used to
work and now fails.

Maybe it should still be flagged as red, but it's not strictly a
regression. I would call it a "new failure" rather than regression.


Re: GCC Buildbot Update - Definition of regression

2017-10-11 Thread Paulo Matos


On 11/10/17 11:15, Christophe Lyon wrote:
> 
> You can have a look at
> https://git.linaro.org/toolchain/gcc-compare-results.git/
> where compare_tests is a patched version of the contrib/ script,
> it calls the main perl script (which is not the prettiest thing :-)
> 

Thanks, that's useful. I will take a look.

-- 
Paulo Matos


Re: GCC Buildbot Update - Definition of regression

2017-10-11 Thread Christophe Lyon
On 11 October 2017 at 11:03, Paulo Matos  wrote:
>
>
> On 11/10/17 10:35, Christophe Lyon wrote:
>>
>> FWIW, we consider regressions:
>> * any->FAIL because we don't want such a regression at the whole testsuite level
>> * any->UNRESOLVED for the same reason
>> * {PASS,UNSUPPORTED,UNTESTED,UNRESOLVED}-> XPASS
>> * new XPASS
>> * XFAIL disappears (may mean that a testcase was removed, worth a manual check)
>> * ERRORS
>>
>
> That's certainly stricter than what was proposed by Joseph. I will
> run a few tests on historical data to see what I get using both approaches.
>
>>
>>
>>>> ERRORs in the .sum or .log files should be watched out for as well,
>>>> however, as sometimes they may indicate broken Tcl syntax in the
>>>> testsuite, which may cause many tests not to be run.
>>>>
>>>> Note that the test names that come after PASS:, FAIL: etc. aren't unique
>>>> between different .sum files, so you need to associate tests with a tuple
>>>> (.sum file, test name) (and even then, sometimes multiple tests in a .sum
>>>> file have the same name, but that's a testsuite bug).  If you're using
>>>> --target_board options that run tests for more than one multilib in the
>>>> same testsuite run, add the multilib to that tuple as well.
>>>>
>>>
>>> Thanks for all the comments. Sounds sensible.
>>> By not being unique, you mean between languages?
>> Yes, but not only as Joseph mentioned above.
>>
>> You have the obvious example of c-c++-common/*san tests, which are
>> common to gcc and g++.
>>
>>> I assume that two gcc.sum from different builds will always refer to the
>>> same test/configuration when referring to (for example):
>>> PASS: gcc.c-torture/compile/2105-1.c   -O1  (test for excess errors)
>>>
>>> In this case, I assume that "gcc.c-torture/compile/2105-1.c   -O1
>>> (test for excess errors)" will always be referring to the same thing.
>>>
>> In gcc.sum, I can see 4 occurrences of
>> PASS: gcc.dg/Werror-13.c  (test for errors, line )
>>
>> Actually, there are quite a few others like that
>>
>
> That actually surprised me.
>
> I also see:
> PASS: gcc.dg/Werror-13.c  (test for errors, line )
> PASS: gcc.dg/Werror-13.c  (test for errors, line )
> PASS: gcc.dg/Werror-13.c  (test for errors, line )
> PASS: gcc.dg/Werror-13.c  (test for errors, line )
>
> among others like it. Looks like a line number is missing?
>
> In any case, it feels like the code I have to track this down needs to
> be improved.
>
We had to derive our scripts from the ones in contrib/ because those
failed to handle some cases (e.g. when the same test reports
both PASS and FAIL; yes, it does happen).
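
A comparison script has to at least detect that case; for illustration
(a Python sketch, not our actual code):

from collections import defaultdict

def conflicting_results(results):
    """Return the test names that appear with more than one status
    within a single run of (status, name) pairs."""
    statuses = defaultdict(set)
    for status, name in results:
        statuses[name].add(status)
    return {name: s for name, s in statuses.items() if len(s) > 1}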

You can have a look at
https://git.linaro.org/toolchain/gcc-compare-results.git/
where compare_tests is a patched version of the contrib/ script,
it calls the main perl script (which is not the prettiest thing :-)

Christophe

> --
> Paulo Matos


Re: GCC Buildbot Update - Definition of regression

2017-10-11 Thread Paulo Matos


On 11/10/17 10:35, Christophe Lyon wrote:
> 
> FWIW, we consider regressions:
> * any->FAIL because we don't want such a regression at the whole testsuite level
> * any->UNRESOLVED for the same reason
> * {PASS,UNSUPPORTED,UNTESTED,UNRESOLVED}-> XPASS
> * new XPASS
> * XFAIL disappears (may mean that a testcase was removed, worth a manual check)
> * ERRORS
> 

That's certainly stricter than what was proposed by Joseph. I will
run a few tests on historical data to see what I get using both approaches.

> 
> 
>>> ERRORs in the .sum or .log files should be watched out for as well,
>>> however, as sometimes they may indicate broken Tcl syntax in the
>>> testsuite, which may cause many tests not to be run.
>>>
>>> Note that the test names that come after PASS:, FAIL: etc. aren't unique
>>> between different .sum files, so you need to associate tests with a tuple
>>> (.sum file, test name) (and even then, sometimes multiple tests in a .sum
>>> file have the same name, but that's a testsuite bug).  If you're using
>>> --target_board options that run tests for more than one multilib in the
>>> same testsuite run, add the multilib to that tuple as well.
>>>
>>
>> Thanks for all the comments. Sounds sensible.
>> By not being unique, you mean between languages?
> Yes, but not only as Joseph mentioned above.
> 
> You have the obvious example of c-c++-common/*san tests, which are
> common to gcc and g++.
> 
>> I assume that two gcc.sum from different builds will always refer to the
>> same test/configuration when referring to (for example):
>> PASS: gcc.c-torture/compile/2105-1.c   -O1  (test for excess errors)
>>
>> In this case, I assume that "gcc.c-torture/compile/2105-1.c   -O1
>> (test for excess errors)" will always be referring to the same thing.
>>
> In gcc.sum, I can see 4 occurrences of
> PASS: gcc.dg/Werror-13.c  (test for errors, line )
> 
> Actually, there are quite a few others like that
> 

That actually surprised me.

I also see:
PASS: gcc.dg/Werror-13.c  (test for errors, line )
PASS: gcc.dg/Werror-13.c  (test for errors, line )
PASS: gcc.dg/Werror-13.c  (test for errors, line )
PASS: gcc.dg/Werror-13.c  (test for errors, line )

among others like it. Looks like a line number is missing?

In any case, it feels like the code I have to track this down needs to
be improved.

-- 
Paulo Matos


Re: GCC Buildbot Update - Definition of regression

2017-10-11 Thread Christophe Lyon
On 11 October 2017 at 08:34, Paulo Matos  wrote:
>
>
> On 10/10/17 23:25, Joseph Myers wrote:
>> On Tue, 10 Oct 2017, Paulo Matos wrote:
>>
>>> new test -> FAIL; New test starts as fail
>>
>> No, that's not a regression, but you might want to treat it as one (in the
>> sense that it's a regression at the higher level of "testsuite run should
>> have no unexpected failures", even if the test in question would have
>> failed all along if added earlier and so the underlying compiler bug, if
>> any, is not a regression).  It should have human attention to classify it
>> and either fix the test or XFAIL it (with issue filed in Bugzilla if a
>> bug), but it's not a regression.  (Exception: where a test failing results
>> in its name changing, e.g. through adding "(internal compiler error)".)
>>
>
> When someone adds a new test to the testsuite, isn't it supposed to not
> FAIL? If it does FAIL, shouldn't this be considered a regression?
>
> Now, the danger is that, since regressions are comparisons with the
> previous run, something like this would happen:
>
> run1:
> ...
> FAIL: foo.c ; new test
> ...
>
> run1 fails because new test entered as a FAIL
>
> run2:
> ...
> FAIL: foo.c
> ...
>
> run2 succeeds because there are no changes.
>
> For this reason, all of these issues need to be taken care of straight
> away or they become part of the 'normal' status and no further failures
> are flagged... unless, of course, a more complex regression analysis is
> implemented.
>
Agreed.

> Also, when I say run1 fails or succeeds, that is just the term I use to
> display red/green in the buildbot interface for a given build, not
> necessarily what I expect the process to do.
>
>>
>> My suggestion is:
>>
>> PASS -> FAIL is an unambiguous regression.
>>
>> Anything else -> FAIL and new FAILing tests aren't regressions at the
>> individual test level, but may be treated as such at the whole testsuite
>> level.
>>
>> Any transition where the destination result is not FAIL is not a
>> regression.
>>

FWIW, we consider regressions:
* any->FAIL because we don't want such a regression at the whole testsuite level
* any->UNRESOLVED for the same reason
* {PASS,UNSUPPORTED,UNTESTED,UNRESOLVED}-> XPASS
* new XPASS
* XFAIL disappears (may mean that a testcase was removed, worth a manual check)
* ERRORS
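
As a predicate over one (old status, new status) pair, that amounts to
roughly the following (a Python sketch for brevity, since our actual
scripts are Perl; it assumes the caller only passes pairs that changed):

def is_regression(old, new):
    """old is None for a brand-new test; new is None for a vanished one."""
    if new in ("FAIL", "UNRESOLVED"):
        return True                    # any->FAIL, any->UNRESOLVED
    if new == "XPASS" and (old is None or
            old in ("PASS", "UNSUPPORTED", "UNTESTED", "UNRESOLVED")):
        return True                    # new XPASS, {...}->XPASS
    if old == "XFAIL" and new is None:
        return True                    # XFAIL disappeared: worth a manual check
    return False

ERRORs are grepped for separately, before the results are paired up.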



>> ERRORs in the .sum or .log files should be watched out for as well,
>> however, as sometimes they may indicate broken Tcl syntax in the
>> testsuite, which may cause many tests not to be run.
>>
>> Note that the test names that come after PASS:, FAIL: etc. aren't unique
>> between different .sum files, so you need to associate tests with a tuple
>> (.sum file, test name) (and even then, sometimes multiple tests in a .sum
>> file have the same name, but that's a testsuite bug).  If you're using
>> --target_board options that run tests for more than one multilib in the
>> same testsuite run, add the multilib to that tuple as well.
>>
>
> Thanks for all the comments. Sounds sensible.
> By not being unique, you mean between languages?
Yes, but not only as Joseph mentioned above.

You have the obvious example of c-c++-common/*san tests, which are
common to gcc and g++.

> I assume that two gcc.sum from different builds will always refer to the
> same test/configuration when referring to (for example):
> PASS: gcc.c-torture/compile/2105-1.c   -O1  (test for excess errors)
>
> In this case, I assume that "gcc.c-torture/compile/2105-1.c   -O1
> (test for excess errors)" will always be referring to the same thing.
>
In gcc.sum, I can see 4 occurrences of
PASS: gcc.dg/Werror-13.c  (test for errors, line )

Actually, there are quite a few others like that

Christophe

> --
> Paulo Matos


Re: GCC Buildbot Update - Definition of regression

2017-10-10 Thread Markus Trippelsdorf
On 2017.10.11 at 08:22 +0200, Paulo Matos wrote:
> 
> 
> On 11/10/17 06:17, Markus Trippelsdorf wrote:
> > On 2017.10.10 at 21:45 +0200, Paulo Matos wrote:
> >> Hi all,
> >>
> >> It's almost 3 weeks since I last posted on GCC Buildbot. Here's an update:
> >>
> >> * 3 x86_64 workers from CF are now installed;
> >> * There's one scheduler for trunk doing fresh builds for every Daily bump;
> >> * One scheduler doing incremental builds for each active branch;
> >> * An IRC bot which is currently silent;
> > 
> > Using -j8 for the bot on an 8/16 (core/thread) machine like gcc67 is not
> > acceptable, because it will render it unusable for everybody else.
> 
> I was going to correct you on that given what I read in
> https://gcc.gnu.org/wiki/CompileFarm#Usage
> 
> but it was my mistake. I assumed that for an N-thread machine, I could
> use N/2 processes but the guide explicitly says N-core, not N-thread.
> Therefore I should be using 4 processes for gcc67 (or 0 given what follows).
> 
> I will also fix the number of processes used by the other workers.

Thanks. And while you are at it, please set the niceness to 19.

> > Also gcc67 has a buggy Ryzen CPU that causes random gcc crashes. Not the
> > best setup for a regression tester...
> > 
> 
> Is that documented anywhere? I will remove this worker.

https://community.amd.com/thread/215773

-- 
Markus


Re: GCC Buildbot Update - Definition of regression

2017-10-10 Thread Paulo Matos


On 10/10/17 23:25, Joseph Myers wrote:
> On Tue, 10 Oct 2017, Paulo Matos wrote:
> 
>> new test -> FAIL; New test starts as fail
> 
> No, that's not a regression, but you might want to treat it as one (in the 
> sense that it's a regression at the higher level of "testsuite run should 
> have no unexpected failures", even if the test in question would have 
> failed all along if added earlier and so the underlying compiler bug, if 
> any, is not a regression).  It should have human attention to classify it 
> and either fix the test or XFAIL it (with issue filed in Bugzilla if a 
> bug), but it's not a regression.  (Exception: where a test failing results 
> in its name changing, e.g. through adding "(internal compiler error)".)
> 

When someone adds a new test to the testsuite, isn't it supposed to not
FAIL? If it does FAIL, shouldn't this be considered a regression?

Now, the danger is that, since regressions are comparisons with the
previous run, something like this would happen:

run1:
...
FAIL: foo.c ; new test
...

run1 fails because new test entered as a FAIL

run2:
...
FAIL: foo.c
...

run2 succeeds because there are no changes.

For this reason, all of these issues need to be taken care of straight
away or they become part of the 'normal' status and no further failures
are flagged... unless, of course, a more complex regression analysis is
implemented.

Also, when I say run1 fails or succeeds, that is just the term I use to
display red/green in the buildbot interface for a given build, not
necessarily what I expect the process to do.

> 
> My suggestion is:
> 
> PASS -> FAIL is an unambiguous regression.
> 
> Anything else -> FAIL and new FAILing tests aren't regressions at the 
> individual test level, but may be treated as such at the whole testsuite 
> level.
> 
> Any transition where the destination result is not FAIL is not a 
> regression.
> 
> ERRORs in the .sum or .log files should be watched out for as well, 
> however, as sometimes they may indicate broken Tcl syntax in the 
> testsuite, which may cause many tests not to be run.
> 
> Note that the test names that come after PASS:, FAIL: etc. aren't unique 
> between different .sum files, so you need to associate tests with a tuple 
> (.sum file, test name) (and even then, sometimes multiple tests in a .sum 
> file have the same name, but that's a testsuite bug).  If you're using 
> --target_board options that run tests for more than one multilib in the 
> same testsuite run, add the multilib to that tuple as well.
> 

Thanks for all the comments. Sounds sensible.
By not being unique, you mean between languages?
I assume that two gcc.sum from different builds will always refer to the
same test/configuration when referring to (for example):
PASS: gcc.c-torture/compile/2105-1.c   -O1  (test for excess errors)

In this case, I assume that "gcc.c-torture/compile/2105-1.c   -O1
(test for excess errors)" will always be referring to the same thing.

-- 
Paulo Matos


Re: GCC Buildbot Update - Definition of regression

2017-10-10 Thread Paulo Matos


On 11/10/17 06:17, Markus Trippelsdorf wrote:
> On 2017.10.10 at 21:45 +0200, Paulo Matos wrote:
>> Hi all,
>>
>> It's almost 3 weeks since I last posted on GCC Buildbot. Here's an update:
>>
>> * 3 x86_64 workers from CF are now installed;
>> * There's one scheduler for trunk doing fresh builds for every Daily bump;
>> * One scheduler doing incremental builds for each active branch;
>> * An IRC bot which is currently silent;
> 
> Using -j8 for the bot on an 8/16 (core/thread) machine like gcc67 is not
> acceptable, because it will render it unusable for everybody else.

I was going to correct you on that given what I read in
https://gcc.gnu.org/wiki/CompileFarm#Usage

but it was my mistake. I assumed that for an N-thread machine, I could
use N/2 processes but the guide explicitly says N-core, not N-thread.
Therefore I should be using 4 processes for gcc67 (or 0 given what follows).

I will also fix the number of processes used by the other workers.

> Also gcc67 has a buggy Ryzen CPU that causes random gcc crashes. Not the
> best setup for a regression tester...
> 

Is that documented anywhere? I will remove this worker.

Thanks,

-- 
Paulo Matos


Re: GCC Buildbot Update - Definition of regression

2017-10-10 Thread Markus Trippelsdorf
On 2017.10.10 at 21:45 +0200, Paulo Matos wrote:
> Hi all,
> 
> It's almost 3 weeks since I last posted on GCC Buildbot. Here's an update:
> 
> * 3 x86_64 workers from CF are now installed;
> * There's one scheduler for trunk doing fresh builds for every Daily bump;
> * One scheduler doing incremental builds for each active branch;
> * An IRC bot which is currently silent;

Using -j8 for the bot on an 8/16 (core/thread) machine like gcc67 is not
acceptable, because it will render it unusable for everybody else.
Also gcc67 has a buggy Ryzen CPU that causes random gcc crashes. Not the
best setup for a regression tester...

-- 
Markus


Re: GCC Buildbot Update - Definition of regression

2017-10-10 Thread Joseph Myers
On Tue, 10 Oct 2017, Paulo Matos wrote:

> ANY -> no test  ; Test disappears

No, that's not a regression.  Simply adding a line to a testcase will 
change the line number that appears in the PASS / FAIL line for an 
individual assertion therein.  Or the names will change when e.g. 
-std=c++2a becomes -std=c++20 and all the tests with a C++ standard 
version in them change their names.  Or if a bogus test is removed.

> ANY / XPASS -> XPASS; Test goes from any status other than XPASS to XPASS
> ANY / KPASS -> KPASS; Test goes from any status other than KPASS to KPASS

No, that's not a regression.  It's inevitable that XFAILing conditions may 
sometimes be broader than ideal, if it's not possible to describe the 
exact failure conditions to the testsuite, and so sometimes a test may 
reasonably XPASS.  Such tests *may* sometimes be candidates for a more 
precise XFAIL condition, but they aren't regressions.

> new test -> FAIL; New test starts as fail

No, that's not a regression, but you might want to treat it as one (in the 
sense that it's a regression at the higher level of "testsuite run should 
have no unexpected failures", even if the test in question would have 
failed all along if added earlier and so the underlying compiler bug, if 
any, is not a regression).  It should have human attention to classify it 
and either fix the test or XFAIL it (with issue filed in Bugzilla if a 
bug), but it's not a regression.  (Exception: where a test failing results 
in its name changing, e.g. through adding "(internal compiler error)".)

> PASS -> ANY ; Test moves away from PASS

No, only a regression if the destination result is FAIL (if it's 
UNRESOLVED then there might be a separate regression - execution test 
becoming UNRESOLVED should be accompanied by compilation becoming FAIL).  
If it's XFAIL, it might formally be a regression, but one already being 
tracked in another way (presumably Bugzilla) which should not turn the bot 
red.  If it's XPASS, that simply means XFAILing conditions slightly wider 
than necessary in order to mark failure in another configuration as 
expected.

My suggestion is:

PASS -> FAIL is an unambiguous regression.

Anything else -> FAIL and new FAILing tests aren't regressions at the 
individual test level, but may be treated as such at the whole testsuite 
level.

Any transition where the destination result is not FAIL is not a 
regression.
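
In other words, roughly (a sketch; old is None when the test is absent
from the old run):

def classify(old, new):
    """old/new are DejaGnu statuses from the two runs being compared."""
    if new != "FAIL":
        return "not a regression"          # destination isn't FAIL
    if old == "PASS":
        return "regression"                # PASS -> FAIL: unambiguous
    return "whole-testsuite regression"    # new FAIL, or anything else -> FAIL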

ERRORs in the .sum or .log files should be watched out for as well, 
however, as sometimes they may indicate broken Tcl syntax in the 
testsuite, which may cause many tests not to be run.

Note that the test names that come after PASS:, FAIL: etc. aren't unique 
between different .sum files, so you need to associate tests with a tuple 
(.sum file, test name) (and even then, sometimes multiple tests in a .sum 
file have the same name, but that's a testsuite bug).  If you're using 
--target_board options that run tests for more than one multilib in the 
same testsuite run, add the multilib to that tuple as well.
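
For example, a reader of .sum files might key results like this (an
untested sketch; duplicate names within one .sum file still collide here
and need separate handling):

import re

_SUM_LINE = re.compile(r"^(PASS|FAIL|XPASS|XFAIL|KPASS|KFAIL|"
                       r"UNRESOLVED|UNSUPPORTED|UNTESTED): (.*)$")

def read_sum(path, multilib=""):
    """Map (sum file, multilib, test name) -> status for one .sum file."""
    results = {}
    with open(path) as f:
        for line in f:
            m = _SUM_LINE.match(line)
            if m:
                results[(path, multilib, m.group(2).strip())] = m.group(1)
    return results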

-- 
Joseph S. Myers
jos...@codesourcery.com


GCC Buildbot Update - Definition of regression

2017-10-10 Thread Paulo Matos
Hi all,

It's almost 3 weeks since I last posted on GCC Buildbot. Here's an update:

* 3 x86_64 workers from CF are now installed;
* There's one scheduler for trunk doing fresh builds for every Daily bump;
* One scheduler doing incremental builds for each active branch;
* An IRC bot which is currently silent;

The next steps are:
* Enable LNT (I have installed this but have yet to connect it to buildbot)
for tracking performance benchmarks over time -- it should come up as
http://gcc-lnt.linki.tools in the near future.
* Enable regression analysis --- this is fundamental. I understand that
without it the buildbot is pretty useless, so it has the highest priority.
However, I would like some agreement as to what in GCC should be
considered a regression. Each test in DejaGnu can have one of several statuses:
FAIL, PASS, UNSUPPORTED, UNTESTED, XPASS, KPASS, XFAIL, KFAIL, UNRESOLVED

Since GCC doesn't have a 'clean bill' of test results, we need to analyse
the .sum files for the current run and compare them with those of the
last run of the same branch. I have written down that if, for any test,
there's a transition that looks like one of the following, then a
regression exists and the test run should be marked as a failure.

ANY -> no test  ; Test disappears
ANY / XPASS -> XPASS; Test goes from any status other than XPASS to XPASS
ANY / KPASS -> KPASS; Test goes from any status other than KPASS to KPASS
new test -> FAIL; New test starts as fail
PASS -> ANY ; Test moves away from PASS

This is a suggestion. I am keen to have corrections from people who use
this on a daily basis and/or have a better understanding of each status.
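
To make the comparison concrete, here is a rough sketch of the check I
have in mind (untested Python; it assumes each run has already been
parsed into a dict mapping test name to status):

def transitions(old_run, new_run):
    """Yield (test, old, new) for every test whose status changed;
    a status of None means the test is absent from that run."""
    for test in set(old_run) | set(new_run):
        old, new = old_run.get(test), new_run.get(test)
        if old != new:
            yield test, old, new

def run_regressed(old_run, new_run):
    """Apply the transition rules proposed above."""
    for test, old, new in transitions(old_run, new_run):
        if new is None:                          # ANY -> no test
            return True
        if new == "XPASS" and old != "XPASS":    # ANY / XPASS -> XPASS
            return True
        if new == "KPASS" and old != "KPASS":    # ANY / KPASS -> KPASS
            return True
        if old is None and new == "FAIL":        # new test -> FAIL
            return True
        if old == "PASS":                        # PASS -> ANY
            return True
    return False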

As soon as we reach a consensus, I will deploy this analysis and enable
the IRC bot to report the test results on the #gcc channel.

-- 
Paulo Matos