Bug#907171: prometheus FTBFS:

2018-09-01 Thread Adrian Bunk
Control: severity -1 normal

On Mon, Aug 27, 2018 at 07:18:30PM +0100, Martín Ferrari wrote:
> On 27/08/18 17:08, Adrian Bunk wrote:
> 
> > How often did you try?
> > 
> > I would say the probability to hit is somewhere around 50%:
> > https://tests.reproducible-builds.org/debian/history/prometheus.html
> 
> I just tried 20 builds, while using desktop applications that put some
> extra strain on the CPU, and none failed.

It didn't fail this way on the buildds, so I agree that whatever 
triggers this problem does not look RC right now.

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Bug#907171: prometheus FTBFS:

2018-08-30 Thread Ian Campbell
On Thu, 2018-08-30 at 13:23 +0100, Martín Ferrari wrote:

> > Unless the tests are run in their own network namespace (which provides
> > some guarantees over what else might be bound to a port, I can't seem
> > to find logs which would confirm or deny if a netns was in use here
> > though) probably the test needs to use a random port or otherwise be
> > tolerant of 9090 not being available.

> Right, but this can't be done without root, and buildds do not give root.

I don't think that applies to the "needs to use a random port or otherwise be
tolerant of 9090 not being available" which was the main thrust of my comment.

> > Maybe there is (or should be) an autopkgtest flag to request a clean
> > netns be used?
> 
> This discussion is about a FTBFS bug, not about the CI system. Also,
> that would not solve the issue, as it is autopkgtest installing (and
> therefore starting) prometheus.

So the issue is a parallel (but unrelated) autopkgtest run on the same
host interfering with builds which are happening at the same time? (I
had thought the autopkgtest was part of the build hence → FTBFS, seems
I was misunderstanding then, sorry for the noise on that front)

Ian.



Bug#907171: prometheus FTBFS:

2018-08-30 Thread Martín Ferrari
Ian,

On 30/08/18 09:51, Ian Campbell wrote:

> Does `nc -l -p 9090` repro the test failure perhaps?

Yes, of course.

> Unless the tests are run in their own network namespace (which provides
> some guarantees over what else might be bound to a port, I can't seem
> to find logs which would confirm or deny if a netns was in use here
> though) probably the test needs to use a random port or otherwise be
> tolerant of 9090 not being available.

Right, but this can't be done without root, and buildds do not give root.

> Maybe there is (or should be) an autopkgtest flag to request a clean
> netns be used?

This discussion is about a FTBFS bug, not about the CI system. Also,
that would not solve the issue, as it is autopkgtest installing (and
therefore starting) prometheus.

-- 
Martín Ferrari (Tincho)



Bug#907171: prometheus FTBFS:

2018-08-30 Thread Ian Campbell
On Mon, 2018-08-27 at 19:18 +0100, Martín Ferrari wrote:
> On 27/08/18 17:08, Adrian Bunk wrote:
> 
> > How often did you try?
> > 
> > I would say the probability to hit is somewhere around 50%:
> > https://tests.reproducible-builds.org/debian/history/prometheus.html
> 
> I just tried 20 builds, while using desktop applications that put some
> extra strain on the CPU, and none failed.

Port 9090 isn't reserved or privileged at all afaik so it's not out of
the question that some other random outgoing socket connection might be
using it as a source port (or listening on it) at any given point,
whether it is prom running on the host or something else.

Does `nc -l -p 9090` repro the test failure perhaps?
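
For anyone without nc at hand, the same collision can be sketched in Go. This is only an illustration, not code from the prometheus test suite: the helper names bindTwice and inUse are made up here, the check against syscall.EADDRINUSE assumes a Linux or other Unix host, and it uses an ephemeral loopback port rather than 9090 so that it does not depend on what else is running.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

// bindTwice (hypothetical helper) listens on an ephemeral loopback port,
// then tries to listen on the very same address again and returns the
// error from the second attempt.
func bindTwice() error {
	first, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return fmt.Errorf("first listen failed: %w", err)
	}
	defer first.Close()

	// The first listener is still open, so this bind is expected to fail.
	second, err := net.Listen("tcp", first.Addr().String())
	if err == nil {
		second.Close()
		return nil
	}
	return err
}

// inUse (hypothetical helper) reports whether err wraps EADDRINUSE.
// Assumption: a Unix-like host where the kernel reports that errno.
func inUse(err error) bool {
	return errors.Is(err, syscall.EADDRINUSE)
}

func main() {
	err := bindTwice()
	fmt.Println("second bind:", err)
	fmt.Println("address already in use:", inUse(err))
}
```

The second listen fails for the same reason the web handler in the build log cannot bind :9090 while something else holds it.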

Unless the tests are run in their own network namespace (which provides
some guarantees over what else might be bound to a port, I can't seem
to find logs which would confirm or deny if a netns was in use here
though) probably the test needs to use a random port or otherwise be
tolerant of 9090 not being available.
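
As a rough sketch of what "use a random port" could look like in Go: ask the kernel for a free port by listening on port 0, then hand that listener to the HTTP server. None of this is taken from web_test.go. The names startOnFreePort and probe, and the trivial handler, are stand-ins for whatever the real test sets up.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
)

// startOnFreePort (hypothetical helper) serves h on a loopback port picked
// by the kernel and returns the chosen address plus a function to stop it.
func startOnFreePort(h http.Handler) (string, func() error, error) {
	// Port 0 asks the kernel for any currently unused port.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	srv := &http.Server{Handler: h}
	go srv.Serve(ln) // returns once srv.Close is called
	return ln.Addr().String(), srv.Close, nil
}

// probe (hypothetical helper) starts a stand-in handler on a free port,
// requests it once and returns the HTTP status code.
func probe() (int, error) {
	h := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ready")
	})
	addr, stop, err := startOnFreePort(h)
	if err != nil {
		return 0, err
	}
	defer stop()

	resp, err := http.Get("http://" + addr + "/")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	code, err := probe()
	fmt.Println("status:", code, "error:", err)
}
```

Because the listener is created before the server goroutine starts, the test knows the real address up front and never depends on 9090 being free.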

Maybe there is (or should be) an autopkgtest flag to request a clean
netns be used?

Ian.



Bug#907171: prometheus FTBFS:

2018-08-27 Thread Martín Ferrari
On 27/08/18 17:08, Adrian Bunk wrote:

> How often did you try?
> 
> I would say the probability to hit is somewhere around 50%:
> https://tests.reproducible-builds.org/debian/history/prometheus.html

I just tried 20 builds, while using desktop applications that put some
extra strain on the CPU, and none failed.


-- 
Martín Ferrari (Tincho)



Bug#907171: prometheus FTBFS:

2018-08-27 Thread Adrian Bunk
On Mon, Aug 27, 2018 at 04:57:49PM +0100, Martín Ferrari wrote:
> On 27/08/18 16:51, Adrian Bunk wrote:
> 
> >>> panic: Can't start web handler:listen tcp :9090: bind: address already in use
> 
> > I am not talking about autopkgtests, I am talking about a FTBFS I was
> > able to reproduce locally.
> > 
> > But retrying a few more times, this might actually be a flaky test?
> 
> I have not seen this error before, I can only imagine you had some old
> process lying around, or somehow the port was still in use? Otherwise,
> it must be some concurrency issue, but as I said, I have not been able
> to reproduce this.

How often did you try?

I would say the probability to hit is somewhere around 50%:
https://tests.reproducible-builds.org/debian/history/prometheus.html

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Bug#907171: prometheus FTBFS:

2018-08-27 Thread Martín Ferrari
On 27/08/18 16:51, Adrian Bunk wrote:

>>> panic: Can't start web handler:listen tcp :9090: bind: address already in use

> I am not talking about autopkgtests, I am talking about a FTBFS I was
> able to reproduce locally.
> 
> But retrying a few more times, this might actually be a flaky test?

I have not seen this error before, I can only imagine you had some old
process lying around, or somehow the port was still in use? Otherwise,
it must be some concurrency issue, but as I said, I have not been able
to reproduce this.

-- 
Martín Ferrari (Tincho)



Bug#907171: prometheus FTBFS:

2018-08-27 Thread Martín Ferrari
Santiago,

On 27/08/18 16:00, Santiago Vila wrote:

>> This is a known problem when running the autopkgtests, because the
>> installed package starts the daemon and then the tests fail when trying
>> to use the same port; but this is not a FTBFS: the build daemons do not
>> fail to build from source. I think you should lower the severity.
> 
> Tincho, if we go that route we would be applying a definition for FTBFS
> which would be different from everybody else.

> A FTBFS could be defined as a non-zero exit status from
> dpkg-buildpackage. If you include the test suite as part of the build,
> then a failure in the test suite will naturally mean a FTBFS bug, for
> all purposes.

Yes, but this is not what is happening: the build only fails if
prometheus is already running on the machine, which is the case for
autopkgtest, but not for a normal build. I plan to fix this, but I need
to find a general fix for all Go packages that include a daemon.

Look at the buildd logs, there are no failures (except i386, which is
known not to work):
https://buildd.debian.org/status/package.php?p=prometheus

> If, as you explain, the tests are not suitable to be run after
> dh_auto_build, then it follows that they should be disabled from the
> package build, because we don't want dpkg-buildpackage to exit with
> error. In such case, just an empty debian/rules target like this:

If you run dpkg-buildpackage in a clean chroot, it does not fail: I have
just checked again; so this bug is not RC. There are a few other Go
packages that fail autopkgtests for the exact same reason, please do not
open FTBFS bugs without actually trying to build the package.

-- 
Martín Ferrari (Tincho)



Bug#907171: prometheus FTBFS:

2018-08-27 Thread Adrian Bunk
On Mon, Aug 27, 2018 at 03:27:07PM +0100, Martín Ferrari wrote:
> Adrian,

Hi Martín,

> On 24/08/18 13:47, Adrian Bunk wrote:
> 
> > https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/prometheus.html
> > 
> > ...
> > panic: Can't start web handler:listen tcp :9090: bind: address already in use
> 
> This is a known problem when running the autopkgtests, because the
> installed package starts the daemon and then the tests fail when trying
> to use the same port; but this is not a FTBFS: the build daemons do not
> fail to build from source. I think you should lower the severity.

I am not talking about autopkgtests, I am talking about a FTBFS I was
able to reproduce locally.

But retrying a few more times, this might actually be a flaky test?

cu
Adrian

-- 

   "Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
   "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed



Bug#907171: prometheus FTBFS:

2018-08-27 Thread Santiago Vila
On Mon, 27 Aug 2018, Martín Ferrari wrote:

> Adrian,
> 
> On 24/08/18 13:47, Adrian Bunk wrote:
> 
> > https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/prometheus.html
> > 
> > ...
> > panic: Can't start web handler:listen tcp :9090: bind: address already in use
> 
> This is a known problem when running the autopkgtests, because the
> installed package starts the daemon and then the tests fail when trying
> to use the same port; but this is not a FTBFS: the build daemons do not
> fail to build from source. I think you should lower the severity.

Tincho, if we go that route we would be applying a definition for FTBFS
which would be different from everybody else.

A FTBFS could be defined as a non-zero exit status from
dpkg-buildpackage. If you include the test suite as part of the build,
then a failure in the test suite will naturally mean a FTBFS bug, for
all purposes.

If, as you explain, the tests are not suitable to be run after
dh_auto_build, then it follows that they should be disabled from the
package build, because we don't want dpkg-buildpackage to exit with
error. In such case, just an empty debian/rules target like this:

override_dh_auto_test:

would do the trick.

(But of course disabling only the failing tests would be better).

Thanks.



Bug#907171: prometheus FTBFS:

2018-08-27 Thread Martín Ferrari
Adrian,

On 24/08/18 13:47, Adrian Bunk wrote:

> https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/prometheus.html
> 
> ...
> panic: Can't start web handler:listen tcp :9090: bind: address already in use

This is a known problem when running the autopkgtests, because the
installed package starts the daemon and then the tests fail when trying
to use the same port; but this is not a FTBFS: the build daemons do not
fail to build from source. I think you should lower the severity.

-- 
Martín Ferrari (Tincho)



Bug#907171: prometheus FTBFS:

2018-08-24 Thread Adrian Bunk
Source: prometheus
Version: 2.3.2+ds-1
Severity: serious
Tags: ftbfs

https://tests.reproducible-builds.org/debian/rb-pkg/unstable/amd64/prometheus.html

...
panic: Can't start web handler:listen tcp :9090: bind: address already in use

goroutine 229 [running]:
github.com/prometheus/prometheus/web.TestReadyAndHealthy.func2(0xc420304300)
	/build/1st/prometheus-2.3.2+ds/build/src/github.com/prometheus/prometheus/web/web_test.go:118 +0xf3
created by github.com/prometheus/prometheus/web.TestReadyAndHealthy
	/build/1st/prometheus-2.3.2+ds/build/src/github.com/prometheus/prometheus/web/web_test.go:115 +0x2f2
FAIL	github.com/prometheus/prometheus/web	0.203s
=== RUN   TestEndpoints
=== RUN   TestEndpoints/local
=== RUN   TestEndpoints/remote
--- PASS: TestEndpoints (0.14s)
--- PASS: TestEndpoints/local (0.01s)
api_test.go:629: run 0  GET "query=2=123.4"
api_test.go:629: run 0  POST "query=2=123.4"
api_test.go:629: run 1  GET "query=0.333=1970-01-01T00%3A02%3A03Z"
api_test.go:629: run 1  POST "query=0.333=1970-01-01T00%3A02%3A03Z"
api_test.go:629: run 2  GET "query=0.333=1970-01-01T01%3A02%3A03%2B01%3A00"
api_test.go:629: run 2  POST "query=0.333=1970-01-01T01%3A02%3A03%2B01%3A00"
api_test.go:629: run 3  GET "query=0.333"
api_test.go:629: run 3  POST "query=0.333"
api_test.go:629: run 4  GET "end=2=time%28%29=0=1"
api_test.go:629: run 4  POST "end=2=time%28%29=0=1"
api_test.go:629: run 5  GET "end=2=time%28%29=1"
api_test.go:629: run 5  POST "end=2=time%28%29=1"
api_test.go:629: run 6  GET "query=time%28%29=0=1"
api_test.go:629: run 6  POST "query=time%28%29=0=1"
api_test.go:629: run 7  GET "end=2=time%28%29=0"
api_test.go:629: run 7  POST "end=2=time%28%29=0"
api_test.go:629: run 8  GET "query=invalid%5D%5Bquery=1970-01-01T01%3A02%3A03%2B01%3A00"
api_test.go:629: run 8  POST "query=invalid%5D%5Bquery=1970-01-01T01%3A02%3A03%2B01%3A00"
api_test.go:629: run 9  GET "end=100=invalid%5D%5Bquery=0=1"
api_test.go:629: run 9  POST "end=100=invalid%5D%5Bquery=0=1"
api_test.go:629: run 10 GET "end=2=time%28%29=1=0"
api_test.go:629: run 10 POST "end=2=time%28%29=1=0"
api_test.go:629: run 11 GET "end=1=time%28%29=2=1"
api_test.go:629: run 11 POST "end=1=time%28%29=2=1"
api_test.go:629: run 12 GET "end=1489667272.372=time%28%29=148966367200.372=1"
api_test.go:629: run 12 POST "end=1489667272.372=time%28%29=148966367200.372=1"
api_test.go:629: run 13 GET "match%5B%5D=test_metric2"
api_test.go:629: run 14 GET "match%5B%5D=test_metric1%7Bfoo%3D~%22.%2Bo%22%7D"
api_test.go:629: run 15 GET "match%5B%5D=test_metric1%7Bfoo%3D~%22.%2Bo%24%22%7D%5B%5D=test_metric1%7Bfoo%3D~%22.%2Bo%22%7D"
api_test.go:629: run 16 GET "match%5B%5D=test_metric1%7Bfoo%3D~%22.%2Bo%22%7D%5B%5D=none"
api_test.go:629: run 17 GET "end=-1%5B%5D=test_metric2=-2"
api_test.go:629: run 18 GET "end=11%5B%5D=test_metric2=10"
api_test.go:629: run 19 GET "end=10%5B%5D=test_metric2=-1"
api_test.go:629: run 20 GET "end=100%5B%5D=test_metric2=1"
api_test.go:629: run 21 GET "end=10%5B%5D=test_metric2=1"
api_test.go:629: run 22 GET "end=1%5B%5D=test_metric2=-1"
api_test.go:629: run 23 GET ""
api_test.go:629: run 24 GET ""
api_test.go:629: run 25 GET ""
api_test.go:629: run 26 GET ""
api_test.go:629: run 27 GET ""
api_test.go:629: run 28 GET ""
api_test.go:629: run 29 GET ""
api_test.go:629: run 30 GET ""
api_test.go:629: run 31 GET ""
--- PASS: TestEndpoints/remote (0.01s)
api_test.go:629: run 0  GET "query=2=123.4"
api_test.go:629: run 0  POST "query=2=123.4"
api_test.go:629: run 1  GET "query=0.333=1970-01-01T00%3A02%3A03Z"
api_test.go:629: run 1  POST "query=0.333=1970-01-01T00%3A02%3A03Z"
api_test.go:629: run 2  GET "query=0.333=1970-01-01T01%3A02%3A03%2B01%3A00"
api_test.go:629: run 2  POST "query=0.333=1970-01-01T01%3A02%3A03%2B01%3A00"
api_test.go:629: run 3  GET "query=0.333"
api_test.go:629: run 3  POST "query=0.333"
api_test.go:629: run 4  GET "end=2=time%28%29=0=1"
api_test.go:629: run 4  POST "end=2=time%28%29=0=1"
api_test.go:629: run 5  GET "end=2=time%28%29=1"
api_test.go:629: run 5  POST "end=2=time%28%29=1"
api_test.go:629: run 6  GET "query=time%28%29=0=1"
api_test.go:629: run 6  POST "query=time%28%29=0=1"
api_test.go:629: run 7  GET "end=2=time%28%29=0"
api_test.go:629: run 7  POST