Re: Buildfarm failures for hash indexes: buffer leaks

2018-11-19 Thread Andres Freund
On 2018-11-19 14:38:04 -0800, Andres Freund wrote:
> Hi,
> 
> 
> On 2018-11-19 17:32:37 -0500, Tom Lane wrote:
> > Andres Freund  writes:
> > > FWIW, it seems that gcc's trunk works again. But I'm not sure this isn't
> > > just an accident and the optimization's introduced in the above revision
> > > aren't still broken.
> > > https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=moonjelly=HEAD
> > 
> > Yeah, I saw that moonjelly had gone green.  Are you going to update
> > serinus' compiler soon?
> 
> I'm using debian's gcc-snapshot package, which updates a bit less
> frequently than what Fabien does. It'll auto-update as soon as that has
> happened.
> 
> What disconcerts me a bit is that afaict there wasn't an intentional fix
> in the relevant area (disabling value range propagation fixes the
> miscompilation), so I'm not quite sure.

Ooops, somehow deleted too much when editing the sentence. I was trying
to say that I'm not sure whether this is actually fixed, or a bug was
just hidden.

The issue is "fixed" in :
https://github.com/gcc-mirror/gcc/commit/48625f587d786089ce1e245f192bcf706f8fc1fc
and had been "caused" (or surfaced) in:
https://github.com/gcc-mirror/gcc/commit/be44111ed7ea7611d324d4d570e2f2d380f784b4

Greetings,

Andres Freund



Re: Buildfarm failures for hash indexes: buffer leaks

2018-11-19 Thread Andres Freund
Hi,


On 2018-11-19 17:32:37 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > FWIW, it seems that gcc's trunk works again. But I'm not sure this isn't
> > just an accident and the optimization's introduced in the above revision
> > aren't still broken.
> > https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=moonjelly=HEAD
> 
> Yeah, I saw that moonjelly had gone green.  Are you going to update
> serinus' compiler soon?

I'm using debian's gcc-snapshot package, which updates a bit less
frequently than what Fabien does. It'll auto-update as soon as that has
happened.

What disconcerts me a bit is that afaict there wasn't an intentional fix
in the relevant area (disabling value range propagation fixes the
miscompilation), so I'm not quite sure.


> (And while I'm bugging you, would you fix your buildfarm menagerie
> to build REL_11_STABLE?)

Oh, hum. I thought I had. But I did it only for the LLVM enabled
animals. Results should be coming up soon.

Greetings,

Andres Freund



Re: Buildfarm failures for hash indexes: buffer leaks

2018-11-19 Thread Tom Lane
Andres Freund  writes:
> FWIW, it seems that gcc's trunk works again. But I'm not sure this isn't
> just an accident and the optimization's introduced in the above revision
> aren't still broken.
> https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=moonjelly=HEAD

Yeah, I saw that moonjelly had gone green.  Are you going to update
serinus' compiler soon?

(And while I'm bugging you, would you fix your buildfarm menagerie
to build REL_11_STABLE?)

regards, tom lane



Re: Buildfarm failures for hash indexes: buffer leaks

2018-11-19 Thread Andres Freund
Hi,

On 2018-11-01 22:52:19 +0100, Fabien COELHO wrote:
> 
> > > > Their commit r265375 fixed the ability to compile itself, but built
> > > > PostgreSQL binaries remain broken there and thereafter.
> > > > 
> > > > |...]
> > > 
> > > Thanks a lot for this investigation! I can fill in a gcc bug report. There
> > > would be a enormous work to narrow it down to a small test case, it is
> > > unclear how they can act about it, but at least they would know.
> > 
> > Have you done so? If so, what's the bug number?
> 
> Not yet. There was no answer to my suggestion, so I did not feel like that
> urgent. I'll try to do it over next week.

FWIW, it seems that gcc's trunk works again. But I'm not sure this isn't
just an accident and the optimization's introduced in the above revision
aren't still broken.

https://buildfarm.postgresql.org/cgi-bin/show_history.pl?nm=moonjelly=HEAD

Greetings,

Andres Freund



Re: Buildfarm failures for hash indexes: buffer leaks

2018-11-01 Thread Fabien COELHO




Their commit r265375 fixed the ability to compile itself, but built
PostgreSQL binaries remain broken there and thereafter.

|...]


Thanks a lot for this investigation! I can fill in a gcc bug report. There
would be a enormous work to narrow it down to a small test case, it is
unclear how they can act about it, but at least they would know.


Have you done so? If so, what's the bug number?


Not yet. There was no answer to my suggestion, so I did not feel like 
that urgent. I'll try to do it over next week.


--
Fabien.



Re: Buildfarm failures for hash indexes: buffer leaks

2018-11-01 Thread Andres Freund
On 2018-10-27 08:18:12 +0200, Fabien COELHO wrote:
> 
> Hello Jeff,
> 
> > > I suspect the easiest thing to narrow it down would be to bisect the
> > > problem in gcc :(
> > 
> > Their commit r265241 is what broke the PostgreSQL build.  It also broke the
> > compiler itself--at that commit it was no longer possible to build itself.
> > I had to --disable-bootstrap in order to get a r265241 compiler to test
> > PostgreSQL on.
> 
> It seems they have done a API change around some kind of "range" analysis,
> which must have been incomplete.
> 
> > Their commit r265375 fixed the ability to compile itself, but built
> > PostgreSQL binaries remain broken there and thereafter.
> > 
> > |...]
> 
> Thanks a lot for this investigation! I can fill in a gcc bug report. There
> would be a enormous work to narrow it down to a small test case, it is
> unclear how they can act about it, but at least they would know.

Have you done so? If so, what's the bug number?

Greetings,

Andres Freund



Re: Buildfarm failures for hash indexes: buffer leaks

2018-10-27 Thread Fabien COELHO



Hello Jeff,


I suspect the easiest thing to narrow it down would be to bisect the
problem in gcc :(


Their commit r265241 is what broke the PostgreSQL build.  It also broke the
compiler itself--at that commit it was no longer possible to build itself.
I had to --disable-bootstrap in order to get a r265241 compiler to test
PostgreSQL on.


It seems they have done a API change around some kind of "range" analysis, 
which must have been incomplete.



Their commit r265375 fixed the ability to compile itself, but built
PostgreSQL binaries remain broken there and thereafter.

|...]


Thanks a lot for this investigation! I can fill in a gcc bug report. There 
would be a enormous work to narrow it down to a small test case, it is 
unclear how they can act about it, but at least they would know.


--
Fabien.



Re: Buildfarm failures for hash indexes: buffer leaks

2018-10-26 Thread Jeff Janes
On Tue, Oct 23, 2018 at 10:51 AM Andres Freund  wrote:

> On 2018-10-23 13:54:31 +0200, Fabien COELHO wrote:
> >
> > Hello Tom & Amit,
> >
> > > > > Both animals use gcc experimental versions, which may rather
> underline a
> > > > > new bug in gcc head rather than an existing issue in pg. Or not.
> > >
> > > > It is possible, but what could be the possible theory?
> > >
> > > It seems like the two feasible theories are (1) gcc bug, or (2) buffer
> > > leak that only occurs in very narrow circumstances, perhaps from a race
> > > condition.  Given that the hash index code hasn't changed meaningfully
> > > in several months, I thought (1) seemed more probable.
> >
> > Yep, that is my thought as well.
>
>
> FWIW, my animal 'serinus', which runs debian's gcc-snapshot shows the same
> problem:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus=2018-10-22%2006%3A34%3A02
>
> So it seems much more likely to be 1).
>
>
> > The problem is that this kind of issue is not simple to wrap-up as a gcc
> bug
> > report, unlike other earlier instances that I forwarded to clang & gcc
> dev
> > teams.
> >
> > I'm in favor in waiting before trying to report it, to check whether the
> > probable underlying gcc problem is detected, reported by someone else,
> and
> > fixed in gcc head. If it persists, then we'll see.
>
> I suspect the easiest thing to narrow it down would be to bisect the
> problem in gcc :(
>

Their commit r265241 is what broke the PostgreSQL build.  It also broke the
compiler itself--at that commit it was no longer possible to build itself.
I had to --disable-bootstrap in order to get a r265241 compiler to test
PostgreSQL on.

Their commit r265375 fixed the ability to compile itself, but built
PostgreSQL binaries remain broken there and thereafter.

I ran this on a AWS c5d.4xlarge
ubuntu/images/hvm-ssd/ubuntu-bionic-18.04-amd64-server-20180912
(ami-0f65671a86f061fcd)

Configuring PostgreSQL with ./configure --enable-cassert is necessary in
order for the subsequent "make check" to experience the failure.

I'm using PostgreSQL v12dev commit 5953c99697621174f, but the problem is
not sensitive to that and reproduces back to v10.0.

I haven't the slightest idea what to do with this information, and GCC's
cryptic SVN commit messages don't offer much insight to me.

Cheers,

Jeff


Re: Buildfarm failures for hash indexes: buffer leaks

2018-10-23 Thread Michael Paquier
On Tue, Oct 23, 2018 at 07:50:57AM -0700, Andres Freund wrote:
> FWIW, my animal 'serinus', which runs debian's gcc-snapshot shows the same 
> problem:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus=2018-10-22%2006%3A34%3A02
> 
> So it seems much more likely to be 1).

Thanks all for looking at it.  Indeed it looks like a compiler issue
here.
--
Michael


signature.asc
Description: PGP signature


Re: Buildfarm failures for hash indexes: buffer leaks

2018-10-23 Thread Andres Freund
On 2018-10-23 13:54:31 +0200, Fabien COELHO wrote:
> 
> Hello Tom & Amit,
> 
> > > > Both animals use gcc experimental versions, which may rather underline a
> > > > new bug in gcc head rather than an existing issue in pg. Or not.
> > 
> > > It is possible, but what could be the possible theory?
> > 
> > It seems like the two feasible theories are (1) gcc bug, or (2) buffer
> > leak that only occurs in very narrow circumstances, perhaps from a race
> > condition.  Given that the hash index code hasn't changed meaningfully
> > in several months, I thought (1) seemed more probable.
> 
> Yep, that is my thought as well.


FWIW, my animal 'serinus', which runs debian's gcc-snapshot shows the same 
problem:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus=2018-10-22%2006%3A34%3A02

So it seems much more likely to be 1).


> The problem is that this kind of issue is not simple to wrap-up as a gcc bug
> report, unlike other earlier instances that I forwarded to clang & gcc dev
> teams.
> 
> I'm in favor in waiting before trying to report it, to check whether the
> probable underlying gcc problem is detected, reported by someone else, and
> fixed in gcc head. If it persists, then we'll see.

I suspect the easiest thing to narrow it down would be to bisect the
problem in gcc :(

Greetings,

Andres Freund



Re: Buildfarm failures for hash indexes: buffer leaks

2018-10-23 Thread Fabien COELHO



Hello Tom & Amit,


Both animals use gcc experimental versions, which may rather underline a
new bug in gcc head rather than an existing issue in pg. Or not.



It is possible, but what could be the possible theory?


It seems like the two feasible theories are (1) gcc bug, or (2) buffer
leak that only occurs in very narrow circumstances, perhaps from a race
condition.  Given that the hash index code hasn't changed meaningfully
in several months, I thought (1) seemed more probable.


Yep, that is my thought as well.

The problem is that this kind of issue is not simple to wrap-up as a gcc 
bug report, unlike other earlier instances that I forwarded to clang & gcc 
dev teams.


I'm in favor in waiting before trying to report it, to check whether the 
probable underlying gcc problem is detected, reported by someone else, and 
fixed in gcc head. If it persists, then we'll see.


--
Fabien.



Re: Buildfarm failures for hash indexes: buffer leaks

2018-10-23 Thread Tom Lane
Amit Kapila  writes:
> On Mon, Oct 22, 2018 at 1:42 PM Fabien COELHO  wrote:
>> Both animals use gcc experimental versions, which may rather underline a
>> new bug in gcc head rather than an existing issue in pg. Or not.

> It is possible, but what could be the possible theory?

It seems like the two feasible theories are (1) gcc bug, or (2) buffer
leak that only occurs in very narrow circumstances, perhaps from a race
condition.  Given that the hash index code hasn't changed meaningfully
in several months, I thought (1) seemed more probable.

regards, tom lane



Re: Buildfarm failures for hash indexes: buffer leaks

2018-10-23 Thread Amit Kapila
On Mon, Oct 22, 2018 at 1:42 PM Fabien COELHO  wrote:
>
> Hello Michaël,
>
> > The first failure is unrelated to the involved commits, as they touched
> > completely different areas of the code:
> >  INSERT INTO hash_split_heap SELECT a/2 FROM generate_series(1, 25000) a;
> > + WARNING:  buffer refcount leak: [6481] (rel=base/16384/32349, 
> > blockNum=156, flags=0x9380, refcount=1 1)
> >
> > And versions older than HEAD do not complain.
> >
> > Any thoughts?
>
> Both animals use gcc experimental versions, which may rather underline a
> new bug in gcc head rather than an existing issue in pg. Or not.
>

It is possible, but what could be the possible theory?  The warning
indicates that somewhere we forgot to call ReleaseBuffer.  Today, I
had reviewed at the hash index code related to test case that is
failing but didn't find any obvious problem.  What should we our next
step?  Do we want to change gcc version and see?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Buildfarm failures for hash indexes: buffer leaks

2018-10-22 Thread Fabien COELHO


Hello Michaël,


The first failure is unrelated to the involved commits, as they touched
completely different areas of the code:
 INSERT INTO hash_split_heap SELECT a/2 FROM generate_series(1, 25000) a;
+ WARNING:  buffer refcount leak: [6481] (rel=base/16384/32349, blockNum=156, 
flags=0x9380, refcount=1 1)

And versions older than HEAD do not complain.

Any thoughts?


Both animals use gcc experimental versions, which may rather underline a 
new bug in gcc head rather than an existing issue in pg. Or not.


--
Fabien.

Buildfarm failures for hash indexes: buffer leaks

2018-10-22 Thread Michael Paquier
Hi all,

Looking at the buildfarm, serinus and moonjelly have just complained
about the failure of $subject:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=serinus=2018-10-22%2006%3A34%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=moonjelly=2018-10-20%2015%3A17%3A02

The first failure is unrelated to the involved commits, as they touched
completely different areas of the code:
  INSERT INTO hash_split_heap SELECT a/2 FROM generate_series(1, 25000) a;
+ WARNING:  buffer refcount leak: [6481] (rel=base/16384/32349, blockNum=156, 
flags=0x9380, refcount=1 1)

And versions older than HEAD do not complain.

Any thoughts?
--
Michael


signature.asc
Description: PGP signature