Re: [HACKERS] brin regression test intermittent failures

2015-06-04 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  Evidently there is a problem right there.  If I simply add an order by
  tenthous as proposed by Peter, many more errors appear; and what errors
  appear differs if I change shared_buffers.  I think the real fix for
  this is to change the hand-picked values used in the brinopers table, so
  that they all pass the test using some reasonable ORDER BY specification
  in the populating query (probably tenk1.unique1).
 
 I may be confused, but why would the physical ordering of the table
 entries make a difference to the correct answers for this test?
 (I can certainly see why that might break the brin code, but not
 why it should change the seqscan's answers.)

We create the brintest using a scan of tenk1 LIMIT 100, without
specifying the order.  So whether we find rows that match each test query
is pure chance.

 Also, what I'd just noticed is that all of the cases that are failing are
 ones where the expected number of matching rows is exactly 1.  I am
 wondering if the test is sometimes just missing random rows, and we're not
 seeing any reported problem unless that makes it go down to no rows.  (But
 I do not know how that could simultaneously affect the seqscan case ...)

Yeah, we compare the ctid sets of the results, and we assume that a
seqscan would get that correctly.

 I think it would be a good idea to extend the brinopers table to include
 the number of expected matches, and to complain if that's not what we got,
 rather than simply checking for zero.

That sounds reasonable.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-06-04 Thread Tom Lane
I wrote:
 I think it would be a good idea to extend the brinopers table to include
 the number of expected matches, and to complain if that's not what we got,
 rather than simply checking for zero.

Also, further experimentation shows that there are about 30 entries in the
brinopers table that give rise to seqscan plans even when we're commanding
a bitmap scan, presumably because those operators aren't brin-indexable.
They're not the problematic cases, but things like 

((charcol)::text  'A'::text)

Is there a reason to have such things in the table, or is this just a
thinko?  Or is it actually a bug that we're getting such plans?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-06-04 Thread Alvaro Herrera
Tom Lane wrote:
 I wrote:
  I think it would be a good idea to extend the brinopers table to include
  the number of expected matches, and to complain if that's not what we got,
  rather than simply checking for zero.
 
 Also, further experimentation shows that there are about 30 entries in the
 brinopers table that give rise to seqscan plans even when we're commanding
 a bitmap scan, presumably because those operators aren't brin-indexable.
 They're not the problematic cases, but things like 
 
   ((charcol)::text  'A'::text)
 
 Is there a reason to have such things in the table, or is this just a
 thinko?  Or is it actually a bug that we're getting such plans?

No, I left those there knowing that there are no plans involving brin --
in a way, they provide some future proofing if some of those operators
are made indexable later.

I couldn't think of a way to test that the plans are actually using the
brin index or not, but if we can do that in some way, that would be
good.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-06-04 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Tom Lane wrote:
 Also, further experimentation shows that there are about 30 entries in the
 brinopers table that give rise to seqscan plans even when we're commanding
 a bitmap scan, presumably because those operators aren't brin-indexable.
 They're not the problematic cases, but things like 
 
 ((charcol)::text  'A'::text)
 
 Is there a reason to have such things in the table, or is this just a
 thinko?  Or is it actually a bug that we're getting such plans?

 No, I left those there knowing that there are no plans involving brin --
 in a way, they provide some future proofing if some of those operators
 are made indexable later.

On closer investigation, I think the ones involving charcol are a flat
out bug in the test, namely failure to quote char.  Observe:

regression=# explain select ctid from brintest where charcol = 'A'::char;
QUERY PLAN
--
 Seq Scan on brintest  (cost=0.00..101.88 rows=1 width=6)
   Filter: ((charcol)::text = 'A'::text)
(2 rows)

regression=# explain select ctid from brintest where charcol = 'A'::char;
  QUERY PLAN   
---
 Bitmap Heap Scan on brintest  (cost=48.02..58.50 rows=3 width=6)
   Recheck Cond: (charcol = 'A'::char)
   -  Bitmap Index Scan on brinidx  (cost=0.00..48.02 rows=3 width=0)
 Index Cond: (charcol = 'A'::char)
(4 rows)

Presumably we'd like to test the latter case not the former.

The other cases that I found involve cidrcol, and seem to represent
an actual bug in the brin planning logic, ie failure to disregard a
no-op cast.  I'll look closer.

 I couldn't think of a way to test that the plans are actually using the
 brin index or not, but if we can do that in some way, that would be
 good.

Yeah, we can do that --- the way I found out there's a problem is to
modify the test script to check the output of EXPLAIN.

So at this point it looks like (1) chipmunk's issue might be explained
by lack of forced ORDER BY; (2) the test script could be improved to
test more carefully, and it has got an issue with char vs char;
(3) there might be a planner bug.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-06-04 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Tom Lane wrote:
 I may be confused, but why would the physical ordering of the table
 entries make a difference to the correct answers for this test?
 (I can certainly see why that might break the brin code, but not
 why it should change the seqscan's answers.)

 We create the brintest using a scan of tenk1 LIMIT 100, without
 specifying the order.  So whether we find rows that match each test query
 is pure chance.

Oooh ... normally that would not matter, but I wonder if what's happening
on chipmunk is that the synchronized-seqscan logic kicks in and causes us
to read some other part of tenk1 than we normally would, as a consequence
of some concurrent activity or other.  The connection to smaller than
normal shared_buffers would be that it would change our idea of what's a
large enough table to justify using synchronized seqscans.

Peter's patch failed to hit the place where this matters, btw.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-06-04 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Tom Lane wrote:
 Fixed, see 79f2b5d583e2e2a7; but AFAICS this has no real-world impact
 so it does not explain whatever is happening on chipmunk.

 Ah, thanks for diagnosing that.

 The chipmunk failure is strange -- notice it only references the
 = operators, except for type box for which it's ~= that fails.  The test
 includes a lot of operators ...

Actually not --- if you browse through the last half dozen failures
on chipmunk you will notice that

(1) the set of operators complained of varies a bit from one failure
to the next;

(2) more often than not, this is one of the failures:

WARNING:  no results for (boxcol,@,box,((1,2),(300,400)))

Certainly the majority of the complaints are about equality operators,
but not quite all of them.

 Also, we have quite a number of ARM boxes: apart from chipmunk we have
 gull, hamster, mereswine, dangomushi, axolotl, grison.  (hamster and
 chipmunk report hostname -m as armv6l, the others armv7l).  All of
 them are running Linux, either Fedora or Debian.  Most are using gcc,
 compilation flags look pretty standard.

I have no idea what might be different about chipmunk compared to any
other ARM buildfarm critter ... Heikki, any thoughts on that?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-06-04 Thread Alvaro Herrera
Tom Lane wrote:

 Actually not --- if you browse through the last half dozen failures
 on chipmunk you will notice that
 
 (1) the set of operators complained of varies a bit from one failure
 to the next;
 
 (2) more often than not, this is one of the failures:
 
 WARNING:  no results for (boxcol,@,box,((1,2),(300,400)))
 
 Certainly the majority of the complaints are about equality operators,
 but not quite all of them.

Hm.  Well, what this message says is that we ran that query using
both BRIN and seqscan, and that in both cases no row was returned.  Note
that if the BRIN and seqscan cases had returned different sets of rows,
the error message would have been different.  So this might be related
to the way the test table is created, rather than to a bug in BRIN.
Peter G. recently pointed out that this seems to be relying on an
index-only scan on table tenk1 and suggested an ORDER BY.  Maybe that
assumption is being violated on chipmunk and so the table populated is
different than what the table actually expects.

I just noticed that chipmunk has shared_buffers=10MB on its buildfarm
config.  I don't see that in any of the other ARM animals.  Maybe that
can change the plan choice.

I will test locally with reduced shared_buffers and see if I can
reproduce the results.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-06-04 Thread Alvaro Herrera
Alvaro Herrera wrote:

 Hm.  Well, what this message says is that we ran that query using
 both BRIN and seqscan, and that in both cases no row was returned.  Note
 that if the BRIN and seqscan cases had returned different sets of rows,
 the error message would have been different.  So this might be related
 to the way the test table is created, rather than to a bug in BRIN.
 Peter G. recently pointed out that this seems to be relying on an
 index-only scan on table tenk1 and suggested an ORDER BY.  Maybe that
 assumption is being violated on chipmunk and so the table populated is
 different than what the table actually expects.

Evidently there is a problem right there.  If I simply add an order by
tenthous as proposed by Peter, many more errors appear; and what errors
appear differs if I change shared_buffers.  I think the real fix for
this is to change the hand-picked values used in the brinopers table, so
that they all pass the test using some reasonable ORDER BY specification
in the populating query (probably tenk1.unique1).

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-06-04 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Evidently there is a problem right there.  If I simply add an order by
 tenthous as proposed by Peter, many more errors appear; and what errors
 appear differs if I change shared_buffers.  I think the real fix for
 this is to change the hand-picked values used in the brinopers table, so
 that they all pass the test using some reasonable ORDER BY specification
 in the populating query (probably tenk1.unique1).

I may be confused, but why would the physical ordering of the table
entries make a difference to the correct answers for this test?
(I can certainly see why that might break the brin code, but not
why it should change the seqscan's answers.)

Also, what I'd just noticed is that all of the cases that are failing are
ones where the expected number of matching rows is exactly 1.  I am
wondering if the test is sometimes just missing random rows, and we're not
seeing any reported problem unless that makes it go down to no rows.  (But
I do not know how that could simultaneously affect the seqscan case ...)

I think it would be a good idea to extend the brinopers table to include
the number of expected matches, and to complain if that's not what we got,
rather than simply checking for zero.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-06-04 Thread Tom Lane
I wrote:
 The other cases that I found involve cidrcol, and seem to represent
 an actual bug in the brin planning logic, ie failure to disregard a
 no-op cast.  I'll look closer.

I leapt to the wrong conclusion on that one.  The reason for failure to
match to an index column had nothing to do with the extra cast, and
everything to do with the fact that there was no such index column.

I think we're probably good now, though it'd be wise to keep an eye on
chipmunk for awhile to verify that it doesn't see the problem anymore.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-06-04 Thread Alvaro Herrera
Tom Lane wrote:
 I wrote:
  The other cases that I found involve cidrcol, and seem to represent
  an actual bug in the brin planning logic, ie failure to disregard a
  no-op cast.  I'll look closer.
 
 I leapt to the wrong conclusion on that one.  The reason for failure to
 match to an index column had nothing to do with the extra cast, and
 everything to do with the fact that there was no such index column.

Oops!  Thanks for reviewing this.

 I think we're probably good now, though it'd be wise to keep an eye on
 chipmunk for awhile to verify that it doesn't see the problem anymore.

Will do.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-05-26 Thread Alvaro Herrera
Tom Lane wrote:
 Peter Geoghegan p...@heroku.com writes:
  I meant to get around to looking into it, but FWIW I see BRIN-related
  Valgrind issues. e.g.:
 
 Fixed, see 79f2b5d583e2e2a7; but AFAICS this has no real-world impact
 so it does not explain whatever is happening on chipmunk.

Ah, thanks for diagnosing that.

The chipmunk failure is strange -- notice it only references the
= operators, except for type box for which it's ~= that fails.  The test
includes a lot of operators ...

Also, we have quite a number of ARM boxes: apart from chipmunk we have
gull, hamster, mereswine, dangomushi, axolotl, grison.  (hamster and
chipmunk report hostname -m as armv6l, the others armv7l).  All of
them are running Linux, either Fedora or Debian.  Most are using gcc,
compilation flags look pretty standard.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-05-25 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 There's something odd about the brin regression tests. They seem to 
 generate intermittent failures, which suggests some sort of race 
 condition or ordering failure.

 See for example 
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=fulmardt=2015-05-15%2001%3A02%3A28
 and 
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=sittelladt=2015-05-15%2021%3A08%3A38

I found the cause of this symptom today.  Alvaro said he'd added the
autovacuum_enabled=off option to the brintest table to prevent autovac
from screwing up this expected result ... but that only stops autovacuum
from summarizing the table.  Guess what is in the concurrently-executed
gist.sql test, at line 40.

While we could and perhaps should change that command to a more narrowly
targeted vacuum analyze gist_tbl;, this will not prevent someone from
reintroducing an untargeted vacuum command in one of the concurrent tests
later.  I think a future-proof fix would require either making brintest
a temp table (losing all test coverage of WAL logging :-() or changing
the test so that it does not expect a specific result from
brin_summarize_new_values.

Or, maybe better, let's lose the brin_summarize_new_values call
altogether.  What does it test that wouldn't be better done by
explicitly running vacuum brintest; ?

Also worth noting is that there's a completely different failure symptom
that's shown up a few times, eg here:

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunkdt=2015-05-25%2009%3A56%3A55

This makes it look like brintest sometimes contains no rows at all,
which is difficult to explain ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-05-25 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 I meant to get around to looking into it, but FWIW I see BRIN-related
 Valgrind issues. e.g.:

Fixed, see 79f2b5d583e2e2a7; but AFAICS this has no real-world impact
so it does not explain whatever is happening on chipmunk.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-05-25 Thread Tom Lane
I wrote:
 Peter Geoghegan p...@heroku.com writes:
 I meant to get around to looking into it, but FWIW I see BRIN-related
 Valgrind issues. e.g.:

 Fixed, see 79f2b5d583e2e2a7; but AFAICS this has no real-world impact
 so it does not explain whatever is happening on chipmunk.

BTW, after some further trawling in the buildfarm logs, it seem that that
no results for failure mode has been seen *only* on chipmunk, where it's
happened in roughly half the runs since that particular test case went in.
So it's definitely platform-specific.  It's less clear whether the test
case is bogus, or it's exposing a bug added elsewhere in db5f98ab4fa44bc5,
or the bug was pre-existing but not exposed by any older test case.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-05-25 Thread Peter Geoghegan
On Mon, May 25, 2015 at 3:25 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Also worth noting is that there's a completely different failure symptom
 that's shown up a few times, eg here:

 http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=chipmunkdt=2015-05-25%2009%3A56%3A55

 This makes it look like brintest sometimes contains no rows at all,
 which is difficult to explain ...

I meant to get around to looking into it, but FWIW I see BRIN-related
Valgrind issues. e.g.:

2015-05-20 23:44:42.419 PDT 14787 LOG:  statement: CREATE INDEX
brinidx ON brintest USING brin (
byteacol,
charcol,
namecol,
int8col,
int2col,
int4col,
textcol,
oidcol,
tidcol,
float4col,
float8col,
macaddrcol,
inetcol inet_inclusion_ops,
inetcol inet_minmax_ops,
bpcharcol,
datecol,
timecol,
timestampcol,
timestamptzcol,
intervalcol,
timetzcol,
bitcol,
varbitcol,
numericcol,
uuidcol,
int4rangecol,
lsncol,
boxcol
) with (pages_per_range = 1);
==14787== Unaddressable byte(s) found during client check request
==14787==at 0x7E19AD: PageAddItem (bufpage.c:314)
==14787==by 0x4693AD: brin_doinsert (brin_pageops.c:315)
==14787==by 0x46844C: form_and_insert_tuple (brin.c:1122)
==14787==by 0x4672AD: brinbuildCallback (brin.c:540)
==14787==by 0x544D35: IndexBuildHeapRangeScan (index.c:2549)
==14787==by 0x54426A: IndexBuildHeapScan (index.c:2162)
==14787==by 0x4676EA: brinbuild (brin.c:638)
==14787==by 0x944D19: OidFunctionCall3Coll (fmgr.c:1649)
==14787==by 0x543F75: index_build (index.c:2025)
==14787==by 0x542C4D: index_create (index.c:1100)
==14787==by 0x614E52: DefineIndex (indexcmds.c:605)
==14787==by 0x7F29CC: ProcessUtilitySlow (utility.c:1258)
==14787==  Address 0x6932a3a is 1,738 bytes inside a block of size 8,192 alloc'd
==14787==at 0x4C2AB80: malloc (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14787==by 0x966E86: AllocSetAlloc (aset.c:847)
==14787==by 0x969B19: palloc (mcxt.c:825)
==14787==by 0x839203: datumCopy (datum.c:171)
==14787==by 0x46D6B3: brin_minmax_add_value (brin_minmax.c:105)
==14787==by 0x944116: FunctionCall4Coll (fmgr.c:1375)
==14787==by 0x4673AF: brinbuildCallback (brin.c:562)
==14787==by 0x544D35: IndexBuildHeapRangeScan (index.c:2549)
==14787==by 0x54426A: IndexBuildHeapScan (index.c:2162)
==14787==by 0x4676EA: brinbuild (brin.c:638)
==14787==by 0x944D19: OidFunctionCall3Coll (fmgr.c:1649)
==14787==by 0x543F75: index_build (index.c:2025)
==14787==
{
   insert_a_suppression_name_here
   Memcheck:User
   fun:PageAddItem
   fun:brin_doinsert
   fun:form_and_insert_tuple
   fun:brinbuildCallback
   fun:IndexBuildHeapRangeScan
   fun:IndexBuildHeapScan
   fun:brinbuild
   fun:OidFunctionCall3Coll
   fun:index_build
   fun:index_create
   fun:DefineIndex
   fun:ProcessUtilitySlow
}
==14787== Invalid read of size 8
==14787==at 0x4C2F79E: memcpy@@GLIBC_2.14 (in
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14787==by 0x7E19DB: PageAddItem (bufpage.c:317)
==14787==by 0x4693AD: brin_doinsert (brin_pageops.c:315)
==14787==by 0x46844C: form_and_insert_tuple (brin.c:1122)
==14787==by 0x4672AD: brinbuildCallback (brin.c:540)
==14787==by 0x544D35: IndexBuildHeapRangeScan (index.c:2549)
==14787==by 0x54426A: IndexBuildHeapScan (index.c:2162)
==14787==by 0x4676EA: brinbuild (brin.c:638)
==14787==by 0x944D19: OidFunctionCall3Coll (fmgr.c:1649)
==14787==by 0x543F75: index_build (index.c:2025)
==14787==by 0x542C4D: index_create (index.c:1100)
==14787==by 0x614E52: DefineIndex (indexcmds.c:605)
==14787==  Address 0x6932a38 is 728 bytes inside a block of size 730
client-defined
==14787==at 0x969DC2: palloc0 (mcxt.c:864)
==14787==by 0x46B9FE: brin_form_tuple (brin_tuple.c:166)
==14787==by 0x46840B: form_and_insert_tuple (brin.c:1120)
==14787==by 0x4672AD: brinbuildCallback (brin.c:540)
==14787==by 0x544D35: IndexBuildHeapRangeScan (index.c:2549)
==14787==by 0x54426A: IndexBuildHeapScan (index.c:2162)
==14787==by 0x4676EA: brinbuild (brin.c:638)
==14787==by 0x944D19: OidFunctionCall3Coll (fmgr.c:1649)
==14787==by 0x543F75: index_build (index.c:2025)
==14787==by 0x542C4D: index_create (index.c:1100)
==14787==by 0x614E52: DefineIndex (indexcmds.c:605)
==14787==by 0x7F29CC: ProcessUtilitySlow (utility.c:1258)
==14787==
{
   insert_a_suppression_name_here
   Memcheck:Addr8
   fun:memcpy@@GLIBC_2.14
   fun:PageAddItem
   fun:brin_doinsert
   fun:form_and_insert_tuple
   fun:brinbuildCallback
   fun:IndexBuildHeapRangeScan
   fun:IndexBuildHeapScan
   fun:brinbuild
   fun:OidFunctionCall3Coll
   fun:index_build
   fun:index_create
   fun:DefineIndex
}


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-05-15 Thread Alvaro Herrera
Andrew Dunstan wrote:
 
 There's something odd about the brin regression tests. They seem to generate
 intermittent failures, which suggests some sort of race condition or
 ordering failure.
 
 See for example 
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=fulmardt=2015-05-15%2001%3A02%3A28
 and 
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=sittelladt=2015-05-15%2021%3A08%3A38

Yeah it's pretty odd.  I guess the way to figure out what is going on is
to get the test to print out the index contents in case of failure.
I guess I could do something with \gset.

(The way to print out the index is to use the pageinspect functions.
One problem is that at the time the brin test is run we don't have
pageinspect)

Of course, if I could reproduce the issue locally, this would be a lot
easier.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] brin regression test intermittent failures

2015-05-15 Thread Andrew Dunstan


There's something odd about the brin regression tests. They seem to 
generate intermittent failures, which suggests some sort of race 
condition or ordering failure.


See for example 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=fulmardt=2015-05-15%2001%3A02%3A28
and 
http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=sittelladt=2015-05-15%2021%3A08%3A38


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-05-15 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Tom Lane wrote:
 Just from reading the documentation, couldn't the symptom we're seeing
 arise from autovacuum having hit the table right before
 brin_summarize_new_values got called?

 Well, I added a autovacuum_enabled=off to that table recently precisely
 because that was my hypothesis.  It didn't work though, so it must be
 sometihng else.

Ah.  Not having noticed that, I'd locally added a pg_sleep(60) right
before the brin_summarize_new_values call, and failed to reproduce any
problem.  So it's not AV doing something, but it sure smells like
something close to that.

Is there a good reason why we need to exercise brin_summarize_new_values
as such here, rather than just doing a manual VACUUM on the table?  And
if there is, do we really need to verify its result value?  I mean, even
without whatever sort of race condition we're talking about, that expected
result of 5 looks pretty darn phase-of-the-moon-dependent to me.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-05-15 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Andrew Dunstan wrote:
 There's something odd about the brin regression tests. They seem to generate
 intermittent failures, which suggests some sort of race condition or
 ordering failure.
 
 See for example 
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=fulmardt=2015-05-15%2001%3A02%3A28
 and 
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=sittelladt=2015-05-15%2021%3A08%3A38

 Yeah it's pretty odd.

Oooh.  I saw the sittella failure and assumed it was triggered by the
latest BRIN additions, but that fulmar failure is from before those hit.

Just from reading the documentation, couldn't the symptom we're seeing
arise from autovacuum having hit the table right before
brin_summarize_new_values got called?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] brin regression test intermittent failures

2015-05-15 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  Andrew Dunstan wrote:
  There's something odd about the brin regression tests. They seem to 
  generate
  intermittent failures, which suggests some sort of race condition or
  ordering failure.
  
  See for example 
  http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=fulmardt=2015-05-15%2001%3A02%3A28
  and 
  http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=sittelladt=2015-05-15%2021%3A08%3A38
 
  Yeah it's pretty odd.
 
 Oooh.  I saw the sittella failure and assumed it was triggered by the
 latest BRIN additions, but that fulmar failure is from before those hit.
 
 Just from reading the documentation, couldn't the symptom we're seeing
 arise from autovacuum having hit the table right before
 brin_summarize_new_values got called?

Well, I added a autovacuum_enabled=off to that table recently precisely
because that was my hypothesis.  It didn't work though, so it must be
sometihng else.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers