Re: Prevent concurrent DROP SCHEMA when certain objects are being initially created in the namespace

2018-09-05 Thread Jimmy Yih
>
> Something which would be good to have for all those queries is a set of
> isolation tests.  No need for multiple specs, you could just use one
> spec with one session defining all the object types you would like to
> work on.  How did you find this object list?  Did you test all the
> objects available manually?
>
Attached is the isolation spec file.  I originally was only going to fix the
simple CREATE TYPE scenario but decided to look up other objects that can
reside in namespaces and ended up finding a handful of others.  I tested
each one manually before and after adding the AccessShareLock acquisition on
the schema.

> I think that line of thought leads to an enormous increase in locking
> overhead, for which we'd get little if any gain in usability.  So my
> inclination is to make an engineering judgment that we won't fix this.
>
As I was creating this patch, I had similar feelings about the locking
overhead and was curious how others would feel about it as well.

Regards,
Jimmy


On Tue, Sep 4, 2018 at 10:05 PM, Tom Lane  wrote:

> Andres Freund  writes:
> > On September 4, 2018 9:11:25 PM PDT, Tom Lane  wrote:
> >> I think that line of thought leads to an enormous increase in locking
> >> overhead, for which we'd get little if any gain in usability.  So my
> >> inclination is to make an engineering judgment that we won't fix this.
>
> > Haven't we already significantly started down this road, to avoid a lot
> > of the "tuple concurrently updated" type errors?
>
> Not that I'm aware of.  We do not take locks on schemas, nor functions,
> nor any other of the object types I mentioned.
>
> > Would expanding this a bit further really be that noticeable?
>
> Frankly, I think it would be not so much "noticeable" as "disastrous".
>
> Making the overhead tolerable would require very large compromises
> in coverage, perhaps like "we'll only lock during DDL not DML".
> At which point I'd question why bother.  We've seen no field reports
> (that I can recall offhand, anyway) that trace to not locking these
> objects.
>
> regards, tom lane
>


concurrent-schema-drop.spec
Description: Binary data


Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-05 Thread Peter Eisentraut
On 05/09/2018 02:51, Andres Freund wrote:
> My current proposal is thus to add a check that does
> #if defined(__clang__) && defined(__i386__) && !defined(__SSE2_MATH__)
> something-that-fails
> #endif
> in an autoconf test, and have configure complain if that
> fails. Something roughly along the lines of
> "Compiling PostgreSQL with clang, on 32bit x86, requires SSE2 support. Use 
> -msse2 or use gcc."

Would this not be invalidated if the bug you have filed gets fixed?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: A strange GiST error message or fillfactor of GiST build

2018-09-05 Thread Kyotaro HORIGUCHI
At Tue, 4 Sep 2018 13:05:55 +0300, Alexander Korotkov wrote in
> On Tue, Sep 4, 2018 at 12:16 PM Kyotaro HORIGUCHI
>  wrote:
> > I agree that fillfactor should be ignored in certain cases
> > (inserting the first tuple into empty pages or something like
> > that). Even though GiST doesn't need fillfactor at all, it is
> > defined independently from AM instances
> 
> What exactly do you mean?  Yes, fillfactor is defined in StdRdOptions
> struct, which is shared across many access methods.  But some of them
> use fillfactor, while others don't.  For instance, GIN and BRIN don't
> support fillfactor at all.

Yes. It was with the intent of keeping Andrey's use case. So I
proposed just disabling it rather than removing the code
entirely.

> > and it gives some
> > (undocumented) convenience in testing even on GiST.
> 
> Do you keep in mind some particular use cases?  If so, could you
> please share them.  It would let me and others understand what
> behavior we need to preserve and why.

I misunderstood the consensus here; I myself don't have a proper
use case. The consensus here is:

- fillfactor is useless for GiST.
- We don't preserve it for Andrey's use case.
- No one is objecting to ignoring it here.

Is that right?

Well, I propose the first attached patch instead. It just removes
fillfactor handling from GiST.

Apart from the fillfactor removal, we have an internal error for
rootsplit failure caused by too-long tuples, but this should be a
user error message as in gistSplit. (index row size %zu exceeds..)

=# create table y as  select cube(array(SELECT random() as a FROM 
generate_series(1,600))) from generate_series(1,1e3,1); 
=# create index on y using gist(cube);
ERROR:  failed to add item to index page in "y_cube_idx"

The attached second patch changes the message in that case.

ERROR:  size of 2 index rows (9640 bytes) exceeds maximum 8152 of the root page 
for the index "y_cube_idx"

This was one of my original motivations here, but now it might be
a separate issue.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From e5abf14d5d7de4256a7d40a0e5783655a99c2515 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 5 Sep 2018 09:49:58 +0900
Subject: [PATCH 1/2] Ignore fillfactor in GiST

fillfactor for GiST doesn't work as expected since GiST doesn't
perform sorted bulk insertion to pack pages as tightly as
possible. Therefore we remove the fillfactor capability from it. It is
still allowed to be set for backward compatibility but is just ignored.
---
 doc/src/sgml/ref/create_index.sgml |  8 +++-
 src/backend/access/common/reloptions.c |  4 ++--
 src/backend/access/gist/gist.c | 13 ++---
 src/backend/access/gist/gistbuild.c| 18 +++---
 src/backend/access/gist/gistutil.c |  5 ++---
 src/include/access/gist_private.h  | 12 +++-
 6 files changed, 23 insertions(+), 37 deletions(-)

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 3c1223b324..e2a77e0d4d 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -390,7 +390,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 
 

- The B-tree, hash, GiST and SP-GiST index methods all accept this parameter:
+ The B-tree, hash and SP-GiST index methods all accept this parameter:

 

@@ -412,6 +412,12 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 
+ 
+   
+ GiST also accepts this parameter just for historical reasons but
+ ignores it.
+   
+ 
 


diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index db84da0678..9c9589a78e 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -186,12 +186,12 @@ static relopt_int intRelOpts[] =
 	{
 		{
 			"fillfactor",
-			"Packs gist index pages only to this percentage",
+			"This is ignored but exists for historical reasons",
 			RELOPT_KIND_GIST,
 			ShareUpdateExclusiveLock	/* since it applies only to later
 		 * inserts */
 		},
-		GIST_DEFAULT_FILLFACTOR, GIST_MIN_FILLFACTOR, 100
+		0, 10, 100
 	},
 	{
 		{
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 8a42effdf7..5915ab2cf2 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -172,7 +172,7 @@ gistinsert(Relation r, Datum *values, bool *isnull,
 		 values, isnull, true /* size is currently bogus */ );
 	itup->t_tid = *ht_ctid;
 
-	gistdoinsert(r, itup, 0, giststate);
+	gistdoinsert(r, itup, giststate);
 
 	/* cleanup */
 	MemoryContextSwitchTo(oldCxt);
@@ -212,7 +212,7 @@ gistinsert(Relation r, Datum *values, bool *isnull,
  * Returns 'true' if the page was split, 'false' otherwise.
  */
 bool
-gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
+gistplacetopage(Relation rel, GISTSTATE *giststate,
 Buffer buffer,
 IndexTuple *itup, int ntup, OffsetNumber o

Re: A strange GiST error message or fillfactor of GiST build

2018-09-05 Thread Andrey Borodin



> On 5 Sep 2018, at 13:29, Kyotaro HORIGUCHI
> wrote:
> 
> We don't preserve it for Andrey's use case.
Just my 2 cents: that was a hacky use case for development purposes. I think
that removing fillfactor is a good idea, and your latest patch looks good from
my POV.

Best regards, Andrey Borodin.


Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

2018-09-05 Thread Alexander Korotkov
On Wed, Sep 5, 2018 at 1:45 AM R, Siva  wrote:
> On Tue, Sep 4, 2018 at 09:16 PM, Alexander Korotkov 
>  wrote:
> > Do you have a test scenario for reproduction of this issue?  We need
> > it to ensure that fix is correct.
>
> Unfortunately, I do not have a way of reproducing this issue.
> So far I have tried a workload consisting of inserts (of the
> same attribute value that is indexed), batch deletes of rows
> and vacuum interleaved with engine crash/restarts.

Issue reproduction and testing are essential for a bug fix.  Remember
the last time you reported a GIN bug [1]: after reproducing the issue it
turned out that we had more things to fix.  It's quite clear to me that
if the segment list contains GIN_SEGMENT_INSERT before GIN_SEGMENT_DELETE,
then it might lead to wrong behavior in ginRedoRecompress().  But it's
not yet clear what code path could lead to such a segment
list... I'll explore the code more and will probably come up with some idea.

Links
[1] https://www.postgresql.org/message-id/flat/1531867212836.63354%40amazon.com

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-09-05 Thread Alexander Korotkov
On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
 wrote:
> On Mon, Sep 3, 2018 at 2:30 PM, Alexander Korotkov 
>  wrote:
>>
>> The current version of patch doesn't really distinguish spaces and
>> delimiters in format string in non-FX mode.  So, spaces and delimiters
>> are forming single group.  For me Oracle behavior is ridiculous at
>> least because it doesn't allow cases when input string exactly matches
>> format string.
>>
>> This one fails:
>> SELECT to_timestamp('2018- -01 02', '- -MM DD') FROM dual
>
> Related to below - since this works today it should continue to work.  I was 
> under the wrong impression that we followed their behavior today and were 
> pondering deviating from it because of its ridiculousness.

Right, that's just incompatibility with Oracle behavior, not with our
previous behavior.

>> > The couple of regression tests that change do so for the better.  It would 
>> > be illuminating to set this up as two patches though, one introducing all 
>> > of the new regression tests against the current code and then a second 
>> > patch with the changed behavior with only the affected tests.
>>
>> OK, here you go.  0001-to_timestamp-regression-test-v17.patch
>> introduces changes in regression tests and their output for current
>> master, while 0002-to_timestamp-format-checking-v17.patch contain
>> changes to to_timestamp itself.
>>
>
> From those results the question is how important is it to force the following 
> breakage on our users (i.e., introduce FX exact symbol matching):
>
> SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
> - to_timestamp
> ---
> - Sun Feb 16 00:00:00 1997 PST
> -(1 row)
> -
> +ERROR:  unexpected character "/", expected character ":"
> +HINT:  In FX mode, punctuation in the input string must exactly match the 
> format string.
>
> There seemed to be some implicit approvals of this breakage some 30 emails 
> and 10 months ago but given that this is the only change from a correct 
> result to a failure I'd like to officially put it out there for opinion/vote 
> gathering.   Mine is a -1; though keeping the distinction between space and 
> non-alphanumeric characters is expected.

Do I understand correctly that you're -1 to changes to FX mode, but no
objection to changes in non-FX mode?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



PostgreSQL does not start when max_files_per_process> 1200 on Windows 7

2018-09-05 Thread Victor Spirin

Hi,

I have a bug on Windows 7 with max_files_per_process > 1200.

Calling dup(0) in the count_usable_fds function more than 1200 times (0 =
stdin) causes subsequent unpredictable errors with file operations.

When I open a real file and use its descriptor for dup() instead, no error
occurs. In the patch I use the postgresql.conf file for this check.



--
Victor Spirin
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 8dd51f1..79054e5 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -828,6 +828,15 @@ count_usable_fds(int max_to_probe, int *usable_fds, int *already_open)
 		ereport(WARNING, (errmsg("getrlimit failed: %m")));
 #endif							/* HAVE_GETRLIMIT */
 
+	int			fd_test = 0;
+#ifdef WIN32
+	/*
+	 * We get errors on Windows 7 with max_files_per_process > 1200 when
+	 * dup(0) (stdin) is used, so test with the postgresql.conf file instead.
+	 */
+	fd_test = _open(ConfigFileName, _O_RDONLY);
+	if (fd_test < 0) fd_test = 0;	/* on error, fall back to stdin */
+#endif
 	/* dup until failure or probe limit reached */
 	for (;;)
 	{
@@ -843,7 +852,7 @@ count_usable_fds(int max_to_probe, int *usable_fds, int *already_open)
 			break;
 #endif
 
-		thisfd = dup(0);
+		thisfd = dup(fd_test);
 		if (thisfd < 0)
 		{
 			/* Expect EMFILE or ENFILE, else it's fishy */
@@ -869,7 +878,9 @@ count_usable_fds(int max_to_probe, int *usable_fds, int *already_open)
 	/* release the files we opened */
 	for (j = 0; j < used; j++)
 		close(fd[j]);
-
+#ifdef WIN32
+	if (fd_test > 0) _close(fd_test);
+#endif
 	pfree(fd);
 
/*


Re: Collation versioning

2018-09-05 Thread Christoph Berg
Re: Thomas Munro 2018-09-04 

> I was reminded about that by recent news
> about an upcoming glibc/CLDR resync that is likely to affect
> PostgreSQL users (though, I guess, probably only when they do a major
> OS upgrade).

Or replicating/restoring a database to a newer host.

> ... or, on a Debian system using the locales package, like this:
> 
> libc_collation_version_command = 'dpkg -s locales | grep Version: |
> sed "s/Version: //"'

Ugh. This sounds horribly easy to get wrong on the user side. I could
of course put that preconfigured into the Debian packages, but that
would leave everyone not using one of the standard distro packagings
out in the rain.

> Does anyone know
> of a way to extract a version string from glibc using existing
> interfaces?  I heard there was an undocumented way but I haven't been
> able to find it -- probably because I was, erm, looking in the
> documentation.

That sounds more robust. Googling around:

https://www.linuxquestions.org/questions/linux-software-2/how-to-check-glibc-version-263103/

#include <stdio.h>
#include <gnu/libc-version.h>
int main (void) { puts (gnu_get_libc_version ()); return 0; }

$ ./a.out
2.27

Hopefully that version info is fine-grained enough.

Christoph



Re: Problem while updating a foreign table pointing to a partitioned table on foreign server

2018-09-05 Thread Etsuro Fujita

(2018/08/30 21:58), Etsuro Fujita wrote:

(2018/08/30 20:37), Kyotaro HORIGUCHI wrote:

At Fri, 24 Aug 2018 21:45:35 +0900, Etsuro Fujita wrote in
<5b7ffdef.6020...@lab.ntt.co.jp>

(2018/08/21 11:01), Kyotaro HORIGUCHI wrote:

At Tue, 14 Aug 2018 20:49:02 +0900, Etsuro Fujita wrote in
<5b72c1ae.8010...@lab.ntt.co.jp>

(2018/08/09 22:04), Etsuro Fujita wrote:

(2018/08/08 17:30), Kyotaro HORIGUCHI wrote:



I spent more time looking at the patch. ISTM that the patch nicely
suppresses the effect of the tuple-descriptor expansion by making
changes to code in the planner and executor (and ruleutils.c), but I'm
still not sure that the patch is the right direction to go in, because
ISTM that expanding the tuple descriptor on the fly might be a wart.



The expansion should be safe if the expanded descriptor has the
same definitions for the base columns and all the extended columns are
junk. The junk columns should be ignored by unrelated nodes, and
they are passed along safely as long as ForeignModify passes tuples as
is from the underlying ForeignScan to ForeignUpdate/Delete.


I'm not sure that would be really safe. Does that work well with
EvalPlanQual, for example?


I was wrong here; I assumed that we supported late locking for an
UPDATE or DELETE on a foreign table, and I was a bit concerned that the
approach you proposed might not work well with EvalPlanQual, but as
described in fdwhandler.sgml, the core doesn't support that:


    For an UPDATE or DELETE on a foreign table, it is recommended that the
    ForeignScan operation on the target table perform early locking on the
    rows that it fetches, perhaps via the equivalent of SELECT FOR UPDATE.
    An FDW can detect whether a table is an UPDATE/DELETE target at plan
    time by comparing its relid to root->parse->resultRelation, or at
    execution time by using ExecRelationIsTargetRelation().

    An alternative possibility is to perform late locking within the
    ExecForeignUpdate or ExecForeignDelete callback, but no special support
    is provided for this.

So, there would be no need to consider EvalPlanQual.  Sorry for
the noise.


Best regards,
Etsuro Fujita



Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

2018-09-05 Thread Alexander Korotkov
On Wed, Sep 5, 2018 at 12:26 PM Alexander Korotkov
 wrote:
> On Wed, Sep 5, 2018 at 1:45 AM R, Siva  wrote:
> > On Tue, Sep 4, 2018 at 09:16 PM, Alexander Korotkov 
> >  wrote:
> > > Do you have a test scenario for reproduction of this issue?  We need
> > > it to ensure that fix is correct.
> >
> > Unfortunately, I do not have a way of reproducing this issue.
> > So far I have tried a workload consisting of inserts (of the
> > same attribute value that is indexed), batch deletes of rows
> > and vacuum interleaved with engine crash/restarts.
>
> Issue reproduction and testing are essential for a bug fix.  Remember
> the last time you reported a GIN bug [1]: after reproducing the issue it
> turned out that we had more things to fix.  It's quite clear to me that
> if the segment list contains GIN_SEGMENT_INSERT before GIN_SEGMENT_DELETE,
> then it might lead to wrong behavior in ginRedoRecompress().  But it's
> not yet clear what code path could lead to such a segment
> list... I'll explore the code more and will probably come up with some idea.

Aha, I've managed to reproduce this.
1. Apply ginRedoRecompress-asserts.patch, which adds assertions to
ginRedoRecompress() detecting writes past the opaque data.
2. Setup streaming replication.
3. Execute following on the master.
create or replace function test () returns void as $$
declare
i int;
begin
FOR i IN 1..1000 LOOP
  insert into test values ('{1}');
end loop;
end
$$ language plpgsql;
create table test (a int[]);
insert into test (select '{}'::int[] from generate_series(1,1));
insert into test (select '{1}'::int[] from generate_series(1,10));
create index test_idx on test using gin(a) with (fastupdate = off);
delete from test where a = '{}'::int[];
vacuum test;
select test();

So, since we managed to reproduce this, I'm going to test and review your fix.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7a1e94a1d56..301c527cab9 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -276,6 +276,7 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 break;
 
 			case GIN_SEGMENT_INSERT:
+Assert(segptr + newsegsize + szleft < PageGetSpecialPointer(page));
 /* make room for the new segment */
 memmove(segptr + newsegsize, segptr, szleft);
 /* copy the new segment in place */
@@ -285,6 +286,8 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 break;
 
 			case GIN_SEGMENT_REPLACE:
+Assert(segptr + newsegsize + (szleft - segsize) < PageGetSpecialPointer(page));
+Assert(segptr + szleft < PageGetSpecialPointer(page));
 /* shift the segments that follow */
 memmove(segptr + newsegsize,
 		segptr + segsize,


Re: [HACKERS] Bug in to_timestamp().

2018-09-05 Thread amul sul
On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
 wrote:
>
> On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
>  wrote:
> > On Mon, Sep 3, 2018 at 2:30 PM, Alexander Korotkov 
> >  wrote:
> >>
> >> The current version of patch doesn't really distinguish spaces and
> >> delimiters in format string in non-FX mode.  So, spaces and delimiters
> >> are forming single group.  For me Oracle behavior is ridiculous at
> >> least because it doesn't allow cases when input string exactly matches
> >> format string.
> >>
> >> This one fails:
> >> SELECT to_timestamp('2018- -01 02', '- -MM DD') FROM dual
> >
> > Related to below - since this works today it should continue to work.  I 
> > was under the wrong impression that we followed their behavior today and 
> > were pondering deviating from it because of its ridiculousness.
>
> Right, that's just incompatibility with Oracle behavior, not with our
> previous behavior.
>
> >> > The couple of regression tests that change do so for the better.  It 
> >> > would be illuminating to set this up as two patches though, one 
> >> > introducing all of the new regression tests against the current code and 
> >> > then a second patch with the changed behavior with only the affected 
> >> > tests.
> >>
> >> OK, here you go.  0001-to_timestamp-regression-test-v17.patch
> >> introduces changes in regression tests and their output for current
> >> master, while 0002-to_timestamp-format-checking-v17.patch contain
> >> changes to to_timestamp itself.
> >>
> >
> > From those results the question is how important is it to force the 
> > following breakage on our users (i.e., introduce FX exact symbol matching):
> >
> > SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
> > - to_timestamp
> > ---
> > - Sun Feb 16 00:00:00 1997 PST
> > -(1 row)
> > -
> > +ERROR:  unexpected character "/", expected character ":"
> > +HINT:  In FX mode, punctuation in the input string must exactly match the 
> > format string.
> >
> > There seemed to be some implicit approvals of this breakage some 30 emails 
> > and 10 months ago but given that this is the only change from a correct 
> > result to a failure I'd like to officially put it out there for 
> > opinion/vote gathering.   Mine is a -1; though keeping the distinction 
> > between space and non-alphanumeric characters is expected.
>
> Do I understand correctly that you're -1 to changes to FX mode, but no
> objection to changes in non-FX mode?
>
Ditto.

Regards,
Amul Sul.



Re: [WIP] [B-Tree] Retail IndexTuple deletion

2018-09-05 Thread Andrey Lepikhov

Hi,

I prepared the next version of the background worker (cleaner) based on the
retail index tuple deletion patch.
This version shows stable behavior on regression tests and pgbench
workloads.

In this version:
1. Only AccessShareLock is acquired on cleanup of heap and index
relations.
2. An 'aggressive' cleanup strategy is introduced: conditional cleanup
locks are not used.
3. Only in-memory blocks are cleaned up.
4. The cleaner calls heap_page_prune() before cleaning up a block.

Benchmarks
-

Two factors were evaluated: performance (tps) and relation bloat.

Before each test, some rarefaction of pgbench_accounts was modeled by
deleting 10% of the tuples in each block.
Also, I tested normal and Gaussian distributions of queries on the
pgbench_accounts relation.

Autovacuum uses default settings.

Script:
pgbench -i -s 10
psql -c $"DELETE FROM pgbench_accounts WHERE (random() < 0.1);"
psql -c $"VACUUM;"
psql -c $"CREATE INDEX pgbench_accounts_ext ON public.pgbench_accounts 
USING btree (abalance);" &&

pgbench -T 3600 -c 32 -j 8 -M prepared -P 600

NORMAL distribution:
average tps = 1045 (cleaner); = 1077 (autovacuum)

Relation sizes at the end of the test, MB:
pgbench_accounts: 128 (cleaner); 128 (autovacuum)
pgbench_branches: 0.1 (cleaner); 2.1 (autovacuum)
pgbench_tellers: 0.4 (cleaner); 2.8 (autovacuum)
pgbench_accounts_pkey: 21 (cleaner); 43 (autovacuum)
pgbench_accounts_ext: 48 (cleaner); 56 (autovacuum)

Gaussian distribution:
average tps = 213 (cleaner); = 213 (autovacuum)

Relation sizes at the end of the test, MB:
pgbench_accounts: 128 (cleaner); 128 (autovacuum)
pgbench_accounts_ext: 22 (cleaner); 29 (autovacuum)

Conclusions
---
1. For retail index tuple deletion purposes I replaced ItemIdSetDead() with
ItemIdMarkDead() in the heap_page_prune_execute() operation. Consequently,
when every relation block is filled 100%, we get a bloating heap and
index, more or less. When the blocks already have free space, the
cleaner can delay bloating of the heap and index without a vacuum.
2. The cleaner works fine when access frequency to relation blocks is
skewed.
3. The cleaner does not cause a decrease in performance.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
From b465357b8a884cac3b503ad171976d3afd31f574 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 5 Sep 2018 10:39:11 +0300
Subject: [PATCH 5/5] Heap and Index cleaner

---
 .../postgres_fdw/expected/postgres_fdw.out|  30 ++--
 contrib/postgres_fdw/sql/postgres_fdw.sql |   4 +-
 src/backend/access/heap/pruneheap.c   |   6 +-
 src/backend/access/nbtree/nbtree.c|  15 +-
 src/backend/access/transam/xact.c |   4 +
 src/backend/catalog/system_views.sql  |  11 ++
 src/backend/commands/vacuumlazy.c |  44 +++--
 src/backend/postmaster/Makefile   |   2 +-
 src/backend/postmaster/pgstat.c   |  36 
 src/backend/postmaster/postmaster.c   | 160 +-
 src/backend/storage/buffer/bufmgr.c   |  60 ++-
 src/backend/storage/buffer/localbuf.c |  24 +++
 src/backend/storage/ipc/ipci.c|   3 +
 src/backend/storage/lmgr/lwlocknames.txt  |   1 +
 src/backend/storage/lmgr/proc.c   |   5 +-
 src/backend/tcop/postgres.c   |  12 ++
 src/backend/utils/adt/pgstatfuncs.c   |   2 +
 src/backend/utils/hash/Makefile   |   2 +-
 src/backend/utils/init/miscinit.c |   3 +-
 src/backend/utils/init/postinit.c |  11 +-
 src/include/commands/progress.h   |  12 ++
 src/include/commands/vacuum.h |   3 +
 src/include/pgstat.h  |  14 +-
 src/include/storage/buf_internals.h   |   1 +
 src/include/storage/bufmgr.h  |   5 +-
 src/include/storage/pmsignal.h|   2 +
 src/test/regress/expected/rules.out   |  18 +-
 src/test/regress/expected/triggers.out|   2 +-
 src/test/regress/input/constraints.source |  12 +-
 src/test/regress/output/constraints.source|  34 ++--
 src/test/regress/sql/rules.sql|   2 +-
 src/test/regress/sql/triggers.sql |   2 +-
 32 files changed, 458 insertions(+), 84 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f5498c62bd..622bea2a7d 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -5496,19 +5496,25 @@ ALTER SERVER loopback OPTIONS (DROP extensions);
 INSERT INTO ft2 (c1,c2,c3)
   SELECT id, id % 10, to_char(id, 'FM0') FROM generate_series(2001, 2010) id;
 EXPLAIN (verbose, costs off)
-UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;-- can't be pushed down
-QUERY PLAN
-

Re: [HACKERS] Bug in to_timestamp().

2018-09-05 Thread Alexander Korotkov
On Wed, Sep 5, 2018 at 3:10 PM amul sul  wrote:
> On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
>  wrote:
> > On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
> >  wrote:
> > > From those results the question is how important is it to force the 
> > > following breakage on our users (i.e., introduce FX exact symbol 
> > > matching):
> > >
> > > SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
> > > - to_timestamp
> > > ---
> > > - Sun Feb 16 00:00:00 1997 PST
> > > -(1 row)
> > > -
> > > +ERROR:  unexpected character "/", expected character ":"
> > > +HINT:  In FX mode, punctuation in the input string must exactly match 
> > > the format string.
> > >
> > > There seemed to be some implicit approvals of this breakage some 30 
> > > emails and 10 months ago but given that this is the only change from a 
> > > correct result to a failure I'd like to officially put it out there for 
> > > opinion/vote gathering.   Mine is a -1; though keeping the distinction 
> > > between space and non-alphanumeric characters is expected.
> >
> > Do I understand correctly that you're -1 to changes to FX mode, but no
> > objection to changes in non-FX mode?
> >
> Ditto.

So, if no objections for non-FX mode changes, then I'll extract that
part and commit it separately.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: pointless check in RelationBuildPartitionDesc

2018-09-05 Thread Alvaro Herrera
On 2018-Sep-05, Amit Langote wrote:

> On 2018/09/05 1:50, Alvaro Herrera wrote:
> > Proposed patch.  Checking isnull in a elog(ERROR) is important, because
> > the column is not marked NOT NULL.  This is not true for other columns
> > where we simply do Assert(!isnull).
> 
> Looks good.  Thanks for taking care of other sites as well.
> 
> @@ -14705,7 +14705,9 @@ ATExecDetachPartition(Relation rel, RangeVar *name)
> 
>   (void) SysCacheGetAttr(RELOID, tuple, Anum_pg_class_relpartbound,
>  &isnull);
> - Assert(!isnull);
> + if (isnull)
> + elog(ERROR, "null relpartbound for relation %u",
> +  RelationGetRelid(partRel));
> 
> In retrospect, I'm not sure why this piece of code is here at all; maybe
> just remove the SysCacheGetAttr and Assert?

Yeah, good idea, will do.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-05 Thread Tom Lane
Peter Eisentraut  writes:
> On 05/09/2018 02:51, Andres Freund wrote:
>> My current proposal is thus to add a check that does
>> #if defined(__clang__) && defined(__i386__) && !defined(__SSE2_MATH__)
>> something-that-fails
>> #endif
>> in an autoconf test, and have configure complain if that fails.

> Would this not be invalidated if the bug you have filed gets fixed?

Perhaps, but how would autoconf tell that?  I wouldn't have any confidence
in a runtime test, as it might or might not trigger the bug depending on
all sorts of factors like -O level.

In any case, it seems like "use -msse2 on x86" is good advice from a
performance standpoint even if it doesn't prevent a compiler bug.

One thought is that maybe we should provide a way to override this,
in case somebody really can't or doesn't want to use -msse2, and
is willing to put up with platform-dependent float behavior.

regards, tom lane



Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

2018-09-05 Thread Alexander Korotkov
On Wed, Sep 5, 2018 at 2:39 PM Alexander Korotkov
 wrote:
> On Wed, Sep 5, 2018 at 12:26 PM Alexander Korotkov
>  wrote:
> > On Wed, Sep 5, 2018 at 1:45 AM R, Siva  wrote:
> > > On Tue, Sep 4, 2018 at 09:16 PM, Alexander Korotkov 
> > >  wrote:
> > > > Do you have a test scenario for reproduction of this issue?  We need
> > > > it to ensure that fix is correct.
> > >
> > > Unfortunately, I do not have a way of reproducing this issue.
> > > So far I have tried a workload consisting of inserts (of the
> > > same attribute value that is indexed), batch deletes of rows
> > > and vacuum interleaved with engine crash/restarts.
> >
> > Issue reproduction and testing are essential for a bug fix.  Remember
> > the last time you reported a GIN bug [1]: after reproducing the issue it
> > turned out that we had more things to fix.  It's quite clear to me that
> > if the segment list contains GIN_SEGMENT_INSERT before GIN_SEGMENT_DELETE,
> > then it might lead to wrong behavior in ginRedoRecompress().  But it's
> > not yet clear what code path could lead to such a segment
> > list... I'll explore the code more and will probably come up with some idea.
>
> Aha, I've managed to reproduce this.
> 1. Apply ginRedoRecompress-asserts.patch, which adds assertions to
> ginRedoRecompress() detecting writes past the opaque data.

It was wrong, sorry.  It appears that I put strict inequalities into the
asserts, while there should be loose inequalities.  The correct version of
the patch is attached.  And the scenario I posted doesn't really reproduce
the bug...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


ginRedoRecompress-asserts-v2.patch
Description: Binary data


Re: pgsql: Clean up after TAP tests in oid2name and vacuumlo.

2018-09-05 Thread Alvaro Herrera
On 2018-Sep-04, Michael Paquier wrote:

> OK.  I have dug into that, and finished with the attached.  What do you
> think?  One thing is that the definition of submake is moving out of
> REGRESS, and .PHONY gets defined.

Should this be used in src/test/modules/{brin,test_commit_ts} also?

Why do you guys design Makefile rules in pgsql-committers?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Collation versioning

2018-09-05 Thread Thomas Munro
On Wed, Sep 5, 2018 at 3:35 AM Christoph Berg  wrote:
> Re: Thomas Munro 2018-09-04 
> 
> > I was reminded about that by recent news
> > about an upcoming glibc/CLDR resync that is likely to affect
> > PostgreSQL users (though, I guess, probably only when they do a major
> > OS upgrade).
>
> Or replicating/restoring a database to a newer host.

Yeah.

> > Does anyone know
> > of a way to extract a version string from glibc using existing
> > interfaces?  I heard there was an undocumented way but I haven't been
> > able to find it -- probably because I was, erm, looking in the
> > documentation.
>
> That sounds more robust. Googling around:
>
> https://www.linuxquestions.org/questions/linux-software-2/how-to-check-glibc-version-263103/
>
> #include <stdio.h>
> #include <gnu/libc-version.h>
> int main (void) { puts (gnu_get_libc_version ()); return 0; }
>
> $ ./a.out
> 2.27
>
> Hopefully that version info is fine-grained enough.

Hmm.  I was looking for locale data version, not libc.so itself.  I
realise they come ultimately from the same source package, but are the
locale definitions and libc6 guaranteed to be updated at the same
time?  I see that the locales package depends on libc-bin >> 2.27 (no
upper bound), which in turn depends on libc6 >> 2.27, << 2.28.  So
perhaps you can have a system with locales 2.27 and libc6 2.28?  I
also wonder if there are some configurations where they could get out
of sync because of manual use of locale-gen or something like that.
Clearly most systems would have them in sync through apt-get upgrade
though, so maybe gnu_get_libc_version() would work well in practice?

-- 
Thomas Munro
http://www.enterprisedb.com



Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-05 Thread Chris Travers
Hi all;

For the last few months we have been facing a funny problem on a slave
where queries go to 100% CPU usage and never finish, causing the recovery
process to hang and the replica to fall behind.  Over time we ruled out a
lot of causes and were banging our heads against this one.  Today we got a
break in it when we attached a debugger to various processes, even without
debugging symbols.  Not only did we get useful stack traces from the hung
query, but attaching a debugger to the startup process caused the query to
finish.  This has shown up in 10.2 and 10.5.

Based on the stack traces we have concluded the following seems to happen:

1.  The query is in a parallel index scan or similar.
2.  A process is executing a parallel plan and allocating a significant
chunk of memory (2MB for example) in dynamic shared memory.
3.  The startup process goes into a loop where it sends a sigusr1, sleeps
5ms, sends another sigusr1, etc.
4.  The sigusr1 aborts the system call, which is then retried.
5.  Because the system call takes more than 5ms, we end up in an endless
loop.

I see two possible solutions here.
1.  Exponential backoff in sending signals, to maybe 1s max.
2.  Check for signals before retrying the system call, if there is any way
to do that (I am not 100% sure where it is yet, but I'm on my way).

Any feedback or thoughts before we look at implementing a patch?
-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-05 Thread Thomas Munro
On Wed, Sep 5, 2018 at 8:23 AM Chris Travers  wrote:
> 1.  The query is in a parallel index scan or similar
> 2.  A process is executing a parallel plan and allocating a significant chunk 
> of memory (2MB for example) in dynamic shared memory.
> 3.  The startup process goes into a loop where it sends a sigusr1, sleeps 5m, 
> and sends another sigusr1 etc.
> 4.  The sigusr1 aborts the system call, which is then retried.
> 5.  Because the system call takes more than 5ms, we end up in an endless loop

Do you mean this loop in dsm_impl_posix_resize() is getting
interrupted constantly and never completing?

/* We may get interrupted, if so just retry. */
do
{
rc = posix_fallocate(fd, 0, size);
} while (rc == EINTR);

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Out arguments name of "pg_identify_object_as_address" function in 9.5.14 and 11beta3

2018-09-05 Thread Tom Lane
Alvaro Herrera  writes:
> On 2018-Sep-03, Tom Lane wrote:
>> I do not think we can change the names of the output arguments;
>> it'd break existing queries.  However, renaming input arguments
>> shouldn't affect anything.  So I propose we make pg_get_object_address'
>> input arguments be named type,object_names,object_args for
>> consistency with the other function, and update the docs to match.

> Hmm, I don't think it's possible to rename input args without breaking
> working code either:

Yeah, Andrew noted the same ...

> That said, I haven't heard of anyone using these functions in code yet,
> so if we change it in 11 or 12 nobody is going to complain.

... and that's pretty much my feeling.  It seems really unlikely that
anyone's using named-argument notation for pg_get_object_address, and
even if they are, it wouldn't be very painful to change, or just not
use the notation if they need cross-branch compatibility.  I think it's
more useful in the long run to make the names consistent.

Will go take care of it.

regards, tom lane



Re: Out arguments name of "pg_identify_object_as_address" function in 9.5.14 and 11beta3

2018-09-05 Thread Alvaro Herrera
On 2018-Sep-05, Tom Lane wrote:

> Alvaro Herrera  writes:

> > That said, I haven't heard of anyone using these functions in code yet,
> > so if we change it in 11 or 12 nobody is going to complain.
> 
> ... and that's pretty much my feeling.  It seems really unlikely that
> anyone's using named-argument notation for pg_get_object_address, and
> even if they are, it wouldn't be very painful to change, or just not
> use the notation if they need cross-branch compatibility.  I think it's
> more useful in the long run to make the names consistent.
> 
> Will go take care of it.

Agreed, thanks.  If you haven't touched the docs yet, here's the change
-- 9.5/9.6/10 need a slight adjustment from 11/master.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index bb794e044f..853e5fee7c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17701,7 +17701,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
   
   
    pg_identify_object_as_address(catalog_id oid, object_id oid, object_sub_id integer)
-   type text, name text[], args text[]
+   type text, object_names text[], object_args text[]
    get external representation of a database object's address
   
   
@@ -17745,7 +17745,8 @@ SELECT collation for ('foo' COLLATE "de_DE");
    information is independent of the current server, that is, it could be used
    to identify an identically named object in another server.
    type identifies the type of database object;
-   name and args are text arrays that together
+   object_names and object_args
+   are text arrays that together
    form a reference to the object.  These three columns can be passed to
    pg_get_object_address to obtain the internal address
    of the object.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 5e95325250..7f903fa2d1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -17016,7 +17016,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
   
   
    pg_identify_object_as_address(catalog_id oid, object_id oid, object_sub_id integer)
-   type text, name text[], args text[]
+   type text, object_names text[], object_args text[]
    get external representation of a database object's address
   
   
@@ -17060,7 +17060,7 @@ SELECT collation for ('foo' COLLATE "de_DE");
    information is independent of the current server, that is, it could be used
    to identify an identically named object in another server.
    type identifies the type of database object;
-   name and args are text arrays that together
+   object_names and object_args are text arrays that together
    form a reference to the object.  These three columns can be passed to
    pg_get_object_address to obtain the internal address
    of the object.


Re: Prevent concurrent DROP SCHEMA when certain objects are being initially created in the namespace

2018-09-05 Thread Andres Freund
Hi,

On 2018-09-05 01:05:54 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On September 4, 2018 9:11:25 PM PDT, Tom Lane  wrote:
> >> I think that line of thought leads to an enormous increase in locking
> >> overhead, for which we'd get little if any gain in usability.  So my
> >> inclination is to make an engineering judgment that we won't fix this.
> 
> > Haven't we already significantly started down this road, to avoid a lot of 
> > the "tuple concurrently updated" type errors?
> 
> Not that I'm aware of.  We do not take locks on schemas, nor functions,
> nor any other of the object types I mentioned.

Well, we kinda do, during some of their own DDL. Cf.
AcquireDeletionLock(), RangeVarGetAndCheckCreationNamespace(), and other
LockDatabaseObject() callers.  The
RangeVarGetAndCheckCreationNamespace() even locks the schema an object
is created in, which is pretty much what we're discussing here.

I think the problem with the current logic is more that the
findDependentObjects() scan doesn't use a "dirty" scan, so it never
gets to see conflicting operations.


> > Would expanding this a bit further really be that noticeable?
> 
> Frankly, I think it would be not so much "noticeable" as "disastrous".
> 
> Making the overhead tolerable would require very large compromises
> in coverage, perhaps like "we'll only lock during DDL not DML".
> At which point I'd question why bother.  We've seen no field reports
> (that I can recall offhand, anyway) that trace to not locking these
> objects.

Why would "we'll only lock during DDL not DML" be such a large
compromise? To me that's a pretty darn reasonable subset - preventing
corruption of the catalog contents is, uh, good?

Greetings,

Andres Freund



Re: PostgreSQL does not start when max_files_per_process> 1200 on Windows 7

2018-09-05 Thread Andres Freund
Hi,

On 2018-09-05 13:06:06 +0300, Victor Spirin wrote:
> I have a bug on Windows 7 with max_files_per_process > 1200.
> 
> Calling dup(0) in the count_usable_fds function more than 1200 times (0 =
> stdin) causes subsequent unpredictable errors with file operations.

Could you expand a bit on this?

Greetings,

Andres Freund



Re: PL/Python: Remove use of simple slicing API

2018-09-05 Thread Peter Eisentraut
On 29/08/2018 11:37, Peter Eisentraut wrote:
> I have found some dying code in PL/Python.
> 
> The simple slicing API (sq_slice, sq_ass_slice) has been deprecated
> since Python 2.0 and has been removed altogether in Python 3, so we can
> remove those functions from the PLyResult class.  Instead, the non-slice
> mapping functions mp_subscript and mp_ass_subscript can take slice
> objects as an index.  Since we just pass the index through to the
> underlying list object, we already support that.  Test coverage was
> already in place.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Bug in to_timestamp().

2018-09-05 Thread amul sul
On Wed, Sep 5, 2018, 6:35 PM Alexander Korotkov 
wrote:

> On Wed, Sep 5, 2018 at 3:10 PM amul sul  wrote:
> > On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
> >  wrote:
> > > On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
> > >  wrote:
> > > > From those results the question is how important is it to force the
> following breakage on our users (i.e., introduce FX exact symbol matching):
> > > >
> > > > SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
> > > > - to_timestamp
> > > > ---
> > > > - Sun Feb 16 00:00:00 1997 PST
> > > > -(1 row)
> > > > -
> > > > +ERROR:  unexpected character "/", expected character ":"
> > > > +HINT:  In FX mode, punctuation in the input string must exactly
> match the format string.
> > > >
> > > > There seemed to be some implicit approvals of this breakage some 30
> emails and 10 months ago but given that this is the only change from a
> correct result to a failure I'd like to officially put it out there for
> opinion/vote gathering.   Mine is a -1; though keeping the distinction
> between space and non-alphanumeric characters is expected.
> > >
> > > Do I understand correctly that you're -1 to changes to FX mode, but no
> > > objection to changes in non-FX mode?
> > >
> > Ditto.
>
> So, if no objections for non-FX mode changes, then I'll extract that
> part and commit it separately.
>

Yeah, that makes sense to me, thank you.

Regards,
Amul

>


Re: Collation versioning

2018-09-05 Thread Thomas Munro
On Wed, Sep 5, 2018 at 8:20 AM Thomas Munro
 wrote:
> On Wed, Sep 5, 2018 at 3:35 AM Christoph Berg  wrote:
> > int main (void) { puts (gnu_get_libc_version ()); return 0; }
> >
> > $ ./a.out
> > 2.27
>
> Hmm.  I was looking for locale data version, not libc.so itself.  I
> realise they come ultimately from the same source package, but are the
> locale definitions and libc6 guaranteed to be updated at the same
> time?

And even if they are, what if your cluster is still running and still
has the older libc.so.6 mapped in?  Newly forked backends will see new
locale data but gnu_get_libc_version() will return the old string.
(Pointed out off-list by Andres.)  Eventually you restart your cluster
and start seeing the error.

So, it's not ideal but perhaps worth considering on the grounds that
it's better than nothing?

--
Thomas Munro
http://www.enterprisedb.com



Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-05 Thread Chris Travers
Yep.  Maybe we should check for signals there.

On Wed, Sep 5, 2018 at 5:27 PM Thomas Munro 
wrote:

> On Wed, Sep 5, 2018 at 8:23 AM Chris Travers 
> wrote:
> > 1.  The query is in a parallel index scan or similar
> > 2.  A process is executing a parallel plan and allocating a significant
> chunk of memory (2MB for example) in dynamic shared memory.
> > 3.  The startup process goes into a loop where it sends a sigusr1,
> sleeps 5m, and sends another sigusr1 etc.
> > 4.  The sigusr1 aborts the system call, which is then retried.
> > 5.  Because the system call takes more than 5ms, we end up in an endless
> loop
>
> Do you mean this loop in dsm_impl_posix_resize() is getting
> interrupted constantly and never completing?
>
> /* We may get interrupted, if so just retry. */
> do
> {
> rc = posix_fallocate(fd, 0, size);
> } while (rc == EINTR);
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-05 Thread Andres Freund
Hi,

On 2018-09-05 10:05:26 -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > On 05/09/2018 02:51, Andres Freund wrote:
> >> My current proposal is thus to add a check that does
> >> #if defined(__clang__) && defined(__i386__) && !defined(__SSE2_MATH__)
> >> something-that-fails
> >> #endif
> >> in an autoconf test, and have configure complain if that fails.
> 
> > Would this not be invalidated if the bug you have filed gets fixed?

Realistically we're going to be running into old versions of clang for a
long time.  And the importance of running i386 without SSE2 surely isn't
increasing.  So I don't really see an urgent need to do anything about
it. And if it gets fixed, and we care, we can just add a clang version
check to the test.
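
For illustration, a minimal sketch of the test program such a configure
probe could compile (an assumption about the wiring, not a committed
design; "choke me" is the usual autoconf trick for an intentional
compilation failure):

    /*
     * Hypothetical configure probe: compilation fails deliberately when
     * the compiler is clang targeting 32-bit x86 without SSE2 math, so
     * configure can reject that combination with a helpful message.
     */
    #if defined(__clang__) && defined(__i386__) && !defined(__SSE2_MATH__)
    choke me	/* intentional compile error: use -msse2 or use gcc */
    #endif

    int
    main(void)
    {
    	return 0;
    }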


> Perhaps, but how would autoconf tell that?  I wouldn't have any confidence
> in a runtime test, as it might or might not trigger the bug depending on
> all sorts of factors like -O level.
> 
> In any case, it seems like "use -msse2 on x86" is good advice from a
> performance standpoint even if it doesn't prevent a compiler bug.

Indeed.


> One thought is that maybe we should provide a way to override this,
> in case somebody really can't or doesn't want to use -msse2, and
> is willing to put up with platform-dependent float behavior.

IDK, people that are capable of making that decision can just hack
configure.

Greetings,

Andres Freund



Re: [HACKERS] Proposal to add work_mem option to postgres_fdw module

2018-09-05 Thread Peter Eisentraut
On 01/09/2018 06:33, Shinoda, Noriyoshi (PN Japan GCS Delivery) wrote:
> Certainly the PQconndefaults function specifies the Debug flag for the
> "options" option.
> I think that eliminating the Debug flag is the simplest solution.
> With the attached patches, a GUC can be specified with the following syntax:
> 
> CREATE SERVER remsvr1 FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 'host 
> 1', ..., options '-c work_mem=64MB -c geqo=off');
> ALTER SERVER remsv11 OPTIONS (SET options '-c work_mem=64MB -c geqo=off');
> 
> However, I am afraid that this patch will change the behavior
> of the official API PQconndefaults.

It doesn't change the behavior of the API; it just changes the result of
the API function, which is legitimate and the reason we have the API
function in the first place.

I think this patch is fine.  I'll work on committing it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-05 Thread Chris Travers
Will submit a patch here shortly.  Thanks!  Should we do for master and
10?  Or 9.6 too?

On Wed, Sep 5, 2018 at 6:40 PM Chris Travers 
wrote:

> Yep,  Maybe we should check for signals there.
>
> On Wed, Sep 5, 2018 at 5:27 PM Thomas Munro 
> wrote:
>
>> On Wed, Sep 5, 2018 at 8:23 AM Chris Travers 
>> wrote:
>> > 1.  The query is in a parallel index scan or similar
>> > 2.  A process is executing a parallel plan and allocating a significant
>> chunk of memory (2MB for example) in dynamic shared memory.
>> > 3.  The startup process goes into a loop where it sends a sigusr1,
>> sleeps 5m, and sends another sigusr1 etc.
>> > 4.  The sigusr1 aborts the system call, which is then retried.
>> > 5.  Because the system call takes more than 5ms, we end up in an
>> endless loop
>>
>> Do you mean this loop in dsm_impl_posix_resize() is getting
>> interrupted constantly and never completing?
>>
>> /* We may get interrupted, if so just retry. */
>> do
>> {
>> rc = posix_fallocate(fd, 0, size);
>> } while (rc == EINTR);
>>
>> --
>> Thomas Munro
>> http://www.enterprisedb.com
>>
>
>
> --
> Best Regards,
> Chris Travers
> Head of Database
>
> Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
> Saarbrücker Straße 37a, 10405 Berlin
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-05 Thread Thomas Munro
On Wed, Sep 5, 2018 at 9:49 AM Chris Travers  wrote:
>
> Will submit a patch here shortly.  Thanks!  Should we do for master and 10?  
> Or 9.6 too?
>
> On Wed, Sep 5, 2018 at 6:40 PM Chris Travers  wrote:
>>
>> Yep.  Maybe we should check for signals there.

Yeah, it seems reasonable to check for interrupts here.  It would need
to be back-patched as far as 9.4.
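
For concreteness, a minimal sketch of such a change (an assumption about
how the fix might look, not a final patch) would stop retrying once a
cancel or die request is pending:

    /* We may get interrupted; retry unless a die/cancel request is pending. */
    do
    {
        rc = posix_fallocate(fd, 0, size);
    } while (rc == EINTR && !(ProcDiePending || QueryCancelPending));

Any EINTR that escapes the loop would then be reported as an error to the
caller, which has to be able to clean up the partially created segment
safely.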

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-05 Thread Peter Eisentraut
On 05/09/2018 18:42, Andres Freund wrote:
> Realistically we're going to be running into old versions of clang for a
> long time.  And the importance of running i386 without SSE2 surely isn't
> increasing.  So I don't really see an urgent need to do anything about
> it. And if it gets fixed, and we care, we can just add a clang version
> check to the test.

Another option perhaps is to let this be and accept it as alternative
floating point behavior.  We already have some of those.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-05 Thread Andres Freund
Hi,

On 2018-09-05 18:48:44 +0200, Chris Travers wrote:
> Will submit a patch here shortly.  Thanks!  Should we do for master and
> 10?  Or 9.6 too?

Please don't top-post on this list.  This needs to be done in all
branches where the posix_fallocate call is present.

> > Yep.  Maybe we should check for signals there.
> >
> > On Wed, Sep 5, 2018 at 5:27 PM Thomas Munro 
> > wrote:
> >
> >> On Wed, Sep 5, 2018 at 8:23 AM Chris Travers 
> >> wrote:
> >> > 1.  The query is in a parallel index scan or similar
> >> > 2.  A process is executing a parallel plan and allocating a significant
> >> chunk of memory (2MB for example) in dynamic shared memory.
> >> > 3.  The startup process goes into a loop where it sends a sigusr1,
> >> sleeps 5m, and sends another sigusr1 etc.
> >> > 4.  The sigusr1 aborts the system call, which is then retried.
> >> > 5.  Because the system call takes more than 5ms, we end up in an
> >> endless loop

What you're presumably encountering here is a recovery conflict.


> On Wed, Sep 5, 2018 at 6:40 PM Chris Travers 
> wrote:
> >> Do you mean this loop in dsm_impl_posix_resize() is getting
> >> interrupted constantly and never completing?
> >>
> >> /* We may get interrupted, if so just retry. */
> >> do
> >> {
> >> rc = posix_fallocate(fd, 0, size);
> >> } while (rc == EINTR);
> >>

Probably worthwhile to check that the dsm code is properly robust if
errors are thrown from within here.


Greetings,

Andres Freund



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-05 Thread Andres Freund
On 2018-09-05 18:55:34 +0200, Peter Eisentraut wrote:
> On 05/09/2018 18:42, Andres Freund wrote:
> > Realistically we're going to be running into old versions of clang for a
> > long time.  And the importance of running i386 without SSE2 surely isn't
> > increasing.  So I don't really see an urgent need to do anything about
> > it. And if it gets fixed, and we care, we can just add a clang version
> > check to the test.
> 
> Another option perhaps is to let this be and accept it as alternative
> floating point behavior.  We already have some of those.

-many.  We'd directly violate our own error rules.  I'm actually
personally in favor of not throwing an error when a float overflows - it's
imo not actually useful and costs performance - but throwing an error
only sometimes, depending on the specific register allocator behaviour of
a specific version of a compiler, is bad.  It's really weird to return
[+-]Infinity depending on just *how much* you overflowed.

Greetings,

Andres Freund



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-05 Thread Tom Lane
Andres Freund  writes:
> On 2018-09-05 18:55:34 +0200, Peter Eisentraut wrote:
>> Another option perhaps is to let this be and accept it as alternative
>> floating point behavior.  We already have some of those.

> -many.  We'd directly violate our own error rules.

I think that just from a regression-test-maintenance standpoint, this
approach would be a mess.  We could expect for instance that such
problems would come and go in weird places in the geometric tests.

regards, tom lane



Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-05 Thread Chris Travers
On Wed, Sep 5, 2018 at 6:55 PM Andres Freund  wrote:

> Hi,
>
> On 2018-09-05 18:48:44 +0200, Chris Travers wrote:
> > Will submit a patch here shortly.  Thanks!  Should we do for master and
> > 10?  Or 9.6 too?
>
> Please don't top-post on this list.  This needs to be done in all
> branches where the posix_fallocate call is present.
>
> > > Yep,  Maybe we should check for signals there.
> > >
> > > On Wed, Sep 5, 2018 at 5:27 PM Thomas Munro <
> thomas.mu...@enterprisedb.com>
> > > wrote:
> > >
> > >> On Wed, Sep 5, 2018 at 8:23 AM Chris Travers <
> chris.trav...@adjust.com>
> > >> wrote:
> > >> > 1.  The query is in a parallel index scan or similar
> > >> > 2.  A process is executing a parallel plan and allocating a
> significant
> > >> chunk of memory (2MB for example) in dynamic shared memory.
> > >> > 3.  The startup process goes into a loop where it sends a sigusr1,
> > >> sleeps 5ms, and sends another sigusr1 etc.
> > >> > 4.  The sigusr1 aborts the system call, which is then retried.
> > >> > 5.  Because the system call takes more than 5ms, we end up in an
> > >> endless loop
>
> What you're presumably encountering here is a recovery conflict.
>

Agreed, but the question is how to correct what is a fairly interesting race
condition.

>
>
> > On Wed, Sep 5, 2018 at 6:40 PM Chris Travers 
> > wrote:
> > >> Do you mean this loop in dsm_impl_posix_resize() is getting
> > >> interrupted constantly and never completing?
> > >>
> > >> /* We may get interrupted, if so just retry. */
> > >> do
> > >> {
> > >> rc = posix_fallocate(fd, 0, size);
> > >> } while (rc == EINTR);
> > >>
>
> Probably worthwhile to check that the dsm code is properly robust if
> errors are thrown from within here.
>

Will check that too.  Thanks!

>
>
> Greetings,
>
> Andres Freund
>


-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

2018-09-05 Thread Masahiko Sawada
On Wed, Sep 5, 2018 at 4:59 AM, R, Siva  wrote:
> Hi,
>
> We recently encountered an issue where the opaque data flags on a gin data
> leaf page was corrupted while replaying a gin insert WAL record. Upon
> further examination of the redo code, we found a bug in ginRedoRecompress
> code, which extracts the WAL information and updates the page.
>
> Specifically, when a new segment is inserted in the middle of a page, a
> memmove operation is performed [1] at the current point in the page to make
> room for the new segment. If this segment insertion is followed by delete
> segment actions that are yet to be processed and the total data size is very
> close to GinDataPageMaxDataSize, then we may move the data portion beyond
> the boundary causing the opaque data to be corrupted.
>
> One way of solving this problem is to perform the replay work on a scratch
> space, perform sanity check on the total size of the data portion before
> copying it back to the actual page. While it involves additional memory
> allocation and memcpy operations, it is safer and similar to the 'do' code
> path where we ensure to make a copy of all segment past the first modified
> segment before placing them back on the page [2].
>

Hmm, could you share the sequence of WAL records that were applied to
the broken page?  I suspect the segment list contains
GIN_SEGMENT_REPLACE before GIN_SEGMENT_INSERT.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: pgsql: Clean up after TAP tests in oid2name and vacuumlo.

2018-09-05 Thread Michael Paquier
On Wed, Sep 05, 2018 at 11:39:50AM -0300, Alvaro Herrera wrote:
> Should this be used in src/test/modules/{brin,test_commit_ts} also?

Hmm, I had not thought about those two.  commit_ts relies on REGRESS
being defined so that it does its cleanup.  brin is more interesting:
its Makefile directly notes that it needs tweaking to do the cleanup,
and it uses isolation tests.  Wouldn't it be more appropriate to add a
second extra switch for isolation tests, similar to REGRESS?  We could
call it ISOLATION, and it would list the set of isolation tests you
could run from an extension.  That would be useful for a lot of folks.

Thoughts?  It would be better to start a new thread rather than posting
a new patch, but I'd rather take the temperature first.

> Why do you guys design Makefile rules in pgsql-committers?

That was a bad idea, and a reaction to what Tom has committed for the
cleanup of the new TAP tests I have added previously.
--
Michael


signature.asc
Description: PGP signature


Re: pgsql: Clean up after TAP tests in oid2name and vacuumlo.

2018-09-05 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Sep 05, 2018 at 11:39:50AM -0300, Alvaro Herrera wrote:
>> Should this be used in src/test/modules/{brin,test_commit_ts} also?

> Hmm, I had not thought about those two.  commit_ts relies on REGRESS
> being defined so that it does its cleanup.  brin is more interesting:
> its Makefile directly notes that it needs tweaking to do the cleanup,
> and it uses isolation tests.  Wouldn't it be more appropriate to add a
> second extra switch for isolation tests, similar to REGRESS?  We could
> call it ISOLATION, and it would list the set of isolation tests you
> could run from an extension.  That would be useful for a lot of folks.

> Thoughts?  It would be better to start a new thread rather than posting
> a new patch, but I'd rather take the temperature first.

Yeah, if we're going to introduce infrastructure for TAP tests, it'd
make sense to do so for isolation tests as well.

regards, tom lane



Re: Funny hang on PostgreSQL 10 during parallel index scan on slave

2018-09-05 Thread Thomas Munro
On Wed, Sep 5, 2018 at 10:13 AM Chris Travers  wrote:
> On Wed, Sep 5, 2018 at 6:55 PM Andres Freund  wrote:
>> > On Wed, Sep 5, 2018 at 6:40 PM Chris Travers 
>> > wrote:
>> > >> Do you mean this loop in dsm_impl_posix_resize() is getting
>> > >> interrupted constantly and never completing?
>> > >>
>> > >> /* We may get interrupted, if so just retry. */
>> > >> do
>> > >> {
>> > >> rc = posix_fallocate(fd, 0, size);
>> > >> } while (rc == EINTR);
>> > >>
>>
>> Probably worthwile to check that the dsm code is properly robust if
>> errors are thrown from within here.

Yeah, currently dsm_impl_posix_resize() returns and lets
dsm_impl_posix() clean up (close(), shm_unlink()) before raising
errors.  We can't just let CHECK_FOR_INTERRUPTS() take a non-local
exit.  Some refactoring involving PG_TRY()/PG_CATCH() may be the
simplest way forward.
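
For illustration, here is a minimal sketch of a shape that avoids any
non-local exit inside the loop (untested, and trimmed of the ftruncate()
fallback the real function also has; ProcDiePending and QueryCancelPending
are the existing interrupt flags from miscadmin.h):

#include "postgres.h"
#include "miscadmin.h"
#include <fcntl.h>

static int
dsm_impl_posix_resize(int fd, off_t size)
{
    int         rc;

    /* Retry on EINTR, but give up once an interrupt is pending. */
    do
    {
        rc = posix_fallocate(fd, 0, size);
    } while (rc == EINTR && !(ProcDiePending || QueryCancelPending));

    return rc;
}

The failure would then be reported by the caller after its cleanup has run,
and the pending interrupt serviced at the next CHECK_FOR_INTERRUPTS(), where
a non-local exit is safe.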

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-05 Thread Tom Lane
Andres Freund  writes:
> On 2018-09-05 10:05:26 -0400, Tom Lane wrote:
>> One thought is that maybe we should provide a way to override this,
>> in case somebody really can't or doesn't want to use -msse2, and
>> is willing to put up with platform-dependent float behavior.

> IDK, people that are capable of making that decision can just hack
> configure.

"Just hacking configure" is a pretty high bar for most folk.

If you wanted to argue that the set of people who still want to run PG
on pre-SSE2 hardware is the empty set, that'd be an easier sell really.
But what I'm really concerned about here, given that we're apparently
talking about an ABI change, is that someone might want to run on a
platform where the libraries insist on the x87 way.  I'm actually a bit
surprised to hear that you can just randomly add -msse2 on BSDen.  Do
they ship separate copies of libc et al to make that work?

regards, tom lane



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-05 Thread Andres Freund
On 2018-09-05 14:08:11 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-09-05 10:05:26 -0400, Tom Lane wrote:
> >> One thought is that maybe we should provide a way to override this,
> >> in case somebody really can't or doesn't want to use -msse2, and
> >> is willing to put up with platform-dependent float behavior.
> 
> > IDK, people that are capable of making that decision can just hack
> > configure.
> 
> "Just hacking configure" is a pretty high bar for most folk.

Sure.  But being able to accurately judge whether you're OK with math
behaviour that doesn't work as documented seems like a high bar too.  And
this'd only be relevant if they insist on using clang rather than gcc.


> If you wanted to argue that the set of people who still want to run PG
> on pre-SSE2 hardware is the empty set, that'd be an easier sell
> really.

I think it's an empty set, yea, especially with clang.


> But what I'm really concerned about here, given that we're apparently
> talking about an ABI change, is that someone might want to run on a
> platform where the libraries insist on the x87 way.
> I'm actually a bit surprised to hear that you can just randomly add
> -msse2 on BSDen.  Do they ship separate copies of libc et al to make
> that work?

I don't think we're talking about an ABI change - with -msse2 the
calling conventions are unchanged, the stuff just gets moved out of the
x87 registers into SSE ones / xmm for the actual math. Which in turn
solves our "80bit register" problem.  There really shouldn't be any need
for libc6 itself to care.

Greetings,

Andres Freund



Re: pgsql: Clean up after TAP tests in oid2name and vacuumlo.

2018-09-05 Thread Alvaro Herrera
On 2018-Sep-05, Michael Paquier wrote:

> On Wed, Sep 05, 2018 at 11:39:50AM -0300, Alvaro Herrera wrote:
> > Should this be used in src/test/modules/{brin,test_commit_ts} also?
> 
> Hmm, I had not thought about those two.  commit_ts relies on REGRESS
> being defined so that it does its cleanup.  brin is more interesting:
> its Makefile directly notes that it needs tweaking to do the cleanup,
> and it uses isolation tests.  Wouldn't it be more appropriate to add a
> second extra switch for isolation tests, similar to REGRESS?  We could
> call it ISOLATION, and it would list the set of isolation tests you
> could run from an extension.  That would be useful for a lot of folks.
> 
> Thoughts?

Yeah, +1 for that.

> It would be better to start a new thread rather than posting
> a new patch, but I'd rather take the temperature first.

Slightly feverish, it appears.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_verify_checksums failure with hash indexes

2018-09-05 Thread Michael Paquier
On Wed, Sep 05, 2018 at 12:16:00AM -0400, Tom Lane wrote:
> Amit Kapila  writes:
>> Does anybody else have any idea on how can we write a test for
>> non-default block size or if we already have anything similar?
> 
> Build with a non-default BLCKSZ and see if the regression tests pass.
> There's no way that a build with BLCKSZ x can run any tests for
> BLCKSZ y.

Or we could implement block-size configuration at initdb time?  That's
what Andres has done for WAL segment size recently.

/me hides and runs fast

> Note that you can expect some plan variations from a different BLCKSZ,
> so there'd be at least a few "failures" in the regression tests, which'd
> require manual inspection. Otherwise this could be delegated to a
> buildfarm animal using a nonstandard BLCKSZ. 

Last time I did that I saw only plan diffs, which was a couple of weeks
ago.
--
Michael


signature.asc
Description: PGP signature


Re: JIT compiling with LLVM v12

2018-09-05 Thread Andres Freund
Hi,

On 2018-08-25 21:34:22 -0400, Robert Haas wrote:
> On Wed, Aug 22, 2018 at 6:43 PM, Andres Freund  wrote:
> > Now you can say that'd be solved by bumping the cost up, sure. But
> > obviously the row / cost model is pretty much out of whack here, I don't
> > see how we can make reasonable decisions in a trivial query that has a
> > misestimation by five orders of magnitude.
> 
> Before JIT, it didn't matter whether the costing was wrong, provided
> that the path with the lowest cost was the cheapest path (or at least
> close enough to the cheapest path not to bother anyone).

I don't think that's really true.  Due to the cost fuzzing, absurdly high
costs very commonly lead to genuinely different planning choices not
having a large enough cost difference to matter.


> I'd guess that, as you read this, you're thinking, well, but if I'd
> added JIT and non-JIT paths for every option, it would have doubled
> the number of paths, and that would have slowed the planner down way
> too much.  That's certainly true, but my point is just that the
> problem is probably not as simple as "the defaults are too low".  I
> think the problem is more fundamentally that the model you've chosen
> is kinda broken.

Right. And that's why I repeatedly brought up this part in
discussions...  I still think it's a reasonable compromise, but it
certainly has costs.

I'm also doubtful that just adding a separate path for JIT (with a
significantly smaller cpu_*_cost or such) would really have helped in
the cases with borked estimations - we'd *still* end up choosing JITing
if the loop count is absurd, just because the cost is high.

There *are* cases where it helps - if all the cost is incurred, say, due
to random page fetches, then JITing isn't going to help that much.


> I'm not saying I know how you could have done any better, but I do
> think we're going to have to try to figure out something to do about
> it, because saying, "check-pg_upgrade is 4x slower, but that's just
> because of all those bad estimates" is not going to fly.

That I'm unconvinced by however. This was on some quite slow machine
and/or with LLVM assertions enabled - the performance difference on a
normal machine is smaller:

$ PGOPTIONS='-cjit=0' time make -s check
...
5.21user 2.11system 0:24.95elapsed 29%CPU (0avgtext+0avgdata 54212maxresident)k
20976inputs+340848outputs (14major+342228minor)pagefaults 0swaps

$ PGOPTIONS='-cjit=1' time make -s check
...
5.33user 2.01system 0:30.49elapsed 24%CPU (0avgtext+0avgdata 54236maxresident)k
0inputs+340856outputs (0major+342616minor)pagefaults 0swaps


But also importantly, I think there are actual advantages in triggering
JIT in some places in the regression tests.  There are buildfarm animals
exercising the path where everything is JITed, but that's not really
helpful during development.


> Those bad estimates were harmlessly bad before,

I think that's not true.


Greetings,

Andres Freund



Re: pread() and pwrite()

2018-09-05 Thread Jesper Pedersen

Hi,

On 07/26/2018 10:04 PM, Thomas Munro wrote:
> Done.  Rebased.



This needs a rebase again.

Once resolved, the patch passes make check-world, and strace analysis
shows the associated read()/write() calls have been turned into
pread64()/pwrite64().  All remaining lseek()s are SEEK_ENDs.


Best regards,
 Jesper



Re: JIT compiling with LLVM v12

2018-09-05 Thread Andres Freund
Hi,

On 2018-08-22 06:20:21 +, Noah Misch wrote:
> I see jit slows the regression tests considerably:

Is this with LLVM assertions enabled or not?  The differences seem
bigger than what I'm observing, especially on the mips animal - which I
observe uses a separately installed LLVM build.

On my machine, with master, on a postgres assert build w/ non-debug llvm build:
PGOPTIONS='-c jit=0' time make -Otarget -j10 -s check-world  && echo success || 
echo f
240.37user 55.55system 2:08.17elapsed 230%CPU (0avgtext+0avgdata 
66264maxresident)k

PGOPTIONS='-c jit=1' time make -Otarget -j10 -s check-world  && echo success || 
echo f
253.02user 55.77system 2:16.22elapsed 226%CPU (0avgtext+0avgdata 
54756maxresident)k


Using your command, on a postgres optimized build w/ non-debug llvm build:

> # x86_64, non-assert, w/o llvm
> $ for n in 1 2 3; do env time make -C src/bin/pg_upgrade check; done 2>&1 | 
> grep elapsed
> 7.64user 4.24system 0:36.40elapsed 32%CPU (0avgtext+0avgdata 
> 36712maxresident)k
> 8.09user 4.50system 0:37.71elapsed 33%CPU (0avgtext+0avgdata 
> 36712maxresident)k
> 7.53user 4.18system 0:36.54elapsed 32%CPU (0avgtext+0avgdata 
> 36712maxresident)k
>
> # x86_64, non-assert, w/  llvm trunk
> $ for n in 1 2 3; do env time make -C src/bin/pg_upgrade check; done 2>&1 | 
> grep elapsed
> 9.58user 5.79system 0:49.61elapsed 30%CPU (0avgtext+0avgdata 
> 36712maxresident)k
> 9.47user 5.92system 0:47.84elapsed 32%CPU (0avgtext+0avgdata 
> 36712maxresident)k
> 9.09user 5.51system 0:47.94elapsed 30%CPU (0avgtext+0avgdata 
> 36712maxresident)k
>

andres@alap4:~/build/postgres/master-optimize/vpath$ for n in 1 2 3; do 
PGOPTIONS='-cjit=0' env time make -C src/bin/pg_upgrade check; done 2>&1 | grep 
elapsed
8.01user 3.63system 0:39.88elapsed 29%CPU (0avgtext+0avgdata 50196maxresident)k
7.96user 3.86system 0:39.70elapsed 29%CPU (0avgtext+0avgdata 50064maxresident)k
7.96user 3.80system 0:37.17elapsed 31%CPU (0avgtext+0avgdata 50148maxresident)k
andres@alap4:~/build/postgres/master-optimize/vpath$ for n in 1 2 3; do 
PGOPTIONS='-cjit=1' env time make -C src/bin/pg_upgrade check; done 2>&1 | grep 
elapsed
7.88user 3.76system 0:44.98elapsed 25%CPU (0avgtext+0avgdata 50092maxresident)k
7.99user 3.72system 0:46.53elapsed 25%CPU (0avgtext+0avgdata 50036maxresident)k
7.88user 3.87system 0:45.26elapsed 25%CPU (0avgtext+0avgdata 50132maxresident)k

So here the difference is smaller, but not hugely so.


> # mips32el, assert, w/o llvm (buildfarm member topminnow) [1]
> 28min install-check-*
> 35min check-pg_upgrade
>
> # mips32el, assert, w/  llvm 6.0.1 [1]
>  63min install-check-*
> 166min check-pg_upgrade

But this seems so absurdly large of a difference that I kinda think LLVM
assertions (which are really expensive and add O(N) operations in a bunch
of places) might be to blame.

Greetings,

Andres Freund



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-05 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> If you wanted to argue that the set of people who still want to
 Tom> run PG on pre-SSE2 hardware is the empty set, that'd be an easier
 Tom> sell really.

At the very minimum, I believe FreeBSD project policy would require that
the i386 packages for postgresql be built to run without SSE2.

 Tom> But what I'm really concerned about here, given that we're
 Tom> apparently talking about an ABI change,

No, there's no ABI change involved. 

 Tom> I'm actually a bit surprised to hear that you can just randomly
 Tom> add -msse2 on BSDen. Do they ship separate copies of libc et al to
 Tom> make that work?

-msse2 just tells the compiler to assume that the SSE2 instructions and
registers are available for use (and if so, it will always use them for
floats in preference to the x87 registers where possible), it doesn't
change the ABI: all function results are still bounced through the x87
register stack (and float/double parameters are not passed in registers
anyway).

-- 
Andrew (irc:RhodiumToad)



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-05 Thread Andrew Gierth
> "Peter" == Peter Eisentraut  writes:

 > On 05/09/2018 18:42, Andres Freund wrote:
 >> Realistically we're going to be running into old versions of clang
 >> for a long time. And the importance of running i386 without SSE2
 >> surely isn't increasing. So I don't really see an urgent need to do
 >> anything about it. And if it gets fixed, and we care, we can just
 >> add a clang version check to the test.

 Peter> Another option perhaps is to let this be and accept it as
 Peter> alternative floating point behavior. We already have some of
 Peter> those.

If it were only a matter of error handling, then the best fix would
likely be to just avoid __builtin_isinf if (clang AND i386 AND not
sse2).

The problem is that if we're relying on -fexcess-precision=standard
semantics in places besides infinity checks, then we won't get those
semantics on clang/i386/no-sse2 since it has no comparable option. (What
are we doing about compilers for x86-32 other than clang and gcc?)
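
To make the excess-precision hazard concrete, here is a small standalone
illustration (not PostgreSQL code; the exact output depends on compiler,
flags, and optimization level):

#include <stdio.h>

int
main(void)
{
    volatile double a = 1e308;      /* volatile defeats constant folding */
    double          b = a * 10.0;   /* overflows double, fits an 80-bit reg */
    volatile double stored = b;     /* forces a round-trip through memory */

    /*
     * With SSE2 math (or gcc's -fexcess-precision=standard) this prints
     * "1 1"; with x87 excess precision the first check can see the
     * still-finite 80-bit register value and print "0 1".
     */
    printf("%d %d\n", __builtin_isinf(b), __builtin_isinf(stored));
    return 0;
}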

-- 
Andrew (irc:RhodiumToad)



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-05 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> If you wanted to argue that the set of people who still want to
>  Tom> run PG on pre-SSE2 hardware is the empty set, that'd be an easier
>  Tom> sell really.

> At the very minimum, I believe FreeBSD project policy would require that
> the i386 packages for postgresql be built to run without SSE2.

Well, can we tell them they have to use gcc to compile, or will that
be against their policy too?

regards, tom lane



Re: Collation versioning

2018-09-05 Thread Christoph Berg
Re: Thomas Munro 2018-09-05 

> > Hopefully that version info is fine-grained enough.
> 
> Hmm.  I was looking for locale data version, not libc.so itself.  I
> realise they come ultimately from the same source package, but are the
> locale definitions and libc6 guaranteed to be updated at the same
> time?  I see that the locales package depends on libc-bin >> 2.27 (no
> upper bound), which in turn depends on libc6 >> 2.27, << 2.28.  So
> perhaps you can have a system with locales 2.27 and libc6 2.28?

No, because libc6.deb "breaks" locales << 2.27:

Package: libc6
Source: glibc
Version: 2.27-5
Breaks: [...] locales (<< 2.27), locales-all (<< 2.27),

(I can't tell off-hand why this isn't just a stricter dependency in
locales.deb, but it's probably because this variant works better for
upgrades.)

> I also wonder if there are some configurations where they could get out
> of sync because of manual use of locale-gen or something like that.
> Clearly most systems would have them in sync through apt-get upgrade
> though, so maybe gnu_get_libc_version() would work well in practice?

I'd hope so. I'm more worried about breakage because of fixes applied
within one glibc version (2.27-5 vs 2.27-6), but I guess Debian's
glibc maintainers are clueful enough not to do that.


Re: Thomas Munro 2018-09-05 

> And even if they are, what if your cluster is still running and still
> has the older libc.so.6 mapped in?  Newly forked backends will see new
> locale data but gnu_get_libc_version() will return the old string.
> (Pointed out off-list by Andres.)  Eventually you restart your cluster
> and start seeing the error.

That problem isn't protected against by PG itself. I've seen clusters
that were upgraded on disk but not restarted yet where every plpgsql
invocation was throwing symbol errors. So I guess we don't have to try
harder for libc.

> So, it's not ideal but perhaps worth considering on the grounds that
> it's better than nothing?

Ack.

Christoph



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-05 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >> At the very minimum, I believe FreeBSD project policy would require
 >> that the i386 packages for postgresql be built to run without SSE2.

 Tom> Well, can we tell them they have to use gcc to compile, or will
 Tom> that be against their policy too?

That's no problem, lots of ports have USE_GCC=yes (or some specified
version or minimum version of gcc).

-- 
Andrew (irc:RhodiumToad)



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-05 Thread Tom Lane
Andrew Gierth  writes:
> The problem is that if we're relying on -fexcess-precision=standard
> semantics in places besides infinity checks, then we won't get those
> semantics on clang/i386/no-sse2 since it has no comparable option. (What
> are we doing about compilers for x86-32 other than clang and gcc?)

Well, the bigger problem is that we don't really know exactly where
nonstandard precision semantics might cause visible behavior changes.
It's just about a dead certainty, IMO, that there are some other
places we don't know about because the regression tests don't expose
them; or that future code rearrangements might create new problems.

Given that (if I understand Andres correctly) -fexcess-precision=standard
behavior is required by C99, it's hard to get excited about expending
a whole lot of work here.  I'm fine with adding some compiler options
or telling the user to do so, but I don't think we should do anything
much beyond that.

regards, tom lane



On the need for a snapshot in exec_bind_message()

2018-09-05 Thread Daniel Wood
In particular:
exec_bind_message()
PushActiveSnapshot(GetTransactionSnapshot());


Suppressing this, I've achieved over 1.9M TXNs a second on select-only pgbench
on a 48-core box.  It is about 50% faster with this change.  The CPU usage of
GetSnapshotData drops from about 22% to 4.5%.

If there were no input functions that needed this, no reparsing or
reanalyzing was needed, and we knew this up front, it'd be a huge win.  We
could test for a number of conditions on the first parse/optimization of the
query and set a flag to suppress this for subsequent executions.
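
As a sketch of the kind of up-front test that could set such a flag (a
hypothetical helper, not existing code, and checking only input-function
volatility):

#include "postgres.h"
#include "catalog/pg_proc.h"
#include "utils/lsyscache.h"
#include "utils/plancache.h"

/* Return true if any parameter's input function is not immutable. */
static bool
params_need_snapshot(CachedPlanSource *psrc)
{
    int         i;

    for (i = 0; i < psrc->num_params; i++)
    {
        Oid         typinput;
        Oid         typioparam;

        getTypeInputInfo(psrc->param_types[i], &typinput, &typioparam);
        if (func_volatile(typinput) != PROVOLATILE_IMMUTABLE)
            return true;
    }
    return false;
}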


NOTE:

In GetSnapshotData, because pgxact is declared volatile, the compiler will not
reduce the following two IF tests into a single test:


if (pgxact->vacuumFlags & PROC_IN_LOGICAL_DECODING)
continue;


if (pgxact->vacuumFlags & PROC_IN_VACUUM)
continue;


You can reduce the code path in the inner loop by coding this as:

if (pgxact->vacuumFlags & (PROC_IN_LOGICAL_DECODING|PROC_IN_VACUUM))
continue;


I'm still working on quantifying any gain.  Note it isn't just one L1 cache
fetch and one conditional branch eliminated.  Due to the update frequency of
the pgxact cache line, for single-statement TXNs, there are a certain number
of full cache misses, due to invalidation, that occur when a given pgxact is
updated between the first fetch of vacuumFlags and the second fetch.

Re: On the need for a snapshot in exec_bind_message()

2018-09-05 Thread Tom Lane
Daniel Wood  writes:
> In particular:
> exec_bind_message()
> PushActiveSnapshot(GetTransactionSnapshot());

> If there were no input functions that needed this, no reparsing or
> reanalyzing was needed, and we knew this up front, it'd be a huge win.

Unfortunately, that's not the case, so I think trying to get rid of
this call is a nonstarter.

What we have kicked around a bit is trying to get rid of the additional
snapshot-taking at the start of execution, so that the snapshot taken
at BIND time serves all the way through the query.  That'd require a
fair amount of refactoring I think, but at least it's not a broken idea
on its face.

> In GetSnapshotData, because pgxact is declared volatile, the compiler will 
> not reduce the following two IF tests into a single test:
> if (pgxact->vacuumFlags & PROC_IN_LOGICAL_DECODING)
> continue;
> if (pgxact->vacuumFlags & PROC_IN_VACUUM)
> continue;

That, on the other hand, is just crummy coding :-(

> I'm still working on quantifying any gain.

It'll be interesting to see if you can show visible improvement from
merging those.  It's enough of a hotspot that I wouldn't be surprised
to find some, but actual numbers would be nice.

regards, tom lane



Re: buildfarm: could not read block 3 in file "base/16384/2662": read only 0 of 8192 bytes

2018-09-05 Thread Tom Lane
I wrote:
> Andres Freund  writes:
>> One concern I have with your approach is that it isn't particularly
>> bullet-proof for cases where the rebuild is triggered by something that
>> doesn't hold a conflicting lock.

> Wouldn't that be a bug in the something-else?

So where are we on this?  Should I proceed with my patch, or are we
going to do further investigation?  Does anyone want to do an actual
patch review?

I think it's important to have some fix in place in time for the next
11beta release, so time grows short ...

regards, tom lane



Re: pgsql: Clean up after TAP tests in oid2name and vacuumlo.

2018-09-05 Thread Michael Paquier
On Wed, Sep 05, 2018 at 03:20:03PM -0300, Alvaro Herrera wrote:
> On 2018-Sep-05, Michael Paquier wrote:
> > It would be better to start a new thread rather than posting
> > a new patch, but I'd rather take the temperature first.
> 
> Slightly feverish, it appears.

As far as I can see, the proposal is not stone-cold dead and could have a
future, so I'll begin a new thread with a more complete patch.
--
Michael


signature.asc
Description: PGP signature


Re: On the need for a snapshot in exec_bind_message()

2018-09-05 Thread Andres Freund
Hi,

On 2018-09-05 12:31:04 -0700, Daniel Wood wrote:
> NOTE:
> 
> In GetSnapshotData, because pgxact is declared volatile, the compiler will 
> not reduce the following two IF tests into a single test:
> 
> 
> if (pgxact->vacuumFlags & PROC_IN_LOGICAL_DECODING)
> continue;
> 
> 
> if (pgxact->vacuumFlags & PROC_IN_VACUUM)
> continue;
> 
> 
> You can reduce the code path in the inner loop by coding this as:
> 
> if (pgxact->vacuumFlags & (PROC_IN_LOGICAL_DECODING|PROC_IN_VACUUM))
> continue;
> 
> 
> I'm still working on quantifying any gain.  Note it isn't just one L1 cache
> fetch and one conditional branch eliminated.  Due to the update frequency of 
> the pgxact cache line, for single-statement TXNs, there are a certain number 
> of full cache misses, due to invalidation, that occur when a given pgxact is 
> updated between the first fetch of vacuumFlags and the second fetch.

These two obviously could be combined, but I think we should just get
rid of the volatiles. With a small bit of work they shouldn't be
required anymore these days.

Greetings,

Andres Freund



Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

2018-09-05 Thread Alexander Korotkov
On Wed, Sep 5, 2018 at 5:05 PM Alexander Korotkov
 wrote:
> On Wed, Sep 5, 2018 at 2:39 PM Alexander Korotkov
>  wrote:
> > On Wed, Sep 5, 2018 at 12:26 PM Alexander Korotkov
> >  wrote:
> > > On Wed, Sep 5, 2018 at 1:45 AM R, Siva  wrote:
> > > > On Tue, Sep 4, 2018 at 09:16 PM, Alexander Korotkov 
> > > >  wrote:
> > > > > Do you have a test scenario for reproduction of this issue?  We need
> > > > > it to ensure that fix is correct.
> > > >
> > > > Unfortunately, I do not have a way of reproducing this issue.
> > > > So far I have tried a workload consisting of inserts (of the
> > > > same attribute value that is indexed), batch deletes of rows
> > > > and vacuum interleaved with engine crash/restarts.
> > >
> > > Issue reproduction and testing is essential for a bug fix.  Remember
> > > last time you reported a GIN bug [1]: after issue reproduction it
> > > appeared that we had more things to fix.  It's quite clear to me that
> > > if the segment list contains GIN_SEGMENT_INSERT before GIN_SEGMENT_DELETE,
> > > then it might lead to wrong behavior in ginRedoRecompress().  But it's
> > > not yet clear what code path could lead to such a segment
> > > list... I'll explore the code more and probably will come up with some idea.
> >
> > Aha, I've managed to reproduce this.
> > 1. Apply ginRedoRecompress-asserts.patch, which add assertions to
> > ginRedoRecompress() detecting past opaque writes.
>
> It was wrong, sorry.  It appears that I put strict inequality into
> asserts, while there should be loose inequality.  Correct version of
> patch is attached.  And scenario I've posted isn't really reproducing
> the bug...

Finally I managed to reproduce the bug.  The scenario is following.
Underlying idea is that when insertion of multiple tuples goes to the
beginning of the page and this insertion succeed only thanks to
collapse of some short segments together, then this insertion wouldn't
fit to the page if given alone.

create table test (i integer primary key, a int[]) with (fillfactor = 50);
insert into test (select id, array[id%2]::int[] from
generate_series(1,15376) id);
create index test_idx on test using gin(a) with (fastupdate = off);
update test set a = '{1}' where i % 4 = 0 or i % 16 = 2 or i % 64 in
(6, 46, 36) or i % 256 = 54;
vacuum test;
insert into test (select id + 16376, '{0}' from generate_series(1,5180) id);
update test set a = '{1}' where i <= 15376 and i % 256 in (182, 198);
vacuum test;
alter index test_idx set (fastupdate = on);
delete from test where i <= 134 and a = '{1}';
vacuum test;
insert into test (select id+3, '{0}' from generate_series(1,110) id);
vacuum test;

With ginRedoRecompress-asserts-v2.patch following assertion is fired.
TRAP: FailedAssertion("!(segptr + newsegsize + (szleft - segsize) <= (
((void) ((_Bool) (! (!(PageValidateSpecialPointer(page))) ||
(ExceptionalCondition("!(PageValidateSpecialPointer(page))",
("FailedAssertion"), "ginxlog.c", 289), 0, (char *) ((char *)
(page) + ((PageHeader) (page))->pd_special) ))", File: "ginxlog.c",
Line: 289)

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



*_to_xml() should copy SPI_processed/SPI_tuptable

2018-09-05 Thread Chapman Flack
I encountered this in 9.5 and haven't yet written a reproducer
for 10 or 11, but the code in question looks the same in master.


In xml.c, query_to_xml_internal() contains a loop that refers
to SPI_processed every iteration:

for (i = 0; i < SPI_processed; i++)
SPI_sql_row_to_xmlelement(i, result, tablename, nulls,
  tableforest, targetns, top_level);

likewise, that function it is calling contains a loop that
repeatedly dereferences SPI_tuptable:

for (i = 1; i <= SPI_tuptable->tupdesc->natts; i++)
...
... map_sql_value_to_xml_value(colval,
   SPI_gettypeid(SPI_tuptable->tupdesc, ...


and map_sql_value_to_xml_value can use OidOutputFunctionCall. If that
attribute's data type has an output function that uses SPI, the next
iteration here dereferences a null SPI_tuptable.

The root of the problem does not seem to be an output function that
uses SPI; indeed, the chance that one might do so seems to have been
the original motivation for OutputFunctionCall().

The essence of the problem here seems to be simply that these routines
in xml.c are not making local-variable copies of SPI_processed and
SPI_tuptable, as the docs for SPI_execute recommend.
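
For concreteness, a minimal sketch of the recommended pattern (not the
actual xml.c code; SPI_processed is shown as uint64, as on master):

#include "postgres.h"
#include "executor/spi.h"

static void
walk_result_safely(const char *query)
{
    uint64      proc;
    SPITupleTable *tuptable;
    uint64      i;

    SPI_connect();
    SPI_execute(query, true, 0);

    /* Copy the globals before calling anything that might re-enter SPI. */
    proc = SPI_processed;
    tuptable = SPI_tuptable;

    for (i = 0; i < proc; i++)
    {
        /* Safe even if an output function below runs its own SPI queries. */
        (void) SPI_getvalue(tuptable->vals[i], tuptable->tupdesc, 1);
    }

    SPI_finish();
}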

Would it be worth also expanding that note in the docs? It currently
says:

---
All SPI query-execution functions set both SPI_processed and
SPI_tuptable.  ...  Save these two global variables into local procedure
variables if you need to access the result table of SPI_execute or
another query-execution function across later calls.
---

but might it bear emphasis that those later calls can happen
indirectly from things that you call (e.g. output functions)?
Those will be nested in their own SPI_connect/SPI_finish, which
a reasonable person might guess would save/restore those globals,
but they don't; SPI_finish merely clears them, on the assumption
that you've got your local copies if you need them.

-Chap



Re: Collation versioning

2018-09-05 Thread Thomas Munro
On Wed, Sep 5, 2018 at 12:10 PM Christoph Berg  wrote:
> > So, it's not ideal but perhaps worth considering on the grounds that
> > it's better than nothing?
>
> Ack.

Ok, here's a little patch like that.

postgres=# select collname, collcollate, collversion from pg_collation
where collname = 'en_NZ';
 collname | collcollate | collversion
----------+-------------+-------------
 en_NZ    | en_NZ.utf8  | 2.24
(1 row)
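
Presumably the patch does something along these lines (a hedged sketch,
since the attachment is not inline; gnu_get_libc_version() is the glibc
call discussed upthread, and the function and COLLPROVIDER_LIBC names
mirror existing ones):

#include "postgres.h"
#include "catalog/pg_collation.h"
#include <gnu/libc-version.h>

char *
get_collation_actual_version(char collprovider, const char *collcollate)
{
    /* Report the running glibc version for libc-provided collations. */
    if (collprovider == COLLPROVIDER_LIBC)
        return pstrdup(gnu_get_libc_version());
    /* ... ICU-provided collations handled elsewhere ... */
    return NULL;
}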

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Set-collversion-for-collations-that-come-from-glibc.patch
Description: Binary data


Re: PostgreSQL does not start when max_files_per_process> 1200 on Windows 7

2018-09-05 Thread Victor Spirin
When max_files_per_process is 1201 or more, the server does not start.  I
tested and debugged this on Windows 7.  On Windows 10 everything works all
right.


I found this takes place after a call to the function count_usable_fds(int
max_to_probe, int *usable_fds, int *already_open).


The error occurred on the first call to selres = select(nSockets, &rmask,
NULL, NULL, &timeout) from ServerLoop.  Error 0xC0000142: "{DLL Initialization
Failed} Initialization of the dynamic link library %hs failed. The
process is terminating abnormally."


I temporarily moved the call to set_max_safe_fds() to the top of
PostmasterMain, and the error moved to ReadFile in the function
pipe_read_line(char *cmd, char *line, int maxsize):


if (!ReadFile(childstdoutrddup, lineptr, maxsize - (lineptr - line),
                          &bytesread, NULL))

Reproducing the error in a separate test program has not worked yet.

Replacing the dup() of stdin with a dup() of a file handle resolves this
problem.

Victor Spirin
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

05.09.2018 19:24, Andres Freund wrote:
> Hi,
>
> On 2018-09-05 13:06:06 +0300, Victor Spirin wrote:
>> I have a bug in Windows 7 with max_files_per_process > 1200.
>>
>> Using dup (0) in the function count_usable_fds more than 1200 times (0 =
>> stdin) causes further unpredictable errors with file operations.
>
> Could you expand a bit on this?
>
> Greetings,
>
> Andres Freund






Re: *_to_xml() should copy SPI_processed/SPI_tuptable

2018-09-05 Thread Tom Lane
Chapman Flack  writes:
> In xml.c, query_to_xml_internal() contains a loop that refers
> to SPI_processed every iteration:
> for (i = 0; i < SPI_processed; i++)
> SPI_sql_row_to_xmlelement(i, result, tablename, nulls,
>   tableforest, targetns, top_level);
> likewise, that function it is calling contains a loop that
> repeatedly dereferences SPI_tuptable:
> for (i = 1; i <= SPI_tuptable->tupdesc->natts; i++)
> ...
> ... map_sql_value_to_xml_value(colval,
>SPI_gettypeid(SPI_tuptable->tupdesc, ...
> and map_sql_value_to_xml_value can use OidOutputFunctionCall
> [ which could call arbitrary code that clobbers SPI_tuptable ]

Ugh.  I wonder how many other places have latent issues of the same kind.

> The essence of the problem here seems to be simply that these routines
> in xml.c are not making local-variable copies of SPI_processed and
> SPI_tuptable, as the docs for SPI_execute recommend.

I think that's blaming the messenger TBH.  The *real* problem is the
damfool decision to use global variables for this in the first place.
Should we change that somehow?

Really the right API would have entailed something like

int SPI_execute(const char *src, bool read_only, long tcount,
SPITupleTable **tuptable, uint64 *processed);

... and I guess we'd have to think about SPI_lastoid too, so maybe
returning a struct would be smarter than a pointer-per-value.

So some possible alternatives are:

* Just do it.  Probably breaks too much 3rd-party code to be very
popular, though.

* Invent SPI_execute_r() and so on with APIs as above, and deprecate
the old functions, but don't remove 'em right away.  Grotty, and it
does little to fix any latent problems that may exist outside core.

* Replace SPI_tuptable et al with macros that access fields in the
current SPI stack level (similar to the way that, eg, errno works
on most modern platforms).  This seems do-able, if a bit grotty.
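
For instance, a hypothetical sketch of that third option (the struct and
variable names borrow from spi_priv.h, but the macros are invented):

typedef struct _SPI_connection
{
    uint64          processed;      /* moved here from the global */
    SPITupleTable  *tuptable;       /* moved here from the global */
    /* ... existing per-level state ... */
} _SPI_connection;

extern _SPI_connection *_SPI_current;

#define SPI_processed   (_SPI_current->processed)
#define SPI_tuptable    (_SPI_current->tuptable)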

None of this is very back-patchable, so I guess we need a one-off
backpatched change for query_to_xml_internal and anyplace else that
looks to be at serious risk.

regards, tom lane



Re: On the need for a snapshot in exec_bind_message()

2018-09-05 Thread Daniel Wood
> > exec_bind_message()
> > PushActiveSnapshot(GetTransactionSnapshot());
> 
> > If there were no input functions that needed this, no reparsing or
> > reanalyzing was needed, and we knew this up front, it'd be a huge win.
> 
> Unfortunately, that's not the case, so I think trying to get rid of
> this call is a nonstarter.

Queries stop getting re-optimized after 5 times, unless better plans are to be 
had.  In the absence of schema changes or changing search path why is the 
snapshot needed?

- Dan



Re: unaccent(text) fails depending on search_path (WAS: pg_upgrade fails saying function unaccent(text) doesn't exist)

2018-09-05 Thread Tom Lane
[ redirecting to pgsql-hackers ]

I wrote:
> Gunnlaugur Thor Briem  writes:
>> SET search_path = "$user"; SELECT public.unaccent('foo');
>> SET
>> ERROR:  text search dictionary "unaccent" does not exist

> Meh.  I think we need the attached, or something just about like it.
>
> It's barely possible that there's somebody out there who's relying on
> setting the search path to allow choosing among multiple "unaccent"
> dictionaries.  But there are way more people whose functions are
> broken due to the recent search-path-tightening changes.

Here's a slightly more efficient version.

regards, tom lane

diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c
index 247c202..dbf2bb9 100644
*** a/contrib/unaccent/unaccent.c
--- b/contrib/unaccent/unaccent.c
***
*** 20,26 
--- 20,28 
  #include "tsearch/ts_locale.h"
  #include "tsearch/ts_public.h"
  #include "utils/builtins.h"
+ #include "utils/lsyscache.h"
  #include "utils/regproc.h"
+ #include "utils/syscache.h"
  
  PG_MODULE_MAGIC;
  
*** unaccent_dict(PG_FUNCTION_ARGS)
*** 376,382 
  
  	if (PG_NARGS() == 1)
  	{
! 		dictOid = get_ts_dict_oid(stringToQualifiedNameList("unaccent"), false);
  		strArg = 0;
  	}
  	else
--- 378,398 
  
  	if (PG_NARGS() == 1)
  	{
! 		/*
! 		 * Use the "unaccent" dictionary that is in the same schema that this
! 		 * function is in.
! 		 */
! 		Oid			procnspid = get_func_namespace(fcinfo->flinfo->fn_oid);
! 		const char *dictname = "unaccent";
! 
! 		dictOid = GetSysCacheOid2(TSDICTNAMENSP,
!   PointerGetDatum(dictname),
!   ObjectIdGetDatum(procnspid));
! 		if (!OidIsValid(dictOid))
! 			ereport(ERROR,
! 	(errcode(ERRCODE_UNDEFINED_OBJECT),
! 	 errmsg("text search dictionary \"%s.%s\" does not exist",
! 			get_namespace_name(procnspid), dictname)));
  		strArg = 0;
  	}
  	else


Re: *_to_xml() should copy SPI_processed/SPI_tuptable

2018-09-05 Thread Chapman Flack
On 09/05/18 18:07, Tom Lane wrote:

> * Replace SPI_tuptable et al with macros that access fields in the
> current SPI stack level (similar to the way that, eg, errno works
> on most modern platforms).  This seems do-able, if a bit grotty.

It would mean they'd have to *be* in the stack frame, where they
currently aren't; they're ignored by SPI_connect and just zeroed
by SPI_finish, on the grounds that they're transient and you've
copied them if you care.

Another alternative might be to have SPI_connect save them and
SPI_finish put them back, which leaves you just responsible for
reasoning about your own code. You'd still be expected to save them
across your own uses of other SPI calls, but no longer exposed to
spooky action at a distance from nested uses of SPI in stuff you call.
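
Something like this hypothetical sketch, say (the outer_* fields are
invented names for per-stack-level save slots):

int
SPI_connect(void)
{
    /* ... existing stack push ... */
    _SPI_current->outer_processed = SPI_processed;
    _SPI_current->outer_tuptable = SPI_tuptable;
    SPI_processed = 0;
    SPI_tuptable = NULL;
    return SPI_OK_CONNECT;
}

int
SPI_finish(void)
{
    /* ... existing cleanup ... */
    SPI_processed = _SPI_current->outer_processed;
    SPI_tuptable = _SPI_current->outer_tuptable;
    return SPI_OK_FINISH;
}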

-Chap



Re: On the need for a snapshot in exec_bind_message()

2018-09-05 Thread Tom Lane
Daniel Wood  writes:
>>> exec_bind_message()
>>>   PushActiveSnapshot(GetTransactionSnapshot());
>>> 
>>> If there were no input functions that needed this, no reparsing or
>>> reanalyzing was needed, and we knew this up front, it'd be a huge win.

>> Unfortunately, that's not the case, so I think trying to get rid of
>> this call is a nonstarter.

> Queries stop getting re-optimized after 5 times, unless better plans are to 
> be had.  In the absence of schema changes or changing search path why is the 
> snapshot needed?

The snapshot has little to do with the query plan, usually.  It's about
what view of the database the executed query will see, and particularly
about what view the parameter input functions will see, if they look.

You could maybe argue that immutable input functions shouldn't need
snapshots, but there are enough not-immutable ones that I don't think
that gets you very far.  In any case, we'd still need a snapshot for
query execution.  The set of queries that could possibly never need
a snapshot at all is probably not large enough to be interesting.

regards, tom lane



Re: On the need for a snapshot in exec_bind_message()

2018-09-05 Thread Andres Freund
On 2018-09-05 18:45:52 -0400, Tom Lane wrote:
> Daniel Wood  writes:
> >>> exec_bind_message()
> >>>   PushActiveSnapshot(GetTransactionSnapshot());
> >>> 
> >>> If there were no input functions that needed this, no reparsing or
> >>> reanalyzing was needed, and we knew this up front, it'd be a huge win.
> 
> >> Unfortunately, that's not the case, so I think trying to get rid of
> >> this call is a nonstarter.
> 
> > Queries stop getting re-optimized after 5 times, unless better plans are to 
> > be had.  In the absence of schema changes or changing search path why is 
> > the snapshot needed?
> 
> The snapshot has little to do with the query plan, usually.  It's about
> what view of the database the executed query will see, and particularly
> about what view the parameter input functions will see, if they look.

The snapshot in exec_bind_message() shouldn't be about what the executed
query sees, no?  Sure, we'll evaluate input params and might replan etc,
but other than that?


> You could maybe argue that immutable input functions shouldn't need
> snapshots, but there are enough not-immutable ones that I don't think
> that gets you very far.  In any case, we'd still need a snapshot for
> query execution.  The set of queries that could possibly never need
> a snapshot at all is probably not large enough to be interesting.

It'd not be insane to rejigger things so there's a version of
PushActiveSnapshot() doesn't eagerly compute the snapshot, but instead
does so on first access.  Obviously we couldn't use that everywhere, but
the exec_bind_message() seems like a prime case for it.
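
A rough sketch of what that could look like (names other than
GetTransactionSnapshot() and RegisterSnapshot() are invented; stack
bookkeeping and error handling are omitted):

void
PushActiveSnapshotDeferred(void)
{
    ActiveSnapshotElt *elt;

    elt = MemoryContextAlloc(TopTransactionContext,
                             sizeof(ActiveSnapshotElt));
    elt->as_snap = NULL;        /* snapshot not taken yet */
    elt->as_next = ActiveSnapshot;
    ActiveSnapshot = elt;
}

Snapshot
GetActiveSnapshot(void)
{
    /* Materialize a deferred snapshot on first access. */
    if (ActiveSnapshot->as_snap == NULL)
        ActiveSnapshot->as_snap =
            RegisterSnapshot(GetTransactionSnapshot());
    return ActiveSnapshot->as_snap;
}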

Greetings,

Andres Freund



Re: *_to_xml() should copy SPI_processed/SPI_tuptable

2018-09-05 Thread Tom Lane
Chapman Flack  writes:
> On 09/05/18 18:07, Tom Lane wrote:
>> * Replace SPI_tuptable et al with macros that access fields in the
>> current SPI stack level (similar to the way that, eg, errno works
>> on most modern platforms).  This seems do-able, if a bit grotty.

> It would mean they'd have to *be* in the stack frame, where they
> currently aren't;

Right, I was assuming that was obvious ...

> Another alternative might be to have SPI_connect save them and
> SPI_finish put them back, which leaves you just responsible for
> reasoning about your own code. You'd still be expected to save them
> across your own uses of other SPI calls, but no longer exposed to
> spooky action at a distance from nested uses of SPI in stuff you call.

Hmm.  I'd thought about that briefly and concluded that it didn't offer
a full fix, but on reflection it's not clear why it'd be any less of
a fix than the macroized-SPI_tuptable approach.  You end up with
per-SPI-stack-level storage either way, and while that isn't perfect
it does go a long way, as you say.  More, this has the huge advantage
of being back-patchable, because there'd be no API/ABI change.

regards, tom lane



Re: On the need for a snapshot in exec_bind_message()

2018-09-05 Thread Tom Lane
Andres Freund  writes:
> On 2018-09-05 18:45:52 -0400, Tom Lane wrote:
>> The snapshot has little to do with the query plan, usually.  It's about
>> what view of the database the executed query will see, and particularly
>> about what view the parameter input functions will see, if they look.

> The snapshot in exec_bind_message() shouldn't be about what the executed
> query sees, no?  Sure, we'll evaluate input params and might replan etc,
> but other than that?

As things stand, yeah, it's not used for execution.

> It'd not be insane to rejigger things so there's a version of
> PushActiveSnapshot() doesn't eagerly compute the snapshot, but instead
> does so on first access.  Obviously we couldn't use that everywhere, but
> the exec_bind_message() seems like a prime case for it.

I dislike this proposal, because it introduces an element of uncertainty
into when the snapshot is taken.  It doesn't seem a lot different from
saying that we'll try to save a gettimeofday call by postponing
calculation of the statement_timestamp until somebody asks for it.
Then it's no longer the statement start time, but some hard-to-define
time within execution.

I think the other idea of trying to keep the bind-time snapshot in use
throughout execution is more worth pursuing.  The main problem with it,
now that I think harder, is that we need the execution snapshot to be
taken after we've acquired whatever locks the query needs.  But we
actually know the set of required locks, if we have a cached plan
at hand.  (It's *possible* that that set would change, if we're forced
to replan, but it's unlikely.)  So you could imagine rejiggering things
so that we do the acquire-locks bit before doing parameter input, and
unless the lock list changes, we can keep the parameter-input snapshot
in force throughout execution.  Not quite sure how to make that play
with custom plans though.

regards, tom lane



Re: On the need for a snapshot in exec_bind_message()

2018-09-05 Thread Daniel Wood
> > Queries stop getting re-optimized after 5 times, unless better plans are to 
> > be had.  In the absence of schema changes or changing search path why is 
> > the snapshot needed?
> 
> The snapshot has little to do with the query plan, usually.  It's about
> what view of the database the executed query will see, and particularly
> about what view the parameter input functions will see, if they look.
> 
> You could maybe argue that immutable input functions shouldn't need
> snapshots, but there are enough not-immutable ones that I don't think
> that gets you very far.  In any case, we'd still need a snapshot for
> query execution.  The set of queries that could possibly never need
> a snapshot at all is probably not large enough to be interesting.

I would think that the vast majority of bind variables in customer queries
would involve built-in data types whose input functions are immutable.  As far
as the "view of the database the executed query will see" goes, either the
database itself changes, which should trigger catcache invalidations that can
be accounted for, or the "view" changes, as with search_path, which can be
tested for.  I would think the optimization would be applicable most of the
time.  It isn't the number of obscure user-created non-immutable input
functions that exist that matters; it is the frequency with which immutable
input functions are used in real apps.  This is a 50% performance improvement
in point lookups on larger boxes.



Re: On the need for a snapshot in exec_bind_message()

2018-09-05 Thread Andres Freund
On 2018-09-05 19:11:56 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-09-05 18:45:52 -0400, Tom Lane wrote:
> >> The snapshot has little to do with the query plan, usually.  It's about
> >> what view of the database the executed query will see, and particularly
> >> about what view the parameter input functions will see, if they look.
> 
> > The snapshot in exec_bind_message() shouldn't be about what the executed
> > query sees, no?  Sure, we'll evaluate input params and might replan etc,
> > but other than that?
> 
> As things stand, yeah, it's not used for execution.
> 
> > It'd not be insane to rejigger things so there's a version of
> > PushActiveSnapshot() doesn't eagerly compute the snapshot, but instead
> > does so on first access.  Obviously we couldn't use that everywhere, but
> > the exec_bind_message() seems like a prime case for it.
> 
> I dislike this proposal, because it introduces an element of uncertainty
> into when the snapshot is taken.  It doesn't seem a lot different from
> saying that we'll try to save a gettimeofday call by postponing
> calculation of the statement_timestamp until somebody asks for it.
> Then it's no longer the statement start time, but some hard-to-define
> time within execution.

I don't think that really is comparable.  The timestamp really exists on
a (well, for our purposes) absolute scale.  That's much less clear for
snapshots in read committed: without access to the database or returned
results, the moment the snapshot was taken really doesn't mean that
much - the session could just have been scheduled out right before
taking the snapshot (or waited on the procarray lock, to be more on
topic).

Additionally, this isn't the snapshot for the actual query execution -
it's the one for the parameter evaluation. And we *already* take a
separate snapshot for that from the one actually executing the query, so
skew between the two really can't be meaningful.


> I think the other idea of trying to keep the bind-time snapshot in use
> throughout execution is more worth pursuing.

Hm, that seems to have its own set of problems. Some of them
nontrivial. It's e.g. perfectly legal to bind multiple statements and
then execute them - and I think it's quite possible that pgjdbc would
generate something like that.

And then there's:
> The main problem with it, now that I think harder, is that we need the
> execution snapshot to be taken after we've acquired whatever locks the
> query needs.

Greetings,

Andres Freund



RE: automatic restore point

2018-09-05 Thread Yotsunaga, Naoki
-Original Message-
From: Yotsunaga, Naoki [mailto:yotsunaga.na...@jp.fujitsu.com] 
Sent: Tuesday, June 26, 2018 10:18 AM
To: Postgres hackers 
Subject: automatic restore point

> Hi, I attached a patch to output the LSN before execution to the server log
> when executing a specific command and accidentally erasing data.

Since there was an error in the attached patch, I attached the modified patch.

--
Naoki Yotsunaga






automatic_restore_point_v2.patch
Description: automatic_restore_point_v2.patch


Add extension options to control TAP and isolation tests

2018-09-05 Thread Michael Paquier
Hi all,

A recent thread on pgsql-committers discussed the fact that
we lacked a bit of infrastructure to allow extensions to control
isolation and TAP tests:
https://www.postgresql.org/message-id/20180905174527.ga2...@paquier.xyz

Attached is a patch which is the result of the previous thread, where a
couple of variables are added for extension authors:
- ISOLATION, similar to REGRESS for pg_regress, which lists isolation
tests.
- ISOLATION_OPTS, which can be used to pass an option set to
pg_isolation_regress.
- TAP_TESTS, a switch to enable running TAP tests.

This results in a bit of cleanup in the Makefile of modules we ship with
upstream:
- modules/brin
- modules/commit_ts
- modules/test_pg_dump
- contrib/bloom, where the TAP tests were actually not running.
- contrib/oid2name
- contrib/test_decoding, which keeps the same feature set, and is
heavily cleaned up (it missed for example the use of the existing
NO_INSTALLCHECK).
- contrib/vacuumlo

Thoughts?
--
Michael
diff --git a/contrib/bloom/Makefile b/contrib/bloom/Makefile
index 13bd397b70..da9553a2d0 100644
--- a/contrib/bloom/Makefile
+++ b/contrib/bloom/Makefile
@@ -8,6 +8,7 @@ DATA = bloom--1.0.sql
 PGFILEDESC = "bloom access method - signature file based index"
 
 REGRESS = bloom
+TAP_TESTS = 1
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -19,6 +20,3 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
-
-wal-check: temp-install
-	$(prove_check)
diff --git a/contrib/oid2name/Makefile b/contrib/oid2name/Makefile
index 908e078714..361a80a7a1 100644
--- a/contrib/oid2name/Makefile
+++ b/contrib/oid2name/Makefile
@@ -6,11 +6,11 @@ PGAPPICON = win32
 PROGRAM = oid2name
 OBJS	= oid2name.o $(WIN32RES)
 
+TAP_TESTS = 1
+
 PG_CPPFLAGS = -I$(libpq_srcdir)
 PG_LIBS_INTERNAL = $(libpq_pgport)
 
-EXTRA_CLEAN = tmp_check
-
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
@@ -21,9 +21,3 @@ top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
-
-check:
-	$(prove_check)
-
-installcheck:
-	$(prove_installcheck)
diff --git a/contrib/test_decoding/Makefile b/contrib/test_decoding/Makefile
index afcab930f7..8cd83a763f 100644
--- a/contrib/test_decoding/Makefile
+++ b/contrib/test_decoding/Makefile
@@ -3,9 +3,20 @@
 MODULES = test_decoding
 PGFILEDESC = "test_decoding - example of a logical decoding output plugin"
 
-# Note: because we don't tell the Makefile there are any regression tests,
-# we have to clean those result files explicitly
-EXTRA_CLEAN = $(pg_regress_clean_files)
+EXTRA_INSTALL=contrib/test_decoding
+
+REGRESS = ddl xact rewrite toast permissions decoding_in_xact \
+	decoding_into_rel binary prepared replorigin time messages \
+	spill slot truncate
+ISOLATION = mxact delayed_startup ondisk_startup concurrent_ddl_dml \
+	oldest_xmin snapshot_transfer
+
+REGRESS_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
+ISOLATION_OPTS = --temp-config $(top_srcdir)/contrib/test_decoding/logical.conf
+
+# Disabled because these tests require "wal_level=logical", which
+# typical installcheck users do not have (e.g. buildfarm clients).
+NO_INSTALLCHECK = 1
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
@@ -18,52 +29,8 @@ include $(top_builddir)/src/Makefile.global
 include $(top_srcdir)/contrib/contrib-global.mk
 endif
 
-# Disabled because these tests require "wal_level=logical", which
-# typical installcheck users do not have (e.g. buildfarm clients).
-installcheck:;
-
 # But it can nonetheless be very helpful to run tests on preexisting
 # installation, allow to do so, but only if requested explicitly.
-installcheck-force: regresscheck-install-force isolationcheck-install-force
-
-check: regresscheck isolationcheck
-
-submake-regress:
-	$(MAKE) -C $(top_builddir)/src/test/regress all
-
-submake-isolation:
-	$(MAKE) -C $(top_builddir)/src/test/isolation all
-
-submake-test_decoding:
-	$(MAKE) -C $(top_builddir)/contrib/test_decoding
-
-REGRESSCHECKS=ddl xact rewrite toast permissions decoding_in_xact \
-	decoding_into_rel binary prepared replorigin time messages \
-	spill slot truncate
-
-regresscheck: | submake-regress submake-test_decoding temp-install
-	$(pg_regress_check) \
-	--temp-config $(top_srcdir)/contrib/test_decoding/logical.conf \
-	$(REGRESSCHECKS)
-
-regresscheck-install-force: | submake-regress submake-test_decoding temp-install
-	$(pg_regress_installcheck) \
-	$(REGRESSCHECKS)
-
-ISOLATIONCHECKS=mxact delayed_startup ondisk_startup concurrent_ddl_dml \
-	oldest_xmin snapshot_transfer
-
-isolationcheck: | submake-isolation submake-test_decoding temp-install
-	$(pg_isolation_regress_check) \
-	--temp-config $(top_srcdir)/contrib/test_decoding/logical.conf \
-	$(ISOLATIONCHECKS)
-
-isolationcheck-install-force: all | submake-isolation submake-test_decoding temp-install
-	$(pg_isolation_regress_installcheck) \
-	$(ISOLATIONCHECKS)
-
-

Re: pgsql: Clean up after TAP tests in oid2name and vacuumlo.

2018-09-05 Thread Michael Paquier
On Wed, Sep 05, 2018 at 01:05:42PM -0700, Michael Paquier wrote:
> As far as I can see, the proposal is not cold to death and could have a
> future, so I'll begin a new thread with a more complete patch.

Done.  Let's move the discussion here then:
https://www.postgresql.org/message-id/20180906014849.gg2...@paquier.xyz
--
Michael


signature.asc
Description: PGP signature


Re: JIT compiling with LLVM v12

2018-09-05 Thread Noah Misch
On Wed, Sep 05, 2018 at 11:55:39AM -0700, Andres Freund wrote:
> On 2018-08-22 06:20:21 +, Noah Misch wrote:
> > I see jit slows the regression tests considerably:
> 
> Is this with LLVM assertions enabled or not?

Without, I think.  I configured them like this:
cmake -G Ninja -DCMAKE_INSTALL_PREFIX=$HOME/sw/nopath/llvm \
  -DCMAKE_BUILD_TYPE=MinSizeRel -DLLVM_USE_LINKER=gold \
  -DLLVM_TARGETS_TO_BUILD=X86 ../llvm
cmake -G Ninja -DCMAKE_INSTALL_PREFIX=$HOME/sw/nopath/llvm-el32 \
  -DCMAKE_BUILD_TYPE=MinSizeRel -DLLVM_USE_LINKER=gold \
  -DLLVM_PARALLEL_LINK_JOBS=1 ../llvm

> > # mips32el, assert, w/o llvm (buildfarm member topminnow) [1]
> > 28min install-check-*
> > 35min check-pg_upgrade
> >
> > # mips32el, assert, w/  llvm 6.0.1 [1]
> >  63min install-check-*
> > 166min check-pg_upgrade
> 
> But this seems so absurdly large of a difference that I kinda think LLVM
> assertions (wich are really expensive and add O(N) operations in a bunch
> of places) might be to blame.

The 2018-08-25 and 2018-09-01 published runs were far less bad.  Most of the
blame goes to the reason given in the footnote (competing load on a shared
machine), not to JIT.



Re: Implement predicate propagation for non-equivalence clauses

2018-09-05 Thread Richard Guo
On Wed, Sep 5, 2018 at 2:55 PM, Heikki Linnakangas  wrote:

> On 05/09/18 09:34, Richard Guo wrote:
>
>> Hi,
>>
>> As we know, current planner will generate additional restriction clauses
>> from
>> equivalence clauses. This will generally lower the total cost because
>> some of
>> tuples may be filtered out before joins.
>>
>> In this patch, we are trying to do the similar deduction, from
>> non-equivalence
>> clauses, that is, A=B AND f(A) implies A=B AND f(A) and f(B), under some
>> restrictions on f.
>>
>
> I haven't read the patch in detail, but that really only applies under
> special circumstances. Tom caught me making that assumption just recently (
> https://www.postgresql.org/message-id/8003.1527092720%40sss.pgh.pa.us). I
> think the restriction here is that f(x) must be an operator that's in the
> same operator family as the = operator. In a quick read-through, it's not
> clear to me what conditions are in the patch now. Please have a comment
> somewhere to list them explicitly.


Right. Above all, the operator in f(x) should be in the same opfamily as
the equivalence class. We neglected that in this patch, and it could
result in a wrong plan. In addition, the clause should not contain
volatile functions or subplans. We will address this in v2 and list the
conditions in a comment.
Thanks!
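
To illustrate the intended deduction with a minimal sketch (hypothetical
tables; both = and < on int belong to the same btree operator family, so
the opclass author has guaranteed consistent semantics):

create table ta (x int);
create table tb (x int);

-- From "ta.x = tb.x AND ta.x < 10" the planner could additionally derive
-- "tb.x < 10" and apply it at the scan of tb, filtering rows before the
-- join.
select * from ta, tb where ta.x = tb.x and ta.x < 10;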


>
> This patch will introduce extra cost for relation scan, due to the
>> cost of evaluating the new implied quals. Meanwhile, since the extra
>> filter may reduce the number of tuples returned by the scan, it may
>> lower the cost of following joins. So, whether we will get a better
>> plan depends on the selectivity of the implied quals.
>>
> Perhaps we should evaluate the selectivity of the clause, and only add
> them if they seem helpful, based on the cost vs. selectivity?
>
> At least in this case from the regression tests:
>
>  explain (costs off)
>>select * from ec0 a, ec1 b
>>where a.ff = b.ff and a.ff = 43::bigint::int8alias1;
>> - QUERY PLAN
>> --
>> +  QUERY PLAN
>> +--
>>   Nested Loop
>> ->  Index Scan using ec0_pkey on ec0 a
>>   Index Cond: (ff = '43'::int8alias1)
>> ->  Index Scan using ec1_pkey on ec1 b
>>   Index Cond: (ff = a.ff)
>> - Filter: (f1 < '5'::int8alias1)
>> + Filter: ((f1 < '5'::int8alias1) AND (ff = '43'::int8alias1))
>>  (6 rows)
>>
>
> the new qual is redundant with the Index Condition. If we could avoid
> generating such redundant quals, that would be good.


Nice point. I am not sure how complex it would be to evaluate the
selectivity of the new qual before applying it, but that deserves a try.


>
> - Heikki
>


Re: Implement predicate propagation for non-equivalence clauses

2018-09-05 Thread Richard Guo
Hi,

On Wed, Sep 5, 2018 at 2:56 PM, Tom Lane  wrote:

> Richard Guo  writes:
> > In this patch, we are trying to do the similar deduction, from
> > non-equivalence
> > clauses, that is, A=B AND f(A) implies A=B AND f(A) and f(B), under some
> > restrictions on f.
>
> Uh, *what* restrictions on f()?  In general the above equivalence
> does not hold, at least not for any data type more complicated than
> integers; and we do not have any semantic model for deciding
> which functions it would be correct for.
>

Exactly! The operator in f() should at least be in the same opfamily as
the equivalence class containing A and B. Besides, as far as I can see,
the clause in f() should not contain volatile functions or subplans. I am
not sure whether these restrictions are enough to make it safe.


> One simple example to show what I'm talking about is that float8 zero
> and minus zero are equal according to float8eq (assuming IEEE float
> arithmetic); but they aren't equivalent for any function f() that is
> sensitive to the sign or the text representation of the value.
> The numeric data type likewise has values that are "equal" without
> being identical for all purposes, eg 0.0 vs 0.000.  Or consider
> citext.
>

Thanks for the example. Heikki materialized this example as:

create table a (f float8);
create table b (f float8);

insert into a values ('0'), ('-0');
insert into b values ('0'), ('-0');

select * from a, b where a.f = b.f and a.f::text <> '-0';

Running that query, this patch would give a wrong result. We will address
this in v2.
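
Spelling out why this is a counterexample (a sketch, assuming IEEE float
semantics):

-- '0' and '-0' are equal under float8eq ...
select '0'::float8 = '-0'::float8;             -- true
-- ... but they are distinguishable through a cast to text:
select '0'::float8::text, '-0'::float8::text;  -- '0' | '-0'

So a.f = b.f matches all four row combinations, while the filter
a.f::text <> '-0' keeps only the rows where a.f is '0'.  Deriving the
extra qual b.f::text <> '-0' would wrongly discard the row pairing
a.f = '0' with b.f = '-0', changing the result.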


> The existing planner deduction rules for equivalence classes are
> carefully designed to ensure that we only generate derived clauses
> using operators from the same operator class or family, so that
> it's on the opclass author to ensure that the operators have consistent
> semantics.  I don't think we can extrapolate from that to any random
> function that accepts the datatype.






> regards, tom lane
>


Re: pgsql: Allow extensions to install built as well as unbuilt headers.

2018-09-05 Thread Michael Paquier
Hi Andrew,

On Wed, Sep 05, 2018 at 09:45:49PM +, Andrew Gierth wrote:
> Allow extensions to install built as well as unbuilt headers.
> 
> Commit df163230b overlooked the case that an out-of-tree extension
> might need to build its header files (e.g. via ./configure). If it is
> also doing a VPATH build, the HEADERS_* rules in the original commit
> would then fail to find the files, since they would be looking only
> under $(srcdir) and not in the build directory.
> 
> Fix by adding HEADERS_built and HEADERS_built_$(MODULE) which behave
> like DATA_built in that they look in the build dir rather than the
> source dir (and also make the files dependencies of the "all" target).
> 
> No Windows support appears to be needed for this, since it is only
> relevant to out-of-tree builds (no support exists in Mkvcbuild.pm to
> build extension header files in any case).

prairiedog is unhappy with this commit:
make -C ../../../contrib/spi
../../src/makefiles/pgxs.mk:140: Extraneous text after `else' directive
../../src/makefiles/pgxs.mk:144: *** only one `else' per conditional.
Stop.
make[2]: *** [submake-contrib-spi] Error 2
make[2]: *** Waiting for unfinished jobs
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog&dt=2018-09-05%2023%3A21%3A01
--
Michael


signature.asc
Description: PGP signature


Re: pgsql: Allow extensions to install built as well as unbuilt headers.

2018-09-05 Thread Andrew Gierth
> "Michael" == Michael Paquier  writes:

 Michael> prairiedog is unhappy with this commit:

What version of GNU Make is on there, do you know? Tom? I don't see it
mentioned in the output anywhere.

-- 
Andrew.



Re: Problem while updating a foreign table pointing to a partitioned table on foreign server

2018-09-05 Thread Kyotaro HORIGUCHI
Hello.

At Wed, 05 Sep 2018 20:02:04 +0900, Etsuro Fujita  
wrote in <5b8fb7ac.5020...@lab.ntt.co.jp>
> (2018/08/30 21:58), Etsuro Fujita wrote:
> > (2018/08/30 20:37), Kyotaro HORIGUCHI wrote:
> >> At Fri, 24 Aug 2018 21:45:35 +0900, Etsuro
> >> Fujita wrote
> >> in<5b7ffdef.6020...@lab.ntt.co.jp>
> >>> (2018/08/21 11:01), Kyotaro HORIGUCHI wrote:
>  At Tue, 14 Aug 2018 20:49:02 +0900, Etsuro
>  Fujita wrote
>  in<5b72c1ae.8010...@lab.ntt.co.jp>
> > (2018/08/09 22:04), Etsuro Fujita wrote:
> >> (2018/08/08 17:30), Kyotaro HORIGUCHI wrote:
> >>>
> > I spent more time looking at the patch. ISTM that the patch well
> > suppresses the effect of the tuple-descriptor expansion by making
> > changes to code in the planner and executor (and ruleutils.c), but I'm
> > still not sure that the patch is the right direction to go in, because
> > ISTM that expanding the tuple descriptor on the fly might be a wart.
> >>>
>  The exapansion should be safe if the expanded descriptor has the
>  same defitions for base columns and all the extended coulumns are
>  junks. The junk columns should be ignored by unrelated nodes and
>  they are passed safely as far as ForeignModify passes tuples as
>  is from underlying ForeignScan to ForeignUpdate/Delete.
> >>>
> >>> I'm not sure that would be really safe. Does that work well when
> >>> EvalPlanQual, for example?
> 
> I was wrong here; I assumed that we supported late locking for an
> UPDATE or DELETE on a foreign table, and I was a bit concerned that
> the approach you proposed might not work well with EvalPlanQual, but
> as described in fdwhandler.sgml, the core doesn't provide support for
> that:
> 
>  For an UPDATE or DELETE on a foreign table, it is recommended that
>  the ForeignScan operation on the target table perform early locking
>  on the rows that it fetches, perhaps via the equivalent of SELECT
>  FOR UPDATE.  An FDW can detect whether a table is an UPDATE/DELETE
>  target at plan time by comparing its relid to
>  root->parse->resultRelation, or at execution time by using
>  ExecRelationIsTargetRelation().  An alternative possibility is to
>  perform late locking within the ExecForeignUpdate or
>  ExecForeignDelete callback, but no special support is provided for
>  this.
> 
> So, there would be no need to worry about EvalPlanQual.  Sorry for
> the noise.

I don't think it is noise at all. Thank you for the pointer.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: pgsql: Allow extensions to install built as well as unbuilt headers.

2018-09-05 Thread Michael Paquier
On Thu, Sep 06, 2018 at 06:26:51AM +0100, Andrew Gierth wrote:
> What version of GNU Make is on there, do you know? Tom? I don't see it
> mentioned in the output anywhere.

I don't know it.  What I can see is that the use of "else ifdef" is
forbidden.  You could bypass the problem with more ifdef-only tweaks,
but it seems to me that it would be cleaner to somewhat reduce the level
of dependency between all the different header concepts.

By the way, when you use two or more layers of ifdef/endif in pgxs.mk,
putting a comment next to each endif mentioning which part it closes
improves readability.
--
Michael


signature.asc
Description: PGP signature


Re: pgsql: Allow extensions to install built as well as unbuilt headers.

2018-09-05 Thread Andrew Gierth
> "Michael" == Michael Paquier  writes:

 >> What version of GNU Make is on there, do you know? Tom? I don't see
 >> it mentioned in the output anywhere.

 Michael> I don't know it.

Consulting old manuals suggests it may have version 3.80 (c. 2002),
which lacks the 'else if..' syntax introduced in 3.81 (c. 2006).

-- 
Andrew (irc:RhodiumToad)




Re: A strange GiST error message or fillfactor of GiST build

2018-09-05 Thread Kyotaro HORIGUCHI
At Tue, 4 Sep 2018 13:05:55 +0300, Alexander Korotkov 
 wrote in 

> On Tue, Sep 4, 2018 at 12:16 PM Kyotaro HORIGUCHI
>  wrote:
> > I agree that fillfactor should be ignored in certain cases
> > (inserting the first tuple into empty pages or something like
> > that). Even though GiST doesn't need fillfactor at all, it is
> > defined independently from AM instances
> 
> What exactly do you mean?  Yes, fillfactor is defined in StdRdOptions
> struct, which is shared across many access methods.  But some of them
> uses fillfactor, while others don't.  For instance, GIN and BRIN don't
> support fillfactor at all.

Yes. That was with the intent of keeping Andrey's use case, so I
proposed just disabling it rather than removing the code altogether.

> > and it gives some
> > (undocumented) convenient on testing even on GiST.
> 
> Do you keep in the mind some particular use cases?  If so, could you
> please share them.  It would let me and others understand what
> behavior we need to preserve and why.

I misunderstood the consensus here; I myself don't have a proper use
case. The consensus here is:

- fillfactor is useless for GiST.
- We don't preserve it for Andrey's use case.
- No one is objecting to ignoring it here.

Is that right?

Well, I propose the first attached patch instead. It just removes
fillfactor handling from GiST.

Apart from the fillfactor removal, we have an internal error for a
root-split failure caused by too-long tuples, but this should be a user
error message like the one in gistSplit ("index row size %zu exceeds ...").

=# create table y as select cube(array(SELECT random() as a FROM generate_series(1,600))) from generate_series(1,1e3,1);
=# create index on y using gist(cube);
ERROR:  failed to add item to index page in "y_cube_idx"

The attached second patch changes the  message in the case.

ERROR:  size of 2 index rows (9640 bytes) exceeds maximum 8152 of the root page 
for the index "y_cube_idx"

This was one of my initial motivations here, but now this might be
another issue.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From e5abf14d5d7de4256a7d40a0e5783655a99c2515 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Wed, 5 Sep 2018 09:49:58 +0900
Subject: [PATCH 1/2] Ignore fillfactor in GiST

fillfactor for GiST doesn't work as expected since GiST doesn't
perform sorted bulk insertion to pack pages as tight as
possible. Therefore we remove the fillfactor capability from it. It is
still allowed to be set for backward compatibility but just ignored.
---
 doc/src/sgml/ref/create_index.sgml |  8 +++-
 src/backend/access/common/reloptions.c |  4 ++--
 src/backend/access/gist/gist.c | 13 ++---
 src/backend/access/gist/gistbuild.c| 18 +++---
 src/backend/access/gist/gistutil.c |  5 ++---
 src/include/access/gist_private.h  | 12 +++-
 6 files changed, 23 insertions(+), 37 deletions(-)

diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 3c1223b324..e2a77e0d4d 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -390,7 +390,7 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 
 

- The B-tree, hash, GiST and SP-GiST index methods all accept this parameter:
+ The B-tree, hash and SP-GiST index methods all accept this parameter:

 

@@ -412,6 +412,12 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] 
+ 
+   
+ GiST also accepts this parameter just for historical reasons but
+ ignores it.
+   
+ 
 


diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index db84da0678..9c9589a78e 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -186,12 +186,12 @@ static relopt_int intRelOpts[] =
 	{
 		{
 			"fillfactor",
-			"Packs gist index pages only to this percentage",
+			"This is ignored but exists for historical reasons",
 			RELOPT_KIND_GIST,
 			ShareUpdateExclusiveLock	/* since it applies only to later
 		 * inserts */
 		},
-		GIST_DEFAULT_FILLFACTOR, GIST_MIN_FILLFACTOR, 100
+		0, 10, 100
 	},
 	{
 		{
diff --git a/src/backend/access/gist/gist.c b/src/backend/access/gist/gist.c
index 8a42effdf7..5915ab2cf2 100644
--- a/src/backend/access/gist/gist.c
+++ b/src/backend/access/gist/gist.c
@@ -172,7 +172,7 @@ gistinsert(Relation r, Datum *values, bool *isnull,
 		 values, isnull, true /* size is currently bogus */ );
 	itup->t_tid = *ht_ctid;
 
-	gistdoinsert(r, itup, 0, giststate);
+	gistdoinsert(r, itup, giststate);
 
 	/* cleanup */
 	MemoryContextSwitchTo(oldCxt);
@@ -212,7 +212,7 @@ gistinsert(Relation r, Datum *values, bool *isnull,
  * Returns 'true' if the page was split, 'false' otherwise.
  */
 bool
-gistplacetopage(Relation rel, Size freespace, GISTSTATE *giststate,
+gistplacetopage(Relation rel, GISTSTATE *giststate,
 Buffer buffer,
 IndexTuple *itup, int ntup, OffsetNumber o

Re: A strange GiST error message or fillfactor of GiST build

2018-09-05 Thread Andrey Borodin



> On 5 Sep 2018, at 13:29, Kyotaro HORIGUCHI wrote:
> 
> We don't preserve it for Andrey's use case.
Just my 2 cents: that was a hacky use case for development reasons. I think
that removing fillfactor is a good idea, and your latest patch looks good
from my POV.

Best regards, Andrey Borodin.


Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

2018-09-05 Thread Alexander Korotkov
On Wed, Sep 5, 2018 at 1:45 AM R, Siva  wrote:
> On Tue, Sep 4, 2018 at 09:16 PM, Alexander Korotkov 
>  wrote:
> > Do you have a test scenario for reproduction of this issue?  We need
> > it to ensure that fix is correct.
>
> Unfortunately, I do not have a way of reproducing this issue.
> So far I have tried a workload consisting of inserts (of the
> same attribute value that is indexed), batch deletes of rows
> and vacuum interleaved with engine crash/restarts.

Issue reproduction and testing are essential for a bug fix.  Remember
the last time you reported a GIN bug [1]: after issue reproduction it
appeared that we had more things to fix.  It's quite clear to me that if
the segment list contains GIN_SEGMENT_INSERT before GIN_SEGMENT_DELETE,
then it might lead to wrong behavior in ginRedoRecompress().  But it's
not yet clear what code path could lead to such a segment list... I'll
explore the code more and will probably come up with some idea.

Links
[1] https://www.postgresql.org/message-id/flat/1531867212836.63354%40amazon.com

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Bug in to_timestamp().

2018-09-05 Thread Alexander Korotkov
On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
 wrote:
> On Mon, Sep 3, 2018 at 2:30 PM, Alexander Korotkov 
>  wrote:
>>
>> The current version of patch doesn't really distinguish spaces and
>> delimiters in format string in non-FX mode.  So, spaces and delimiters
>> are forming single group.  For me Oracle behavior is ridiculous at
>> least because it doesn't allow cases when input string exactly matches
>> format string.
>>
>> This one fails:
>> SELECT to_timestamp('2018- -01 02', 'YYYY- -MM DD') FROM dual
>
> Related to below - since this works today it should continue to work.  I was 
> under the wrong impression that we followed their behavior today and were 
> pondering deviating from it because of its ridiculousness.

Right, that's just incompatibility with Oracle behavior, not with our
previous behavior.

>> > The couple of regression tests that change do so for the better.  It would 
>> > be illuminating to set this up as two patches though, one introducing all 
>> > of the new regression tests against the current code and then a second 
>> > patch with the changed behavior with only the affected tests.
>>
>> OK, here you go.  0001-to_timestamp-regression-test-v17.patch
>> introduces changes in regression tests and their output for current
>> master, while 0002-to_timestamp-format-checking-v17.patch contain
>> changes to to_timestamp itself.
>>
>
> From those results the question is how important is it to force the following 
> breakage on our users (i.e., introduce FX exact symbol matching):
>
> SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
> - to_timestamp
> ---
> - Sun Feb 16 00:00:00 1997 PST
> -(1 row)
> -
> +ERROR:  unexpected character "/", expected character ":"
> +HINT:  In FX mode, punctuation in the input string must exactly match the 
> format string.
>
> There seemed to be some implicit approvals of this breakage some 30 emails 
> and 10 months ago but given that this is the only change from a correct 
> result to a failure I'd like to officially put it out there for opinion/vote 
> gathering.   Mine is a -1; though keeping the distinction between space and 
> non-alphanumeric characters is expected.

Do I understand correctly that you're -1 to changes to FX mode, but no
objection to changes in non-FX mode?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
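
For readers skimming the thread, the disputed FX change restated as a
runnable example (results taken verbatim from the quoted regression diff;
the patched behavior is as of v17 of the patch):

select to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
-- master:  Sun Feb 16 00:00:00 1997 PST
-- patched: ERROR:  unexpected character "/", expected character ":"
--          HINT:  In FX mode, punctuation in the input string must
--          exactly match the format string.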



PostgreSQL does not start when max_files_per_process> 1200 on Windows 7

2018-09-05 Thread Victor Spirin

Hi,

I have run into a bug on Windows 7 with max_files_per_process > 1200.

Calling dup(0) more than 1200 times in the function count_usable_fds()
(0 = stdin) causes further unpredictable errors in file operations.

When I open a real file and use its descriptor for the dup, no error
occurs.  In the patch, I open the postgresql.conf file and dup its
descriptor instead.



--
Victor Spirin
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 8dd51f1..79054e5 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -828,6 +828,15 @@ count_usable_fds(int max_to_probe, int *usable_fds, int *already_open)
 		ereport(WARNING, (errmsg("getrlimit failed: %m")));
 #endif							/* HAVE_GETRLIMIT */
 
+	int		fd_test = 0;
+#ifdef WIN32
+	/*
+	 * we have error on Windows7 with max_files_per_process > 1200 when dup(0) - stdin
+	 * make test on postgresql.conf file
+	 */
+	fd_test = _open(ConfigFileName, _O_RDONLY);
+	if (fd_test < 0) fd_test = 0;	/* if was error then = stdin */
+#endif
 	/* dup until failure or probe limit reached */
 	for (;;)
 	{
@@ -843,7 +852,7 @@ count_usable_fds(int max_to_probe, int *usable_fds, int *already_open)
 			break;
 #endif
 
-		thisfd = dup(0);
+		thisfd = dup(fd_test);
 		if (thisfd < 0)
 		{
 			/* Expect EMFILE or ENFILE, else it's fishy */
@@ -869,7 +878,9 @@ count_usable_fds(int max_to_probe, int *usable_fds, int *already_open)
 	/* release the files we opened */
 	for (j = 0; j < used; j++)
 		close(fd[j]);
-
+#ifdef WIN32
+	if (fd_test > 0) _close(fd_test);
+#endif
 	pfree(fd);
 
 	/*


Re: Collation versioning

2018-09-05 Thread Christoph Berg
Re: Thomas Munro 2018-09-04 

> I was reminded about that by recent news
> about an upcoming glibc/CLDR resync that is likely to affect
> PostgreSQL users (though, I guess, probably only when they do a major
> OS upgrade).

Or replicating/restoring a database to a newer host.

> ... or, on a Debian system using the locales package, like this:
> 
> libc_collation_version_command = 'dpkg -s locales | grep Version: | sed "s/Version: //"'

Ugh. This sounds horribly easy to get wrong on the user side. I could
of course ship that preconfigured in the Debian packages, but that
would leave everyone not using one of the standard distro packagings
out in the rain.

> Does anyone know
> of a way to extract a version string from glibc using existing
> interfaces?  I heard there was an undocumented way but I haven't been
> able to find it -- probably because I was, erm, looking in the
> documentation.

That sounds more robust. Googling around:

https://www.linuxquestions.org/questions/linux-software-2/how-to-check-glibc-version-263103/

#include <stdio.h>
#include <gnu/libc-version.h>

int main (void) { puts (gnu_get_libc_version ()); return 0; }

$ ./a.out
2.27

Hopefully that version info is fine-grained enough.
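
(As an aside, the catalog already records a version for ICU collations,
while libc collations currently get no version at all, which is the gap
this thread is about.  A quick way to look, assuming a v10+ pg_collation
with the collprovider/collversion columns:

select collname, collprovider, collversion from pg_collation;
-- collprovider 'i' rows carry a collversion; 'c' (libc) rows are NULL.)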

Christoph



Re: Problem while updating a foreign table pointing to a partitioned table on foreign server

2018-09-05 Thread Etsuro Fujita

(2018/08/30 21:58), Etsuro Fujita wrote:

(2018/08/30 20:37), Kyotaro HORIGUCHI wrote:

At Fri, 24 Aug 2018 21:45:35 +0900, Etsuro
Fujita wrote
in<5b7ffdef.6020...@lab.ntt.co.jp>

(2018/08/21 11:01), Kyotaro HORIGUCHI wrote:

At Tue, 14 Aug 2018 20:49:02 +0900, Etsuro
Fujita wrote
in<5b72c1ae.8010...@lab.ntt.co.jp>

(2018/08/09 22:04), Etsuro Fujita wrote:

(2018/08/08 17:30), Kyotaro HORIGUCHI wrote:



I spent more time looking at the patch. ISTM that the patch well
suppresses the effect of the tuple-descriptor expansion by making
changes to code in the planner and executor (and ruleutils.c), but I'm
still not sure that the patch is the right direction to go in, because
ISTM that expanding the tuple descriptor on the fly might be a wart.



The exapansion should be safe if the expanded descriptor has the
same defitions for base columns and all the extended coulumns are
junks. The junk columns should be ignored by unrelated nodes and
they are passed safely as far as ForeignModify passes tuples as
is from underlying ForeignScan to ForeignUpdate/Delete.


I'm not sure that would be really safe. Does that work well when
EvalPlanQual, for example?


I was wrong here; I assumed that we supported late locking for an
UPDATE or DELETE on a foreign table, and I was a bit concerned that the
approach you proposed might not work well with EvalPlanQual, but as
described in fdwhandler.sgml, the core doesn't provide support for that:

 For an UPDATE or DELETE on a foreign table, it is recommended that
 the ForeignScan operation on the target table perform early locking
 on the rows that it fetches, perhaps via the equivalent of SELECT
 FOR UPDATE.  An FDW can detect whether a table is an UPDATE/DELETE
 target at plan time by comparing its relid to
 root->parse->resultRelation, or at execution time by using
 ExecRelationIsTargetRelation().  An alternative possibility is to
 perform late locking within the ExecForeignUpdate or
 ExecForeignDelete callback, but no special support is provided for
 this.

So, there would be no need to worry about EvalPlanQual.  Sorry for the
noise.


Best regards,
Etsuro Fujita



Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

2018-09-05 Thread Alexander Korotkov
On Wed, Sep 5, 2018 at 12:26 PM Alexander Korotkov
 wrote:
> On Wed, Sep 5, 2018 at 1:45 AM R, Siva  wrote:
> > On Tue, Sep 4, 2018 at 09:16 PM, Alexander Korotkov 
> >  wrote:
> > > Do you have a test scenario for reproduction of this issue?  We need
> > > it to ensure that fix is correct.
> >
> > Unfortunately, I do not have a way of reproducing this issue.
> > So far I have tried a workload consisting of inserts (of the
> > same attribute value that is indexed), batch deletes of rows
> > and vacuum interleaved with engine crash/restarts.
>
> Issue reproduction and testing are essential for a bug fix.  Remember
> the last time you reported a GIN bug [1]: after issue reproduction it
> appeared that we had more things to fix.  It's quite clear to me that if
> the segment list contains GIN_SEGMENT_INSERT before GIN_SEGMENT_DELETE,
> then it might lead to wrong behavior in ginRedoRecompress().  But it's
> not yet clear what code path could lead to such a segment list... I'll
> explore the code more and will probably come up with some idea.

Aha, I've managed to reproduce this.
1. Apply ginRedoRecompress-asserts.patch, which add assertions to
ginRedoRecompress() detecting past opaque writes.
2. Setup streaming replication.
3. Execute following on the master.
create or replace function test () returns void as $$
declare
i int;
begin
FOR i IN 1..1000 LOOP
  insert into test values ('{1}');
end loop;
end
$$ language plpgsql;
create table test (a int[]);
insert into test (select '{}'::int[] from generate_series(1,1));
insert into test (select '{1}'::int[] from generate_series(1,10));
create index test_idx on test using gin(a) with (fastupdate = off);
delete from test where a = '{}'::int[];
vacuum test;
select test();

So, since we managed to reproduce this, I'm going to test and review your fix.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7a1e94a1d56..301c527cab9 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -276,6 +276,7 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 break;
 
 			case GIN_SEGMENT_INSERT:
+Assert(segptr + newsegsize + szleft < PageGetSpecialPointer(page));
 /* make room for the new segment */
 memmove(segptr + newsegsize, segptr, szleft);
 /* copy the new segment in place */
@@ -285,6 +286,8 @@ ginRedoRecompress(Page page, ginxlogRecompressDataLeaf *data)
 break;
 
 			case GIN_SEGMENT_REPLACE:
+Assert(segptr + newsegsize + (szleft - segsize) < PageGetSpecialPointer(page));
+Assert(segptr + szleft < PageGetSpecialPointer(page));
 /* shift the segments that follow */
 memmove(segptr + newsegsize,
 		segptr + segsize,


Re: [HACKERS] Bug in to_timestamp().

2018-09-05 Thread amul sul
On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
 wrote:
>
> On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
>  wrote:
> > On Mon, Sep 3, 2018 at 2:30 PM, Alexander Korotkov 
> >  wrote:
> >>
> >> The current version of patch doesn't really distinguish spaces and
> >> delimiters in format string in non-FX mode.  So, spaces and delimiters
> >> are forming single group.  For me Oracle behavior is ridiculous at
> >> least because it doesn't allow cases when input string exactly matches
> >> format string.
> >>
> >> This one fails:
> >> SELECT to_timestamp('2018- -01 02', 'YYYY- -MM DD') FROM dual
> >
> > Related to below - since this works today it should continue to work.  I 
> > was under the wrong impression that we followed their behavior today and 
> > were pondering deviating from it because of its ridiculousness.
>
> Right, that's just incompatibility with Oracle behavior, not with our
> previous behavior.
>
> >> > The couple of regression tests that change do so for the better.  It 
> >> > would be illuminating to set this up as two patches though, one 
> >> > introducing all of the new regression tests against the current code and 
> >> > then a second patch with the changed behavior with only the affected 
> >> > tests.
> >>
> >> OK, here you go.  0001-to_timestamp-regression-test-v17.patch
> >> introduces changes in regression tests and their output for current
> >> master, while 0002-to_timestamp-format-checking-v17.patch contain
> >> changes to to_timestamp itself.
> >>
> >
> > From those results the question is how important is it to force the 
> > following breakage on our users (i.e., introduce FX exact symbol matching):
> >
> > SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
> > - to_timestamp
> > ---
> > - Sun Feb 16 00:00:00 1997 PST
> > -(1 row)
> > -
> > +ERROR:  unexpected character "/", expected character ":"
> > +HINT:  In FX mode, punctuation in the input string must exactly match the 
> > format string.
> >
> > There seemed to be some implicit approvals of this breakage some 30 emails 
> > and 10 months ago but given that this is the only change from a correct 
> > result to a failure I'd like to officially put it out there for 
> > opinion/vote gathering.   Mine is a -1; though keeping the distinction 
> > between space and non-alphanumeric characters is expected.
>
> Do I understand correctly that you're -1 to changes to FX mode, but no
> objection to changes in non-FX mode?
>
Ditto.

Regards,
Amul Sul.



Re: [WIP] [B-Tree] Retail IndexTuple deletion

2018-09-05 Thread Andrey Lepikhov

Hi,

I have prepared the next version of the background worker (cleaner) based
on the retail indextuple deletion patch.
This version shows stable behavior in regression tests and under pgbench
workloads.


In this version:
1. Only AccessShareLock is acquired during cleanup of heap and index
relations.
2. A somewhat 'aggressive' cleanup strategy is introduced - conditional
cleanup locks are not used.
3. Only in-memory blocks are cleaned up.
4. The cleaner calls heap_page_prune() before cleaning up a block.

Benchmarks
-

Two factors were evaluated: performance (tps) and relation bloat.

Before each test, some rarefaction of pgbench_accounts was modeled by
deleting 10% of the tuples in each block.
Also, I tested normal and Gaussian distributions of queries on the
pgbench_accounts relation.

Autovacuum uses default settings.

Script:
pgbench -i -s 10
psql -c $"DELETE FROM pgbench_accounts WHERE (random() < 0.1);"
psql -c $"VACUUM;"
psql -c $"CREATE INDEX pgbench_accounts_ext ON public.pgbench_accounts USING btree (abalance);" &&

pgbench -T 3600 -c 32 -j 8 -M prepared -P 600

NORMAL distribution:
average tps = 1045 (cleaner); = 1077 (autovacuum)

Relations size at the end of test, MB:
pgbench_accounts: 128 (cleaner); 128 (autovacuum)
pgbench_branches: 0.1 (cleaner); 2.1 (autovacuum)
pgbench_tellers: 0.4 (cleaner); 2.8 (autovacuum)
pgbench_accounts_pkey: 21 (cleaner); 43 (autovacuum)
pgbench_accounts_ext: 48 (cleaner); 56 (autovacuum)

Gaussian distribution:
average tps = 213 (cleaner); = 213 (autovacuum)

Relations size at the end of test, MB:
pgbench_accounts: 128 (cleaner); 128 (autovacuum)
pgbench_accounts_ext: 22 (cleaner); 29 (autovacuum)

Conclusions
---
1. For retail indextuple deletion purposes, I replaced ItemIdSetDead()
with ItemIdMarkDead() in the heap_page_prune_execute() operation.
Consequently, in the case of 100% filling of each relation block, we get
more or less bloated heap and indexes. When the blocks already have free
space, the cleaner can delay heap and index bloat without a vacuum.
2. The cleaner works fine when access frequency is skewed across
relation blocks.

3. The cleaner does not cause a decrease in performance.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
>From b465357b8a884cac3b503ad171976d3afd31f574 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Wed, 5 Sep 2018 10:39:11 +0300
Subject: [PATCH 5/5] Heap and Index cleaner

---
 .../postgres_fdw/expected/postgres_fdw.out|  30 ++--
 contrib/postgres_fdw/sql/postgres_fdw.sql |   4 +-
 src/backend/access/heap/pruneheap.c   |   6 +-
 src/backend/access/nbtree/nbtree.c|  15 +-
 src/backend/access/transam/xact.c |   4 +
 src/backend/catalog/system_views.sql  |  11 ++
 src/backend/commands/vacuumlazy.c |  44 +++--
 src/backend/postmaster/Makefile   |   2 +-
 src/backend/postmaster/pgstat.c   |  36 
 src/backend/postmaster/postmaster.c   | 160 +-
 src/backend/storage/buffer/bufmgr.c   |  60 ++-
 src/backend/storage/buffer/localbuf.c |  24 +++
 src/backend/storage/ipc/ipci.c|   3 +
 src/backend/storage/lmgr/lwlocknames.txt  |   1 +
 src/backend/storage/lmgr/proc.c   |   5 +-
 src/backend/tcop/postgres.c   |  12 ++
 src/backend/utils/adt/pgstatfuncs.c   |   2 +
 src/backend/utils/hash/Makefile   |   2 +-
 src/backend/utils/init/miscinit.c |   3 +-
 src/backend/utils/init/postinit.c |  11 +-
 src/include/commands/progress.h   |  12 ++
 src/include/commands/vacuum.h |   3 +
 src/include/pgstat.h  |  14 +-
 src/include/storage/buf_internals.h   |   1 +
 src/include/storage/bufmgr.h  |   5 +-
 src/include/storage/pmsignal.h|   2 +
 src/test/regress/expected/rules.out   |  18 +-
 src/test/regress/expected/triggers.out|   2 +-
 src/test/regress/input/constraints.source |  12 +-
 src/test/regress/output/constraints.source|  34 ++--
 src/test/regress/sql/rules.sql|   2 +-
 src/test/regress/sql/triggers.sql |   2 +-
 32 files changed, 458 insertions(+), 84 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f5498c62bd..622bea2a7d 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -5496,19 +5496,25 @@ ALTER SERVER loopback OPTIONS (DROP extensions);
 INSERT INTO ft2 (c1,c2,c3)
   SELECT id, id % 10, to_char(id, 'FM0') FROM generate_series(2001, 2010) id;
 EXPLAIN (verbose, costs off)
-UPDATE ft2 SET c3 = 'bar' WHERE postgres_fdw_abs(c1) > 2000 RETURNING *;-- can't be pushed down
-QUERY PLAN
-

Re: [HACKERS] Bug in to_timestamp().

2018-09-05 Thread Alexander Korotkov
On Wed, Sep 5, 2018 at 3:10 PM amul sul  wrote:
> On Wed, Sep 5, 2018 at 3:05 PM Alexander Korotkov
>  wrote:
> > On Wed, Sep 5, 2018 at 1:22 AM David G. Johnston
> >  wrote:
> > > From those results the question is how important is it to force the 
> > > following breakage on our users (i.e., introduce FX exact symbol 
> > > matching):
> > >
> > > SELECT to_timestamp('97/Feb/16', 'FXYY:Mon:DD');
> > > - to_timestamp
> > > ---
> > > - Sun Feb 16 00:00:00 1997 PST
> > > -(1 row)
> > > -
> > > +ERROR:  unexpected character "/", expected character ":"
> > > +HINT:  In FX mode, punctuation in the input string must exactly match 
> > > the format string.
> > >
> > > There seemed to be some implicit approvals of this breakage some 30 
> > > emails and 10 months ago but given that this is the only change from a 
> > > correct result to a failure I'd like to officially put it out there for 
> > > opinion/vote gathering.   Mine is a -1; though keeping the distinction 
> > > between space and non-alphanumeric characters is expected.
> >
> > Do I understand correctly that you're -1 to changes to FX mode, but no
> > objection to changes in non-FX mode?
> >
> Ditto.

So, if there are no objections to the non-FX mode changes, then I'll
extract that part and commit it separately.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: pointless check in RelationBuildPartitionDesc

2018-09-05 Thread Alvaro Herrera
On 2018-Sep-05, Amit Langote wrote:

> On 2018/09/05 1:50, Alvaro Herrera wrote:
> > Proposed patch.  Checking isnull in a elog(ERROR) is important, because
> > the column is not marked NOT NULL.  This is not true for other columns
> > where we simply do Assert(!isnull).
> 
> Looks good.  Thanks for taking care of other sites as well.
> 
> @@ -14705,7 +14705,9 @@ ATExecDetachPartition(Relation rel, RangeVar *name)
> 
>   (void) SysCacheGetAttr(RELOID, tuple, Anum_pg_class_relpartbound,
>  &isnull);
> - Assert(!isnull);
> + if (isnull)
> + elog(ERROR, "null relpartbound for relation %u",
> +  RelationGetRelid(partRel));
> 
> In retrospect, I'm not sure why this piece of code is here at all; maybe
> just remove the SysCacheGetAttr and Assert?

Yeah, good idea, will do.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bug fix for glibc broke freebsd build in REL_11_STABLE

2018-09-05 Thread Tom Lane
Peter Eisentraut  writes:
> On 05/09/2018 02:51, Andres Freund wrote:
>> My current proposal is thus to do add a check that does
>> #if defined(__clang__) && defined(__i386__) && !defined(__SSE2_MATH__)
>> something-that-fails
>> #endif
>> in an autoconf test, and have configure complain if that fails.

> Would this not be invalidated if the bug you have filed gets fixed?

Perhaps, but how would autoconf tell that?  I wouldn't have any confidence
in a runtime test, as it might or might not trigger the bug depending on
all sorts of factors like -O level.

In any case, it seems like "use -msse2 on x86" is good advice from a
performance standpoint even if it doesn't prevent a compiler bug.

One thought is that maybe we should provide a way to override this,
in case somebody really can't or doesn't want to use -msse2, and
is willing to put up with platform-dependent float behavior.

regards, tom lane



Re: Bug in ginRedoRecompress that causes opaque data on page to be overrun

2018-09-05 Thread Alexander Korotkov
On Wed, Sep 5, 2018 at 2:39 PM Alexander Korotkov
 wrote:
> On Wed, Sep 5, 2018 at 12:26 PM Alexander Korotkov
>  wrote:
> > On Wed, Sep 5, 2018 at 1:45 AM R, Siva  wrote:
> > > On Tue, Sep 4, 2018 at 09:16 PM, Alexander Korotkov 
> > >  wrote:
> > > > Do you have a test scenario for reproduction of this issue?  We need
> > > > it to ensure that fix is correct.
> > >
> > > Unfortunately, I do not have a way of reproducing this issue.
> > > So far I have tried a workload consisting of inserts (of the
> > > same attribute value that is indexed), batch deletes of rows
> > > and vacuum interleaved with engine crash/restarts.
> >
> > Issue reproduction and testing are essential for a bug fix.  Remember
> > the last time you reported a GIN bug [1]: after issue reproduction it
> > appeared that we had more things to fix.  It's quite clear to me that if
> > the segment list contains GIN_SEGMENT_INSERT before GIN_SEGMENT_DELETE,
> > then it might lead to wrong behavior in ginRedoRecompress().  But it's
> > not yet clear what code path could lead to such a segment list... I'll
> > explore the code more and will probably come up with some idea.
>
> Aha, I've managed to reproduce this.
> 1. Apply ginRedoRecompress-asserts.patch, which add assertions to
> ginRedoRecompress() detecting past opaque writes.

It was wrong, sorry.  It appears that I put strict inequalities into the
asserts, while they should be loose inequalities.  The correct version of
the patch is attached.  And the scenario I've posted doesn't really
reproduce the bug...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


ginRedoRecompress-asserts-v2.patch
Description: Binary data

