Re: [HACKERS] parallelize queries containing initplans

2017-08-30 Thread Amit Kapila
On Mon, Aug 21, 2017 at 2:40 PM, Amit Kapila  wrote:
> On Mon, Aug 21, 2017 at 1:44 PM, Haribabu Kommi
>  wrote:
>>
>>
>> Thanks for adding more details. It is easy to understand.
>>
>> I marked the patch as ready for committer in the commitfest.
>>

Rebased the patch.  The output of test case added by the patch is also
slightly changed because of the recent commit
7df2c1f8daeb361133ac8bdeaf59ceb0484e315a.  I have verified that the
new test result is as expected.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


pq_pushdown_initplan_v8.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support to COMMENT ON DATABASE CURRENT_DATABASE

2017-08-30 Thread Jing Wang
Hi All,

Enclosed please find the patch covering only the pg_dump support, which uses
the 'comment on current_database' statement.

This patch is meant to be applied on top of the previous patch,
comment_on_current_database_no_pgdump_v3.patch
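
With both patches applied, the dump output would use the database-name-independent
form; a sketch of the intended usage:

```sql
-- Intended effect of the feature: the dump no longer hard-codes the
-- database name, so the comment applies to whichever database the
-- dump is restored into.
COMMENT ON DATABASE CURRENT_DATABASE IS 'inventory database';
```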

Regards,
Jing Wang
Fujitsu Australia


comment_on_current_database_for_pgdump_v3.patch
Description: Binary data



Re: [HACKERS] pg_upgrade changes can it use CREATE EXTENSION?

2017-08-30 Thread Regina Obe
Sorry for the cross-posting on this one, but I think it's important that both
groups are aware.


>> I think this thread covers most of the issues.
>> https://lists.osgeo.org/pipermail/postgis-devel/2017-August/026355.html
>> My thought was: is it possible for pg_upgrade to be taught to use CREATE
>> EXTENSION if asked?

> We intentionally *don't* do that; pg_dump goes to a lot of trouble to
> duplicate the old extension contents exactly, instead.  There are a bunch
> of corner cases that would fail if we allowed the new installation to
> have different extension contents than the old.  Believe you me, we'd
> rather have just issued CREATE EXTENSION, but it doesn't work.

> Looking quickly at the thread you cite, I wonder how much of this problem
> is caused by including version numbers in the library's .so filename.

Most of it is.  That's why I proposed bumping the version only on a major
upgrade, so the PostGIS 2.4 library would be called postgis-2.so instead of
postgis-2.4.so.

We would only change the on-disk format in a major release, in which case
pg_upgrade wouldn't work for folks anyway (such as what happened going from
PostGIS 1.5 to 2.0).


> Have you considered not doing that?  Our experience with maintaining the
> contrib modules is that it's easier to attach a version number to an
> individual function (in its C name, where it's irrelevant to SQL users).
> If you incompatibly upgrade a given function, you can leave a stub behind,
> with the old C symbol, that does nothing but throw an error if called.
> Or you can keep on supporting the old API if it's easy enough; it
> doesn't have to be a library-wide decision.
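
For concreteness, the per-function versioning described above might look like
this on the SQL side (the function and C symbol names here are hypothetical,
not actual PostGIS code):

```sql
-- The SQL-visible function name stays stable across versions; only the
-- C symbol it points at carries a version.  An incompatible upgrade
-- changes the symbol, and the old symbol can remain in the library as
-- an error-throwing stub.
CREATE OR REPLACE FUNCTION st_example(geometry, geometry) RETURNS float8
    AS 'MODULE_PATHNAME', 'st_example_v2'  -- versioned C symbol
    LANGUAGE c IMMUTABLE STRICT;
```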

People were all worked up about breaking the ABI and also about not being able
to run two different versions of PostGIS in the same cluster.
We rarely break the ABI, and if we did, like you said it wouldn't kill us
to keep the old C name around until the next major upgrade.

So I'm all for that idea.  I figure we'll rarely need to do that anyway.

It's mostly PostGIS developers like me who need to run two different versions
of PostGIS in the same cluster, mostly for regression testing, which is why I
proposed a configure switch that is off by default.

Here is my original vote request.
https://lists.osgeo.org/pipermail/postgis-devel/2017-August/026319.html


> My solution of let's not call it postgis-2.4  but just postgis-2  from
> thenceforward for the life of 2 major series because we don't break backward
> compatibility often in a PostGIS minor version got shot down.

> The thread you mention doesn't seem to include any arguments why not
>  to do that.

> regards, tom lane


Some people had an issue with trying to do that at PostGIS 2.4, right after we
had already released the alpha and less than a month away from release.
Though technically we haven't released a beta yet, so I didn't think it was
that big of a deal.

But I'm willing to wait for PostGIS 2.5 to appease people.

Tom, as always, thanks for being a voice of reason,

Regina


[HACKERS] generated columns

2017-08-30 Thread Peter Eisentraut
Here is another attempt to implement generated columns.  This is a
well-known SQL-standard feature, also available for instance in DB2,
MySQL, Oracle.  A quick example:

  CREATE TABLE t1 (
...,
height_cm numeric,
height_in numeric GENERATED ALWAYS AS (height_cm * 2.54)
  );

(This is not related to the recent identity columns feature, other than
the similar syntax and some overlap internally.)

In previous discussions, it has often been a source of confusion whether
these generated columns are supposed to be computed on insert/update and
stored, or computed when read.  The SQL standard is not explicit, but
appears to lean toward stored.  DB2 stores.  Oracle computes on read.
MySQL supports both.  So I aim to implement both.  This makes sense:
both regular views and materialized views have their uses, too.  For the
syntax, I use the MySQL/Oracle syntax of appending [VIRTUAL|STORED].  In
this patch, only VIRTUAL is fully implemented.  I also have STORED kind
of working, but it wasn't fully baked, so I haven't included it here.
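
To make the virtual behavior concrete, here is a sketch of how the example
above would behave under this patch (assuming VIRTUAL, the variant implemented
here, is the default):

```sql
CREATE TABLE t1 (
    height_cm numeric,
    height_in numeric GENERATED ALWAYS AS (height_cm * 2.54)  -- VIRTUAL
);

INSERT INTO t1 (height_cm) VALUES (100);
SELECT height_in FROM t1;               -- computed on read from height_cm
INSERT INTO t1 (height_in) VALUES (1);  -- should fail: column is generated
```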

Known bugs:

- pg_dump produces a warning about a dependency loop when dumping these.
  Will need to be fixed at some point, but it doesn't prevent anything
  from working right now.

Open design issues:

- COPY behavior: Currently, generated columns are automatically omitted
if there is no column list, and prohibited if specified explicitly.
When stored generated columns are implemented, they could be copied out.
Some user options might be possible here.
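
Using the earlier t1 example, the COPY rules currently in the patch would act
roughly like this (a sketch):

```sql
COPY t1 TO STDOUT;                         -- generated columns omitted
COPY t1 (height_cm) TO STDOUT;             -- fine: explicit non-generated list
COPY t1 (height_cm, height_in) TO STDOUT;  -- error: generated column listed
```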

- Catalog storage: I store the generation expression in pg_attrdef, like
a default.  For the most part, this works well.  It is not clear,
however, what pg_attribute.atthasdef should say.  Half the code thinks
that atthasdef means "there is something in pg_attrdef", the other half
thinks "column has a DEFAULT expression".  Currently, I'm going with the
former interpretation, because that is wired in quite deeply and things
start to crash if you violate it, but then code that wants to know
whether a column has a traditional DEFAULT expression needs to check
atthasdef && !attgenerated or something like that.

Missing/future functionality:

- STORED variant

- various ALTER TABLE variants

- index support (and related constraint support)

These can be added later once the basics are nailed down.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From f0938109f995adf7b4b0b4adbe652d9881549cee Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 30 Aug 2017 23:38:08 -0400
Subject: [PATCH] Generated columns

This is an SQL-standard feature that allows creating columns that are
computed from expressions rather than assigned, similar to a view but on
a column basis.
---
 doc/src/sgml/catalogs.sgml  |  11 +
 doc/src/sgml/information_schema.sgml|  10 +-
 doc/src/sgml/ref/copy.sgml  |   3 +-
 doc/src/sgml/ref/create_table.sgml  |  46 +++-
 src/backend/access/common/tupdesc.c |   5 +
 src/backend/catalog/genbki.pl   |   3 +
 src/backend/catalog/heap.c  |  93 +--
 src/backend/catalog/index.c |   1 +
 src/backend/catalog/information_schema.sql  |   8 +-
 src/backend/commands/copy.c |  12 +-
 src/backend/commands/indexcmds.c|  24 +-
 src/backend/commands/tablecmds.c|  45 +++-
 src/backend/commands/trigger.c  |  36 +++
 src/backend/commands/typecmds.c |   6 +-
 src/backend/executor/execMain.c |   7 +-
 src/backend/nodes/copyfuncs.c   |   2 +
 src/backend/nodes/equalfuncs.c  |   2 +
 src/backend/nodes/outfuncs.c|   9 +
 src/backend/parser/gram.y   |  26 +-
 src/backend/parser/parse_agg.c  |  11 +
 src/backend/parser/parse_expr.c |   5 +
 src/backend/parser/parse_func.c |  12 +
 src/backend/parser/parse_utilcmd.c  |  87 ++-
 src/backend/replication/logical/worker.c|   2 +-
 src/backend/rewrite/rewriteHandler.c| 122 +-
 src/backend/utils/cache/lsyscache.c |  32 +++
 src/backend/utils/cache/relcache.c  |   1 +
 src/bin/pg_dump/pg_dump.c   |  39 ++-
 src/bin/pg_dump/pg_dump.h   |   1 +
 src/bin/pg_dump/t/002_pg_dump.pl|  39 +++
 src/bin/psql/describe.c |  28 ++-
 src/include/catalog/catversion.h|   2 +-
 src/include/catalog/heap.h  |   5 +-
 src/include/catalog/pg_attribute.h  |  23 +-
 src/include/catalog/pg_class.h  |   2 +-
 src/include/nodes/parsenodes.h  |  12 +-
 src/include/parser/kwlist.h |   2 +
 src/include/parser/parse_node.h

Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

2017-08-30 Thread Peter Eisentraut
On 3/20/17 11:03, Ronan Dunklau wrote:
>> Great idea.  This is too late for v10 at this point, but please add it
>> to the next CommitFest so we don't forget about it.
> I know it is too late, and thought that it was too early to add it to the 
> commitfest properly since so many design decisions should be discussed. 
> Thanks 
> for the feedback, I added it.

This patch no longer applies.  Please send an update.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More replication race conditions

2017-08-30 Thread Noah Misch
On Tue, Aug 29, 2017 at 08:44:42PM +0900, Michael Paquier wrote:
> On Mon, Aug 28, 2017 at 8:25 AM, Michael Paquier
>  wrote:
> > Today's run has finished with the same failure:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dangomushi=2017-08-27%2018%3A00%3A13
> > Attached is a patch to make this code path wait until the transaction
> > has been replayed. We could also use synchronous_commit = apply,
> > but I prefer the solution of this patch with a wait query.
> 
> I have added an open item for this issue for PG10. The 2PC tests in
> src/test/recovery are new in PG10, introduced by 3082098.
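
For reference, a wait query of the kind described is typically of this shape
(a sketch; the LSN value is illustrative):

```sql
-- Run on the standby, repeatedly, until it returns true.  The literal is
-- the result of pg_current_wal_lsn() captured on the primary beforehand.
SELECT '0/3024D58'::pg_lsn <= pg_last_wal_replay_lsn();
```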

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Álvaro,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] path toward faster partition pruning

2017-08-30 Thread Amit Langote
Hi Ashutosh,

Thanks for the comments and sorry that it took me a while to reply here.

On 2017/08/23 20:16, Ashutosh Bapat wrote:
> On Mon, Aug 21, 2017 at 12:07 PM, Amit Langote
>  wrote:
>> I've been working on implementing a way to perform plan-time
>> partition-pruning that is hopefully faster than the current method of
>> using constraint exclusion to prune each of the potentially many
>> partitions one-by-one.  It's not fully cooked yet though.
>>
>> Meanwhile, I thought I'd share a couple of patches that implement some
>> restructuring of the planner code related to partitioned table inheritance
>> planning that I think would be helpful.  They are to be applied on top of
>> the patches being discussed at [1].  Note that these patches themselves
>> don't implement the actual code that replaces constraint exclusion as a
>> method of performing partition pruning.  I will share that patch after
>> debugging it some more.
>>
>> The main design goal of the patches I'm sharing here now is to defer the
>> locking and  opening of leaf partitions in a given partition tree to a
>> point after set_append_rel_size() is called on the root partitioned table.
>>  Currently, AFAICS, we need to lock and open the child tables in
>> expand_inherited_rtentry() only to set the translated_vars field in
>> AppendRelInfo that we create for the child.  ISTM, we can defer the
>> creation of a child AppendRelInfo to a point when it (its field
>> translated_vars in particular) will actually be used and so lock and open
>> the child tables only at such a time.  Although we don't lock and open the
>> partition child tables in expand_inherited_rtentry(), their RT entries are
>> still created and added to root->parse->rtable, so that
>> setup_simple_rel_arrays() knows the maximum number of entries
>> root->simple_rel_array will need to hold and allocate the memory for that
>> array accordingly.   Slots in simple_rel_array[] corresponding to
>> partition child tables will be empty until they are created when
>> set_append_rel_size() is called on the root parent table and it determines
>> the partitions that will be scanned after all.
> 
> The partition pruning can happen only after the quals have been
> distributed to Rels i.e. after deconstruct_jointree(),
> reconsider_outer_join_clauses() and generate_base_implied_equalities()
> have been called. If the goal is to not heap_open() the partitions
> which are pruned, we can't do that in expand_inherited_rtentry(). One
> reason why I think we don't want to heap_open() partition relations is
> to avoid relcache bloat because of opened partition relations, which
> are ultimately pruned. But please note that according to your patches,
> we still need to populate catalog caches to get relkind and reltype
> etc.

Yes, we still hit syscache for *all* partitions.  I haven't yet thought
very hard about avoiding that altogether.

> There are many functions that traverse simple_rel_array[] after it's
> created. Most of them assume that the empty entries in that array
> correspond to non-simple range entries like join RTEs. But now we are
> breaking that assumption. Most of these functions also skip "other"
> relations, so that may be OK now, but I am not sure if it's really
> going to be fine if we keep empty slots in place of partition
> relations. There may be three options here 1. add placeholder
> RelOptInfos for partition relations (may be mark those specially) and
> mark the ones which get pruned as dummy later. 2. Prune the partitions
> before any functions scans simple_rel_array[] or postpone creating
> simple_rel_array till pruning. 3. Examine all the current scanners
> esp. the ones which will be called before pruning to make sure that
> skipping "other" relations is going to be kosher.

Between the point when slots in simple_rel_array are allocated
(setup_simple_rel_arrays) and partition RelOptInfos are actually created
after the partition-pruning step has occurred (set_append_rel_size), it
seems that most places that iterate over simple_rel_array know also to
skip slots containing NULL values.  We might need to document that NULL
means partitions in addition to its current meaning - non-baserels.
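
For context, the target workload is a declaratively partitioned table where
only a few partitions can match the quals; a minimal sketch:

```sql
CREATE TABLE measurement (logdate date, value int)
    PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2017m08 PARTITION OF measurement
    FOR VALUES FROM ('2017-08-01') TO ('2017-09-01');
CREATE TABLE measurement_y2017m09 PARTITION OF measurement
    FOR VALUES FROM ('2017-09-01') TO ('2017-10-01');

-- Plan-time pruning should let the planner consider (and lock/open)
-- only measurement_y2017m08 here, instead of excluding each partition
-- by constraint exclusion one at a time.
EXPLAIN SELECT * FROM measurement WHERE logdate = '2017-08-30';
```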

>> Patch augments the existing PartitionedChildRelInfo node, which currently
>> holds only the partitioned child rel RT indexes, to carry some more
>> information about the partition tree, which includes the information
>> returned by RelationGetPartitionDispatchInfo() when it is called from
>> expand_inherited_rtentry() (per the proposed patch in [1], we call it to
>> be able to add partitions to the query tree in the bound order).
>> Actually, since PartitionedChildRelInfo now contains more information
>> about the partition tree than it used to before, I thought the struct's
>> name is no longer relevant, so renamed it to PartitionRootInfo and renamed
>> root->pcinfo_list accordingly to prinfo_list.  That seems okay because we
>> only use that node 

Re: [HACKERS] More replication race conditions

2017-08-30 Thread Noah Misch
On Sun, Aug 27, 2017 at 02:32:49AM +, Noah Misch wrote:
> On Fri, Aug 25, 2017 at 12:09:00PM +0200, Petr Jelinek wrote:
> > On 24/08/17 19:54, Tom Lane wrote:
> > > sungazer just failed with
> > > 
> > > pg_recvlogical exited with code '256', stdout '' and stderr 
> > > 'pg_recvlogical: could not send replication command "START_REPLICATION 
> > > SLOT "test_slot" LOGICAL 0/0 ("include-xids" '0', "skip-empty-xacts" 
> > > '1')": ERROR:  replication slot "test_slot" is active for PID 8913148
> > > pg_recvlogical: disconnected
> > > ' at 
> > > /home/nm/farm/gcc64/HEAD/pgsql.build/src/test/recovery/../../../src/test/perl/PostgresNode.pm
> > >  line 1657.
> > > 
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer=2017-08-24%2015%3A16%3A10
> > > 
> > > Looks like we're still not there on preventing replication startup
> > > race conditions.
> > 
> > Hmm, that looks like "by design" behavior. Slot acquiring will throw
> > error if the slot is already used by somebody else (slots use their own
> > locking mechanism that does not wait). In this case it seems the
> > walsender which was using the slot for the previous step didn't finish
> > releasing the slot by the time we start the new command. We can work around
> > this by changing the test to wait, perhaps.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Simon,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] 
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




Re: [HACKERS] Revisiting NAMEDATALEN

2017-08-30 Thread Bruce Momjian
On Mon, Jul  3, 2017 at 11:31:01AM -0700, Emrul wrote:
> Hi hackers,
> 
> This question came up again on Reddit:
> https://www.reddit.com/r/PostgreSQL/comments/6kyyev/i_have_hit_the_table_name_length_limit_a_number/
> and I thought I'd echo it here.
> 
> I totally am on board with short, descriptive names and a good convention. 
> However, there are just so many cases where 63 characters can't
> descriptively describe a column name.  I've been on projects where we have

I am coming in late on this, but just to clarify: NAMEDATALEN is in
_bytes_, meaning names using multi-byte characters are limited to fewer
than 63 characters.
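
A quick illustration of the byte-wise limit, assuming a UTF-8 database:

```sql
-- NAMEDATALEN (default 64) limits identifiers to 63 bytes.  Each of the
-- characters below is 3 bytes in UTF-8, so the name is truncated after
-- 21 characters (63 bytes), not 63 characters, with a NOTICE.
CREATE TABLE データデータデータデータデータデータデータデータ (id int);  -- 24 chars, 72 bytes
```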

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




[HACKERS] code cleanup empty string initializations

2017-08-30 Thread Peter Eisentraut
In initdb, many global string variables are initialized as empty strings
("") and then checked later with strcmp(), instead of just using NULL.
I think this is probably left over from the shell script conversion.
The style has also spread to pg_basebackup.  So here is a patch to clean
that up, and a second patch to clean up some other, somewhat confusing
business in initdb that seems like old shell script code.

While looking around, I found some useless empty string initializations
in the ecpg test suite as well, and here is another patch for that.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 13071706e5e82e2bebd68ae6b68af471e1141802 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 30 Aug 2017 22:28:36 -0400
Subject: [PATCH 1/3] Remove useless dead code

---
 .../ecpg/test/expected/pgtypeslib-dt_test.c| 22 +++---
 src/interfaces/ecpg/test/pgtypeslib/dt_test.pgc| 22 +++---
 2 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/src/interfaces/ecpg/test/expected/pgtypeslib-dt_test.c 
b/src/interfaces/ecpg/test/expected/pgtypeslib-dt_test.c
index 00d43915b2..69a605c7e6 100644
--- a/src/interfaces/ecpg/test/expected/pgtypeslib-dt_test.c
+++ b/src/interfaces/ecpg/test/expected/pgtypeslib-dt_test.c
@@ -155,7 +155,7 @@ if (sqlca.sqlcode < 0) sqlprint ( );}
 
/* rdate_defmt_asc() */
 
-   date1 = 0; text = "";
+   date1 = 0;
fmt = "yy/mm/dd";
in = "In the year 1995, the month of December, it is the 25th day";
/*0123456789012345678901234567890123456789012345678901234567890
@@ -166,7 +166,7 @@ if (sqlca.sqlcode < 0) sqlprint ( );}
printf("date_defmt_asc1: %s\n", text);
free(text);
 
-   date1 = 0; text = "";
+   date1 = 0;
fmt = ". dd. ";
in = "12/25/95";
PGTYPESdate_defmt_asc(&date1, fmt, in);
@@ -174,7 +174,7 @@ if (sqlca.sqlcode < 0) sqlprint ( );}
printf("date_defmt_asc2: %s\n", text);
free(text);
 
-   date1 = 0; text = "";
+   date1 = 0;
fmt = "yy/mm/dd";
in = "95/12/25";
PGTYPESdate_defmt_asc(&date1, fmt, in);
@@ -182,7 +182,7 @@ if (sqlca.sqlcode < 0) sqlprint ( );}
printf("date_defmt_asc3: %s\n", text);
free(text);
 
-   date1 = 0; text = "";
+   date1 = 0;
fmt = "yy/mm/dd";
in = "1995, December 25th";
PGTYPESdate_defmt_asc(&date1, fmt, in);
@@ -190,7 +190,7 @@ if (sqlca.sqlcode < 0) sqlprint ( );}
printf("date_defmt_asc4: %s\n", text);
free(text);
 
-   date1 = 0; text = "";
+   date1 = 0;
fmt = "dd-mm-yy";
in = "This is 25th day of December, 1995";
PGTYPESdate_defmt_asc(&date1, fmt, in);
@@ -198,7 +198,7 @@ if (sqlca.sqlcode < 0) sqlprint ( );}
printf("date_defmt_asc5: %s\n", text);
free(text);
 
-   date1 = 0; text = "";
+   date1 = 0;
fmt = "mmddyy";
in = "Dec. 25th, 1995";
PGTYPESdate_defmt_asc(&date1, fmt, in);
@@ -206,7 +206,7 @@ if (sqlca.sqlcode < 0) sqlprint ( );}
printf("date_defmt_asc6: %s\n", text);
free(text);
 
-   date1 = 0; text = "";
+   date1 = 0;
fmt = "mmm. dd. ";
in = "dec 25th 1995";
PGTYPESdate_defmt_asc(&date1, fmt, in);
@@ -214,7 +214,7 @@ if (sqlca.sqlcode < 0) sqlprint ( );}
printf("date_defmt_asc7: %s\n", text);
free(text);
 
-   date1 = 0; text = "";
+   date1 = 0;
fmt = "mmm. dd. ";
in = "DEC-25-1995";
PGTYPESdate_defmt_asc(&date1, fmt, in);
@@ -222,7 +222,7 @@ if (sqlca.sqlcode < 0) sqlprint ( );}
printf("date_defmt_asc8: %s\n", text);
free(text);
 
-   date1 = 0; text = "";
+   date1 = 0;
fmt = "mm yy   dd.";
in = "12199525";
PGTYPESdate_defmt_asc(&date1, fmt, in);
@@ -230,7 +230,7 @@ if (sqlca.sqlcode < 0) sqlprint ( );}
printf("date_defmt_asc9: %s\n", text);
free(text);
 
-   date1 = 0; text = "";
+   date1 = 0;
fmt = " fierj mm   dd.";
in = "19951225";
PGTYPESdate_defmt_asc(&date1, fmt, in);
@@ -238,7 +238,7 @@ if (sqlca.sqlcode < 0) sqlprint ( );}
printf("date_defmt_asc10: %s\n", text);
free(text);
 
-   date1 = 0; text = "";
+   date1 = 0;
fmt = "mm/dd/yy";
in = "122595";
PGTYPESdate_defmt_asc(&date1, fmt, in);
diff --git a/src/interfaces/ecpg/test/pgtypeslib/dt_test.pgc 
b/src/interfaces/ecpg/test/pgtypeslib/dt_test.pgc
index 768cbd5e6f..35f0b6dda5 100644
--- a/src/interfaces/ecpg/test/pgtypeslib/dt_test.pgc
+++ b/src/interfaces/ecpg/test/pgtypeslib/dt_test.pgc
@@ -81,7 +81,7 @@ main(void)
 
/* rdate_defmt_asc() */
 
-   date1 = 0; text = "";
+   date1 = 0;
fmt = "yy/mm/dd";
in = "In the year 1995, the month of December, it is the 25th day";
/*

Re: [HACKERS] Back-branch release notes up for review

2017-08-30 Thread Noah Misch
On Sat, Aug 26, 2017 at 03:31:12PM -0400, Tom Lane wrote:
> +
> +
> + 
> +  Show foreign tables
> +  in information_schema.table_privileges
> +  view (Peter Eisentraut)
> + 
> +
> + 
> +  All other relevant information_schema views include
> +  foreign tables, but this one ignored them.
> + 
> +
> + 
> +  Since this view definition is installed by initdb,
> +  merely upgrading will not fix the problem.  If you need to fix this
> +  in an existing installation, you can, as a superuser, do this
> +  in psql:
> +
> +BEGIN;
> +DROP SCHEMA information_schema CASCADE;
> +\i SHAREDIR/information_schema.sql
> +COMMIT;
> +
> +  (Run pg_config --sharedir if you're uncertain
> +  where SHAREDIR is.)  This must be repeated in each
> +  database to be fixed.
> + 
> +

"DROP SCHEMA information_schema CASCADE;" will drop objects outside
information_schema that depend on objects inside information_schema.  For
example, this will drop user-defined views if the view query refers to
information_schema.  That's improper in a release-noted update procedure.
This could instead offer a CREATE OR REPLACE VIEW or just hand-wave about the
repaired definition being available in information_schema.sql.

I regret not reading this before today.




[HACKERS] document and use SPI_result_code_string()

2017-08-30 Thread Peter Eisentraut
A lot of semi-internal code just prints out numeric SPI error codes,
which is not very helpful.  We already have an API function
SPI_result_code_string() to convert the codes to a string, so here is a
patch to make more use of that and also document it for external use.

Also included are two patches to clarify some RI error handling
code that I encountered at the same time.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From fa2783e7c263addd68ced77bcfbad0a10a3117ef Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 30 Aug 2017 22:16:50 -0400
Subject: [PATCH 1/3] Move SPI error reporting out of ri_ReportViolation()

These are two completely unrelated code paths, so it doesn't make sense
to pack them into one function.
---
 src/backend/utils/adt/ri_triggers.c | 29 +++--
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/src/backend/utils/adt/ri_triggers.c 
b/src/backend/utils/adt/ri_triggers.c
index c2891e6fa1..014b204439 100644
--- a/src/backend/utils/adt/ri_triggers.c
+++ b/src/backend/utils/adt/ri_triggers.c
@@ -242,7 +242,7 @@ static void ri_ExtractValues(Relation rel, HeapTuple tup,
 static void ri_ReportViolation(const RI_ConstraintInfo *riinfo,
   Relation pk_rel, Relation fk_rel,
   HeapTuple violator, TupleDesc tupdesc,
-  int queryno, bool spi_err);
+  int queryno);
 
 
 /* --
@@ -2499,7 +2499,7 @@ RI_Initial_Check(Trigger *trigger, Relation fk_rel, 
Relation pk_rel)
ri_ReportViolation(&fake_riinfo,
   pk_rel, fk_rel,
   tuple, tupdesc,
-  RI_PLAN_CHECK_LOOKUPPK, 
false);
+  RI_PLAN_CHECK_LOOKUPPK);
}
 
if (SPI_finish() != SPI_OK_FINISH)
@@ -3147,11 +3147,13 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
elog(ERROR, "SPI_execute_snapshot returned %d", spi_result);
 
if (expect_OK >= 0 && spi_result != expect_OK)
-   ri_ReportViolation(riinfo,
-  pk_rel, fk_rel,
-  new_tuple ? new_tuple : 
old_tuple,
-  NULL,
-  qkey->constr_queryno, true);
+   ereport(ERROR,
+   (errcode(ERRCODE_INTERNAL_ERROR),
+errmsg("referential integrity query on \"%s\" 
from constraint \"%s\" on \"%s\" gave unexpected result",
+   RelationGetRelationName(pk_rel),
+   NameStr(riinfo->conname),
+   
RelationGetRelationName(fk_rel)),
+errhint("This is most likely due to a rule 
having rewritten the query.")));
 
/* XXX wouldn't it be clearer to do this part at the caller? */
if (qkey->constr_queryno != RI_PLAN_CHECK_LOOKUPPK_FROM_PK &&
@@ -3161,7 +3163,7 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
   pk_rel, fk_rel,
   new_tuple ? new_tuple : 
old_tuple,
   NULL,
-  qkey->constr_queryno, false);
+  qkey->constr_queryno);
 
return SPI_processed != 0;
 }
@@ -3205,7 +3207,7 @@ static void
 ri_ReportViolation(const RI_ConstraintInfo *riinfo,
   Relation pk_rel, Relation fk_rel,
   HeapTuple violator, TupleDesc tupdesc,
-  int queryno, bool spi_err)
+  int queryno)
 {
StringInfoData key_names;
StringInfoData key_values;
@@ -3216,15 +3218,6 @@ ri_ReportViolation(const RI_ConstraintInfo *riinfo,
AclResult   aclresult;
boolhas_perm = true;
 
-   if (spi_err)
-   ereport(ERROR,
-   (errcode(ERRCODE_INTERNAL_ERROR),
-errmsg("referential integrity query on \"%s\" 
from constraint \"%s\" on \"%s\" gave unexpected result",
-   RelationGetRelationName(pk_rel),
-   NameStr(riinfo->conname),
-   
RelationGetRelationName(fk_rel)),
-errhint("This is most likely due to a rule 
having rewritten the query.")));
-

[HACKERS] CLUSTER command progress monitor

2017-08-30 Thread Tatsuro Yamada
Hi,

Following is a proposal for reporting the progress of the CLUSTER command.

It seems that the following could be the phases of CLUSTER processing:

  1. scanning heap
  2. sort tuples
  3. writing new heap
  4. scan heap and write new heap
  5. swapping relation files
  6. rebuild index
  7. performing final cleanup

These phases are based on Rahila's presentation at PGCon 2017
 
(https://www.pgcon.org/2017/schedule/attachments/472_Progress%20Measurement%20PostgreSQL.pdf)
and I added some phases to it.

The CLUSTER command may use an index scan or a seq scan when scanning the heap.
Depending on which one is chosen, the command will proceed in the
following sequence of phases:

  * Seq Scan
1. scanning heap
2. sort tuples
3. writing new heap
5. swapping relation files
6. rebuild index
7. performing final cleanup

  * Index Scan
4. scan heap and write new heap
5. swapping relation files
6. rebuild index
7. performing final cleanup

The view provides the details of CLUSTER command progress as follows:
postgres=# \d pg_stat_progress_cluster
   View "pg_catalog.pg_stat_progress_cluster"
   Column|  Type   | Collation | Nullable | Default
-+-+---+--+-
 pid | integer |   |  |
 datid   | oid |   |  |
 datname | name|   |  |
 relid   | oid |   |  |
 phase   | text|   |  |
 scan_method | text|   |  |
 scan_index_relid| bigint  |   |  |
 heap_tuples_total   | bigint  |   |  |
 heap_tuples_scanned | bigint  |   |  |

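
The view could then be polled from another session while a CLUSTER runs, e.g.:

```sql
-- Watch the progress of all backends currently clustering (column names
-- as proposed above).
SELECT pid, relid::regclass AS rel, phase, scan_method,
       heap_tuples_scanned, heap_tuples_total
FROM pg_stat_progress_cluster;
```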

Then I have questions.

  * Should we have separate views for them?  Or should both be covered by the
same view with some indication of which command (CLUSTER or VACUUM FULL)
is actually running?
I mean this progress monitor could be covering not only CLUSTER command but 
also
VACUUM FULL command.

  * I chose tuples as the heap scan's counter (heap_tuples_scanned) since it's
not easy to get the current block number from an Index Scan. Is that OK?


I'll add this patch to CF2017-09.
Any comments or suggestions are welcome.

Regards,
Tatsuro Yamada
NTT Open Source Software Center
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5575c2c..18fe2c6 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -332,6 +332,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  pg_stat_progress_clusterpg_stat_progress_cluster
+  One row for each backend running
+   CLUSTER, showing current progress.
+   See .
+  
+ 
+
 

   
@@ -3229,9 +3237,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
   
PostgreSQL has the ability to report the progress of
-   certain commands during command execution.  Currently, the only command
-   which supports progress reporting is VACUUM.  This may be
-   expanded in the future.
+   certain commands during command execution.  Currently, the supported 
+   progress reporting commands are VACUUM and CLUSTER.
+   This may be expanded in the future.
   
 
  
@@ -3423,6 +3431,157 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
 
  
+
+ 
+  CLUSTER Progress Reporting
+
+  
+   Whenever CLUSTER is running, the
+   pg_stat_progress_cluster view will contain
+   one row for each backend that is currently clustering. 
+   The tables below describe the information that will be reported and
+   provide information about how to interpret it.
+  
+
+  
+   pg_stat_progress_cluster View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend.
+
+
+ datid
+ oid
+ OID of the database to which this backend is connected.
+
+
+ datname
+ name
+ Name of the database to which this backend is connected.
+
+
+ relid
+ oid
+ OID of the table being clustered.
+
+
+ phase
+ text
+ 
+   Current processing phase of cluster.  See .
+ 
+
+
+ scan_method
+ text
+ 
+   Scan method of table: index scan/seq scan.
+ 
+
+
+ scan_index_relid
+ bigint
+ 
+   OID of the index.
+ 
+
+
+ heap_tuples_total
+ bigint
+ 
+   Total number of heap tuples in the table.  This number is reported
+   as of the beginning of the scan; tuples added later will not be (and
+   need not be) visited by this CLUSTER.
+ 
+
+
+ heap_tuples_scanned
+ bigint
+ 
+   Number of heap tuples scanned.
+   This counter only advances when the phase is scanning heap, 
+   writing new heap, or scan heap and write new heap.
+ 
+
+   
+   
+  
+
+  

Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-30 Thread Masahiko Sawada
On Wed, Aug 30, 2017 at 3:39 PM, Fabien COELHO  wrote:
>
> Hello,
>
>>> Hmmm. The existing "is_no_vacuum" variable is typed *both* as int (in
>>> "main") and as bool (in "init"), called by main (yuk!). I see no reason
>>> to
>>> choose the bad one for the global:-)
>>
>>
>> Yeah, I think this might be a good timing to re-consider int-for-bool
>> habits in pgbench. If we decided to change is_no_vacuum to bool I want
>> to change other similar variables as well.
>
>
> Franckly I would be fine with that, but committers might get touchy about
> "unrelated changes" in the patch... The "is_no_vacuum" is related to the
> patch and is already a bool -- if you chose the "init" definition as a
> reference -- so it is okay to bool it.

Okay, I changed only is_no_vacuum in this patch; other similar
variables would be changed in another patch.

>
>>> I think that the "-I" it should be added to the "--help" line, as it is
>>> done
>>> with other short & long options.
>>
>>
>> Okay, I'll leave it as of now. Maybe we can discuss later.
>
>
> Maybe we did not understand one another. I'm just suggesting to insert
> -I in the help line, that is change:
>
>"  --custom-initialize=[...]+\n"
>
> to
>
>"  -I, --custom-initialize=[...]+\n"

Fixed.

> I'm not sure it deserves to be discussed in depth later:-)

Sorry, I meant having a short option for --custom-initialize.

>
>>> I wonder if this could be avoided easily? Maybe by setting the constraint
>>> name explicitely so that the second one fails on the existing one, which
>>> is
>>> fine, like for primary keys? [...]
>>
>>
>> Good point, I agree with first option.
>
>
> Ok.
>
>>> Maybe the initial cleanup (DROP TABLE) could be made an option added to
>>> the
>>> default, so that cleaning up the database could be achieved with some
>>> "pgbench -i -I c", instead of connecting and droping the tables one by
>>> one
>>> which I have done quite a few times... What do you think?
>>
>>
>> Yeah, I sometimes wanted that. Having the cleaning up tables option
>> would be good idea.
>
>
> Ok.
>
>> I'd say "g" for data generation would be better. Also, I'm inclined to
>> add a command for the unlogged tables. How about this?
>>
>> c - [c]leanup / or [d]rop tables
>> t - create table / [t]able creation or [c]reate table
>> u - create unlogged table
>> g - data generation / [g]enerate data
>> p - [p]rimary key
>> f - [f]oreign key
>> v - [v]acuum
>
>
> I'm okay with that. I also put an alternative with d/c above, without any
> preference from my part.

Personally I prefer "t" for table creation because "c" for create is a
generic word. We might want to have another initialization command
that creates something.

>
> I'm not sure about "u", though. Unlogged, like tablespace, is an orthogonal
> option: other table creation options (I intend to submit one which conforms
> to the TPC-B standard, that is use an INT8 balance as INT4 is not wide
> enough per spec, and always use an INT8 aid) may be also unlogged or
> tablespaced. So that would mean having two ways to trigger them... thus I
> would avoid it and keep only --unlogged.
>

Yeah, I think I had misunderstood it. The -I option is for specifying
particular initialization steps, so we don't need to add a step command
for other initialization options.
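To make the step-letter idea above concrete, here is a toy validator for a
-I step string (letters as discussed: t/g/p/f/v). This is only a sketch of
the proposal's semantics, not code from the patch:

```c
#include <assert.h>
#include <stddef.h>

/* Toy mapping of the proposed pgbench -I step letters to actions
 * (t/g/p/f/v per the discussion above; the actual patch may differ). */
const char *
step_name(char c)
{
    switch (c)
    {
        case 't': return "create tables";
        case 'g': return "generate data";
        case 'p': return "create primary keys";
        case 'f': return "create foreign keys";
        case 'v': return "vacuum";
        default:  return NULL;      /* unknown step letter */
    }
}

/* Returns 1 if every character in the -I argument is a known step. */
int
check_init_steps(const char *steps)
{
    const char *p;

    for (p = steps; *p; p++)
        if (step_name(*p) == NULL)
            return 0;
    return 1;
}
```

pgbench itself would do this in C as part of option parsing; the sketch only
shows the validation and ordering semantics being discussed.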

Attached latest patch. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 5:38 PM, Robert Haas  wrote:
> Wow.  Just to be clear, I am looking for the BEST case for replacement
> selection, not the worst case.  But I would have expected that case to
> be a win for replacement selection, and it clearly isn't.  I can
> reproduce your results here.

But I *was* trying to get a best case. That's why it isn't even worse.
That's what the docs say the best case is, after all.

This is the kind of thing that replacement selection actually did do
better with on 9.6. I clearly remember Tomas Vondra doing lots of
benchmarking, showing some benefit with RS with a work_mem of 8MB or
less. As I said in my introduction on this thread, we weren't wrong to
add replacement_sort_tuples to 9.6, given where things were with
merging at the time. But, it does very much appear to create less than
zero benefit these days. The picture changed.

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Robert Haas
On Wed, Aug 30, 2017 at 6:14 PM, Peter Geoghegan  wrote:
> On Wed, Aug 30, 2017 at 3:01 PM, Robert Haas  wrote:
>> That may all be true, but my point is that if it wins in some cases,
>> we should keep it -- and proving it no longer wins in those cases will
>> require running tests.
>
> That's not hard. On my laptop:
>
> $ pgbench -i -s 100
> ...
>
> postgres=# set work_mem = '2MB';
> SET
> postgres=# show replacement_sort_tuples ;
>  replacement_sort_tuples
> ─────────────────────────
>  15
> (1 row)
> (30784) /postgres=# select count(distinct aid) from pgbench_accounts ;
>count
> ────────────
>  10,000,000
> (1 row)
>
> Time: 4157.267 ms (00:04.157)
> (30784) /postgres=# set replacement_sort_tuples = 0;
> SET
> (30784) /postgres=# select count(distinct aid) from pgbench_accounts ;
>count
> ────────────
>  10,000,000
> (1 row)
>
> Time: 3343.789 ms (00:03.344)
>
> This is significantly faster, in a way that's clearly reproducible and
> consistent, despite the fact that we need about 10 merge passes
> without replacement selection, and only have enough memory for 7
> tapes. I think that I could find a case that makes replacement
> selection look much worse, if I tried.

Wow.  Just to be clear, I am looking for the BEST case for replacement
selection, not the worst case.  But I would have expected that case to
be a win for replacement selection, and it clearly isn't.  I can
reproduce your results here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Update low-level backup documentation to match actual behavior

2017-08-30 Thread Michael Paquier
On Wed, Aug 30, 2017 at 8:02 PM, David Steele  wrote:
> On 8/29/17 9:44 PM, Michael Paquier wrote:
>> On Tue, Aug 29, 2017 at 10:59 PM, David Steele  wrote:
>>>
>>> Attached is the 9.6 patch.  It required a bit more work in func.sgml
>>> than I was expecting so have a close look at that.  The rest was mostly
>>> removing irrelevant hunks.
>>
>> + switch to the next WAL segment.  On a standby, it is not possible to
>> + automatically switch WAL segments, so you may wish to run
>> + pg_switch_wal on the primary to perform a manual
>> + switch. The reason for the switch is to arrange for
>> [...]
>> +WAL segments have been archived. If write activity on the primary
>> is low, it
>> +may be useful to run pg_switch_wal on the primary in order 
>> to
>> +trigger an immediate segment switch of the last required WAL
>> It seems to me that both portions are wrong. There is no archiving
>> wait on standbys for 9.6, and
> I think it's clearly stated here that pg_stop_backup() does not wait for
> WAL to archive on a standby.  Even so, it is very important for the backup
> routine to make sure that all the WAL *is* archived.

Yes, it seems that I somewhat missed the "on the primary" portion
during the first read of the patch.

>> pg_stop_backup triggers by itself the
>> segment switch, so saying that enforcing pg_switch_wal on the primary
>> is moot.
>
> pg_stop_backup() does not perform a WAL switch on the standby which is
> what this sentence is referring to.  I have separated this section out
> into a new paragraph to (hopefully) make it clearer.
>
>> pg_switch_xlog has been renamed to pg_switch_wal in PG10, so
>> the former name applies.
>
> Whoops!
>
> New patch is attached.

Thanks for the new version. This looks fine to me.
-- 
Michael




Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 3:14 PM, Peter Geoghegan  wrote:
> This is significantly faster, in a way that's clearly reproducible and
> consistent, despite the fact that we need about 10 merge passes
> without replacement selection, and only have enough memory for 7
> tapes. I think that I could find a case that makes replacement
> selection look much worse, if I tried.

Just to see where we end up, I quickly wrote a patch to remove
replacement selection + replacement_sort_tuples. The LOC impact worked
out like this:

$ git diff master --stat
 doc/src/sgml/config.sgml  |  39 
 src/backend/utils/init/globals.c  |   1 -
 src/backend/utils/misc/guc.c  |  10 -
 src/backend/utils/misc/postgresql.conf.sample |   1 -
 src/backend/utils/sort/tuplesort.c| 403
+--
 src/include/miscadmin.h   |   1 -
 src/test/regress/expected/cluster.out |  17 +-
 src/test/regress/sql/cluster.sql  |  14 +-
 8 files changed, 52 insertions(+), 434 deletions(-)

It was nice to be able to remove comments that make certain
distinctions that replacement selection cares about. These were
tedious to read. I managed to remove several paragraphs within
tuplesort.c's header comments.

Another advantage of the changes made that I noticed in passing is
that SortTuple.tupindex is now only used while merging. It would be
quite possible to remove SortTuple.tupindex entirely, as an additional
piece of work, by providing some other indirection to get that
information when merging. From there, we could replace SortTuple.tuple
with a bitfield, that has one bit for NULLness, and treats other bits
as a 63-bit or 31-bit integer. This integer would be used an offset
into a buffer that we repalloc(), thus removing all retail palloc()s,
even during run generation. Using one big buffer for tuples would make
tuplesort.c quite a bit more memory efficient (SortTuple will only be
16 bytes, we won't waste memory on palloc() headers, and sequential
access is made more cache efficient in presorted pass-by-reference
cases).
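As a toy illustration of that encoding (names invented here, not from
tuplesort.c), the high bit can flag NULL while the remaining 63 bits hold a
byte offset into the single large tuple buffer:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical SortTuple slot encoding: high bit = NULL flag,
 * low 63 bits = offset into one big repalloc'd tuple buffer. */
#define SLOT_NULL_BIT  (UINT64_C(1) << 63)

uint64_t
slot_encode(uint64_t offset, int isnull)
{
    assert(offset < SLOT_NULL_BIT);     /* offset must fit in 63 bits */
    return offset | (isnull ? SLOT_NULL_BIT : 0);
}

uint64_t
slot_offset(uint64_t slot)
{
    return slot & ~SLOT_NULL_BIT;       /* strip the NULL flag */
}

int
slot_isnull(uint64_t slot)
{
    return (slot & SLOT_NULL_BIT) != 0;
}
```

With this scheme a SortTuple needs no pointer at all, which is what makes
the 16-byte figure above plausible.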

I'm not planning to work on this myself. It is similar to something
that Heikki described last year, but goes a bit further by eliminating
all palloc() header overhead, while also not playing tricks with
reclaiming bits from pointers (I recall that Tom didn't like that
aspect of Heikki's proposal at all). There would be new infrastructure
required to make this work -- we'd have to be able to ask routines
like ExecCopySlotMinimalTuple() and heap_copytuple() to copy into our
own large tuple buffer, rather than doing their own palloc() on
tuplesort's behalf. But that seems like a good idea anyway.

I may submit the simple patch to remove replacement selection, if
other contributors are receptive. Apart from everything else, the
"incrementalism" of replacement selection works against cleverer batch
memory management of the type I just outlined, which seems like a
useful direction to take tuplesort in.

-- 
Peter Geoghegan




Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-08-30 Thread Michael Paquier
On Thu, Aug 31, 2017 at 8:35 AM, David G. Johnston
 wrote:
> Inspired by the syntax documentation for EXPLAIN:
>
> VACUUM [ ( option [, ...] ) ] [ table_def [, ...] ]
>
> where option can be one of:
> FULL
> FREEZE
> VERBOSE
> DISABLE_PAGE_SKIPPING
>
> and where table_def is:
> table_name [ ( column_name [, ... ] ) ]

Yes, splitting things would be nice with the column list. I need more coffee.
-- 
Michael




Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_rewind: Fix some problems when copying files >2GB.

2017-08-30 Thread Michael Paquier
On Thu, Aug 31, 2017 at 8:39 AM, Alvaro Herrera  wrote:
> Michael Paquier wrote:
>> So, perhaps it would be better to fix that before the next point release?
>
> Sure, I'll get it done on Friday, or tomorrow if I can manage it.

Thanks, Álvaro.
-- 
Michael




Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_rewind: Fix some problems when copying files >2GB.

2017-08-30 Thread Alvaro Herrera
Michael Paquier wrote:
> On Tue, Aug 29, 2017 at 11:26 AM, Tom Lane  wrote:
> > Michael Paquier  writes:
> >> I don't like breaking the abstraction of pg_log() with the existing
> >> flags with some kind of pg_debug() layer. The set of APIs present now
> >> in pg_rewind for logging is nice to have, and I think that those debug
> >> messages should be translated. So what about the attached?
> >
> > Your point about INT64_FORMAT not necessarily working with fprintf
> > is an outstanding reason not to keep it like it is.  I've not reviewed
> > this patch in detail but I think this is basically the way to fix it.
> 
> So, perhaps it would be better to fix that before the next point release?

Sure, I'll get it done on Friday, or tomorrow if I can manage it.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-08-30 Thread David G. Johnston
On Wed, Aug 30, 2017 at 4:08 PM, Bossart, Nathan 
wrote:

> On 8/30/17, 5:37 PM, "Michael Paquier"  wrote:
> > Yeah... Each approach has its cost and its advantages. It may be
> > better to wait for more opinions, not many people have complained yet
> > that for example a list of columns using twice the same one fails.
>
> Sounds good to me.
>
> > +VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [  > class="PARAMETER">table_name ] [, ...]
> > I just noticed that... But regarding the docs, I think that you have
> > misplaced the position of "[, ...]", which should be inside the
> > table_name portion in the case of what I quote here, no?
>
> I think that's what I had initially, but it was changed somewhere along
> the line.  It is a little more complicated for the versions that accept
> column lists.
>
> VACUUM ... ANALYZE [ [ table_name [ (column_name [, ...] ) ] ] [, ...] ]
>
> ISTM that we need the extra brackets here to clarify that the table and
> column list combination is what can be provided in a list.  Does that
> make sense?  Or do you think we can omit the outermost brackets here?
>

​Inspired by the syntax documentation for EXPLAIN:

​VACUUM [ ( option [, ...] ) ] [ table_def [, ...] ]

where option can be one of:
FULL
FREEZE
VERBOSE
DISABLE_PAGE_SKIPPING

and where table_def is:
table_name [ ( column_name [, ... ] ) ]

David J.


Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-08-30 Thread Bossart, Nathan
On 8/30/17, 5:37 PM, "Michael Paquier"  wrote:
> Yeah... Each approach has its cost and its advantages. It may be
> better to wait for more opinions, not many people have complained yet
> that for example a list of columns using twice the same one fails.

Sounds good to me.

> +VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [  class="PARAMETER">table_name ] [, ...]
> I just noticed that... But regarding the docs, I think that you have
> misplaced the position of "[, ...]", which should be inside the
> table_name portion in the case of what I quote here, no?

I think that's what I had initially, but it was changed somewhere along
the line.  It is a little more complicated for the versions that accept
column lists.

VACUUM ... ANALYZE [ [ table_name [ (column_name [, ...] ) ] ] [, ...] ]

ISTM that we need the extra brackets here to clarify that the table and
column list combination is what can be provided in a list.  Does that
make sense?  Or do you think we can omit the outermost brackets here?

> +VacuumRelation *
> +makeVacuumRelation(RangeVar *relation, List *va_cols, Oid oid)
> +{
> +   VacuumRelation *vacrel = makeNode(VacuumRelation);
> +   vacrel->relation = relation;
> +   vacrel->va_cols = va_cols;
> +   vacrel->oid = oid;
> +   return vacrel;
> +}
> Perhaps in makefuncs.c instead of vacuum.c? That's usually the place
> used for node constructions like that.

Ah, yes.  That is a much better place.  I'll make this change.

Nathan




Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-08-30 Thread Michael Paquier
On Wed, Aug 30, 2017 at 1:47 AM, Bossart, Nathan  wrote:
> On 8/28/17, 11:26 PM, "Michael Paquier"  wrote:
>> About the de-duplication patch, I have to admit that I am still not a
>> fan of doing such a thing. Another road that we could take is to
>> simply complain with a proper error message if:
>> - the same column name is specified twice for a relation.
>> - the same relation is defined twice. In the case of partitions, we
>> could track the fact that it is already listed as part of a parent,
>> though perhaps it does not seem worth the extra CPU cost especially
>> when there are multiple nesting levels with partitions.
>> Autovacuum has also the advantage, if I recall correctly, to select
>> all columns for analyze, and skip parent partitions when scanning for
>> relations so that's a safe bet from this side. Opinions welcome.
>
> I lean towards favoring the de-duplication patch, but maybe I am biased
> as the author.

I can be biased as reviewer then.

> I can see the following advantages:
>
> 1. Ease of use.  By taking care of de-duplicating on behalf of the user,
> they needn't worry about inheritance structures or accidentally
> specifying the same relation or column twice.  This might be especially
> useful if a large number of relations or columns must be specified.
> 2. Resource conservation.  By de-duplicating, VACUUM and ANALYZE are
> doing roughly the same thing but with less work.
> 3. The obnoxious errors you were experiencing are resolved.  This seems
> like the strongest argument to me, as it fixes an existing issue.
>
> Disadvantages might include:
>
> 1. Users cannot schedule repeated VACUUMs on the same relation (e.g.
> 'VACUUM table, table, table;').  However, I cannot think of a time when
> I needed this, and it seems like something else is wrong with VACUUM if
> folks are resorting to this.  In the end, you could still achieve this
> via several VACUUM statements.
> 2. Any inferred ordering for how the relations are processed will not
> be accurate if there are duplicates.  Ultimately, users might lose some
> amount of control here, but I am not sure how prevalent this use case
> might be.  In the worst case, you could achieve this via several
> individual VACUUM statements as well.
>
> Your suggestion to ERROR seems like a reasonable compromise, but I
> could see it causing frustration in some cases, especially with
> partitioning.

Yeah... Each approach has its cost and its advantages. It may be
better to wait for more opinions, not many people have complained yet
that for example a list of columns using twice the same one fails.
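
For illustration, the de-duplication being discussed could look roughly like
the sketch below (not the patch's actual code): duplicate relation names are
dropped while first-occurrence order is kept, and the same idea extends to
unioning per-table column lists.

```c
#include <assert.h>
#include <string.h>

/* Sketch: drop duplicate relation names from a VACUUM list, keeping
 * first-occurrence order.  Unioning explicit column lists per table
 * (with an empty list meaning "all columns") is omitted for brevity. */
int
dedup_relations(const char *in[], int n, const char *out[])
{
    int nout = 0;
    int i, j;

    for (i = 0; i < n; i++)
    {
        int seen = 0;

        for (j = 0; j < nout; j++)
            if (strcmp(in[i], out[j]) == 0)
                seen = 1;
        if (!seen)
            out[nout++] = in[i];
    }
    return nout;
}
```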

+VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ table_name ] [, ...]
I just noticed that... But regarding the docs, I think that you have
misplaced the position of "[, ...]", which should be inside the
table_name portion in the case of what I quote here, no?

+VacuumRelation *
+makeVacuumRelation(RangeVar *relation, List *va_cols, Oid oid)
+{
+   VacuumRelation *vacrel = makeNode(VacuumRelation);
+   vacrel->relation = relation;
+   vacrel->va_cols = va_cols;
+   vacrel->oid = oid;
+   return vacrel;
+}
Perhaps in makefuncs.c instead of vacuum.c? That's usually the place
used for node constructions like that.

Those are minor tweaks, I'll be fine to move that as ready for
committer after for those points are addressed for the main patch.
-- 
Michael




Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_rewind: Fix some problems when copying files >2GB.

2017-08-30 Thread Michael Paquier
On Tue, Aug 29, 2017 at 11:26 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> I don't like breaking the abstraction of pg_log() with the existing
>> flags with some kind of pg_debug() layer. The set of APIs present now
>> in pg_rewind for logging is nice to have, and I think that those debug
>> messages should be translated. So what about the attached?
>
> Your point about INT64_FORMAT not necessarily working with fprintf
> is an outstanding reason not to keep it like it is.  I've not reviewed
> this patch in detail but I think this is basically the way to fix it.

So, perhaps it would be better to fix that before the next point release?
-- 
Michael




Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 3:01 PM, Robert Haas  wrote:
> That may all be true, but my point is that if it wins in some cases,
> we should keep it -- and proving it no longer wins in those cases will
> require running tests.

That's not hard. On my laptop:

$ pgbench -i -s 100
...

postgres=# set work_mem = '2MB';
SET
postgres=# show replacement_sort_tuples ;
 replacement_sort_tuples
─────────────────────────
 15
(1 row)
(30784) /postgres=# select count(distinct aid) from pgbench_accounts ;
   count
────────────
 10,000,000
(1 row)

Time: 4157.267 ms (00:04.157)
(30784) /postgres=# set replacement_sort_tuples = 0;
SET
(30784) /postgres=# select count(distinct aid) from pgbench_accounts ;
   count
────────────
 10,000,000
(1 row)

Time: 3343.789 ms (00:03.344)

This is significantly faster, in a way that's clearly reproducible and
consistent, despite the fact that we need about 10 merge passes
without replacement selection, and only have enough memory for 7
tapes. I think that I could find a case that makes replacement
selection look much worse, if I tried.
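
The pass count here follows from simple arithmetic: a balanced merge with T
tapes combines k = T - 1 runs per pass, so the passes grow like
ceil(log_k(initial runs)). A small sketch of that calculation (illustrative
only, not measured from the run above):

```c
#include <assert.h>

/* Number of merge passes for a balanced k-way merge: with num_tapes
 * tapes, k = num_tapes - 1 runs are combined per pass (one output tape). */
int
merge_passes(long num_runs, int num_tapes)
{
    int k = num_tapes - 1;
    int passes = 0;

    while (num_runs > 1)
    {
        num_runs = (num_runs + k - 1) / k;  /* runs left after one pass */
        passes++;
    }
    return passes;
}
```

Each pass still reads and writes the whole data set, which is why fewer,
wider merges tend to win.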

-- 
Peter Geoghegan




Re: [HACKERS] pg_upgrade changes can it use CREATE EXTENSION?

2017-08-30 Thread Tom Lane
"Regina Obe"  writes:
> I think this thread covers most of the issues.
> https://lists.osgeo.org/pipermail/postgis-devel/2017-August/026355.html
> My thought was: is it possible for pg_upgrade to be taught to use CREATE
> EXTENSION if asked?

We intentionally *don't* do that; pg_dump goes to a lot of trouble to
duplicate the old extension contents exactly, instead.  There are a bunch
of corner cases that would fail if we allowed the new installation to
have different extension contents than the old.  Believe you me, we'd
rather have just issued CREATE EXTENSION, but it doesn't work.

Looking quickly at the thread you cite, I wonder how much of this problem
is caused by including version numbers in the library's .so filename.
Have you considered not doing that?  Our experience with maintaining the
contrib modules is that it's easier to attach a version number to an
individual function (in its C name, where it's irrelevant to SQL users).
If you incompatibly upgrade a given function, you can leave a stub behind,
with the old C symbol, that does nothing but throw an error if called.
Or you can keep on supporting the old API if it's easy enough; it
doesn't have to be a library-wide decision.
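
A toy illustration of that per-function versioning pattern (invented names,
plain C rather than real PostGIS/fmgr code): the old symbol remains linkable
but only reports an error, while the new symbol carries the current behavior.

```c
#include <assert.h>
#include <stdio.h>

/* Old ABI kept alive as a stub: callers still link, but get an error. */
int
my_buffer_v23(double radius, double *out)
{
    (void) radius;
    (void) out;
    fprintf(stderr, "my_buffer_v23 is obsolete; update the extension\n");
    return -1;
}

/* Current implementation under a new C symbol (stand-in computation). */
int
my_buffer_v24(double radius, double *out)
{
    *out = radius * 2.0;
    return 0;
}
```

In fmgr terms the stub would simply ereport(ERROR, ...) with a hint to update
the extension; the point is that the decision is per function, not
library-wide.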

> My proposed solution, to call it just postgis-2 rather than postgis-2.4
> from now on for the life of the 2.x series (since we rarely break backward
> compatibility in a PostGIS minor version), got shot down.

The thread you mention doesn't seem to include any arguments why not
to do that.

regards, tom lane




Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Robert Haas
On Wed, Aug 30, 2017 at 4:18 PM, Peter Geoghegan  wrote:
> On Wed, Aug 30, 2017 at 12:51 PM, Robert Haas  wrote:
>> On Fri, Jul 14, 2017 at 6:20 PM, Peter Geoghegan  wrote:
>>> With the additional enhancements made to Postgres 10, I doubt that
>>> there are any remaining cases where it wins.
>>
>> The thing to do about that would be to come up with some cases where
>> someone might plausibly think it would win and benchmark them to find
>> out what happens.  I find it really hard to believe that sorting a
>> long presorted stream of tuples (or, say, 2-1-4-3-6-5-8-7-10-9 etc.)
>> is ever going to be as fast with any other algorithm as it is with
>> replacement selection.
>
> Replacement selection as implemented in Postgres is supposed to be
> about the "single run, no merge" best case. This must use
> TSS_SORTEDONTAPE processing, which is optimized for random access,
> which is usually the wrong thing.
>
> In general, sorting is only one cost that is involved here, and is not
> the predominant cost with presorted input.

That may all be true, but my point is that if it wins in some cases,
we should keep it -- and proving it no longer wins in those cases will
require running tests.
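
For what it's worth, the property being debated is easy to demonstrate in
miniature. The sketch below (a toy, with the heap replaced by a linear scan)
counts the runs replacement selection emits: on the nearly sorted input
2,1,4,3,... it produces a single run even with memory for only three tuples,
while reverse-sorted input degrades to runs of roughly memory size.

```c
#include <assert.h>
#include <limits.h>

/* Toy replacement selection: memory holds M values; repeatedly emit the
 * smallest value that still fits the current run (>= the last value
 * written), refilling from the input.  Returns the number of runs. */
#define M 3

int
replacement_selection_runs(const int *input, int n)
{
    int  buf[M];
    int  nbuf = 0, next = 0;
    int  runs = (n > 0) ? 1 : 0;
    long last = LONG_MIN;           /* last value written to current run */

    while (nbuf < M && next < n)
        buf[nbuf++] = input[next++];

    while (nbuf > 0)
    {
        int best = -1;
        int i;

        for (i = 0; i < nbuf; i++)
            if (buf[i] >= last && (best < 0 || buf[i] < buf[best]))
                best = i;

        if (best < 0)
        {
            runs++;                 /* nothing fits: start a new run */
            last = LONG_MIN;
            continue;
        }

        last = buf[best];           /* "write out" the selected value */
        buf[best] = buf[--nbuf];
        if (next < n)
            buf[nbuf++] = input[next++];
    }
    return runs;
}
```

On random input, replacement selection classically produces runs about twice
the memory size; the question above is whether that still pays for itself.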

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel worker error

2017-08-30 Thread Tom Lane
Alvaro Herrera  writes:
> Robert Haas wrote:
>> In this case,
>> I'll blame the fact that we allow a role to be dropped while there are
>> users connected using that role.

> Actually, my first comment when Pavan mentioned this on IM was that we
> should look into fixing that problem sometime.  It's not terribly urgent
> since it doesn't seem to hurt anything too badly, but it's still a bug.

My feeling is that it's going to be unreasonably expensive.  Are we
going to take a lock every time we call a SECURITY DEFINER function?

regards, tom lane




[HACKERS] pg_upgrade changes can it use CREATE EXTENSION?

2017-08-30 Thread Regina Obe
I'm not too familiar with the innards of pg_upgrade, but we've been
discussing it a lot for the past couple of days, along with how it's causing
issues for PostGIS upgrades.

I think this thread covers most of the issues.

https://lists.osgeo.org/pipermail/postgis-devel/2017-August/026355.html

My thought was: is it possible for pg_upgrade to be taught to use CREATE
EXTENSION if asked?

Right now we don't support PostgreSQL 11 on PostGIS 2.3, and we would really
like not to, because there are too many changes in 11 that we feel queasy
about backporting.
Even if we did, package maintainers would have to provide both 2.3 and 2.4
on 11 just so people can pg_upgrade to PostgreSQL 11 and then run

ALTER EXTENSION postgis UPDATE;

to get to PostGIS 2.4.0.

Given that the latest PostgreSQL 11 head already doesn't compile against
PostGIS 2.4, I'm not confident we can fix 2.4 for 11.  So this will continue
to be more of a problem, especially at the rate that PostgreSQL is changing
these days.


Right now crafty users have to do something like this to use pg_upgrade

https://gist.github.com/Komzpa/994d5aaf340067ccec0e

My proposed solution, to call it just postgis-2 rather than postgis-2.4
from now on for the life of the 2.x series (since we rarely break backward
compatibility in a PostGIS minor version), got shot down.


Any thoughts on this?


Thanks,
Regina Obe
PostGIS PSC member





Re: [HACKERS] Parallel worker error

2017-08-30 Thread Alvaro Herrera
Robert Haas wrote:

> In this case,
> I'll blame the fact that we allow a role to be dropped while there are
> users connected using that role.  That's about as sensible as allowing
> a table to be dropped while there are users reading from it, or a
> database to be dropped while there are users connected to it, both of
> which we in fact prohibit.  IOW, I'd say the problem is that we've
> allowed the system state to become inconsistent and now we're sad
> because stuff doesn't work.
> 
> But since that's an established design fl^H^Hprinciple,

Actually, my first comment when Pavan mentioned this on IM was that we
should look into fixing that problem sometime.  It's not terribly urgent
since it doesn't seem to hurt anything too badly, but it's still a bug.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Polyphase merge is obsolete

2017-08-30 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 12:48 PM, Robert Haas  wrote:
> These are separate topics.  They should each be discussed on their own
> thread.  Please don't hijack this thread to talk about something else.

I don't think that that is a fair summary.

Heikki has done a number of things in this area, of which this is only
the latest. I'm saying "hey, have you thought about RS too?". Whether
or not I'm "hijacking" this thread is, at best, ambiguous.

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 12:51 PM, Robert Haas  wrote:
> On Fri, Jul 14, 2017 at 6:20 PM, Peter Geoghegan  wrote:
>> With the additional enhancements made to Postgres 10, I doubt that
>> there are any remaining cases where it wins.
>
> The thing to do about that would be to come up with some cases where
> someone might plausibly think it would win and benchmark them to find
> out what happens.  I find it really hard to believe that sorting a
> long presorted stream of tuples (or, say, 2-1-4-3-6-5-8-7-10-9 etc.)
> is ever going to be as fast with any other algorithm as it is with
> replacement selection.

Replacement selection as implemented in Postgres is supposed to be
about the "single run, no merge" best case. This must use
TSS_SORTEDONTAPE processing, which is optimized for random access,
which is usually the wrong thing.

In general, sorting is only one cost that is involved here, and is not
the predominant cost with presorted input.

-- 
Peter Geoghegan




Re: [HACKERS] The case for removing replacement selection sort

2017-08-30 Thread Robert Haas
On Fri, Jul 14, 2017 at 6:20 PM, Peter Geoghegan  wrote:
> With the additional enhancements made to Postgres 10, I doubt that
> there are any remaining cases where it wins.

The thing to do about that would be to come up with some cases where
someone might plausibly think it would win and benchmark them to find
out what happens.  I find it really hard to believe that sorting a
long presorted stream of tuples (or, say, 2-1-4-3-6-5-8-7-10-9 etc.)
is ever going to be as fast with any other algorithm as it is with
replacement selection.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Polyphase merge is obsolete

2017-08-30 Thread Robert Haas
On Wed, Aug 30, 2017 at 2:54 PM, Peter Geoghegan  wrote:
> I noticed that this is in the upcoming CF 1 for v11. I'm signed up to review.
>
> I'd like to point out that replacement selection is also obsolete,
> which is something I brought up recently [1]. I don't actually have
> any feature-driven reason to want to kill replacement selection - it's
> just an annoyance at this point. I do think that RS is more deserving
> of being killed than Polyphase merge, because it actually costs users
> something to continue to support it. The replacement_sort_tuples GUC
> particularly deserves to be removed.
>
> It would be nice if killing RS was put in scope here. I'd appreciate
> it, at least, since it would simplify the heap routines noticeably.
> The original analysis that led to adding replacement_sort_tuples was
> based on certain performance characteristics of merging that have
> since changed by quite a bit, due to our work for v10.

These are separate topics.  They should each be discussed on their own
thread.  Please don't hijack this thread to talk about something else.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] expanding inheritance in partition bound order

2017-08-30 Thread Robert Haas
On Wed, Aug 30, 2017 at 12:47 PM, Ashutosh Bapat
 wrote:
> +1. I think we should just pull out the OIDs from partition descriptor.

Like this?  The first patch refactors the expansion of a single child
out into a separate function, and the second patch implements EIBO on
top of it.

I realized while doing this that we really want to expand the
partitioning hierarchy depth-first, not breadth-first.  For some
things, like partition-wise join in the case where all bounds match
exactly, we really only need a *predictable* ordering that will be the
same for two equi-partitioned tables.  A breadth-first expansion will
give us that.  But it's not actually in bound order.  For example:

create table foo (a int, b text) partition by list (a);
create table foo1 partition of foo for values in (2);
create table foo2 partition of foo for values in (1) partition by range (b);
create table foo2a partition of foo2 for values from ('b') to ('c');
create table foo2b partition of foo2 for values from ('a') to ('b');
create table foo3 partition of foo for values in (3);

The correct bound-order expansion of this is foo2b - foo2a - foo1 -
foo3, which is indeed what you get with the attached patch.  But if we
did the expansion in breadth-first fashion, we'd get foo1 - foo3 -
foo2a, foo2b, which is, well, not in bound order.  If the idea is that
you see a > 2 and rule out all partitions that appear before the first
one with an a-value >= 2, it's not going to work.
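To make the ordering concrete, here is a tiny standalone sketch of depth-first expansion over that example hierarchy (the PartNode structure and function names here are hypothetical stand-ins for illustration, not the real PartitionDesc machinery):

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for a partition tree node; children are
 * assumed to already be stored in bound order, as in a PartitionDesc. */
typedef struct PartNode
{
    const char *name;
    const struct PartNode *children;    /* NULL if this is a leaf */
    int         nchildren;
} PartNode;

/* Append the leaf names under "node" to "out" in depth-first,
 * bound order; returns the updated count. */
static int
expand_depth_first(const PartNode *node, const char **out, int n)
{
    if (node->nchildren == 0)
        out[n++] = node->name;
    else
        for (int i = 0; i < node->nchildren; i++)
            n = expand_depth_first(&node->children[i], out, n);
    return n;
}
```

Running this over the foo hierarchy above (foo2 before foo1 before foo3, since their list bounds are 1, 2, 3) yields foo2b, foo2a, foo1, foo3, which is exactly the bound-order expansion described.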

Mind you, that idea has some problems anyway in the face of default
partitions, null partitions, and list partitions which accept
non-contiguous values (e.g. one partition for 1, 3, 5; another for 2,
4, 6).  We might need to mark the PartitionDesc to indicate whether
PartitionDesc-order is in fact bound-order in a particular instance,
or something like that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


0001-expand_single_inheritance_child.patch
Description: Binary data


0002-EIBO.patch
Description: Binary data



Re: [HACKERS] Polyphase merge is obsolete

2017-08-30 Thread Peter Geoghegan
On Mon, Feb 27, 2017 at 2:45 PM, Peter Geoghegan  wrote:
> Since we have an awful lot of stuff in the last CF, and this patch
> doesn't seem particularly strategic, I've marked it "Returned with
> Feedback".

I noticed that this is in the upcoming CF 1 for v11. I'm signed up to review.

I'd like to point out that replacement selection is also obsolete,
which is something I brought up recently [1]. I don't actually have
any feature-driven reason to want to kill replacement selection - it's
just an annoyance at this point. I do think that RS is more deserving
of being killed than Polyphase merge, because it actually costs users
something to continue to support it. The replacement_sort_tuples GUC
particularly deserves to be removed.

It would be nice if killing RS was put in scope here. I'd appreciate
it, at least, since it would simplify the heap routines noticeably.
The original analysis that led to adding replacement_sort_tuples was
based on certain performance characteristics of merging that have
since changed by quite a bit, due to our work for v10.

[1] 
postgr.es/m/cah2-wzmmnjg_k0r9nqywmq3zjyjjk+hcbizynghay-zyjs6...@mail.gmail.com
-- 
Peter Geoghegan




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-30 Thread Tom Lane
Amit Kapila  writes:
> On Tue, Aug 29, 2017 at 10:05 PM, Tom Lane  wrote:
>> If no objections, I'll do the additional legwork and push.

> No objections.

Done.  Out of curiosity, I pushed just the rescan-param patch to the
buildfarm to start with, to see if anything would fall over, and indeed
some things did:

* prairiedog has shown several instances of a parallel bitmap heap scan
test failing with too many rows being retrieved.  I think what's
happening there is that the leader's ExecReScanBitmapHeapScan call is
slow enough to happen that the worker(s) have already retrieved some rows
using the old shared state.  We'd determined that the equivalent case
for a plain seqscan would result in no failure because the workers would
think they had nothing to do, but this evidently isn't true for a parallel
bitmap scan.

* prairiedog and loach have both shown failures with the test case from
a2b70c89c, in which the *first* scan produces too many rows and then the
later ones are fine.  This befuddled me initially, but then I remembered
that nodeNestloop.c will unconditionally do an ExecReScan call on its
inner plan before the first ExecProcNode call.  With the modified code
from 7df2c1f8d, this results in the leader's Gather node's top child
having a pending rescan on it due to a chgParam bit.  That's serviced
when we do the first ExecProcNode call on the child, after having started
the workers.  So that's another way in which a ReScan call can happen
in the leader when workers are already running, and if the workers have
already scanned some pages then those pages will get scanned again.

So I think this is all fixed up by 41b0dd987, but evidently those patches
are not nearly as independent as I first thought.

regards, tom lane




Re: [HACKERS] expanding inheritance in partition bound order

2017-08-30 Thread Ashutosh Bapat
On Wed, Aug 30, 2017 at 9:42 PM, Robert Haas  wrote:
> On Wed, Aug 30, 2017 at 6:08 AM, Amit Khandekar  wrote:
>> On 25 August 2017 at 23:58, Robert Haas  wrote:
>>> That just leaves indexes.  In a world where keystate, tupslot, and
>>> tupmap are removed from the PartitionDispatchData, you must need
>>> indexes or there would be no point in constructing a
>>> PartitionDispatchData object in the first place; any application that
>>> needs neither indexes nor the executor-specific stuff could just use
>>> the Relation directly.
>>
>> But there is expand_inherited_rtentry() which neither requires indexes
>> nor any executor stuff, but still requires to call
>> RelationGetPartitionDispatchInfo(), and so these indexes get built
>> unnecessarily.
>
> True, but the reason why expand_inherited_rtentry() needs to call
> RelationGetPartitionDispatchInfo() is to get back the leaf partition
> OIDs in bound order.  If we're using
> RelationGetPartitionDispatchInfo() to get the leaf partition OIDs into
> bound order, we've got to run the loop that builds leaf_part_oids, and
> the same loop constructs indexes.  So I don't think we're doing much
> redundant work there.
>
> Now, if we made it the job of expand_inherited_rtentry() to loop over
> the PartitionDesc, then it really wouldn't need to call
> RelationGetPartitionDispatchInfo at all.  Maybe that's actually a
> better plan anyway, because as Ashutosh points out, we don't want the
> partitioned children to show up before the unpartitioned children in
> the resulting ordering.
>

+1. I think we should just pull out the OIDs from partition descriptor.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] A design for amcheck heapam verification

2017-08-30 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 5:02 AM, Alvaro Herrera
 wrote:
> Eh, if you want to optimize it for the case where debug output is not
> enabled, make sure to use ereport() not elog().  ereport()
> short-circuits evaluation of arguments, whereas elog() does not.

I should do that, but it's still not really noticeable.

-- 
Peter Geoghegan




Re: [HACKERS] expanding inheritance in partition bound order

2017-08-30 Thread Robert Haas
On Wed, Aug 30, 2017 at 6:08 AM, Amit Khandekar  wrote:
> On 25 August 2017 at 23:58, Robert Haas  wrote:
>> That just leaves indexes.  In a world where keystate, tupslot, and
>> tupmap are removed from the PartitionDispatchData, you must need
>> indexes or there would be no point in constructing a
>> PartitionDispatchData object in the first place; any application that
>> needs neither indexes nor the executor-specific stuff could just use
>> the Relation directly.
>
> But there is expand_inherited_rtentry() which neither requires indexes
> nor any executor stuff, but still requires to call
> RelationGetPartitionDispatchInfo(), and so these indexes get built
> unnecessarily.

True, but the reason why expand_inherited_rtentry() needs to call
RelationGetPartitionDispatchInfo() is to get back the leaf partition
OIDs in bound order.  If we're using
RelationGetPartitionDispatchInfo() to get the leaf partition OIDs into
bound order, we've got to run the loop that builds leaf_part_oids, and
the same loop constructs indexes.  So I don't think we're doing much
redundant work there.

Now, if we made it the job of expand_inherited_rtentry() to loop over
the PartitionDesc, then it really wouldn't need to call
RelationGetPartitionDispatchInfo at all.  Maybe that's actually a
better plan anyway, because as Ashutosh points out, we don't want the
partitioned children to show up before the unpartitioned children in
the resulting ordering.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] expanding inheritance in partition bound order

2017-08-30 Thread Robert Haas
On Wed, Aug 30, 2017 at 10:31 AM, Robert Haas  wrote:
> On Wed, Aug 30, 2017 at 9:22 AM, Ashutosh Bapat
>  wrote:
>> Amit's patches seem to be addressing the third point here. But the
>> expansion is not happening in breadth-first manner. We are expanding
>> all the partitioned partitions first and then leaf partitions. So
>> that's not exactly "breadth-first".
>
> Correct, but I think Amit's ordering is what we actually want:
> breadth-first, low-OID-first over interior partitioned tables, and
> then breadth-first, low-OID-first again over leaves.  If we don't keep
> partitioned partitions first, then we're going to have problems
> keeping the locking order consistent when we start doing pruning
> during expansion.

No, I'm wrong and you're correct.  We want the partitions to be locked
first, but we don't want them to be pulled to the front of the
expansion order, because then it's not in bound order anymore and any
optimization that tries to rely on that ordering will break.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] LWLock optimization for multicore Power machines

2017-08-30 Thread Sokolov Yura

On 2017-08-30 16:24, Tom Lane wrote:
> Alexander Korotkov  writes:
>> It doesn't seem to make sense to consider this patch unless we get access
>> to a suitable Power machine to reproduce benefits.
>> This is why I'm going to mark this patch "Returned with feedback".
>> Once we would get access to the appropriate machine, I will resubmit this
>> patch.
>
> What about hydra (that 60-core POWER7 machine we have community
> access to)?
>
> regards, tom lane


Anyway, I encourage you to consider another LWLock patch first:
https://commitfest.postgresql.org/14/1166/
It positively affects performance on any platform (I've tested it
on Power as well).

And I have a prototype of a further adaptation of that patch for the POWER
platform (i.e. using inline assembly), not yet published.

--
Sokolov Yura
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company




Re: [HACKERS] Hash Functions

2017-08-30 Thread Robert Haas
On Wed, Aug 30, 2017 at 10:43 AM, amul sul  wrote:
> Thanks for the suggestion, I have updated 0002-patch accordingly.
> Using this I found some strange behaviours as follow:
>
> 1) standard and extended0 output for the jsonb_hash case is not the same.
> 2) standard and extended0 output for the hash_range case is not the same
>    when the input is int4range(550274, 1550274); other cases in the patch
>    are fine. This can be reproduced with other values as well, e.g.
>    int8range(1570275, 208112489).
>
> Will look into this tomorrow.

Those sound like bugs in your patch.  Specifically:

+/* Same approach as hash_range */
+result = hash_uint32_extended((uint32) flags, seed);
+result ^= lower_hash;
+result = (result << 1) | (result >> 63);
+result ^= upper_hash;

That doesn't give compatible results.  The easiest thing to do might
be to rotate the high 32 bits and the low 32 bits separately.
JsonbHashScalarValueExtended has the same problem.  Maybe create a
helper function that does something like this (untested):

((x << 1) & UINT64CONST(0xfffffffefffffffe)) | ((x >> 31) &
UINT64CONST(0x0000000100000001))
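That rotation can be sanity-checked in isolation; a minimal standalone sketch (using C99 fixed-width types and UINT64_C in place of the Postgres typedefs, with a hypothetical function name) might look like:

```c
#include <assert.h>
#include <stdint.h>

/* Rotate the high 32 bits and the low 32 bits of x left by one, each
 * half wrapping within itself.  The first mask drops the bit that
 * would otherwise spill from the low half into the high half; the
 * second term re-inserts each half's old top bit at its own bottom. */
static inline uint64_t
rotate_halves_left1(uint64_t x)
{
    return ((x << 1) & UINT64_C(0xfffffffefffffffe)) |
           ((x >> 31) & UINT64_C(0x0000000100000001));
}
```

Because each 32-bit half is rotated independently, truncating the 64-bit result to its low 32 bits reproduces the one-bit rotation the existing 32-bit hash_range code performs, which is the compatibility property being asked for.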

> Another case which I want to discuss: the extended and standard versions of
> hashfloat4, hashfloat8 & hash_numeric will have the same output for a zero
> value irrespective of the seed value. Do you think we need a fix for this?

Yes, I think you should return the seed rather than 0 in the cases
where the current code hard-codes a 0 return.  That will give the same
behavior in the seed == 0 case while cheaply getting us something a
bit different when there is a seed.
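A standalone sketch of that suggestion for the float case (the finalizer here is a generic Murmur3-style mix used purely for illustration, not the hash PostgreSQL actually uses, and the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative 64-bit finalizer (Murmur3 fmix64); bijective, so
 * distinct inputs always map to distinct outputs. */
static uint64_t
mix64(uint64_t h)
{
    h ^= h >> 33;
    h *= UINT64_C(0xff51afd7ed558ccd);
    h ^= h >> 33;
    h *= UINT64_C(0xc4ceb9fe1a85ec53);
    h ^= h >> 33;
    return h;
}

static uint64_t
hashfloat8_extended_sketch(double key, uint64_t seed)
{
    uint64_t bits;

    /*
     * Return the seed, not a hard-coded 0: the seed == 0 result stays
     * identical to the standard function, while a nonzero seed still
     * perturbs the result.  In C, -0.0 == 0.0, so this also keeps
     * minus zero hashing the same as plus zero, as the float hashes
     * require.
     */
    if (key == 0.0)
        return seed;

    memcpy(&bits, &key, sizeof(bits));
    return mix64(bits ^ seed);
}
```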

BTW, you should probably invent and use a PG_RETURN_UINT64 macro in this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] segment size depending *_wal_size defaults (was increasing the default WAL segment size)

2017-08-30 Thread Andres Freund
Hi,


To make it clear: I don't have a strong opinion on these, I'm happy
enough to commit the patch as is, minus one unrelated change. I just
think it should be discussed.


On 2017-08-30 07:01:54 -0400, Peter Eisentraut wrote:
> On 8/30/17 00:45, Andres Freund wrote:
> > 1) Currently the default for {min,max}_wal_size depends on the segment
> >size. Given that the segment size is about to be configurable, that
> >seems confusing.
> 
> On the one hand, I agree that it seems confusing and unnecessary to vary
> this with the segment size.  On the other hand, the problem is that if
> the segment size is larger than the default max_wal_size, we have to do
> something different anyway.

Well, rounding up to two segments is different from rounding up to
64GB... I think there's a fair argument to be made that that's confusing
too.


> Also, the reason for wanting a larger segment size is that you expect
> a lot of WAL, so setting max_wal_size larger by default caters to
> that.

This I find unconvincing. If we want more autotuning, we should do that,
rather than define random things as that.


> Right now, we have max_wal_size = 1GB by default.  What should the
> behavior be for wal_segment_size = 1GB?

2GB.



> > 2) Currently wal_segment_size is measured in GUC_UNIT_XBLOCKS, which
> >requires us to keep two copies of the underlying variable, one in
> >XBLOCKS one in bytes. I'd rather just have the byte variant.
> 
> It seems to me that one could structure the code to make do with the
> existing variable. I had a brief look at the patch, and I think some
> more imaginative refactoring is possible.

That'd add yet more instructions to tight spinlock'ed paths, so I'm
loath to do so. Or maybe I'm just missing what you're thinking about?


> If we want/need to change it, I'd prefer using the _KB or _MB
> variants, so as to not to restrict future size limits too much.

I'm a bit surprised at this one - if we want to add larger segment size
limits (which I don't think are useful), wouldn't it be just as
appropriate to extend the GUC infrastructure?


Greetings,

Andres Freund




Re: [HACKERS] Parallel worker error

2017-08-30 Thread Robert Haas
On Wed, Aug 30, 2017 at 9:58 AM, Tom Lane  wrote:
> The problem here is exactly that we cannot transmit the leader's
> state to the worker.  You can't blame it on SET ROLE, because
> I didn't do one.

Hmm, that's a good reason for holding it blameless.  In this case,
I'll blame the fact that we allow a role to be dropped while there are
users connected using that role.  That's about as sensible as allowing
a table to be dropped while there are users reading from it, or a
database to be dropped while there are users connected to it, both of
which we in fact prohibit.  IOW, I'd say the problem is that we've
allowed the system state to become inconsistent and now we're sad
because stuff doesn't work.

But since that's an established design fl^H^Hprinciple, maybe that
means we should go with the approach of teaching SerializeGUCState()
to ignore role altogether and instead have ParallelWorkerMain call
SetCurrentRoleId using information passed via the FixedParallelState
(not sure of the precise details here).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Hash Functions

2017-08-30 Thread amul sul
On Tue, Aug 29, 2017 at 11:48 PM, Robert Haas  wrote:

> On Tue, Aug 22, 2017 at 8:14 AM, amul sul  wrote:
> > Attaching patch 0002 for the reviewer's testing.
>
> I think that this 0002 is not something we can think of committing
> because there's no guarantee that hash functions will return the same
> results on all platforms.  However, what we could and, I think, should
> do is hash some representative values of each data type and verify
> that hash(x) and hashextended(x, 0) come out the same at least as to
> the low-order 32 bits -- and maybe that hashextend(x, 1) comes out
> different as to the low-order 32 bits.  The easiest way to do this
> seems to be to cast to bit(32).  For example:
>
> SELECT v, hashint4(v)::bit(32) as standard,
> hashint4extended(v, 0)::bit(32) as extended0,
> hashint4extended(v, 1)::bit(32) as extended1
> FROM (VALUES (0), (1), (17), (42), (550273), (207112489)) x(v)
> WHERE hashint4(v)::bit(32) != hashint4extended(v, 0)::bit(32)
>OR hashint4(v)::bit(32) = hashint4extended(v, 1)::bit(32);
>
> I suggest adding a version of this for each data type.
>

Thanks for the suggestion, I have updated the 0002 patch accordingly.
Using this I found some strange behaviours, as follows:

1) standard and extended0 output for the jsonb_hash case is not the same.
2) standard and extended0 output for the hash_range case is not the same
   when the input is int4range(550274, 1550274); other cases in the patch
   are fine. This can be reproduced with other values as well, e.g.
   int8range(1570275, 208112489).

Will look into this tomorrow.

Another case which I want to discuss: the extended and standard versions of
the hashfloat4, hashfloat8 & hash_numeric functions will have the same
output for a zero value irrespective of the seed value. Do you think we
need a fix for this?


> From your latest version of 0001, I get:
>
> rangetypes.c:1297:8: error: unused variable 'rngtypid'
>   [-Werror,-Wunused-variable]
> Oid rngtypid = RangeTypeGetOid(r);

Fixed in the attached version.


> I suggest not duplicating the comments from each regular function into
> the extended function, but just writing /* Same approach as hashfloat8
> */ when the implementation is non-trivial (no need for this if the
> extended function is a single line or the original function had no
> comments anyway).
>
Fixed in the attached version.


> hash_aclitem seems embarrassingly stupid.  I suggest that we make the
> extended version slightly less stupid -- i.e. if the seed is non-zero,
> actually call hash_any_extended on the sum and pass the seed through.
>
>   * Reset info about hash function whenever we pick up new info
> about
>   * equality operator.  This is so we can ensure that the hash
> function
>   * matches the operator.
>   */
>  typentry->flags &= ~(TCFLAGS_CHECKED_HASH_PROC);
> +typentry->flags &= ~(TCFLAGS_CHECKED_HASH_EXTENDED_PROC);
>
> Adjust comment: "about hash function" -> "about hash functions", "hash
> functions matches" -> "hash functions match".
>
Fixed in the attached version.


> +extern Datum
> +hash_any_extended(register const unsigned char *k, register int
> +  keylen, uint64 seed);
>
> Formatting.
>

Fixed in the attached version.

Thanks!

Regards,
Amul


0001-add-optional-second-hash-proc-v3.patch
Description: Binary data


0002-test-Hash_functions_v2_wip.patch
Description: Binary data



Re: [HACKERS] expanding inheritance in partition bound order

2017-08-30 Thread Robert Haas
On Wed, Aug 30, 2017 at 9:22 AM, Ashutosh Bapat
 wrote:
> Amit's patches seem to be addressing the third point here. But the
> expansion is not happening in breadth-first manner. We are expanding
> all the partitioned partitions first and then leaf partitions. So
> that's not exactly "breadth-first".

Correct, but I think Amit's ordering is what we actually want:
breadth-first, low-OID-first over interior partitioned tables, and
then breadth-first, low-OID-first again over leaves.  If we don't keep
partitioned partitions first, then we're going to have problems
keeping the locking order consistent when we start doing pruning
during expansion.

> A better way for translating partition hierarchy into inheritance
> hierarchy may be to expand all partitions (partitioned or leaf) of a
> given parent at a time in breadth-first manner. This allows us to
> create child RTE, AppendRelInfo, rowmarks while we have corresponding
> parent structures at hand, rather than searching for those. This would
> still satisfy Amit Khandekar's requirement to expand leaf partitions
> in the same order as their OIDs would be returned by
> RelationGetPartitionDispatchInfo(). I have a feeling that, if we go
> that route, we will replace almost all the changes that Amit Langote's
> patches do to expand_inherited_rtentry().

I think we will, too, but I think that's basically the problem of the
partition-wise join patch.  Either find_all_inheritors is going to
have to return enough additional information to let
expand_inherited_rtentry work efficiently, or else
expand_inherited_rtentry is going to have to duplicate some of the
logic from find_all_inheritors.  But that doesn't make what Amit is
doing here a bad idea -- getting stuff that shouldn't be part of
PartitionDispatch removed and getting the expansion order in
expand_inherited_rtentry() changed seem to be the right things to do
even if the way it's implemented has to be revised to meet other
goals.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel worker error

2017-08-30 Thread Tom Lane
Robert Haas  writes:
> On Wed, Aug 30, 2017 at 9:20 AM, Tom Lane  wrote:
>> I"m okay with a narrow solution if SET ROLE really is
>> the only problem, but at this stage I'm not convinced of that.

> I don't think the problem with role is that it's catalog-dependent,
> but that the effective value is changed by code that never calls
> SetConfigOption() or set_config_option() or any other mechanism
> that the GUC code knows about.

It's certainly possible that that's a contributing factor, but the
variant that Amit alluded to demonstrates that catalog dependency
is part of the problem:

regression=# create user testuser1;
CREATE ROLE
regression=# \c - testuser1
You are now connected to database "regression" as user "testuser1".

-- in a different session, do "drop user testuser1;", then:

regression=> show role;
 role 
------
 none
(1 row)

regression=> show session authorization;
 session_authorization 
-----------------------
 testuser1
(1 row)

regression=> select count(*) from pg_class;
 count 
-------
  1113
(1 row)

regression=> set force_parallel_mode TO 1;
SET
regression=> select count(*) from pg_class;
ERROR:  role with OID 110981 does not exist
CONTEXT:  parallel worker
regression=> set force_parallel_mode TO 0;
SET
regression=> select count(*) from pg_class;
 count 
-------
  1113
(1 row)

The problem here is exactly that we cannot transmit the leader's
state to the worker.  You can't blame it on SET ROLE, because
I didn't do one.

regards, tom lane




Re: [HACKERS] [PATCH] Fix drop replication slot blocking instead of returning error

2017-08-30 Thread Alvaro Herrera
And another patch to restore behavior to replication origin drop.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 66c1b1072feb95a08739d9a752f1d6fc73d1dc77 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 30 Aug 2017 15:38:15 +0200
Subject: [PATCH 2/2] Restore original behavior for replication origins, too

---
 src/backend/replication/logical/origin.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/replication/logical/origin.c 
b/src/backend/replication/logical/origin.c
index 14cb3d0bf2..edc6efb8a6 100644
--- a/src/backend/replication/logical/origin.c
+++ b/src/backend/replication/logical/origin.c
@@ -1205,7 +1205,7 @@ pg_replication_origin_drop(PG_FUNCTION_ARGS)
roident = replorigin_by_name(name, false);
Assert(OidIsValid(roident));
 
-   replorigin_drop(roident, false);
+   replorigin_drop(roident, true);
 
pfree(name);
 
-- 
2.11.0




Re: [HACKERS] Parallel worker error

2017-08-30 Thread Robert Haas
On Wed, Aug 30, 2017 at 9:20 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Aug 30, 2017 at 8:04 AM, Tom Lane  wrote:
>>> We might need to redesign the GUC-propagation mechanism so it sends
>>> the various internal representations of GUC values, not the user-visible
>>> strings.
>
>> That would probably be better in the long run, but I'm not keen to do
>> it in a back-branch under time pressure.
>
> Definitely a valid objection.  But before assuming that this issue is
> limited to SET ROLE, it'd be wise to push a bit on the other GUCs with
> catalog-dependent values, to see if there are any others we need to
> worry about.  I'm okay with a narrow solution if SET ROLE really is
> the only problem, but at this stage I'm not convinced of that.

I don't think the problem with role is that it's catalog-dependent,
but that the effective value is changed by code that never calls
SetConfigOption() or set_config_option() or anything other mechanism
that the GUC code knows about. That coding pattern is known to be
broken (see the commit message for
a16d421ca4fc639929bc964b2585e8382cf16e33, for example) and my bet is
the only reason why set role has gotten by with it for so long is
because the code was written a long time ago and nobody wants to risk
breaking anything by trying to clean it up.  It's almost unfair to
blame this on parallel query; if someone were to submit a patch
tomorrow that changes the effective value of a GUC without going
through SetConfigOption(), you'd punt it into outer space at
relativistic speed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Fix drop replication slot blocking instead of returning error

2017-08-30 Thread Alvaro Herrera
Craig Ringer wrote:

> > FWIW, I also don't think it's ok to just change the behaviour
> > unconditionally and without a replacement for existing behaviour.
> 
> Seems like it just needs a new argument nowait DEFAULT false

I added a WAIT flag to DROP_REPLICATION_SLOT, preliminary patch
attached.  Running tests now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From b395155c504ad1c6a4eb1faa7c74e2df8b559545 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 29 Aug 2017 12:08:47 +0200
Subject: [PATCH] Add a WAIT option to DROP_REPLICATION_COMMAND

This restores the default behavior of the command prior to 9915de6c1cb2.

Per complaint from Simone Gotti.
Discussion: 
https://postgr.es/m/caevsy6wgdf90o6puvg2wsvxl2omh5opc-38od4zzgk-fxav...@mail.gmail.com
---
 doc/src/sgml/logicaldecoding.sgml   |  2 +-
 doc/src/sgml/protocol.sgml  | 17 ++---
 src/backend/commands/subscriptioncmds.c |  2 +-
 src/backend/replication/repl_gram.y | 10 ++
 src/backend/replication/repl_scanner.l  |  1 +
 src/backend/replication/slotfuncs.c |  2 +-
 src/backend/replication/walsender.c |  2 +-
 src/include/nodes/replnodes.h   |  1 +
 8 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/logicaldecoding.sgml b/doc/src/sgml/logicaldecoding.sgml
index 8dcfc6c742..f8142518c1 100644
--- a/doc/src/sgml/logicaldecoding.sgml
+++ b/doc/src/sgml/logicaldecoding.sgml
@@ -303,7 +303,7 @@ $ pg_recvlogical -d postgres --slot test --drop-slot
  
 
  
-  DROP_REPLICATION_SLOT slot_name
+  DROP_REPLICATION_SLOT slot_name  WAIT 

  
 
  
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 7c012f59a3..2bb4e38a9d 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2173,13 +2173,13 @@ The commands accepted in walsender mode are:
   
 
   
-DROP_REPLICATION_SLOT slot_name
+
+ DROP_REPLICATION_SLOT slot_name  WAIT 
  DROP_REPLICATION_SLOT
 
 
  
-  Drops a replication slot, freeing any reserved server-side resources. If
-  the slot is currently in use by an active connection, this command fails.
+  Drops a replication slot, freeing any reserved server-side resources.
   If the slot is a logical slot that was created in a database other than
   the database the walsender is connected to, this command fails.
  
@@ -2192,6 +2192,17 @@ The commands accepted in walsender mode are:
  

   
+
+  
+   WAIT
+   
+
+ This option causes the command to wait if the slot is active until
+ it becomes inactive, instead of the default behavior of raising an
+ error.
+
+   
+  
  
 
   
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index 9bc1d178fc..2ef414e084 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -959,7 +959,7 @@ DropSubscription(DropSubscriptionStmt *stmt, bool 
isTopLevel)
load_file("libpqwalreceiver", false);
 
initStringInfo(&cmd);
-   appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s", quote_identifier(slotname));
+   appendStringInfo(&cmd, "DROP_REPLICATION_SLOT %s WAIT", quote_identifier(slotname));
 
wrconn = walrcv_connect(conninfo, true, subname, &err);
if (wrconn == NULL)
diff --git a/src/backend/replication/repl_gram.y b/src/backend/replication/repl_gram.y
index ec047c827c..a012447fa2 100644
--- a/src/backend/replication/repl_gram.y
+++ b/src/backend/replication/repl_gram.y
@@ -72,6 +72,7 @@ static SQLCmd *make_sqlcmd(void);
 %token K_LABEL
 %token K_PROGRESS
 %token K_FAST
+%token K_WAIT
 %token K_NOWAIT
 %token K_MAX_RATE
 %token K_WAL
@@ -272,6 +273,15 @@ drop_replication_slot:
DropReplicationSlotCmd *cmd;
cmd = makeNode(DropReplicationSlotCmd);
cmd->slotname = $2;
+   cmd->wait = false;
+   $$ = (Node *) cmd;
+   }
+   | K_DROP_REPLICATION_SLOT IDENT K_WAIT
+   {
+   DropReplicationSlotCmd *cmd;
+   cmd = makeNode(DropReplicationSlotCmd);
+   cmd->slotname = $2;
+   cmd->wait = true;
$$ = (Node *) cmd;
}
;
diff --git a/src/backend/replication/repl_scanner.l b/src/backend/replication/repl_scanner.l
index 52ae7b343f..62bb5288c0 100644
--- a/src/backend/replication/repl_scanner.l
+++ 

Re: [HACKERS] LWLock optimization for multicore Power machines

2017-08-30 Thread Tom Lane
Alexander Korotkov  writes:
> It doesn't seem to make sense to consider this patch unless we get access
> to suitable Power machine to reproduce benefits.
> This is why I'm going to mark this patch "Returned with feedback".
> Once we get access to the appropriate machine, I will resubmit this
> patch.

What about hydra (that 60-core POWER7 machine we have community
access to)?

regards, tom lane




Re: [HACKERS] expanding inheritance in partition bound order

2017-08-30 Thread Ashutosh Bapat
On Wed, Aug 30, 2017 at 8:33 AM, Robert Haas  wrote:
> On Tue, Aug 29, 2017 at 10:36 PM, Amit Langote
>  wrote:
>>> I keep having the feeling that this is a big patch with a small patch
>>> struggling to get out.  Is it really necessary to change
>>> RelationGetPartitionDispatchInfo so much or could you just do a really
>>> minimal surgery to remove the code that sets the stuff we don't need?
>>> Like this:
>>
>> Sure, done in the attached updated patch.
>
> On first glance, that looks pretty good.  I'll have a deeper look
> tomorrow.
>

In one of your earlier mails on this thread, you had described what
expand_inherited_rtentry() would look like:

-- quote --
1. Calling find_all_inheritors with a new only-lock-the-partitions
flag.  This should result in locking all partitioned tables in the
inheritance hierarchy in breadth-first, low-OID-first order.  (When
the only-lock-the-partitions isn't specified, all partitioned tables
should still be locked before any unpartitioned tables, so that the
locking order in that case is consistent with what we do here.)

2. Iterate over the partitioned tables identified in step 1 in the
order in which they were returned.  For each one:
- Decide which children can be pruned.
- Lock the unpruned, non-partitioned children in low-OID-first order.

3. Make another pass over the inheritance hierarchy, starting at the
root.  Traverse the whole hierarchy in breadth-first in *bound* order.
Add RTEs and AppendRelInfos as we go -- these will have rte->inh =
true for partitioned tables and rte->inh = false for leaf partitions.

-- quote ends --

Amit's patches seem to be addressing the third point here. But the
expansion is not happening in a breadth-first manner. We are expanding
all the partitioned partitions first and then the leaf partitions, so
that's not exactly "breadth-first".

I tried to rebase first patch from partition-wise join patchset [1] on
top of these two patches. I am having a hard time applying those
changes. The goal of my patch is to expand the partitioned table
into an inheritance hierarchy which retains the partition hierarchy.
For that to happen, we need to know which partition belongs to which
partitioned table in the partition hierarchy. PartitionDispatch array
provided by RelationGetPartitionDispatchInfo() provides me the parent
OIDs of partitioned parents but it doesn't do so for the leaf
partitions. So, I changed the signature of that function to return the
list of parent OIDs of leaf partitions. But for building
AppendRelInfos, child RTEs and child row marks, I need parent's RTI,
RTE and row marks, which are not available directly. Given parent's
OID, I need to search root->parse->rtable to find its RTE, RTI and
then using the RTI I can find the rowmarks. But that seems to defeat
the purpose of why partition-wise join needs EIBO, i.e. to avoid an
O(n^2) loop in build_simple_rel(). To eliminate that loop we are
introducing another O(n^2) loop in expand_inherited_rtentry(). Even
without considering the O(n^2) complexity, this looks ugly.

A better way for translating partition hierarchy into inheritance
hierarchy may be to expand all partitions (partitioned or leaf) of a
given parent at a time in breadth-first manner. This allows us to
create child RTE, AppendRelInfo, rowmarks while we have corresponding
parent structures at hand, rather than searching for those. This would
still satisfy Amit Khandekar's requirement to expand leaf partitions
in the same order as their OIDs would be returned by
RelationGetPartitionDispatchInfo(). I have a feeling that, if we go
that route, we will replace almost all the changes that Amit Langote's
patches do to expand_inherited_rtentry().
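The alternative being proposed — expanding each parent's children together, in bound order, so that the parent's RTE, RTI, and rowmarks are at hand when the child structures are built — can be sketched generically. This is only an illustrative model, not PostgreSQL code; the data structure and all names are invented:

```python
from collections import deque

def expand_breadth_first(children_of, root):
    """Visit a partition hierarchy breadth-first, emitting each child while
    its parent is still at hand (so parent context needs no O(n) search)."""
    order = []                      # (parent, child) pairs in expansion order
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for child in children_of.get(parent, []):   # children in bound order
            order.append((parent, child))           # parent context available
            if child in children_of:                # partitioned child:
                queue.append(child)                 # expand its children later
    return order

hierarchy = {"root": ["p1", "p2"], "p1": ["p1a", "p1b"], "p2": ["p2a"]}
print(expand_breadth_first(hierarchy, "root"))
# [('root', 'p1'), ('root', 'p2'), ('p1', 'p1a'), ('p1', 'p1b'), ('p2', 'p2a')]
```

Each `(parent, child)` pair is emitted while the parent entry is in scope, which is the property that would let child RTEs, AppendRelInfos, and rowmarks be built without re-searching `root->parse->rtable`.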

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Parallel worker error

2017-08-30 Thread Tom Lane
Robert Haas  writes:
> On Wed, Aug 30, 2017 at 8:04 AM, Tom Lane  wrote:
>> We might need to redesign the GUC-propagation mechanism so it sends
>> the various internal representations of GUC values, not the user-visible
>> strings.

> That would probably be better in the long run, but I'm not keen to do
> it in a back-branch under time pressure.

Definitely a valid objection.  But before assuming that this issue is
limited to SET ROLE, it'd be wise to push a bit on the other GUCs with
catalog-dependent values, to see if there are any others we need to
worry about.  I'm okay with a narrow solution if SET ROLE really is
the only problem, but at this stage I'm not convinced of that.

regards, tom lane




Re: [HACKERS] Parallel worker error

2017-08-30 Thread Robert Haas
On Wed, Aug 30, 2017 at 8:04 AM, Tom Lane  wrote:
> We might need to redesign the GUC-propagation mechanism so it sends
> the various internal representations of GUC values, not the user-visible
> strings.

That would probably be better in the long run, but I'm not keen to do
it in a back-branch under time pressure.  I think one approach is to
stick another test against InitializingParallelWorker into check_role,
making it treat that case like no-role-specified.  But I'm a little
worried about whether that would ever lead to a case where the role
should get set and doesn't, leading to a security bug.

Another approach is to attack this problem right at the root, which
IMHO is that changing "session_authorization" is implicitly setting
"role", but for the reasons mentioned in the comment in show_role(),
it doesn't set it explicitly.  Well, that results in a situation where
the actual value assigned to the GUC doesn't match the value that
needs to be passed to the worker, so it breaks.  I'm not sure what
would be a practical approach to that problem, but it's a thought.

A third approach is to do something like what you're proposing but
just limited to "role".  In particular, the obvious thing to do would
be exclude it from SerializeGUCState and somehow propagate it as part
of FixedParallelState, which already has authenticated_user_id and
current_user_id.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] LWLock optimization for multicore Power machines

2017-08-30 Thread Alexander Korotkov
On Thu, Apr 6, 2017 at 5:38 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Thu, Apr 6, 2017 at 5:37 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> On Thu, Apr 6, 2017 at 2:16 AM, Andres Freund  wrote:
>>
>>> On 2017-04-03 11:56:13 -0700, Andres Freund wrote:
>>> > Have you done x86 benchmarking?
>>>
>>> I think unless such benchmarking is done in the next 24h we need to move
>>> this patch to the next CF...
>>>
>>
>> Thank you for your comments.
>> I didn't do x86 benchmarking.  I didn't even manage to reproduce the
>> previous results on Power8.
>> Presumably, that's because the previous benchmarks were done on bare metal,
>> while now I have to use some kind of virtual machine on an IBM E880, where I
>> can't reproduce any win from the Power8 LWLock optimization.  But probably there is
>> another reason.
>> Thus, I'm moving this patch to the next CF.
>>
>
> I see it's already moved. OK!
>

It doesn't seem to make sense to consider this patch unless we get access
to suitable Power machine to reproduce benefits.
This is why I'm going to mark this patch "Returned with feedback".
Once we get access to the appropriate machine, I will resubmit this
patch.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] ALTER INDEX .. SET STATISTICS ... behaviour

2017-08-30 Thread Alexander Korotkov
On Thu, Jun 1, 2017 at 4:10 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Wed, May 31, 2017 at 7:18 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> On Wed, May 31, 2017 at 6:53 PM, Tom Lane  wrote:
>>
>>> Alexander Korotkov  writes:
>>> > I've discovered that PostgreSQL is able to run following kind of
>>> queries in
>>> > order to change statistic-gathering target for an indexed expression.
>>>
>>> > ALTER INDEX index_name ALTER COLUMN expr SET STATISTICS stat_target;
>>>
>>> > It's been previously discussed in [1].
>>>
>>> > I think this should be fixed not just in docs.  This is why I've
>>> started
>>> > thread in pgsql-hackers.  For me usage of internal column names "expr",
>>> > "expr1", "expr2" etc. looks weird.  And I think we should replace it
>>> with a
>>> > better syntax.  What do you think about these options?
>>>
>>> > ALTER INDEX index_name ALTER EXPRESSION 0 SET STATISTICS stat_target;
>>> --
>>> > Refer expression by its number
>>> > ALTER INDEX index_name ALTER EXPRESSION (x + y) SET STATISTICS
>>> stat_target;
>>> > -- Refer expression by its definition
>>>
>>> Don't like either of those particularly, but what about just targeting
>>> a column by column number, independently of whether it's an expression?
>>>
>>> ALTER INDEX myindex ALTER COLUMN 4 SET STATISTICS 1000;
>>>
>>
>> I completely agree with that.
>> If no objections, I will write a patch for that.
>>
>
> Please, find it in attached patch.
>
> I decided to forbid referencing columns by numbers for tables, because
> those numbers could contain gaps.  Also, I forbid altering statistics
> target for non-expression index columns, because it has no effect.
>

Patch rebased to current master is attached.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


alter-index-statistics-2.patch
Description: Binary data



Re: [HACKERS] WIP: Separate log file for extension

2017-08-30 Thread Antonin Houska
Masahiko Sawada  wrote:

> On Fri, Aug 25, 2017 at 4:12 PM, Antonin Houska  wrote:
> > Attached is a draft patch to allow extension to write log messages to a
> > separate file.
> 
> Does it allow postgres core code to write log messages to multiple log
> files as well? I can imagine a use case where we want to write log
> messages to separate log files by error level or purpose (server log
> and SQL log etc).

At a low level it should work, as long as several log streams are reserved
for the core code. However, as every single stream needs multiple variables,
I have no idea how to configure those additional streams without adding many
new GUC variables to the core.

Also, the discussion of what should (and should not) be logged separately
would probably be difficult.

Specifically to log SQL statements to a separate file, I think (but have not
verified) that you can write an extension that reserves a stream for itself
and also uses emit_log_hook to recognize SQL statements among the logged
messages. If the hook adjusts the syslogger_stream field of ErrorData
accordingly, the message should end up in the separate file.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at




Re: [HACKERS] Parallel Append implementation

2017-08-30 Thread Amit Khandekar
Hi Rafia,

On 17 August 2017 at 14:12, Amit Khandekar  wrote:
> But for all of the cases here, partial
> subplans seem possible, and so even on HEAD it executed Partial
> Append. So between a Parallel Append having partial subplans and a
> Partial Append having partial subplans , the cost difference would not
> be significant. Even if we assume that Parallel Append was chosen
> because its cost turned out to be a bit cheaper, the actual
> performance gain seems quite large as compared to the expected cost
> difference. So it might be even possible that the performance gain
> might be due to some other reasons. I will investigate this, and the
> other queries.
>

I ran all the queries that were showing performance benefits in your
run. But for me, the ParallelAppend benefits are shown only for plans
that use Partition-Wise-Join.

For all the queries that use only PA plans but not PWJ plans, I got
the exact same plan for HEAD as for the PA+PWJ patch, except that for the
latter, the Append is a ParallelAppend. Whereas, for you, the plans
have the join order changed.

Regarding actual costs: correspondingly, for me the actual costs are more
or less the same for HEAD and PA+PWJ. Whereas, for your runs, you naturally
have quite different costs because the plans themselves are
different on HEAD versus PA+PWJ.

My PA+PWJ plan outputs (and actual costs) match exactly what you get
with PA+PWJ patch. But like I said, I get the same join order and same
plans (and actual costs) for HEAD as well (except
ParallelAppend=>Append).

Maybe, if you have the latest HEAD code with your setup, you can
check some of the queries again yourself to see if they are still
seeing higher costs as compared to PA? I suspect that some changes in
latest code might be causing this discrepancy; because when I tested
some of the explains with a HEAD-branch server running with your
database, I got results matching PA figures.

Attached is my explain-analyze outputs.

On 16 August 2017 at 18:34, Robert Haas  wrote:
> Thanks for the benchmarking results!
>
> On Tue, Aug 15, 2017 at 11:35 PM, Rafia Sabih
>  wrote:
>> Q4 | 244 | 12 | PA and PWJ, time by only PWJ - 41
>
> 12 seconds instead of 244?  Whoa.  I find it curious that we picked a
> Parallel Append with a bunch of non-partial plans when we could've
> just as easily picked partial plans, or so it seems to me.  To put
> that another way, why did we end up with a bunch of Bitmap Heap Scans
> here instead of Parallel Bitmap Heap Scans?

Actually, the cost difference would be quite low for Parallel Append
with partial plans and Parallel Append with non-partial plans with 2
workers. But yes, I should take a look at why it is consistently
taking non-partial Bitmap Heap Scan.



> Q6 | 29 | 12 | PA only

This one needs to be analysed, because here the plan cost is the
same, but the actual cost for PA is almost half the cost for HEAD. This
matches the observation from my run as well.

Thanks
-Amit


PA-test-AmitKh.tar.gz
Description: GNU Zip compressed data



Re: [HACKERS] Parallel worker error

2017-08-30 Thread Tom Lane
Amit Kapila  writes:
> I am able to reproduce this without involving the session authorization
> GUC as well.  If one drops the newly created role from another session,
> the same error appears.

Hm.  I suspect the basic shape of what's happening here is "an existing
session can continue to run with OuterUserId corresponding to a dropped
role, but we fail when trying to duplicate that state into a parallel
worker".  I wonder whether there aren't similar gotchas for other GUCs
whose interpretation depends on catalog lookups, eg search_path.

We might need to redesign the GUC-propagation mechanism so it sends
the various internal representations of GUC values, not the user-visible
strings.  (I'm thinking of the blobs that guc.c can use to restore a
previous state at transaction abort ... don't recall what the code
calls them ATM.)

regards, tom lane
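The failure mode Tom describes can be pictured with a toy model (this is not PostgreSQL code; the "catalog" dictionary and all names are invented). The leader ships the user-visible GUC string, and the worker must re-resolve it against the catalog, which fails once the role has been dropped, even though the leader still holds a valid internal representation (the OID):

```python
def worker_resolve(catalog, serialized_role):
    """Stand-in for the worker re-resolving a serialized GUC string."""
    try:
        return catalog[serialized_role]          # worker's catalog lookup
    except KeyError:
        return 'ERROR:  role "%s" does not exist' % serialized_role

catalog = {"testuser1": 110981}                  # name -> OID, like pg_authid
leader_oid = catalog["testuser1"]                # leader resolved this at SET time
serialized = "testuser1"                         # but only the string is shipped
del catalog["testuser1"]                         # role dropped concurrently
print(worker_resolve(catalog, serialized))       # worker's re-resolution fails
```

Shipping the internal representation (here, `leader_oid`) instead of the string would sidestep the re-resolution, which is the redesign being floated above.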




Re: [HACKERS] A design for amcheck heapam verification

2017-08-30 Thread Alvaro Herrera
Peter Geoghegan wrote:

> > Your patch brings us one step closer to that goal.  (The book says
> > that this approach is good for sparse bitsets, but your comment says
> > that we expect something near 50%.  That's irrelevant anyway since a
> > future centralised popcount() implementation would do this in
> > word-sized chunks with a hardware instruction or branch-free-per-word
> > lookups in a table and not care at all about sparseness.)
> 
> I own a copy of Hacker's Delight (well, uh, Daniel Farina lent me his
> copy about 2 years ago!). pop()/popcount() does seem like a clever
> algorithm, that we should probably think about adopting in some cases,
> but I should point at that the current caller to my
> bloom_prop_bits_set() function is an elog() DEBUG1 call. This is not
> at all performance critical.

Eh, if you want to optimize it for the case where debug output is not
enabled, make sure to use ereport() not elog().  ereport()
short-circuits evaluation of arguments, whereas elog() does not.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Parallel worker error

2017-08-30 Thread Pavan Deolasee
On Wed, Aug 30, 2017 at 5:19 PM, Amit Kapila 
wrote:

>
>
> I am able to reproduce this without involving the session authorization
> GUC as well.  If one drops the newly created role from another session,
> the same error appears.
>
>
Yeah, that's how I first created the case. But concurrent dropping of a role
(which is surprisingly allowed even when there are active connections with
the role active) opens another set of errors. Hence I tried to reproduce
this in a single session. While it might be interesting to fix the
concurrent role-drop problem someday, the single-session problem looks more
pressing.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Re: Poor cost estimate with interaction between table correlation and partial indexes

2017-08-30 Thread Robert Haas
On Sun, Aug 27, 2017 at 8:31 PM, Michael Malis  wrote:
> (Sorry David. I initially replied only to you)
>
> Ok. I've attached a patch of a proof-of-concept. I have a few
> questions about tests.
>
> What is the typical workflow to add tests for changes to the planner?

Add submitted patches at commitfest.postgresql.org

> Also
> I ran make check and it appears one of the existing tests is failing.
> What is a typical way for going about discovering why the query plan
> for a specific query changed?

I don't have any magic answer on this point.

> Also, how should I go about changing the
> old test? Should I replace the old test output with the new test
> output or modify the old test slightly to get it to produce the same
> case as before?

That's a judgement call, based on what you think the point of the test was.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel worker error

2017-08-30 Thread Tom Lane
I wrote:
> I can duplicate this in HEAD and v10, but not 9.6, so I've added it
> as an open issue for v10.  No idea what broke it.

Oh, scratch that: the issue exists in 9.6, it's just that the particular
test query you're using here does not generate a parallelized plan in
9.6, as shown by "explain".  With a different query that does get
parallelized, 9.6 fails too:

regression=#  select sum(ten) from tenk1;
ERROR:  role "testuser1" does not exist
CONTEXT:  parallel worker

Still, I think it's reasonable to characterize this as "must fix for v10".

regards, tom lane




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-30 Thread Robert Haas
On Wed, Aug 30, 2017 at 7:39 AM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Tue, Aug 29, 2017 at 10:05 PM, Tom Lane  wrote:
>> ! /* Make sure any existing workers are gracefully shut down */
>>   ExecShutdownGatherWorkers(node);
>
>> The above call doesn't ensure the shutdown. It just ensures that we
>> receive all messages from parallel workers.  Basically, it doesn't
>> call WaitForParallelWorkersToExit.
>
> Perhaps you should submit a patch to rename ExecShutdownGatherWorkers
> to something less misleading, then.  But the previous comment there
> was even more wrong :-(

Your (Tom's) proposed comment doesn't seem wrong to me.  Shutting down
workers consists of several stages.  We destroy the tuple queues --
which will cause them to cease generating tuples once they notice --
then we wait for them to send us an 'X' message to indicate that
they've shut down cleanly -- then they actually exit -- then the
postmaster notices and releases their slots for reuse.  After
ExecShutdownGatherWorkers has completed, the first two of those things
have finished but the third and fourth may not be quite done yet.  I'd
say it's fair to say, at that point, that the workers are gracefully
shut down.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Parallel worker error

2017-08-30 Thread Amit Kapila
On Wed, Aug 30, 2017 at 5:11 PM, Robert Haas  wrote:
> On Wed, Aug 30, 2017 at 6:58 AM, Pavan Deolasee
>  wrote:
>> It's quite possible that I don't understand the differences in "role" and
>> "session authorization", but it still looks like a bug to me. Maybe
>> SerializeGUCState() should check if SetRoleIsActive is true and only then
>> save the role information?
>
> Ugh.  Well, this is definitely a bug, but I'm not sure if that's the right 
> fix.
>

Yeah.

> Mutually interdependent GUCs are bad news.
>

I am able to reproduce this without involving the session authorization
GUC as well.  If one drops the newly created role from another session,
the same error appears.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Parallel worker error

2017-08-30 Thread Tom Lane
Pavan Deolasee  writes:
> The last statement in this test fails with an error:
> ERROR:  role "testuser1" does not exist
> CONTEXT:  parallel worker

I can duplicate this in HEAD and v10, but not 9.6, so I've added it
as an open issue for v10.  No idea what broke it.

regards, tom lane




Re: [HACKERS] Parallel worker error

2017-08-30 Thread Robert Haas
On Wed, Aug 30, 2017 at 6:58 AM, Pavan Deolasee
 wrote:
> It's quite possible that I don't understand the differences in "role" and
> "session authorization", but it still looks like a bug to me. Maybe
> SerializeGUCState() should check if SetRoleIsActive is true and only then
> save the role information?

Ugh.  Well, this is definitely a bug, but I'm not sure if that's the right fix.

Mutually interdependent GUCs are bad news.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-30 Thread Tom Lane
Amit Kapila  writes:
> On Tue, Aug 29, 2017 at 10:05 PM, Tom Lane  wrote:
> ! /* Make sure any existing workers are gracefully shut down */
>   ExecShutdownGatherWorkers(node);

> The above call doesn't ensure the shutdown. It just ensures that we
> receive all messages from parallel workers.  Basically, it doesn't
> call WaitForParallelWorkersToExit.

Perhaps you should submit a patch to rename ExecShutdownGatherWorkers
to something less misleading, then.  But the previous comment there
was even more wrong :-(

regards, tom lane




Re: [HACKERS] Update low-level backup documentation to match actual behavior

2017-08-30 Thread David Steele
Hi Michael,

Thanks for reviewing!

On 8/29/17 9:44 PM, Michael Paquier wrote:
> On Tue, Aug 29, 2017 at 10:59 PM, David Steele  wrote:
>>
>> Attached is the 9.6 patch.  It required a bit more work in func.sgml
>> than I was expecting so have a close look at that.  The rest was mostly
>> removing irrelevant hunks.
> 
> + switch to the next WAL segment.  On a standby, it is not possible to
> + automatically switch WAL segments, so you may wish to run
> + pg_switch_wal on the primary to perform a manual
> + switch. The reason for the switch is to arrange for
> [...]
> +WAL segments have been archived. If write activity on the primary
> is low, it
> +may be useful to run pg_switch_wal on the primary in order 
> to
> +trigger an immediate segment switch of the last required WAL
> It seems to me that both portions are wrong. There is no archiving
> wait on standbys for 9.6, and 
I think it's clearly stated here that pg_stop_backup() does not wait for
WAL to archive on a standby.  Even so, it is very important for the backup
routine to make sure that all the WAL *is* archived.

> pg_stop_backup triggers by itself the
> segment switch, so saying that enforcing pg_switch_wal on the primary
> is moot. 

pg_stop_backup() does not perform a WAL switch on the standby which is
what this sentence is referring to.  I have separated this section out
into a new paragraph to (hopefully) make it clearer.

> pg_switch_xlog has been renamed to pg_switch_wal in PG10, so
> the former name applies.

Whoops!

New patch is attached.

Thanks,
-- 
-David
da...@pgmasters.net
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index 03c0dbf1cd..fd696c38db 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -889,8 +889,11 @@ SELECT pg_start_backup('label', false, false);
 
 SELECT * FROM pg_stop_backup(false);
 
- This terminates the backup mode and performs an automatic switch to
- the next WAL segment.  The reason for the switch is to arrange for
+ This terminates backup mode. On a primary, it also performs an automatic
+ switch to the next WAL segment.  On a standby, it is not possible to
+ automatically switch WAL segments, so you may wish to run
+ pg_switch_xlog on the primary to perform a manual
+ switch. The reason for the switch is to arrange for
  the last WAL segment file written during the backup interval to be
  ready to archive.
 
@@ -908,7 +911,7 @@ SELECT * FROM pg_stop_backup(false);
  Once the WAL segment files active during the backup are archived, you are
  done.  The file identified by pg_stop_backup's first return
  value is the last segment that is required to form a complete set of
- backup files.  If archive_mode is enabled,
+ backup files.  On a primary, if archive_mode is enabled,
  pg_stop_backup does not return until the last segment has
  been archived.
  Archiving of these files happens automatically since you have
@@ -924,6 +927,13 @@ SELECT * FROM pg_stop_backup(false);
  pg_stop_backup terminates because of this your backup
  may not be valid.
 
+
+
+ Note that on a standby pg_stop_backup does not wait for
+ WAL segments to be archived so the backup process must ensure that all WAL
+ segments required for the backup are successfully archived.
+
+

   
 
@@ -932,9 +942,9 @@ SELECT * FROM pg_stop_backup(false);
 Making an exclusive low level backup
 
  The process for an exclusive backup is mostly the same as for a
- non-exclusive one, but it differs in a few key steps. It does not allow
- more than one concurrent backup to run, and there can be some issues on
- the server if it crashes during the backup. Prior to PostgreSQL 9.6, this
+ non-exclusive one, but it differs in a few key steps. This type of backup
+ can only be taken on a primary and does not allow concurrent backups.
+ Prior to PostgreSQL 9.6, this
  was the only low-level method available, but it is now recommended that
  all users upgrade their scripts to use non-exclusive backups if possible.
 
@@ -992,6 +1002,11 @@ SELECT pg_start_backup('label', true);
   for things to
  consider during this backup.
 
+
+  Note that if the server crashes during the backup it may not be
+  possible to restart until the backup_label file has been
+  manually deleted from the PGDATA directory.
+


 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 8482601294..01ebfb8d90 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18070,11 +18070,22 @@ postgres=# select pg_start_backup('label_goes_here');
 pg_start_backup. In a non-exclusive backup, the contents of
 the backup_label and tablespace_map are returned
 in the result of the function, and should be written to files in the
-backup (and not in the data directory).
+backup (and 

Re: [HACKERS] segment size depending *_wal_size defaults (was increasing the default WAL segment size)

2017-08-30 Thread Peter Eisentraut
On 8/30/17 00:45, Andres Freund wrote:
> 1) Currently the default for {min,max}_wal_size depends on the segment
>size. Given that the segment size is about to be configurable, that
>seems confusing.

On the one hand, I agree that it seems confusing and unnecessary to vary
this with the segment size.  On the other hand, the problem is that if
the segment size is larger than the default max_wal_size, we have to do
something different anyway.  Also, the reason for wanting a larger
segment size is that you expect a lot of WAL, so setting max_wal_size
larger by default caters to that.

Right now, we have max_wal_size = 1GB by default.  What should the
behavior be for wal_segment_size = 1GB?

> 2) Currently wal_segment_size is measured in GUC_UNIT_XBLOCKS, which
>requires us to keep two copies of the underlying variable, one in
>XBLOCKS one in bytes. I'd rather just have the byte variant.

It seems to me that one could structure the code to make do with the
existing variable.  I had a brief look at the patch, and I think some
more imaginative refactoring is possible.  If we want/need to change it,
I'd prefer using the _KB or _MB variants, so as to not to restrict
future size limits too much.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Parallel worker error

2017-08-30 Thread Pavan Deolasee
Hello,

While investigating an issue in Postgres-XL 10, I came across this rather
surprising behaviour in PG master. See this test case:

create role testuser1;
set role to testuser1;
show role; -- shows testuser1

-- reset back to default role
reset session authorization ;
show role; -- shows none

set force_parallel_mode TO 1;
select count(*) from pg_class ; -- ok

-- now drop the role and try parallel query again
drop role  testuser1 ;
select count(*) from pg_class ; -- fails

The last statement in this test fails with an error:
ERROR:  role "testuser1" does not exist
CONTEXT:  parallel worker

Looks like the leader process when serialises the GUC state, saves the
"role" value, which is still set to testuser1 (and which is now dropped).
Later, when the parallel worker process tries to restore the GUC state, it
fails on catalog lookup.

Comments in show_role() mentions a kludge because apparently SET SESSION
AUTHORIZATION cannot call set_config_option and change the current value of
"role". So that probably explains why show_role() displays the correct
information, but parallel worker fails to handle the case.

It's quite possible that I don't understand the differences in "role" and
"session authorization", but it still looks like a bug to me. May
be SerializeGUCState() should check if SetRoleIsActive is true and only
then save the role information?

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-30 Thread Amit Kapila
On Tue, Aug 29, 2017 at 10:05 PM, Tom Lane  wrote:
> Amit Kapila  writes:
>> On Tue, Aug 29, 2017 at 8:32 AM, Robert Haas  wrote:
>>> There's already ExecParallelReinitialize, which could be made to walk
>>> the nodes in addition to what it does already, but I don't understand
>>> exactly what else needs fixing.
>
>> Sure, but it is not advisable to reset state of all the nodes below
>> gather at that place, otherwise, it will be more or less like we are
>> forcing rescan of each node.  I think there we can reset the shared
>> parallel state of parallel-aware nodes (or anything related) and then
>> allow rescan to reset the master backend specific state for all nodes
>> beneath gather.
>
> Right, the idea is to make this happen separately from the "rescan"
> logic.  In general, it's a good idea for ExecReScanFoo to do as little
> as possible, so that you don't pay if a node is rescanned more than
> once before it's asked to do anything, or indeed if no rows are ever
> demanded from it at all.
>
> Attached is a WIP patch along this line.
>

The idea looks sane to me.

>  It's unfinished because
> I've not done the tedium of extending the FDW and CustomScan APIs
> to support this new type of per-node operation; but that part seems
> straightforward enough.  The core code is complete and survived
> light testing.
>

I have also played a bit with both of the patches together and didn't
found any problem.  In your second patch, I have a minor comment.

void
  ExecReScanGather(GatherState *node)
  {
! /* Make sure any existing workers are gracefully shut down */
  ExecShutdownGatherWorkers(node);

The above call doesn't ensure the shutdown. It just ensures that we
receive all messages from parallel workers.  Basically, it doesn't
call WaitForParallelWorkersToExit.


> I'm pretty happy with the results --- note in
> particular how we get rid of some very dubious coding in
> ExecReScanIndexScan and ExecReScanIndexOnlyScan.
>
> If you try the test case from a2b70c89c on this patch alone, you'll notice
> that instead of sometimes reporting too-small counts during the rescans,
> it pretty consistently reports too-large counts.  This is because we are
> now successfully resetting the shared state for the parallel seqscan, but
> we haven't done anything about the leader's HashAgg node deciding that it
> can re-use its old hashtable.  So on the first scan, the leader typically
> scans all or most of the table because of its startup time advantage, and
> saves those counts in its hashtable.  On later scans, the workers read all
> of the table while the leader decides it need do no scanning.  So we get
> counts that reflect all of the table (from the workers) plus whatever part
> of the table the leader read the first time.  So this by no means removes
> the need for my other patch.
>
> If no objections, I'll do the additional legwork and push.
>

No objections.

>  As before,
> I think we can probably get away without fixing 9.6, even though it's
> nominally got the same bug.
>

+1.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] expanding inheritance in partition bound order

2017-08-30 Thread Amit Khandekar
On 25 August 2017 at 23:58, Robert Haas  wrote:
>
> That just leaves indexes.  In a world where keystate, tupslot, and
> tupmap are removed from the PartitionDispatchData, you must need
> indexes or there would be no point in constructing a
> PartitionDispatchData object in the first place; any application that
> needs neither indexes nor the executor-specific stuff could just use
> the Relation directly.

But there is expand_inherited_rtentry(), which requires neither indexes
nor any executor stuff, but still has to call
RelationGetPartitionDispatchInfo(), and so these indexes get built
unnecessarily.

Looking at the latest patch, I can see that those indexes can be
separately built in ExecSetupPartitionTupleRouting() where it is
required, instead of in RelationGetPartitionDispatchInfo(). In the
loop which iterates through the pd[] returned from
RelationGetPartitionDispatchInfo(), we can build them using the exact
code currently written to build them in
RelationGetPartitionDispatchInfo().

In the future, if we require such applications where indexes are also
required, we may have a separate function only to build indexes, and
then ExecSetupPartitionTupleRouting() would also call that function.




On 21 August 2017 at 11:40, Amit Langote  wrote:
>> In ExecSetupPartitionTupleRouting(), not sure why ptrinfos array is an
>> array of pointers. Why can't it be an array of
>> PartitionTupleRoutingInfo structure  rather than pointer to that
>> structure ?
>
> AFAIK, assigning pointers is less expensive than assigning structs, and we
> end up doing a lot of assigning of the members of that array to a local
> variable

I didn't get why exactly we would have to copy the structures. We
could just pass the address of ptrinfos[index], no ?

My only point for this was: we would not have to call palloc0() for
each of the elements of ptrinfos. Instead, just allocate memory for all
of the elements in a single palloc0(). We anyway have to allocate
memory for *each* of the elements.

> in get_partition_for_tuple(), for example.  Perhaps, we could
> avoid those assignments and implement it the way you suggest.

You mean at these 2 places in get_partition_for_tuple() , right ? :

1. /* start with the root partitioned table */
parent = ptrinfos[0];

2. else
parent = ptrinfos[-parent->pd->indexes[cur_index]];

Both of the above places, we could just use [...] instead of
ptrinfos[...]. But I guess you meant something else.



RelationGetPartitionDispatchInfo() opens all the partitioned tables.
But in ExecSetupPartitionTupleRouting(), it again opens all the
parents, that is all the partitioned tables, and closes them back.

Instead, would it be possible to do this : Instead of the
PartitionDispatch->parentoid field, PartitionDispatch can have
parentrel Relation field, which will point to reldesc field of one of
the pds[] elements.



For me, the CopyStateData->ptrinfos and ModifyTableState.mt_ptrinfos
field names sound confusing. How about part_tuple_routing_info or just
tuple_routing_info ?




Re: [HACKERS] expanding inheritance in partition bound order

2017-08-30 Thread Amit Langote
On 2017/08/30 12:03, Robert Haas wrote:
> On Tue, Aug 29, 2017 at 10:36 PM, Amit Langote
>  wrote:
>>> I keep having the feeling that this is a big patch with a small patch
>>> struggling to get out.  Is it really necessary to change
>>> RelationGetPartitionDispatchInfo so much or could you just do a really
>>> minimal surgery to remove the code that sets the stuff we don't need?
>>> Like this:
>>
>> Sure, done in the attached updated patch.
> 
> On first glance, that looks pretty good.  I'll have a deeper look
> tomorrow.

Thanks.

> It strikes me that if PartitionTupleRoutingInfo is an
> executor structure, we should probably (as a separate patch) relocate
> FormPartitionKeyDatum and get_partition_for_tuple to someplace in
> src/backend/executor and rename them accordingly; maybe it's time for
> an execPartition.c? catalog/partition.c is getting really big, so

I agree.

It would be a good idea to introduce an execPartition.c so that future
patches in this area (such as executor partition-pruning patch on the
horizon) have a convenient place to park their code.

Will see if I can make a patch for that.

> it's not a bad thing if some of that stuff gets moved elsewhere.  A
> lingering question in my mind, though, is whether it's really correct
> to think of PartitionTupleRoutingInfo as executor-specific.  Maybe
> we're going to need that for other things too?

Hmm.  Maybe, a subset of PartitionTupleRoutingInfo's fields are usable
outside the executor (only PartitionDispatch, which is exported by
partition.h anyway?), but not all of it.  For example, fields keystate,
tupslot seem pretty executor-specific.  In fact, they seem to be required
only for tuple routing.

One idea is to not introduce PartitionTupleRoutingInfo at all and add its
fields directly as ModifyTableState/CopyState fields.  We currently have,
for example, a mt_partition_tupconv_maps array with one slot for every
leaf partition.  Maybe, there could be following fields in
ModifyTableState (and similarly in CopyState):

int   mt_num_parted/* this one exists today */
struct PartitionDispatchData **mt_partition_dispatch_info; /* and this */
List**mt_parted_keystate;
TupleConversionMap  **mt_parted_tupconv_maps;
TupleTableSlot  **mt_parted_tupslots;

Each of those arrays will have mt_num_parted slots.

Thanks,
Amit





Re: [HACKERS] Tuple-routing for certain partitioned tables not working as expected

2017-08-30 Thread Etsuro Fujita

On 2017/08/30 9:13, Amit Langote wrote:

On 2017/08/29 20:18, Etsuro Fujita wrote:

On 2017/08/25 22:26, Robert Haas wrote:

On Wed, Aug 23, 2017 at 4:55 AM, Etsuro Fujita
 wrote:

Agreed, but I'd vote for fixing this in v10 as proposed; I agree that just
ripping the CheckValidResultRel checks out entirely is not a good idea,
but
that seems OK to me at least as a fix just for v10.


I'm still not on-board with having this be the one case where we don't
do CheckValidResultRel.  If we want to still call it but pass down
some additional information that can selectively skip certain checks,
I could probably live with that.


Another idea would be to not do CheckValidResultRel for partitions in
ExecSetupPartitionTupleRouting; instead, do that the first time the
partition is chosen by ExecFindPartition, and if successfully checked,
initialize the partition's ResultRelInfo and other stuff.  (We could skip
this after the first time by checking whether we already have a valid
ResultRelInfo for the chosen partition.)  That could allow us to not only
call CheckValidResultRel the way it is, but avoid initializing useless
partitions for which tuples aren't routed at all.


I too have thought about the idea of lazy initialization of the partition
ResultRelInfos.  I think it would be a good idea, but definitely something
for PG 11.


Agreed.  Here is a patch to skip the CheckValidResultRel checks for a 
target table if it's a foreign partition that tuple routing targets, 
as proposed by Robert.


Best regards,
Etsuro Fujita
*** /dev/null
--- b/contrib/file_fdw/data/list1.csv
***
*** 0 
--- 1,2 
+ 1,foo
+ 1,bar
*** /dev/null
--- b/contrib/file_fdw/data/list2.bad
***
*** 0 
--- 1,2 
+ 2,baz
+ 1,qux
*** /dev/null
--- b/contrib/file_fdw/data/list2.csv
***
*** 0 
--- 1,2 
+ 2,baz
+ 2,qux
*** a/contrib/file_fdw/input/file_fdw.source
--- b/contrib/file_fdw/input/file_fdw.source
***
*** 162,167  SELECT tableoid::regclass, * FROM agg FOR UPDATE;
--- 162,188 
  ALTER FOREIGN TABLE agg_csv NO INHERIT agg;
  DROP TABLE agg;
  
+ -- declarative partitioning tests
+ SET ROLE regress_file_fdw_superuser;
+ CREATE TABLE pt (a int, b text) partition by list (a);
+ CREATE FOREIGN TABLE p1 partition of pt for values in (1) SERVER file_server
+ OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',');
+ CREATE TABLE p2 partition of pt for values in (2);
+ SELECT tableoid::regclass, * FROM pt;
+ SELECT tableoid::regclass, * FROM p1;
+ SELECT tableoid::regclass, * FROM p2;
+ COPY pt FROM '@abs_srcdir@/data/list2.bad' with (format 'csv', delimiter ','); -- ERROR
+ COPY pt FROM '@abs_srcdir@/data/list2.csv' with (format 'csv', delimiter ',');
+ SELECT tableoid::regclass, * FROM pt;
+ SELECT tableoid::regclass, * FROM p1;
+ SELECT tableoid::regclass, * FROM p2;
+ INSERT INTO pt VALUES (1, 'xyzzy'); -- ERROR
+ INSERT INTO pt VALUES (2, 'xyzzy');
+ SELECT tableoid::regclass, * FROM pt;
+ SELECT tableoid::regclass, * FROM p1;
+ SELECT tableoid::regclass, * FROM p2;
+ DROP TABLE pt;
+ 
  -- privilege tests
  SET ROLE regress_file_fdw_superuser;
  SELECT * FROM agg_text ORDER BY a;
*** a/contrib/file_fdw/output/file_fdw.source
--- b/contrib/file_fdw/output/file_fdw.source
***
*** 289,294  SELECT tableoid::regclass, * FROM agg FOR UPDATE;
--- 289,375 
  
  ALTER FOREIGN TABLE agg_csv NO INHERIT agg;
  DROP TABLE agg;
+ -- declarative partitioning tests
+ SET ROLE regress_file_fdw_superuser;
+ CREATE TABLE pt (a int, b text) partition by list (a);
+ CREATE FOREIGN TABLE p1 partition of pt for values in (1) SERVER file_server
+ OPTIONS (format 'csv', filename '@abs_srcdir@/data/list1.csv', delimiter ',');
+ CREATE TABLE p2 partition of pt for values in (2);
+ SELECT tableoid::regclass, * FROM pt;
+  tableoid | a |  b  
+ --+---+-
+  p1   | 1 | foo
+  p1   | 1 | bar
+ (2 rows)
+ 
+ SELECT tableoid::regclass, * FROM p1;
+  tableoid | a |  b  
+ --+---+-
+  p1   | 1 | foo
+  p1   | 1 | bar
+ (2 rows)
+ 
+ SELECT tableoid::regclass, * FROM p2;
+  tableoid | a | b 
+ --+---+---
+ (0 rows)
+ 
+ COPY pt FROM '@abs_srcdir@/data/list2.bad' with (format 'csv', delimiter ','); -- ERROR
+ ERROR:  cannot route inserted tuples to a foreign table
+ CONTEXT:  COPY pt, line 2: "1,qux"
+ COPY pt FROM '@abs_srcdir@/data/list2.csv' with (format 'csv', delimiter ',');
+ SELECT tableoid::regclass, * FROM pt;
+  tableoid | a |  b  
+ --+---+-
+  p1   | 1 | foo
+  p1   | 1 | bar
+  p2   | 2 | baz
+  p2   | 2 | qux
+ (4 rows)
+ 
+ SELECT tableoid::regclass, * FROM p1;
+  tableoid | a |  b  
+ --+---+-
+  p1   | 1 | foo
+  p1   | 1 | bar
+ (2 rows)
+ 
+ SELECT tableoid::regclass, * FROM p2;
+  tableoid | a |  b  
+ --+---+-
+  p2   | 2 | baz
+  p2   | 2 | qux
+ (2 rows)
+ 
+ INSERT INTO pt VALUES (1, 'xyzzy'); -- 

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2017-08-30 Thread Amit Kapila
On Wed, Aug 30, 2017 at 2:43 AM, Robert Haas  wrote:
> On Tue, Jul 4, 2017 at 12:33 AM, Amit Kapila  wrote:
>> I have updated the patch to support wait events and moved it to upcoming CF.
>
> This patch doesn't apply any more, but I made it apply with a hammer
> and then did a little benchmarking (scylla, EDB server, Intel Xeon
> E5-2695 v3 @ 2.30GHz, 2 sockets, 14 cores/socket, 2 threads/core).
> The results were not impressive.  There's basically no clog contention
> to remove, so the patch just doesn't really do anything.
>

Yeah, in such a case patch won't help.

>  For example,
> here's a wait event profile with master and using Ashutosh's test
> script with 5 savepoints:
>
>   1  Lock| tuple
>   2  IO  | SLRUSync
>   5  LWLock  | wal_insert
>   5  LWLock  | XidGenLock
>   9  IO  | DataFileRead
>  12  LWLock  | lock_manager
>  16  IO  | SLRURead
>  20  LWLock  | CLogControlLock
>  97  LWLock  | buffer_content
> 216  Lock| transactionid
> 237  LWLock  | ProcArrayLock
>1238  IPC | ProcArrayGroupUpdate
>2266  Client  | ClientRead
>
> This is just a 5-minute test; maybe things would change if we ran it
> for longer, but if only 0.5% of the samples are blocked on
> CLogControlLock without the patch, obviously the patch can't help
> much.  I did some other experiments too, but I won't bother
> summarizing the results here because they're basically boring.  I
> guess I should have used a bigger machine.
>

That would have been better. In any case, will do the tests on some
higher end machine and will share the results.

> Given that we've changed the approach here somewhat, I think we need
> to validate that we're still seeing a substantial reduction in
> CLogControlLock contention on big machines.
>

Sure will do so.  In the meantime, I have rebased the patch.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


group_update_clog_v14.patch
Description: Binary data



Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-30 Thread Fabien COELHO


Hello,


Hmmm. The existing "is_no_vacuum" variable is typed *both* as int (in
"main") and as bool (in "init"), called by main (yuk!). I see no reason to
choose the bad one for the global:-)


Yeah, I think this might be a good time to reconsider the int-for-bool
habit in pgbench. If we decide to change is_no_vacuum to bool I want
to change other similar variables as well.


Frankly I would be fine with that, but committers might get touchy about 
"unrelated changes" in the patch... The "is_no_vacuum" is related to the 
patch and is already a bool -- if you chose the "init" definition as a 
reference -- so it is okay to bool it.



I think that the "-I" it should be added to the "--help" line, as it is done
with other short & long options.


Okay, I'll leave it as of now. Maybe we can discuss later.


Maybe we did not understand one another. I'm just suggesting to insert
-I in the help line, that is change:

   "  --custom-initialize=[...]+\n"

to

   "  -I, --custom-initialize=[...]+\n"

I'm not sure it deserves to be discussed in depth later:-)


I wonder if this could be avoided easily? Maybe by setting the constraint
name explicitely so that the second one fails on the existing one, which is
fine, like for primary keys? [...]


Good point, I agree with first option.


Ok.


Maybe the initial cleanup (DROP TABLE) could be made an option added to the
default, so that cleaning up the database could be achieved with some
"pgbench -i -I c", instead of connecting and droping the tables one by one
which I have done quite a few times... What do you think?


Yeah, I sometimes wanted that. Having the cleaning up tables option
would be good idea.


Ok.


I'd say "g" for data generation would be better. Also, I'm inclined to
add a command for the unlogged tables. How about this?

c - [c]leanup / or [d]rop tables
t - create table / [t]able creation or [c]reate table
u - create unlogged table
g - data generation / [g]enerate data
p - [p]rimary key
f - [f]oreign key
v - [v]acuum


I'm okay with that. I also put an alternative with d/c above, without 
any preference from my part.


I'm not sure about "u", though. Unlogged, like tablespace, is an 
orthogonal option: other table creation options (I intend to submit one 
which conforms to the TPC-B standard, that is use an INT8 balance as INT4 
is not wide enough per spec, and always use an INT8 aid) may also be 
unlogged or tablespaced. So that would mean having two ways to trigger 
them... thus I would avoid it and keep only --unlogged.


--
Fabien.




Re: [HACKERS] segment size depending *_wal_size defaults (was increasing the default WAL segment size)

2017-08-30 Thread Magnus Hagander
On Wed, Aug 30, 2017 at 6:45 AM, Andres Freund  wrote:

> On 2017-08-30 12:52:26 +0900, Michael Paquier wrote:
> > On Wed, Aug 30, 2017 at 11:20 AM, Peter Eisentraut
> >  wrote:
> > > On 8/29/17 20:36, Andres Freund wrote:
> > >> So the question is whether we want {max,min}_wal_size be sized in
> > >> multiples of segment sizes or as a proper byte size.  I'm leaning
> > >> towards the latter.
> > >
> > > I'm not sure what the question is or what its impact would be.
> >
> > FWIW, I get the question as: do we want the in-memory values of
> > min/max_wal_size to be calculated in MB (which is now the case) or
> > just bytes. Andres tends for using bytes.
>
> Not quite.  There's essentially two things:
>
> 1) Currently the default for {min,max}_wal_size depends on the segment
>size. Given that the segment size is about to be configurable, that
>seems confusing.
> 2) Currently wal_segment_size is measured in GUC_UNIT_XBLOCKS, which
>requires us to keep two copies of the underlying variable, one in
>XBLOCKS one in bytes. I'd rather just have the byte variant.
>

I'd say we definitely want the "user interface" to be in
bytes(/mbytes/gbytes etc). We used to have that in segments and it was
quite confusing for a lot of new users, and seemed very silly...

Also agreed that (1) above seems very confusing. Going to using bytes all
the way seems a lot more clear.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/