Re: [HACKERS] SQL/JSON in PostgreSQL

2018-01-05 Thread Pavel Stehule
Hi

I am checking the JSONPath related code

Questions, notes:

1. The jsonpath operators are not consistent with any of the other json or
xml operators; I am missing the ? and @> operators.
2. Documentation issue: there is "'{"a":[1,2,3,4,5]}'::json *? '$.a[*] ?
(@ > 2)'", but the operator *? does not exist.
3. The operator @~ looks like too aggressive a shortcut; it should be
better commented.

What is not clear to me is whether jsonpath should introduce new operators
for the json and jsonb types. It is a special filter, defined by a type, so
from my perspective the special operators are not necessary.
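For readers following the documentation example under discussion, the path
expression '$.a[*] ? (@ > 2)' selects the elements of member "a" that are
greater than 2. A rough Python emulation of that semantics (the helper name
is mine; this is an illustrative model only, not PostgreSQL code):

```python
# Rough emulation of the SQL/JSON path filter discussed above:
# '$.a[*] ? (@ > 2)' keeps elements of member "a" greater than 2.
import json

def jsonpath_a_filter(doc_text, threshold):
    """Apply the equivalent of '$.a[*] ? (@ > threshold)' to a JSON doc."""
    doc = json.loads(doc_text)
    return [v for v in doc.get("a", []) if v > threshold]

print(jsonpath_a_filter('{"a": [1, 2, 3, 4, 5]}', 2))  # [3, 4, 5]
```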

Regards

Pavel


Re: January CommitFest is underway!

2018-01-05 Thread Michael Paquier
On Sat, Jan 6, 2018 at 9:58 AM, Ryan Murphy  wrote:
>> BTW, for those who aren't already aware of it, I'd like to point out
>> this very handy resource:
>>
>> http://commitfest.cputube.org
>
> Wow, that makes it way easier to see what's going on!  Thanks Tom.

Actually, Thomas Munro created that.
-- 
Michael



Re: [HACKERS] SQL/JSON in PostgreSQL

2018-01-05 Thread Pavel Stehule
2018-01-03 17:04 GMT+01:00 Oleg Bartunov :

>
>
> On 3 Jan 2018 00:23, "Andrew Dunstan" 
> wrote:
>
>
>
> On 01/02/2018 05:04 PM, Nikita Glukhov wrote:
> >
> > I have removed all extra features from the patch set, they can be
> > found in our
> > github repository:
> > https://github.com/postgrespro/sqljson/tree/sqljson_ext.
> >
> > Now there are 10 patches which have the following dependencies:
> >
> > 1:
> > 2: 1
> > 3: 2
> > 4: 2
> > 5:
> > 6:
> > 7: 2, 5, 6
> > 8: 7, 4
> > 9: 7
> > 10: 8, 9
> >
>
>
> OK. We need to get this into the commitfest. Preferably not all in one
> piece. I would do:
>
> 1, 5, and 6  independently
> 2, 3 and 4 together
> 7 and 8 together
> 9 and 10 together
>
>
> What about 11 and 12? They are about json_table.
>

I did some tests of json_table and it is working well.

Good work.

Pavel


> Those seem like digestible pieces.
>
> Also, there is a spurious BOM at the start of
> src/test/regress/sql/sqljson.sql in patch 7. Please fix that.
>
> cheers
>
> andrew
>
> --
> Andrew Dunstan    https://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>


Re: Invalid pg_upgrade error message during live check

2018-01-05 Thread Bruce Momjian
On Fri, Jan  5, 2018 at 04:58:39PM -0500, Bruce Momjian wrote:
> Pg_upgrade is able to run in --check mode when the old server is still
> running.  Unfortunately, all supported versions of pg_upgrade generate
> an incorrect (but harmless) "failure" message when doing this:

Based on recent discussions, I am thinking we would just fix this in PG
10 and HEAD and leave the harmless error message in the other branches,
right?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: pgsql: pg_upgrade: simplify code layout in a few places

2018-01-05 Thread Bruce Momjian
On Fri, Jan  5, 2018 at 08:39:46PM -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-01-05 18:57:55 -0500, Bruce Momjian wrote:
> >> On Fri, Jan  5, 2018 at 03:51:15PM -0800, Andres Freund wrote:
> >>> Also, leaving translatability aside, why was *any* of this backpatched?
> 
> >> Tom has preferred that I backpatch all safe patches so we keep that code
> >> consistent so we can backpatch other things more easily.
> 
> > I've a hard time believing this. Tom?
> 
> I've been known to back-patch stuff just to keep branches consistent,
> but it's always a judgement call.  In this case I wouldn't have done it
> (even if the patch were a good idea in HEAD) because it would cause
> churn in translatable messages in the back branches.  Also, the case
> for cosmetic back-patching is only strong when a particular file is
> already pretty similar across all branches, and I'm not sure that
> holds for pg_upgrade.

There was a time when pg_upgrade was similar in all branches and was
churning a lot with fixes, so I was going on that plan.  At this point I
don't think that is true anymore, so maybe we can switch to just HEAD
and PG 10 for this.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: pgsql: pg_upgrade: simplify code layout in a few places

2018-01-05 Thread Tom Lane
Andres Freund  writes:
> On 2018-01-05 18:57:55 -0500, Bruce Momjian wrote:
>> On Fri, Jan  5, 2018 at 03:51:15PM -0800, Andres Freund wrote:
>>> Also, leaving translatability aside, why was *any* of this backpatched?

>> Tom has preferred that I backpatch all safe patches so we keep that code
>> consistent so we can backpatch other things more easily.

> I've a hard time believing this. Tom?

I've been known to back-patch stuff just to keep branches consistent,
but it's always a judgement call.  In this case I wouldn't have done it
(even if the patch were a good idea in HEAD) because it would cause
churn in translatable messages in the back branches.  Also, the case
for cosmetic back-patching is only strong when a particular file is
already pretty similar across all branches, and I'm not sure that
holds for pg_upgrade.

regards, tom lane



Re: January CommitFest is underway!

2018-01-05 Thread Ryan Murphy
> BTW, for those who aren't already aware of it, I'd like to point out
> this very handy resource:
>
> http://commitfest.cputube.org
>
>
Wow, that makes it way easier to see what's going on!  Thanks Tom.


Re: [HACKERS] Timeline ID in backup_label file

2018-01-05 Thread Michael Paquier
On Sat, Jan 6, 2018 at 9:54 AM, Michael Paquier
 wrote:
> On Fri, Jan 5, 2018 at 11:13 PM, Simon Riggs  wrote:
>> On 27 November 2017 at 14:06, David Steele  wrote:
>>
>>> I have tested and get an error as expected when I munge the backup_label
>>> file:
>>>
>>> FATAL:  invalid data in file "backup_label"
>>> DETAIL:  Timeline ID parsed is 2, but expected 1
>>>
>>> Everything else looks good so I will mark it ready for committer.
>>
>> Sounds good.
>
> Thanks for the feedback.

The previous patch was incorrect; it was the same v2 as upthread.
Attached is the correct v3.
-- 
Michael


backup_label_tli_v3.patch
Description: Binary data


Re: [HACKERS] Timeline ID in backup_label file

2018-01-05 Thread Michael Paquier
On Fri, Jan 5, 2018 at 11:13 PM, Simon Riggs  wrote:
> On 27 November 2017 at 14:06, David Steele  wrote:
>
>> I have tested and get an error as expected when I munge the backup_label
>> file:
>>
>> FATAL:  invalid data in file "backup_label"
>> DETAIL:  Timeline ID parsed is 2, but expected 1
>>
>> Everything else looks good so I will mark it ready for committer.
>
> Sounds good.

Thanks for the feedback.

> No tests?

Do you think that's worth the cycles as a TAP test? This can be done
by generating a dummy backup_label file and then starting a standby to
see the failure, but that looks quite costly for minimal gain. This
code path is also exercised in
src/test/recovery/t/001_stream_rep.pl, for example, though that's not
the failure path. So if we add a DEBUG1 line after correctly fetching
the timeline ID, this could help by looking at
tmp_check/log/001_stream_rep_standby_1.log, but that requires
"log_min_messages = debug1" to be set in PostgresNode.pm, for example,
so that the message shows up in the logs (this could be useful if done
by default, actually; DEBUG1 is not too talkative). Attached is an
example of a patch doing so. See the addition in PostgresNode.pm and
this new bit:
+   ereport(DEBUG1,
+   (errmsg("backup timeline %u in file \"%s\"",
+   tli_from_file, BACKUP_LABEL_FILE)));
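The log-based check being proposed could then confirm that the standby
parsed the backup timeline by grepping its log for that new DEBUG1 message.
A Python sketch of such a scan (the helper name is mine; the message format
comes from the hunk above):

```python
# Scan a server log for the proposed DEBUG1 line and extract the
# backup timeline ID it reports, or None if it never appeared.
import re

def backup_timeline_from_log(log_text):
    m = re.search(r'backup timeline (\d+) in file "backup_label"', log_text)
    return int(m.group(1)) if m else None

sample = 'DEBUG:  backup timeline 1 in file "backup_label"\n'
print(backup_timeline_from_log(sample))  # 1
```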

> No docs/extended explanatory comments?

There is no existing documentation about the format of the backup_label
file in doc/, so it seems to me that it is not this patch's job to fill
in that hole. Regarding the comments, perhaps something better could be
done, but I am not quite sure what. We don't explain much about what the
current fields mean, except that one can guess their meaning from their
names, which is the intention behind the code.
-- 
Michael


backup_label_tli_v2.patch
Description: Binary data


Re: pgsql: Implement channel binding tls-server-end-point for SCRAM

2018-01-05 Thread Michael Paquier
On Sat, Jan 6, 2018 at 2:56 AM, Alvaro Herrera  wrote:
> Peter Eisentraut wrote:
>> On 1/5/18 09:28, Michael Paquier wrote:
>> > In order to do things cleanly, we should make this TAP test
>> > conditional on the version of OpenSSL.
>>
>> How about looking into pg_config.h, like in the attached patch?

+# Determine whether build supports tls-server-end-point.
+open my $pg_config_h, '<', '../../include/pg_config.h' or die "$!";
+my $supports_tls_server_end_point =
+    (grep {/^#define HAVE_X509_GET_SIGNATURE_NID 1/} <$pg_config_h>);
+close $pg_config_h;

Good idea as a whole, but I don't think this is the right approach. As
we include $(bindir) in PATH when running the prove command, why not
get the include path from "pg_config --includedir"?
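The approach under discussion amounts to locating pg_config.h through
"pg_config --includedir" rather than a hard-coded relative path, then
checking it for a #define. A Python sketch of both pieces (function names
are mine; illustrative only):

```python
# Sketch: find pg_config.h via pg_config and grep it for a #define.
import re
import subprocess

def build_has_define(pg_config_h_text, symbol):
    """Return True if the pg_config.h contents define SYMBOL to 1."""
    pattern = re.compile(r"^#define\s+%s\s+1\s*$" % re.escape(symbol), re.M)
    return bool(pattern.search(pg_config_h_text))

def pg_includedir():
    """Ask pg_config (assumed to be in PATH) for the include directory."""
    return subprocess.check_output(
        ["pg_config", "--includedir"], text=True).strip()

sample = "#define HAVE_X509_GET_SIGNATURE_NID 1\n#define HAVE_LIBZ 1\n"
print(build_has_define(sample, "HAVE_X509_GET_SIGNATURE_NID"))  # True
```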

> I suppose if this starts to spread, we'll come up with a better approach
> ... maybe generating a Perl file with variables that can be slurped more
> elegantly, or something like that.  (We mentioned the need for
> config-based test selection re. patches for new SSL implementations.)

One case I have in mind where this would be nice to have is
020_pg_receivewal.pl, with tests depending on whether PG is built with
zlib or not. So we definitely want something more; at least I do. I
agree that the most elegant approach would be to generate such a
variable set from pg_config.h, rather than parsing pg_config.h
directly. Or we could just live with an API in TestLib.pm able to
fetch the wanted variable from pg_config, as Peter is doing. I could
use that for my test case with HAVE_LIBZ as well.
--
Michael



Re: pgsql: pg_upgrade: simplify code layout in a few places

2018-01-05 Thread Andres Freund
On 2018-01-05 18:57:55 -0500, Bruce Momjian wrote:
> On Fri, Jan  5, 2018 at 03:51:15PM -0800, Andres Freund wrote:
> > On 2018-01-05 14:20:59 -0500, Robert Haas wrote:
> > > On Fri, Jan 5, 2018 at 2:11 PM, Bruce Momjian  wrote:
> > > > pg_upgrade:  simplify code layout in a few places
> > > >
> > > > Backpatch-through: 9.4 (9.3 didn't need improving)
> > > 
> > > Hmm.  We don't normally do things like this, because it breaks 
> > > translatability.
> > 
> > Also, leaving translatability aside, why was *any* of this backpatched?
> > Unless there's very good maintainability reasons we normally don't
> > backpatch minor refactorings?
> 
> Tom has preferred that I backpatch all safe patches so we keep that code
> consistent so we can backpatch other things more easily.

I've a hard time believing this. Tom?

Greetings,

Andres Freund



Re: pgsql: pg_upgrade: simplify code layout in a few places

2018-01-05 Thread Bruce Momjian
On Fri, Jan  5, 2018 at 03:51:15PM -0800, Andres Freund wrote:
> On 2018-01-05 14:20:59 -0500, Robert Haas wrote:
> > On Fri, Jan 5, 2018 at 2:11 PM, Bruce Momjian  wrote:
> > > pg_upgrade:  simplify code layout in a few places
> > >
> > > Backpatch-through: 9.4 (9.3 didn't need improving)
> > 
> > Hmm.  We don't normally do things like this, because it breaks 
> > translatability.
> 
> Also, leaving translatability aside, why was *any* of this backpatched?
> Unless there's very good maintainability reasons we normally don't
> backpatch minor refactorings?

Tom has preferred that I backpatch all safe patches so we keep that code
consistent so we can backpatch other things more easily.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: pgsql: pg_upgrade: simplify code layout in a few places

2018-01-05 Thread Andres Freund
On 2018-01-05 14:20:59 -0500, Robert Haas wrote:
> On Fri, Jan 5, 2018 at 2:11 PM, Bruce Momjian  wrote:
> > pg_upgrade:  simplify code layout in a few places
> >
> > Backpatch-through: 9.4 (9.3 didn't need improving)
> 
> Hmm.  We don't normally do things like this, because it breaks 
> translatability.

Also, leaving translatability aside, why was *any* of this backpatched?
Unless there's very good maintainability reasons we normally don't
backpatch minor refactorings?

Greetings,

Andres Freund



Re: [HACKERS] Proposal: Local indexes for partitioned table

2018-01-05 Thread Alvaro Herrera
Robert Haas wrote:
> On Fri, Jan 5, 2018 at 4:57 PM, Peter Eisentraut
>  wrote:
> > On 1/4/18 23:08, David Rowley wrote:

> >> I admit to not having had a chance to look at any code with this yet,
> >> but I'm just thinking about a case like the following.
> >>
> >> CREATE TABLE part (a INT, b INT) PARTITION BY RANGE (a);
> >> CREATE TABLE part_a1 PARTITION OF part FOR VALUES FROM (0) TO (10)
> >> PARTITION BY RANGE (b);
> >> CREATE TABLE part_a1_b1 PARTITION OF part_a1 FOR VALUES FROM (0) TO (10);
> >>
> >> CREATE INDEX ON part_a1 (a); -- sub-partition index (creates index on
> >> part_a1_b1)
> >>
> >> CREATE INDEX ON part (a); -- What do we do here?
> >>
> >> Should we:
> >>
> >> 1. Create another identical index on part_a1_b1; or
> >> 2. Allow the existing index on part_a1_b1 to have multiple parents; or
> >> 3. ERROR... (probably not)
> >
> > 4. It should adopt part_a1 and its subindexes into its hierarchy.  That
> > shouldn't be a problem under the current theory, should it?
> 
> +1.

This is what the code does today.  IIRC there's a test for this exact
scenario.  Now, part_a1_b1 is a grandchild of part, not a direct child --
so there exists an intermediate relkind=I index in part_a1, and that is
the one that becomes the parent of part_a1_b1's index, not part's index
directly.

Now, if you drop the index in part, then it gets dropped in all
descendants too, including part_a1_b1.  If you DETACH the partition
first, then the index remains.  All of that seems good to me, and is as
discussed previously.
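The adopt/drop/detach behavior described above can be sketched as a small
tree model in Python (a toy model with names of my choosing; the real
catalog state lives in pg_inherits/pg_index):

```python
# Toy model of a partitioned-index hierarchy: CREATE INDEX on the parent
# adopts a matching child index tree; DROP cascades to descendants, but
# detaching first preserves the child hierarchy.
class PartIndex:
    def __init__(self, name):
        self.name = name
        self.children = []
        self.parent = None

    def adopt(self, child):
        child.parent = self
        self.children.append(child)

    def drop(self, forest):
        forest.discard(self)          # dropping cascades to descendants
        for c in list(self.children):
            c.drop(forest)

    def detach(self):
        if self.parent:
            self.parent.children.remove(self)
            self.parent = None

part = PartIndex("part_a_idx")
part_a1 = PartIndex("part_a1_a_idx")        # intermediate, relkind=I
part_a1_b1 = PartIndex("part_a1_b1_a_idx")
part_a1.adopt(part_a1_b1)
part.adopt(part_a1)                          # CREATE INDEX ON part (a)

forest = {part, part_a1, part_a1_b1}
part_a1.detach()                             # DETACH the partition first...
part.drop(forest)                            # ...so DROP keeps its indexes
print(sorted(i.name for i in forest))
```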

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Proposal: Local indexes for partitioned table

2018-01-05 Thread Robert Haas
On Fri, Jan 5, 2018 at 4:57 PM, Peter Eisentraut
 wrote:
> On 1/4/18 23:08, David Rowley wrote:
>> On 5 January 2018 at 11:01, Alvaro Herrera  wrote:
>>> (The more I think of this, the more I believe that pg_inherits is a
>>> better answer.  Opinions?)
>>
>> I admit to not having had a chance to look at any code with this yet,
>> but I'm just thinking about a case like the following.
>>
>> CREATE TABLE part (a INT, b INT) PARTITION BY RANGE (a);
>> CREATE TABLE part_a1 PARTITION OF part FOR VALUES FROM (0) TO (10)
>> PARTITION BY RANGE (b);
>> CREATE TABLE part_a1_b1 PARTITION OF part_a1 FOR VALUES FROM (0) TO (10);
>>
>> CREATE INDEX ON part_a1 (a); -- sub-partition index (creates index on
>> part_a1_b1)
>>
>> CREATE INDEX ON part (a); -- What do we do here?
>>
>> Should we:
>>
>> 1. Create another identical index on part_a1_b1; or
>> 2. Allow the existing index on part_a1_b1 to have multiple parents; or
>> 3. ERROR... (probably not)
>
> 4. It should adopt part_a1 and its subindexes into its hierarchy.  That
> shouldn't be a problem under the current theory, should it?

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Proposal: Local indexes for partitioned table

2018-01-05 Thread Robert Haas
On Thu, Jan 4, 2018 at 5:01 PM, Alvaro Herrera  wrote:
> Tangentially: I didn't like very much that I added a new index to
> pg_index to support this feature.  I thought maybe it'd be better to
> change the index on indrelid to be on (indrelid,indparentidx) instead,
> but that doesn't seem great either because it bloats that index which is
> used to support common relcache operations ...
>
> (The more I think of this, the more I believe that pg_inherits is a
> better answer.  Opinions?)

I actually haven't looked at the code, but the idea that pg_inherits
is on the way out is news to me.  If that method works, I don't
quite see why we should invent something new.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Invalid pg_upgrade error message during live check

2018-01-05 Thread Bruce Momjian
Pg_upgrade is able to run in --check mode when the old server is still
running.  Unfortunately, all supported versions of pg_upgrade generate
an incorrect (but harmless) "failure" message when doing this:

$ pg_upgrade -b /u/pgsql.old/bin -B /u/pgsql/bin \
-d /u/pgsql.old/data/ -D /u/pgsql/data --link --check

--> *failure*
--> Consult the last few lines of "pg_upgrade_server.log" for
--> the probable cause of the failure.
Performing Consistency Checks on Old Live Server

Checking cluster versions   ok
Checking database user is the install user  ok
Checking database connection settings   ok
Checking for prepared transactions  ok
Checking for reg* data types in user tables ok
Checking for contrib/isn with bigint-passing mismatch   ok
Checking for invalid "unknown" user columns ok
Checking for hash indexes   ok
Checking for roles starting with "pg_"  ok
Checking for presence of required libraries ok
Checking database user is the install user  ok
Checking for prepared transactions  ok

*Clusters are compatible*

Embarrassingly, I found out about this bug when watching a presentation in
Frankfurt:

PostgreSQL-Upgrade: Best Practices

https://www.ittage.informatik-aktuell.de/programm/2017/postgresql-upgrade-best-practices/

and there is a blog entry about it too:


https://blog.dbi-services.com/does-pg_upgrade-in-check-mode-raises-a-failure-when-the-old-cluster-is-running/

(Yeah, it stinks to be me.  :-) )

So, I looked into the code and it turns out that start_postmaster()
needs to run exec_prog() in a way that allows it to fail and continue
_without_ generating an error message.  There are other places that need
to run exec_prog() and allow it to fail and continue _and_ generate an
error message.

To fix this, I modified the exec_prog() API to separately control
reporting errors and exiting on errors.  The attached patch does this
and generates the proper output:

$ pg_upgrade -b /u/pgsql.old/bin -B /u/pgsql/bin \
-d /u/pgsql.old/data/ -D /u/pgsql/data --link --check

Performing Consistency Checks on Old Live Server

Checking cluster versions   ok
Checking database user is the install user  ok
Checking database connection settings   ok
Checking for prepared transactions  ok
Checking for reg* data types in user tables ok
Checking for contrib/isn with bigint-passing mismatch   ok
Checking for invalid "unknown" user columns ok
Checking for hash indexes   ok
Checking for roles starting with "pg_"  ok
Checking for presence of required libraries ok
Checking database user is the install user  ok
Checking for prepared transactions  ok

*Clusters are compatible*
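The reworked exec_prog() contract (independent control of error reporting
and of exiting on error) can be sketched as follows; the flag names here
are mine, and the real function is C code in pg_upgrade:

```python
# Sketch of an exec_prog()-style wrapper: one flag controls whether a
# failure is reported, another whether it is fatal, so a caller like
# start_postmaster() during --check can fail quietly and continue.
import subprocess

def exec_prog(cmd, report_error=True, exit_on_error=True):
    """Run CMD; return True on success, False (or abort) on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    ok = result.returncode == 0
    if not ok:
        if report_error:
            print("*failure*: command %r exited with %d"
                  % (cmd, result.returncode))
        if exit_on_error:
            raise SystemExit(1)
    return ok

# A live-check server start is allowed to fail silently and continue:
ok = exec_prog(["false"], report_error=False, exit_on_error=False)
print(ok)  # False
```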

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +
diff --git a/src/bin/pg_upgrade/dump.c b/src/bin/pg_upgrade/dump.c
new file mode 100644
index 5ed6b78..8a662e9
*** a/src/bin/pg_upgrade/dump.c
--- b/src/bin/pg_upgrade/dump.c
*** generate_old_dump(void)
*** 23,29 
  	prep_status("Creating dump of global objects");
  
  	/* run new pg_dumpall binary for globals */
! 	exec_prog(UTILITY_LOG_FILE, NULL, true,
  			  "\"%s/pg_dumpall\" %s --globals-only --quote-all-identifiers "
  			  "--binary-upgrade %s -f %s",
  			  new_cluster.bindir, cluster_conn_opts(&old_cluster),
--- 23,29 
  	prep_status("Creating dump of global objects");
  
  	/* run new pg_dumpall binary for globals */
! 	exec_prog(UTILITY_LOG_FILE, NULL, true, true,
  			  "\"%s/pg_dumpall\" %s --globals-only --quote-all-identifiers "
  			  "--binary-upgrade %s -f %s",
  			  new_cluster.bindir, cluster_conn_opts(&old_cluster),
diff --git a/src/bin/pg_upgrade/exec.c b/src/bin/pg_upgrade/exec.c
new file mode 100644
index ea45434..9122e27
*** a/src/bin/pg_upgrade/exec.c
--- b/src/bin/pg_upgrade/exec.c
*** get_bin_version(ClusterInfo *cluster)
*** 71,86 
   * and attempts to execute that command.  If the command executes
   * successfully, exec_prog() returns true.
   *
!  * If the command fails, an error message i

Re: [HACKERS] Proposal: Local indexes for partitioned table

2018-01-05 Thread Peter Eisentraut
On 1/4/18 23:08, David Rowley wrote:
> On 5 January 2018 at 11:01, Alvaro Herrera  wrote:
>> (The more I think of this, the more I believe that pg_inherits is a
>> better answer.  Opinions?)
> 
> I admit to not having had a chance to look at any code with this yet,
> but I'm just thinking about a case like the following.
> 
> CREATE TABLE part (a INT, b INT) PARTITION BY RANGE (a);
> CREATE TABLE part_a1 PARTITION OF part FOR VALUES FROM (0) TO (10)
> PARTITION BY RANGE (b);
> CREATE TABLE part_a1_b1 PARTITION OF part_a1 FOR VALUES FROM (0) TO (10);
> 
> CREATE INDEX ON part_a1 (a); -- sub-partition index (creates index on
> part_a1_b1)
> 
> CREATE INDEX ON part (a); -- What do we do here?
> 
> Should we:
> 
> 1. Create another identical index on part_a1_b1; or
> 2. Allow the existing index on part_a1_b1 to have multiple parents; or
> 3. ERROR... (probably not)

4. It should adopt part_a1 and its subindexes into its hierarchy.  That
shouldn't be a problem under the current theory, should it?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Removing useless DISTINCT clauses

2018-01-05 Thread Jeff Janes
On Mon, Nov 6, 2017 at 1:16 AM, David Rowley 
wrote:

> In [1] we made a change to process the GROUP BY clause to remove any
> group by items that are functionally dependent on some other GROUP BY
> items.
>
> This really just checks if a table's PK columns are entirely present
> in the GROUP BY clause and removes anything else belonging to that
> table.
>
> All this seems to work well, but I totally failed to consider that the
> exact same thing applies to DISTINCT too.
>
> Over in [2], Rui Liu mentions that the planner could do a better job
> for his case.
>
> Using Rui Liu's example:
>
> CREATE TABLE test_tbl ( k INT PRIMARY KEY, col text);
> INSERT into test_tbl select generate_series(1,1000), 'test';
>
> Master:
>
> postgres=# explain analyze verbose select distinct col, k from
> test_tbl order by k limit 1000;
>QUERY
> PLAN
> 
> 
> -
>  Limit  (cost=1658556.19..1658563.69 rows=1000 width=9) (actual
> time=8934.962..8935.495 rows=1000 loops=1)
>Output: col, k
>->  Unique  (cost=1658556.19..1733557.50 rows=1175 width=9)
> (actual time=8934.961..8935.460 rows=1000 loops=1)
>  Output: col, k
>  ->  Sort  (cost=1658556.19..1683556.63 rows=1175 width=9)
> (actual time=8934.959..8935.149 rows=1000 loops=1)
>Output: col, k
>Sort Key: test_tbl.k, test_tbl.col
>Sort Method: external merge  Disk: 215128kB
>->  Seq Scan on public.test_tbl  (cost=0.00..154056.75
> rows=1175 width=9) (actual time=0.062..1901.728 rows=1000
> loops=1)
>  Output: col, k
>  Planning time: 0.092 ms
>  Execution time: 8958.687 ms
> (12 rows)
>
> Patched:
>
> postgres=# explain analyze verbose select distinct col, k from
> test_tbl order by k limit 1000;
>
>  QUERY PLAN
> 
> 
> --
>  Limit  (cost=0.44..34.31 rows=1000 width=9) (actual time=0.030..0.895
> rows=1000 loops=1)
>Output: col, k
>->  Unique  (cost=0.44..338745.50 rows=1175 width=9) (actual
> time=0.029..0.814 rows=1000 loops=1)
>  Output: col, k
>  ->  Index Scan using test_tbl_pkey on public.test_tbl
> (cost=0.44..313745.06 rows=1175 width=9) (actual time=0.026..0.452
> rows=1000 loops=1)
>Output: col, k
>  Planning time: 0.152 ms
>  Execution time: 0.985 ms
> (8 rows)
>
> A patch to implement this is attached.
>
>
Couldn't the Unique node be removed entirely?  If k is a primary key, you
can't have duplicates in need of removal.

Or would that be a subject for a different patch?

I think remove_functionally_dependant_groupclauses should have a more
generic name, like remove_functionally_dependant_clauses.
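The pruning rule quoted above (if a table's PK columns are entirely present
in the clause list, the table's other columns are functionally dependent
and removable) can be sketched like this; a simplified model in which
column names stand in for the planner's Var nodes, with names of my
choosing:

```python
# Sketch of functionally-dependent clause removal for DISTINCT/GROUP BY.
def remove_dependent_clauses(clauses, pk_by_table):
    """clauses: list of (table, column); pk_by_table: table -> set of PK
    columns.  Drop columns made redundant by a fully-present PK."""
    present = {}
    for table, col in clauses:
        present.setdefault(table, set()).add(col)
    kept = []
    for table, col in clauses:
        pk = pk_by_table.get(table)
        if pk and pk <= present[table] and col not in pk:
            continue  # functionally dependent on the PK: drop it
        kept.append((table, col))
    return kept

clauses = [("test_tbl", "col"), ("test_tbl", "k")]
print(remove_dependent_clauses(clauses, {"test_tbl": {"k"}}))
# [('test_tbl', 'k')]
```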

Cheers,

Jeff


Re: [HACKERS] Transaction control in procedures

2018-01-05 Thread Peter Eisentraut
A merge conflict has arisen, so for simplicity, here is an updated patch.

On 12/20/17 10:08, Peter Eisentraut wrote:
> Updated patch attached.
> 
> I have addressed the most recent review comments I believe.
> 
> The question about what happens to cursor loops in PL/Perl and PL/Python
> would be addressed by the separate thread "portal pinning".  The test
> cases in this patch are currently marked by FIXMEs.
> 
> I have changed the SPI API a bit.  I got rid of SPI_set_nonatomic() and
> instead introduced SPI_connect_ext() that you can pass flags to.  The
> advantage of that is that in the normal case we can continue to use the
> existing memory contexts, so nothing changes for existing uses, which
> seems desirable.  (This also appears to address some sporadic test
> failures in PL/Perl.)
> 
> I have also cleaned up the changes in portalmem.c further, so the
> changes are now even smaller.
> 
> The commit message in this patch contains more details about some of
> these changes.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 86db8254bcd93585f1036b755a54ae1608ed092f Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 5 Jan 2018 16:27:53 -0500
Subject: [PATCH v6] Transaction control in PL procedures

In each of the supplied procedural languages (PL/pgSQL, PL/Perl,
PL/Python, PL/Tcl), add language-specific commit and rollback
functions/commands to control transactions in procedures in that
language.  Add similar underlying functions to SPI.  Some additional
cleanup so that transaction commit or abort doesn't blow away data
structures still used by the procedure call.  Add execution context
tracking to CALL and DO statements so that transaction control commands
can only be issued in top-level procedure and block calls, not function
calls or other procedure or block calls.

- SPI

Add a new function SPI_connect_ext() that is like SPI_connect() but
allows passing option flags.  The only option flag right now is
SPI_OPT_NONATOMIC.  A nonatomic SPI connection can execute transaction
control commands, otherwise it's not allowed.  This is meant to be
passed down from CALL and DO statements which themselves know in which
context they are called.  A nonatomic SPI connection uses different
memory management.  A normal SPI connection allocates its memory in
TopTransactionContext.  For nonatomic connections we use PortalContext
instead.  As the comment in SPI_connect_ext() (previously SPI_connect())
indicates, one could potentially use PortalContext in all cases, but it
seems safest to leave the existing uses alone, because this stuff is
complicated enough already.

SPI also gets new functions SPI_start_transaction(), SPI_commit(), and
SPI_rollback(), which can be used by PLs to implement their transaction
control logic.
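The SPI_connect_ext() contract described here — only a connection opened
with SPI_OPT_NONATOMIC may issue transaction control — can be modeled in a
few lines; this is a Python illustration only (the flag value and the error
wording are assumptions of mine; the real API is C in
src/backend/executor/spi.c):

```python
# Model of the new SPI option-flags contract: transaction control is
# permitted only on a nonatomic connection, as passed down from CALL/DO.
SPI_OPT_NONATOMIC = 1 << 0  # illustrative flag value

class SPIConnection:
    def __init__(self, options=0):
        self.nonatomic = bool(options & SPI_OPT_NONATOMIC)

    def commit(self):
        if not self.nonatomic:
            # an ordinary (atomic) connection must reject COMMIT
            raise RuntimeError("invalid transaction termination")
        return "committed"

# A top-level CALL passes the nonatomic flag down; a plain function
# call does not, so COMMIT inside it is rejected.
proc_conn = SPIConnection(options=SPI_OPT_NONATOMIC)
print(proc_conn.commit())  # committed
```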

- portalmem.c

Some adjustments were made in the code that cleans up portals at
transaction abort.  The portal code could already handle a command
*committing* a transaction and continuing (e.g., VACUUM), but it was not
quite prepared for a command *aborting* a transaction and continuing.

In AtAbort_Portals(), remove the code that marks an active portal as
failed.  As the comment there already predicted, this doesn't work if
the running command wants to keep running after transaction abort.  And
it's actually not necessary, because pquery.c is careful to run all
portal code in a PG_TRY block and explicitly runs MarkPortalFailed() if
there is an exception.  So the code in AtAbort_Portals() is never used
anyway.

In AtAbort_Portals() and AtCleanup_Portals(), we need to be careful not
to clean up active portals too much.  This mirrors similar code in
PreCommit_Portals().

- PL/Perl

Gets new functions spi_commit() and spi_rollback()

- PL/pgSQL

Gets new commands COMMIT and ROLLBACK.

Update the PL/SQL porting example in the documentation to reflect that
transactions are now possible in procedures.

- PL/Python

Gets new functions plpy.commit and plpy.rollback.

- PL/Tcl

Gets new commands commit and rollback.
---
 doc/src/sgml/plperl.sgml   |  49 +
 doc/src/sgml/plpgsql.sgml  |  91 
 doc/src/sgml/plpython.sgml |  39 
 doc/src/sgml/pltcl.sgml|  41 
 doc/src/sgml/ref/call.sgml |   7 +
 doc/src/sgml/ref/create_procedure.sgml |   7 +
 doc/src/sgml/ref/do.sgml   |   7 +
 doc/src/sgml/spi.sgml  | 177 +++
 src/backend/commands/functioncmds.c|  45 +++-
 src/backend/executor/spi.c |  99 -
 src/backend/tcop/utility.c |   6 +-
 src/backend/utils/mmgr/portalmem.c |  49 +++--
 src/include/commands/defrem.h  |   4 +-
 src/include/executor/spi.h |   7 +
 src/include/executor/

Re: Condition variable live lock

2018-01-05 Thread Thomas Munro
On Sat, Jan 6, 2018 at 6:33 AM, Tom Lane  wrote:
> As I feared, the existing regression tests are not really adequate for
> this: gcov testing shows that the sentinel-inserting code path is
> never entered, meaning ConditionVariableBroadcast never sees more
> than one waiter.  What's more, it's now also apparent that no outside
> caller of ConditionVariableSignal ever actually awakens anything.
> So I think it'd be a good idea to expand the regression tests if we
> can do so cheaply.  Anybody have ideas about that?  Perhaps a new
> module under src/test/modules would be the best way?  Alternatively,
> we could drop some of the optimization ideas.

I think I might have a suitable test module already.  I'll tidy it up
and propose it in a few days.

> BTW, at least on gaur, this does nothing for the runtime of the join
> test, meaning I'd still like to see some effort put into reducing that.

Will do.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] UPDATE of partition key

2018-01-05 Thread Robert Haas
On Fri, Jan 5, 2018 at 7:12 AM, Amit Khandekar  wrote:
> The above patch is to be applied over the last remaining preparatory
> patch, now named (and attached) :
> 0001-Refactor-CheckConstraint-related-code.patch

Committed that one, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Buildfarm Client Bugfix Release 6.1

2018-01-05 Thread Andrew Dunstan

I have released version 6.1 of the PostgreSQL Buildfarm client. It is
available at


This release fixes a couple of bugs that became apparent in yesterday's
release. The first was a relatively minor one where the verbosity was
always set when using the run_branches.pl script. The second was a
portability issue where some versions of Perl and its Time::HiRes module
behaved unexpectedly, and as a result log files were not sorted in the
correct order, leading to very strange timing results. The second one is
sufficiently bad to warrant this point release.

Thanks to Tom Lane for identifying and helping to diagnose and patch
these bugs.

Also, one other small bug is fixed in some utility scripts, and the
BlackholeFDW module advertised for the version 6 release, but not in
fact present in the release file, is now present.


cheers


andrew

-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Condition variable live lock

2018-01-05 Thread Tom Lane
Robert Haas  writes:
> On Fri, Jan 5, 2018 at 2:38 PM, Tom Lane  wrote:
>> * I think the Asserts in ConditionVariablePrepareToSleep and
>> ConditionVariableSleep ought to be replaced by full-fledged test and
>> elog(ERROR), so that they are enforced even in non-assert builds.
>> I don't have a lot of confidence that corner cases that could violate
>> those usage restrictions would get caught during developer testing.
>> Nor do I see an argument that we can't afford the cycles to check.

> I think those usage restrictions are pretty basic, and I think this
> might be used in some places where performance does matter.  So -1
> from me for this change.

Really?  We're about to do a process sleep, and we can't afford a single
test and branch to make sure we're doing it sanely?

>> * A lot of the comments could be improved, IMHO.

> No opinion without seeing what you propose to change.

OK, will put out a proposal.

regards, tom lane



Re: setting estate in ExecInitNode() itself

2018-01-05 Thread Robert Haas
On Fri, Jan 5, 2018 at 2:50 PM, Andres Freund  wrote:
> Agreed on that. But I also think there's quite some room for
> centralizing some of the work done in the init routines. Not quite sure
> how, but I do dislike the amount of repetition both in code and
> comments.

+1.

I *assume* that if you really understand how all of this stuff works,
adding new types of executor nodes is easy to do correctly.  But, as
the executor README says  "XXX a great deal more documentation needs
to be written here..." -- BTW, that comment dates to 2001 -- and I
have found it not to be all that straightforward (cf. commits
8538a6307049590ddb5ba127b2ecac6308844d60,
7df2c1f8daeb361133ac8bdeaf59ceb0484e315a,
41b0dd987d44089dc48e9c70024277e253b396b7,
0510421ee091ee3ddabdd5bd7ed096563f6bf17f,
b10967eddf964f8c0a11060cf3f366bbdd1235f6).  Having similar, but often
very briefly commented, initialization, rescan, and cleanup nodes in
every file makes it hard to understand what actually needs to be done
differently in each case, and why.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: setting estate in ExecInitNode() itself

2018-01-05 Thread Tom Lane
Andres Freund  writes:
> On 2018-01-05 10:36:19 -0500, Tom Lane wrote:
>> Ashutosh Bapat  writes:
>>> I am wondering why don't we instead save estate in ExecInitNode() itself 
>>> like
>>> result->estate = estate;

>> That would only work if there were no situation where the field needed to
>> be already valid during the node init function.  I think that's probably
>> wrong already (check ExecInitExpr for instance) and it certainly might
>> be wrong in future.

> Agreed on that. But I also think there's quite some room for
> centralizing some of the work done in the init routines. Not quite sure
> how, but I do dislike the amount of repetition both in code and
> comments.

Yeah, there might be room for putting more of the common node init work
into standard macros or some such.  Need to think bigger than just this
one point though, or it won't be worth it.
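To make the idea concrete, a toy sketch of the kind of centralizing helper being discussed might look like this (the struct fields mirror PlanState, but the macro and function names here are invented, not anything in the tree):

```c
/* Toy stand-ins for Plan/PlanState/EState, just to show the shape of a
 * common-initialization helper; none of this is actual executor code. */
typedef struct EState { int es_dummy; } EState;
typedef struct Plan { int plan_node_id; } Plan;
typedef struct PlanState { Plan *plan; EState *state; } PlanState;

/* The repeated "node->ps.plan = ...; node->ps.state = ..." boilerplate,
 * centralized in one place. */
#define EXEC_INIT_COMMON(psptr, planarg, estatearg) \
    do { \
        (psptr)->plan = (planarg); \
        (psptr)->state = (estatearg); \
    } while (0)

typedef struct SeqScanState { PlanState ps; } SeqScanState;

void init_seqscan(SeqScanState *node, Plan *plan, EState *estate)
{
    EXEC_INIT_COMMON(&node->ps, plan, estate);
    /* ...node-specific initialization would follow here... */
}
```

Each node's ExecInitXYZ would then carry only the node-specific part, which is the repetition being complained about.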

regards, tom lane



Re: Condition variable live lock

2018-01-05 Thread Robert Haas
On Fri, Jan 5, 2018 at 2:38 PM, Tom Lane  wrote:
> * I think the Asserts in ConditionVariablePrepareToSleep and
> ConditionVariableSleep ought to be replaced by full-fledged test and
> elog(ERROR), so that they are enforced even in non-assert builds.
> I don't have a lot of confidence that corner cases that could violate
> those usage restrictions would get caught during developer testing.
> Nor do I see an argument that we can't afford the cycles to check.

I think those usage restrictions are pretty basic, and I think this
might be used in some places where performance does matter.  So -1
from me for this change.

> * ConditionVariablePrepareToSleep needs to be rearranged so that failure
> to create the WaitEventSet doesn't leave us in an invalid state.

+1.

> * A lot of the comments could be improved, IMHO.

No opinion without seeing what you propose to change.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: setting estate in ExecInitNode() itself

2018-01-05 Thread Andres Freund
On 2018-01-05 10:36:19 -0500, Tom Lane wrote:
> Ashutosh Bapat  writes:
> > Looking at ExecInitXYZ() functions, we can observe that every such
> > function has a statement like
> > XYZstate->ps.state = estate;
> > where it saves estate in the PlanState.
> 
> Yeah ...
> 
> > I am wondering why don't we instead save estate in ExecInitNode() itself 
> > like
> > result->estate = estate;
> 
> That would only work if there were no situation where the field needed to
> be already valid during the node init function.  I think that's probably
> wrong already (check ExecInitExpr for instance) and it certainly might
> be wrong in future.

Agreed on that. But I also think there's quite some room for
centralizing some of the work done in the init routines. Not quite sure
how, but I do dislike the amount of repetition both in code and
comments.

Greetings,

Andres Freund



Re: pgsql: pg_upgrade: simplify code layout in a few places

2018-01-05 Thread Bruce Momjian
On Fri, Jan  5, 2018 at 02:37:58PM -0500, Robert Haas wrote:
> On Fri, Jan 5, 2018 at 2:32 PM, Alvaro Herrera  wrote:
> > Bruce Momjian wrote:
> >> On Fri, Jan  5, 2018 at 02:20:59PM -0500, Robert Haas wrote:
> >> > On Fri, Jan 5, 2018 at 2:11 PM, Bruce Momjian  wrote:
> >> > > pg_upgrade:  simplify code layout in a few places
> >> > >
> >> > > Backpatch-through: 9.4 (9.3 didn't need improving)
> >> >
> >> > Hmm.  We don't normally do things like this, because it breaks 
> >> > translatability.
> >>
> >> Oh, you mean the 'if()' statement?  If so, I will revert that and add a
> >> comment so I don't do it again in that place.
> >
> > See 837255cc81fb59b12f5a70ac2a8a9850bccf13e0.  I don't think adding
> > comments to every single place where this could be done (which is many
> > places, not just in pg_upgrade) is a good idea.
> 
> It's also documented, of course.
> 
> https://www.postgresql.org/docs/devel/static/nls-programmer.html#NLS-GUIDELINES

OK, C comment removed.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: pgsql: pg_upgrade: simplify code layout in a few places

2018-01-05 Thread Bruce Momjian
On Fri, Jan  5, 2018 at 02:31:51PM -0500, Robert Haas wrote:
> On Fri, Jan 5, 2018 at 2:24 PM, Bruce Momjian  wrote:
> > On Fri, Jan  5, 2018 at 02:20:59PM -0500, Robert Haas wrote:
> >> On Fri, Jan 5, 2018 at 2:11 PM, Bruce Momjian  wrote:
> >> > pg_upgrade:  simplify code layout in a few places
> >> >
> >> > Backpatch-through: 9.4 (9.3 didn't need improving)
> >>
> >> Hmm.  We don't normally do things like this, because it breaks 
> >> translatability.
> >
> > Oh, you mean the 'if()' statement?  If so, I will revert that and add a
> > comment so I don't do it again in that place.
> 
> Yeah.

Thanks, done.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Condition variable live lock

2018-01-05 Thread Tom Lane
I wrote:
> As I feared, the existing regression tests are not really adequate for
> this: gcov testing shows that the sentinel-inserting code path is
> never entered, meaning ConditionVariableBroadcast never sees more
> than one waiter.

Hmm ... adding tracing log printouts in BarrierArriveAndWait disproves
that: the regression tests do, repeatedly, call ConditionVariableBroadcast
when there are two waiters to be woken (and you can get it to be more if
you raise the number of parallel workers specified in the PHJ tests).
It just doesn't happen with --enable-coverage.  Heisenberg wins again!

I am not sure why coverage tracking changes the behavior so much, but
it's clear from my results that if you change all the PHJ test cases in
join.sql to use 4 parallel workers, then you get plenty of barrier release
events with 4 or 5 barrier participants --- but running the identical test
under --enable-coverage results in only a very small number of releases
with even as many as 2 participants, let alone more.  Perhaps the PHJ test
cases don't run long enough to let slow-starting workers join in?

Anyway, that may or may not indicate something we should tune at a higher
level, but I'm now satisfied that the patch as presented works and does
get tested by our existing tests.  So barring objection I'll commit that
shortly.

There are some other things I don't like about condition_variable.c:

* I think the Asserts in ConditionVariablePrepareToSleep and
ConditionVariableSleep ought to be replaced by full-fledged test and
elog(ERROR), so that they are enforced even in non-assert builds.
I don't have a lot of confidence that corner cases that could violate
those usage restrictions would get caught during developer testing.
Nor do I see an argument that we can't afford the cycles to check.

* ConditionVariablePrepareToSleep needs to be rearranged so that failure
to create the WaitEventSet doesn't leave us in an invalid state.

* A lot of the comments could be improved, IMHO.

Barring objection I'll go deal with those things, too.
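To illustrate the first point, converting a compiled-out Assert into an always-enforced check could look roughly like this sketch (the flag, function names, and message are invented stand-ins, not the actual condition_variable.c code, which would raise elog(ERROR, ...) rather than return an error code):

```c
#include <stdio.h>

static int cv_sleep_prepared = 0;

void cv_prepare_to_sleep(void)
{
    cv_sleep_prepared = 1;
}

/* Before: Assert(cv_sleep_prepared) -- silently compiled out in
 * non-assert builds.  After: an explicit test that fires everywhere.
 * Returns 0 on success, -1 on a usage violation. */
int cv_sleep(void)
{
    if (!cv_sleep_prepared)
    {
        fprintf(stderr,
                "ERROR: ConditionVariableSleep called without PrepareToSleep\n");
        return -1;
    }
    return 0;
}
```

The cost is a single test and branch on a path that is about to put the process to sleep anyway.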

regards, tom lane



Re: pgsql: pg_upgrade: simplify code layout in a few places

2018-01-05 Thread Robert Haas
On Fri, Jan 5, 2018 at 2:32 PM, Alvaro Herrera  wrote:
> Bruce Momjian wrote:
>> On Fri, Jan  5, 2018 at 02:20:59PM -0500, Robert Haas wrote:
>> > On Fri, Jan 5, 2018 at 2:11 PM, Bruce Momjian  wrote:
>> > > pg_upgrade:  simplify code layout in a few places
>> > >
>> > > Backpatch-through: 9.4 (9.3 didn't need improving)
>> >
>> > Hmm.  We don't normally do things like this, because it breaks 
>> > translatability.
>>
>> Oh, you mean the 'if()' statement?  If so, I will revert that and add a
>> comment so I don't do it again in that place.
>
> See 837255cc81fb59b12f5a70ac2a8a9850bccf13e0.  I don't think adding
> comments to every single place where this could be done (which is many
> places, not just in pg_upgrade) is a good idea.

It's also documented, of course.

https://www.postgresql.org/docs/devel/static/nls-programmer.html#NLS-GUIDELINES
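The underlying issue: gettext keys message catalogs on complete format strings, so collapsing two similar messages into one via a conditional fragment leaves translators with pieces they cannot reorder. A minimal sketch (the _() stub and the messages are invented examples, not actual pg_upgrade strings):

```c
#include <stdio.h>
#include <string.h>

/* Stub for gettext's _(); real catalogs are keyed on the whole string. */
#define _(s) (s)

/* Problematic: "checking" and "old cluster" become separate msgids, and
 * the assembled sentence never appears in the catalog at all. */
void report_fragmented(char *buf, size_t buflen, int old_cluster)
{
    snprintf(buf, buflen, "%s %s", _("checking"),
             old_cluster ? _("old cluster") : _("new cluster"));
}

/* Translatable: each complete message is its own msgid, so a translator
 * can reorder words as the target language requires. */
void report_whole(char *buf, size_t buflen, int old_cluster)
{
    if (old_cluster)
        snprintf(buf, buflen, "%s", _("checking old cluster"));
    else
        snprintf(buf, buflen, "%s", _("checking new cluster"));
}
```

Both produce the same English output; only the second gives xgettext complete sentences to extract.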

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Pluggable storage

2018-01-05 Thread Alexander Korotkov
On Fri, Jan 5, 2018 at 7:20 PM, Robert Haas  wrote:

> I do not like the way that this patch set uses the word "storage".  In
> current usage, storage is a thing that certain kinds of relations
> have.  Plain relations (a.k.a. heap tables) have storage, indexes have
> storage, materialized views have storage, TOAST tables have storage,
> and sequences have storage.  This patch set wants to use "storage AM"
> to mean a replacement for a plain heap table, but I think that's going
> to create a lot of confusion, because it overlaps heavily with the
> existing meaning yet is different.


Good point, thank you for noticing that.  Name "storage" is really confusing
for this purpose.


> My suggestion is to call these
> "table access methods" rather than "storage access methods".  Then,
> the default table AM can be heap.  This has the nice property that you
> create an index using CREATE INDEX and the support functions arrive
> via an IndexAmRoutine, so correspondingly you would create a table
> using CREATE TABLE and the support functions would arrive via a
> TableAmRoutine -- so all the names match.
>
> An alternative would be to call the new thing a "heap AM" with
> HeapAmRoutine as the support function structure.  That's also not
> unreasonable.  In that case, we're deciding that "heap" is not just
> the name of the current implementation, but the name of the kind of
> thing that backs a table at the storage level.  I don't like that
> quite as well because then we've got a class of things called a heap
> of which the current and only implementation is called heap, which is
> a bit confusing in my view.  But we could probably make it work.
>
> If we adopt the first proposal, it leads to another nice parallel: we
> can have src/backend/access/table for those things which are generic
> to table AMs, just as we have src/backend/access/index for things
> which are generic to index AMs, and then src/backend/access/
> for things which are specific to a particular AM.


I would vote for the first proposal: table AM.  Because we eventually
might get index-organized tables, which don't have anything like a heap.
So, it would be nice to avoid hardcoding the "heap" name.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: pgsql: pg_upgrade: simplify code layout in a few places

2018-01-05 Thread Alvaro Herrera
Bruce Momjian wrote:
> On Fri, Jan  5, 2018 at 02:20:59PM -0500, Robert Haas wrote:
> > On Fri, Jan 5, 2018 at 2:11 PM, Bruce Momjian  wrote:
> > > pg_upgrade:  simplify code layout in a few places
> > >
> > > Backpatch-through: 9.4 (9.3 didn't need improving)
> > 
> > Hmm.  We don't normally do things like this, because it breaks 
> > translatability.
> 
> Oh, you mean the 'if()' statement?  If so, I will revert that and add a
> comment so I don't do it again in that place.

See 837255cc81fb59b12f5a70ac2a8a9850bccf13e0.  I don't think adding
comments to every single place where this could be done (which is many
places, not just in pg_upgrade) is a good idea.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: pg_upgrade: simplify code layout in a few places

2018-01-05 Thread Robert Haas
On Fri, Jan 5, 2018 at 2:24 PM, Bruce Momjian  wrote:
> On Fri, Jan  5, 2018 at 02:20:59PM -0500, Robert Haas wrote:
>> On Fri, Jan 5, 2018 at 2:11 PM, Bruce Momjian  wrote:
>> > pg_upgrade:  simplify code layout in a few places
>> >
>> > Backpatch-through: 9.4 (9.3 didn't need improving)
>>
>> Hmm.  We don't normally do things like this, because it breaks 
>> translatability.
>
> Oh, you mean the 'if()' statement?  If so, I will revert that and add a
> comment so I don't do it again in that place.

Yeah.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: BUGFIX: standby disconnect can corrupt serialized reorder buffers

2018-01-05 Thread Alvaro Herrera
I think this should use ReadDirExtended with an elevel less than ERROR,
and do nothing.

Why have strcmp(".") and strcmp("..")?  Those entries are going to be skipped
by the comparison against the "xid" prefix anyway, so the extra strcmp calls
look like wasted processing power.

Please don't use bare sprintf() -- upgrade to snprintf.

With this coding, if I put a root-owned file "xidfoo" in a replslot
directory, it will PANIC the server.  Is that okay?  Why not read the
file name with sscanf(), since we know the precise format it has?  Then
we don't need to bother with random crap left around.  Maybe a good time
to put the "xid-%u-lsn-%X-%X.snap" literal in a macro?   I guess the
rationale is that if you let random people put "xidfoo" files in your
replication slot dirs, you deserve a PANIC anyway, but I'm not sure.
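A sketch of the sscanf() idea (the function name and return convention are invented for illustration; the format literal is the "xid-%u-lsn-%X-%X.snap" one mentioned above): anything that does not match the full pattern, such as a stray "xidfoo", is reported as not-a-snapshot-file instead of reaching code that might PANIC.

```c
#include <stdio.h>

#define SNAP_FNAME_FMT "xid-%u-lsn-%X-%X.snap"

/* Returns 1 and fills in xid/lsn_hi/lsn_lo if the name matches the
 * serialized-snapshot pattern exactly; returns 0 otherwise.  The %n at
 * the end lets us reject names with trailing garbage. */
int parse_snap_filename(const char *name, unsigned int *xid,
                        unsigned int *lsn_hi, unsigned int *lsn_lo)
{
    int consumed = 0;

    if (sscanf(name, SNAP_FNAME_FMT "%n", xid, lsn_hi, lsn_lo, &consumed) != 3)
        return 0;
    /* Reject trailing garbage such as "xid-1-lsn-0-0.snap.bak". */
    return name[consumed] == '\0';
}
```

With this, the cleanup loop can simply skip (or log about) anything the parser rejects.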

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: pg_upgrade: simplify code layout in a few places

2018-01-05 Thread Bruce Momjian
On Fri, Jan  5, 2018 at 02:20:59PM -0500, Robert Haas wrote:
> On Fri, Jan 5, 2018 at 2:11 PM, Bruce Momjian  wrote:
> > pg_upgrade:  simplify code layout in a few places
> >
> > Backpatch-through: 9.4 (9.3 didn't need improving)
> 
> Hmm.  We don't normally do things like this, because it breaks 
> translatability.

Oh, you mean the 'if()' statement?  If so, I will revert that and add a
comment so I don't do it again in that place.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: pgsql: pg_upgrade: simplify code layout in a few places

2018-01-05 Thread Robert Haas
On Fri, Jan 5, 2018 at 2:11 PM, Bruce Momjian  wrote:
> pg_upgrade:  simplify code layout in a few places
>
> Backpatch-through: 9.4 (9.3 didn't need improving)

Hmm.  We don't normally do things like this, because it breaks translatability.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Speeding up pg_upgrade

2018-01-05 Thread Jeff Janes
On Thu, Dec 7, 2017 at 11:28 AM, Justin Pryzby  wrote:

> On Tue, Dec 05, 2017 at 09:01:35AM -0500, Bruce Momjian wrote:
> > As part of PGConf.Asia 2017 in Tokyo, we had an unconference topic about
> > zero-downtime upgrades.  ... we discussed speeding up pg_upgrade.
> >
> > There are clusters that take a long time to dump the schema from the old
> > cluster
>
> Maybe it isn't representative of a typical case, but I can offer a data
> point:
>
> For us, we have ~40 customers with DBs ranging in size from <100GB to ~25TB
> (for which ~90% is on a ZFS tablespace with compression).  We have what's
> traditionally considered to be an excessive number of child tables, which
> works
> okay since planning time is unimportant to us for the report queries which
> hit
> them.  Some of the tables are wide (historically up to 1600 columns).
> Some of
> those have default values on nearly every column, and pg_attrdef was large
> (>500MB), causing pg_dump --section pre-data to be slow (10+ minutes).
> Since
> something similar is run by pg_upgrade, I worked around the issue for now
> by
> dropping defaults on the historic children in advance of upgrades (at some
> point I'll figure out what I have to do to allow DROPing DEFAULTs).  It's
> not
> the first time we've seen an issue with larger number of children*columns.
>

This is probably worth fixing independent of other ways of speeding up
pg_upgrade.

It spends most of its time making the column names unique while de-parsing
the DEFAULT clause for each column.  But I don't think it ever outputs the
column name which results from that deparsing, and since there is only one
table involved, the names should already be unique anyway, unless I am
missing something.

The time seems to be quadratic in number of columns if all columns have
defaults, or proportional to the product of number of columns in table and
the number of columns with defaults.

The CREATE TABLE has a similar problem upon restoring the dump.
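For intuition only, a toy model of that cost (this caricatures the behavior described above; it is not the ruleutils.c code): if each of N column names is checked against all previously assigned names, total comparisons grow as N*(N-1)/2.

```c
/* Count comparisons for a naive make-unique pass over N column names,
 * where each new name is checked against every earlier one. */
long unique_name_comparisons(int ncolumns)
{
    long total = 0;

    for (int col = 0; col < ncolumns; col++)
        total += col;           /* one comparison per earlier name */
    return total;
}
```

At 1600 columns (the table width mentioned upthread) that is 1,279,200 comparisons per table, versus roughly one hash-table lookup per column if uniqueness were tracked that way.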

Cheers,

Jeff


pg_dump_default.sh
Description: Bourne shell script


Re: User defined data types in Logical Replication

2018-01-05 Thread Alvaro Herrera
hash_seq_init in logicalrep_typmap_invalidate_cb is useless after your
patch.  If you remove it, the function becomes empty, so why is there
an invalidation callback at all?

Are we now leaking memory if types keep repeatedly being re-created in
the origin?  I suppose it's not a common use pattern, but it'd be good
to avoid everlasting memleaks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2018-01-05 Thread Beena Emerson
Hello,

On Fri, Jan 5, 2018 at 6:24 AM, David Rowley  wrote:
> On 5 January 2018 at 05:37, Alvaro Herrera  wrote:
>> I tried this patch (applying it on Amit's last current version on top of
>> 4e2970f8807f which is the latest it applies to) and regression tests
>> fail with the attached diff; in all cases it appears to be an off-by-one
>> in row count.  Would you please give it a look?
>
> Thanks for testing. I've attached an updated patch which hopefully fixes this.
>
> I've only thing I did to fix it was to alter the tests a bit so that
> the row counts in explain are evenly divisible by the nloops or
> parallel workers. Looks like it was failing due to platform dependent
> behaviour in printf.
>

It does not handle changes in column order (varattno) in subpartitions.

In the following case a2 has different column order
drop table ab_c;
create table ab_c (a int not null, b int) partition by list(a);

--a2 with different col order
  create table abc_a2 (b int, a int not null) partition by list(b);
  create table abc_a2_b1 partition of abc_a2 for values in (1);
  create table abc_a2_b2 partition of abc_a2 for values in (2);
  create table abc_a2_b3 partition of abc_a2 for values in (3);
  alter table ab_c attach partition abc_a2 for values in (2);

--a1 and a3 with same col order as the parent
  create table abc_a1 partition of ab_c for values in(1) partition by list (b);
  create table abc_a1_b1 partition of abc_a1 for values in (1);
  create table abc_a1_b2 partition of abc_a1 for values in (2);
  create table abc_a1_b3 partition of abc_a1 for values in (3);
  create table abc_a3 partition of ab_c for values in(3) partition by list (b);
  create table abc_a3_b1 partition of abc_a3 for values in (1);
  create table abc_a3_b2 partition of abc_a3 for values in (2);
  create table abc_a3_b3 partition of abc_a3 for values in (3);

  deallocate abc_q1;
  prepare abc_q1 (int, int, int) as select * from ab_c where a BETWEEN
$1 and $2 AND b <= $3;

--optimizer pruning
explain (analyze, costs off, summary off, timing off) execute abc_q1 (1, 3, 1);
                      QUERY PLAN                      
------------------------------------------------------
 Append (actual rows=0 loops=1)
   ->  Seq Scan on abc_a1_b1 (actual rows=0 loops=1)
 Filter: ((a >= 1) AND (a <= 3) AND (b <= 1))
   ->  Seq Scan on abc_a2_b1 (actual rows=0 loops=1)
 Filter: ((a >= 1) AND (a <= 3) AND (b <= 1))
   ->  Seq Scan on abc_a3_b1 (actual rows=0 loops=1)
 Filter: ((a >= 1) AND (a <= 3) AND (b <= 1))
(7 rows)

--runtime pruning after 5 runs
explain (analyze, costs off, summary off, timing off) execute abc_q1 (1, 3, 1);
                    QUERY PLAN                    
--------------------------------------------------
 Append (actual rows=0 loops=1)
   ->  Seq Scan on abc_a1_b1 (actual rows=0 loops=1)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
   ->  Seq Scan on abc_a1_b2 (never executed)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
   ->  Seq Scan on abc_a1_b3 (never executed)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
   ->  Seq Scan on abc_a2_b1 (actual rows=0 loops=1)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
   ->  Seq Scan on abc_a2_b2 (actual rows=0 loops=1)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
   ->  Seq Scan on abc_a2_b3 (actual rows=0 loops=1)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
   ->  Seq Scan on abc_a3_b1 (actual rows=0 loops=1)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
   ->  Seq Scan on abc_a3_b2 (never executed)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
   ->  Seq Scan on abc_a3_b3 (never executed)
 Filter: ((a >= $1) AND (a <= $2) AND (b <= $3))
(19 rows)

As seen partition a2 does not prune like in other 2 subpartitions - a1 and a3.
-- 

Beena Emerson

EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] path toward faster partition pruning

2018-01-05 Thread Jesper Pedersen

Hi David,

On 01/04/2018 09:21 PM, David Rowley wrote:

On 5 January 2018 at 07:16, Jesper Pedersen  wrote:

Could you share your thoughts on

1) if the generic plan mechanics should know about the pruning and hence
give a lower planner cost


I think the problem here is that cached_plan_cost() is costing the
planning cost of the query too low. If this were costed higher, then it's
more likely that the generic plan would have been chosen, instead of
generating a custom plan each time.

How well does it perform if you change cpu_operator_cost = 0.01?



It gives 38.82 for generic_cost, and 108.82 for avg_custom_cost on 
master (8249 TPS). And, 38.82 for generic_cost, and 79.705 for 
avg_custom_cost with v17 (7891 TPS). Non-partitioned is 11722 TPS.



I think cached_plan_cost() does need an overhaul, but I think it's not
anything that should be done as part of this patch. You've picked HASH
partitioning here just because the current master does not perform any
partition pruning for that partitioning strategy.



Well, I mainly picked HASH because that is my use-case :)

For a range based setup it gives 39.84 for generic_cost, and 89.705 for 
avg_custom_cost (7862 TPS).
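The choice being discussed (plancache.c's choose_custom_plan) can be caricatured like this; the threshold and comparison are simplified, and the way cached_plan_cost folds an estimate of planning cost into avg_custom_cost is exactly the part being questioned:

```c
/* Simplified sketch: after five custom plans have been tried, prefer the
 * generic plan whenever its estimated cost does not exceed the average
 * custom-plan cost (which, in the real code, includes an estimate of the
 * planning cost itself). */
int prefer_generic_plan(double generic_cost, double avg_custom_cost,
                        long num_custom_plans)
{
    if (num_custom_plans < 5)
        return 0;               /* still sampling custom plans */
    return generic_cost <= avg_custom_cost;
}
```

With the numbers above (generic_cost 38.82 vs avg_custom_cost 108.82), the generic plan wins once the sampling phase ends.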


Best regards,
 Jesper



Re: pgsql: Implement channel binding tls-server-end-point for SCRAM

2018-01-05 Thread Alvaro Herrera
Peter Eisentraut wrote:
> On 1/5/18 09:28, Michael Paquier wrote:
> > In order to do things cleanly, we should make this TAP test
> > conditional on the version of OpenSSL.
> 
> How about looking into pg_config.h, like in the attached patch?

I suppose if this starts to spread, we'll come up with a better approach
... maybe generating a Perl file with variables that can be slurped more
elegantly, or something like that.  (We mentioned the need for
config-based test selection re. patches for new SSL implementations.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Condition variable live lock

2018-01-05 Thread Tom Lane
I wrote:
> It's a question of how complicated you're willing to make this
> logic, and whether you trust that we'll be able to test all the
> code paths.

Attached is a patch incorporating all the ideas mentioned in this thread,
except that I think in HEAD we should change both ConditionVariableSignal
and ConditionVariableBroadcast to return void rather than a possibly
misleading wakeup count.  This could be back-patched as is, though.

As I feared, the existing regression tests are not really adequate for
this: gcov testing shows that the sentinel-inserting code path is
never entered, meaning ConditionVariableBroadcast never sees more
than one waiter.  What's more, it's now also apparent that no outside
caller of ConditionVariableSignal ever actually awakens anything.
So I think it'd be a good idea to expand the regression tests if we
can do so cheaply.  Anybody have ideas about that?  Perhaps a new
module under src/test/modules would be the best way?  Alternatively,
we could drop some of the optimization ideas.

BTW, at least on gaur, this does nothing for the runtime of the join
test, meaning I'd still like to see some effort put into reducing that.

regards, tom lane

diff --git a/src/backend/storage/lmgr/condition_variable.c b/src/backend/storage/lmgr/condition_variable.c
index 41378d6..55275cf 100644
*** a/src/backend/storage/lmgr/condition_variable.c
--- b/src/backend/storage/lmgr/condition_variable.c
*************** int
*** 214,228 ****
  ConditionVariableBroadcast(ConditionVariable *cv)
  {
  	int			nwoken = 0;
  
  	/*
! 	 * Let's just do this the dumbest way possible.  We could try to dequeue
! 	 * all the sleepers at once to save spinlock cycles, but it's a bit hard
! 	 * to get that right in the face of possible sleep cancelations, and we
! 	 * don't want to loop holding the mutex.
  	 */
! 	while (ConditionVariableSignal(cv))
  		++nwoken;
  
  	return nwoken;
  }
--- 214,300 ----
  ConditionVariableBroadcast(ConditionVariable *cv)
  {
  	int			nwoken = 0;
+ 	int			pgprocno = MyProc->pgprocno;
+ 	PGPROC	   *proc = NULL;
+ 	bool		have_sentinel = false;
  
  	/*
! 	 * In some use-cases, it is common for awakened processes to immediately
! 	 * re-queue themselves.  If we just naively try to reduce the wakeup list
! 	 * to empty, we'll get into a potentially-indefinite loop against such a
! 	 * process.  The semantics we really want are just to be sure that we have
! 	 * wakened all processes that were in the list at entry.  We can use our
! 	 * own cvWaitLink as a sentinel to detect when we've finished.
! 	 *
! 	 * A seeming flaw in this approach is that someone else might signal the
! 	 * CV and in doing so remove our sentinel entry.  But that's fine: since
! 	 * CV waiters are always added and removed in order, that must mean that
! 	 * every previous waiter has been wakened, so we're done.  We'll get an
! 	 * extra "set" on our latch from the someone else's signal, which is
! 	 * slightly inefficient but harmless.
! 	 *
! 	 * We can't insert our cvWaitLink as a sentinel if it's already in use in
! 	 * some other proclist.  While that's not expected to be true for typical
! 	 * uses of this function, we can deal with it by simply canceling any
! 	 * prepared CV sleep.  The next call to ConditionVariableSleep will take
! 	 * care of re-establishing the lost state.
  	 */
! 	ConditionVariableCancelSleep();
! 
! 	/*
! 	 * Inspect the state of the queue.  If it's empty, we have nothing to do.
! 	 * If there's exactly one entry, we need only remove and signal that
! 	 * entry.  Otherwise, remove the first entry and insert our sentinel.
! 	 */
! 	SpinLockAcquire(&cv->mutex);
! 	/* While we're here, let's assert we're not in the list. */
! 	Assert(!proclist_contains(&cv->wakeup, pgprocno, cvWaitLink));
! 
! 	if (!proclist_is_empty(&cv->wakeup))
! 	{
! 		proc = proclist_pop_head_node(&cv->wakeup, cvWaitLink);
! 		if (!proclist_is_empty(&cv->wakeup))
! 		{
! 			proclist_push_tail(&cv->wakeup, pgprocno, cvWaitLink);
! 			have_sentinel = true;
! 		}
! 	}
! 	SpinLockRelease(&cv->mutex);
! 
! 	/* Awaken first waiter, if there was one. */
! 	if (proc != NULL)
! 	{
! 		SetLatch(&proc->procLatch);
  		++nwoken;
+ 	}
+ 
+ 	while (have_sentinel)
+ 	{
+ 		/*
+ 		 * Each time through the loop, remove the first wakeup list entry, and
+ 		 * signal it unless it's our sentinel.  Repeat as long as the sentinel
+ 		 * remains in the list.
+ 		 *
+ 		 * Notice that if someone else removes our sentinel, we will waken one
+ 		 * additional process before exiting.  That's intentional, because if
+ 		 * someone else signals the CV, they may be intending to waken some
+ 		 * third process that added itself to the list after we added the
+ 		 * sentinel.  Better to give a spurious wakeup (which should be
+ 		 * harmless beyond wasting some cycles) than to lose a wakeup.
+ 		 */
+ 		proc = NULL;
+ 		SpinLockAcquire(&cv->mutex);
+ 		if (!proclist_is_empty(&cv->wakeup))
+ 			proc = proclist_pop_head_node

Re: [HACKERS] Pluggable storage

2018-01-05 Thread Robert Haas
I do not like the way that this patch set uses the word "storage".  In
current usage, storage is a thing that certain kinds of relations
have.  Plain relations (a.k.a. heap tables) have storage, indexes have
storage, materialized views have storage, TOAST tables have storage,
and sequences have storage.  This patch set wants to use "storage AM"
to mean a replacement for a plain heap table, but I think that's going
to create a lot of confusion, because it overlaps heavily with the
existing meaning yet is different.  My suggestion is to call these
"table access methods" rather than "storage access methods".  Then,
the default table AM can be heap.  This has the nice property that you
create an index using CREATE INDEX and the support functions arrive
via an IndexAmRoutine, so correspondingly you would create a table
using CREATE TABLE and the support functions would arrive via a
TableAmRoutine -- so all the names match.

An alternative would be to call the new thing a "heap AM" with
HeapAmRoutine as the support function structure.  That's also not
unreasonable.  In that case, we're deciding that "heap" is not just
the name of the current implementation, but the name of the kind of
thing that backs a table at the storage level.  I don't like that
quite as well because then we've got a class of things called a heap
of which the current and only implementation is called heap, which is
a bit confusing in my view.  But we could probably make it work.

If we adopt the first proposal, it leads to another nice parallel: we
can have src/backend/access/table for those things which are generic
to table AMs, just as we have src/backend/access/index for things
which are generic to index AMs, and then src/backend/access/
for things which are specific to a particular AM.
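For illustration, a hypothetical TableAmRoutine might mirror IndexAmRoutine like this; every field name and type below is invented to show the shape of the idea, not taken from any actual patch:

```c
/* Opaque placeholder types standing in for real executor structures. */
typedef struct TableScanDescData TableScanDescData;
typedef struct TupleSlotData TupleSlotData;

/* Support-function table a table AM would provide, by analogy with the
 * IndexAmRoutine returned by an index AM's handler function. */
typedef struct TableAmRoutine
{
    TableScanDescData *(*scan_begin) (void *relation);
    int         (*scan_getnext) (TableScanDescData *scan, TupleSlotData *slot);
    void        (*scan_end) (TableScanDescData *scan);
    void        (*tuple_insert) (void *relation, TupleSlotData *slot);
} TableAmRoutine;
```

The point is only the parallelism: CREATE TABLE ... USING some_am would resolve to one of these, just as CREATE INDEX resolves to an IndexAmRoutine.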

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [PATCH] Logical decoding of TRUNCATE

2018-01-05 Thread Marco Nenciarini
Hi,

I've found some SGML errors in the version v10 of the patch. I've fixed
it in version v11 that is attached.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
diff --git a/contrib/test_decoding/expected/ddl.out b/contrib/test_decoding/expected/ddl.out
index 1e22c1eefc..d1feea4909 100644
*** a/contrib/test_decoding/expected/ddl.out
--- b/contrib/test_decoding/expected/ddl.out
***************
*** 543,548 ****
--- 543,550 ----
  UPDATE table_with_unique_not_null SET data = 3 WHERE data = 2;
  UPDATE table_with_unique_not_null SET id = -id;
  UPDATE table_with_unique_not_null SET id = -id;
  DELETE FROM table_with_unique_not_null WHERE data = 3;
+ TRUNCATE table_with_unique_not_null;
+ TRUNCATE table_with_unique_not_null, table_with_unique_not_null RESTART IDENTITY CASCADE;
  -- check toast support
  BEGIN;
  CREATE SEQUENCE toasttable_rand_seq START 79 INCREMENT 1499; -- portable "random"
***
*** 660,665  SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'inc
--- 662,673 
   table public.table_with_unique_not_null: DELETE: id[integer]:4
   COMMIT
   BEGIN
+  table public.table_with_unique_not_null: TRUNCATE: (no-flags)
+  COMMIT
+  BEGIN
+  table public.table_with_unique_not_null: TRUNCATE: restart_seqs cascade
+  COMMIT
+  BEGIN
   table public.toasttable: INSERT: id[integer]:1 
toasted_col1[text]:'123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110121131141151161171181191201211221231241251261271281291301311321331341351361371381391401411421431441451461471481491501511521531541551561571581591601611621631641651661671681691701711721731741751761771781791801811821831841851861871881891901911921931941951961971981992002012022032042052062072082092102112122132142152162172182192202212322422522622722822923023123223323423523623723823924024124224324424524624724824925025125225325425525625725825926026126226326426526626726826927027127227327427527627727827928028128228328428528628728828929029129229329429529629729829930030130230330430530630730830931031131231331431531631731831932032132232332432532632732832933033133234335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443454464474484494504514524534544554564574584594604614624634644654664674684694704714724734744754764774784794804814824834844854864874884894904914924934944954964974984995005015025035045055065075085095105115125135145155165175185195205215225235245255265275285295305315325335345355365375385395405415425435445455465475485495505515525535545655755855956056156256356456556656756856957057157257357457557657757857958058158258358458558658758858959059159259359459559659759859960060160260360460560660760860961061161261361461561661761861962062162262362462562662762862963063163263363463563663763863964064164264364464564664764864965065165265365465565665765865966066166266366466567668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704
7057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767877978078178278378478578678778878979079179279379479579679779879980080180280380480580680780880981081181281381481581681781881982082182282382482582682782882983083183283383483583683783883984084184284384484584684784884985085185285385485585685785885986086186286386486586686786886987087187287387487587687787887988088188288388488588688789890891892893894895896897898899900901902903904905906907908909910911912913914915916917918919920921922923924925926927928929930931932933934935936937938939940941942943944945946947948949950951952953954955956957958959960961962963964965966967968969970971972973974975976977978979980981982983984985986987988989990991992993994995996997998999100010011002100310041005100610071008100910101011101210131014101510161017101810191020102110221023102410251026102710281029103010311032103310341035103610371038103910401041104210431044104510461047104810491050105110521053105410551056105710581059106010611062106310641065106610671068106910701071107210731074107510761077107810791080108110821083108410851086108710881089109010911092109310941095109610971098109911001101110211031104110511061107110811091110111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581

Re: setting estate in ExecInitNode() itself

2018-01-05 Thread Tom Lane
Ashutosh Bapat  writes:
> Looking at ExecInitXYZ() functions, we can observe that every such
> function has a statement like
> XYZstate->ps.state = estate;
> where it saves estate in the PlanState.

Yeah ...

> I am wondering why don't we instead save estate in ExecInitNode() itself like
> result->estate = estate;

That would only work if there were no situation where the field needed to
be already valid during the node init function.  I think that's probably
wrong already (check ExecInitExpr for instance) and it certainly might
be wrong in future.

regards, tom lane



Re: Failed to delete old ReorderBuffer spilled files

2018-01-05 Thread Alvaro Herrera
atorikoshi wrote:

> > FYI "make check" in contrib/test_decoding fails a couple of isolation
> > tests, one with an assertion failure for my automatic patch tester[1].
> > Same result on my laptop:
> > 
> > test ondisk_startup   ... FAILED (test process exited with exit 
> > code 1)
> > test concurrent_ddl_dml   ... FAILED (test process exited with exit 
> > code 1)
> > 
> > TRAP: FailedAssertion("!(!dlist_is_empty(head))", File:
> > "../../../../src/include/lib/ilist.h", Line: 458)
> 
> Thanks for letting me know.
> I think I only tested running "make check" at top directory, sorry
> for my insufficient test.
> 
> The test failure happened at the beginning of replication(creating
> slot), so there are no changes yet and getting the tail of changes
> failed.
> 
> Because the bug we are fixing only happens after creating files,
> I've added "txn->serialized" to the if statement condition.

Ahh, so the reason I didn't see these crashes is that Atsushi had
already fixed them.  Nice.  It would be *very* good to trim the quoted
material when you reply --- don't leave everything, just enough lines
for context.  I would have seen this comment if it didn't require me to
scroll pages and pages and pages of useless material.

I have pushed this patch from 9.4 to master.  I reformulated the
comments.

This is an "implicitly aborted transaction":

alvherre=# create table implicit (a int);
CREATE TABLE
Duración: 0,959 ms
alvherre=# insert into implicit select 1/i from generate_series(5, -5, -1) i;
ERROR:  division by zero

Note there is no explicit ABORT here, so it is implicit.

But that is not what you were trying to say.  The term "crashed
transaction" seems to me to convey the meaning better, so I used that.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Condition variable live lock

2018-01-05 Thread Tom Lane
Thomas Munro  writes:
> Could we install the sentinel and pop the first entry at the same
> time, so that we're not adding an extra spinlock acquire/release?

Hm, maybe.  Other ideas in that space:

* if queue is empty when we first acquire the spinlock, we don't
have to do anything at all.

* if queue is empty after we pop the first entry, we needn't bother
installing our sentinel, just signal that proc and we're done.

It's a question of how complicated you're willing to make this
logic, and whether you trust that we'll be able to test all the
code paths.

regards, tom lane



Re: pgsql: Implement channel binding tls-server-end-point for SCRAM

2018-01-05 Thread Peter Eisentraut
On 1/5/18 09:28, Michael Paquier wrote:
> In order to do things cleanly, we should make this TAP test
> conditional on the version of OpenSSL.

How about looking into pg_config.h, like in the attached patch?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From d1c309a4c07c47f06650fd8231938e1eaed1342e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 5 Jan 2018 09:55:01 -0500
Subject: [PATCH] Fix ssl tests for when tls-server-end-point is not supported

---
 src/test/ssl/t/002_scram.pl | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/src/test/ssl/t/002_scram.pl b/src/test/ssl/t/002_scram.pl
index 3f425e00f0..92b84e1565 100644
--- a/src/test/ssl/t/002_scram.pl
+++ b/src/test/ssl/t/002_scram.pl
@@ -11,6 +11,11 @@
 # This is the hostname used to connect to the server.
 my $SERVERHOSTADDR = '127.0.0.1';
 
+# Determine whether build supports tls-server-end-point.
+open my $pg_config_h, '<', '../../include/pg_config.h' or die "$!";
+my $supports_tls_server_end_point = (grep {/^#define HAVE_X509_GET_SIGNATURE_NID 1/} <$pg_config_h>);
+close $pg_config_h;
+
 # Allocation of base connection string shared among multiple tests.
 my $common_connstr;
 
@@ -44,10 +49,19 @@
"SCRAM authentication with tls-unique as channel binding");
 test_connect_ok($common_connstr,
"scram_channel_binding=''",
-   "SCRAM authentication without channel binding");
-test_connect_ok($common_connstr,
-   "scram_channel_binding=tls-server-end-point",
-   "SCRAM authentication with tls-server-end-point as channel binding");
+   "SCRAM authentication without channel binding");
+if ($supports_tls_server_end_point)
+{
+   test_connect_ok($common_connstr,
+   "scram_channel_binding=tls-server-end-point",
+   "SCRAM authentication with tls-server-end-point as channel binding");
+}
+else
+{
+   test_connect_fails($common_connstr,
+   "scram_channel_binding=tls-server-end-point",
+   "SCRAM authentication with tls-server-end-point as channel binding");
+}
 test_connect_fails($common_connstr,
"scram_channel_binding=not-exists",
"SCRAM authentication with invalid channel binding");
-- 
2.15.1



Re: Failed to delete old ReorderBuffer spilled files

2018-01-05 Thread Alvaro Herrera
Thomas Munro wrote:
> On Wed, Nov 22, 2017 at 12:27 AM, atorikoshi
>  wrote:
> > [set_final_lsn_2.patch]
> 
> Hi Torikoshi-san,
> 
> FYI "make check" in contrib/test_decoding fails a couple of isolation
> tests, one with an assertion failure for my automatic patch tester[1].
> Same result on my laptop:
> 
> test ondisk_startup   ... FAILED (test process exited with exit code 
> 1)
> test concurrent_ddl_dml   ... FAILED (test process exited with exit code 
> 1)
> 
> TRAP: FailedAssertion("!(!dlist_is_empty(head))", File:
> "../../../../src/include/lib/ilist.h", Line: 458)

I observed a couple of crashes too a couple of times, while testing this
patch.  But I have seen several completely different crashes.  This
crash you show I have not been able to reproduce, though I've run this
in 94 and master many times.

For example, I got a backtrace that looks like this in 9.6:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x7f19ccb913fa in __GI_abort () at abort.c:89
#2  0x55e7511f451b in errfinish (dummy=)
at /pgsql/source/REL9_6_STABLE/src/backend/utils/error/elog.c:557
#3  0x55e750ed732b in XLogFileInit (logsegno=1, 
use_existent=use_existent@entry=0x7ffdbc34ab6f "\001\002", 
use_lock=use_lock@entry=1 '\001')
at /pgsql/source/REL9_6_STABLE/src/backend/access/transam/xlog.c:3023
#4  0x55e750edb227 in XLogWrite (WriteRqst=..., flexible=flexible@entry=0 
'\000')
at /pgsql/source/REL9_6_STABLE/src/backend/access/transam/xlog.c:2258
#5  0x55e750ee162d in XLogBackgroundFlush ()
at /pgsql/source/REL9_6_STABLE/src/backend/access/transam/xlog.c:2894

then in 9.4 I saw this one:

creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... FATAL:  could not open directory "pg_logical/snapshots": No such file or directory
STATEMENT:  CREATE DATABASE template0;

WARNING:  could not remove file or directory "base/12148": No such file or directory
WARNING:  some useless files may be left behind in old database directory "base/12148"
FATAL:  could not access status of transaction 0
DETAIL:  Could not open file "pg_clog/": No such file or directory.
child process exited with exit code 1

What this indicates to me is that perhaps the test harness is doing
stupid things such as running two servers concurrently in the same
datadir, so they overwrite one another.  If I take out the "-j2" from
make, this no longer reproduces.

Therefore, I'm going to push this patch shortly because clearly this
problem is not its fault.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: Implement channel binding tls-server-end-point for SCRAM

2018-01-05 Thread Michael Paquier
On Fri, Jan 5, 2018 at 10:47 PM, Robert Haas  wrote:
> The SSL tests on chipmunk failed in the last run.  I assume that's
> probably the fault of this patch, or one of the follow-on commits:

Thanks for the heads-up, Robert. I did not notice the failure. That's
the fault of 054e8c6c. Raspbian is using OpenSSL 1.0.1t (package list
can be downloaded in
http://archive.raspbian.org/raspbian/dists/wheezy/main/binary-armhf/Packages
for 38MB), which does not have the necessary facilities to implement
tls-server-end-point as upstream has added necessary APIs only in
1.0.2.

In order to do things cleanly, we should make this TAP test
conditional on the version of OpenSSL. There have been discussions in
the past to make a module dedicated to that, but no clear patch or
approach has showed up. This can be retrieved with SSLeay_version() or
"openssl version", but that seems not fun nor stable to rely on
openssl to be in PATH. I don't see disabling this test helping either,
but we could consider that without an appropriate module to track
dependencies in a build with its versions. I would be personally fine
with having an environment variable switch I could use to enable the
test as well as I use already a script to run all regression tests in
the tree (src/test/ssl is not run by default as it is insecure for
shared environments, without counting on meltdowns).

Thoughts from others?
-- 
Michael



Re: [HACKERS] Creating backup history files for backups taken from standbys

2018-01-05 Thread Simon Riggs
On 19 September 2017 at 00:33, David Steele  wrote:
> On 9/18/17 7:26 PM, Michael Paquier wrote:
>> On Tue, Sep 19, 2017 at 8:14 AM, David Steele  wrote:
>>> On 8/31/17 11:56 PM, Michael Paquier wrote:
 Here is an updated patch with refreshed documentation, as a result of
 449338c which was discussed in thread
 https://www.postgresql.org/message-id/d4d951b9-89c0-6bc1-b6ff-d0b2dd5a8...@pgmasters.net.
 I am just outlining the fact that a backup history file gets created
 on standbys and that it is archived.
>>>
>>> The patch looks good overall.  It applies cleanly and passes all tests.
>>>
>>> One question.  Do we want to create this file all the time (as written),
>>> or only when archive_mode = always?
>>>
>>> It appears that CleanupBackupHistory() will immediately remove the
>>> history file when archive_mode != always so it seems useless to write
>>> it.  On the other hand the code is a bit simpler this way.  Thoughts?
>>
>> With archive_mode = off, the bytes of the backup history file are
>> still written to disk, so my take on the matter is to keep the code
>> simple.
>
> I'm OK with that.

I'm not.

If we want to do this, why not do it only in the modes where it has
meaning? I.e., put in an if() test for archive_mode == always.

That would also make it a smaller and clearer patch.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Fix permissions check on pg_stat_get_wal_senders

2018-01-05 Thread Simon Riggs
On 23 December 2017 at 10:56, Michael Paquier  wrote:
> On Fri, Dec 22, 2017 at 07:49:34AM +0100, Feike Steenbergen wrote:
>> On 21 December 2017 at 14:11, Michael Paquier  
>> wrote:
>> > You mean a WAL receiver here, not a WAL sender.
>>
>> Fixed, thanks
>
> [nit]
> /*
>  -   * Only superusers can see details. Other users only get the 
> pid value
> +* Only superusers and members of pg_read_all_stats can see 
> details.
> +* Other users only get the pid value
>  * to know whether it is a WAL receiver, but no details.
>  */
>
> Incorrect comment format.
> [/nit]
>
> Committers run pgindent on each patch before committing anyway, and what
> you are proposing here looks good to me, so I am marking that as ready for
> committer. Simon, as the original committer of 25fff407, could you look
> at what is proposed here?

Yup, I got this.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Timeline ID in backup_label file

2018-01-05 Thread Simon Riggs
On 27 November 2017 at 14:06, David Steele  wrote:

> I have tested and get an error as expected when I munge the backup_label
> file:
>
> FATAL:  invalid data in file "backup_label"
> DETAIL:  Timeline ID parsed is 2, but expected 1
>
> Everything else looks good so I will mark it ready for committer.

Sounds good.

No tests? No docs/extended explanatory comments?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Contributing with code

2018-01-05 Thread Alvaro Herrera
Antonio Belloni wrote:
> Sorry, but I did not want to start a flaming war against the TODO list with
> my first message. In all the other open source projects I have contributed
> code, the TODO list is always a starting point for newcomers. There's no
> explicit message in the PostgreSQL TODO list saying that the projects there
> are hard, stuck, or undesirable. So it's very confusing to a newbie who just
> wants to help and learn something in the process.

Yeah.  In this project, the tendency is that when things are
straightforward, they get implemented rather than documented.  So there
are no tasks that are known and also easy, because they are taken as
soon as somebody thinks of them.  They only get documented once someone
tries to implement one and realizes it was not as easy as it seemed.

> And now I don't know if I should continue to work on the issue from my first
> message and post my ideas on the list, or if I should find other ways to
> contribute, for example, fixing bugs from the bug list.

Fixing bugs is a very good way to learn the system.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Patch] Make block and file size for WAL and relations defined at cluster creation

2018-01-05 Thread Robert Haas
On Fri, Jan 5, 2018 at 7:54 AM, Michael Paquier
 wrote:
> On Fri, Jan 5, 2018 at 9:42 PM, Robert Haas  wrote:
>> - run make check-world at all the supposedly-supported block sizes and
>> see if it passes.
>
> Last time I tried that with a 16kB-size block, which was some months
> back, make check complained about some plan inconsistencies. Perhaps
> that's something that could be improved to begin with. There is always
> some margin in this area.

Yeah, that's exactly the kind of thing that needs to be ironed out
before we can think about actually recommending that users run with
other block sizes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: pgsql: Implement channel binding tls-server-end-point for SCRAM

2018-01-05 Thread Robert Haas
On Thu, Jan 4, 2018 at 4:09 PM, Thomas Munro
 wrote:
> On Fri, Jan 5, 2018 at 9:36 AM, Peter Eisentraut  wrote:
>> Implement channel binding tls-server-end-point for SCRAM
>
> FYI some BF animals are saying:
>
> libpq/be-secure-openssl.o: In function `be_tls_get_certificate_hash':
> /home/pgbuildfarm/buildroot-termite/HEAD/pgsql.build/../pgsql/src/backend/libpq/be-secure-openssl.c:1268:
> undefined reference to `X509_get_signature_nid'

The SSL tests on chipmunk failed in the last run.  I assume that's
probably the fault of this patch, or one of the follow-on commits:

# Running: psql -X -A -t -c SELECT 'connected with user=ssltestuser
dbname=trustdb sslmode=require hostaddr=127.0.0.1
scram_channel_binding=tls-server-end-point' -d user=ssltestuser
dbname=trustdb sslmode=require hostaddr=127.0.0.1
scram_channel_binding=tls-server-end-point
psql: channel binding type "tls-server-end-point" is not supported by this build
not ok 4 - SCRAM authentication with tls-server-end-point as channel binding

#   Failed test 'SCRAM authentication with tls-server-end-point as
channel binding'
#   at /home/pgbfarm/buildroot/HEAD/pgsql.build/src/test/ssl/ServerSetup.pm
line 64.
# Running: psql -X -A -t -c SELECT 'connected with user=ssltestuser
dbname=trustdb sslmode=require hostaddr=127.0.0.1
scram_channel_binding=not-exists' -d user=ssltestuser dbname=trustdb
sslmode=require hostaddr=127.0.0.1 scram_channel_binding=not-exists
psql: FATAL:  unsupported SCRAM channel-binding type
ok 5 - SCRAM authentication with invalid channel binding
### Stopping node "master" using mode immediate
# Running: pg_ctl -D
/home/pgbfarm/buildroot/HEAD/pgsql.build/src/test/ssl/tmp_check/t_002_scram_master_data/pgdata
-m immediate stop
waiting for server to shut down done
server stopped
# No postmaster PID for node "master"
# Looks like you failed 1 test of 5.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Contributing with code

2018-01-05 Thread Antonio Belloni
Sorry, but I did not want to start a flaming war against the TODO list with
my first message. In all the other open source projects I have contributed
code, the TODO list is always a starting point for newcomers. There's no
explicit message in the PostgreSQL TODO list saying that the projects there
are hard, stuck, or undesirable. So it's very confusing to a newbie who just
wants to help and learn something in the process.

And now I don't know if I should continue to work on the issue from my first
message and post my ideas on the list, or if I should find other ways to
contribute, for example, fixing bugs from the bug list.

Regards,
Antonio Belloni




2018-01-03 0:47 GMT-02:00 Stephen Frost :

> Noah, all,
>
> * Noah Misch (n...@leadboat.com) wrote:
> > On Tue, Jan 02, 2018 at 05:52:37PM -0500, Peter Eisentraut wrote:
> > > On 12/31/17 22:43, Craig Ringer wrote:
> > > > I'd rather rename it the "stuck, hard and abandoned projects list"
> ;)
> > >
> > > That might actually be useful.
> >
> > +1.  When I do refer to a TODO entry, it's usually because the entry
> bears a
> > list of threads illustrating the difficulty of a problem.  I like the
> project
> > having such lists, but "TODO" is a bad heading for them.
>
> Renaming the list is certainly an idea that I could get behind, though I
> agree with Chris that we could keep it a bit more positive.  I also
> agree that the TODO list tends towards projects that are stuck and hard,
> which is why I actually think it wouldn't be that hard to go through and
> mark the really hard things as really hard or even create an independent
> page for them as I suggested elsewhere on this thread, because (at least
> from my perception of it- which could be wrong) the overall list
> doesn't actually change that much (see above wrt "stuck, hard and
> abandoned" comment).  If we could see our way forward to really making
> it clear that these things are stuck, hard or abandoned then maybe we
> can make room for new projects to go on the list that are of reasonable
> size for newcomers to the project.
>
> Thanks!
>
> Stephen
>


Re: [Patch] Make block and file size for WAL and relations defined at cluster creation

2018-01-05 Thread Michael Paquier
On Fri, Jan 5, 2018 at 9:42 PM, Robert Haas  wrote:
> - run make check-world at all the supposedly-supported block sizes and
> see if it passes.

Last time I tried that with a 16kB-size block, which was some months
back, make check complained about some plan inconsistencies. Perhaps
that's something that could be improved to begin with. There is always
some margin in this area.
-- 
Michael



Re: [Patch] Make block and file size for WAL and relations defined at cluster creation

2018-01-05 Thread Robert Haas
On Thu, Jan 4, 2018 at 5:15 PM, Remi Colinet  wrote:
> Block size does not boil down only to performance.
>
> For instance, having a larger block size allows:
> - to avoid toasting tuples. Rows with sizes larger than the default block
> size can justify larger block sizes.
> - to reduce fragmentation in relations.

Well, I think those are things that you do to improve performance.
So, ultimately, I would argue that it does come down to performance.

> If changing the block size at initdb is useless, then why allowing developer
> to set such block size at compile time?

In my view, right now, changing BLCKSZ is only marginally supported.
It's there so that you can experiment, but it's not really something
we expect users to do.  I think we have no buildfarm coverage of
different block sizes.  I am not sure that we even consistently fix
regression test failures with other block sizes even if someone
reports them.  If there's a bug in some index AM that only manifests
with some non-default block size, we might not know about it.  I think
that if we make this an initdb-time option, we're committing to fix
all of those issues: add tests, fix bugs, and of course document how
to set the parameter properly (which means we have to know how it
should be set, which means we have to know what the effects of
changing it are on different systems and workloads).  You may or may
not be willing to do some of that work, but I suspect there's a good
chance that it will require effort from other people as well -- e.g.
if we turn up a bug in BRIN, are you going to dive into that and fix
it, or are you going to hope Alvaro does something about it?  He'll
probably have to review and commit your patch, at the least.

Of course, it's possible there are no such problems and everything
will just work.

I looked around a little for previous tests that had been run in this
area and found these:

https://blog.pgaddict.com/posts/postgresql-on-ssd-4kb-or-8kB-pages
https://www.cybertec-postgresql.com/en/postgresql-block-sizes-getting-started/
http://blog.coelho.net/database/2014/08/08/postgresql-page-size-for-SSD.html

All of those seem to agree that smaller block sizes can help
performance, sometimes significantly, and larger block sizes hurt
performance, which is sort of surprising to me since that also means
that that your database will get bigger: at a 4kB page size, you have
to store at least twice as many page headers as you would with an 8kB
page size.  Some of them also mention reasons why you might want a
larger block size. I believe I recall a mailing-list discussion some
years back about how index pages might need some kind of page-internal
indexing for efficiency with large block sizes, because a simple
binary search might touch too many cache lines.  It seems like if we
want to have good performance at a variety of block sizes -- rather
than just having it technically work -- we might need to do a fair
amount of investigation of what factors account for good and bad
performance at a variety of settings and consider whether there are
design changes that might mitigate some of the problems.

I think that if you're interested in making non-default block sizes
more supported in PostgreSQL, some good first steps would be:

- run make check-world at all the supposedly-supported block sizes and
see if it passes.

- set up some buildfarm critters that run with various non-default
block sizes on various hardware and software platforms.  ideally we
should have various combinations of 32-bit and 64-bit; Linux, Windows,
and other; and the whole range of block sizes but especially the more
extreme ones.

- run performance tests with a variety of workloads, not just pgbench,
at various block sizes and on various hardware, and post or blog about
the results

If it's clear that non-default block sizes (a) work and (b) are good,
then at least IMHO it's quite likely that we would want this patch.
Maybe those things are already clear to you, but they're not
completely clear to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Enhance pg_stat_wal_receiver view to display connected host

2018-01-05 Thread Alvaro Herrera
Haribabu Kommi wrote:

> And also not returning "default host" details, because for the conninfo
> without any host details, the return value must be NULL. But this change
> may break the backward compatibility of the function.

I wouldn't want to have to fight that battle.

> or
> 
> write two new functions PQconnhost() and PQconnhostaddr() to return the
> connected host and hostaddr and reuse the PQport() function.

How about using an API similar to PQconninfo, where we return an array
of connection options used?  Say, PQeffectiveConninfo().  This seems to
me to reduce ugliness in the API, and be more generally useful.

The walreceiver could display it as an array or just flatten it to a
string -- not sure which is the better option there.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Issues with logical replication

2018-01-05 Thread Stas Kelvich

> On 3 Jan 2018, at 23:35, Alvaro Herrera  wrote:
> 
> Pushed.  Will you (Konstantin, Stas, Masahiko) please verify that after
> this commit all the problems reported with logical replication are
> fixed?

Checked that with and without extra sleep in AssignTransactionId(). In both
cases patch works.

Thanks!

--
Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Condition variable live lock

2018-01-05 Thread Alvaro Herrera
Tom Lane wrote:

> Obviously, this trades a risk of loss of wakeup for a risk
> of spurious wakeup, but presumably the latter is something we can
> cope with.

I wonder if it'd be useful to have a test mode for condition variables
that spurious wakups happen randomly, to verify that every user is
prepared to cope correctly.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Patch] Make block and file size for WAL and relations defined at cluster creation

2018-01-05 Thread Tels

On Wed, January 3, 2018 4:04 pm, Robert Haas wrote:
> On Wed, Jan 3, 2018 at 3:43 PM, Remi Colinet 
> wrote:
>> Justifications are:
>
> I think this is all missing the point.  If varying the block size (or
> whatever) is beneficial, then having it configurable at initdb is
> clearly useful.  But, as Andres says, you haven't submitted any
> evidence that this is the case.  You need to describe scenarios in
> which (1) a non-default blocksize performs better and (2) there's no
> reasonable way to obtain the same performance improvement without
> changing the block size.

(Note: I gather that "block size" here is the page size, but I'm not
entirely sure. So this detail might make my arguments moot :)

First, I do think there is a chicken-and-egg problem here. If you can't
vary the values except by re-compiling, almost no-one does. So you don't
get any benchmarks, or data.

And even if the author of the patch does this, he can only provide very
spotty data, which might not be enough to draw the right conclusions.

Plus, if the goal is to "select the optimal size" (as opposed to Remi's
goal, which seems to me to be "make it easier to select the optimal size
for one system at hand"), you might not be able to do this, as the
"optimal" size is different for different conditions, but you can't try
different values if the value is compiled in...

Plus, isn't almost all advancement in computing that you replace "1 + 1"
with "x + y"? :)

So, I do think the patch has some merits, because at least it allows much
easier benchmarking. And if the value were configurable at initdb time, I
think more people would experiment and settle on their optimum. So for
me the question isn't "does it benefit everyone", but "does it
benefit some, and what is the cost for others".

Plus, I stumbled just by accident over a blog post from Tomas Vondra[0]
from 2015, who seems to agree that different page sizes can be useful:

https://blog.pgaddict.com/posts/postgresql-on-ssd-4kb-or-8kB-pages

Me thinks the next steps would be that more benchmarks are done on more
distinct hardware to see what benefits we can see, and weight this against
the complexity the patch introduces into the code base.

Hope that does make sense,

Tels

[0]: Tomas, great blog, btw!



Re: Condition variable live lock

2018-01-05 Thread Thomas Munro
On Fri, Jan 5, 2018 at 7:42 PM, Tom Lane  wrote:
> Indeed, it looks like no other caller is paying attention to the result.
> We could live with the uncertainty in the back branches, and redefine
> ConditionVariableSignal as returning void in master.

+1

Could we install the sentinel and pop the first entry at the same
time, so that we're not adding an extra spinlock acquire/release?

-- 
Thomas Munro
http://www.enterprisedb.com