Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Peter Geoghegan
On Fri, Sep 22, 2017 at 8:56 PM, Tom Lane  wrote:
> The big concern I have here is that this feels a lot like something that
> we'll regret at leisure, if it's not right in the first release.  I'd
> much rather be restrictive in v10 and then loosen the rules later, than
> be lax in v10 and then have to argue about whether to break backwards
> compatibility in order to gain saner behavior.

To the best of my knowledge, the only restriction implied by limiting
ourselves to the BCP 47 format (as part of standardizing what is
stored in pg_collation) is that users might know about the traditional
locale strings from some other place, and be surprised when their
knowledge doesn't transfer to Postgres. Personally, I don't think that
that's a big deal. If it actually is important, then I'm surprised
that it took this long for a doc change mentioning it to be proposed
(though the docs *do* say "Collations provided by ICU are created with
names in BCP 47 language tag format").

>> We have never canonicalized collations before and therefore it is not
>> essential that we do that now.
>
> Actually, we try; see initdb.c's check_locale_name().  It's not our
> fault that setlocale(3) fails to play along on many platforms.

But it will be our fault if we ship a v10 that does the kind of
unsettled canonicalization you see within
pg_import_system_collations() (the "collcollate =
U_ICU_VERSION_MAJOR_NUM >= 54 ? langtag : name" thing). That looks
very much like the tail wagging the dog to me.

-- 
Peter Geoghegan




Re: [HACKERS] pgbench - use enum for meta commands

2017-09-22 Thread Pavel Stehule
2017-09-23 5:45 GMT+02:00 Fabien COELHO :

>
> Minor code enhancement.
>
> While having a look at adding if/elif/else/endif to pgbench, and given the
> current gset/cset meta commands added in the cf queue, it occurred to me that
> repeated string comparisons to check for the various meta commands are
> neither efficient nor readable. Use an enum instead; enums are already used
> extensively for other similar purposes.
>

+1

Pavel


> --
> Fabien.
>


Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Tom Lane
Robert Haas  writes:
> Peter, with respect, it's time to let this argument go.  We're
> scheduled to wrap a GA release in just over 72 hours.

FWIW, the release is a week from Monday, not Monday.  (Or if it is
Monday, somebody else is wrapping it.)

We have some other embarrassingly critical things to fix, like bug #14825,
so I can certainly sympathize with an argument that there's not enough
committer bandwidth left to deal with this; but not with an argument that
it's too late to change behavior period.

The big concern I have here is that this feels a lot like something that
we'll regret at leisure, if it's not right in the first release.  I'd
much rather be restrictive in v10 and then loosen the rules later, than
be lax in v10 and then have to argue about whether to break backwards
compatibility in order to gain saner behavior.

> We have never canonicalized collations before and therefore it is not
> essential that we do that now.

Actually, we try; see initdb.c's check_locale_name().  It's not our
fault that setlocale(3) fails to play along on many platforms.

> I simply do not buy the theory that this cannot be changed later.

OK, so you're promising not to whine when we break backwards compatibility
on this point in v11?

regards, tom lane




[HACKERS] pgbench - use enum for meta commands

2017-09-22 Thread Fabien COELHO


Minor code enhancement.

While having a look at adding if/elif/else/endif to pgbench, and given the
current gset/cset meta commands added in the cf queue, it occurred to me that
repeated string comparisons to check for the various meta commands are
neither efficient nor readable. Use an enum instead; enums are already used
extensively for other similar purposes.


--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e37496c..3a88150 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -373,11 +373,21 @@ typedef enum QueryMode
 static QueryMode querymode = QUERY_SIMPLE;
 static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
 
+typedef enum MetaCommand
+{
+	META_NONE,
+	META_SET,
+	META_SETSHELL,
+	META_SHELL,
+	META_SLEEP
+} MetaCommand;
+
 typedef struct
 {
 	char	   *line;			/* text of command line */
 	int			command_num;	/* unique index of this Command struct */
 	int			type;			/* command type (SQL_COMMAND or META_COMMAND) */
+	MetaCommand meta;			/* which meta command, if appropriate */
 	int			argc;			/* number of command words */
 	char	   *argv[MAX_ARGS]; /* command word list */
 	PgBenchExpr *expr;			/* parsed expression, if needed */
@@ -1721,6 +1731,26 @@ evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval
 	}
 }
 
+/* return meta-command enum identifier */
+static MetaCommand
+getMetaCommand(char * cmd)
+{
+	MetaCommand mc;
+	if (cmd == NULL)
+		mc = META_NONE;
+	else if (pg_strcasecmp(cmd, "set") == 0)
+		mc = META_SET;
+	else if (pg_strcasecmp(cmd, "setshell") == 0)
+		mc = META_SETSHELL;
+	else if (pg_strcasecmp(cmd, "shell") == 0)
+		mc = META_SHELL;
+	else if (pg_strcasecmp(cmd, "sleep") == 0)
+		mc = META_SLEEP;
+	else
+		mc = META_NONE;
+	return mc;
+}
+
 /*
  * Run a shell command. The result is assigned to the variable if not NULL.
  * Return true if succeeded, or false on error.
@@ -2218,7 +2248,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 		fprintf(stderr, "\n");
 	}
 
-	if (pg_strcasecmp(argv[0], "sleep") == 0)
+	if (command->meta == META_SLEEP)
 	{
 		/*
 		 * A \sleep doesn't execute anything, we just get the
@@ -2244,7 +2274,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 	}
 	else
 	{
-		if (pg_strcasecmp(argv[0], "set") == 0)
+		if (command->meta == META_SET)
 		{
 			PgBenchExpr *expr = command->expr;
 			PgBenchValue result;
@@ -2263,7 +2293,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 break;
 			}
 		}
-		else if (pg_strcasecmp(argv[0], "setshell") == 0)
+		else if (command->meta == META_SETSHELL)
 		{
 			bool		ret = runShellCommand(st, argv[1], argv + 2, argc - 2);
 
@@ -2283,7 +2313,7 @@ doCustom(TState *thread, CState *st, StatsData *agg)
 /* succeeded */
 			}
 		}
-		else if (pg_strcasecmp(argv[0], "shell") == 0)
+		else if (command->meta == META_SHELL)
 		{
 			bool		ret = runShellCommand(st, NULL, argv + 1, argc - 1);
 
@@ -3027,6 +3057,7 @@ process_sql_command(PQExpBuffer buf, const char *source)
 	my_command = (Command *) pg_malloc0(sizeof(Command));
 	my_command->command_num = num_commands++;
 	my_command->type = SQL_COMMAND;
+	my_command->meta = META_NONE;
 	initSimpleStats(&my_command->stats);
 
 	/*
@@ -3095,7 +3126,9 @@ process_backslash_command(PsqlScanState sstate, const char *source)
 	my_command->argv[j++] = pg_strdup(word_buf.data);
 	my_command->argc++;
 
-	if (pg_strcasecmp(my_command->argv[0], "set") == 0)
+	my_command->meta = getMetaCommand(my_command->argv[0]);
+
+	if (my_command->meta == META_SET)
 	{
 		/* For \set, collect var name, then lex the expression. */
 		yyscan_t	yyscanner;
@@ -3150,7 +3183,7 @@ process_backslash_command(PsqlScanState sstate, const char *source)
   expr_scanner_offset(sstate),
   true);
 
-	if (pg_strcasecmp(my_command->argv[0], "sleep") == 0)
+	if (my_command->meta == META_SLEEP)
 	{
 		if (my_command->argc < 2)
 			syntax_error(source, lineno, my_command->line, my_command->argv[0],
@@ -3191,13 +3224,13 @@ process_backslash_command(PsqlScanState sstate, const char *source)
 			 my_command->argv[2], offsets[2] - start_offset);
 		}
 	}
-	else if (pg_strcasecmp(my_command->argv[0], "setshell") == 0)
+	else if (my_command->meta == META_SETSHELL)
 	{
 		if (my_command->argc < 3)
 			syntax_error(source, lineno, my_command->line, my_command->argv[0],
 		 "missing argument", NULL, -1);
 	}
-	else if (pg_strcasecmp(my_command->argv[0], "shell") == 0)
+	else if (my_command->meta == META_SHELL)
 	{
 		if (my_command->argc < 2)
 			syntax_error(source, lineno, my_command->line, my_command->argv[0],
@@ -3205,6 +3238,7 @@ process_backslash_command(PsqlScanState sstate, const char *source)
 	}
 	else
 	{
+		/* my_command->meta == META_NONE */
 		syntax_error(source, lineno, my_command->line, my_command->argv[0],
 	 "invalid command", NULL, -1);
 	}


Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2017-09-22 Thread Tom Lane
Andrew Dunstan  writes:
> On 09/22/2017 05:46 PM, Tom Lane wrote:
>> I'm not sure if that qualifies as a stop-ship problem, but it ain't
>> good, for sure.  We need to look at whether we should revert 15bc038f9
>> or somehow revise its rules.

> I wonder if we wouldn't be better
> doing this more directly, keeping a per-transaction hash of unsafe enum
> values (which will almost always be empty). It might even speed up the
> check.

Yeah, I was considering the same thing over dinner, though I'd phrase
it oppositely: keep a list of enum type OIDs created in the current
transaction, so that we could whitelist them.  This could maybe become
a problem if someone created a zillion enums in one xact, though.
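For concreteness, a rough sketch of that whitelist idea (function names and
hook points here are illustrative assumptions, not actual PostgreSQL code)
could be little more than an OID list kept in the transaction's memory
context:

#include "postgres.h"
#include "nodes/pg_list.h"
#include "utils/memutils.h"

/* OIDs of enum types created in the current transaction (sketch only) */
static List *created_enum_type_oids = NIL;

/* hypothetical hook, called after an enum type is created */
static void
remember_enum_type_created(Oid enumTypeOid)
{
	MemoryContext oldcxt = MemoryContextSwitchTo(TopTransactionContext);

	created_enum_type_oids = lappend_oid(created_enum_type_oids, enumTypeOid);
	MemoryContextSwitchTo(oldcxt);
}

/* check_safe_enum_use() could then whitelist values of these types */
static bool
enum_type_created_in_current_xact(Oid enumTypeOid)
{
	return list_member_oid(created_enum_type_oids, enumTypeOid);
}

/* at transaction end, simply forget the list along with its context */
static void
AtEOXact_EnumWhitelist(void)
{
	created_enum_type_oids = NIL;
}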

The immediate question is do we care to design/implement such a thing
post-RC1.  I'd have to vote "no".  I think the most prudent thing to
do is revert 15bc038f9 and then have another go at it during the v11
cycle.

regards, tom lane




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Peter Geoghegan
On Fri, Sep 22, 2017 at 5:58 PM, Robert Haas  wrote:
> On Fri, Sep 22, 2017 at 4:46 PM, Peter Geoghegan  wrote:
>> But you are *already* canonicalizing ICU collation names as BCP 47. My
>> point here is: Why not finish the job off, and *also* canonicalize
>> colcollate in the same way?
>
> Peter, with respect, it's time to let this argument go.  We're
> scheduled to wrap a GA release in just over 72 hours.  It is far too
> late to change behavior like this.

I didn't say that it wasn't. That's above my paygrade.

> On the substantive issue, I am inclined (admittedly without deep
> study) to agree with Peter Eisentraut.  We have never canonicalized
> collations before and therefore it is not essential that we do that
> now.

As I've said, we're *already* canonicalizing them for ICU. Just not
consistently (across ICU versions, and arguably even within ICU
versions). That's the problem -- we're half way between both
positions.

The problem is most emphatically *not* that we've failed to
canonicalize them in the way that I happen to favor.

> That would be a new feature, and I don't think I'd be prepared
> to endorse adding it three days after feature freeze let alone three
> days before the GA wrap.  I do agree that the lack of canonicalization
> is utterly terrible.  The APIs that Unix-like operating systems
> provide for collations are poorly suited to our purposes and
> hopelessly squishy about semantics, and it's not clear how much better
> ICU will be.

In one important sense, this is a regression against libc, because you
never had something like en_US.UTF-8 break on downgrading glibc
version (like, when you restore a basebackup on a different OS with
the same arch). Sure, you probably had to REINDEX text indexes, to be
on the safe side, but once you did that there was no question about
the older glibc having never heard of "en_US.UTF-8" as a
LC_COLLATE/collcollate.

I regret that I didn't catch it sooner. It now seems very obvious, and
totally preventable given enough time.

> I simply do not buy the theory that this cannot be changed later.

It can be changed later, of course -- at greater, though indeterminate cost.

> It's been the case for as long as we've had pg_collate that a new
> system could have different collations than the old one, resulting in
> a dump/restore failure.  I expect somebody's had that problem at some
> point, but I don't think it's become a major pain point because most
> people don't use exotic collations, and if they do they probably
> understand that they need those exotic collations to be on the new
> system too.

Like I said, you don't need exotic collations to have the downgrade
problem, unless *any* initdb ICU collation counts as exotic. No CREATE
COLLATION is needed.

> I also believe that Peter Eisentraut is entirely correct to be
> concerned about whether BCP 47 (or anything else) can really be
> regarded as a stable canonical form for ICU purposes.  His email
> indicates that the acceptable and canonical forms have changed
> multiple times in the course of releases new enough for us to care
> about them.  Assuming that statement is correct, it would be extremely
> short-sighted of us to bank on them not changing any more.

That statement isn't correct, nor is it something Peter Eisentraut ever
suggested. ICU uses BCP 47 for the collation name *across all
versions*. It just isn't used as the collcollate value (that's only the
case on versions of ICU >= 54).

> But even if all of the above argumentation is utterly and completely
> wrong, dredged up from the universe's deepest and most profound
> reserves of stupidity and destined for future entry into Webster's as
> the canonical example of cluelessness, we still shouldn't change it
> the weekend before the GA wraps.

That seems like a value judgement. I'm not going to tell you that
you're wrong. What I will say is that I think we've done poorly here.

-- 
Peter Geoghegan




Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-22 Thread Andres Freund
Hi,

On 2017-09-22 17:11:47 -0700, Gregory Brail wrote:
> Also in lieu of the new snapshot mechanism for logical replication, which
> might not work for us

This needs context...

>, we were using the transaction ID to calculate what
> was committed in a client's snapshot and what they need to apply to their
> own local database. That relied on the transaction ID, and we wanted to use
> a 64-bit ID so that we could handle rollover. We ended up doing this:
> 
> https://github.com/apigee-labs/transicator/blob/2d5dc596a5f2f5e13967e0f1943f248d88eac1e7/pgoutput/transicator_output.c#L151
> 
> It looks to me like the new stuff only puts a 32-bit "xid" in there. Would
> there be a way to include the epoch as well? (And yes, I realize it might
> require a more detailed explanation which I'm happy to put together.)

It'd be good to see some more detail here, indeed. Especially if you
could look at what pgoutput provides, and whether that's sufficient.
I'm not entirely sure how much we want to make pgoutput configurable, in
contrast to adding something that's intended to be very configurable at
the price of some performance and bandwidth...

Regards,

Andres




Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2017-09-22 Thread Andrew Dunstan


On 09/22/2017 05:46 PM, Tom Lane wrote:
> bal...@obiserver.hu writes:
>> PostgreSQL version: 10beta4
>> testdb=# begin;
>> BEGIN
>> testdb=# create type enumtype as enum ('v1', 'v2');
>> CREATE TYPE
>> testdb=# alter type enumtype owner to testrole;
>> ALTER TYPE
>> testdb=# create table testtable (enumcolumn enumtype not null default 'v1');
>> ERROR:  unsafe use of new value "v1" of enum type enumtype
>> HINT:  New enum values must be committed before they can be used.
> Hmm, that's pretty annoying.  It's happening because check_safe_enum_use
> insists that the enum's pg_type entry not be updated-in-transaction.
> We thought that the new rules instituted by 15bc038f9 would be generally
> more convenient to use than the old ones --- but this example shows
> that, for example, pg_dump scripts involving enums often could not
> be restored inside a single transaction.
>
> I'm not sure if that qualifies as a stop-ship problem, but it ain't
> good, for sure.  We need to look at whether we should revert 15bc038f9
> or somehow revise its rules.



:-(


The only real problem comes from adding a value to an enum that has been
created in an earlier transaction and then using that enum value. What
we're doing here is essentially a heuristic test for that condition, and
we're getting some false positives. I wonder if we wouldn't be better
doing this more directly, keeping a per-transaction hash of unsafe enum
values (which will almost always be empty). It might even speed up the
check.

cheers

andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Robert Haas
On Fri, Sep 22, 2017 at 4:46 PM, Peter Geoghegan  wrote:
> But you are *already* canonicalizing ICU collation names as BCP 47. My
> point here is: Why not finish the job off, and *also* canonicalize
> colcollate in the same way?

Peter, with respect, it's time to let this argument go.  We're
scheduled to wrap a GA release in just over 72 hours.  It is far too
late to change behavior like this.  There is no time for other people
who may be interested in this issue to form a well-considered opinion
on the topic and carefully review a proposed patch.  There is also no
time for users to notice it in the next beta and complain before we go
final.  This ship has sailed.

On the substantive issue, I am inclined (admittedly without deep
study) to agree with Peter Eisentraut.  We have never canonicalized
collations before and therefore it is not essential that we do that
now.  That would be a new feature, and I don't think I'd be prepared
to endorse adding it three days after feature freeze let alone three
days before the GA wrap.  I do agree that the lack of canonicalization
is utterly terrible.  The APIs that Unix-like operating systems
provide for collations are poorly suited to our purposes and
hopelessly squishy about semantics, and it's not clear how much better
ICU will be.  But that's a problem that we should address, if at all,
at a deliberate pace and with adequate time for reflection, research,
and comment, not precipitously and under extreme time pressure.

I simply do not buy the theory that this cannot be changed later.
It's been the case for as long as we've had pg_collate that a new
system could have different collations than the old one, resulting in
a dump/restore failure.  I expect somebody's had that problem at some
point, but I don't think it's become a major pain point because most
people don't use exotic collations, and if they do they probably
understand that they need those exotic collations to be on the new
system too.  So, if we decide to change this later, we'll want to find
ways to make the upgrade as pain-free as possible and document
whatever the situation may be, but we've made many
backward-incompatible changes in the past and this one would hardly be
the worst.

I also believe that Peter Eisentraut is entirely correct to be
concerned about whether BCP 47 (or anything else) can really be
regarded as a stable canonical form for ICU purposes.  His email
indicates that the acceptable and canonical forms have changed
multiple times in the course of releases new enough for us to care
about them.  Assuming that statement is correct, it would be extremely
short-sighted of us to bank on them not changing any more.

But even if all of the above argumentation is utterly and completely
wrong, dredged up from the universe's deepest and most profound
reserves of stupidity and destined for future entry into Webster's as
the canonical example of cluelessness, we still shouldn't change it
the weekend before the GA wraps.  I'm afraid that this new RMT process
has lulled us into believing that the release will happen on time no
matter how much stuff we whack around at the last minute, which is a
very dangerous idea for a group of software engineers to have.
Before, we thought we had infinite time to fix our bugs; now, we think
we have infinite latitude to classify anything we don't like as a bug.
Neither of those ideas is good software engineering.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-22 Thread Gregory Brail
Thanks! I didn't read the version 10 RC docs carefully enough.

I found the code that generates the protocol message (looks like it's
replication/proto/logical.c). Are there docs somewhere on the format, or is
it just the code?

Also in lieu of the new snapshot mechanism for logical replication, which
might not work for us, we were using the transaction ID to calculate what
was committed in a client's snapshot and what they need to apply to their
own local database. That relied on the transaction ID, and we wanted to use
a 64-bit ID so that we could handle rollover. We ended up doing this:

https://github.com/apigee-labs/transicator/blob/2d5dc596a5f2f5e13967e0f1943f248d88eac1e7/pgoutput/transicator_output.c#L151

It looks to me like the new stuff only puts a 32-bit "xid" in there. Would
there be a way to include the epoch as well? (And yes, I realize it might
require a more detailed explanation which I'm happy to put together.)
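For what it's worth, a minimal sketch of the epoch-extension idea (this is an
assumption about how a plugin could compute it, not an existing pgoutput
facility) could combine the 32-bit xid with the epoch reported by
GetNextXidAndEpoch(), which is available in the PostgreSQL 10 sources:

#include "postgres.h"
#include "access/xlog.h"

/*
 * Widen a 32-bit xid from the (recent) past into a 64-bit value.  If the
 * xid is numerically greater than the next xid to be assigned, it must
 * have been assigned before the last wraparound, i.e. in the previous
 * epoch.  (Sketch only; does not handle future xids.)
 */
static uint64
xid_to_64bit(TransactionId xid)
{
	TransactionId next_xid;
	uint32		epoch;

	GetNextXidAndEpoch(&next_xid, &epoch);

	if (xid > next_xid)
		epoch--;

	return ((uint64) epoch << 32) | (uint64) xid;
}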

On Fri, Sep 22, 2017 at 4:01 PM, Alvaro Hernandez  wrote:

>
>
> On 23/09/17 00:28, Gregory Brail wrote:
>
>> We have been working on a project that makes extensive use of logical
>> replication for use inside Apigee (which is a very small part of Google):
>>
>> https://github.com/apigee-labs/transicator
>>
>> In order to do this, we had to write our own logical replication plugin
>> because the supplied "test_decoding" plugin from the "contrib" directory
>> was hard for us to work with. Primarily:
>>
>> 1) It doesn't include all the fields that we need for Transicator (most
>> importantly we need the LSN and the 64-bit transaction ID).
>> 2) It outputs a text format that is hard to parse.
>>
>> I imagine that other users of logical decoding are facing similar
>> problems.
>>
>> Would the community support the development of another plugin that is
>> distributed as part of "contrib" that addresses these issues? I'd be happy
>> to submit a patch, or GitHub repo, or whatever works best as an example.
>> (Also, although Transicator uses protobuf, I'm happy to have it output a
>> simple binary format as well.)
>>
>> As a side note, doing this would also help making logical decoding a
>> useful feature for customers of Amazon and Google's built-in Postgres
>> hosting options. In those environments, there's no way to add a custom
>> plugin to Postgres, so anything not built in the product tends to be harder
>> for someone to consume.
>>
>> If anyone is interested in looking more:
>>
>> The plugin code is here:
>> https://github.com/apigee-labs/transicator/tree/master/pgoutput
>>
>> and produces output defined by the "ChangePb" structure defined here:
>> https://github.com/apigee-labs/transicator/blob/master/commo
>> n/transicator.proto
>>
>
> How about using pgoutput, which is included by default in PostgreSQL
> 10, as the basis for logical replication?
>
>
> Cheers,
>
> Álvaro
>
>
> --
>
> Alvaro Hernandez
>
>
> ---
> OnGres
>
>


Re: [HACKERS] Built-in plugin for logical decoding output

2017-09-22 Thread Alvaro Hernandez



On 23/09/17 00:28, Gregory Brail wrote:
We have been working on a project that makes extensive use of logical 
replication for use inside Apigee (which is a very small part of Google):


https://github.com/apigee-labs/transicator

In order to do this, we had to write our own logical replication 
plugin because the supplied "test_decoding" plugin from the "contrib" 
directory was hard for us to work with. Primarily:


1) It doesn't include all the fields that we need for Transicator 
(most importantly we need the LSN and the 64-bit transaction ID).

2) It outputs a text format that is hard to parse.

I imagine that other users of logical decoding are facing similar 
problems.


Would the community support the development of another plugin that is 
distributed as part of "contrib" that addresses these issues? I'd be 
happy to submit a patch, or GitHub repo, or whatever works best as an 
example. (Also, although Transicator uses protobuf, I'm happy to have 
it output a simple binary format as well.)


As a side note, doing this would also help making logical decoding a 
useful feature for customers of Amazon and Google's built-in Postgres 
hosting options. In those environments, there's no way to add a custom 
plugin to Postgres, so anything not built in the product tends to be 
harder for someone to consume.


If anyone is interested in looking more:

The plugin code is here:
https://github.com/apigee-labs/transicator/tree/master/pgoutput

and produces output defined by the "ChangePb" structure defined here:
https://github.com/apigee-labs/transicator/blob/master/common/transicator.proto


    How about using pgoutput, which is included by default in 
PostgreSQL 10, as the basis for logical replication?



    Cheers,

    Álvaro


--

Alvaro Hernandez


---
OnGres





[HACKERS] Built-in plugin for logical decoding output

2017-09-22 Thread Gregory Brail
We have been working on a project that makes extensive use of logical
replication for use inside Apigee (which is a very small part of Google):

https://github.com/apigee-labs/transicator

In order to do this, we had to write our own logical replication plugin
because the supplied "test_decoding" plugin from the "contrib" directory
was hard for us to work with. Primarily:

1) It doesn't include all the fields that we need for Transicator (most
importantly we need the LSN and the 64-bit transaction ID).
2) It outputs a text format that is hard to parse.

I imagine that other users of logical decoding are facing similar problems.

Would the community support the development of another plugin that is
distributed as part of "contrib" that addresses these issues? I'd be happy
to submit a patch, or GitHub repo, or whatever works best as an example.
(Also, although Transicator uses protobuf, I'm happy to have it output a
simple binary format as well.)

As a side note, doing this would also help making logical decoding a useful
feature for customers of Amazon and Google's built-in Postgres hosting
options. In those environments, there's no way to add a custom plugin to
Postgres, so anything not built in the product tends to be harder for
someone to consume.

If anyone is interested in looking more:

The plugin code is here:
https://github.com/apigee-labs/transicator/tree/master/pgoutput

and produces output defined by the "ChangePb" structure defined here:
https://github.com/apigee-labs/transicator/blob/master/common/transicator.proto


Re: [HACKERS] Small improvement to compactify_tuples

2017-09-22 Thread Claudio Freire
On Tue, Sep 12, 2017 at 12:49 PM, Sokolov Yura wrote:
> On 2017-07-21 13:49, Sokolov Yura wrote:
>>
>> On 2017-05-17 17:46, Sokolov Yura wrote:
>>>
>>> Alvaro Herrera wrote on 2017-05-15 20:13:

 As I understand, these patches are logically separate, so putting them
 together in a single file isn't such a great idea.  If you don't edit
 the patches further, then you're all set because we already have the
 previously archived patches.  Next commitfest starts in a few months
 yet, and if you feel the need to submit corrected versions in the
 meantime, please do submit in separate files.  (Some would even argue
 that each should be its own thread, but I don't think that's necessary.)
>>>
>>>
>>> Thank you for explanation.
>>>
>>> I'm adding new version of first patch with minor improvement:
>>> - I added detection of a case when all buckets are trivial
>>>   (ie 0 or 1 element). In this case no need to sort buckets at all.
>>
>>
>> I'm putting rebased version of second patch.
>
>
> Again rebased version of both patches.
> Now second patch applies cleanly independent of first patch.

Patch 1 applies cleanly, builds, and make check runs fine.

The code looks similar in style to surrounding code too, so I'm not
going to complain about the abundance of underscores in the macros :-p

I can reproduce the results in the OP's benchmark, with slightly
different numbers, but an overall improvement of ~6%, which matches
the OP's relative improvement.

Algorithmically, everything looks sound.


A few minor comments about patch 1:

+if (max == 1)
+goto end;

That goto is unnecessary, you could just as simply say

if (max > 1)
{
   ...
}


+#define pg_shell_sort_pass(elem_t, cmp, off) \
+do { \
+int _i, _j; \
+elem_t _temp; \
+for (_i = off; _i < _n; _i += off) \
+{ \

_n right there isn't declared in the macro, and it isn't an argument
either. It should be an argument; having stuff inherited from the
enclosing context like that is confusing.

Same with _arr, btw.


Patch 2 LGTM.




Re: [HACKERS] [BUGS] BUG #14825: enum type: unsafe use?

2017-09-22 Thread Tom Lane
bal...@obiserver.hu writes:
> PostgreSQL version: 10beta4

> testdb=# begin;
> BEGIN
> testdb=# create type enumtype as enum ('v1', 'v2');
> CREATE TYPE
> testdb=# alter type enumtype owner to testrole;
> ALTER TYPE
> testdb=# create table testtable (enumcolumn enumtype not null default 'v1');
> ERROR:  unsafe use of new value "v1" of enum type enumtype
> HINT:  New enum values must be committed before they can be used.

Hmm, that's pretty annoying.  It's happening because check_safe_enum_use
insists that the enum's pg_type entry not be updated-in-transaction.
We thought that the new rules instituted by 15bc038f9 would be generally
more convenient to use than the old ones --- but this example shows
that, for example, pg_dump scripts involving enums often could not
be restored inside a single transaction.

I'm not sure if that qualifies as a stop-ship problem, but it ain't
good, for sure.  We need to look at whether we should revert 15bc038f9
or somehow revise its rules.

regards, tom lane




[HACKERS] Rethinking autovacuum.c memory handling

2017-09-22 Thread Tom Lane
I notice that autovacuum.c calls autovacuum_do_vac_analyze, and
thereby vacuum(), in TopTransactionContext.  This doesn't seem
like a terribly great idea, because it doesn't correspond to what
happens during a manually-invoked vacuum.  TopTransactionContext
will go away when vacuum() commits the outer transaction, whereas
in non-autovac usage, we call vacuum() in a PortalHeapMemory
context that is not a child of TopTransactionContext and is not
at risk of being reset multiple times during the vacuum().  This'd
be a hazard if autovacuum_do_vac_analyze or vacuum did any palloc's
before getting to the main loop.  More generally, I'm not aware of
other cases where we invoke a function in a context that we know
that function will destroy as it executes.

I don't see any live bug associated with this in HEAD, but this behavior
requires a rather ugly (and memory-leaking) workaround in the proposed
patch to allow multiple vacuum target rels.

What I think we should do instead is invoke autovacuum_do_vac_analyze
in the PortalContext that do_autovacuum has created, which we already
have a mechanism to reset once per table processed in do_autovacuum.

The attached patch does that, and also modifies perform_work_item()
to use the same approach.  Right now perform_work_item() has a
copied-and-pasted MemoryContextResetAndDeleteChildren(PortalContext)
call in its error recovery path, but that seems a bit out of place
given that perform_work_item() isn't using PortalContext otherwise.

Comments, objections?

regards, tom lane

PS: I was disappointed to find out that perform_work_item() isn't
exercised at all in the standard regression tests.

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index b745d89..1a32433 100644
*** a/src/backend/postmaster/autovacuum.c
--- b/src/backend/postmaster/autovacuum.c
*** do_autovacuum(void)
*** 2444,2451 
  		 */
  		PG_TRY();
  		{
  			/* have at it */
- 			MemoryContextSwitchTo(TopTransactionContext);
  			autovacuum_do_vac_analyze(tab, bstrategy);
  
  			/*
--- 2444,2453 
  		 */
  		PG_TRY();
  		{
+ 			/* Use PortalContext for any per-table allocations */
+ 			MemoryContextSwitchTo(PortalContext);
+ 
  			/* have at it */
  			autovacuum_do_vac_analyze(tab, bstrategy);
  
  			/*
*** do_autovacuum(void)
*** 2482,2487 
--- 2484,2492 
  		}
  		PG_END_TRY();
  
+ 		/* Make sure we're back in AutovacMemCxt */
+ 		MemoryContextSwitchTo(AutovacMemCxt);
+ 
  		did_vacuum = true;
  
  		/* the PGXACT flags are reset at the next end of transaction */
*** perform_work_item(AutoVacuumWorkItem *wo
*** 2614,2619 
--- 2619,2627 
  
  	autovac_report_workitem(workitem, cur_nspname, cur_datname);
  
+ 	/* clean up memory before each work item */
+ 	MemoryContextResetAndDeleteChildren(PortalContext);
+ 
  	/*
  	 * We will abort the current work item if something errors out, and
  	 * continue with the next one; in particular, this happens if we are
*** perform_work_item(AutoVacuumWorkItem *wo
*** 2622,2630 
  	 */
  	PG_TRY();
  	{
! 		/* have at it */
! 		MemoryContextSwitchTo(TopTransactionContext);
  
  		switch (workitem->avw_type)
  		{
  			case AVW_BRINSummarizeRange:
--- 2630,2639 
  	 */
  	PG_TRY();
  	{
! 		/* Use PortalContext for any per-work-item allocations */
! 		MemoryContextSwitchTo(PortalContext);
  
+ 		/* have at it */
  		switch (workitem->avw_type)
  		{
  			case AVW_BRINSummarizeRange:
*** perform_work_item(AutoVacuumWorkItem *wo
*** 2668,2673 
--- 2677,2685 
  	}
  	PG_END_TRY();
  
+ 	/* Make sure we're back in AutovacMemCxt */
+ 	MemoryContextSwitchTo(AutovacMemCxt);
+ 
  	/* We intentionally do not set did_vacuum here */
  
  	/* be tidy */



Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

2017-09-22 Thread Robert Haas
On Fri, Sep 22, 2017 at 4:23 PM, Julien Rouhaud  wrote:
> That was one of the first questions we had with the initial POC.  The
> reason is that this "sorted append" is not using a merge algorithm but
> just appending partitions in the right order, so it looked like we
> could either create a new SortedAppend node, or use the current Append
> node and allow it to return sorted data.  We chose the 2nd solution,
> and ended up with a lot of duplicated code (all the code for the
> ordering), so we tried to have Append and MergeAppend share this code.
> I'm not at all satisfied with the current shape, but I'm still not sure
> what node to use for this.  Do you have any idea on this?

During planning, *every* node has a list of pathkeys associated with
it which represent the presumed output ordering.  You can support
ordered append paths without changing any data structures at all; it's
just a matter of putting pathkeys into an AppendPath.

The reason why MergeAppend has extra stuff in the node (numCols,
sortColIdx, etc.) is so that it can actually perform the sort at
runtime.  But this feature requires no runtime support -- that's kinda
the point -- so there's no need for it to carry that information, or
to add any new nodes.

At least, IIUC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test

2017-09-22 Thread Peter Eisentraut
On 9/19/17 19:00, Michael Paquier wrote:
> On Wed, Sep 20, 2017 at 7:57 AM, Peter Eisentraut wrote:
>> To get things rolling, I have committed just the basic TAP tests, to
>> give it a spin on the build farm.  I'll work through the rest in the
>> coming days.

I have reverted this because of the build farm issue.  Putting the patch
on hold in the CF until we have a new plan.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Peter Geoghegan
On Fri, Sep 22, 2017 at 11:34 AM, Peter Eisentraut wrote:
> After reviewing this thread and testing around a bit, I think the code
> is mostly fine as it is, but the documentation is lacking.  Hence,
> attached is a patch to expand the documentation quite a bit, especially
> to document in more detail what ICU locale strings are accepted.
>
> The documentation has always stated, albeit perhaps in obscure ways,
> that we accept for locales whatever ICU accepts.  I don't think we
> should restrict or override that in any way.  That would cause existing
> documentation and lore on ICU to be invalid for PostgreSQL.

I think that this thread is mostly about three fairly different
things. I didn't plan it that way, but then I only realized the latter
two while investigating the first. Those are:

1. The question of whether and to what extent we should sanitize the
colcollate string provided by the user within CREATE COLLATION for the
ICU collation provider.

2. The question of what ends up in pg_collation at initdb time. In
particular, the format of colcollate.

3. The question of whether or not we should ever accept a locale in
the traditional format, rather than insisting on BCP 47 in every
context. (This may have become conflated with issue 1.)

> I specifically disagree that we should, as appears to be suggested here,
> restrict the input locale strings to BCP 47 and reject or transform the
> traditional ICU-specific locale syntax.  Again, that would cause
> existing ICU documentation material to become less applicable to
> PostgreSQL.  And basically, there is no reason for it, because I am not
> aware that ICU plans to get rid of that syntax.

I disagree, because ICU/CLDR seems to want to standardize on BCP 47
(with custom extensions), but I acknowledge that you have a point
here. This isn't what I think is important, among all the points
raised on this thread. I can let this go.

> Moreover, we need to
> support that syntax for older ICU versions anyway.

FWIW, I don't think that we *need* to support it for older ICU
versions, except as an implementation detail that gets us to and from
BCP 47 as needed.

> A patch has been
> posted that, as I understand it, would allow BCP 47 syntax to be used
> with older versions as well.  That's a nice idea, but it's a new feature, 
> which may have been my fault,
> and not appropriate for PG10 at this time.
>
> (Note that we also don't canonicalize libc locale names.)

But you are *already* canonicalizing ICU collation names as BCP 47. My
point here is: Why not finish the job off, and *also* canonicalize
colcollate in the same way? This won't break ucol_open() if we take
appropriate precautions when we go to use the Postgres collation/ICU
locale. One concern that makes me suggest this is: What happens when
the user *downgrades* ICU version, from a version where colcollate is
BCP 47 to one where it would not have been at initdb time? That will
break the downgrade in an unpleasant way, including in installations
that never do a CREATE COLLATION themselves. We want to be able to
restore a basebackup on a somewhat different OS, and have that still
work following REINDEX. At least, that seems like it should be an
important goal for us.

I want Postgres to insist on always using BCP 47 in CREATE COLLATION
for a few reasons. One that is relevant to this colcollate question is
that by insisting on BCP 47 within CREATE COLLATION, there is no
question of CREATE COLLATION having to consider the legacy locale
format on ICU versions that don't handle both at the same time too
well (this at least includes ICU 42). We'd always only accept BCP 47
from users, we'd always store BCP 47 as colcollate (and collation
name), and we'd always use the traditional locale string format as an
argument to ucol_open(). Consistency of interface and implementation,
across all ICU versions.
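To make that division of labor concrete, here is a rough sketch (my own
illustration, not code from any posted patch) of how a stored BCP 47 tag
could be converted to a traditional ICU locale ID just before opening the
collator, using ICU's uloc_forLanguageTag():

#include <unicode/ucol.h>
#include <unicode/uloc.h>

/* Open a collator from a catalog-stored BCP 47 language tag (sketch). */
static UCollator *
open_collator_from_langtag(const char *langtag)
{
	char		icu_locale[ULOC_FULLNAME_CAPACITY];
	int32_t		parsed_len;
	UErrorCode	status = U_ZERO_ERROR;
	UCollator  *collator;

	/* convert "en-US-u-kn-true" style tags to "en_US@colnumeric=yes" style */
	uloc_forLanguageTag(langtag, icu_locale, sizeof(icu_locale),
						&parsed_len, &status);
	if (U_FAILURE(status))
		return NULL;			/* caller reports the error */

	collator = ucol_open(icu_locale, &status);
	if (U_FAILURE(status))
		return NULL;

	return collator;
}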

I might actually be convinced by what you say here if we weren't
already canonicalizing collation name as BCP 47 (though I also
understand why you did that).

> During testing with various versions I have also discovered the
> following things:
>
> - The current code appears to be of the opinion that BCP 47 locale names
> are accepted as of ICU 54.  That appears to be incorrect; I find that
> they work fine in ICU 52, but they don't work in ICU 4.2.  I don't know
> where the cutoff is.  That might be worth changing in the code if we can
> obtain more information.

I fear that BCP 47 is only quasi supported (without the proper
conversion by uloc_forLanguageTag()) prior to ICU 54 (the version
where it is actually documented as supported). We've already seen
plenty of cases where ucol_open() locale string interpretation
soldiers on, when it arguably shouldn't, so I hope that that isn't
what you actually see on ICU 52. I strongly suggest looking at a
variety of display names at CREATE COLLATION time (I can provide a
rough patch that shows display name 

Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

2017-09-22 Thread Julien Rouhaud
On Fri, Sep 22, 2017 at 9:58 PM, Robert Haas  wrote:
> On Fri, Sep 22, 2017 at 3:34 PM, Julien Rouhaud  wrote:
>> PFA v3 of the patch, once again rebased and that now use all the newly
>> available infrastructure.
>>
>> I also added a check that a sorted append can't be generated when a
>> default partition has been declared.
>
> I just took a quick look at this and was kind of surprised to see that
> it didn't look much like what I would expect.
>
> What I would expect is:
>[...]


Thanks a lot for the pointers, that's really helpful!

>  The extensive patch does a lot of other stuff, like
> whacking around the structure of AppendPath vs. MergeAppendPath, and
> I'm not sure why we need or want any of that.  I might be missing
> something, though.

That was one of the first questions we had with the initial POC.  The
reason is that this "sorted append" is not using a merge algorithm but
just appending partitions in the right order, so it looked like we
could either create a new SortedAppend node, or use the current Append
node and allow it to return sorted data.  We chose the 2nd solution,
and ended up with a lot of duplicated code (all the code for the
ordering), so we tried to have Append and MergeAppend share this code.
I'm not at all satisfied with the current shape, but I'm still not sure
what node to use for this.  Do you have any idea on this?




Re: [HACKERS] 10RC1 crash testing MultiXact oddity

2017-09-22 Thread Robert Haas
On Fri, Sep 22, 2017 at 3:39 PM, Jeff Janes  wrote:
> It turns out it is not new in pg10.  I spotted it in the log file only by
> accident while looking for something else.  Now that I am looking for it, I
> do see it in 9.6 as well.

So I guess the next question is whether it also shows up if you initdb
with 9.4.latest and then run the same test.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

2017-09-22 Thread Robert Haas
On Fri, Sep 22, 2017 at 3:34 PM, Julien Rouhaud  wrote:
> PFA v3 of the patch, once again rebased and that now use all the newly
> available infrastructure.
>
> I also added a check that a sorted append can't be generated when a
> default partition has been declared.

I just took a quick look at this and was kind of surprised to see that
it didn't look much like what I would expect.

What I would expect is:

1. The PartitionDescData grows a new flag indicating whether
partitions can be regarded as being in bound order.  This is true for
range partitions without a default partition, list partitions without
a default partition and without interleaved bounds, and maybe other
cases we want to worry about later.  We set this flag correctly when
we build the PartitionDesc and propagate it into the RelOptInfo for
the partitioned table when we set up its other partitioning details.

2. generate_mergeappend_paths() gets renamed to
generate_sorted_append_paths().  At the top of the function, it checks
whether the above flag is set; if not, give up on this optimization.
Otherwise, figure out what the set of pathkeys that correspond to the
partition key would look like.  Call these the
partition_order_pathkeys.

3. For each set of possible pathkeys, it checks whether the proposed
pathkeys are equal to (or an initial prefix of) the
partition_order_pathkeys.

4. If so, instead of calling create_merge_append_path(), it calls
create_append_path().

5. create_append_path() gets the proposed pathkeys via a new List
*pathkeys argument and sticks them on the path.

And that's it.  The extensive patch does a lot of other stuff, like
whacking around the structure of AppendPath vs. MergeAppendPath, and
I'm not sure why we need or want any of that.  I might be missing
something, though.
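For illustration, steps 2-4 above might boil down to something like the
following sketch (the partitions_are_ordered flag and the pathkeys-carrying
create_append_path variant are assumptions standing in for whatever the
final patch provides, not existing planner API):

#include "postgres.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"

static void
generate_sorted_append_paths(PlannerInfo *root, RelOptInfo *rel,
							 List *live_childrels,
							 List *all_child_pathkeys,
							 List *partition_order_pathkeys)
{
	ListCell   *lc;

	/* step 2: give up unless the partitions are known to be in bound order */
	if (!rel->partitions_are_ordered)	/* hypothetical flag from step 1 */
		return;

	foreach(lc, all_child_pathkeys)
	{
		List	   *pathkeys = (List *) lfirst(lc);

		/* step 3: requested order must be a prefix of the partition order */
		if (!pathkeys_contained_in(pathkeys, partition_order_pathkeys))
			continue;

		/* steps 4/5: a plain Append, annotated with the pathkeys it preserves */
		add_path(rel, (Path *)
				 create_append_path_with_pathkeys(root, rel, live_childrels,
												  pathkeys));
	}
}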

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Page Scan Mode in Hash Index

2017-09-22 Thread Ashutosh Sharma
On Fri, Sep 22, 2017 at 11:56 PM, Robert Haas  wrote:
> On Thu, Sep 21, 2017 at 3:08 AM, Ashutosh Sharma wrote:
>> I have added a note for handling of logged and unlogged tables in
>> README file and also corrected the header comment for
>> hashbucketcleanup(). Please find the attached 0003*.patch having these
>> changes. Thanks.
>
> I committed 0001 and 0002 with some additional edits as
> 7c75ef571579a3ad7a1d3ee909f11dba5e0b9440.  I also rebased 0003 and
> edited it a bit; see attached hash-cleanup-changes.patch.
>

Thanks for the commit. I put a lot of effort into this and am very
happy that it got committed. Thanks to Amit too for the detailed review.

> I'm not entirely sold on 0003.  An alternative would be to rip the lsn
> stuff back out of HashScanPosData, and I think we ought to consider
> that.  Basically, 0003 is betting that getting rid of the
> lock-chaining in hash index vacuum is more valuable than being able to
> kill dead items more aggressively.  I bet that's a bad bet.
>
> In the case of btree indexes, since
> 2ed5b87f96d473962ec5230fd820abfeaccb2069, page-at-a-time scanning
> allows most btree index scans to avoid holding buffer pins when the
> scan is suspended, but we gain no such advantage here.  We always have
> to hold a pin on the primary bucket page anyway, so even with this
> patch cleanup is going to block when it hits a bucket containing a
> suspended scan.  0003 helps if (a) the relation is permanent, (b) the
> bucket has overflow pages, and (c) the scan is moving faster than
> vacuum and can overtake it instead of waiting.  But that doesn't seem
> like it will happen very often at all, whereas the LSN check will
> probably fail frequently and cause us to skip cleanup that we could
> usefully have done.  So I propose the attached hashscan-no-lsn.patch
> as an alternative.
>
> Thoughts?
>
> --

Yes, I too feel that the 0003 patch won't help much. The reason is that
the chances of a scan overtaking vacuum are very low, and also that a
hash index is normally meant for unique values (that is, the case where
a hash index is clearly dominant over other index types), which means
the chances of overflow pages in a hash index won't be high. Therefore,
I feel the 0003 patch won't be very beneficial. Honestly speaking, the
code changes in 0003 look a bit ugly as well. So, yes,
hashscan-no-lsn.patch looks like the better option to me.
Thanks.

-- 
With Regards,
Ashutosh Sharma
EnterpriseDB:http://www.enterprisedb.com




Re: [HACKERS] 10RC1 crash testing MultiXact oddity

2017-09-22 Thread Jeff Janes
On Fri, Sep 22, 2017 at 8:47 AM, Alvaro Herrera wrote:

> Jeff Janes wrote:
> > I am running some crash recovery testing against 10rc1 by injecting torn
> > page writes, using a test case which generates a lot of multixact, some
> naturally by doing a lot of fk updates, but most artificially by calling
> > the pg_burn_multixact function from one of the attached patches.
>
> Is this new in pg10, or do you also see it in 9.6?
>

It turns out it is not new in pg10.  I spotted it in the log file only by
accident while looking for something else.  Now that I am looking for it, I
do see it in 9.6 as well.

Cheers,

Jeff


Re: [HACKERS] [Proposal] Make the optimiser aware of partitions ordering

2017-09-22 Thread Julien Rouhaud
On Thu, Sep 21, 2017 at 11:13 AM, Julien Rouhaud  wrote:
> On Thu, Sep 21, 2017 at 10:52 AM, Ashutosh Bapat wrote:
>> With 9140cf8269b0c4ae002b2748d93979d535891311, we store the
>> RelOptInfos of partitions in the RelOptInfo of partitioned table. For
>> range partitioned table without default partition, they are arranged
>> in the ascending order of partition bounds. This patch may avoid
>> MergeAppend if the sort keys match partition keys by creating an
>> Append path by picking sorted paths from partition RelOptInfos in
>> order. You may use slightly modified version of
>> add_paths_to_append_rel().
>
>
> Yes, I just saw this commit this morning, and this is exactly what I
> was missing, thanks for the pointer and the patch!

PFA v3 of the patch, once again rebased and that now use all the newly
available infrastructure.

I also added a check that a sorted append can't be generated when a
default partition has been declared.
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index c1602c59cc..20e63b3204 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -80,6 +80,8 @@ static void show_upper_qual(List *qual, const char *qlabel,
ExplainState *es);
 static void show_sort_keys(SortState *sortstate, List *ancestors,
   ExplainState *es);
+static void show_append_keys(AppendState *mstate, List *ancestors,
+  ExplainState *es);
 static void show_merge_append_keys(MergeAppendState *mstate, List *ancestors,
   ExplainState *es);
 static void show_agg_keys(AggState *astate, List *ancestors,
@@ -1602,6 +1604,10 @@ ExplainNode(PlanState *planstate, List *ancestors,
show_sort_keys(castNode(SortState, planstate), 
ancestors, es);
show_sort_info(castNode(SortState, planstate), es);
break;
+   case T_Append:
+   show_append_keys(castNode(AppendState, planstate),
+  ancestors, 
es);
+   break;
case T_MergeAppend:
show_merge_append_keys(castNode(MergeAppendState, 
planstate),
   ancestors, 
es);
@@ -1744,7 +1750,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
   ancestors, es);
break;
case T_MergeAppend:
-   ExplainMemberNodes(((MergeAppend *) plan)->mergeplans,
+   ExplainMemberNodes(((MergeAppend*) 
plan)->plan.appendplans,
   ((MergeAppendState 
*) planstate)->mergeplans,
   ancestors, es);
break;
@@ -1935,6 +1941,22 @@ show_sort_keys(SortState *sortstate, List *ancestors, 
ExplainState *es)
 ancestors, es);
 }
 
+/*
+ * Likewise, for an Append node.
+ */
+static void
+show_append_keys(AppendState *mstate, List *ancestors,
+  ExplainState *es)
+{
+   Append *plan = (Append *) mstate->ps.plan;
+
+   show_sort_group_keys((PlanState *) mstate, "Sort Key",
+plan->numCols, 
plan->sortColIdx,
+plan->sortOperators, 
plan->collations,
+plan->nullsFirst,
+ancestors, es);
+}
+
 /*
  * Likewise, for a MergeAppend node.
  */
@@ -1942,7 +1964,7 @@ static void
 show_merge_append_keys(MergeAppendState *mstate, List *ancestors,
   ExplainState *es)
 {
-   MergeAppend *plan = (MergeAppend *) mstate->ps.plan;
+   Append *plan = (Append *) mstate->ps.plan;
 
show_sort_group_keys((PlanState *) mstate, "Sort Key",
 plan->numCols, 
plan->sortColIdx,
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 6bf490bd70..601f2547d3 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -64,6 +64,7 @@ MergeAppendState *
 ExecInitMergeAppend(MergeAppend *node, EState *estate, int eflags)
 {
MergeAppendState *mergestate = makeNode(MergeAppendState);
+	Append	   *append = &node->plan;
PlanState **mergeplanstates;
int nplans;
int i;
@@ -76,12 +77,12 @@ ExecInitMergeAppend(MergeAppend *node, EState *estate, int 
eflags)
 * Lock the non-leaf tables in the partition tree 

Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-09-22 Thread Pavel Stehule
2017-09-22 21:12 GMT+02:00 Peter Eisentraut <peter.eisentr...@2ndquadrant.com>:

> On 9/22/17 09:16, Pavel Stehule wrote:
> > Example: somebody set SORT_COLUMNS to the schema_name value. This is
> > nonsense for the \l command.
> >
> > Now I am thinking that a more correct and practical design is based on
> > a special mode, activated by a variable
> >
> > PREFER_SIZE_SORT .. (off, asc, desc)
> >
> > This makes sense for a wide group of commands that can show size. And when
> > size is not visible, this option is not active.
>
> Maybe this shouldn't be a variable at all.  It's not like you'll set
> this as a global preference.  You probably want it for one command only.
>  So a per-command option might make more sense.
>

Sure, I cannot know what users will do. But when I need to see the size
of objects, I prefer sorting by size descending every time. If I need to
find a particular object, I can search in the pager. So in my case this
setting will live in psqlrc. In GoodData we used our own customization
for years - the ordering by size was hardcoded and nobody ever reported
an issue to me.

Alexander proposed a per-command option, but the current syntax of psql
commands doesn't allow simple parametrization. If it is to be user
friendly, it should be short. From an implementation perspective, it
should be simple to parse. It should be intuitive too - too many symbols
together are not a good idea.

Maybe some prefix design - but that is not a design for ordinary users
(although such users don't usually use psql anyway)

'\sort size \dt ?

\dt:sort_by_size
\dt+:sort_by_size ?

I don't see any good design in this direction

Regards

Pavel







>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] Patch: add --if-exists to pg_recvlogical

2017-09-22 Thread Peter Eisentraut
On 9/20/17 02:26, Rosser Schwarz wrote:
> The more I think about it, I don't think it's nonsensical, though.
> --create-slot --if-exists or --drop-slot --if-not-exists, obviously. I
> mean, do you even logic?

Those pieces make sense.  We have many CREATE IF NOT EXISTS and DROP IF
EXISTS commands.  The use is clear.

> Those aside, --if-exists just means a non-existent slot isn't an error
> condition, doesn't it? --start --if-exists will start, if the slot
> exists. Otherwise it won't, in neither case raising an error. Exactly
> what it says on the tin. Perhaps the docs could make clear that
> combination implies --no-loop (or at least means we'll exit immediately)
> if the slot does not, in fact, exist?

A non-terrible definition could perhaps be arrived at here, but it's not
clear why one would need this.  We don't have SELECT FROM IF EXISTS, either.

I suggest you create a patch that focuses on the --create part.

You can create a separate follow-on patch for the --start part if you
wish, but I think that that patch would be rejected.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Shaky coding for vacuuming partitioned relations

2017-09-22 Thread Tom Lane
Somebody inserted this into vacuum.c's get_rel_oids():

tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
if (!HeapTupleIsValid(tuple))
elog(ERROR, "cache lookup failed for relation %u", relid);

apparently without having read the very verbose comment two lines above,
which points out that we're not taking any lock on the target relation.
So, if that relation is concurrently being dropped, you're likely to
get "cache lookup failed for relation " rather than anything more
user-friendly.

A minimum-change fix would be to replace the elog() with an ereport
that produces the same "relation does not exist" error you'd have
gotten from RangeVarGetRelid, had the concurrent DROP TABLE committed
a few microseconds earlier.  But that feels like it's band-aiding
around the problem.
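
(For reference, that minimum-change version would be roughly the following;
the exact message wording and the RangeVar parameter name in get_rel_oids()
are assumptions, not a finished patch:

    tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
    if (!HeapTupleIsValid(tuple))
        ereport(ERROR,
                (errcode(ERRCODE_UNDEFINED_TABLE),
                 errmsg("relation \"%s\" does not exist",
                        vacrel->relname)));
)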

What I'm wondering about is changing the RangeVarGetRelid call to take
ShareUpdateExclusiveLock rather than no lock.  That would protect the
syscache lookup, and it would also make the find_all_inheritors call
a lot more meaningful.

If we're doing a VACUUM, the ShareUpdateExclusiveLock would be dropped
as soon as we close the caller's transaction, and then we'd acquire
the same or stronger lock inside vacuum_rel().  So that seems fine.
If we're doing an ANALYZE, then the lock would continue to be held
and analyze_rel would merely be acquiring it an extra time, so we'd
actually be removing a race-condition failure scenario for ANALYZE.
This would mean a few more cycles in lock management, but since this
only applies to a manual VACUUM or ANALYZE that specifies a table
name, I'm not too concerned about that.

Thoughts?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-09-22 Thread Peter Eisentraut
On 9/22/17 09:16, Pavel Stehule wrote:
> Example: somebody sets SORT_COLUMNS to the schema_name value. This is
> nonsense for the \l command
> 
> Now, I am thinking a more correct and practical design is based on a
> special mode, activated by a variable
> 
> PREFER_SIZE_SORT .. (off, asc, desc)
> 
> This makes sense for a wide group of commands that can show size. And when
> size is not visible, then this option is not active.

Maybe this shouldn't be a variable at all.  It's not like you'll set
this as a global preference.  You probably want it for one command only.
 So a per-command option might make more sense.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-22 Thread Peter Eisentraut
On 9/21/17 04:43, Masahiko Sawada wrote:
>> Once we got this patch, DROP SUBSCRIPTION is not transactional either
>> if dropping a replication slot or if the subscription got disabled in
>> a transaction block. But we disallow doing DROP SUBSCRIPTION in a
>> transaction block only in the former case. In the latter case, we
>> adopted such non-transactional behaviour. Since these behaviours would
>> be complex for users, I attached a documentation patch explaining it.
>>
> Hmm, isn't it necessary to care about and mention this kind of
> inconsistent behavior in the docs?

I have added documentation that certain forms of CREATE/DROP
SUBSCRIPTION cannot be run inside a transaction block, which we have
done for other such commands.

I don't think we need to go into further detail.  We don't guarantee
continuous connections.  If a worker is stopped and restarted in the
background as an implementation detail, then the user is not impacted.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] "inconsistent page found" with checksum and wal_consistency_checking enabled

2017-09-22 Thread Robert Haas
On Wed, Sep 20, 2017 at 12:52 AM, Michael Paquier
 wrote:
> Indeed. This had better be fixed before PG10 is out. I am adding an open item.

This seems a little hyperbolic to me.  Sure, it's a new bug in v10,
and sure, it's always better to fix bugs sooner rather than later, but
there's nothing particularly serious or urgent about this bug as
compared to any other one.

I've committed the proposed patch to master and REL_10_STABLE.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Peter Eisentraut
On 9/21/17 16:55, Peter Geoghegan wrote:
>> I strongly prefer if there, as much as possible, is only one format for
>> inputting ICU locales.
> I agree, but the bigger issue is that we're *half way* between
> supporting only one format, and supporting two formats. AFAICT, there
> is no reason that we can't simply support one format on all ICU
> versions, and keep what ends up within pg_collation at initdb time
> essentially the same across ICU versions (except for those that are
> due to cultural/political developments).

After reviewing this thread and testing around a bit, I think the code
is mostly fine as it is, but the documentation is lacking.  Hence,
attached is a patch to expand the documentation quite a bit, especially
to document in more detail what ICU locale strings are accepted.

The documentation has always stated, albeit perhaps in obscure ways,
that we accept for locales whatever ICU accepts.  I don't think we
should restrict or override that in any way.  That would cause existing
documentation and lore on ICU to be invalid for PostgreSQL.

I specifically disagree that we should, as appears to be suggested here,
restrict the input locale strings to BCP 47 and reject or transform the
traditional ICU-specific locale syntax.  Again, that would cause
existing ICU documentation material to become less applicable to
PostgreSQL.  And basically, there is no reason for it, because I am not
aware that ICU plans to get rid of that syntax.  Moreover, we need to
support that syntax for older ICU versions anyway.  A patch has been
posted that, as I understand it, would allow BCP 47 syntax to be used
with older versions as well.  That's a nice idea, but it's a new feature
and not appropriate for PG10 at this time.

(Note that we also don't canonicalize libc locale names.)

The attached documentation patch documents both locale naming forms in
parallel.

The other attached patch contains a test suite that verifies that the
examples in the documentation actually work.  It's not meant for
committing into the mainline, but it was useful for me.

During testing with various versions I have also discovered the
following things:

- The current code appears to be of the opinion that BCP 47 locale names
are accepted as of ICU 54.  That appears to be incorrect; I find that
they work fine in ICU 52, but they don't work in ICU 4.2.  I don't know
where the cutoff is.  That might be worth changing in the code if we can
obtain more information.

- What might have led to the above mistake is the following in the
ucol_open() documentation: q{Starting with ICU 54, collation attributes
can be specified via locale keywords as well, in the old locale
extension syntax ("el@colCaseFirst=upper") or in language tag syntax
("el-u-kf-upper").}  That is correct.  If you use that syntax in earlier
versions, the case-first specification in this example is ignored.  You
need to use ucol_setAttribute() then.  (Note that the phonebook stuff
still works, because that is not a "collation attribute".)

(One of my plans for PG11 had been to allow specifying such collation
attributes via additional CREATE COLLATION clauses and pg_collation
columns, but that might be obsolete now.)

- Moreover, it is not the case that ICU accepts just any sort of
nonsense as a locale name.  For example, "el@colCaseFirst=whatever" is
rejected with U_ILLEGAL_ARGUMENT_ERROR.  Now, it might in other cases be
more liberal than we might be hoping for.  But I think they have reasons
for what they do.

One conclusion here is that there are multiple dimensions to what sort
of thing is accepted as a locale in different versions of ICU and what
the canonical format is.  If we insist that everything has to be written
in the form that is preferred today, then we'll have awkward problems if
a future version of ICU establishes even more variants that will then be
preferred.

I think there is room for incremental refinement here.  Features like
optionally checking or printing the canonical or preferred locale format
or making the locale description available via a function or system view
might all be valuable additions to a future PostgreSQL release.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 5eac8a0df7163f8374382d37b32b9c2d3580238d Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 22 Sep 2017 13:51:01 -0400
Subject: [PATCH 1/2] Expand collation documentation

Document better how to create custom collations and what locale strings
ICU accepts.  Explain the ICU examples in more detail.  Also update the
text on the CREATE COLLATION reference page a bit to take ICU more into
account.
---
 doc/src/sgml/charset.sgml  | 135 ++---
 doc/src/sgml/ref/create_collation.sgml |  28 ---
 2 files changed, 124 insertions(+), 39 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 

Re: [HACKERS] Page Scan Mode in Hash Index

2017-09-22 Thread Robert Haas
On Thu, Sep 21, 2017 at 3:08 AM, Ashutosh Sharma  wrote:
> I have added a note for handling of logged and unlogged tables in
> README file and also corrected the header comment for
> hashbucketcleanup(). Please find the attached 0003*.patch having these
> changes. Thanks.

I committed 0001 and 0002 with some additional edits as
7c75ef571579a3ad7a1d3ee909f11dba5e0b9440.  I also rebased 0003 and
edited it a bit; see attached hash-cleanup-changes.patch.

I'm not entirely sold on 0003.  An alternative would be to rip the lsn
stuff back out of HashScanPosData, and I think we ought to consider
that.  Basically, 0003 is betting that getting rid of the
lock-chaining in hash index vacuum is more valuable than being able to
kill dead items more aggressively.  I bet that's a bad bet.

In the case of btree indexes, since
2ed5b87f96d473962ec5230fd820abfeaccb2069, page-at-a-time scanning
allows most btree index scans to avoid holding buffer pins when the
scan is suspended, but we gain no such advantage here.  We always have
to hold a pin on the primary bucket page anyway, so even with this
patch cleanup is going to block when it hits a bucket containing a
suspended scan.  0003 helps if (a) the relation is permanent, (b) the
bucket has overflow pages, and (c) the scan is moving faster than
vacuum and can overtake it instead of waiting.  But that doesn't seem
like it will happen very often at all, whereas the LSN check will
probably fail frequently and cause us to skip cleanup that we could
usefully have done.  So I propose the attached hashscan-no-lsn.patch
as an alternative.

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


hash-cleanup-changes.patch
Description: Binary data


hashscan-no-lsn.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2017-09-22 Thread Alexander Korotkov
On Fri, Sep 22, 2017 at 7:16 PM, chenhj  wrote:

> This is the new patch with a TAP test, and it uses the XLOGDIR macro.
>

Good.  I took a quick look over the patch.
Why do you need master_query(), standby_query() and run_query() in
RewindTest.pm?
You can do just $node_master->safe_psql() and $node_slave->safe_psql()
instead.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] pgbench regression test failure

2017-09-22 Thread Fabien COELHO


[...] After another week of buildfarm runs, we have a few more cases of 
3 rows of output, and none of more than 3 or less than 1.  So I went 
ahead and pushed your patch.  I'm still suspicious of these results, but 
we might as well try to make the buildfarm green pending investigation 
of how this is happening.


Yep. I keep the issue of pgbench tap test determinism in my todo list, 
among other things.


I think that it should be at least clearer under which condition (load ? 
luck ? bug ?) the result may be 1 or 3 when 2 are expected, which needs 
some thinking.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] additional contrib test suites

2017-09-22 Thread Andres Freund
On 2017-09-18 09:54:52 -0400, Peter Eisentraut wrote:
> On 9/16/17 08:10, David Steele wrote:
> >>> (5) drop contrib/chkpass altogether, on the grounds that it's too badly
> >>> designed, and too obsolete crypto-wise, to be useful or supportable.
> >> crypt() uses the 7 lowest characters, which makes for 7.2e16 values,
> >> so I would be fine with (5), then (4) as the test suite is not
> >> portable.
> > I'd prefer 5, but can go with 4.
> > 
> > I get that users need to store their own passwords, but we have support
> > for SHA1 via the crypto module which seems by far the better choice.
> 
> I'm also tempted to just remove it.  It uses bad/outdated security
> practices and it's also not ideal as an example module.  Any objections?

Uhm. I'm not objecting, but I doubt people really noticed your question
in a thread about additional contrib test suites.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench regression test failure

2017-09-22 Thread Tom Lane
Fabien COELHO  writes:
>> It could be as simple as putting the check-for-done at the bottom of the
>> loop not the top, perhaps.

> I agree that it is best if tests work in all reasonable conditions, 
> including a somewhat overloaded host...

> I'm going to think about it, but I'm not sure of the best approach. In the 
> meantime, ISTM that the issue has not been encountered (yet), so this is 
> not a pressing issue. Maybe under -T > --aggregate-interval pgbench could 
> go on over the limit if the log file has not been written at all, but that 
> would be some kind of kludge for this specific test...

After another week of buildfarm runs, we have a few more cases of 3 rows
of output, and none of more than 3 or less than 1.  So I went ahead and
pushed your patch.  I'm still suspicious of these results, but we might
as well try to make the buildfarm green pending investigation of how
this is happening.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v1] Add and report the new "in_hot_standby" GUC pseudo-variable.

2017-09-22 Thread Melanie Plageman
On Tue, Sep 19, 2017 at 4:15 PM, Melanie Plageman  wrote:
>
> The latest patch applies cleanly and builds (I am also seeing the failing
> TAP tests), however, I have a concern. With a single server set up, when I
> attempt to make a connection with target_session_attrs=read-write, I get
> the message
> psql: could not make a suitable connection to server "localhost:5432"
> Whereas, when I attempt to make a connection with
> target_session_attrs=read-only, it is successful.
>
> I might be missing something, but this seems somewhat counter-intuitive. I
> would expect to specify read-write as target_session_attrs and successfully
> connect to a server on which read and write operations are permitted. I see
> this behavior implemented in src/interfaces/libpq/fe-connect.c
> Is there a reason to reject a connection to a primary server when I
> specify 'read-write'? Is this intentional?
>
> Hi Elvis,

Making an assumption about the intended functionality mentioned above, I
swapped the 'not' in the following lines of
src/interfaces/libpq/fe-connect.c ~ line 3005

if (conn->target_session_attrs != NULL &&
((strcmp(conn->target_session_attrs, "read-write") == 0 &&
conn->session_read_only) ||
 (strcmp(conn->target_session_attrs, "read-only") == 0 && *!*
conn->session_read_only)))

I rebased and built with this change locally.
The review below is based on the patch with that change.

Also, the following comment has what looks like a copy-paste error and the
first line should be deleted
in src/backend/utils/misc/guc.c ~ line 10078
assign_default_transaction_read_only


+assign_default_transaction_read_only(bool newval, void *extra)
...
+ /*
-  * We clamp manually-set values to at least 1MB.  Since
+  * Also set the session read-only parameter.  We only need
+  * to set the correct value in processes that have database
+  * sessions, but there's no mechanism to know that there's

patch applies cleanly: yes
installcheck: passed
installcheck-world: passed
feature works as expected: yes (details follow)

With two servers, one configured as the primary and one configured to run
in Hot Standby mode, I was able to observe that the value of
session_read_only changed after triggering failover once the standby server
exited recovery

When attempting to connect to a primary server with
target_session_attrs=read-write, I was successful and when attempting to
connect with target_session_attrs=read-only, the connection was closed and
the expected message was produced

Thanks!


Re: [HACKERS] close_ps, NULLs, and DirectFunctionCall

2017-09-22 Thread Robert Haas
On Wed, Sep 20, 2017 at 4:25 PM, Andrew Gierth
 wrote:
> postgres=# select point(1,2) ## '((0,0),(NaN,NaN))'::box;
> ERROR:  function 0x9c5de0 returned NULL
>
> postgres=# select point(1,2) <-> '((0,0),(NaN,NaN))'::box;
> ERROR:  function 0x9c5de0 returned NULL
>
> This seems ... unhelpful (though it's at least better than crashing) and
> inconsistent.

I'd call that a bug.

> The most consistent fix would seem
> to be to make all the affected functions return NULL,

Based on your description of the problem I would tend to agree,
although I'm not really familiar with this area.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-22 Thread Peter Geoghegan
On Fri, Sep 22, 2017 at 7:25 AM, Peter Eisentraut
 wrote:
> I agree.  The attached patch should do it.

I see one small issue here: You'll now need to set ssup->comparator
back to NULL before you return early in the Windows' libc case. That
way, a shim comparator (that goes through bttextcmp(), in the case of
text) will be installed within FinishSortSupportFunction(). Other than
that, looks good to me.
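
Concretely, something along these lines (a sketch of the adjustment, based
on the condition already in your patch):

    #ifdef WIN32
        if (!collate_c &&
            GetDatabaseEncoding() == PG_UTF8 &&
            !(locale && locale->provider == COLLPROVIDER_ICU))
        {
            /* undo the assignment above, so the fmgr trampoline is used */
            ssup->comparator = NULL;
            return;
        }
    #endif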

Thanks
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 10RC1 crash testing MultiXact oddity

2017-09-22 Thread Robert Haas
On Fri, Sep 22, 2017 at 11:37 AM, Jeff Janes  wrote:
> Is the presence of this log message something that needs to be investigated
> further?

I'd say yes.  It sounds like we have a race condition someplace that
previous fixes in this area failed to adequately understand.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2017-09-22 Thread chenhj
Hi


This is the new patch with a TAP test, and it uses the XLOGDIR macro.
And I have added this patch to the commitfest, 
https://commitfest.postgresql.org/15/1302/


--
Best Regards,
Chen Huajun

pg_rewind_wal_copy_reduce_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] additional contrib test suites

2017-09-22 Thread Peter Eisentraut
On 9/18/17 09:54, Peter Eisentraut wrote:
> On 9/16/17 08:10, David Steele wrote:
 (5) drop contrib/chkpass altogether, on the grounds that it's too badly
 designed, and too obsolete crypto-wise, to be useful or supportable.
>>> crypt() uses the 7 lowest characters, which makes for 7.2e16 values,
>>> so I would be fine with (5), then (4) as the test suite is not
>>> portable.
>> I'd prefer 5, but can go with 4.
>>
>> I get that users need to store their own passwords, but we have support
>> for SHA1 via the crypto module which seems by far the better choice.
> 
> I'm also tempted to just remove it.  It uses bad/outdated security
> practices and it's also not ideal as an example module.  Any objections?

Hearing none, thus removed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 10RC1 crash testing MultiXact oddity

2017-09-22 Thread Alvaro Herrera
Jeff Janes wrote:
> I am running some crash recovery testing against 10rc1 by injecting torn
> page writes, using a test case which generates a lot of multixact, some
> naturally by doing a lot of fk updates, but most artificially by calling
> the pg_burn_multixact function from one of the attached patches.

Is this new in pg10, or do you also see it in 9.6?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Peter Geoghegan
On Fri, Sep 22, 2017 at 1:50 AM, Andreas Karlsson  wrote:
> On 09/21/2017 10:55 PM, Peter Geoghegan wrote:
>>
>> I agree, but the bigger issue is that we're *half way* between
>> supporting only one format, and supporting two formats. AFAICT, there
>> is no reason that we can't simply support one format on all ICU
>> versions, and keep what ends up within pg_collation at initdb time
>> essentially the same across ICU versions (except for those that are
>> due to cultural/political developments).
>
>
> I think we are in agreement then, but I do not have the time to get this
> done before the release of 10 so would be happy if you implemented it. Peter
> E, what do you say in this?

I can write a patch for this, but will not without some kind of buy-in
from Peter E.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] 10RC1 crash testing MultiXact oddity

2017-09-22 Thread Jeff Janes
I am running some crash recovery testing against 10rc1 by injecting torn
page writes, using a test case which generates a lot of multixact, some
naturally by doing a lot of fk updates, but most artificially by calling
the pg_burn_multixact function from one of the attached patches.

In 22 hours of running I got 12 instances where messages like this appear:

MultiXact member wraparound protections are disabled because oldest
checkpointed MultiXact 681012168 does not exist on disk

This is not a fatal error, and no inconsistent data is found at the end of
the run.  But the code comments suggests that this should only happen on a
server that has been upgraded from 9.3 or 9.4, which this server has not
been.

Is the presence of this log message something that needs to be investigated
further?

Thanks,

Jeff


0002-pg_burn_multixact-utility_v10.patch
Description: Binary data


count.pl
Description: Binary data


crash_REL11.patch
Description: Binary data


do.sh
Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] !USE_WIDE_UPPER_LOWER compile errors in v10+

2017-09-22 Thread Tom Lane
Peter Eisentraut  writes:
> On 9/21/17 17:38, Tom Lane wrote:
>> Meanwhile, I see that Peter has posted a fix for the immediate problem.
>> I propose that Peter should apply his fix in HEAD and v10, and then

> done

>> I'll rip out the !USE_WIDE_UPPER_LOWER code paths in HEAD only.

And done.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Darafei Praliaskouski
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

We have been using Postgres with this patch for some time.

In our use case we've got a quickly growing large table with events from our 
users. 
The table has a structure of (user_id, ts, ). Events are append only; 
each user generates events in a small predictable time frame, mostly every second.
From time to time we need to read this table in the fashion of WHERE ts BETWEEN a 
AND b AND user_id=c.
Such a query leads to an enormous amount of seeks, as each user's records are 
scattered across the relation and there are no pages that contain two events from 
the same user.

To fight it, we created a btree index on (user_id, ts, ). The plan 
switched to index only scans, but heap fetches and execution times were still 
the same. 
Manual 
We noticed that autovacuum skips scanning the relation and freezing the 
Visibility Map. 

We started frequently performing VACUUM manually on the relation. This helped 
with freezing the Visibility Map.
However, we found out that VACUUM makes a full scan over the index.
As the index does not fit into memory, this means that each run flushes all the 
disk caches and eats up Amazon IOPS credits. 

With this patch the behavior is much better for us - VACUUM finishes really quickly.

As a future improvement, something similar for other index types would be 
useful.
Once that happens, I'm looking forward to autovacuum kicking in on append-only 
tables to freeze the Visibility Map.

The new status of this patch is: Ready for Committer

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: "inconsistent page found" with checksum and wal_consistency_checking enabled

2017-09-22 Thread Noah Misch
On Wed, Sep 20, 2017 at 01:52:15PM +0900, Michael Paquier wrote:
> On Wed, Sep 20, 2017 at 5:23 AM, Ashwin Agrawal  wrote:
> > Currently, page checksum is not masked by Page masking routines used by
> > wal_consistency_checking facility. So, when running `make installcheck` with
> > data checksum enabled and wal_consistency_checking='all', it easily and
> > consistently FATALs with "inconsistent page found".
> 
> Indeed. This had better be fixed before PG10 is out. I am adding an open item.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] coverage analysis improvements

2017-09-22 Thread Peter Eisentraut
On 9/21/17 03:42, Michael Paquier wrote:
> -SPI.c: SPI.xs plperl_helpers.h
> +%.c: %.xs
> @if [ x"$(perl_privlibexp)" = x"" ]; then echo "configure switch
> --with-perl was not specified."; exit 1; fi
> -   $(PERL) $(XSUBPPDIR)/ExtUtils/xsubpp -typemap
> $(perl_privlibexp)/ExtUtils/typemap $< >$@
> Doing coverage on plperl with this patch applied, those do not seem
> necessary. But I don't know enough this code to give a clear opinion.

That patch is necessary for doing make coverage in vpath builds.
Otherwise it makes no difference.

> Running coverage-html with all the patches, I am seeing the following
> warnings with a fresh build on my macos laptop 10.11:
> geninfo: WARNING: gcov did not create any files for
> /Users/mpaquier/git/postgres/src/backend/access/transam/rmgr.gcda!
> I don't think that this is normal.

Apparently, rmgr.c doesn't contain any instrumentable code.  I don't see
this warning, but it might depend on tool versions and compiler options.

Note that rmgr.c doesn't show up here either:
https://coverage.postgresql.org/src/backend/access/transam/index.html

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU locales and text/char(n) SortSupport on Windows

2017-09-22 Thread Peter Eisentraut
On 9/21/17 15:21, Peter Geoghegan wrote:
> On Thu, Sep 21, 2017 at 12:12 PM, Peter Eisentraut
>  wrote:
>>> Attached patch shows what I'm getting at. This is untested, since I
>>> don't use Windows. Proceed with caution.
>>
>> Your analysis makes sense, but the patch doesn't work, because "locale"
>> is never set before reading it.
> 
> It was just for illustrative purposes. Seems fine to actually move the
> WIN32 block down to just before the existing TRUST_STRXFRM test, since
> the locale is going to be cached and then immediately reused where we
> return immediately (Windows libc) anyway. This would also be a win for
> clarity, IMV.

I agree.  The attached patch should do it.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From c2c875fe447eaf65f2ba5b28e77dcc2e42016455 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 22 Sep 2017 10:23:35 -0400
Subject: [PATCH v2] Allow ICU to use SortSupport on Windows with UTF-8

There is no reason to ever prevent the use of SortSupport on Windows
when ICU locales are used.  We previously avoided SortSupport on Windows
with UTF-8 server encoding and a non C-locale due to restrictions in
Windows' libc functionality.

This is now considered to be a restriction in one platform's libc
collation provider, and not a more general platform restriction.

Reported-by: Peter Geoghegan 
---
 src/backend/utils/adt/varlena.c | 25 +++--
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index ebfb823fb8..071bc83378 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -1823,12 +1823,6 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool 
bpchar)
 * requirements of BpChar callers.  However, if LC_COLLATE = C, we can
 * make things quite a bit faster with varstrfastcmp_c or 
bpcharfastcmp_c,
 * both of which use memcmp() rather than strcoll().
-*
-* There is a further exception on Windows.  When the database encoding 
is
-* UTF-8 and we are not using the C collation, complex hacks are 
required.
-* We don't currently have a comparator that handles that case, so we 
fall
-* back on the slow method of having the sort code invoke bttextcmp() 
(in
-* the case of text) via the fmgr trampoline.
 */
if (lc_collate_is_c(collid))
{
@@ -1839,10 +1833,6 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool 
bpchar)
 
collate_c = true;
}
-#ifdef WIN32
-   else if (GetDatabaseEncoding() == PG_UTF8)
-   return;
-#endif
else
{
ssup->comparator = varstrfastcmp_locale;
@@ -1869,6 +1859,21 @@ varstr_sortsupport(SortSupport ssup, Oid collid, bool 
bpchar)
}
}
 
+   /*
+* There is a further exception on Windows.  When the database encoding 
is
+* UTF-8 and we are not using the C collation, complex hacks are 
required.
+* We don't currently have a comparator that handles that case, so we 
fall
+* back on the slow method of having the sort code invoke bttextcmp() 
(in
+* the case of text) via the fmgr trampoline.  ICU locales work just the
+* same on Windows, however.
+*/
+#ifdef WIN32
+   if (!collate_c &&
+   GetDatabaseEncoding() == PG_UTF8 &&
+   !(locale && locale->provider == COLLPROVIDER_ICU))
+   return;
+#endif
+
/*
 * Unfortunately, it seems that abbreviation for non-C collations is
 * broken on many common platforms; testing of multiple versions of 
glibc
-- 
2.14.1


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Sokolov Yura

On 2017-09-22 16:22, Sokolov Yura wrote:

On 2017-09-22 11:21, Masahiko Sawada wrote:

On Fri, Sep 22, 2017 at 4:16 PM, Kyotaro HORIGUCHI
 wrote:
At Fri, 22 Sep 2017 15:00:20 +0900, Masahiko Sawada 
 wrote in 


On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
 wrote:
> I was just looking the thread since it is found left alone for a
> long time in the CF app.
>
> At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan  wrote in 

>> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund  wrote:
>> > Hi,
>> >
>> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
>> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas  
wrote:
>> >> [ lots of valuable discussion ]
>> >
>> > I think this patch clearly still is in the design stage, and has
>> > received plenty feedback this CF.  I'll therefore move this to the next
>> > commitfest.
>>
>> Does anyone have ideas on a way forward here? I don't, but then I
>> haven't thought about it in detail in several months.
>
> Is the additional storage in metapage to store the current status
> of vaccum is still unacceptable even if it can avoid useless
> full-page scan on indexes especially for stable tables?
>
> Or, how about additional 1 bit in pg_stat_*_index to indicate
> that the index *don't* require vacuum cleanup stage. (default
> value causes cleanup)

You meant that "the next cycle" is the lazy_cleanup_index() function
called by lazy_scan_heap()?


Both finally call btvacuumscan under a certain condition, but
what I meant by "the next cycle" is the lazy_cleanup_index call
in the next round of vacuum since abstract layer (index am) isn't
conscious of the detail of btree.


> index_bulk_delete (or ambulkdelete) returns the flag in
> IndexBulkDeleteResult then lazy_scan_heap stores the flag in
> stats and in the next cycle it is looked up to decide the
> necessity of index cleanup.
>

Could you elaborate on this? For example, in a btree index the index
cleanup skips the index scan if index_bulk_delete has been called
during vacuuming, because stats != NULL. So I think we don't
need such a flag.


The flag works so that two successive index full scans don't
happen in a vacuum round. If any rows are fully deleted, the
following btvacuumcleanup just does nothing.

I think what you wanted to solve here was the problem that
index_vacuum_cleanup runs a full scan even if it ends with no
actual work, during manual or anti-wraparound vacuums.  (I'm
getting a bit confused on this..) It is caused by using the
pointer "stats" as the flag to instruct it to do that. If the
stats-as-a-flag worked as expected, the GUC doesn't seem to be
required.


Hmm, my proposal is that if a table hasn't changed much since the
previous vacuum, we skip cleaning up the index.

If the table has at least one garbage tuple we do the lazy_vacuum_index
and then an IndexBulkDeleteResult is stored, which causes the
btvacuumcleanup to be skipped. On the other hand, if the table doesn't
have any garbage but some new tuples were inserted since the previous
vacuum, we don't do the lazy_vacuum_index but do the lazy_cleanup_index.
In this case, we always do the lazy_cleanup_index (i.e., we do the full
scan) even if only one tuple was inserted. That's why I proposed a new
GUC parameter which allows us to skip the lazy_cleanup_index in that case.



In addition to that, as Simon and Peter pointed out,
index_bulk_delete can leave not-fully-removed pages (so-called
half-dead pages and pages that are recyclable but not registered
in the FSM, AFAICS) in some cases, mainly because of the
RecentGlobalXmin interlock. In this case, just inhibiting the
cleanup scan by a threshold lets such dangling pages persist in the
index. (I couldn't make that many dangling pages, though..)

The first patch in the mail (*1) does that. It seems to have some
bugs, though..


Since the dangling pages persist until autovacuum decides to scan
the owning table again, we should run a vacuum round (or
index_vacuum_cleanup itself) even with no dead rows if we want
to clean up such pages within a certain period. The second patch
does that.



IIUC half-dead pages are not relevant to this proposal. The proposal
has two problems:
* By skipping index cleanup we could leave recyclable pages that are
not marked as recyclable.
* We stash an XID when a btree page is deleted, which is used to
determine when it's finally safe to recycle the page.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Here is a small patch that skips scanning a btree index if no pending
deleted pages exist.

It detects this situation by comparing pages_deleted with pages_free.
If they are equal, then there are no half-deleted pages, and it is
safe to skip the next lazy scan.

Flag 

Re: [HACKERS] [COMMITTERS] pgsql: Fix bool/int type confusion

2017-09-22 Thread Tom Lane
Peter Eisentraut  writes:
> On 9/21/17 14:58, Tom Lane wrote:
>> I've just been going through their git commit log to see what else has
>> changed since tzcode2017b, and I note that there are half a dozen other
>> portability-ish fixes.  I think that some of them affect only code we
>> don't use, but I'm not sure that that's the case for all.  So I'm a bit
>> inclined to go with plan B, that is sync with their current HEAD now.

> I suppose it might be good to do this after 10.0 final is wrapped?

I went and did it already.  I'm not particularly worried about the
impending 10.0 wrap --- the changes are minor as such things go,
and we've generally not worried about giving previous tzcode syncs
more than a few days' buildfarm testing before shipping them.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] !USE_WIDE_UPPER_LOWER compile errors in v10+

2017-09-22 Thread Peter Eisentraut
On 9/21/17 17:38, Tom Lane wrote:
> Meanwhile, I see that Peter has posted a fix for the immediate problem.
> I propose that Peter should apply his fix in HEAD and v10, and then

done

> I'll rip out the !USE_WIDE_UPPER_LOWER code paths in HEAD only.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Sokolov Yura

On 2017-09-22 11:21, Masahiko Sawada wrote:

On Fri, Sep 22, 2017 at 4:16 PM, Kyotaro HORIGUCHI
 wrote:
At Fri, 22 Sep 2017 15:00:20 +0900, Masahiko Sawada 
 wrote in 


On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
 wrote:
> I was just looking the thread since it is found left alone for a
> long time in the CF app.
>
> At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan  wrote in 

>> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund  wrote:
>> > Hi,
>> >
>> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
>> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas  
wrote:
>> >> [ lots of valuable discussion ]
>> >
>> > I think this patch clearly still is in the design stage, and has
>> > received plenty feedback this CF.  I'll therefore move this to the next
>> > commitfest.
>>
>> Does anyone have ideas on a way forward here? I don't, but then I
>> haven't thought about it in detail in several months.
>
> Is the additional storage in metapage to store the current status
> of vaccum is still unacceptable even if it can avoid useless
> full-page scan on indexes especially for stable tables?
>
> Or, how about additional 1 bit in pg_stat_*_index to indicate
> that the index *don't* require vacuum cleanup stage. (default
> value causes cleanup)

You meant that "the next cycle" is the lazy_cleanup_index() function
called by lazy_scan_heap()?


Both finally call btvacuumscan under a certain condition, but
what I meant by "the next cycle" is the lazy_cleanup_index call
in the next round of vacuum since abstract layer (index am) isn't
conscious of the detail of btree.


> index_bulk_delete (or ambulkdelete) returns the flag in
> IndexBulkDeleteResult then lazy_scan_heap stores the flag in
> stats and in the next cycle it is looked up to decide the
> necessity of index cleanup.
>

Could you elaborate on this? For example, in a btree index the index
cleanup skips the index scan if index_bulk_delete has been called
during vacuuming, because stats != NULL. So I think we don't
need such a flag.


The flag works so that two successive index full scans don't
happen in a vacuum round. If any rows are fully deleted, the
following btvacuumcleanup just does nothing.

I think what you wanted to solve here was the problem that
index_vacuum_cleanup runs a full scan even if it ends with no
actual work, during manual or anti-wraparound vacuums.  (I'm
getting a bit confused on this..) It is caused by using the
pointer "stats" as the flag to instruct it to do that. If the
stats-as-a-flag worked as expected, the GUC doesn't seem to be
required.


Hmm, my proposal is that if a table hasn't changed much since the
previous vacuum, we skip cleaning up the index.

If the table has at least one garbage tuple we do the lazy_vacuum_index
and then an IndexBulkDeleteResult is stored, which causes the
btvacuumcleanup to be skipped. On the other hand, if the table doesn't
have any garbage but some new tuples were inserted since the previous
vacuum, we don't do the lazy_vacuum_index but do the lazy_cleanup_index.
In this case, we always do the lazy_cleanup_index (i.e., we do the full
scan) even if only one tuple was inserted. That's why I proposed a new
GUC parameter which allows us to skip the lazy_cleanup_index in that case.



In addition to that, as Simon and Peter pointed out,
index_bulk_delete can leave not-fully-removed pages (so-called
half-dead pages and pages that are recyclable but not registered
in the FSM, AFAICS) in some cases, mainly because of the
RecentGlobalXmin interlock. In this case, just inhibiting the
cleanup scan by a threshold lets such dangling pages persist in the
index. (I couldn't make that many dangling pages, though..)

The first patch in the mail (*1) does that. It seems to have some
bugs, though..


Since the dangling pages persist until autovacuum decides to scan
the owning table again, we should run a vacuum round (or
index_vacuum_cleanup itself) even with no dead rows if we want
to clean up such pages within a certain period. The second patch
does that.



IIUC half-dead pages are not relevant to this proposal. The proposal
has two problems:
* By skipping index cleanup we could leave recyclable pages that are
not marked as recyclable.
* We stash an XID when a btree page is deleted, which is used to
determine when it's finally safe to recycle the page.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Here is a small patch that skips scanning a btree index if no pending
deleted pages exist.

It detects this situation by comparing pages_deleted with pages_free.
If they are equal, then there are no half-deleted pages, and it is
safe to skip the next lazy scan.

The flag is stored in btpo_flags. It is unset using 
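
A rough sketch of the check being described (a hypothetical helper, not the
actual patch; it only uses the pages_deleted and pages_free counters that
already exist in IndexBulkDeleteResult):

    /* true if a later cleanup scan could still find pending deleted pages */
    static bool
    btree_cleanup_needed(IndexBulkDeleteResult *stats)
    {
        /*
         * pages_deleted counts pages deleted by vacuum; pages_free counts
         * those already recyclable.  If the two match, there are no
         * half-deleted pages pending and the next lazy scan can be skipped.
         */
        return stats->pages_deleted != stats->pages_free;
    }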

Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-09-22 Thread Pavel Stehule
2017-09-21 20:30 GMT+02:00 Pavel Stehule :

>
>
> 2017-09-21 20:20 GMT+02:00 Peter Eisentraut  com>:
>
>> On 9/21/17 13:54, Pavel Stehule wrote:
>> > I see where you are coming from, but there is no association in the
>> > existing UI that equates "+" to the word "verbose".  I think just
>> > removing the verbose prefix and applying the sorting behavior in all
>> > cases should be easier to explain and implement.
>> >
>> > I thought about it - but I am not sure if one kind of these variables is
>> > practical.
>> >
>> > If I don't need a size, then sorting by schema, name is ok (I didn't need
>> > anything else ever). With only one kind of these variables, this setting is
>> > common - which is not practical.
>>
>> But you are proposing also to add a variable configuring the sort
>> direction.  It would be weird that \dX+ observed the sort direction but
>> \dX did not.
>>
>
> yes and no.
>
> schema_name, name_schema or SORT_DIRECTION makes sense for both types of
> commands.
>
> Size sort makes sense only for the \dX+ command.
>
> I am thinking about a solution, and the cleanest I see is two distinct
> variables:
>
> SORT_COLUMNS and VERBOSE_SORT_COLUMNS
>
> when VERBOSE_SORT_COLUMNS is undefined, then SORT_COLUMNS is used for the
> \dX+ command too.
>
> Is it acceptable?
>

I thought more about it, and I am thinking this direction is not good.

Example: somebody sets SORT_COLUMNS to the schema_name value. This is nonsense
for the \l command.

Now, I am thinking a more correct and practical design is based on a special
mode, activated by a variable

PREFER_SIZE_SORT .. (off, asc, desc)

This makes sense for a wide group of commands that can show size. And when size
is not visible, then this option is not active.

What do you think about this proposal?

Regards

Pavel



>
>
>
>> --
>> Peter Eisentraut  http://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>
>


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-09-22 Thread Ashutosh Bapat
On Fri, Sep 15, 2017 at 5:29 PM, Ashutosh Bapat
 wrote:
>
>>
>> Apart from these there is a regression case on a custom table, on head
>> query completes in 20s and with this patch it takes 27s. Please find
>> the attached .out and .sql file for the output and schema for the test
>> case respectively. I have reported this case before (sometime around
>> March this year) as well, but I am not sure if it was overlooked or is
>> an unimportant and expected behaviour for some reason.
>>
>
> Are you talking about [1]? I have explained about the regression in
> [2] and [3]. This looks like an issue with the existing costing model.
>

I debugged this case further. There are two partitioned tables being
joined: prt (with partitions prt_p1, prt_p2 and so on) and prt2 (with
partitions prt2_p1, prt2_p2, and so on). When the join is executed without
partition-wise join, prt2 is used to build the hash table and prt is used
to probe that hash table. prt2 has fewer rows than prt. But
when partition-wise join is used, the individual partitions are joined in
the reverse join order, i.e. partitions of prt are used to build the hash
table and partitions of prt2 are used to probe. This happens because
the path for the other join order (partition of prt2 used to build the
hash table and partition of prt used to probe) has a huge cost compared
to the first one (74459 and 313109), and a portion worth 259094 comes
from lines 3226/7 of final_cost_hashjoin():
3215 /*
3216  * The number of tuple comparisons needed is the number of outer
3217  * tuples times the typical number of tuples in a hash
bucket, which
3218  * is the inner relation size times its bucketsize
fraction.  At each
3219  * one, we need to evaluate the hashjoin quals.  But actually,
3220  * charging the full qual eval cost at each tuple is pessimistic,
3221  * since we don't evaluate the quals unless the hash values match
3222  * exactly.  For lack of a better idea, halve the cost estimate to
3223  * allow for that.
3224  */
3225 startup_cost += hash_qual_cost.startup;
3226 run_cost += hash_qual_cost.per_tuple * outer_path_rows *
3227 clamp_row_est(inner_path_rows * innerbucketsize) * 0.5;

That's because for some reason the innerbucketsize for the partition of prt is
22 times larger than that for the partition of prt2. Looks like we have some
error in estimating bucket sizes.

If I force the partitions to be joined in the same order as the partitioned
tables (without partition-wise join), the child joins execute faster and
in turn partition-wise join performs better than
non-partition-wise join. So this is clearly an estimation and
costing problem with regular joins.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Fix bool/int type confusion

2017-09-22 Thread Peter Eisentraut
On 9/21/17 14:58, Tom Lane wrote:
> I've just been going through their git commit log to see what else has
> changed since tzcode2017b, and I note that there are half a dozen other
> portability-ish fixes.  I think that some of them affect only code we
> don't use, but I'm not sure that that's the case for all.  So I'm a bit
> inclined to go with plan B, that is sync with their current HEAD now.

I suppose it might be good to do this after 10.0 final is wrapped?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SCRAM in the PG 10 release notes

2017-09-22 Thread Peter Eisentraut
On 9/21/17 14:15, Jeff Janes wrote:
> This looks good to me.  Might suggest adding verifying the clients as a
> specific step:  
> 
> "To upgrade an existing installation from md5 to scram-sha-256, verify
> that all client software supports it, set password_encryption =
> 'scram-sha-256' in postgresql.conf..."

I don't think there is a well-defined way of verifying whether all
client software supports it other than making the switch described and
then checking what breaks.  So it's a bit of a circular process.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Generic type subscripting

2017-09-22 Thread Peter Eisentraut
On 9/21/17 11:24, Dmitry Dolgov wrote:
> One last thing that I need to clarify. Initially there was an idea to
> minimize changes in `pg_type`

I see, but there is no value in that if it makes everything else more
complicated.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Claudio Freire
On Fri, Sep 22, 2017 at 4:46 AM, Kyotaro HORIGUCHI
 wrote:
> I apologize in advance of possible silliness.
>
> At Thu, 21 Sep 2017 13:54:01 -0300, Claudio Freire  
> wrote in 

Re: [HACKERS] visual studio 2017 build support

2017-09-22 Thread Andrew Dunstan


On 09/21/2017 08:16 PM, Haribabu Kommi wrote:
>
>
>
>
> I was about to commit this after a good bit of testing when I
> noticed this:
>
>     +   Building with Visual Studio 2017 is
>     supported
>     +   down to Windows 7 SP1 and Windows
>     Server 2012 R2.
>
> I was able to build on Windows Server 2008 without a problem, so I'm
> curious why we are saying it's not supported.
>
>
> Thanks for the review.
>
> From the Visual Studio system requirements [1], in the section on
> supported operating systems, only Windows 7 SP1 and Windows Server
> 2012 R2 are mentioned, and nothing is said about 2008; for this reason
> I wrote that it is supported down to the above operating systems. As
> I don't have a Windows Server 2008 system available, I didn't verify it.
>
> The Visual Studio 2017 product itself is not documented as supporting
> Windows Server 2008; can we go ahead and mention it in our documentation?
>
> [1] -
> https://www.visualstudio.com/en-us/productinfo/vs2017-system-requirements-vs
>
>


That page also says:


Microsoft Visual Studio Build Tools 2017

Also installs on Windows Server 2008 R2 SP1


So I'm inclined to adjust the documentation accordingly.

cheers

andrew


Andrew Dunstanhttps://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-22 Thread Arseny Sher
Masahiko Sawada  writes:

> On Sun, Sep 17, 2017 at 2:01 AM, Masahiko Sawada  
> wrote:

>> +  
>> +   Since dropping a replication slot is not transactional, we cannot run
>> +   DROP SUBSCRIPTION inside a transaction block if 
>> dropping
>> +   the replication slot. Also, DROP SUBSCRIPTION stops 
>> the
>> +   workers if the subscription got disabled in a transaction block even if
>> +   the transaction rolls back.
>> +  
>
> Hmm, isn't there necessary to care and mention about this kind of
> inconsistent behavior in docs?

Well, the docs already say that dropping a subscription with a replication
slot is non-transactional; see the 'Description' section of DROP
SUBSCRIPTION. As for the second sentence, normally the launcher will restart
the worker immediately after ABORT: the old worker wakes up the launcher on
exit, the launcher waits on the pg_subscription lock and restarts the worker
instantly after the rollback. If we tell users that the workers are stopped
after a DROP SUBSCRIPTION rollback, we should also say that they will be
restarted soon.

--
Arseny Sher




Re: [HACKERS] Improve catcache/syscache performance.

2017-09-22 Thread Robert Haas
On Fri, Sep 22, 2017 at 2:15 AM, Andres Freund  wrote:
>   This results in making cache lookups themselves roughly three times as
>   fast - full-system benchmarks obviously improve less than that.

Wow.  That's really good.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Generate wait event list and docs from text file

2017-09-22 Thread Alvaro Herrera
Michael Paquier wrote:
> On Fri, Sep 22, 2017 at 1:52 PM, Schneider  wrote:
> > Seems an apt occasion to point out that 10rc1 is missing documentation
> > for a couple events.

> Indeed, I can see as well that those are the three new ones between
> 9.6 and 10. Attached is a patch to correct the documentation, I am
> adding an open item so as this does not fall into oblivion.

Pushed, thanks.

-- 
Álvaro Herrera                 https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Windows warnings from VS 2017

2017-09-22 Thread Andrew Dunstan


On 09/21/2017 09:41 PM, Andres Freund wrote:
> On 2017-09-21 09:30:31 -0400, Tom Lane wrote:
>> Andrew Dunstan  writes:
>>> The speed of memset is hardly going to be the dominating factor in a
>>> 'CREATE DATABASE' command, so we could certainly afford to change to
>>> plain memset calls here.
>> Another thought is that it may be time for our decennial debate about
>> whether MemSet is worth the electrons it's printed on.  I continue to
>> think that any modern compiler+libc ought to do an equivalent or better
>> optimization given just a plain memset().  If that seems to be true
>> for recent MSVC, we could consider putting an #if into c.h to define
>> MemSet as just memset for MSVC.  Maybe later that could be extended
>> to other compilers.
> +many. glibc's memset is nearly an order of magnitude faster than MemSet
> on my machine for medium+ length copies, and still 3-4 times as fast at
> ~100 bytes. And both gcc and clang optimize away the memset entirely when
> the length is a fixed short length.


Perhaps we need some sort of small benchmark program that we can run on
a representative set of platforms? I presume if you can make that
assertion you already have something along those lines?
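A minimal standalone sketch of such a benchmark could look something like
this (hypothetical: the word-at-a-time loop only approximates what the
MemSet macro does, and the empty asm statement is a gcc/clang-specific
barrier used to keep the loops from being optimized out):

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define LEN   256          /* bytes zeroed per iteration; multiple of 8 */
#define ITERS 10000000L    /* iterations per method */

/* crude stand-in for a word-at-a-time MemSet-style loop */
static void
word_at_a_time_zero(void *dst, size_t len)
{
    uint64_t   *p = (uint64_t *) dst;
    uint64_t   *end = (uint64_t *) ((char *) dst + len);

    while (p < end)
        *p++ = 0;
}

int
main(void)
{
    static char buf[LEN];
    struct timespec t0, t1;
    double      word_sec, memset_sec;
    long        i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < ITERS; i++)
    {
        word_at_a_time_zero(buf, LEN);
        __asm__ volatile("" : : "r"(buf) : "memory");   /* keep the call alive */
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    word_sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < ITERS; i++)
    {
        memset(buf, 0, LEN);
        __asm__ volatile("" : : "r"(buf) : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    memset_sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;

    printf("word-at-a-time: %.3f s\nmemset:         %.3f s\n",
           word_sec, memset_sec);
    return 0;
}

Varying LEN (and whether it is a compile-time constant at the call site)
would show the cases Andres mentions.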

cheers

andrew


-- 
Andrew Dunstan                 https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Should we cacheline align PGXACT?

2017-09-22 Thread Alexander Korotkov
On Thu, Sep 21, 2017 at 12:08 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Mon, Sep 18, 2017 at 12:41 PM, Daniel Gustafsson 
> wrote:
>
>> > On 16 Sep 2017, at 01:51, Alexander Korotkov 
>> wrote:
>> >
>> > On Tue, Sep 5, 2017 at 2:47 PM, Daniel Gustafsson > > wrote:
>> > > On 04 Apr 2017, at 14:58, David Steele  da...@pgmasters.net>> wrote:
>> > >
>> > > On 4/4/17 8:55 AM, Alexander Korotkov wrote:
>> > >> On Mon, Apr 3, 2017 at 9:58 PM, Andres Freund > 
>> > >>
>> > >>I'm inclined to push this to the next CF, it seems we need a lot
>> more
>> > >>benchmarking here.
>> > >>
>> > >> No objections.
>> > >
>> > > This submission has been moved to CF 2017-07.
>> >
>> > This CF has now started (well, 201709 but that’s what was meant in
>> above), can
>> > we reach closure on this patch in this CF?
>> >
>> > During previous commitfest I come to doubts if this patch is really
>> needed when same effect could be achieved by another (perhaps random)
>> change of alignment.  The thing I can do now is to retry my benchmark on
>> current master and check what effect this patch has now.
>>
>> Considering this I’ll mark this as Waiting on Author, in case you come to
>> conclusion that another patch is required then we’ll bump to a return
>> status.
>>
>
> I've made some read-only benchmarking.  There is clear win in this case.
> The only point where median of master is higher than median of patched
> version is 40 clients.
>
> In this point observations are following:
> master:   982432 939483 932075
> pgxact-align: 913151 921300 938132
> So, groups of observations form the overlapping ranges, and this anomaly
> can be explained by statistical error.
>
> I'm going to make some experiments with read-write and mixed workloads too.
>

I've made benchmarks with two more workloads.
scalability-rw.png – read-write tpcb-like workload (pgbench default)
scalability-rrw.png – 90% read-only transactions, 10% read-write
transactions (-btpcb-like@1 -bselect-only@9)
It became clear that this patch causes a regression.  On the pure read-write
workload it's not so evident, due to the high variability of observations.
However, on the mixed workload the regression is very clear.
It would be ridiculous to consider a patch which improves read-only
performance but degrades read-write performance to nearly the same degree.
Thus, this topic needs more investigation into whether it's possible to at
least get the same benefit on the read-only workload without a penalty on
the read-write workload (ideally the read-write workload should benefit
too).  I'm going to mark this patch as "returned with feedback".

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] UPDATE of partition key

2017-09-22 Thread Amit Khandekar
On 21 September 2017 at 19:52, amul sul  wrote:
> On Wed, Sep 20, 2017 at 9:27 PM, Amit Khandekar 
> wrote:
>>
>> On 20 September 2017 at 00:06, Robert Haas  wrote:
>> > On Fri, Sep 15, 2017 at 7:25 AM, Amit Khandekar 
>> > wrote:
>> >> [ new patch ]
>
>
>   86 -   (event == TRIGGER_EVENT_UPDATE &&
> !trigdesc->trig_update_after_row))
>   87 +   (event == TRIGGER_EVENT_UPDATE &&
> !trigdesc->trig_update_after_row) ||
>   88 +   (event == TRIGGER_EVENT_UPDATE && (oldtup == NULL || newtup
> == NULL)))
>   89 return;
>   90 }
>
>
> Only one of oldtup and newtup will be valid at a time.  Can we improve
> this check accordingly?
>
> For e.g.:
> (event == TRIGGER_EVENT_UPDATE && )(HeapTupleIsValid(oldtup) ^
> ItemPointerIsValid(newtup)

Ok, I will be doing this as below:
-  (event == TRIGGER_EVENT_UPDATE && (oldtup == NULL || newtup == NULL)))
+  (event == TRIGGER_EVENT_UPDATE && ((oldtup == NULL) ^ (newtup == NULL))))

At other places in the function, oldtup and newtup are checked for
NULL, so to be consistent, I haven't used HeapTupleIsValid.

Actually, it won't happen that both oldtup and newtup are NULL ... in
either of delete, insert, or update, but I haven't added an Assert for
this, because that has been true even on HEAD.

Will include the above minor change in the next patch when more changes come in.

>
>
>  247
>  248 +   /*
>  249 +* EDB: In case this is part of update tuple routing, put this row
> into the
>  250 +* transition NEW TABLE if we are capturing transition tables. We
> need to
>  251 +* do this separately for DELETE and INSERT because they happen on
>  252 +* different tables.
>  253 +*/
>  254 +   if (mtstate->operation == CMD_UPDATE &&
> mtstate->mt_transition_capture)
>  255 +   ExecARUpdateTriggers(estate, resultRelInfo, NULL,
>  256 +NULL,
>  257 +tuple,
>  258 +NULL,
>  259 +mtstate->mt_transition_capture);
>  260 +
>  261 list_free(recheckIndexes);
>
>  267
>  268 +   /*
>  269 +* EDB: In case this is part of update tuple routing, put this row
> into the
>  270 +* transition OLD TABLE if we are capturing transition tables. We
> need to
>  271 +* do this separately for DELETE and INSERT because they happen on
>  272 +* different tables.
>  273 +*/
>  274 +   if (mtstate->operation == CMD_UPDATE &&
> mtstate->mt_transition_capture)
>  275 +   ExecARUpdateTriggers(estate, resultRelInfo, tupleid,
>  276 +oldtuple,
>  277 +NULL,
>  278 +NULL,
>  279 +mtstate->mt_transition_capture);
>  280 +
>
> Initially, I wondered why we can't have the above code right after
> ExecInsert() & ExecIDelete() in ExecUpdate respectively?
>
> We can do that for ExecIDelete() but not easily in the ExecInsert() case,
> because ExecInsert() internally searches the correct partition's
> resultRelInfo
> for an insert and before returning to ExecUpdate resultRelInfo is restored
> to the old one.  That's why current logic seems to be reasonable for now.
> Is there anything that we can do?

Yes, resultRelInfo is different when we return from ExecInsert().
Also, I think the trigger and transition capture should be done immediately
after the rows are inserted. This is true for the existing code also.
Furthermore, there is a dependency of ExecARUpdateTriggers() on
ExecARInsertTriggers(): transition_capture is passed NULL if we
already captured the tuple in ExecARUpdateTriggers(). It looks simpler
to do all of this in a single place.

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-09-22 Thread Peter Moser
2017-09-22 10:21 GMT+02:00 Pavel Stehule :
> Currently Postgres has zero support for SQL:2011 temporal tables. Isn't
> better start with already standard features than appends some without
> standard? The standard has some concept and if we start out of this concept,
> then the result will be far to standard probably.

We will focus for now on the Range Merge Join algorithm by Jeff Davis,
which implements a temporal join with overlap predicates.




Re: [HACKERS] CREATE COLLATION does not sanitize ICU's BCP 47 language tags. Should it?

2017-09-22 Thread Andreas Karlsson

On 09/21/2017 10:55 PM, Peter Geoghegan wrote:

I agree, but the bigger issue is that we're *half way* between
supporting only one format, and supporting two formats. AFAICT, there
is no reason that we can't simply support one format on all ICU
versions, and keep what ends up within pg_collation at initdb time
essentially the same across ICU versions (except for those that are
due to cultural/political developments).


I think we are in agreement then, but I do not have the time to get this
done before the release of 10, so I would be happy if you implemented it.
Peter E, what do you say to this?


Andreas




Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-09-22 Thread Ashutosh Bapat
On Fri, Sep 22, 2017 at 10:45 AM, Rafia Sabih
 wrote:
>>
>> On completing the benchmark for all queries for the above mentioned
>> setup, following performance improvement can be seen,
>> Query | Patch | Head
>> 3  | 1455  |  1631
>> 4  |  499  |  4344
>> 5  |  1464  |  1606
>> 10  |  1475  |  1599
>> 12  |  1465  |  1790
>>
>> Note that all values of execution time are in seconds.
>
> I compared this experiment with non-partitioned database and following
> is the result,
> Query | Non-partitioned head
> 3  |  1752
> 4  |  315
> 5  |  2319
> 10 | 1535
> 12 | 1739
>
> In summary, the query that appears slowest on the partitioned database is
> not so on the non-partitioned one. It is good to see that in Q4
> partition-wise join helps in achieving performance closer to its
> non-partitioned case; otherwise partitioning alone causes it to suffer
> greatly. Apart from Q4 it does not look like partitioning hurts anywhere
> else, though the maximum improvement is ~35% for Q5.
> Another point to note here is that the performance on the partitioned and
> unpartitioned heads is quite close (except Q4), which is something
> at least I wasn't expecting. It looks like we need not partition the
> tables anyway, or at least this set of queries doesn't benefit from
> partitioning. Please let me know if somebody has better ideas on how
> partitioning schemes should be applied to make it more beneficial for
> these queries.

Just partitioning is not expected to improve query performance (though we
still see some performance improvement). Partitioning plus partition-wise
operations and pruning is expected to show performance gains. IIUC the
results you reported, Q3 takes 1752 seconds on the non-partitioned head,
with partitioning it completes in 1631 seconds, and with partition-wise
join it completes in 1455 seconds, so the net improvement because of
partitioning is about 300 seconds, roughly a 17% improvement, which is a
lot for very large data. So, except for Q4, every query improves when the
tables are partitioned. Am I interpreting the results correctly?

There may be some other way of partitioning which may give better
results, but I think what we have now shows the importance of
partitioning in the case of very large data, e.g. scale 300 TPC-H.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Kyotaro HORIGUCHI
At Fri, 22 Sep 2017 17:21:04 +0900, Masahiko Sawada  
wrote in 
> On Fri, Sep 22, 2017 at 4:16 PM, Kyotaro HORIGUCHI
>  wrote:
> > At Fri, 22 Sep 2017 15:00:20 +0900, Masahiko Sawada  
> > wrote in 
> > 
> >> On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
> >>  wrote:
> >> Could you elaborate about this? For example in btree index, the index
> >> cleanup skips to scan on the index scan if index_bulk_delete has been
> >> called during vacuuming because stats != NULL. So I think we don't
> >> need such a flag.
> >
> > The flag works so that successive two index full scans don't
> > happen in a vacuum round. If any rows are fully deleted, just
> > following btvacuumcleanup does nothing.
> >
> > I think what you wanted to solve here was the problem that
> > index_vacuum_cleanup runs a full scan even if it ends with no
> > actual work, when manual or anti-wraparound vacuums.  (I'm
> > getting a bit confused on this..) It is caused by using the
> > pointer "stats" as the flag to instruct to do that. If the
> > stats-as-a-flag worked as expected, the GUC doesn't seem to be
> > required.
> 
> Hmm, my proposal is that if a table hasn't changed much since the
> previous vacuum, we skip cleaning up the index.
> 
> If the table has at least one garbage tuple we do lazy_vacuum_index and
> then an IndexBulkDeleteResult is stored, which causes the
> btvacuumcleanup to be skipped. On the other hand, if the table doesn't
> have any garbage but some new tuples were inserted since the previous
> vacuum, we don't do lazy_vacuum_index but do lazy_cleanup_index. In this
> case, we always do lazy_cleanup_index (i.e., we do the full scan)
> even if only one tuple was inserted. That's why I proposed a new GUC
> parameter which allows us to skip lazy_cleanup_index in that case.

I think the problem raised in this thread is that the last index
scan may leave dangling pages.

> > Addition to that, as Simon and Peter pointed out
> > index_bulk_delete can leave not-fully-removed pages (so-called
> > half-dead pages and pages that are recyclable but not registered
> > in FSM, AFAICS) in some cases mainly by RecentGlobalXmin
> > interlock. In this case, just inhibiting cleanup scan by a
> > threshold lets such dangling pages persist in the index. (I
> couldn't make that many dangling pages, though..)
> >
> > The first patch in the mail (*1) does that. It seems having some
> > bugs, though..
> >
> >
> > Since the dangling pages persist until autovacuum decided to scan
> > the belonging table again, we should run a vacuum round (or
> > index_vacuum_cleanup itself) even having no dead rows if we want
> > to clean up such pages within a certain period. The second patch
> does that.
> >
> 
> IIUC half-dead pages are not relevant to this proposal. The proposal
> has two problems;
> 
> * By skipping index cleanup we could leave recyclable pages that are
> not marked as recyclable.

Yes.

> * we stash an XID when a btree page is deleted, which is used to
> determine when it's finally safe to recycle the page

Is it a "problem" of this proposal?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Masahiko Sawada
On Fri, Sep 22, 2017 at 4:16 PM, Kyotaro HORIGUCHI
 wrote:
> At Fri, 22 Sep 2017 15:00:20 +0900, Masahiko Sawada  
> wrote in 
>> On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
>>  wrote:
>> > I was just looking the thread since it is found left alone for a
>> > long time in the CF app.
>> >
>> > At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan  wrote 
>> > in 
>> >> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund  wrote:
>> >> > Hi,
>> >> >
>> >> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
>> >> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas  
>> >> >> wrote:
>> >> >> [ lots of valuable discussion ]
>> >> >
>> >> > I think this patch clearly still is in the design stage, and has
>> >> > received plenty feedback this CF.  I'll therefore move this to the next
>> >> > commitfest.
>> >>
>> >> Does anyone have ideas on a way forward here? I don't, but then I
>> >> haven't thought about it in detail in several months.
>> >
>> > Is the additional storage in metapage to store the current status
>> > of vaccum is still unacceptable even if it can avoid useless
>> > full-page scan on indexes especially for stable tables?
>> >
>> > Or, how about additional 1 bit in pg_stat_*_index to indicate
>> > that the index *don't* require vacuum cleanup stage. (default
>> > value causes cleanup)
>>
>> You meant that "the next cycle" is the lazy_cleanup_index() function
>> called by lazy_scan_heap()?
>
> Both finally call btvacuumscan under a certain condition, but
> what I meant by "the next cycle" is the lazy_cleanup_index call
> in the next round of vacuum since abstract layer (index am) isn't
> conscious of the detail of btree.
>
>> > index_bulk_delete (or ambulkdelete) returns the flag in
>> > IndexBulkDeleteResult then lazy_scan_heap stores the flag in
>> > stats and in the next cycle it is looked up to decide the
>> > necessity of index cleanup.
>> >
>>
>> Could you elaborate about this? For example in btree index, the index
>> cleanup skips to scan on the index scan if index_bulk_delete has been
>> called during vacuuming because stats != NULL. So I think we don't
>> need such a flag.
>
> The flag works so that successive two index full scans don't
> happen in a vacuum round. If any rows are fully deleted, just
> following btvacuumcleanup does nothing.
>
> I think what you wanted to solve here was the problem that
> index_vacuum_cleanup runs a full scan even if it ends with no
> actual work, when manual or anti-wraparound vacuums.  (I'm
> getting a bit confused on this..) It is caused by using the
> pointer "stats" as the flag to instruct to do that. If the
> stats-as-a-flag worked as expected, the GUC doesn't seem to be
> required.

Hmm, my proposal is that if a table hasn't changed much since the
previous vacuum, we skip cleaning up the index.

If the table has at least one garbage tuple we do lazy_vacuum_index and
then an IndexBulkDeleteResult is stored, which causes the
btvacuumcleanup to be skipped. On the other hand, if the table doesn't
have any garbage but some new tuples were inserted since the previous
vacuum, we don't do lazy_vacuum_index but do lazy_cleanup_index. In this
case, we always do lazy_cleanup_index (i.e., we do the full scan)
even if only one tuple was inserted. That's why I proposed a new GUC
parameter which allows us to skip lazy_cleanup_index in that case.
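A rough sketch of the decision being discussed might look like the function
below (hypothetical only: the GUC name vacuum_cleanup_index_scale_factor and
the counters are illustrative assumptions, not the actual patch):

/*
 * Hypothetical illustration: skip the cleanup-only index scan when bulk
 * delete already ran in this vacuum round, or when too few tuples were
 * inserted since the last cleanup to justify a full index scan.
 */
static bool
should_run_index_cleanup(bool bulkdelete_was_called,
                         double inserted_since_last_cleanup,
                         double reltuples,
                         double vacuum_cleanup_index_scale_factor)
{
    /* bulk delete already scanned the index; cleanup has nothing extra to do */
    if (bulkdelete_was_called)
        return false;

    /* otherwise, scan only once enough new tuples have accumulated */
    return inserted_since_last_cleanup >
           vacuum_cleanup_index_scale_factor * reltuples;
}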

>
> Addition to that, as Simon and Peter pointed out
> index_bulk_delete can leave not-fully-removed pages (so-called
> half-dead pages and pages that are recyclable but not registered
> in FSM, AFAICS) in some cases mainly by RecentGlobalXmin
> interlock. In this case, just inhibiting cleanup scan by a
> threshold lets such dangling pages persist in the index. (I
> couldn't make that many dangling pages, though..)
>
> The first patch in the mail (*1) does that. It seems having some
> bugs, though..
>
>
> Since the dangling pages persist until autovacuum decided to scan
> the belonging table again, we should run a vacuum round (or
> index_vacuum_cleanup itself) even having no dead rows if we want
> to clean up such pages within a certain period. The second patch
> does that.
>

IIUC half-dead pages are not relevant to this proposal. The proposal
has two problems:
* By skipping index cleanup we could leave recyclable pages that are
not marked as recyclable.
* We stash an XID when a btree page is deleted, which is used to
determine when it's finally safe to recycle the page.
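For the second point, the rule being referred to can be sketched roughly as
follows (simplified, hypothetical naming; real code has to handle XID
wraparound with the proper comparison macros):

/*
 * Simplified illustration of the recycling rule above: a deleted btree
 * page stores the XID of the deleting transaction, and the page may only
 * be recycled once no running transaction can still be interested in it,
 * i.e. once that XID is older than RecentGlobalXmin.
 */
typedef unsigned int DemoTransactionId;

static bool
page_is_recyclable(DemoTransactionId page_deletion_xid,
                   DemoTransactionId recent_global_xmin)
{
    /* wraparound-naive "precedes" check, for illustration only */
    return page_deletion_xid < recent_global_xmin;
}

If the cleanup scan is skipped, pages satisfying this condition simply stay
unregistered in the FSM until some later scan visits them.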

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-09-22 Thread Pavel Stehule
2017-09-22 10:15 GMT+02:00 Peter Moser :

> 2017-09-22 10:06 GMT+02:00 Pavel Stehule :
> > ANSI SQL 2011 has temporal data support
> >
> > https://www.slideshare.net/CraigBaumunk/temporal-
> extensions-tosql20112012010438
>
> As operations it only supports temporal inner joins using the overlap
> predicate.
> Temporal aggregation, temporal outer joins, temporal duplicate
> elimination, and temporal set operations are not supported in
> SQL:2011.
> Please see [1] Section 2.5 Future directions.
>
> Best regards,
> Anton, Johann, Michael, Peter
>
>
> [1] https://cs.ulb.ac.be/public/_media/teaching/infoh415/
> tempfeaturessql2011.pdf


Thank you for the info.

Currently Postgres has zero support for SQL:2011 temporal tables. Isn't it
better to start with already-standardized features than to append some
without a standard? The standard has a concept, and if we start outside of
this concept, then the result will probably be far from the standard.

Regards

Pavel


Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-09-22 Thread Peter Moser
2017-09-22 10:06 GMT+02:00 Pavel Stehule :
> ANSI SQL 2011 has temporal data support
>
> https://www.slideshare.net/CraigBaumunk/temporal-extensions-tosql20112012010438

As for operations, it only supports temporal inner joins using the overlap predicate.
Temporal aggregation, temporal outer joins, temporal duplicate
elimination, and temporal set operations are not supported in
SQL:2011.
Please see [1] Section 2.5 Future directions.

Best regards,
Anton, Johann, Michael, Peter


[1] https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf




Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-09-22 Thread Pavel Stehule
2017-09-22 9:59 GMT+02:00 Peter Moser :

> 2017-09-12 16:33 GMT+02:00 Simon Riggs :
> > PostgreSQL tries really very hard to implement the SQL Standard and
> > just the standard. ISTM that the feedback you should have been given
> > is that this is very interesting but will not be committed in its
> > current form; I am surprised to see nobody has said that, though you
> > can see the truth of that since nobody is actively looking to review
> > or commit this. Obviously if the standard were changed to support
> > these things we'd suddenly be interested...
>
> Ok, we understand that PostgreSQL wants to strictly follow the SQL
> standard, which is not yet defined for temporal databases. In this
> context we understand your comment and agree on your position.
>

ANSI SQL 2011 has temporal data support

https://www.slideshare.net/CraigBaumunk/temporal-extensions-tosql20112012010438

Regards

Pavel Stehule


Re: [HACKERS] [PROPOSAL] Temporal query processing with range types

2017-09-22 Thread Peter Moser
2017-09-12 16:33 GMT+02:00 Simon Riggs :
> PostgreSQL tries really very hard to implement the SQL Standard and
> just the standard. ISTM that the feedback you should have been given
> is that this is very interesting but will not be committed in its
> current form; I am surprised to see nobody has said that, though you
> can see the truth of that since nobody is actively looking to review
> or commit this. Obviously if the standard were changed to support
> these things we'd suddenly be interested...

Ok, we understand that PostgreSQL wants to strictly follow the SQL
standard, which is not yet defined for temporal databases. In this
context we understand your comment and agree with your position.

Our approach with the two temporal primitives is more far-reaching and
comprehensive: it supports all operators of a temporal relational
algebra by systematically transforming the temporal operators into their
nontemporal counterparts, thereby taking advantage of all features of
the underlying DBMS. This of course also requires new syntax.

> What I think I'm lacking is a clear statement of why we need to have
> new syntax to solve the problem ...

The newly introduced syntax of the two primitives comes from our
decision to divide a bigger patch into two parts: a primitives-only
patch, and a temporal query rewrite patch. We thought of discussing
and providing a second patch in the future, which would then
automatically rewrite temporal queries into their non-temporal
counterparts using these two primitives.

> ... and why the problem itself is
> important.

The main idea behind these temporal primitives is to have only two new
operators that provide the whole range of temporal queries, that is,
temporal joins, temporal set operations, temporal duplicate
elimination, and temporal aggregation. The benefit of the primitives
is that they are minimally invasive to the Postgres kernel due to the reuse
of all standard operators after the temporal split (or normalization). They
can therefore also use existing optimizations that are already implemented.

An alternative approach would be to implement a separate algorithm for
each operator.  For instance, Jeff Davis is implementing a temporal
join into the existing Merge Join Executor (see [1]). Note that a
temporal join is the only operator that can be implemented without
introducing new syntax due to the overlap predicate.  For all other
temporal operators a discussion about new syntax is necessary anyway,
independent of the implementation approach.

> PostgreSQL supports the ability to produce Set Returning Functions and
> various other extensions. Would it be possible to change this so that
> we don't add new syntax to the parser but rather we do this as a set
> of functions?

Set Returning Functions would indeed be a possibility to implement
temporal query processing without new syntax, though they have some serious
drawbacks: the user has to specify the schema of the query results, and the
performance might be a problem, since functions are treated as
black boxes by the optimizer, losing selection pushdown and similar
optimizations.

> An alternative might be for us to implement a pluggable parser, so
> that you can have an "alternate syntax" plugin. If we did that, you
> can then maintain the plugin outside of core until the time when SQL
> Standard is updated and we can implement directly. We already support
> the ability to invent new plan nodes, so this could be considered as
> part of the plugin.

This is an interesting idea to look into when it is ready some day.



With all these clarifications in mind, we thought to focus meanwhile
on improving the performance of temporal query processing in special
cases (e.g., joins), similar to the pending Range Merge Join patch by
Jeff Davis in [1]. We would like to contribute to it as reviewers
and hopefully add some improvements or valuable ideas from our
research area.


Best regards,
Anton, Johann, Michael, Peter


[1] 
https://www.postgresql.org/message-id/flat/camp0ubfwaffw3o_ngkqprpmm56m4wetexjprb2gp_nrdaec...@mail.gmail.com#camp0ubfwaffw3o_ngkqprpmm56m4wetexjprb2gp_nrdaec...@mail.gmail.com




Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Kyotaro HORIGUCHI
I apologize in advance for possible silliness.

At Thu, 21 Sep 2017 13:54:01 -0300, Claudio Freire  
wrote in 

Re: [HACKERS] pg_stat_wal_write statistics view

2017-09-22 Thread Julien Rouhaud
Hello,

On Wed, Sep 13, 2017 at 3:01 AM, Haribabu Kommi
 wrote:
> I ran the latest performance tests with and without IO times, there is an
> overhead involved with IO time calculation and didn't observe any
> performance
> overhead with normal stats. May be we can enable the IO stats only in the
> development environment to find out the IO stats?
>

-1 from me for having these columns depend on a GUC *and* whether it's a
debug or assert build (unless this behaviour already exists for other
functions, but AFAIK that's not the case).

> I added the other background stats to find out how much WAL write is
> carried out by the other background processes. Now I am able to collect
> the stats for the other background processes also after the pgbench test.
> So I feel now the separate background stats may be useful.
>
> Attached latest patch, performance test results and stats details with
> separate background stats and also combine them with backend including
> the IO stats also.
>

The results seem to vary too much to have a strong opinion, but it
looks like the timing instrumentation can be too expensive, so I'd be
-1 on adding this overhead to track_io_timing.

I have some minor comments on the last patch:

+
+  backend_writes
+  bigint
+  Number of WAL writes that are carried out by the backend
process

it should be backend processes

+#define NUM_PG_STAT_WALWRITE_ATTS 16
+
+Datum
+pg_stat_get_walwrites(PG_FUNCTION_ARGS)
+{
+   TupleDesc   tupdesc;
+   Datum   values[NUM_PG_STAT_WALWRITE_ATTS];
+   boolnulls[NUM_PG_STAT_WALWRITE_ATTS];

For consistency, the #define should be just before the tupdesc
declaration, and be named PG_STAT_GET_WALWRITE_COLS


+   PgStat_Counter backend_writes;  /* No of writes by backend */

+   PgStat_Counter backend_dirty_writes;/* No of dirty writes by
+* backend when WAL buffers
+* full */

+   PgStat_Counter backend_write_blocks;/* Total no of pages
written by backend */

+   PgStat_Counter backend_total_write_time;/* Total write time in
+* milliseconds by backend */

+   PgStat_Counter backend_total_sync_time; /* Total write time in
+* milliseconds by backend */

these comments should all say "backends" for consistency

+-- There will surely and maximum one record
+select count(*) > 0 as ok from pg_stat_walwrites;

The test should be count(*) = 1




Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-09-22 Thread Masahiko Sawada
On Fri, Sep 22, 2017 at 3:46 PM, Michael Paquier
 wrote:
> On Fri, Sep 22, 2017 at 3:24 PM, Masahiko Sawada  
> wrote:
>> I have a question. Since WALInsertLockRelease seems not to call
>> LWLockWaitForVar I thought you wanted to mean LWLockReleaseClearVar
>> instead, is that right?
>
> This looks like a copy-pasto from my side. Thanks for spotting it.

Okay, attached is the latest version of the patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_do_pg_abort_backup_v3.patch
Description: Binary data



Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Kyotaro HORIGUCHI
At Fri, 22 Sep 2017 15:00:20 +0900, Masahiko Sawada  
wrote in 
> On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
>  wrote:
> > I was just looking the thread since it is found left alone for a
> > long time in the CF app.
> >
> > At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan  wrote in 
> > 
> >> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund  wrote:
> >> > Hi,
> >> >
> >> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
> >> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas  
> >> >> wrote:
> >> >> [ lots of valuable discussion ]
> >> >
> >> > I think this patch clearly still is in the design stage, and has
> >> > received plenty feedback this CF.  I'll therefore move this to the next
> >> > commitfest.
> >>
> >> Does anyone have ideas on a way forward here? I don't, but then I
> >> haven't thought about it in detail in several months.
> >
> > Is the additional storage in metapage to store the current status
> > of vaccum is still unacceptable even if it can avoid useless
> > full-page scan on indexes especially for stable tables?
> >
> > Or, how about additional 1 bit in pg_stat_*_index to indicate
> > that the index *don't* require vacuum cleanup stage. (default
> > value causes cleanup)
> 
> You meant that "the next cycle" is the lazy_cleanup_index() function
> called by lazy_scan_heap()?

Both finally call btvacuumscan under a certain condition, but
what I meant by "the next cycle" is the lazy_cleanup_index call
in the next round of vacuum, since the abstract layer (index AM) isn't
conscious of the details of btree.

> > index_bulk_delete (or ambulkdelete) returns the flag in
> > IndexBulkDeleteResult then lazy_scan_heap stores the flag in
> > stats and in the next cycle it is looked up to decide the
> > necessity of index cleanup.
> >
> 
> Could you elaborate about this? For example in btree index, the index
> cleanup skips to scan on the index scan if index_bulk_delete has been
> called during vacuuming because stats != NULL. So I think we don't
> need such a flag.

The flag works so that two successive index full scans don't
happen in a vacuum round. If any rows are fully deleted, the
btvacuumcleanup that follows does nothing.

I think what you wanted to solve here was the problem that
index_vacuum_cleanup runs a full scan even if it ends with no
actual work, for manual or anti-wraparound vacuums.  (I'm
getting a bit confused on this..) It is caused by using the
pointer "stats" as the flag to instruct it to do that. If the
stats-as-a-flag worked as expected, the GUC doesn't seem to be
required.

In addition to that, as Simon and Peter pointed out,
index_bulk_delete can leave not-fully-removed pages (so-called
half-dead pages and pages that are recyclable but not registered
in the FSM, AFAICS) in some cases, mainly because of the
RecentGlobalXmin interlock. In this case, just inhibiting the cleanup
scan by a threshold lets such dangling pages persist in the index. (I
couldn't make that many dangling pages, though..)

The first patch in the mail (*1) does that. It seems to have some
bugs, though..

Since the dangling pages persist until autovacuum decides to scan
the table they belong to again, we should run a vacuum round (or
index_vacuum_cleanup itself) even having no dead rows if we want
to clean up such pages within a certain period. The second patch
does that.


[*1] 
https://www.postgresql.org/message-id/20170921.174957.236914340.horiguchi.kyot...@lab.ntt.co.jp

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-09-22 Thread Michael Paquier
On Fri, Sep 22, 2017 at 3:24 PM, Masahiko Sawada  wrote:
> I have a question. Since WALInsertLockRelease seems not to call
> LWLockWaitForVar I thought you wanted to mean LWLockReleaseClearVar
> instead, is that right?

This looks like a copy-pasto from my side. Thanks for spotting it.
-- 
Michael




Re: [HACKERS] Generate wait event list and docs from text file

2017-09-22 Thread Michael Paquier
On Fri, Sep 22, 2017 at 1:52 PM, Schneider  wrote:
> Seems an apt occasion to point out that 10rc1 is missing documentation
> for a couple events.

Meh. Thanks for pointing this out. One more.

> From src/backend/storage/lmgr/lwlocknames.txt: CLogTruncationLock was
> added to the
> docs but BackendRandomLock and LogicalRepWorkerLock are missing. (Maybe 
> there's
> a reason for this... I just checked the diffs from 9.6 to 10 then
> checked the docs for
> completeness.)

Indeed, I can see as well that those are the three new ones between
9.6 and 10. Attached is a patch to correct the documentation; I am
adding an open item so that this does not fall into oblivion.

>> Using this set of meta-data, it is possible to generate the SGML
>> tables and the set of include tables.
>
> The lock-related events might present a challenge here since they come
> from a different
> place.  In particular, there is no single location to find the
> descriptions of tranche locks - you
> have to search through the source code and find the
> LWLockRegisterTranche() call to see
> what text it used for the lock name!  (Consolidating that seems like a
> great candidate for a
> patch...)

Perhaps... I did not dig much into the details of this area. I don't
recall that it was particularly challenging either to get a base
generation in place.
-- 
Michael
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 5575c2c837..141a08a08c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -845,7 +845,7 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
 
   

-LWLock
+LWLock
 ShmemIndexLock
 Waiting to find or allocate space in shared memory.

@@ -1030,6 +1030,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
  OldSnapshotTimeMapLock
  Waiting to read or update old snapshot control information.
 
+
+ BackendRandomLock
+ Waiting to generate a random number.
+
+
+ LogicalRepWorkerLock
+ Waiting for action on logical replication worker to finish.
+
 
  CLogTruncationLock
  Waiting to truncate the write-ahead log or waiting for write-ahead log truncation to finish.



Re: [HACKERS] Partition-wise aggregation/grouping

2017-09-22 Thread Jeevan Chalke
On Thu, Sep 21, 2017 at 5:00 PM, Rajkumar Raghuwanshi <
rajkumar.raghuwan...@enterprisedb.com> wrote:

> Hi Jeevan,
>
> while testing latest v3 patches, I am  getting a server crash if I reset
> partition_wise_agg_cost_factor, please take a look.
>
> CREATE TABLE lp (a TEXT, b FLOAT, c INT) PARTITION BY LIST(c);
> CREATE TABLE lp1 PARTITION OF lp FOR VALUES IN (10,20);
> CREATE TABLE lp2 PARTITION OF lp FOR VALUES IN (30,40);
>
> INSERT INTO lp VALUES ('a1',800, 20);
> INSERT INTO lp VALUES ('a2',1250,30);
> INSERT INTO lp VALUES ('a3',2975,20);
> INSERT INTO lp VALUES ('a3',2850,30);
>
> postgres=# SET enable_partition_wise_agg TO true;
> SET
> postgres=# SET partition_wise_agg_cost_factor TO 0.5;
> SET
> postgres=#
> postgres=# SELECT MAX(b), AVG(b) FROM lp GROUP BY a HAVING a = 'a3' ORDER
> BY 1,2;
>  max  |  avg
> --+
>  2975 | 2912.5
> (1 row)
>
> postgres=# RESET partition_wise_agg_cost_factor;
> RESET
> postgres=#
> postgres=# SELECT MAX(b), AVG(b) FROM lp GROUP BY a HAVING a = 'a3' ORDER
> BY 1,2;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> !>
>
>
Oops. Will fix this.

I have added these tests to the testcase, but the testcase is working as
expected. However, running those steps in psql reproduces the crash (not
consistently though).

Looking into it. Thanks for reporting.


> Thanks & Regards,
> Rajkumar Raghuwanshi
> QMG, EnterpriseDB Corporation
>


-- 
Jeevan Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-09-22 Thread Masahiko Sawada
On Fri, Sep 22, 2017 at 3:00 PM, Michael Paquier
 wrote:
> On Fri, Sep 22, 2017 at 11:34 AM, Masahiko Sawada  
> wrote:
>> You're right. I updated the patch so that it exits do_pg_abort_backup
>> if the state is NONE and setting the state to NONE in
>> do_pg_stop_backup before releasing the WAL insert lock.
>
> -/* Clean up session-level lock */
> +/*
> + * Clean up session-level lock. To avoid interrupting before changing
> + * the backup state by LWLockWaitForVar we change it while holding the
> + * WAL insert lock.
> + */
>  sessionBackupState = SESSION_BACKUP_NONE;
> You could just mention directly CHECK_FOR_INTERRUPTS here.

Thank you for the review.
I have a question. Since WALInsertLockRelease does not seem to call
LWLockWaitForVar, I thought you meant LWLockReleaseClearVar
instead; is that right?

> +/* Quick exit if we have done the backup */
> +if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE)
> +return;
>
> The patch contents look fine to me. I have also tested things in
> depths but could not find any problems. I also looked again at the
> backup start code, trying to see any flows between the moment the
> session backup lock is changed and the moment the callback to do
> backup cleanup is registered but I have not found any issues. I am
> marking this as ready for committer.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Improve catcache/syscache performance.

2017-09-22 Thread Andres Freund
Hi,

On 2017-09-20 18:26:50 +0530, amul sul wrote:
> Patch 0007:

Thanks for looking!

> 1:
> 400 +   /*
> 401 +* XXX: might be worthwhile to only handle oid sysattr, to
> reduce
> 402 +* overhead - it's the most common key.
> 403 +*/
> 
> IMHO, let's fix that as well. I tested this by fixing it (see the attached
> patch) but did not find much gain on my local CentOS VM (of course, the
> pgbench load wasn't big enough).

I ended up with a bigger patch that removes all extractions from
tuples by storing the extracted columns in an array.


> 2:  How about wrapping the following condition in SearchCatCacheMiss() in
> unlikely():
> 
> if (IsBootstrapProcessingMode())
> return NULL;

Given this is the cache-miss case, I can't get excited about it -
there are several 100k cycles needed to access the heap via an indexscan...
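For reference, the branch-hint idiom suggested above looks roughly like the
following generic sketch (demo names only, not PostgreSQL's actual macro or
code; the fallback branch covers non-gcc/clang compilers):

#include <stdbool.h>

/* gcc/clang branch-prediction hint; a no-op elsewhere */
#if defined(__GNUC__)
#define demo_unlikely(x) __builtin_expect((x) != 0, 0)
#else
#define demo_unlikely(x) (x)
#endif

static bool demo_in_bootstrap = false;

int
demo_cache_miss_path(void)
{
    /* the bootstrap case is essentially never taken at run time */
    if (demo_unlikely(demo_in_bootstrap))
        return 0;

    /* ... normal cache-miss handling would go here ... */
    return 1;
}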


> 3: Can we have following assert in SearchCatCacheN() instead
> SearchSysCacheN(), so that we'll assert direct SearchCatCacheN() call
> as well?
> 
> Assert(SysCache[cacheId]->cc_nkeys == );

Done, although I kept the others too.


> Other than these concern, patch looks pretty reasonable to me.

I'd appreciate if you could have a look at the new version as well.


Regards,

Andres




Re: [HACKERS] Improve catcache/syscache performance.

2017-09-22 Thread Andres Freund
Hi,

On 2017-09-13 23:12:07 -0700, Andres Freund wrote:
> Attached is a patch that tries to improve sys/catcache performance,
> going further than the patch referenced earlier.

Here's a variant that cleans up the previous changes a bit, and adds
some further improvements:

Here's the main commit message:

  Improve sys/catcache performance.

  The following are the individual improvements:
  1) Avoidance of FunctionCallInfo based function calls, replaced by
     more efficient functions with a native C argument interface.
  2) Don't extract columns from a cache entry's tuple whenever matching
     entries - instead store them as a Datum array. This also allows
     getting rid of having to build dummy tuples for negative & list
     entries, and of a hack for dealing with cstring vs. text weirdness.
  3) Reorder members of the catcache.h structs, so important entries are
     more likely to be on one cacheline.
  4) Allowing the compiler to specialize critical SearchCatCache for a
     specific number of attributes allows it to unroll loops and avoid
     other nkeys-dependent initialization.
  5) Only initializing the ScanKey when necessary, i.e. on catcache
     misses, greatly reduces unnecessary cpu cache misses.
  6) Splitting off the cache-miss case from the hash lookup, reducing
     stack allocations etc in the common case.
  7) CatCTup and their corresponding heaptuple are allocated in one
     piece.

  This results in making cache lookups themselves roughly three times as
  fast - full-system benchmarks obviously improve less than that.

  I've also evaluated further techniques:
  - replace open coded hash with simplehash - the list walk right now
shows up in profiles. Unfortunately it's not easy to do so safely as
an entry's memory location can change at various times, which
doesn't work well with the refcounting and cache invalidation.
  - Cacheline-aligning CatCTup entries - helps some with performance,
but the win isn't big and the code for it is ugly, because the
tuples have to be freed as well.
  - add more proper functions, rather than macros for
SearchSysCacheCopyN etc., but right now they don't show up in
profiles.

  The reason the macro wrappers for syscache.c/h have to be changed,
  rather than just catcache, is that doing otherwise would require
  exposing the SysCache array to the outside.  That might be a good idea
  anyway, but it's for another day.
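To illustrate point 4 of the list above in isolation, here is a standalone,
hypothetical sketch (simplified stand-in types, not the actual catcache
code) of how constant-folding the key count lets the compiler unroll the
comparison loop:

/*
 * Hypothetical illustration: per-nkeys entry points around one inline
 * worker, so each caller's loop bound is a compile-time constant and
 * can be unrolled.
 */
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t Datum;
typedef bool (*KeyEqFunc) (Datum a, Datum b);

typedef struct DemoCache
{
    int         nkeys;
    KeyEqFunc   eq[4];
} DemoCache;

static inline bool
keys_match(const DemoCache *cache, const Datum *entry_keys,
           const Datum *search_keys, int nkeys)
{
    for (int i = 0; i < nkeys; i++)
    {
        if (!cache->eq[i](entry_keys[i], search_keys[i]))
            return false;
    }
    return true;
}

/* Specialized entry points, one per supported key count. */
bool
keys_match1(const DemoCache *c, const Datum *e, const Datum *s)
{
    return keys_match(c, e, s, 1);
}

bool
keys_match4(const DemoCache *c, const Datum *e, const Datum *s)
{
    return keys_match(c, e, s, 4);
}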


With the attached benchmark for wide tuples and simple queries I get:

pgbench -M prepared -f ~/tmp/pgbench-many-cols.sql

master:
tps = 16112.117859 (excluding connections establishing)
tps = 16192.186504 (excluding connections establishing)
tps = 16091.257399 (excluding connections establishing)

patch:
tps = 18616.116993 (excluding connections establishing)
tps = 18584.036276 (excluding connections establishing)
tps = 18843.246281 (excluding connections establishing)

~17% gain


pgbench -M prepared -f ~/tmp/pgbench-many-cols.sql -c -j 16:
master:
tps = 73277.282455 (excluding connections establishing)
tps = 73078.408303 (excluding connections establishing)
tps = 73432.476550 (excluding connections establishing)

patch:
tps = 89424.043728 (excluding connections establishing)
tps = 89223.731307 (excluding connections establishing)
tps = 87830.665009 (excluding connections establishing)

~21% gain


standard pgbench readonly:
1 client:
master:
tps = 41662.984894 (excluding connections establishing)
tps = 40965.435121 (excluding connections establishing)
tps = 41438.197117 (excluding connections establishing)

patch:
tps = 42657.455818 (excluding connections establishing)
tps = 42834.812173 (excluding connections establishing)
tps = 42784.306987 (excluding connections establishing)

So roughly ~2.3%, much smaller, as expected, because the syscache is
much less of a bottleneck here.

-cj 16:
master:
tps = 204642.558752 (excluding connections establishing)
tps = 205834.493312 (excluding connections establishing)
tps = 207781.943687 (excluding connections establishing)

dev:
tps = 211459.087649 (excluding connections establishing)
tps = 214890.093976 (excluding connections establishing)
tps = 214526.773530 (excluding connections establishing)

So ~3.3%.

I personally find these numbers quite convincing for a fairly localized
microoptimization.


For the attached benchmark, here's the difference in profiles:
before:
single function overhead:
+8.10%  postgres  postgres[.] SearchCatCache
-7.26%  postgres  libc-2.24.so[.] __memmove_avx_unaligned_erms
   - __memmove_avx_unaligned_erms
  + 59.29% SearchCatCache
  + 23.51% appendBinaryStringInfo
  + 5.56% pgstat_report_activity
  + 4.05% socket_putmessage
  + 2.86% pstrdup
  + 2.65% AllocSetRealloc
  + 0.73% hash_search_with_hash_value
  + 0.68% btrescan
0.67% 0x55c02baea83f
+4.97%  postgres  postgres[.] appendBinaryStringInfo
+2.92%  postgres  postgres[.] 

Re: [HACKERS] GUC for cleanup indexes threshold.

2017-09-22 Thread Masahiko Sawada
On Tue, Sep 19, 2017 at 3:31 PM, Kyotaro HORIGUCHI
 wrote:
> I was just looking the thread since it is found left alone for a
> long time in the CF app.
>
> At Mon, 18 Sep 2017 16:35:58 -0700, Peter Geoghegan  wrote in 
> 
>> On Wed, Apr 5, 2017 at 3:50 PM, Andres Freund  wrote:
>> > Hi,
>> >
>> > On 2017-04-01 03:05:07 +0900, Masahiko Sawada wrote:
>> >> On Fri, Mar 31, 2017 at 11:44 PM, Robert Haas  
>> >> wrote:
>> >> [ lots of valuable discussion ]
>> >
>> > I think this patch clearly still is in the design stage, and has
>> > received plenty feedback this CF.  I'll therefore move this to the next
>> > commitfest.
>>
>> Does anyone have ideas on a way forward here? I don't, but then I
>> haven't thought about it in detail in several months.
>
> Is the additional storage in metapage to store the current status
> of vaccum is still unacceptable even if it can avoid useless
> full-page scan on indexes especially for stable tables?
>
> Or, how about additional 1 bit in pg_stat_*_index to indicate
> that the index *don't* require vacuum cleanup stage. (default
> value causes cleanup)

You meant that "the next cycle" is the lazy_cleanup_index() function
called by lazy_scan_heap()?

>
> index_bulk_delete (or ambulkdelete) returns the flag in
> IndexBulkDeleteResult then lazy_scan_heap stores the flag in
> stats and in the next cycle it is looked up to decide the
> necessity of index cleanup.
>

Could you elaborate on this? For example in a btree index, the index
cleanup skips the scan if index_bulk_delete has been
called during vacuuming, because stats != NULL. So I think we don't
need such a flag.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-09-22 Thread Michael Paquier
On Fri, Sep 22, 2017 at 11:34 AM, Masahiko Sawada  wrote:
> You're right. I updated the patch so that it exits do_pg_abort_backup
> if the state is NONE and setting the state to NONE in
> do_pg_stop_backup before releasing the WAL insert lock.

-/* Clean up session-level lock */
+/*
+ * Clean up session-level lock. To avoid interrupting before changing
+ * the backup state by LWLockWaitForVar we change it while holding the
+ * WAL insert lock.
+ */
 sessionBackupState = SESSION_BACKUP_NONE;
You could just mention directly CHECK_FOR_INTERRUPTS here.

+/* Quick exit if we have done the backup */
+if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE)
+return;

The patch contents look fine to me. I have also tested things in
depths but could not find any problems. I also looked again at the
backup start code, trying to see any flows between the moment the
session backup lock is changed and the moment the callback to do
backup cleanup is registered but I have not found any issues. I am
marking this as ready for committer.
-- 
Michael

