Re: [HACKERS] assorted code cleanup

2017-09-06 Thread Michael Paquier
On Wed, Sep 6, 2017 at 4:12 AM, Peter Eisentraut
 wrote:
> On 8/21/17 01:11, Michael Paquier wrote:
>>> - Drop excessive dereferencing of function pointers
>>
>> -   (*next_ProcessUtility_hook) (pstmt, queryString,
>> +   next_ProcessUtility_hook(pstmt, queryString,
>>  context, params, queryEnv,
>>  dest, completionTag);
>> But this... Personally I like the current grammar, which allows one to
>> tell the difference between a call to something declared locally and
>> something that may be going to a custom code path. So I think that you
>> had better not update the system hooks that external modules can use
>> via shared_preload_libraries.
>
> Do you mean specifically the hook variables, or any function pointers?
> I can see your point in the above case, but for example here
>
> -   if ((*tinfo->f_lt) (o.upper, c.upper, flinfo))
> +   if (tinfo->f_lt(o.upper, c.upper, flinfo))
>
> I think there is no loss of clarity and the extra punctuation makes it
> more complicated to read.

I am referring only to hook variables here. For functions only used
internally by the backend, I agree that using the direct call syntax
makes things better, because it makes them easier to distinguish from
the hook paths. Keeping a different grammar for local code and hook
code lets readers see more clearly that the two are different concepts.
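
For illustration, here is a minimal standalone sketch of the two call
styles (the signature is simplified; the real ProcessUtility hook takes
the full argument list):

#include <stddef.h>

/* Simplified signature; the real ProcessUtility_hook takes more arguments. */
typedef void (*ProcessUtility_hook_type) (const char *queryString);

static ProcessUtility_hook_type next_ProcessUtility_hook = NULL;

static void
standard_ProcessUtility(const char *queryString)
{
    (void) queryString;         /* stand-in for the backend's implementation */
}

static void
my_ProcessUtility(const char *queryString)
{
    if (next_ProcessUtility_hook)
        (*next_ProcessUtility_hook) (queryString);  /* dereference: visibly a hook */
    else
        standard_ProcessUtility(queryString);       /* plain call: a local function */
}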
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Zipfian distribution in pgbench

2017-09-06 Thread Fabien COELHO


Hello Alik,

Applies, compiles, works for me.

Some minor comments and suggestions.

Two typos:
 - "usinng" -> "using"
 - "a rejection method used" -> "a rejection method is used"

I'm not sure about "least_recently_used_i"; this naming style is not used
in pgbench. "least_recently_used" would be ok.


"..nb_cells.. != ZIPF_CACHE_SIZE", ISTM that "<" is more logical,
even if the result is the same?

I would put the parameter value check in getZipfianRand, so that if 
someone reuses the function elsewhere the check is still performed. That 
would also simplify a bit the already very large expression evaluation 
function.


When/if the pgbench TAP test patch gets through, coverage tests should
be added.

Maybe cache overflows could be counted, to allow for a possible warning 
message in the final report?


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] cache lookup errors for missing replication origins

2017-09-06 Thread Michael Paquier
On Tue, Sep 5, 2017 at 12:59 PM, Michael Paquier
 wrote:
> ERROR:  42704: replication slot "%s" does not exist

s/slot/origin/

> As far as I can see, replorigin_by_oid makes no use of its missing_ok
> = false in the backend code, so leaving it untouched would have no
> impact. replorigin_by_name with missing_ok = false is only used in
> SQL-callable functions, so it could be changed without any impact
> elsewhere (without considering externally-maintained replication
> modules).

Before I forget: attached is a patch, also added to the next CF.
replorigin_by_oid should get the same switch from elog to ereport, in
my opinion. Additional regression tests are included.
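
For readers following along, the switch being proposed has roughly this
shape (a backend-context sketch only, not standalone code; roident and
roname stand for the origin's OID and name, and ERRCODE_UNDEFINED_OBJECT
is the 42704 SQLSTATE quoted above):

/* before: internal-style error, surfaces as XX000 */
elog(ERROR, "cache lookup failed for replication origin %u", roident);

/* after: user-facing error with a proper SQLSTATE */
ereport(ERROR,
        (errcode(ERRCODE_UNDEFINED_OBJECT),
         errmsg("replication origin \"%s\" does not exist", roname)));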
-- 
Michael


replorigin-elogs.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-06 Thread Masahiko Sawada
On Mon, Sep 4, 2017 at 11:43 PM, Arseny Sher  wrote:
> Arseny Sher  writes:
>
>> Attached patch fixes this by stopping workers before RO drop, as
>> already done in case when we drop replication slot.
>
> Sorry, here is the patch.
>

I could reproduce this issue; it's a bug. I've added it to the open items.
The cause is commit 7e174fa7, which makes subscription DDLs kill the
apply worker only at transaction commit. I haven't looked at the patch
deeply yet, but I'm not sure the approach of killing the apply worker
before commit is a good one.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Tuple-routing for certain partitioned tables not working as expected

2017-09-06 Thread Noah Misch
On Tue, Sep 05, 2017 at 08:35:13PM +0900, Etsuro Fujita wrote:
> On 2017/08/30 17:20, Etsuro Fujita wrote:
> >On 2017/08/30 9:13, Amit Langote wrote:
> >>On 2017/08/29 20:18, Etsuro Fujita wrote:
> >>>On 2017/08/25 22:26, Robert Haas wrote:
> On Wed, Aug 23, 2017 at 4:55 AM, Etsuro Fujita
>  wrote:
> >Agreed, but I'd vote for fixing this in v10 as proposed; I agree
> >that just
> >ripping the CheckValidResultRel checks out entirely is not a good
> >idea,
> >but
> >that seems OK to me at least as a fix just for v10.
> 
> I'm still not on-board with having this be the one case where we don't
> do CheckValidResultRel.  If we want to still call it but pass down
> some additional information that can selectively skip certain checks,
> I could probably live with that.
> >>>
> >>>Another idea would be to not do CheckValidResultRel for partitions in
> >>>ExecSetupPartitionTupleRouting; instead, do that the first time the
> >>>partition is chosen by ExecFindPartition, and if successfully checked,
> >>>initialize the partition's ResultRelInfo and other stuff.  (We could
> >>>skip
> >>>this after the first time by checking whether we already have a valid
> >>>ResultRelInfo for the chosen partition.)  That could allow us to not
> >>>only
> >>>call CheckValidResultRel the way it is, but avoid initializing useless
> >>>partitions for which tuples aren't routed at all.
> >>
> >>I too have thought about the idea of lazy initialization of the partition
> >>ResultRelInfos.  I think it would be a good idea, but definitely
> >>something
> >>for PG 11.
> >
> >Agreed.  Here is a patch to skip the CheckValidResultRel checks for a
> >target table if it's a foreign partition to perform tuple-routing for, as
> >proposed by Robert.
> 
> I'll add this to the open items list for v10.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Robert,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CLUSTER command progress monitor

2017-09-06 Thread Tatsuro Yamada

Hi Hackers,

I revised the patch like this:

  - Add "command" column in the view
It tells that the running command is CLUSTER or VACUUM FULL.

  - Enable VACUUM FULL progress monitor
Add heap_tuples_vacuumed and heap_tuples_recently_dead as a counter in the 
view.
Sequence of phases are below:
  1. scanning heap
  5. swapping relation files
  6. rebuild index
  7. performing final cleanup

I didn't change the name of view (pg_stat_progress_cluster) because I'm not sure
whether the new name (pg_stat_progress_reorg) is suitable or not.

Any comments or suggestion are welcome.

Thanks,
Tatsuro Yamada


On 2017/09/04 20:17, Tatsuro Yamada wrote:

On 2017/09/04 15:38, Michael Paquier wrote:

On Thu, Aug 31, 2017 at 11:12 AM, Tatsuro Yamada
 wrote:

Then I have questions.

   * Should we have separate views for them?  Or should both be covered by the
 same view with some indication of which command (CLUSTER or VACUUM FULL)
 is actually running?


Using the same view for both, and telling in the view whether this is
VACUUM or CLUSTER, would be better IMO. Coming up with a name more
generic than pg_stat_progress_cluster may be better though if this
covers VACUUM FULL as well; user-facing documentation does not say
that VACUUM FULL is actually CLUSTER.


Thanks for sharing your thoughts.
Agreed.
I'll add a new column such as "command" to tell whether CLUSTER or
VACUUM FULL is running.
And how about this new view name?
  - pg_stat_progress_reorg
Is it a more general name than the previous one if it covers both commands?



I'll add this patch to CF2017-09.
Any comments or suggestions are welcome.


Nice to see that you are taking the time to implement patches for
upstream, Yamada-san!


Same here. :)
I'd like to contribute by creating features for DBAs and users.
A progress monitoring feature is important, based on my DBA experience.
I'd be happy if you could lend a hand.

Thanks,
Tatsuro Yamada
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 38bf636..33cedc0 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -332,6 +332,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  pg_stat_progress_clusterpg_stat_progress_cluster
+  One row for each backend running
+   CLUSTER and VACUUM FULL, showing current progress.
+   See .
+  
+ 
+
 

   
@@ -3233,9 +3241,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
   
PostgreSQL has the ability to report the progress of
-   certain commands during command execution.  Currently, the only command
-   which supports progress reporting is VACUUM.  This may be
-   expanded in the future.
+   certain commands during command execution.  Currently, the supported 
+   progress reporting commands are VACUUM and CLUSTER.
+   This may be expanded in the future.
   
 
  
@@ -3247,9 +3255,8 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
one row for each backend (including autovacuum worker processes) that is
currently vacuuming.  The tables below describe the information
that will be reported and provide information about how to interpret it.
-   Progress reporting is not currently supported for VACUUM FULL
-   and backends running VACUUM FULL will not be listed in this
-   view.
+   Running VACUUM FULL is listed in pg_stat_progress_cluster
+   view because it uses CLUSTER command. See .
   
 
   
@@ -3427,6 +3434,228 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
 
  
+
+ 
+  CLUSTER Progress Reporting
+
+  
+   Whenever CLUSTER is running, the
+   pg_stat_progress_cluster view will contain
+   one row for each backend that is currently clustering or vacuuming (VACUUM FULL). 
+   The tables below describe the information that will be reported and
+   provide information about how to interpret it.
+  
+
+  
+   pg_stat_progress_cluster View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend.
+
+
+ datid
+ oid
+ OID of the database to which this backend is connected.
+
+
+ datname
+ name
+ Name of the database to which this backend is connected.
+
+
+ relid
+ oid
+ OID of the table being clustered.
+
+
+ command
+ text
+ 
+   Current processing command: CLUSTER/VACUUM FULL.
+ 
+
+
+ phase
+ text
+ 
+   Current processing phase of cluster/vacuum full.  See  or .
+ 
+
+
+ scan_method
+ text
+ 
+   Scan method of table: index scan/seq scan.
+ 
+
+
+ scan_index_relid
+ bigint
+ 
+   OID of the index.
+ 
+
+
+ heap_tuples_total
+ bigint
+ 
+   Total number of heap tuples in the table.  This number is reported
+   as of the beginning of the scan; tuples added later will not 

[HACKERS] Changing Jobs

2017-09-06 Thread Andres Freund
Hi Everyone,

Since the beginning of September I have been working full-time for EDB.
The plan for now is to continue working on the projects I was working on,
in particular JIT, and to have more time for reviews. I'll be spending the
large majority of my time on community PG.

I'm primarily mentioning it to avoid the appearance of a concealed bias
when reviewing/committing patches by other contributors from EDB.

Besides that, I'll continue to have an advisor position at Citus Data.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-09-06 Thread Fabien COELHO


Applies, compiles, works for me.

Very very minor comments that I should have noticed before, sorry for this 
additional round trip.


In the help line, move -I just after -i, to put short options in 
alphabetical and decreasing importance order. On this line, also add the 
information about the default, something like:


  -i, --ini...   
  -I, --custom=[...]+  (default "ctgvp")
 ...

When/if the pgbench TAP test patch gets through, the feature should be 
tested there as well. No action needed now.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] paths in partitions of a dummy partitioned table

2017-09-06 Thread Ashutosh Bapat
On Fri, Aug 25, 2017 at 10:46 PM, Robert Haas  wrote:
> On Thu, Jul 6, 2017 at 11:35 AM, Ashutosh Bapat
>  wrote:
>> If a partitioned table is proven dummy, set_rel_pathlist() doesn't mark the
>> partition relations dummy and thus doesn't set any (dummy) paths in the
>> partition relations. The lack of paths in the partitions means that we can
>> not use partition-wise join while joining this table with some other 
>> similarly
>> partitioned table as the partitions do not have any paths that child-joins 
>> can
>> use. This means that we can not create partition relations for such a join 
>> and
>> thus can not consider the join to be partitioned. This doesn't matter much 
>> when
>> the dummy relation is the outer relation, since the resultant join is also
>> dummy. But when the dummy relation is inner relation, the resultant join is 
>> not
>> dummy and can be considered to be partitioned with same partitioning scheme 
>> as
>> the outer relation to be joined with other similarly partitioned table. Not
>> having paths in the partitions deprives us of this future optimization.
>
> I think it's wrong for any code to be examining the path list for a
> rel marked dummy, so I would suggest approaching this from a different
> direction.

Robert and I had an offline discussion about this. I am summarizing
it here for the sake of completeness.

A dummy relation is identified by the single dummy path in its
pathlist. There is no flag in RelOptInfo which tells whether a given
relation is dummy or not; it's the dummy path which tells that. A
dummy path is an Append path with no subpaths. The planner doesn't
treat dummy relations any differently from other relations when it
comes to using paths. When a dummy relation participates in a join,
the dummy path is used as one of the joining paths and converted to a
Result plan at planning time. So, for a partition-wise join where one
of the joining relations is dummy, every one of its children must have
a dummy path which can be used to construct child-join paths.
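
For concreteness, the planner of that era reads "dummy-ness" off the
paths roughly like this (paraphrased from memory of the optimizer
headers, so check the sources for the authoritative form; not
standalone code):

#define IS_DUMMY_PATH(p) \
    (IsA((p), AppendPath) && ((AppendPath *) (p))->subpaths == NIL)

#define IS_DUMMY_REL(r) \
    ((r)->cheapest_total_path != NULL && \
     IS_DUMMY_PATH((r)->cheapest_total_path))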

But we don't need to mark the partition relations dummy (when their
parent is dummy) if they are not going to participate in a
partition-wise join; they will be marked dummy once we know that they
are required for one. I was worried that we might mark a base relation
dummy during join planning this way, but we already have a precedent
for that in add_paths_to_join_rel(), so it shouldn't be a problem. I
have now added a patch to the partition-wise join set which marks
partition relations dummy when their parent is dummy.

> Given A LEFT JOIN B where Bk is dummy, I suggest
> constructing the path for (AB)k by taking a path from Ak and applying
> an appropriate PathTarget.  You don't really need a join path at all;
> a path for the non-dummy input is fine - and, in fact, better, since
> it will be cheaper to execute.  One problem is that it may not produce
> the correct output columns.  (AB) may drop some columns that were
> being generated by A because they were only needed to perform the
> join, and it may add additional columns taken from B.  But I think
> that if you take the default PathTarget for (AB) and replace
> references to columns of B with NULL constants, you should be able to
> apply the resulting PathTarget to a path for Ak and get a valid path
> for (AB)k.  Maybe there is some reason why that won't work exactly,
> but I think it is probably more promising to attack the problem from
> this angle than to do what you propose.  Sticking dummy joins into the
> query plan that are really just projecting out NULLs is not appealing.
>

This might help in cases where the RelOptInfo itself is missing, e.g.
missing partitions in partition matching as discussed in [1]. I will
discuss this approach on that thread.

[1] 
https://www.postgresql.org/message-id/cafjfprdjqvauev5djx3tw6pu5eq54nckadtxhx2jijg_gvb...@mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql - add special variable to reflect the last query status

2017-09-06 Thread Fabien COELHO


Hello Tom,

Here is a version 6.


A few thoughts about this patch:

* I think the ERROR_CODE variable should instead be named SQLSTATE.
That is what the SQL standard calls this string, and it's also what
just about all our documentation calls it; see PG_DIAG_SQLSTATE
in libpq, or the SQLSTATE 'x' construct in pl/pgsql, or the
sqlstate attribute of an exception object in plpython, etc etc.


ERROR_CODE -> SQLSTATE.


* I'm not exactly convinced that there's a use-case for STATUS


Removed, but I think it was nice to have; it is easier to interpret
than error codes and their classes, which I have not memorized yet :-)



* It might be better if SQLSTATE and ERROR_MESSAGE were left
unchanged by a non-error query.  That would reduce the need to
copy them into other variables just because you needed to do
something else before printing them.  It'd save a few cycles too.


Added LAST_ERROR_SQLSTATE & LAST_ERROR_MESSAGE, which are only updated when an error occurs.


* Speaking of which, has anyone tried to quantify the performance
impact of this patch?  It might well be negligible, but I do not
think we should assume that without evidence.


My guess is that it is negligible. I'm not sure how to measure something
negligible, as many very fast queries would have to be executed to get
anything significant. Maybe 100,000 "SELECT 1;" in a script?



* I wonder why you didn't merge this processing into ProcessResult,
instead of inventing an extra function (whose call site seems rather
poorly chosen anyhow --- what's the justification for not counting this
overhead as part of the query runtime?).  You could probably eliminate
the refactoring you did, since it wouldn't be necessary to recompute
AcceptResult's result that way.


Variable setting moved to the end of ProcessResult; no new functions,
and the result is clean, so I should have done it like that from the
beginning.

Forgotten help stuff added.

--
Fabien.diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 5bdbc1e..0bbe1c9 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -3475,6 +3475,16 @@ bar
   
 
   
+   ERROR
+   
+
+ Whether the last query failed, as a boolean.
+ See also SQLSTATE.
+
+   
+  
+
+  
 FETCH_COUNT
 
 
@@ -3611,6 +3621,18 @@ bar
   
 
   
+   LAST_ERROR_SQLSTATE
+   LAST_ERROR_MESSAGE
+   
+
+ The error code and associated error message of the last
+ error, or empty strings if no error occurred since the
+ beginning of the script.
+
+   
+  
+
+  
   
ON_ERROR_ROLLBACK

@@ -3679,6 +3701,25 @@ bar
   
 
   
+   ROW_COUNT
+   
+
+ How many rows were returned or affected by the last query.
+
+   
+  
+
+  
+   SQLSTATE
+   
+
+ The error code associated with the last query, or
+ 0 if no error occurred.
+
+   
+  
+
+  
 QUIET
 
 
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index b997058..93950fd 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -493,7 +493,6 @@ ResetCancelConn(void)
 #endif
 }
 
-
 /*
  * AcceptResult
  *
@@ -1107,6 +1106,35 @@ ProcessResult(PGresult **results)
 		first_cycle = false;
 	}
 
+	/*
+	 * Set special variables
+	 * - ERROR: TRUE/FALSE, whether an error occurred
+	 * - SQLSTATE: code of error, or "0"
+	 * - LAST_ERROR_SQLSTATE: same for last error
+	 * - LAST_ERROR_MESSAGE: message of last error
+	 * - ROW_COUNT: how many rows were returned or affected, or "0"
+	*/
+	if (success)
+	{
+		char   *ntuples = PQcmdTuples(*results);
+		SetVariable(pset.vars, "ERROR", "FALSE");
+		SetVariable(pset.vars, "SQLSTATE", "0");
+		SetVariable(pset.vars, "ROW_COUNT", *ntuples ? ntuples : "0");
+	}
+	else
+	{
+		char 		   *code = PQresultErrorField(*results, PG_DIAG_SQLSTATE);
+		char 		   *mesg = PQresultErrorField(*results, PG_DIAG_MESSAGE_PRIMARY);
+
+		SetVariable(pset.vars, "ERROR", "TRUE");
+		/* if an error was detected, it must have a code! */
+		Assert(code != NULL);
+		SetVariable(pset.vars, "SQLSTATE", code);
+		SetVariable(pset.vars, "LAST_ERROR_SQLSTATE", code);
+		SetVariable(pset.vars, "LAST_ERROR_MESSAGE", mesg ? mesg : "");
+		SetVariable(pset.vars, "ROW_COUNT", "0");
+	}
+
 	/* may need this to recover from conn loss during COPY */
 	if (!first_cycle && !CheckConnection())
 		return false;
@@ -1214,7 +1242,6 @@ PrintQueryResults(PGresult *results)
 	return success;
 }
 
-
 /*
  * SendQuery: send the query string to the backend
  * (and print out results)
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index 4d1c0ec..ae951f5 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -337,7 +337,7 @@ helpVariables(unsigned short int pager)
 	 * Windows builds currently print one more line than non-Windows builds.
 	 * Using the larger number is fine.
 	 

Re: [HACKERS] Replacing lfirst() with lfirst_node() appropriately in planner.c

2017-09-06 Thread Alvaro Herrera
Ashutosh Bapat wrote:
> On Wed, Sep 6, 2017 at 1:32 AM, Tom Lane  wrote:

> > BTW, I think we *could* use "lfirst_node(List, ...)" in cases where
> > we know the list is supposed to be a list of objects rather than ints
> > or Oids.  I didn't do anything about that observation, though.
> 
> IMO, it won't be apparent why some code uses lfirst_node(List,
> ...) and some code uses (List *) lfirst().

Yeah -- based on that argument, I too think we should leave those alone.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dropping partitioned tables without CASCADE

2017-09-06 Thread Rushabh Lathia
Okay, I have marked this as ready for committer.

Thanks,

On Wed, Sep 6, 2017 at 2:50 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Wed, Sep 6, 2017 at 1:06 PM, Rushabh Lathia 
> wrote:
> >
> > 2) Add a partition to foo:
> >
> > create table foo_p1 partition of foo for values in (1, 2, 3) partition by
> > list (b);
> >
> > postgres=# \d foo
> > Table "public.foo"
> >  Column |  Type   | Collation | Nullable | Default
> > +-+---+--+-
> >  a  | integer |   |  |
> >  b  | integer |   |  |
> > Partition key: LIST (a)
> > Number of partitions: 1 (Use \d+ to list them.)
> >
> > postgres=# \d+ foo
> > Table "public.foo"
> >  Column |  Type   | Collation | Nullable | Default | Storage | Stats
> target
> > | Description
> > +-+---+--+-+
> -+--+-
> >  a  | integer |   |  | | plain   |
> > |
> >  b  | integer |   |  | | plain   |
> > |
> > Partition key: LIST (a)
> > Partitions: foo_p1 FOR VALUES IN (1, 2, 3) has partitions
> >
> > The verbose output for foo above says foo_p1 "has partitions". But if I do
> >
> > postgres=# \d foo_p1
> >               Table "public.foo_p1"
> >  Column |  Type   | Collation | Nullable | Default
> > --------+---------+-----------+----------+---------
> >  a      | integer |           |          |
> >  b      | integer |           |          |
> > Partition of: foo FOR VALUES IN (1, 2, 3)
> > Partition key: LIST (b)
> > Number of partitions: 0
> >
> > it tells "Number of partitions: 0".
> >
> > I feel like the information is conflicting. AFAIU, the idea behind
> > adding "has partitions" was to let the user know that it's a
> > partitioned table. So can you directly use "is partitioned" in place
> > of "has partitions"?
> >
> > Made those changes in the attached patch and updated the expected
> > regression output.
> >
>
> Looks better.
>
> > Also ran pgindent on the patch.
> >
>
> Thanks for the changes. The patch looks good to me.
>
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>



-- 
Rushabh Lathia


Re: [HACKERS] document and use SPI_result_code_string()

2017-09-06 Thread Michael Paquier
On Thu, Aug 31, 2017 at 11:23 AM, Peter Eisentraut
 wrote:
> A lot of semi-internal code just prints out numeric SPI error codes,
> which is not very helpful.  We already have an API function
> SPI_result_code_string() to convert the codes to a string, so here is a
> patch to make more use of that and also document it for external use.
>
> Also included are two patches to clarify that some RI error handling
> code that I encountered at the same time.

Those are simple things. Agreed for 0001 to untangle things.

Fine for 0002. This reminds me of LockGXact and RemoveGXact in
twophase.c, as well as _hash_squeezebucket, which have some code paths
that cannot return... Any thoughts about having some kind of
PG_NOTREACHED, defined to 0, which could be put in an assertion?
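
Something like this, presumably (PG_NOTREACHED here is hypothetical,
and plain assert() stands in for the backend's Assert()):

#include <assert.h>

#define PG_NOTREACHED   0

static int
sign(int x)
{
    if (x > 0)
        return 1;
    if (x <= 0)
        return -1;
    assert(PG_NOTREACHED);      /* control can never get here */
    return 0;
}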

+1 for 0003. There are other undocumented functions available to users:
- SPI_plan_is_valid
- SPI_execute_snapshot
- SPI_plan_get_plan_sources
- SPI_plan_get_cached_plan
- SPI_datumTransfer
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

2017-09-06 Thread Haribabu Kommi
On Tue, Jul 25, 2017 at 9:33 PM, Tomas Vondra 
wrote:

> On 7/25/17 12:55 AM, Tom Lane wrote:
>
>> Tomas Vondra  writes:
>>
>>> It seems to me that VACUUM and ANALYZE somewhat disagree on what
>>> exactly reltuples means. VACUUM seems to be thinking that reltuples
>>> = live + dead while ANALYZE apparently believes that reltuples =
>>> live
>>>
>>
>> The question is - which of the reltuples definitions is the right
>>> one? I've always assumed that "reltuples = live + dead" but perhaps
>>> not?
>>>
>>
>> I think the planner basically assumes that reltuples is the live
>> tuple count, so maybe we'd better change VACUUM to get in step.
>>
>>
> Attached is a patch that (I think) does just that. The disagreement was
> caused by VACUUM treating recently dead tuples as live, while ANALYZE
> treats both of those as dead.
>
> At first I was worried that this will negatively affect plans in the
> long-running transaction, as it will get underestimates (due to reltuples
> not including rows it can see). But that's a problem we already have
> anyway, you just need to run ANALYZE in the other session.


Thanks for the patch.
From the mail, I understand that this patch tries to improve how the
VACUUM command updates the reltuples value in the catalog table, so
that it counts visible tuples the same way the ANALYZE command does.

- num_tuples);
+ num_tuples - nkeep);

With the above correction, there is a problem in reporting the number
of live tuples to the stats.

postgres=# select reltuples, n_live_tup, n_dead_tup
             from pg_stat_user_tables join pg_class using (relname)
            where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    899818 |     799636 |     100182
(1 row)


The live tuples value is decremented by the dead tuples value again
before being sent to the stats collector, in lazy_vacuum_rel():

/* report results to the stats collector, too */
new_live_tuples = new_rel_tuples - vacrelstats->new_dead_tuples;

The fix needs a correction here as well; or make the correction in
lazy_vacuum_rel() itself, before the catalog table is updated, the
same way it is done for the stats.


While testing this patch, I found another problem that is not related
to this patch. When the VACUUM command is executed multiple times on
a table with no dead rows, the reltuples value slowly decreases.

postgres=# select reltuples, n_live_tup, n_dead_tup
             from pg_stat_user_tables join pg_class using (relname)
            where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    899674 |     899674 |          0
(1 row)

postgres=# vacuum t;
VACUUM
postgres=# select reltuples, n_live_tup, n_dead_tup
             from pg_stat_user_tables join pg_class using (relname)
            where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    899622 |     899622 |          0
(1 row)

postgres=# vacuum t;
VACUUM
postgres=# select reltuples, n_live_tup, n_dead_tup
             from pg_stat_user_tables join pg_class using (relname)
            where relname = 't';
 reltuples | n_live_tup | n_dead_tup
-----------+------------+------------
    899570 |     899570 |          0
(1 row)


In lazy_scan_heap(), we force a scan of the last page of the relation,
to avoid taking the access exclusive lock in lazy_truncate_heap() when
there are tuples in the last page. For this reason, the scanned_pages
value will never be 0, so vac_estimate_reltuples() estimates the tuple
count based on the number of tuples in the last page of the relation.
This estimation keeps reducing the reltuples value.

I am wondering whether this problem really happens in real-world
scenarios, and whether it warrants a fix.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] psql - add special variable to reflect the last query status

2017-09-06 Thread Fabien COELHO



Here is a version 6.


Small v7 update, sorry for the noise.

Add testing the initial state of all variables.

Fix typos in a comment in tests.

Fix the documentation wrt the current implementation behavior.

--
Fabien.diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 5bdbc1e..97962d4 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -3475,6 +3475,16 @@ bar
   
 
   
+   ERROR
+   
+
+ Whether the last query failed, as a boolean.
+ See also SQLSTATE.
+
+   
+  
+
+  
 FETCH_COUNT
 
 
@@ -3611,6 +3621,18 @@ bar
   
 
   
+   LAST_ERROR_SQLSTATE
+   LAST_ERROR_MESSAGE
+   
+
+ The error code and associated error message of the last
+ error, or "0" and empty strings if no error occured
+ since the beginning of the script.
+
+   
+  
+
+  
   
ON_ERROR_ROLLBACK

@@ -3679,6 +3701,25 @@ bar
   
 
   
+   ROW_COUNT
+   
+
+ How many rows were returned or affected by the last query.
+
+   
+  
+
+  
+   SQLSTATE
+   
+
+ The error code associated with the last query, or
+ 0 if no error occurred.
+
+   
+  
+
+  
 QUIET
 
 
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index b997058..93950fd 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -493,7 +493,6 @@ ResetCancelConn(void)
 #endif
 }
 
-
 /*
  * AcceptResult
  *
@@ -1107,6 +1106,35 @@ ProcessResult(PGresult **results)
 		first_cycle = false;
 	}
 
+	/*
+	 * Set special variables
+	 * - ERROR: TRUE/FALSE, whether an error occurred
+	 * - SQLSTATE: code of error, or "0"
+	 * - LAST_ERROR_SQLSTATE: same for last error
+	 * - LAST_ERROR_MESSAGE: message of last error
+	 * - ROW_COUNT: how many rows were returned or affected, or "0"
+	*/
+	if (success)
+	{
+		char   *ntuples = PQcmdTuples(*results);
+		SetVariable(pset.vars, "ERROR", "FALSE");
+		SetVariable(pset.vars, "SQLSTATE", "0");
+		SetVariable(pset.vars, "ROW_COUNT", *ntuples ? ntuples : "0");
+	}
+	else
+	{
+		char 		   *code = PQresultErrorField(*results, PG_DIAG_SQLSTATE);
+		char 		   *mesg = PQresultErrorField(*results, PG_DIAG_MESSAGE_PRIMARY);
+
+		SetVariable(pset.vars, "ERROR", "TRUE");
+		/* if an error was detected, it must have a code! */
+		Assert(code != NULL);
+		SetVariable(pset.vars, "SQLSTATE", code);
+		SetVariable(pset.vars, "LAST_ERROR_SQLSTATE", code);
+		SetVariable(pset.vars, "LAST_ERROR_MESSAGE", mesg ? mesg : "");
+		SetVariable(pset.vars, "ROW_COUNT", "0");
+	}
+
 	/* may need this to recover from conn loss during COPY */
 	if (!first_cycle && !CheckConnection())
 		return false;
@@ -1214,7 +1242,6 @@ PrintQueryResults(PGresult *results)
 	return success;
 }
 
-
 /*
  * SendQuery: send the query string to the backend
  * (and print out results)
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index 4d1c0ec..ae951f5 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -337,7 +337,7 @@ helpVariables(unsigned short int pager)
 	 * Windows builds currently print one more line than non-Windows builds.
 	 * Using the larger number is fine.
 	 */
-	output = PageOutput(147, pager ? &(pset.popt.topt) : NULL);
+	output = PageOutput(155, pager ? &(pset.popt.topt) : NULL);
 
 	fprintf(output, _("List of specially treated variables\n\n"));
 
@@ -360,6 +360,8 @@ helpVariables(unsigned short int pager)
 	  "if set to \"noexec\", just show them without execution\n"));
 	fprintf(output, _("  ENCODING\n"
 	  "current client character set encoding\n"));
+	fprintf(output, _("  ERROR\n"
+	  "whether the last query failed\n"));
 	fprintf(output, _("  FETCH_COUNT\n"
 	  "the number of result rows to fetch and display at a time (0 = unlimited)\n"));
 	fprintf(output, _("  HISTCONTROL\n"
@@ -374,6 +376,9 @@ helpVariables(unsigned short int pager)
 	  "number of EOFs needed to terminate an interactive session\n"));
 	fprintf(output, _("  LASTOID\n"
 	  "value of the last affected OID\n"));
+	fprintf(output, _("  LAST_ERROR_SQLSTATE\n"
+	  "  LAST_ERROR_MESSAGE\n"
+	  "error code and message of last error, or \"0\" and empty if none\n"));
 	fprintf(output, _("  ON_ERROR_ROLLBACK\n"
 	  "if set, an error doesn't stop a transaction (uses implicit savepoints)\n"));
 	fprintf(output, _("  ON_ERROR_STOP\n"
@@ -388,6 +393,8 @@ helpVariables(unsigned short int pager)
 	  "specifies the prompt used during COPY ... FROM STDIN\n"));
 	fprintf(output, _("  QUIET\n"
 	  "run quietly (same as -q option)\n"));
+	fprintf(output, _("  ROW_COUNT\n"
+	  "number of rows of last query, or 0\n"));
 	fprintf(output, _("  SERVER_VERSION_NAME\n"
 	  "  SERVER_VERSION_NUM\n"
 	  "server's version (in 

Re: [HACKERS] CLUSTER command progress monitor

2017-09-06 Thread Michael Paquier
On Wed, Sep 6, 2017 at 3:58 PM, Tatsuro Yamada
 wrote:
> I revised the patch like this:

You should avoid top-posting.

> I didn't change the name of view (pg_stat_progress_cluster) because I'm not
> sure
> whether the new name (pg_stat_progress_reorg) is suitable or not.

Here are some ideas: rewrite (incorrect for ALTER TABLE), organize
(somewhat fine), order, operate (too general?), bundle, reform,
assemble.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 【ECPG】strncpy function does not set the end character '\0'

2017-09-06 Thread Michael Meskes
> When we reviewed the ecpg code, we found that the array seems not to
> have the end character ('\0') after using the strncpy function.

True.

> In the function ECPGnoticeReceiver, we use the strncpy function to
> copy the sqlstate to sqlca->sqlstate. The sqlca->sqlstate is defined
> with a size of 5, and the copy size is sizeof(sqlca->sqlstate).
> However, from the previous strcmp calls, the sqlstate length may be
> 5, such as ECPG_SQLSTATE_INVALID_CURSOR_NAME. So the end character
> may be missing from sqlca->sqlstate.

Why do you think there should be one? My memory might be wrong but I
don't think it's supposed to be a null terminated string.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2017-09-06 Thread Kyotaro HORIGUCHI
Hi,

At Mon, 4 Sep 2017 17:17:19 +0900, Michael Paquier  
wrote in 

Re: [HACKERS] [Proposal] Allow users to specify multiple tables in VACUUM commands

2017-09-06 Thread Alvaro Herrera
Tom Lane wrote:
> "Bossart, Nathan"  writes:
> > On 9/4/17, 10:32 PM, "Simon Riggs"  wrote:
> >> If we want to keep the code simple we must surely consider whether the
> >> patch has any utility.
> 
> > ... I'd argue that this feels like a natural extension of the
> > VACUUM command, one that I, like others much earlier in this thread,
> > was surprised to learn wasn't supported.
> 
> Yeah.  To me, one big argument for allowing multiple target tables is that
> we allow it for other common utility commands such as TRUNCATE or LOCK
> TABLE.

TRUNCATE has an actual feature behind its multi-table ability: you can
truncate tables linked by FKs that way, and not otherwise.  VACUUM, like
LOCK TABLE, has no such benefit.

(If one is programmatically locking multiple tables, it is easier to do
one table per command than many in one command, anyway.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dropping partitioned tables without CASCADE

2017-09-06 Thread Rushabh Lathia
I picked this patch for review and started looking at the implementation
details.

Consider the below test:

1)

postgres=# create table foo (a int, b int) partition by list (a);
CREATE TABLE
postgres=# \d foo
Table "public.foo"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   |  |
 b  | integer |   |  |
Partition key: LIST (a)
Number of partitions: 0

postgres=# \d+ foo
Table "public.foo"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target
| Description
+-+---+--+-+-+--+-
 a  | integer |   |  | | plain   |
|
 b  | integer |   |  | | plain   |
|
Partition key: LIST (a)
Number of partitions: 0

In the above case, both the verbose and the normal output give
information about the number of partitions.

2) Add a partition to foo:

create table foo_p1 partition of foo for values in (1, 2, 3) partition by
list (b);

postgres=# \d foo
Table "public.foo"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   |  |
 b  | integer |   |  |
Partition key: LIST (a)
*Number of partitions: 1 (Use \d+ to list them.)*

postgres=# \d+ foo
Table "public.foo"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target
| Description
+-+---+--+-+-+--+-
 a  | integer |   |  | | plain   |
|
 b  | integer |   |  | | plain   |
|
Partition key: LIST (a)
*Partitions: foo_p1 FOR VALUES IN (1, 2, 3) has partitions*

The verbose output for foo above says foo_p1 "has partitions". But if I do

postgres=# \d foo_p1
   Table "public.foo_p1"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   |  |
 b  | integer |   |  |
Partition of: foo FOR VALUES IN (1, 2, 3)
Partition key: LIST (b)

*Number of partitions: 0*
it tell "Number of partitions: 0".

I feel like the information is conflicting. AFAIU, the idea behind adding
"has partitions" was to let the user know that it's a partitioned table.
So can you directly use "is partitioned" in place of "has partitions"?

Made those changes in the attached patch and updated the expected
regression output.

Also ran pgindent on the patch.

Thanks,



On Mon, Sep 4, 2017 at 6:22 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Mon, Sep 4, 2017 at 3:48 PM, Alvaro Herrera 
> wrote:
> >
> > if (tuples > 0)
> > {
> > -   if (tableinfo.relkind != RELKIND_PARTITIONED_TABLE)
> > -       printfPQExpBuffer(&buf, _("Number of child tables: %d (Use \\d+ to list them.)"), tuples);
> > -   else
> > -       printfPQExpBuffer(&buf, _("Number of partitions: %d (Use \\d+ to list them.)"), tuples);
> > +   printfPQExpBuffer(&buf, _("Number of %s: %d (Use \\d+ to list them.)"), ct, tuples);
> > printTableAddFooter(&cont, buf.data);
> > }
> >
> > Please don't do this, because it breaks translatability.  I think the
> > original is fine.
> >
>
> We have used this style in the "else" case of if (!verbose). So, I
> just copied it. I have removed that change in the attached patch.
>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>


...
Rushabh Lathia
www.EnterpriseDB.com
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index f6049cc..ab9a637 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2828,7 +2828,7 @@ describeOneTableDetails(const char *schemaname,
 		/* print child tables (with additional info if partitions) */
 		if (pset.sversion >= 10)
			printfPQExpBuffer(&buf,
-			  "SELECT c.oid::pg_catalog.regclass, pg_catalog.pg_get_expr(c.relpartbound, c.oid)"
+			  "SELECT c.oid::pg_catalog.regclass, pg_get_expr(c.relpartbound, c.oid), c.relkind"
 			  " FROM pg_catalog.pg_class c, pg_catalog.pg_inherits i"
 			  " WHERE c.oid=i.inhrelid AND i.inhparent = '%s'"
 			  " ORDER BY c.oid::pg_catalog.regclass::pg_catalog.text;", oid);
@@ -2851,7 +2851,18 @@ describeOneTableDetails(const char *schemaname,
 		else
 			tuples = PQntuples(result);
 
-		if (!verbose)
+		/*
+		

Re: [HACKERS] dropping partitioned tables without CASCADE

2017-09-06 Thread Ashutosh Bapat
On Wed, Sep 6, 2017 at 1:06 PM, Rushabh Lathia  wrote:
>
> 2) Add a partition to foo:
>
> create table foo_p1 partition of foo for values in (1, 2, 3) partition by
> list (b);
>
> postgres=# \d foo
> Table "public.foo"
>  Column |  Type   | Collation | Nullable | Default
> +-+---+--+-
>  a  | integer |   |  |
>  b  | integer |   |  |
> Partition key: LIST (a)
> Number of partitions: 1 (Use \d+ to list them.)
>
> postgres=# \d+ foo
> Table "public.foo"
>  Column |  Type   | Collation | Nullable | Default | Storage | Stats target
> | Description
> +-+---+--+-+-+--+-
>  a  | integer |   |  | | plain   |
> |
>  b  | integer |   |  | | plain   |
> |
> Partition key: LIST (a)
> Partitions: foo_p1 FOR VALUES IN (1, 2, 3) has partitions
>
> The verbose output for foo above says foo_p1 "has partitions". But if I do
>
> postgres=# \d foo_p1
>               Table "public.foo_p1"
>  Column |  Type   | Collation | Nullable | Default
> --------+---------+-----------+----------+---------
>  a      | integer |           |          |
>  b      | integer |           |          |
> Partition of: foo FOR VALUES IN (1, 2, 3)
> Partition key: LIST (b)
> Number of partitions: 0
>
> it tells "Number of partitions: 0".
>
> I feel like the information is conflicting. AFAIU, the idea behind
> adding "has partitions" was to let the user know that it's a
> partitioned table. So can you directly use "is partitioned" in place
> of "has partitions"?
>
> Made those changes in the attached patch and updated the expected
> regression output.
>

Looks better.

> Also ran pgindent on the patch.
>

Thanks for the changes. The patch looks good to me.


-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_basebackup behavior on non-existent slot

2017-09-06 Thread Alvaro Herrera
Magnus Hagander wrote:
> On Mon, Sep 4, 2017 at 3:21 PM, Jeff Janes  wrote:

> > Should the parent process of pg_basebackup be made to respond to SIGCHLD?
> > Or call waitpid(bgchild, &status, WNOHANG) in some strategic loop?
> 
> I think it's ok to just call waitpid() -- we don't need to react super
> quickly, but we should react.

Hmm, not sure about that ... in the normal case (slotname is correct)
you'd be doing thousands of useless waitpid() system calls during the
whole operation, no?  I think it'd be better to have a SIGCHLD handler
that sets a flag (just once), which can be quickly checked without
accessing kernel space.
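
A minimal sketch of that pattern (illustrative only; the names and
structure are not pg_basebackup's actual code):

#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

static volatile sig_atomic_t bgchild_exited = 0;

static void
sigchld_handler(int signo)
{
    (void) signo;
    bgchild_exited = 1;         /* async-signal-safe: only set a flag */
}

/* registered once at startup: signal(SIGCHLD, sigchld_handler) */

static void
check_background_child(pid_t bgchild)
{
    if (bgchild_exited)         /* cheap flag test, no system call */
    {
        int         status;

        if (waitpid(bgchild, &status, WNOHANG) == bgchild)
        {
            /* report the WAL streamer's failure and exit here */
        }
    }
}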

> And we should then exit the main process with an error before actually
> streaming everything.

Right.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 【ECPG】strncpy function does not set the end character '\0'

2017-09-06 Thread Tom Lane
Michael Meskes  writes:
>> In the function ECPGnoticeReceiver, we use the strncpy function to
>> copy the sqlstate to sqlca->sqlstate. The sqlca->sqlstate is defined
>> with a size of 5, and the copy size is sizeof(sqlca->sqlstate).
>> However, from the previous strcmp calls, the sqlstate length may be
>> 5, such as ECPG_SQLSTATE_INVALID_CURSOR_NAME. So the end character
>> may be missing from sqlca->sqlstate.

> Why do you think there should be one? My memory might be wrong but I
> don't think it's supposed to be a null terminated string.

That field is defined as char[5] in struct sqlca_t, so the intent is
clearly that it not be null terminated.  However, it looks to me like
there'd be at least one alignment-padding byte after it, and that byte
is likely to be 0 in a lot of situations (particularly for statically
allocated sqlca_t's).  So a lot of the time, you could get away with
using strcmp() or other functions that expect null termination.

I'm thinking therefore that there's probably code out there that tries
to do strcmp(sqlca->sqlstate, "22000") or suchlike, and it works often
enough that the authors haven't identified their bug.  The question is
do we want to try to make that be valid code.

Changing the field declaration to char[5+1] would be easy enough, but
I have no idea how many places in ecpglib would need to change to make
sure that the last byte gets set to 0.
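
For anyone who wants to see the hazard in isolation, a standalone
sketch (not ecpglib code):

#include <stdio.h>
#include <string.h>

int
main(void)
{
    char        sqlstate[5];    /* mirrors sqlca_t's char[5] field */

    /* The source is already 5 characters long, so strncpy() writes
     * exactly 5 bytes and never appends a terminator. */
    strncpy(sqlstate, "22000", sizeof(sqlstate));

    /* printf("%s", sqlstate) would read past the array; only a
     * length-limited access is safe: */
    printf("%.5s\n", sqlstate);
    return 0;
}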

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] <> join selectivity estimate question

2017-09-06 Thread Ashutosh Bapat
On Fri, Jul 21, 2017 at 4:10 AM, Thomas Munro
 wrote:
>
> Thanks.  Bearing all that in mind, I ran through a series of test
> scenarios and discovered that my handling for JOIN_ANTI was wrong: I
> thought that I had to deal with inverting the result, but I now see
> that that's handled elsewhere (calc_joinrel_size_estimate() I think).
> So neqjoinsel should just treat JOIN_SEMI and JOIN_ANTI exactly the
> same way.

I agree; especially after looking at eqjoinsel_semi(), which is used
for both semi and anti joins, it becomes clearer.

>
> That just leaves the question of whether we should try to handle the
> empty RHS and single-value RHS cases using statistics.  My intuition
> is that we shouldn't, but I'll be happy to change my intuition and
> code that up if that is the feedback from planner gurus.

An empty RHS can also result from dummy relations, which are produced
by constraint exclusion, so maybe that's an interesting case. A
single-value RHS may be interesting with a partitioned table where all
rows in a given partition end up with the same partition key value.
But maybe those are just different patches. I am not sure.

>
> Please find attached a new version, and a test script I used, which
> shows a bunch of interesting cases.  I'll add this to the commitfest.

I added some "stable" tests to your patch taking inspiration from the
test SQL file. I think those will be stable across machines and runs.
Please let me know if those look good to you.



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 23e5526..4c1bae6 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -2697,26 +2697,63 @@ neqjoinsel(PG_FUNCTION_ARGS)
 	Oid			eqop;
 	float8		result;
 
-	/*
-	 * We want 1 - eqjoinsel() where the equality operator is the one
-	 * associated with this != operator, that is, its negator.
-	 */
-	eqop = get_negator(operator);
-	if (eqop)
+
+	if (jointype == JOIN_SEMI || jointype == JOIN_ANTI)
 	{
-		result = DatumGetFloat8(DirectFunctionCall5(eqjoinsel,
-	PointerGetDatum(root),
-	ObjectIdGetDatum(eqop),
-	PointerGetDatum(args),
-	Int16GetDatum(jointype),
-	PointerGetDatum(sjinfo)));
+		VariableStatData leftvar;
+		VariableStatData rightvar;
+		double		nullfrac;
+		bool		reversed;
+		HeapTuple	statsTuple;
+
+		get_join_variables(root, args, sjinfo, &leftvar, &rightvar, &reversed);
+		statsTuple = reversed ? rightvar.statsTuple : leftvar.statsTuple;
+		if (HeapTupleIsValid(statsTuple))
+			nullfrac = ((Form_pg_statistic) GETSTRUCT(statsTuple))->stanullfrac;
+		else
+			nullfrac = 0.0;
+		ReleaseVariableStats(leftvar);
+		ReleaseVariableStats(rightvar);
+
+		/*
+		 * For semi-joins, if there is more than one distinct value in the RHS
+		 * relation then every non-null LHS row must find a row to join since
+		 * it can only be equal to one of them.  We'll assume that there is
+		 * always more than one distinct RHS value for the sake of stability,
+		 * though in theory we could have special cases for empty RHS
+		 * (selectivity = 0) and single-distinct-value RHS (selectivity =
+		 * fraction of LHS that has the same value as the single RHS value).
+		 *
+		 * For anti-joins, if we use the same assumption that there is more
+		 * than one distinct key in the RHS relation, then every non-null LHS
+		 * row must be suppressed by the anti-join leaving only nullfrac.
+		 */
+		result = 1.0 - nullfrac;
 	}
 	else
 	{
-		/* Use default selectivity (should we raise an error instead?) */
-		result = DEFAULT_EQ_SEL;
+		/*
+		 * We want 1 - eqjoinsel() where the equality operator is the one
+		 * associated with this != operator, that is, its negator.
+		 */
+		eqop = get_negator(operator);
+		if (eqop)
+		{
+			result = DatumGetFloat8(DirectFunctionCall5(eqjoinsel,
+		PointerGetDatum(root),
+		ObjectIdGetDatum(eqop),
+		PointerGetDatum(args),
+		Int16GetDatum(jointype),
+		PointerGetDatum(sjinfo)));
+		}
+		else
+		{
+			/* Use default selectivity (should we raise an error instead?) */
+			result = DEFAULT_EQ_SEL;
+		}
+		result = 1.0 - result;
 	}
-	result = 1.0 - result;
+
 	PG_RETURN_FLOAT8(result);
 }
 
diff --git a/src/test/regress/expected/join.out b/src/test/regress/expected/join.out
index 9f4c88d..10bfb68 100644
--- a/src/test/regress/expected/join.out
+++ b/src/test/regress/expected/join.out
@@ -1845,6 +1845,89 @@ SELECT '' AS "xxx", *
  | 1 | 4 | one | -1
 (1 row)
 
+-- SEMI and ANTI join neq selectivity
+ANALYZE J1_TBL;
+ANALYZE J2_TBL;
+EXPLAIN (ANALYZE, COSTS OFF, TIMING OFF, SUMMARY OFF)
+SELECT * FROM J1_TBL t1
+WHERE EXISTS (SELECT * FROM J2_TBL t2 WHERE t1.i <> t2.i);
+QUERY PLAN 
+---
+ Nested Loop Semi Join (actual rows=9 

Re: [HACKERS] dropping partitioned tables without CASCADE

2017-09-06 Thread Amit Langote
On 2017/09/06 18:46, Rushabh Lathia wrote:
> Okay, I have marked this as ready for committer.

Thanks Ashutosh and Rushabh for rebasing and improving the patch.  Looks
good to me too.

Regards,
Amit



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Effect of changing the value for PARALLEL_TUPLE_QUEUE_SIZE

2017-09-06 Thread Rafia Sabih
On Fri, Jun 2, 2017 at 6:31 PM, Amit Kapila  wrote:
>
> Your reasoning sounds sensible to me.  I think the other way to attack
> this problem is that we can maintain some local queue in each of the
> workers when the shared memory queue becomes full.  Basically, we can
> extend your "Faster processing at Gather node" patch [1] such that
> instead of fixed sized local queue, we can extend it when the shm
> queue become full.  I think that way we can handle both the problems
> (worker won't stall if shm queues are full and workers can do batched
> writes in shm queue to avoid the shm queue communication overhead) in
> a similar way.
>
>
> [1] - 
> https://www.postgresql.org/message-id/CAOGQiiMwhOd5-iKZnizn%2BEdzZmB0bc3xa6rKXQgvhbnQ29zCJg%40mail.gmail.com
>

I worked on this idea of using a local queue as a temporary buffer to
write tuples into when the master is busy and the shared queue is full,
and it gives quite an improvement in query performance.

Design:
At a basic level, the design of this patch can be explained as follows:
similar to shm_mq, there is a new structure, local_mq, which is private
to each worker. Once the shared queue is full, we write the tuple into
the local queue. Since the local queue is never shared, we do not need
any locking to write into it; hence writing into the local queue is a
cheap operation.

Once the local queue is at least 5% full (that is the threshold for
this version, but we might need to modify it), we copy the data from
the local to the shared queue. In case both queues are full, we wait
till the master reads from the shared queue, then copy some data from
the local to the shared queue until the required space is available,
and subsequently write the tuple into the local queue. If at any
instant the local queue becomes empty, then we write the tuple into
the shared queue itself, provided there is space. At worker shutdown
we copy all the data from the local queue to the shared queue.
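
To make the control flow concrete, here is a toy, self-contained model
of the policy just described (the "queues" only count bytes; the real
patch moves tuple data through shm_mq and the new local_mq, and waits
on the leader instead of pretending the shared queue was drained):

#include <stdbool.h>
#include <stdio.h>

typedef struct { long used, size; } ToyQueue;

static ToyQueue shared_q = {0, 100};    /* stands in for the shm_mq */
static ToyQueue local_q = {0, 10000};   /* stands in for the local_mq */

static bool
put(ToyQueue *q, long n)
{
    if (q->used + n > q->size)
        return false;
    q->used += n;
    return true;
}

static void
flush_local_to_shared(void)
{
    shared_q.used = 0;          /* pretend the leader consumed everything */
    while (local_q.used > 0 && put(&shared_q, 1))
        local_q.used--;
}

static void
send_tuple(long len)
{
    if (local_q.used == 0 && put(&shared_q, len))
        return;                 /* fast path: straight into the shared queue */
    if (!put(&local_q, len))    /* shared queue full: buffer locally */
    {
        flush_local_to_shared();        /* both full: "wait" for the leader */
        put(&local_q, len);
    }
    if (local_q.used >= local_q.size / 20)      /* local queue >= 5% full */
        flush_local_to_shared();
}

int
main(void)
{
    for (int i = 0; i < 1000; i++)
        send_tuple(5);
    printf("shared=%ld, local=%ld\n", shared_q.used, local_q.used);
    return 0;
}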

For this version of the patch I have kept the size of the local queue
at 100 * PARALLEL_TUPLE_QUEUE_SIZE = 6553600 bytes, which might not be
the best value, and I am open to reasons for modifying it. It is kept
that large for scenarios where the gather/gather-merge node is slow: I
expect that when the master is busy it may be busy for quite some time,
or the amount of data to be processed may be high, and we would not
want our worker to wait for long.

Performance:
These experiments are on TPC-H scale factor 20. The patch gives around
20-30% performance improvement in queries with selectivity of around
20-30%.

Head:
Default plan
explain analyse select * from lineitem where l_extendedprice < 5
and l_orderkey < 1500;

QUERY PLAN
-
 Index Scan using idx_lineitem_orderkey on lineitem
(cost=0.57..334367.85 rows=10313822 width=129) (actual
time=0.057..26389.587 rows=10258702 loops=1)
   Index Cond: (l_orderkey < 1500)
   Filter: (l_extendedprice < '5'::numeric)
   Rows Removed by Filter: 4737888
 Planning time: 1.686 ms
 Execution time: 27402.801 ms
(6 rows)

Force parallelism plan
explain analyse select * from lineitem where l_extendedprice < 5
and l_orderkey < 1500;

  QUERY PLAN
-
 Gather  (cost=0.57..193789.78 rows=10313822 width=129) (actual
time=0.354..41153.916 rows=10258702 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Index Scan using idx_lineitem_orderkey on lineitem
(cost=0.57..193789.78 rows=2578456 width=129) (actual
time=0.062..6530.167 rows=2051740 loops=5)
 Index Cond: (l_orderkey < 1500)
 Filter: (l_extendedprice < '5'::numeric)
 Rows Removed by Filter: 947578
 Planning time: 0.383 ms
 Execution time: 42027.645 ms
(9 rows)

Patch:
Force parallelism plan

 explain analyse select * from lineitem where l_extendedprice < 5
and l_orderkey < 1500;

  QUERY PLAN
-
 Gather  (cost=0.57..193789.78 rows=10313822 width=129) (actual
time=0.413..16690.294 rows=10258702 loops=1)
   Workers Planned: 4
   Workers Launched: 4
   ->  Parallel Index Scan using idx_lineitem_orderkey on lineitem
(cost=0.57..193789.78 rows=2578456 width=129) (actual
time=0.047..6185.527 rows=2051740 loops=5)
 Index Cond: (l_orderkey < 1500)
 Filter: (l_extendedprice < '5'::numeric)
 Rows Removed by Filter: 947578
 Planning time: 0.406 ms
 Execution time: 17616.750 ms
(9 rows)

Head:
Default plan
explain analyse select * from lineitem where l_extendedprice < 5
and l_orderkey < 3000;

QUERY PLAN

Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel

2017-09-06 Thread Thomas Munro
On Sat, Jul 22, 2017 at 12:05 AM, Alexey Chernyshov
 wrote:
> the following patch transfers functionality from gevel module
> (http://www.sai.msu.su/~megera/wiki/Gevel) which provides functions for
> analyzing GIN and GiST indexes to pageinspect. Gevel was originally
> designed by Oleg Bartunov, and Teodor Sigaev for developers of GiST and
> GIN indexes.
>
> Functions added:
>  - gist_stat(text) - shows statistics on GiST Tree
>  - gist_tree(text) - shows GiST tree
>  - gist_tree(text, int4) - shows GiST tree up to MAXLEVEL
>  - gist_print(text) - prints objects stored in GiST tree
>  - spgist_stat(text) - shows statistics on SP-GiST
>  - spgist_print(text) - prints objects stored in index
>  - gin_value_count() - originally gin_stat(text) - prints estimated counts
> for index values
>  - gin_stats() - originally gin_statpage(text) - shows statistics
>  - gin_count_estimate(text, tsquery) - shows number of indexed rows matched
> query
>
> Tests are also transferred, and docs for the new functions are added. I ran
> pgindent over the code, but the result is different from what I expected, so
> I left the pgindented one.
> The patch is applicable to the commit
> 866f4a7c210857aa342bf901558d170325094dde.

Hi Alexey,

This patch still applies but doesn't build after commits 2cd70845 and c6293249.

ginfuncs.c:91:47: error: invalid type argument of ‘->’ (have
‘FormData_pg_attribute’)
   st->index->rd_att->attrs[st->attnum]->attbyval,
...several similar errors...

For example, that expression needs to be changed to:

   TupleDescAttr(st->index->rd_att, attnum)->attbyval

Here is some superficial review since I'm here:

+/*
+ * process_tuples
+ * Retrieves the number of TIDs in a tuple.
+ */
+static void
+process_tuple(FuncCallContext *funcctx, GinStatState * st, IndexTuple itup)

"process_tuples" vs "process_tuple"

+ /* do no distiguish various null category */

"do not distinguish various null categories"

+ st->nulls[0] = (st->category == GIN_CAT_NORM_KEY) ? false : true;

That's a long way to write (st->category != GIN_CAT_NORM_KEY)!

+ * We scaned the whole page, so we should take right page

"scanned"

+/*
+ * refind_position
+ * Refinds a previois position, at returns it has correctly
+ * set offset and buffer is locked.
+ */

"previous", s/, at returns/.  On return/

+ memset(st->nulls, false, 2 * sizeof(*st->nulls));

"false" looks strange here where an int is expected.  The size
expression is weird.  I think you should just write:

st->nulls[0] = false;
st->nulls[1] = false;

Investigating the types more closely, I see that 'nulls' is declared
like this (in struct GinStatState):

+ char nulls[2];

Then later you do this:

+ htuple = heap_form_tuple(funcctx->attinmeta->tupdesc,
+ st->dvalues,
+ st->nulls);

But that's not OK, heap_form_tuple takes bool *.  bool and char are
not interchangeable (they may have different sizes on some platforms,
among other problems, even though usually they are both 1 byte).  So I
think you should declare it as "bool nulls[2]".

Meanwhile in TypeStorage you have a member "bool *nulls", which you
then initialise like so:

+ st->nulls = (char *) palloc((tupdesc->natts + 2) * sizeof(*st->nulls));

That cast is wrong.
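
To spell out the two fixes together (a sketch, not tested against the
patch):

    /* In struct GinStatState: use the type heap_form_tuple() expects. */
    bool        nulls[2];

    /*
     * In TypeStorage, whose member is declared "bool *nulls", make the
     * palloc cast match the declared type:
     */
    st->nulls = (bool *) palloc((tupdesc->natts + 2) * sizeof(*st->nulls));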

+/*
+ * gin_count_estimate
+ * Outputs number of indexed rows matched query. It doesn't touch heap at
+ * all.

Maybe "... number of indexed rows matching query."?

+ if (attnum < 0 || attnum >= st->index->rd_att->natts)
+ elog(ERROR, "Wrong column's number");
+ st->attnum = attnum;

Maybe "invalid column number" or "invalid attribute number"?

+ elog(ERROR, "Column type is not a tsvector");

s/C/c/ (according to convention).

+ * Prints various stat about index internals.

s/stat/stats/

+ elog(ERROR, "Relation %s.%s is not an index",

s/R/r/

+ elog(ERROR, "Index %s.%s has wrong type",

s/I/i/

+  gin_value_count prints estimated counts for each
+  indexed values, single-argument function will return result for a first
+  column of index. For example:

"... for each indexed value.  The single argument version will return
results for the first column of an index.  For example:"

+  gin_count_estimate outputs number of indexed
+ rows matched query. It doesn't touch heap at all. For example:

"... outputs the number of indexed rows matched by a query. ..."

+  gist_print prints objects stored in
+  GiST tree, works only if objects in index have
+  textual representation (type_out functions should be
+  implemented for the given object type). It's known to work with R-tree
+  GiST based index. Note, in example below, objects are
+  of type box. For example:

"... prints objects stored in a GiST tree.  it works only if the
objects in the index have textual representation (type_out functions
should be implemented for the given object type).  It's known to work
with R-tree GiST-based indexes.  Note: in the example below, objects
are of type box.  For example:"

+  

Re: [HACKERS] additional contrib test suites

2017-09-06 Thread Thomas Munro
On Sat, Aug 12, 2017 at 1:20 PM, Peter Eisentraut
 wrote:
> Here are some small test suites for some contrib modules as well as
> pg_archivecleanup that didn't have one previously, as well as one patch
> to improve code coverage in a module.
>
> Will add to commit fest.  Testing on different platforms and with
> different build configurations would be useful.

After applying these patches cleanly on top of
0b554e4e63a4ba4852c01951311713e23acdae02 and running "./configure
--enable-tap-tests --with-tcl --with-python --with-perl --with-ldap
--with-icu && make && make check-world" I saw this failure:

cd . && 
TESTDIR='/home/travis/build/postgresql-cfbot/postgresql/src/bin/pg_archivecleanup'
PATH="/home/travis/build/postgresql-cfbot/postgresql/tmp_install/usr/local/pgsql/bin:$PATH"
LD_LIBRARY_PATH="/home/travis/build/postgresql-cfbot/postgresql/tmp_install/usr/local/pgsql/lib"
PGPORT='65432' 
PG_REGRESS='/home/travis/build/postgresql-cfbot/postgresql/src/bin/pg_archivecleanup/../../../src/test/regress/pg_regress'
/usr/bin/prove -I ../../../src/test/perl/ -I .  t/*.pl
t/010_pg_archivecleanup.pl .. 1/42
#   Failed test 'fails if restart file name is not specified: matches'
#   at 
/home/travis/build/postgresql-cfbot/postgresql/src/bin/pg_archivecleanup/../../../src/test/perl/TestLib.pm
line 330.
#   'pg_archivecleanup: must specify oldest kept WAL file
# Try "pg_archivecleanup --help" for more information.
# '
# doesn't match '(?^:must specify restartfilename)'
#   Failed test 'fails with too many parameters: matches'
#   at 
/home/travis/build/postgresql-cfbot/postgresql/src/bin/pg_archivecleanup/../../../src/test/perl/TestLib.pm
line 330.
#   'pg_archivecleanup: too many command-line arguments
# Try "pg_archivecleanup --help" for more information.
# '
# doesn't match '(?^:too many parameters)'
#   Failed test 'fails with invalid restart file name: matches'
#   at 
/home/travis/build/postgresql-cfbot/postgresql/src/bin/pg_archivecleanup/../../../src/test/perl/TestLib.pm
line 330.
#   'pg_archivecleanup: invalid file name argument
# Try "pg_archivecleanup --help" for more information.
# '
# doesn't match '(?^:invalid filename)'
# Looks like you failed 3 tests of 42.
t/010_pg_archivecleanup.pl .. Dubious, test returned 3 (wstat 768, 0x300)
Failed 3/42 subtests
Test Summary Report
-------------------
t/010_pg_archivecleanup.pl (Wstat: 768 Tests: 42 Failed: 3)
  Failed tests:  12, 16, 18
  Non-zero exit status: 3
Files=1, Tests=42,  0 wallclock secs ( 0.03 usr  0.00 sys +  0.05 cusr
 0.00 csys =  0.08 CPU)
Result: FAIL

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Sokolov Yura

On 2017-09-06 07:23, Tom Lane wrote:
> Jeff Janes  writes:
>> What scale factor and client count? How many cores per socket?  It looks
>> like Sokolov was just starting to see gains at 200 clients on 72 cores,
>> using -N transaction.
>
> This means that Sokolov's proposed changes in atomics/generic.h
> ought to be uninteresting for performance on any platform we care about
> --- at least for pg_atomic_fetch_add_u32().


May I cite you this way:
"Tom Lane says PostgreSQL core team doesn't care for 4 socket 72 core
Intel Xeon servers"
???

To be honest, I didn't measure the exact impact of changing
pg_atomic_fetch_add_u32.

I found an issue with pg_atomic_fetch_or_u32, because it is highly
contended both in LWLock and in buffer page state management. I found
that changing the loop for the generic pg_atomic_fetch_or_u32 already
gives an improvement, and I changed the loops of the other generic
atomic functions to make all of them uniform.
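
For illustration, the change to the generic loop amounts to something
like this (a simplified sketch from memory, not the exact patch text):

    /* Before: re-read the current value on every iteration. */
    static inline uint32
    pg_atomic_fetch_or_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 or_)
    {
        uint32      old;

        while (true)
        {
            old = pg_atomic_read_u32_impl(ptr);
            if (pg_atomic_compare_exchange_u32_impl(ptr, &old, old | or_))
                break;
        }
        return old;
    }

    /*
     * After: prime "old" once with a plain read; on failure,
     * compare-exchange itself refreshes "old" with the current value,
     * so no extra read is needed inside the loop.
     */
    static inline uint32
    pg_atomic_fetch_or_u32_impl(volatile pg_atomic_uint32 *ptr, uint32 or_)
    {
        uint32      old;

        old = ptr->value;
        while (!pg_atomic_compare_exchange_u32_impl(ptr, &old, old | or_))
            /* skip */ ;
        return old;
    }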

In my opinion, it is bad style to leave worse (but correct) code in
place rather than changing it into better (and also correct) code, just
because you can't measure the difference. Perhaps I am mistaken.

(And yes: the gcc intrinsic gives more improvement.)

--
With regards,
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Simon Riggs
On 5 September 2017 at 21:23, Tom Lane  wrote:
> Jeff Janes  writes:
>> What scale factor and client count? How many cores per socket?  It looks
>> like Sokolov was just starting to see gains at 200 clients on 72 cores,
>> using -N transaction.

...
> Moreover, it matters which primitive you're testing, on which platform,
> with which compiler, because we have a couple of layers of atomic ops
> implementations.
...

I think Sokolov was aiming at 4-socket servers specifically, rather
than at a general performance gain.

If there is no gain on 2-socket, at least there is no loss either.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Copyright in partition.h and partition.c

2017-09-06 Thread Daniel Gustafsson
> On 06 Sep 2017, at 02:56, Amit Langote  wrote:
> 
> On 2017/09/05 21:14, Tom Lane wrote:
>> Amit Langote  writes:
>>> On 2017/09/05 15:48, Etsuro Fujita wrote:
 Here is the copyright in partition.h:
 
  * Copyright (c) 2007-2017, PostgreSQL Global Development Group
 
 I think it's reasonable that that matches the copyright in partition.c,
 but partition.c has:
 
  * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
 
 Is that intentional?
>> 
>>> No, it's unintentional.  The difference may have resulted from copying
>>> different files to become partition.h and partition.c, respectively.
>> 
>>> Maybe, we should change both to say 2016-2017?
>> 
>>> I don't know the exact rule for how we determine those years.  Is there
>>> some rule in place about that?  When I look at execParallel.c, which
>>> supposedly got introduced into the tree recently, I see 1996-2017.  OTOH,
>>> the files in contrib/bloom all have 2016-2017.
>> 
>> Our usual practice is to write the copyright like it is in partition.c
>> even in new files.  This avoids any question about whether any of the
>> code was copied-and-pasted from somewhere else in PG.  Even if not one
>> word in the file can be traced to code that was somewhere else before,
>> it seems to me that this is an appropriate thing to do, to give due
>> credit to those who came before us.
> 
> Agreed.
> 
>> In short: we should make partition.h's copyright look like partition.c's
>> not vice versa.
> 
> Attached patch does that.

This reminded me that I’d seen one of these before while hacking, and with some
grep and xargs abuse I spotted one more (there might be more that my command
line fu didn’t catch though).  Attached could perhaps be included with the
above patch?

Perhaps the copyright script should be expanded to catch these?  (and I
volunteer to attempt that unless it’s deemed an uninteresting feature)
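
For reference, the header both files should end up with follows the
usual convention, with the copyright lines quoted above (the synopsis
text here is only illustrative):

    /*-------------------------------------------------------------------------
     *
     * partition.h
     *        Header file for structures and utility functions related to
     *        partitioning
     *
     * Portions Copyright (c) 1996-2017, PostgreSQL Global Development Group
     * Portions Copyright (c) 1994, Regents of the University of California
     *
     *-------------------------------------------------------------------------
     */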

cheers ./daniel



header_copyright.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding support for Default partition in partitioning

2017-09-06 Thread Jeevan Ladhe
Hi Ashutosh,

I have tried to address your comments in the V27 patch series [1].
Please find my comments inline.


> >>
> >> The current set of patches contains 6 patches as below:
> >>
> >> 0001:
> >> Refactoring existing ATExecAttachPartition  code so that it can be used
> >> for
> >> default partitioning as well
>

If I understand correctly, these comments refer to patch 0001 of the
V25_rebase series, which is related to "Fix assumptions that
get_qual_from_partbound()" and not to the refactoring. This is patch
0001 in V27 now.

>   * Returns an expression tree describing the passed-in relation's partition
> - * constraint.
> + * constraint. If there are no partition constraints returns NULL e.g. in
> + * case default partition is the only partition.
> The first sentence uses singular constraint. The second uses plural. Given
> that partition bounds together form a single constraint we should use
> singular constraint in the second sentence as well.
>

I have changed the wording now.


>
> Do we want to add a similar comment in the prologue of
> generate_partition_qual(). The current wording there seems to cover this
> case,
> but do we want to explicitly mention this case?
>

I have added a comment there.


>
> +    if (!and_args)
> +        result = NULL;
> While this is correct, I am increasingly seeing (and_args != NIL) usage.
>

Changed this to:
+   if (and_args == NIL)
+       result = NULL;


> get_partition_qual_relid() is called from pg_get_partition_constraintdef(),
> constr_expr = get_partition_qual_relid(relationId);
>
> /* Quick exit if not a partition */
> if (constr_expr == NULL)
> PG_RETURN_NULL();
> The comment is now wrong since a default partition may have no
> constraints. Maybe rewrite it as simply, "Quick exit if no partition
> constraint."
>

Fixed.


>
> generate_partition_qual() has three callers and all of them are capable of
> handling NIL partition constraint for default partition. Maybe it's better
> to mention in the commit message that we have checked that the callers of
> this function can handle NIL partition constraint.
>

Added in commit message.


> >>
> >> 0002:
> >> This patch teaches the partitioning code to handle the NIL returned by
> >> get_qual_for_list().
> >> This is needed because a default partition will not have any constraints
> >> in case
> >> it is the only partition of its parent.
>

Comments below refer to patch 0002 in V25_rebase(0003 in V25), which
adds basic support for default partition, which is now 0002 in V27.


> If the partition being dropped is the default partition,
> heap_drop_with_catalog() locks default partition twice, once as the default
> partition and the second time as the partition being dropped. So, it will
> be counted as locked twice. There doesn't seem to be any harm in this,
> since the lock will be held till the transaction ends, by when all the
> locks will be released.
>

 Fixed.


> Same is the case with cache invalidation message. If we are dropping
> default partition, the cache invalidation message on "default partition"
> is not required. Again this might be harmless, but better to avoid it.


Fixed.


> Similar problems exist with ATExecDetachPartition(): the default
> partition will be locked twice if it's being detached.
>

In ATExecDetachPartition() we do not have the OID of the relation being
detached available at the time we lock the default partition. Moreover,
here we are taking an exclusive lock on the default partition's OID and
a share lock on the partition being detached. As you correctly said in
your earlier comment, it will be counted as locked twice, which to me
also seems harmless. As these locks are to be held till the commit of
the transaction, nobody else is supposed to release them in between. I
am not able to visualize a problem here, but still I have tried to
avoid locking the default partition table twice; please review the
changes and let me know your thoughts.


> +    /*
> +     * If this is a default partition, pg_partitioned_table must have its
> +     * OID as value of 'partdefid' for its parent (i.e. rel) entry.
> +     */
> +    if (castNode(PartitionBoundSpec, boundspec)->is_default)
> +    {
> +        Oid            partdefid;
> +
> +        partdefid = get_default_partition_oid(RelationGetRelid(rel));
> +        Assert(partdefid == inhrelid);
> +    }
> Since an accidental change or database corruption may change the default
> partition OID in pg_partition_table, an Assert won't help in such a case.
> Maybe we should throw an error or at least report a warning. If we throw
> an error, the table will become useless (or even the database will become
> useless when RelationBuildPartitionDesc is called from
> RelationCacheInitializePhase3() on such a corrupted table). To avoid that
> we may raise a warning.

I have added a warning here instead of the Assert.


> I am wondering whether we could avoid call to 

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Sokolov Yura

On 2017-09-06 14:54, Sokolov Yura wrote:
> On 2017-09-06 07:23, Tom Lane wrote:
>> Jeff Janes  writes:
>>> What scale factor and client count? How many cores per socket?  It looks
>>> like Sokolov was just starting to see gains at 200 clients on 72 cores,
>>> using -N transaction.
>>
>> This means that Sokolov's proposed changes in atomics/generic.h
>> ought to be uninteresting for performance on any platform we care about
>> --- at least for pg_atomic_fetch_add_u32().
>
> May I cite you this way:


I apologize for the tone of my previous message.
My tongue is my enemy.

--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] path toward faster partition pruning

2017-09-06 Thread Amit Langote
On 2017/09/04 10:10, Amit Langote wrote:
> On 2017/09/02 2:52, Robert Haas wrote:
>> It strikes me that this patch set is doing two things but maybe in the
>> opposite order that I would have chosen to attack them.  First,
>> there's getting partition pruning to use something other than
>> constraint exclusion.  Second, there's deferring work that is
>> currently done at an early stage of the process until later, so that
>> we waste less effort on partitions that are ultimately going to be
>> pruned.
> 
> OK.
> 
>> Therefore, IMHO, it would be best to focus first on how we're going to
>> identify the partitions that survive pruning, and then afterwards work
>> on transposing that logic to happen before partitions are opened and
>> locked.  That way, we get some incremental benefit sooner, and also
>> unblock some other development work.
> 
> Alright, I will try to do it that way.

Attached set of patches that does things that way.  Individual patches
described below:

[PATCH 1/5] Expand partitioned inheritance in a non-flattened manner

This will allow us to perform scan and join planning in a per partition
sub-tree manner, with each sub-tree's root getting its own RelOptInfo.
Previously, only the root of the whole partition tree would get a
RelOptInfo, along with the leaf partitions, with each leaf partition's
AppendRelInfo pointing to the former as its parent.

This is essential, because we will be doing partition-pruning for every
partitioned table in the tree by matching query's scan keys with its
partition key.  We won't be able to do that if the intermediate
partitioned tables didn't have a RelOptInfo.

[PATCH 2/5] WIP: planner-side changes for partition-pruning

This patch adds a stub get_partitions_for_keys in partition.c with a
suitable interface for the caller to pass bounding keys extracted from the
query and other related information.

Importantly, it implements the logic within the planner to match the query's
scan keys to a parent table's partition key and form the bounding keys
that will be passed to partition.c to compute the list of partitions that
satisfy those bounds.

Also, it adds certain fields to the RelOptInfos of partitioned tables
that reflect their partitioning properties.
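
(To give a rough idea of the shape of that interface -- the function
name is from the patch description above, but the parameters below are
only my guess for illustration, not the actual declaration:)

    /*
     * Hypothetical sketch: given bounding keys derived from the query's
     * clauses, return the set of offsets into the parent's PartitionDesc
     * for partitions that might contain matching rows.
     */
    static Bitmapset *
    get_partitions_for_keys(Relation rel, int nkeys,
                            Datum *minkeys, bool min_inclusive,
                            Datum *maxkeys, bool max_inclusive);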

[PATCH 3/5] WIP: Interface changes for partition_bound_{cmp/bsearch}

This guy modifies the partition bound comparison function so that the
caller can pass an incomplete partition key tuple that is potentially a
prefix of a multi-column partition key.  Currently, the input tuple must
contain all key->partnatts values, but that may not be true for
queries, which may not have restrictions on all the partition key columns.

[PATCH 4/5] WIP: Implement get_partitions_for_keys()

This one fills the get_partitions_for_keys stub with the actual logic that
searches the partdesc->boundinfo for partition bounds that match the
bounding keys specified by the caller.

[PATCH 5/5] Add more tests for the new partitioning-related planning

More tests.


Some TODO items still remain to be done:

* Process OR clauses to use for partition-pruning
* Process redundant clauses (such as a = 10 and a > 1) more smartly
* Other tricks that are missing
* Fix bugs
* More tests

Thanks,
Amit
From 17dfaff62fe04cf18f5bba298974d42f92b597ef Mon Sep 17 00:00:00 2001
From: amit 
Date: Wed, 6 Sep 2017 09:28:14 +0900
Subject: [PATCH 1/5] Expand partitioned inheritance in a non-flattened manner

...except when the partitioned table in question is the result rel
of the query.

This allows us to perform scan and join planning for each sub-tree in a
given partition tree, with each sub-tree's root partitioned table
getting its own RelOptInfo.  Previously only the root of the whole
partition tree got a RelOptInfo, along with the leaf partitions,
with each leaf partition's AppendRelInfo pointing to the former as
its parent.
---
 src/backend/optimizer/path/allpaths.c  |  34 ++-
 src/backend/optimizer/plan/planner.c   |   3 +-
 src/backend/optimizer/prep/prepunion.c | 166 +
 src/backend/optimizer/util/plancat.c   |   7 +-
 4 files changed, 146 insertions(+), 64 deletions(-)

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 2d7e1d84d0..6c3511bd47 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1289,11 +1289,39 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
RangeTblEntry *rte;
 
rte = planner_rt_fetch(rel->relid, root);
+
+   /*
+    * Get the partitioned_rels list from root->pcinfo_list after
+    * confirming that rel is actually a root partitioned table.
+    */
if (rte->relkind == RELKIND_PARTITIONED_TABLE)
{
-   partitioned_rels = get_partitioned_child_rels(root, rel->relid);
-   /* The root partitioned table is included as a child rel */
-   Assert(list_length(partitioned_rels) >= 1);
+   

Re: [HACKERS] document and use SPI_result_code_string()

2017-09-06 Thread Tom Lane
Michael Paquier  writes:
> Fine for 0002. This reminds me of LockGXact and RemoveGXact in
> twophase.c, as well as _hash_squeezebucket that have some code paths
> that cannot return... Any thoughts about having some kind of
> PG_NOTREACHED defined to 0 which could be put in an assertion?

Generally we just do "Assert(false)", maybe with "not reached" in a
comment.  I don't feel a strong need to invent a new way to do that.
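
For example, the usual idiom looks like this (a minimal illustration;
the surrounding code is invented):

    switch (status)
    {
        case STATUS_OK:
            return true;
        case STATUS_ERROR:
            return false;
        default:
            Assert(false);      /* not reached */
            return false;
    }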

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] <> join selectivity estimate question

2017-09-06 Thread Tom Lane
Simon Riggs  writes:
> Why isn't this an open item for PG10?

Why should it be?  This behavior has existed for a long time.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Sokolov Yura  writes:
> On 2017-09-06 15:56, Tom Lane wrote:
>> The point I'm trying to make is that if tweaking generic.h improves
>> performance then it's an indicator of missed cases in the less-generic
>> atomics code, and the latter is where our attention should be focused.
>> I think basically all of the improvement Sokolov got was from upgrading
>> the coverage of generic-gcc.h.

> Not exactly. I've checked, that new version of generic 
> pg_atomic_fetch_or_u32
> loop also gives improvement.

But once you put in the generic-gcc version, that's not reached anymore.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] <> join selectivity estimate question

2017-09-06 Thread Simon Riggs
On 6 September 2017 at 04:14, Ashutosh Bapat
 wrote:
> On Fri, Jul 21, 2017 at 4:10 AM, Thomas Munro
>  wrote:
>>
>> Thanks.  Bearing all that in mind, I ran through a series of test
>> scenarios and discovered that my handling for JOIN_ANTI was wrong: I
>> thought that I had to deal with inverting the result, but I now see
>> that that's handled elsewhere (calc_joinrel_size_estimate() I think).
>> So neqjoinsel should just treat JOIN_SEMI and JOIN_ANTI exactly the
>> same way.
>
> I agree, esp. after looking at eqjoinsel_semi(), which is used for
> both semi and anti joins, it becomes more clear.
>
>>
>> That just leaves the question of whether we should try to handle the
>> empty RHS and single-value RHS cases using statistics.  My intuition
>> is that we shouldn't, but I'll be happy to change my intuition and
>> code that up if that is the feedback from planner gurus.
>
> Empty RHS can result from dummy relations also, which are produced by
> constraint exclusion, so may be that's an interesting case. Single
> value RHS may be interesting with partitioned table with all rows in a
> given partition end up with the same partition key value. But may be
> those are just different patches. I am not sure.
>
>>
>> Please find attached a new version, and a test script I used, which
>> shows a bunch of interesting cases.  I'll add this to the commitfest.
>
> I added some "stable" tests to your patch taking inspiration from the
> test SQL file. I think those will be stable across machines and runs.
> Please let me know if those look good to you.

Why isn't this an open item for PG10?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Simon Riggs  writes:
> On 5 September 2017 at 21:23, Tom Lane  wrote:
>> Moreover, it matters which primitive you're testing, on which platform,
>> with which compiler, because we have a couple of layers of atomic ops
>> implementations.

> If there is no gain on 2-socket, at least there is no loss either.

The point I'm trying to make is that if tweaking generic.h improves
performance then it's an indicator of missed cases in the less-generic
atomics code, and the latter is where our attention should be focused.
I think basically all of the improvement Sokolov got was from upgrading
the coverage of generic-gcc.h.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Sokolov Yura

On 2017-09-06 15:56, Tom Lane wrote:
> Simon Riggs  writes:
>> On 5 September 2017 at 21:23, Tom Lane  wrote:
>>> Moreover, it matters which primitive you're testing, on which platform,
>>> with which compiler, because we have a couple of layers of atomic ops
>>> implementations.
>>
>> If there is no gain on 2-socket, at least there is no loss either.
>
> The point I'm trying to make is that if tweaking generic.h improves
> performance then it's an indicator of missed cases in the less-generic
> atomics code, and the latter is where our attention should be focused.
> I think basically all of the improvement Sokolov got was from upgrading
> the coverage of generic-gcc.h.
>
> regards, tom lane

Not exactly. I've checked that the new version of the generic
pg_atomic_fetch_or_u32 loop also gives an improvement. Without that
check I'd not have suggested fixing the generic atomic functions. Of
course, the gcc intrinsic gives more gain.

--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Fix bloom WAL tap test

2017-09-06 Thread Alexander Korotkov
Hi!

I just realized that these lines of contrib/bloom/t/001_wal.pl don't check
that queries give the same results on master and standby.  They just check
that the *return codes* of psql are equal.

# Run test queries and compare their result
> my $master_result = $node_master->psql("postgres", $queries);
> my $standby_result = $node_standby->psql("postgres", $queries);
> is($master_result, $standby_result, "$test_name: query result matches");


The attached patch fixes this problem by using safe_psql(), which returns
the psql output instead of the return code.  For safety, this patch
replaces psql() with safe_psql() in other places too.

I think this should be backpatched to 9.6 as bugfix.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


fix-bloom-wal-check.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add psql variables showing server version and psql version.

2017-09-06 Thread Tom Lane
Simon Riggs  writes:
> Based upon input from Tom and Fabien, I propose this additional doc patch.

I do not think any of this is appropriate, particularly not the reference
to 7.0.3.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Is anything preventing us from allowing write to foreign tables from standby?

2017-09-06 Thread Alexander Korotkov
Hi!

We currently block writing queries on a standby even if they are only
modifying the contents of foreign tables.  But do we have serious
reasons for that?
Keeping FDW-sharding in mind, making FDW tables writable from a standby
would be good for preventing a single-master bottleneck.
I wrote a simple patch enabling writing to foreign tables from
standbys.  At first glance it works for me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


standby-fdw-writable.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Walsender timeouts and large transactions

2017-09-06 Thread Yura Sokolov
I've changed the status to "needs review" to gain more attention from others.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test

2017-09-06 Thread Alvaro Herrera
Peter Eisentraut wrote:

> I propose splitting the single Perl script into three separate test
> files: one for basic command-line option handling and so on (I would
> like to expand that later), one for the main upgrade test, and one for
> the funny database names tests.

Check.

> We also need to have a plan for handling the build farm.  Maybe keep the
> vcregress.pl upgradecheck target as a thin wrapper for the time being?

The buildfarm already runs "make check" in src/bin/ when TAP tests are
enabled, which should be enough to trigger the rewritten test, no?

-- 
Álvaro Herrera             https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test

2017-09-06 Thread Peter Eisentraut
On 4/14/17 02:00, Michael Paquier wrote:
> Attached is an updated patch to use --no-sync with pg_dumpall calls.

Please send a rebased patch.

I propose splitting the single Perl script into three separate test
files: one for basic command-line option handling and so on (I would
like to expand that later), one for the main upgrade test, and one for
the funny database names tests.

In the testing file, you removed the paragraph that explains how to do
cross-version upgrade testing.  It's unfortunate that we would lose that
functionality.  What can we do about that?

We also need to have a plan for handling the build farm.  Maybe keep the
vcregress.pl upgradecheck target as a thin wrapper for the time being?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add psql variables showing server version and psql version.

2017-09-06 Thread Simon Riggs
On 5 September 2017 at 11:58, Fabien COELHO  wrote:
>
> Hello Simon,
>
>> Does raise the further question of how psql behaves when we connect to
>> a pre-10 server, so we have SERVER_VERSION_NUM but yet it is not set.
>> How does this
>> \if SERVER_VERSION_NUM < x
>
>
> The if does not have expressions (yet), it just handles TRUE/ON/1 and
> FALSE/0/OFF.
>
>> Do we need some macro or suggested special handling?
>
>
> If "SERVER_VERSION_NUM" is available in 10, then:
>
>   -- exit if version < 10 (\if is ignored and \q is executed)
>   \if false \echo "prior 10" \q \endif
>
>   -- then test version through a server side expression, will work
>   SELECT :SERVER_VERSION_NUM < 11 AS prior_11 \gset
>   \if :prior_11
> -- version 10
>   \else
> -- version 11 or more
>   \endif


Based upon input from Tom and Fabien, I propose this additional doc patch.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


clarify_psql_server_version.v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Sokolov Yura

On 2017-09-06 16:36, Tom Lane wrote:
> Sokolov Yura  writes:
>> On 2017-09-06 15:56, Tom Lane wrote:
>>> The point I'm trying to make is that if tweaking generic.h improves
>>> performance then it's an indicator of missed cases in the less-generic
>>> atomics code, and the latter is where our attention should be focused.
>>> I think basically all of the improvement Sokolov got was from upgrading
>>> the coverage of generic-gcc.h.
>>
>> Not exactly. I've checked, that new version of generic
>> pg_atomic_fetch_or_u32 loop also gives improvement.
>
> But once you put in the generic-gcc version, that's not reached anymore.

Yes, you're right.

But I think the generic version should still be "fixed". If the generic
version is not reached on any platform, then why is it kept? If it is
reached somewhere, then it should be improved.

--
Sokolov Yura aka funny_falcon
Postgres Professional: https://postgrespro.ru
The Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2017-09-06 Thread Robert Haas
On Tue, Sep 5, 2017 at 1:38 PM, Tom Lane  wrote:
> The complaint I have about PRAGMA is that it's yet another syntax for
> accomplishing pretty much the same thing.  If you don't like the GUC
> solution, we've already got the "comp_option" syntax for static options
> in plpgsql.  Sure, that's not too pretty, but that's not a good reason
> to invent yet another way to do it.

On the general question of whether we should have something like this,
I expressed a lot of doubt when
e6faf910d75027bdce7cd0f2033db4e912592bcc first went in about whether
that algorithm was really going to work, and nothing has happened
since then to remove any of that doubt.  It's pretty clear to me from
experiences with customer problems that the heuristics we have often
fail to do the right thing, either because queries with no hope of
benefiting still replan 5 times - which can waste a ton of CPU when
there are many different queries in the plan cache and many sessions -
or because queries that would benefit in some cases give up on
replanning before they hit a case where a parameter-specific plan
helps.  I don't think we can just indefinitely continue to resist
providing manual control over this behavior on the theory that some
day we'll fix it.  It's been six years and we haven't made any
significant progress.  In some cases, a long delay without any
progress might just point to a lack of effort that should have been
applied, but in this case I think it's because the problem is
incredibly hard.
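
For context, the heuristic in question boils down to something like
this (paraphrased from memory of plancache.c's choose_custom_plan(),
with the ancillary checks omitted -- not an exact quote of the source):

    static bool
    choose_custom_plan(CachedPlanSource *plansource, ParamListInfo boundParams)
    {
        double      avg_custom_cost;

        /* Build a custom plan for each of the first five executions. */
        if (plansource->num_custom_plans < 5)
            return true;

        avg_custom_cost = plansource->total_custom_cost /
            plansource->num_custom_plans;

        /*
         * After that, settle on the generic plan if it is estimated to be
         * cheaper than the average custom plan (whose cost includes a
         * charge for replanning).
         */
        if (plansource->generic_cost < avg_custom_cost)
            return false;

        return true;
    }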

I think what we ideally want to do is notice whether the new bind
variables cause a change in selectivity which is sufficient to justify
a re-plan.  If we annotated the original plan with markers indicating
that it was valid for all values with a frequency of more than X and
less than Y, for example, we could cover most cases involving
equality; range queries would need some other kind of annotation.
However, it's unclear how the planner could produce such annotations,
and it's unclear how expensive checking against them would be if we
had them.  Barring somebody having a brilliant insight about how to
make some system that's way better than what we have right now, I
think we can't hold out much hope of any better fix than a manual
knob.

I think a GUC is a decent, though not perfect, mechanism for this.
This problem isn't restricted to PL/pgsql; indeed, the cases I've seen
have come via prepared queries, not PL/pgsql functions.  Even without
that, one advantage of a GUC is that they are fairly broadly
understood and experienced users understand what they can do with
them.  They can be set at various different scopes (system, user,
database, SET clause for a particular function) and it's relatively
convenient to do so.  Some kind of PL/pgsql-specific PRAGMA syntax is
more likely to be overlooked by users who would actually benefit from
it, and also won't cover non-PL/pgsql cases.  If we were going to go
the PRAGMA route, it would make more sense to me to define that as a
way of setting a GUC for the scope of one PL/pgsql block, like PRAGMA
SETTING(custom_plan_tries, 0).  I think it is in general unfortunate
that we don't have a mechanism to change a GUC for the lifespan of one
particular query, like this:

LET custom_plan_tries = 0 IN SELECT ...

I bet a lot of people would find that quite convenient.  The problem
of needing to set a planner GUC for one particular query is pretty
common, and Pavel is absolutely right that having to do SET beforehand
and RESET afterward is ugly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix bloom WAL tap test

2017-09-06 Thread Alexander Korotkov
On Wed, Sep 6, 2017 at 4:08 PM, Alexander Korotkov <a.korot...@postgrespro.ru> wrote:

> I just realized that these lines of contrib/bloom/t/001_wal.pl don't
> check that queries give same results on master and standby.  They just
> check that *return codes* of psql are equal.
>
> # Run test queries and compare their result
>> my $master_result = $node_master->psql("postgres", $queries);
>> my $standby_result = $node_standby->psql("postgres", $queries);
>> is($master_result, $standby_result, "$test_name: query result matches");
>
>
> Attached patch fixes this problem by using safe_psql() which returns psql
> output instead of return code.  For safety, this patch replaces psql() with
> safe_psql() in other places too.
>
> I think this should be backpatched to 9.6 as bugfix.
>

Also, it would be nice to run wal-check as part of the bloom check
target (currently wal-check isn't run even by check-world).
See attached patch.  My changes to the Makefile could be cumbersome.
Sorry for that, I don't have much experience with them...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


wal-check-on-bloom-check.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test

2017-09-06 Thread Tom Lane
Alvaro Herrera  writes:
> Peter Eisentraut wrote:
>> We also need to have a plan for handling the build farm.  Maybe keep the
>> vcregress.pl upgradecheck target as a thin wrapper for the time being?

> The buildfarm already runs "make check" in src/bin/ when TAP tests are
> enabled, which should be enough to trigger the rewritten test, no?

I think Peter's on about the Windows case.  Not sure how that's handled,
but it's not "make check".

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2017-09-06 Thread Tom Lane
Robert Haas  writes:
> I don't think we can just indefinitely continue to resist
> providing manual control over this behavior on the theory that some
> day we'll fix it.

That's fair enough.  We need to have a discussion about exactly what
the knob does, which is distinct from the question of how you spell
the incantation for twiddling it.  I'm dubious that a dumb "force a
custom plan" setting is going to solve all that many cases usefully.

> I think a GUC is a decent, though not perfect, mechanism for this.
> This problem isn't restricted to PL/pgsql; indeed, the cases I've seen
> have come via prepared queries, not PL/pgsql functions.

That's 100% correct, and is actually the best reason not to consider
a PRAGMA (or any other plpgsql-specific mechanism) as the incantation
spelling.

> I think it is in general unfortunate that we don't have a mechanism to
> change a GUC for the lifespan of one particular query, like this:

> LET custom_plan_tries = 0 IN SELECT ...

Hmm.  I think the core problem here is that we're trying to control
the plancache, which is a pretty much behind-the-scenes mechanism.
Except in the case of an explicit PREPARE, you can't even see from
SQL that the cache is being used, or when it's used.  So part of what
needs to be thought about, if we use the GUC approach, is when the
GUC's value is consulted.  If we don't do anything special then
the GUC(s) would be consulted when retrieving plans from the cache,
and changes in their values from one retrieval to the next might
cause funny behavior.  Maybe the relevant settings need to be captured
when the plancache entry is made ... not sure.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2017-09-06 Thread Robert Haas
On Wed, Sep 6, 2017 at 11:03 AM, Tom Lane  wrote:
> That's fair enough.  We need to have a discussion about exactly what
> the knob does, which is distinct from the question of how you spell
> the incantation for twiddling it.  I'm dubious that a dumb "force a
> custom plan" setting is going to solve all that many cases usefully.

I think what people need is the ability to force the behavior in
either direction - either insist on a custom plan, or insist on a
generic plan.  The former is what you need if the plan stinks, and the
latter is what you need if replanning is a waste of effort.  I have
seen both cases.  The latter has been a bigger problem than the
former, because the former can be hacked around in various ugly and
inefficient ways, but if we're adding a knob I think it should have
three settings.  There is perhaps an argument for even more
configurability, like altering the number of plan tries from 5 to some
other value, but I'm not clear that there's any use case for values
other than 0, 5, and +infinity.
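
If we go that way, the knob could simply be a three-valued enum GUC,
roughly like this (the names here are invented for illustration,
nothing committed):

    static const struct config_enum_entry plan_cache_mode_options[] = {
        {"auto", PLAN_CACHE_MODE_AUTO, false},
        {"force_generic_plan", PLAN_CACHE_MODE_FORCE_GENERIC, false},
        {"force_custom_plan", PLAN_CACHE_MODE_FORCE_CUSTOM, false},
        {NULL, 0, false}
    };

where "auto" preserves the current five-try heuristic and the other two
settings correspond to the 0 and +infinity cases above.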

>> I think it is in general unfortunate that we don't have a mechanism to
>> change a GUC for the lifespan of one particular query, like this:
>
>> LET custom_plan_tries = 0 IN SELECT ...
>
> Hmm.  I think the core problem here is that we're trying to control
> the plancache, which is a pretty much behind-the-scenes mechanism.
> Except in the case of an explicit PREPARE, you can't even see from
> SQL that the cache is being used, or when it's used.  So part of what
> needs to be thought about, if we use the GUC approach, is when the
> GUC's value is consulted.  If we don't do anything special then
> the GUC(s) would be consulted when retrieving plans from the cache,
> and changes in their values from one retrieval to the next might
> cause funny behavior.  Maybe the relevant settings need to be captured
> when the plancache entry is made ... not sure.

What sort of funny behavior are you concerned about?  It seems likely
to me that in most cases the GUC will have the same value every time
through, but if it doesn't, I'm not sure why we'd need to use the old
value rather than the current one.  Indeed, if the user changes the
GUC from "force custom" to "force generic" and reruns the function, we
want the new value to take effect, lest a POLA violation occur.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Andres Freund
On 2017-09-06 15:25:20 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-09-06 15:12:13 -0400, Tom Lane wrote:
> >> It looks to me like two of the three implementations promise no such
> >> thing.
> 
> > They're volatile vars, so why not?
> 
> Yeah, but so are the caller's variables.  That is, in
> 
> pg_atomic_exchange_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 xchg_)
> {
>   uint64 old;
>   old = ptr->value;
> 
> ISTM that the compiler is required to actually fetch ptr->value, not
> rely on some previous read of it.  I do not think that (the first
> version of) pg_atomic_read_u64_impl is adding any guarantee that wasn't
> there already.
> 
> >> Even if they somehow do, it hardly matters given that the cmpxchg loop
> >> would be self-correcting.
> 
> > Well, in this one instance maybe, hardly in others.
> 
> All the functions involved use nigh-identical cmpxchg loops.
> 
> > What are you suggesting as an alternative?
> 
> I think we can just use "old = ptr->value" to set up for the cmpxchg
> loop in every generic.h function that uses such a loop.

I think we might have been talking past each other - I thought you were
talking about changing the pg_atomic_read_u64_impl implementation for
external users.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-06 Thread Robert Haas
On Wed, Sep 6, 2017 at 3:18 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Sep 6, 2017 at 1:47 PM, Tom Lane  wrote:
>>> If somebody's applying apply_projection_to_path to a path that's already
>>> been add_path'd, that's a violation of the documented restriction.
>
>> /me is confused.  Isn't that exactly what grouping_planner() is doing,
>> and has done ever since your original pathification commit
>> (3fc6e2d7f5b652b417fa6937c34de2438d60fa9f)?  It's iterating over
>> current_rel->pathlist, so surely everything in there has been
>> add_path()'d.
>
> I think the assumption there is that we no longer care about validity of
> the input Relation, since we won't be looking at it any more (and
> certainly not adding more paths to it).  If there's some reason why
> that's not true, then maybe grouping_planner has a bug there.

Right, that's sorta what I assumed.  But I think that thinking is
flawed in the face of parallel query, because of the fact that
apply_projection_to_path() pushes down target list projection below
Gather when possible.  In particular, as Jeff and Amit point out, it
may well be that (a) before apply_projection_to_path(), the cheapest
plan is non-parallel and (b) after apply_projection_to_path(), the
cheapest plan would be a Gather plan, except that it's too late
because we've already thrown that path out.

What we ought to do, I think, is avoid generating gather paths until
after we've applied the target list (and the associated costing
changes) to both the regular path list and the partial path list.
Then the cost comparison is apples-to-apples.  The use of
apply_projection_to_path() on every path in the pathlist would be fine
if it were adjusting all the costs by a uniform amount, but it isn't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
I wrote:
> Ah.  I was not thinking of touching pg_atomic_read_u32/u64_impl,
> although now that you mention it, it's not clear to me why we
> couldn't simplify

> - return *(&ptr->value);
> + return ptr->value;

Just to check, I applied that change to pg_atomic_read_u32_impl and
pg_atomic_read_u64_impl, and recompiled.  I get bit-for-bit the
same backend executable.  Maybe it would have an effect on some
other compiler, but I doubt it, except perhaps at -O0.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-06 Thread Tom Lane
Robert Haas  writes:
> On Wed, Sep 6, 2017 at 3:41 PM, Tom Lane  wrote:
>> I'm not entirely following.  I thought that add_path was set up to treat
>> "can be parallelized" as an independent dimension of merit, so that
>> parallel paths would always survive.

> Here, the Gather path is not parallel-safe, but rather
> parallel-restricted:

Ah, right, the problem is with the Gather not its sub-paths.

>> Might be a tad messy to rearrange things that way.

> Why do you think I wanted you to do it?  :-)

I'll think about it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Setting pd_lower in GIN metapage

2017-09-06 Thread Tom Lane
Amit Langote  writes:
> On 2017/08/22 9:39, Michael Paquier wrote:
>> On Tue, Jun 27, 2017 at 4:56 PM, Amit Langote
>>  wrote:
>>> I updated brin_mask() and spg_mask() in the attached updated patches so
>>> that they consider meta pages as containing unused space.

I looked briefly at these patches.  I'm not sure that it's safe for the
mask functions to assume that meta pages always have valid pd_lower.
What happens when replaying log data concerning an old index that doesn't
have that field filled?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2017-09-06 Thread Andres Freund
On 2017-09-06 17:36:02 +0900, Kyotaro HORIGUCHI wrote:
> Hi,
> 
> At Mon, 4 Sep 2017 17:17:19 +0900, Michael Paquier 
>  wrote in 
> 

Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Andres Freund  writes:
> On 2017-09-06 15:25:20 -0400, Tom Lane wrote:
>> I think we can just use "old = ptr->value" to set up for the cmpxchg
>> loop in every generic.h function that uses such a loop.

> I think we might have been talking past each other - I thought you were
> talking about changing the pg_atomic_read_u64_impl implementation for
> external users.

Ah.  I was not thinking of touching pg_atomic_read_u32/u64_impl,
although now that you mention it, it's not clear to me why we
couldn't simplify

-   return *(&ptr->value);
+   return ptr->value;

AFAIK, the compiler is entitled to, and does, simplify away that
take-a-pointer-and-immediately-dereference-it dance.  If it did
not, a whole lot of standard array locutions would be much less
efficient than they should be.  What matters here is the volatile
qualifier, which we've already got.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-06 Thread Tom Lane
Robert Haas  writes:
> In particular, as Jeff and Amit point out, it
> may well be that (a) before apply_projection_to_path(), the cheapest
> plan is non-parallel and (b) after apply_projection_to_path(), the
> cheapest plan would be a Gather plan, except that it's too late
> because we've already thrown that path out.

I'm not entirely following.  I thought that add_path was set up to treat
"can be parallelized" as an independent dimension of merit, so that
parallel paths would always survive.

> What we ought to do, I think, is avoid generating gather paths until
> after we've applied the target list (and the associated costing
> changes) to both the regular path list and the partial path list.

Might be a tad messy to rearrange things that way.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Andres Freund  writes:
> On 2017-09-06 15:12:13 -0400, Tom Lane wrote:
>> It looks to me like two of the three implementations promise no such
>> thing.

> They're volatile vars, so why not?

Yeah, but so are the caller's variables.  That is, in

pg_atomic_exchange_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 xchg_)
{
uint64 old;
old = ptr->value;

ISTM that the compiler is required to actually fetch ptr->value, not
rely on some previous read of it.  I do not think that (the first
version of) pg_atomic_read_u64_impl is adding any guarantee that wasn't
there already.

>> Even if they somehow do, it hardly matters given that the cmpxchg loop
>> would be self-correcting.

> Well, in this one instance maybe, hardly in others.

All the functions involved use nigh-identical cmpxchg loops.

> What are you suggesting as an alternative?

I think we can just use "old = ptr->value" to set up for the cmpxchg
loop in every generic.h function that uses such a loop.
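
Concretely, for the exchange example quoted above, that would look like
this (sketch):

    static inline uint64
    pg_atomic_exchange_u64_impl(volatile pg_atomic_uint64 *ptr, uint64 xchg_)
    {
        uint64      old;

        old = ptr->value;       /* plain volatile read primes the loop */
        while (!pg_atomic_compare_exchange_u64_impl(ptr, &old, xchg_))
            /* skip: compare-exchange refreshed "old" on failure */ ;
        return old;
    }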

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-06 Thread Robert Haas
On Wed, Sep 6, 2017 at 3:41 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> In particular, as Jeff and Amit point out, it
>> may well be that (a) before apply_projection_to_path(), the cheapest
>> plan is non-parallel and (b) after apply_projection_to_path(), the
>> cheapest plan would be a Gather plan, except that it's too late
>> because we've already thrown that path out.
>
> I'm not entirely following.  I thought that add_path was set up to treat
> "can be parallelized" as an independent dimension of merit, so that
> parallel paths would always survive.

It treats parallel-safety as an independent dimension of merit; a
parallel-safe plan is more meritorious than one of equal cost which is
not.  We need that because, for example, forming a partial
path for a join means joining a partial path to a parallel-safe path.
But that doesn't help us here; that's to make sure we can build the
necessary stuff *below* the Gather.  IOW, if we threw away
parallel-safe paths because there was a cheaper parallel-restricted
path, we might be unable to build a partial path for the join *at
all*.

Here, the Gather path is not parallel-safe, but rather
parallel-restricted: it's OK for it to exist in a plan that uses
parallelism (duh), but it can't be nested under another Gather (also
duh, kinda).  So before accounting for the differing projection cost,
the Gather path is doubly inferior: it is more expensive AND not
parallel-safe, whereas the competing non-parallel plan is both cheaper
AND parallel-safe.  After applying the expensive target list, the
parallel-safe plan gets a lot more expensive, but the Gather path gets
more expensive to a lesser degree because the projection step ends up
below the Gather and thus happens in parallel, so now the Gather plan,
still a loser on parallel-safety, is a winner on total cost and thus
ought to have been retained and, in fact, ought to have won.  Instead,
we threw it out too early.

>> What we ought to do, I think, is avoid generating gather paths until
>> after we've applied the target list (and the associated costing
>> changes) to both the regular path list and the partial path list.
>
> Might be a tad messy to rearrange things that way.

Why do you think I wanted you to do it?  :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-06 Thread Robert Haas
On Tue, Sep 5, 2017 at 4:34 AM, Amit Kapila  wrote:
> Okay, now I understand your point, but I think we already change the
> cost of paths in apply_projection_to_path which is done after add_path
> for top level scan/join paths.

Yeah. I think that's a nasty hack, and I think it's Tom's fault.  :-)

It's used in various places with comments like this:

/*
 * The path might not return exactly what we want, so fix that.  (We
 * assume that this won't change any conclusions about which was the
 * cheapest path.)
 */

And in another place:

 * In principle we should re-run set_cheapest() here to identify the
 * cheapest path, but it seems unlikely that adding the same tlist
 * eval costs to all the paths would change that, so we don't bother.

I think these assumptions were a little shaky even before parallel
query came along, but they're now outright false, because we're not
adding the *same* tlist eval costs to all paths any more.  The
parallel paths are getting smaller costs.  That probably doesn't
matter much if the expressions in questions are things like a + b, but
when as in Jeff's example it's slow(a), then it matters a lot.

I'd feel a lot happier if Tom were to decide how this ought to be
fixed, because - in spite of some modifications by various parallel
query code - this is basically all his design and mostly his code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Andres Freund
Hi,

On 2017-09-06 14:31:26 -0400, Tom Lane wrote:
> I wrote:
> > Anyway, I don't have a big objection to applying this.  My concern
> > is more that we need to be taking a harder look at other parts of
> > the atomics infrastructure, because tweaks there are likely to buy
> > much more.
> 
> I went ahead and pushed these patches, adding __sync_fetch_and_sub
> since gcc seems to provide that on the same footing as these other
> functions.
> 
> Looking at these generic functions a bit closer, I notice that
> most of them are coded like
> 
>   old = pg_atomic_read_u32_impl(ptr);
>   while (!pg_atomic_compare_exchange_u32_impl(ptr, , old | or_))
>   /* skip */;
> 
> but there's an exception: pg_atomic_exchange_u64_impl just does
> 
>   old = ptr->value;
>   while (!pg_atomic_compare_exchange_u64_impl(ptr, , xchg_))
>   /* skip */;
> 
> That's interesting.  Why not pg_atomic_read_u64_impl there?

Because our platforms don't generally guarantee that 64bit reads are atomic:

/*
 * 64 bit reads aren't safe on all platforms. In the generic
 * implementation implement them as a compare/exchange with 0. That'll
 * fail or succeed, but always return the old value. Possible might store
 * a 0, but only if the prev. value also was a 0 - i.e. harmless.
 */
pg_atomic_compare_exchange_u64_impl(ptr, &old, 0);

so I *guess* I decided doing an unnecessary cmpxchg wasn't useful. But
since then we've added an optimization for platforms where we know 64bit
reads have single copy atomicity. So we could change that now.
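For reference, the generic fallback being discussed has this shape
(mirroring the comment above; a sketch, not a change proposal):

    static inline uint64
    pg_atomic_read_u64_impl(volatile pg_atomic_uint64 *ptr)
    {
        uint64      old = 0;

        /* the CAS fails or succeeds, but either way returns the old value */
        pg_atomic_compare_exchange_u64_impl(ptr, &old, 0);
        return old;
    }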


> I think that the simple read is actually okay as it stands: if we
> are unlucky enough to get a torn read, the first compare/exchange
> will fail to compare equal, and it will replace "old" with a valid
> atomically-read value, and then the next loop iteration has a chance
> to succeed.  Therefore there's no need to pay the extra cost of a
> locked read instead of an unlocked one.
> 
> However, if that's the reasoning, why don't we make all of these
> use simple reads?  It seems unlikely that a locked read is free.

We don't really use locked reads? All the _atomic_ wrapper forces is an
actual read from memory rather than a register.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add psql variables showing server version and psql version.

2017-09-06 Thread Fabien COELHO



>> Thus short, simple but meaningful examples which show how to do something
>> useful with all that in the documentation may help people take advantage
>> of these new features.
>
> I don't have an objection to providing an example.  I wasn't terribly
> impressed with Simon's version, but maybe we can do better.

That was just a quick idea to show how they could be used, probably it can
be improved.

> I think it should be concrete, not partly abstract; and likely it needs
> to be with \if not the variable proper.

ISTM that currently there is no "not". Maybe I do not understand your
sentence.

>> Given my experience with "\d*", I'm not sure I would assume that a new
>> psql feature would work with older servers.
>
> I don't recall anyone questioning whether PQserverVersion() would work
> with older servers.  Being able to tell what version the server is is
> sort of the point, no?  If there were a restriction on what it would
> work with, I would agree that that needs to be documented.  But I
> don't think "yes, it really works" is a helpful use of documentation
> space.

Sure.

I can use psql without knowing that there is a libpq underneath, that it
contains a PQserverVersion function implemented since pg7, and that the
SERVER_VERSION_* variables introduced in pg10 or pg11 rely on that thus it
should work with pg7/8/9 as well. It is not obvious, as a new feature
could depend on any combination of server side functions, protocol
extension, client library functions or client code...


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-06 Thread Robert Haas
On Wed, Sep 6, 2017 at 1:47 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Sep 5, 2017 at 4:34 AM, Amit Kapila  wrote:
>>> Okay, now I understand your point, but I think we already change the
>>> cost of paths in apply_projection_to_path which is done after add_path
>>> for top level scan/join paths.
>
>> Yeah. I think that's a nasty hack, and I think it's Tom's fault.  :-)
>
> Yeah, and it's also documented:
>
>  * This has the same net effect as create_projection_path(), except that if
>  * a separate Result plan node isn't needed, we just replace the given path's
>  * pathtarget with the desired one.  This must be used only when the caller
>  * knows that the given path isn't referenced elsewhere and so can be modified
>  * in-place.
>
> If somebody's applying apply_projection_to_path to a path that's already
> been add_path'd, that's a violation of the documented restriction.

/me is confused.  Isn't that exactly what grouping_planner() is doing,
and has done ever since your original pathification commit
(3fc6e2d7f5b652b417fa6937c34de2438d60fa9f)?  It's iterating over
current_rel->pathlist, so surely everything in there has been
add_path()'d.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add citext_pattern_ops to citext contrib module

2017-09-06 Thread Jacob Champion
On Tue, Jul 18, 2017 at 5:18 AM, Alexey Chernyshov
 wrote:
> Hi all,

Hi Alexey, I took a look at your patch. Builds fine here, and passes
the new tests.

I'm new to this code, so take my review with a grain of salt.

> The attached patch introduces citext_pattern_ops for the citext extension
> type, like text_pattern_ops for the text type. The operators ~<~, ~<=~,
> ~>~, ~>=~ are combined into the citext_pattern_ops operator class. These
> operators simply compare the underlying citext values as C strings with
> the memcmp() function.

Are there any cases where performing the str_tolower with the default
collation, then comparing byte-by-byte, could backfire? The added test
cases don't make use of any multibyte/accented characters, so it's not
clear to me yet what *should* be happening in those cases.

It also might be a good idea to add some test cases that compare
strings of different lengths, to exercise all the paths in
internal_citext_pattern_cmp().
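For reference, a *_pattern_cmp comparison typically has three result
paths -- a nonzero memcmp() over the common prefix, plus the two length
tie-breaks -- and such tests would need to hit all of them. A minimal
sketch of that shape (not the patch's actual code):

    static int
    pattern_cmp_sketch(const char *l, int llen, const char *r, int rlen)
    {
        int         result = memcmp(l, r, Min(llen, rlen));

        if (result != 0)
            return result;
        /* identical prefixes: the shorter string sorts first */
        if (llen != rlen)
            return (llen < rlen) ? -1 : 1;
        return 0;
    }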

> +-- test citext_pattern_cmp() function explicetily.

Spelling nitpick in the new SQL: s/explicetily/explicitly .

--Jacob


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-06 Thread Tom Lane
Robert Haas  writes:
> On Tue, Sep 5, 2017 at 4:34 AM, Amit Kapila  wrote:
>> Okay, now I understand your point, but I think we already change the
>> cost of paths in apply_projection_to_path which is done after add_path
>> for top level scan/join paths.

> Yeah. I think that's a nasty hack, and I think it's Tom's fault.  :-)

Yeah, and it's also documented:

 * This has the same net effect as create_projection_path(), except that if
 * a separate Result plan node isn't needed, we just replace the given path's
 * pathtarget with the desired one.  This must be used only when the caller
 * knows that the given path isn't referenced elsewhere and so can be modified
 * in-place.

If somebody's applying apply_projection_to_path to a path that's already
been add_path'd, that's a violation of the documented restriction.

It might be that we should just get rid of apply_projection_to_path and
use create_projection_path, which is less mistake-prone at the cost of
manufacturing another level of Path node.  Now that that has the dummypp
flag, it really shouldn't make any difference in terms of the accuracy of
the cost estimates.

> I'd feel a lot happier if Tom were to decide how this ought to be
> fixed, because - in spite of some modifications by various parallel
> query code - this is basically all his design and mostly his code.

I can take a look, but not right away.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER INDEX .. SET STATISTICS ... behaviour

2017-09-06 Thread Tom Lane
Simon Riggs  writes:
> Other than that, I'm good to commit.
> Any last minute objections?

The docs and comments could stand a bit closer copy-editing by a
native English speaker.  Other than that, it seemed sane in a
quick read-through.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER INDEX .. SET STATISTICS ... behaviour

2017-09-06 Thread Simon Riggs
On 6 September 2017 at 10:27, Tom Lane  wrote:
> Simon Riggs  writes:
>> Other than that, I'm good to commit.
>> Any last minute objections?
>
> The docs and comments could stand a bit closer copy-editing by a
> native English speaker.  Other than that, it seemed sane in a
> quick read-through.

Will do

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Jesper Pedersen

Hi Jeff,

On 09/05/2017 03:47 PM, Jeff Janes wrote:
>> I ran pgbench (-M prepared) with synchronous_commit 'on' and 'off' using
>> both logged and unlogged tables. Also ran an internal benchmark which
>> didn't show anything either.
>
> What scale factor and client count? How many cores per socket?  It looks
> like Sokolov was just starting to see gains at 200 clients on 72 cores,
> using -N transaction.

I have done a run with scale factor 300, and another with 3000 on a
2S/28C/56T/256Gb w/ 2 x RAID10 SSD machine; up to 200 clients.

I would consider the runs as "noise" as I'm seeing +-1% for all client
counts, so nothing like Yura is seeing in [1] for the higher client counts.

I did a run with -N too using scale factor 300, using the settings in
[1], but with same result (+-1%).


[1] 
https://www.postgresql.org/message-id/d62d7d9d473d07e172d799d5a57e7...@postgrespro.ru


Best regards,
 Jesper


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
Andres Freund  writes:
> On 2017-09-06 14:31:26 -0400, Tom Lane wrote:
>> However, if that's the reasoning, why don't we make all of these
>> use simple reads?  It seems unlikely that a locked read is free.

> We don't really use locked reads? All the _atomic_ wrapper forces is an
> actual read from memory rather than a register.

It looks to me like two of the three implementations promise no such
thing.  Even if they somehow do, it hardly matters given that the cmpxchg
loop would be self-correcting.  Mostly, though, I'm looking at the
fallback pg_atomic_read_u64_impl implementation (with a CAS), which
seems far more expensive than can be justified for this.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-06 Thread Tom Lane
Robert Haas  writes:
> On Wed, Sep 6, 2017 at 1:47 PM, Tom Lane  wrote:
>> If somebody's applying apply_projection_to_path to a path that's already
>> been add_path'd, that's a violation of the documented restriction.

> /me is confused.  Isn't that exactly what grouping_planner() is doing,
> and has done ever since your original pathification commit
> (3fc6e2d7f5b652b417fa6937c34de2438d60fa9f)?  It's iterating over
> current_rel->pathlist, so surely everything in there has been
> add_path()'d.

I think the assumption there is that we no longer care about validity of
the input Relation, since we won't be looking at it any more (and
certainly not adding more paths to it).  If there's some reason why
that's not true, then maybe grouping_planner has a bug there.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Andres Freund
On 2017-09-06 15:12:13 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > On 2017-09-06 14:31:26 -0400, Tom Lane wrote:
> >> However, if that's the reasoning, why don't we make all of these
> >> use simple reads?  It seems unlikely that a locked read is free.
> 
> > We don't really use locked reads? All the _atomic_ wrapper forces is an
> > actual read from memory rather than a register.
> 
> It looks to me like two of the three implementations promise no such
> thing.

They're volatile vars, so why not?


> Even if they somehow do, it hardly matters given that the cmpxchg loop
> would be self-correcting.

Well, in this one instance maybe, hardly in others.


> Mostly, though, I'm looking at the fallback pg_atomic_read_u64_impl
> implementation (with a CAS), which seems far more expensive than can
> be justified for this.

What are you suggesting as an alternative?

- Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: Add --no-comments to skip COMMENTs with pg_dump

2017-09-06 Thread Michael Paquier
On Thu, Sep 7, 2017 at 1:43 AM, Robert Haas  wrote:
> On Wed, Sep 6, 2017 at 12:26 PM, Simon Riggs  wrote:
>>>> I'd personally be fine with --no-whatever for any whatever that might
>>>> be a subsidiary property of database objects.  We've got
>>>> --no-security-labels, --no-tablespaces, --no-owner, and
>>>> --no-privileges already, so what's wrong with --no-comments?
>>>>
>>>> (We've also got --no-publications; I think it's arguable whether that
>>>> is the same kind of thing.)
>>>
>>> And --no-subscriptions in the same bucket.
>>
>> Yes, it is. I was suggesting that we remove those as well.

FWIW, I do too. They are useful for certain application code paths.

> That seems like a non-starter to me.  I have used those options many
> times to solve real problems, and I'm sure other people have as well.
> We wouldn't have ended up with all of these options if users didn't
> want to control such things.

As there are beginning to be many switches of this kind, with much code
duplication, I think that some refactoring into a more generic switch
infrastructure would be nicer.
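Something like a single switch table, say (a rough sketch with a
hypothetical layout; no_comments would be the field added by this patch):

    typedef struct NoObjectSwitch
    {
        const char *optname;        /* long option, e.g. "no-comments" */
        size_t      offset;         /* offset of the int flag in DumpOptions */
    } NoObjectSwitch;

    static const NoObjectSwitch no_switches[] = {
        {"no-comments", offsetof(DumpOptions, no_comments)},
        {"no-security-labels", offsetof(DumpOptions, no_security_labels)},
        {"no-subscriptions", offsetof(DumpOptions, no_subscriptions)},
    };

    /* in option processing, one generic assignment replaces one case per switch */
    *(int *) ((char *) &dopt + no_switches[i].offset) = 1;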
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_rewind: Fix some problems when copying files >2GB.

2017-09-06 Thread Peter Eisentraut
On 9/4/17 06:03, Alvaro Herrera wrote:
> Tom Lane wrote:
>> Michael Paquier  writes:
>>> I don't like breaking the abstraction of pg_log() with the existing
>>> flags with some kind of pg_debug() layer. The set of APIs present now
>>> in pg_rewind for logging is nice to have, and I think that those debug
>>> messages should be translated. So what about the attached?
>>
>> Your point about INT64_FORMAT not necessarily working with fprintf
>> is an outstanding reason not to keep it like it is.  I've not reviewed
>> this patch in detail but I think this is basically the way to fix it.
> 
> Actually this code goes through vsnprintf, not fprintf, which should be
> safe, so I removed that part of the comment, and pushed.

Is there a reason this was not backpatched to 9.5?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: pg_rewind: Fix some problems when copying files >2GB.

2017-09-06 Thread Michael Paquier
On Thu, Sep 7, 2017 at 10:11 AM, Peter Eisentraut
 wrote:
> On 9/4/17 06:03, Alvaro Herrera wrote:
>> Tom Lane wrote:
>>> Michael Paquier  writes:
 I don't like breaking the abstraction of pg_log() with the existing
 flags with some kind of pg_debug() layer. The set of APIs present now
 in pg_rewind for logging is nice to have, and I think that those debug
 messages should be translated. So what about the attached?
>>>
>>> Your point about INT64_FORMAT not necessarily working with fprintf
>>> is an outstanding reason not to keep it like it is.  I've not reviewed
>>> this patch in detail but I think this is basically the way to fix it.
>>
>> Actually this code goes through vsnprintf, not fprintf, which should be
>> safe, so I removed that part of the comment, and pushed.
>
> Is there a reason this was not backpatched to 9.5?

Indeed. Please note that cherry-picking the fix from 23c1f0a works just fine.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test

2017-09-06 Thread Michael Paquier
On Wed, Sep 6, 2017 at 11:44 PM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Peter Eisentraut wrote:
>>> We also need to have a plan for handling the build farm.  Maybe keep the
>>> vcregress.pl upgradecheck target as a thin wrapper for the time being?
>
>> The buildfarm already runs "make check" in src/bin/ when TAP tests are
>> enabled, which should be enough to trigger the rewritten test, no?
>
> I think Peter's on about the Windows case.  Not sure how that's handled,
> but it's not "make check".

For MSVC, one can use "vcregress.bat upgradecheck". So perhaps we
could keep upgradecheck for a short time but make it a noop instead
with this patch, and then remove it once buildfarm animals are
upgraded to a newer client version? I would prefer seeing a simple
removal of upgradecheck at the end, and putting all TAP tests for binaries
under the bincheck path. This feels more natural.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test

2017-09-06 Thread Michael Paquier
On Wed, Sep 6, 2017 at 10:05 PM, Peter Eisentraut
 wrote:
> Please send a rebased patch.
>
> I propose splitting the single Perl script into three separate test
> files: one for basic command-line option handling and so on (I would
> like to expand that later), one for the main upgrade test, and one for
> the funny database names tests.

That makes sense. There will be additional overhead with the creation
of an extra server though.

> In the testing file, you removed the paragraph that explains how to do
> cross-version upgrade testing.  It's unfortunate that we would lose that
> functionality.  What can we do about that?

Right, simply removing support for something which has been here for a
long time is no fun. I think that we should add in PostgresNode
objects a new bindir variable which will be used to define path to
binaries. Any new node created needs to go through init() or
init_from_backup(), so a node created with init() would set this
bindir to what pg_config in PATH reports, or to the value defined by
the caller if it is defined (let's use an option for %params). A node
created from init_from_backup() inherits the path of its root node.
This requires a bit of refactoring first. This could help also for
cross version tests out of the code core.

In the existing scripts, there are the following variables:
- oldsrc, old version's source tree
- oldbindir, old version's installed bin dir
- bindir, this version's installed bin dir.
- libdir, this version's installed lib dir
bindir and libdir are pointing to the currently installed version in
the tree, so we could do without it, no? oldbindir and oldsrc need to
be kept around to enforce the position of binaries for the old
version, as well as a proper shape of the original dump being compared
(+ drop of the past functions).

Then, for the pg_upgrade tests, let's check for ENV{oldbindir} and
enforce the bindir value of the PostgresNode to-be-upgraded. And also
for ENV{oldsrc}, first check if it is defined, and then do the
existing psql/dump changes. So one, in order to run cross-version
checks, would just need to rely on the fact that the version where
installcheck runs is the new version. Does that sound acceptable?
Looking at 5bab198, those don't run that often, but I definitely agree
that breaking something for no apparent reason is not cool either ;p

> We also need to have a plan for handling the build farm.  Maybe keep the
> vcregress.pl upgradecheck target as a thin wrapper for the time being?

Or we could make upgradecheck a noop, then remove it once all the MSVC
animals have upgraded to a newer version of the buildfarm client which
does not use upgradecheck anymore (I am fine to send a patch or a pull
request to Andrew for that).
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix performance of generic atomics

2017-09-06 Thread Tom Lane
I wrote:
> Anyway, I don't have a big objection to applying this.  My concern
> is more that we need to be taking a harder look at other parts of
> the atomics infrastructure, because tweaks there are likely to buy
> much more.

I went ahead and pushed these patches, adding __sync_fetch_and_sub
since gcc seems to provide that on the same footing as these other
functions.

Looking at these generic functions a bit closer, I notice that
most of them are coded like

old = pg_atomic_read_u32_impl(ptr);
while (!pg_atomic_compare_exchange_u32_impl(ptr, , old | or_))
/* skip */;

but there's an exception: pg_atomic_exchange_u64_impl just does

old = ptr->value;
while (!pg_atomic_compare_exchange_u64_impl(ptr, , xchg_))
/* skip */;

That's interesting.  Why not pg_atomic_read_u64_impl there?

I think that the simple read is actually okay as it stands: if we
are unlucky enough to get a torn read, the first compare/exchange
will fail to compare equal, and it will replace "old" with a valid
atomically-read value, and then the next loop iteration has a chance
to succeed.  Therefore there's no need to pay the extra cost of a
locked read instead of an unlocked one.

However, if that's the reasoning, why don't we make all of these
use simple reads?  It seems unlikely that a locked read is free.

If there's actually a reason for pg_atomic_exchange_u64_impl to be
different from the rest, it needs to have a comment explaining why.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add psql variables showing server version and psql version.

2017-09-06 Thread Tom Lane
Fabien COELHO  writes:
> Thus short, simple but meaningful examples which show how to do something 
> useful with all that in the documentation may help people take advantage 
> of these new features.

I don't have an objection to providing an example.  I wasn't terribly
impressed with Simon's version, but maybe we can do better.  I think
it should be concrete, not partly abstract; and likely it needs to be
with \if not the variable proper.

> Given my experience with "\d*", I'm not sure I would assume that a new 
> psql feature would work with older servers.

I don't recall anyone questioning whether PQserverVersion() would work
with older servers.  Being able to tell what version the server is is
sort of the point, no?  If there were a restriction on what it would
work with, I would agree that that needs to be documented.  But I
don't think "yes, it really works" is a helpful use of documentation
space.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pluggable storage

2017-09-06 Thread Thomas Munro
On Fri, Sep 1, 2017 at 1:51 PM, Haribabu Kommi  wrote:
> Here I attached new set of patches that are rebased to the latest master.

Hi Haribabu,

Could you please post a new rebased version?
0007-Scan-functions-are-added-to-storage-AM.patch conflicts with
recent changes.

Thanks!

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Setting pd_lower in GIN metapage

2017-09-06 Thread Michael Paquier
On Thu, Sep 7, 2017 at 10:42 AM, Amit Langote
 wrote:
> I too tend to think that any users who use this masking facility would
> know to expect to get these failures on upgraded clusters with invalid
> pd_lower in meta pages.

Yes, I don't think that a WAL-reducing optimization benefiting all
users should be stopped by a small set of users who use an option for
development purposes.

> (PS: I wonder if it is reasonable to allow configuring the error level
> used when a masking failure occurs?  Currently, checkXLogConsistency()
> will abort the process (FATAL))

It is definitely worth it in my opinion, perhaps with an on/off switch
to trigger a warning instead. The reason why we use FATAL now is to
more easily raise red flags in any potential buildfarm runs: a
startup process facing FATAL takes down the standby.
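A sketch of what that switch could look like
(wal_consistency_checking_fatal is a hypothetical GUC;
checkXLogConsistency() currently hardcodes FATAL):

    elog(wal_consistency_checking_fatal ? FATAL : WARNING,
         "inconsistent page found, rel %u/%u/%u, forknum %u, blkno %u",
         rnode.spcNode, rnode.dbNode, rnode.relNode, forknum, blkno);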

>> Perhaps we should document this point for wal_consistency_check?
>
> Do you mean permanently under wal_consistency_check parameter
> documentation or in the release notes under incompatibilities for the
> affected index types?

Under the parameter itself, in the spirit of a don't-do-that from an
upgraded instance.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-06 Thread Peter Geoghegan
On Wed, Sep 6, 2017 at 2:47 PM, Thomas Munro
 wrote:
>> I attach a patch to remove replacement selection, which I'll submit to CF 1.
>
> This breaks the documentation build, because
> doc/src/sgml/release-9.6.sgml still contains <xref linkend="guc-replacement-sort-tuples"> but you removed that id.

Thanks for looking into it.

I suppose that the solution is to change the 9.6 release notes to no
longer have a replacement_sort_tuples link. Anyone else have an
opinion on that?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Red-Black tree traversal tests

2017-09-06 Thread Tom Lane
[ btw, please don't cc pgsql-hackers-owner, the list moderators don't
need the noise ]

Aleksander Alekseev  writes:
> I would say that this patch is in a pretty good shape now. And I see a
> 99% code coverage of rbtree.c. Let's see what committers think.

I took a quick look through the patch --- haven't tried to compile it
or anything like that --- and have a few comments:

* There's some typos, eg extention should be extension, triversal
should be traversal.  Maybe try a spell checker?

* The diff complains about lack of file-ending newlines in several
places.

* There's something weird at the start of test.c:

@@ -0,0 +1,577 @@
+/*--

Maybe your compiler thinks that's a BOM?  It's hard to see how it
compiles otherwise.

* I think it might be simpler to have the module expose just one SQL
function that invokes all these individual tests internally.  Less
boilerplate text that way, and less to change if you add more tests
later.  Also, you might consider passing in TEST_SIZE as an argument
of the SQL function instead of having it be hard-wired.

* We don't do C++-style (//) comments around here.  Please use C
style.  (You might consider running the code through pgindent,
which will convert those comments automatically.)

* It's also generally not per project style to use malloc/calloc/free
directly in backend code; and it's certainly not cool to use malloc or
calloc and then not check for a null result.  Use palloc instead.  Given
the short runtime of this test, you likely needn't be too worried about
pfree'ing stuff.

* _PG_init() declaration seems to be a leftover?  The extra #include doesn't
belong here either, as postgres.h will bring that in for you.

* I know next to zip about red-black trees, but it disturbs me that
the traversal tests use trees whose values were inserted in strictly
increasing order.  Seems like that's not proving as much as it could.
I wonder how hard it would be to insert those integers in random order
(see the sketch after this list).

* I'm not too pleased that the rb_find calls mostly just assume that
the find won't return NULL.  You should be testing for that and reporting
the problem, not just dying on a null pointer crash if it happens.

* Possibly the tests should exercise rb_delete on keys *not* present.
And how about insertion of already-existing keys?  Although that's
a bit outside the scope of testing traversal, so if you want to leave
it for some future expansion, that'd be OK.
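On the random-order point above, a Fisher-Yates shuffle over the keys
would do it. A sketch for the test module (IntRBTreeNode is assumed to
be its int-keyed node wrapper):

    static void
    rb_populate_shuffled(RBTree *tree, int size)
    {
        int        *keys = palloc(size * sizeof(int));
        int         i;

        for (i = 0; i < size; i++)
            keys[i] = i;
        /* Fisher-Yates shuffle */
        for (i = size - 1; i > 0; i--)
        {
            int         j = random() % (i + 1);
            int         tmp = keys[i];

            keys[i] = keys[j];
            keys[j] = tmp;
        }
        for (i = 0; i < size; i++)
        {
            IntRBTreeNode node;
            bool        isNew;

            node.key = keys[i];
            rb_insert(tree, (RBNode *) &node, &isNew);
        }
        pfree(keys);
    }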

I'll set this back to Waiting on Author.  I do encourage you to finish
it up.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-09-06 Thread Omar Kilani
Hi,

I know I'm 7 months late to this, but only just read the beta 4 release notes.

Is there anything people using float datetimes can do that isn't a
pg_dumpall | pg_restore to do a less painful update?

We have several TB of data still using float datetimes and I'm trying
to figure out how we can move to 10 (currently on 9.6.x) without
massive amounts of $ in duplicated hardware or downtime.

I did attempt a pg_dumpall | pg_restore at one point but for whatever
reason we had data in tables that integer datetimes fails on (I forget
the exact crash, but the datetime values were either too small or too
large to fit into the integer datetimes field -- I can retry this if
it would be helpful).

Thanks.

Regards,
Omar

On Mon, Feb 27, 2017 at 5:13 PM, Andres Freund  wrote:
> On 2017-02-27 17:00:23 -0800, Joshua D. Drake wrote:
>> On 02/22/2017 02:45 PM, Tom Lane wrote:
>> > Andres Freund  writes:
>> > > On 2017-02-22 08:43:28 -0500, Tom Lane wrote:
>> > > > (To be concrete, I'm suggesting dropping --disable-integer-datetimes
>> > > > in HEAD, and just agreeing that in the back branches, use of 
>> > > > replication
>> > > > protocol with float-timestamp servers is not supported and we're not
>> > > > going to bother looking for related bugs there.  Given the lack of 
>> > > > field
>> > > > complaints, I do not believe anyone cares.)
>> >
>> > What I *am* willing to spend time on is removing float-timestamp code
>> > in HEAD.  I've not yet heard anybody speak against doing that (or at
>> > least, nothing I interpreted as a vote against it).  If I've not heard
>> > any complaints by tomorrow, I'll get started on that.
>>
>> Rip it out!
>
> Already happened: 
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b6aa17e0ae367afdcea07118e016111af4fa6bc3
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] merge psql ef/ev sf/sv handling functions

2017-09-06 Thread Tom Lane
Fabien COELHO  writes:
> Here is an attempt at merging some functions which removes 160 lines of 
> code.

Pushed with minor adjustments.  I thought a couple of variable names
could be better chosen, but mostly, you can't do this sort of thing:

-psql_error("The server (version %s) does not support editing 
function source.\n",
+psql_error("The server (version %s) does not support editing 
%s.\n",
formatPGVersionNumber(pset.sversion, false,
- sverbuf, sizeof(sverbuf)));
+ sverbuf, sizeof(sverbuf)),
+   is_func ? "function source" : "view definitions");

It's too much of a pain in the rear for translators.  See
https://www.postgresql.org/docs/devel/static/nls-programmer.html#nls-guidelines
Usually we just use two independent messages, and that's what I did.
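That is, roughly (per the pushed shape):

    if (is_func)
        psql_error("The server (version %s) does not support editing function source.\n",
                   formatPGVersionNumber(pset.sversion, false,
                                         sverbuf, sizeof(sverbuf)));
    else
        psql_error("The server (version %s) does not support editing view definitions.\n",
                   formatPGVersionNumber(pset.sversion, false,
                                         sverbuf, sizeof(sverbuf)));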

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The case for removing replacement selection sort

2017-09-06 Thread Thomas Munro
On Fri, Sep 1, 2017 at 6:00 AM, Peter Geoghegan  wrote:
> On Wed, Aug 30, 2017 at 4:59 PM, Peter Geoghegan  wrote:
>> I may submit the simple patch to remove replacement selection, if
>> other contributors are receptive. Apart from everything else, the
>> "incrementalism" of replacement selection works against cleverer batch
>> memory management of the type I just outlined, which seems like a
>> useful direction to take tuplesort in.
>
> I attach a patch to remove replacement selection, which I'll submit to CF 1.

This breaks the documentation build, because
doc/src/sgml/release-9.6.sgml still contains <xref linkend="guc-replacement-sort-tuples"> but you removed that id.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor code improvement to postgresGetForeignPlan

2017-09-06 Thread Tom Lane
Tatsuro Yamada  writes:
> The declaration of postgresGetForeignPlan uses baserel, but
> the actual definition uses foreignrel. It would be better to sync.

Pushed, thanks.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-09-06 Thread Tom Lane
Omar Kilani  writes:
> Is there anything people using float datetimes can do that isn't a
> pg_dumpall | pg_restore to do a less painful update?

Um, not really.  You may be stuck on 9.6 until you can spare the effort
to convert.  The physical representations of timestamps are totally
different in the two cases.

> I did attempt a pg_dumpall | pg_restore at one point but for whatever
> reason we had data in tables that integer datetimes fails on (I forget
> the exact crash, but the datetime values were either too small or too
> large to fit into the integer datetimes field -- I can retry this if
> it would be helpful).

I'm pretty sure the minimum values are the same in both cases, to wit
Julian day zero.  As for the max, according to the old code comments

 * The upper limit for dates is 5874897-12-31, which is a bit less than what
 * the Julian-date code can allow.  We use that same limit for timestamps when
 * using floating-point timestamps ... For integer timestamps, the upper
 * limit is 294276-12-31.

I would hope that any timestamps you've got beyond 294276AD are erroneous
data that you need to clean up anyway.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication vs. float timestamps is a disaster

2017-09-06 Thread Chapman Flack
On 09/06/17 18:33, Omar Kilani wrote:

> Is there anything people using float datetimes can do that isn't a
> pg_dumpall | pg_restore to do a less painful update?
> 
> We have several TB of data still using float datetimes and I'm trying
> to figure out how we can move to 10 (currently on 9.6.x) without
> massive amounts of $ in duplicated hardware or downtime.

I ran into an analogous issue with endianness of PL/Java-defined datatypes,
and ended up devising a procedure [1] for treating the type one way on
output and another way on input, and then constructing an UPDATE query
that just assigns the column to itself using a function that's essentially
identity, but forces the output-input conversion (and doesn't look like
a simple identity function to the query optimizer, which otherwise might
optimize the whole update away).

While the details of that were specific to PL/Java and byte order,
something similar might work in your case if you can afford some period,
either just before or just after migration, when your normal workload
is blocked, long enough to run such updates on your several TB of data.

Or if that's too big a chunk of downtime, you might be able to add
a column in advance, of some user-defined type the same width as an
integer datetime, populate that in advance, migrate, then do a quick
catalog switcheroo of the column names and types. I've never done that,
so someone else might have comments on its feasibility or the best way
of doing it.

> the exact crash, but the datetime values were either too small or too
> large to fit into the integer datetimes field -- I can retry this if

That does seem important to get the details on. If the integer datetime
column won't represent your stuff (and the too-large or too-small values
are necessary to represent), then some schema change might be necessary.
Or, if the outlying values were sentinel values of some kind (I wonder,
how else would you even have datetimes outside of the int64 range?),
maybe they can be replaced with others that fit.

-Chap

[1]: https://tada.github.io/pljava/use/byteordermigrate.html


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2017-09-06 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 6 Sep 2017 12:23:53 -0700, Andres Freund  wrote in 
<20170906192353.ufp2dq7wm5fd6...@alap3.anarazel.de>
> On 2017-09-06 17:36:02 +0900, Kyotaro HORIGUCHI wrote:
> > The problem is that the current ReadRecord needs the first one of
> > a series of continuation records from the same source with the
> > other part, the master in the case.
> 
> What's the problem with that?  We can easily keep track of the beginning
> of a record, and only confirm the address before that.

After a failure while reading a record locally, ReadRecord tries
streaming to read from the beginning of the record, which is not on
the master, then retries locally, and so on. This loops forever.

> > A (or the) solution closed in the standby side is allowing to
> > read a series of continuation records from multiple sources.
> 
> I'm not following. All we need to use is the beginning of the relevant
> records, that's easy enough to keep track of. We don't need to read the
> WAL or anything.

The beginning is already tracked, and there is nothing more to do.

I reconsidered that way and found that it doesn't need such
destructive refactoring.

The first *problem* was that WaitForWALToBecomeAvailable requests the
beginning of a record, which is not on the page the function has
been told to fetch. While tliRecPtr is still required to determine the
TLI to request, the function should request RecPtr to be streamed.

The rest to do is let XLogPageRead retry other sources
immediately. To do this I made ValidXLogPageHeader@xlogreader.c
public (and renamed to XLogReaderValidatePageHeader).

The patch attached fixes the problem and passes recovery
tests. However, the test for this problem is not added. It needs
to go to the last page in a segment then put a record continues
to the next segment, then kill the standby after receiving the
previous segment but before receiving the whole record.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 8932a390bd3d1acfe5722bc62f42fc7e381ca617 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 7 Sep 2017 12:14:55 +0900
Subject: [PATCH 1/2] Allow switching the WAL source in the middle of a record.

The current recovery machinery assumes the whole of a record is
available from a single source. This prevents a standby from restarting
under a certain condition. This patch allows source switching while
reading a series of continuation records.
---
 src/backend/access/transam/xlog.c   |  7 ++-
 src/backend/access/transam/xlogreader.c | 12 +---
 src/include/access/xlogreader.h |  5 +
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index df4843f..eef3a97 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11566,6 +11566,11 @@ retry:
 	Assert(reqLen <= readLen);
 
 	*readTLI = curFileTLI;
+
+	if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr,
+	  (XLogPageHeader) readBuf))
+		goto next_record_is_invalid;
+
 	return readLen;
 
 next_record_is_invalid:
@@ -11700,7 +11705,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 		}
 		else
 		{
-			ptr = tliRecPtr;
+			ptr = RecPtr;
 			tli = tliOfPointInHistory(tliRecPtr, expectedTLEs);
 
 			if (curFileTLI > 0 && tli < curFileTLI)
diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 0781a7b..aa05e3f 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -27,8 +27,6 @@
 
 static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength);
 
-static bool ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr,
-	XLogPageHeader hdr);
 static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
 	  XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess);
 static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record,
@@ -545,7 +543,7 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
 
 		hdr = (XLogPageHeader) state->readBuf;
 
-		if (!ValidXLogPageHeader(state, targetSegmentPtr, hdr))
+		if (!XLogReaderValidatePageHeader(state, targetSegmentPtr, hdr))
 			goto err;
 	}
 
@@ -582,7 +580,7 @@ ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr, int reqLen)
 	/*
 	 * Now that we know we have the full header, validate it.
 	 */
-	if (!ValidXLogPageHeader(state, pageptr, hdr))
+	if (!XLogReaderValidatePageHeader(state, pageptr, hdr))
 		goto err;
 
 	/* update read state information */
@@ -709,9 +707,9 @@ ValidXLogRecord(XLogReaderState *state, XLogRecord *record, XLogRecPtr recptr)
 /*
  * Validate a page header
  */
-static bool
-ValidXLogPageHeader(XLogReaderState *state, XLogRecPtr recptr,
-	XLogPageHeader hdr)
+bool
+XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
+			 XLogPageHeader hdr)
 {
 	XLogRecPtr	recaddr;
 	

Re: [HACKERS] Setting pd_lower in GIN metapage

2017-09-06 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Sep 7, 2017 at 5:49 AM, Tom Lane  wrote:
>> I looked briefly at these patches.  I'm not sure that it's safe for the
>> mask functions to assume that meta pages always have valid pd_lower.
>> What happens when replaying log data concerning an old index that doesn't
>> have that field filled?

> There will be inconsistency between the pages, and the masking check
> will complain.

That doesn't seem like a pleasant outcome to me.  The WAL consistency
check code is supposed to complain if there's some kind of replication
or replay failure, and this cannot be categorized as either.

The idea I'd had was to apply the masking only if pd_lower >=
SizeOfPageHeaderData, or if you wanted to be stricter, only if
pd_lower != 0.
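In each mask function that would be roughly (a sketch):

    /* mask only if pd_lower was actually set; old meta pages fall through */
    if (((PageHeader) page)->pd_lower >= SizeOfPageHeaderData)
        mask_unused_space(page);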

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: DROP SUBSCRIPTION hangs if sub is disabled in the same transaction

2017-09-06 Thread Noah Misch
On Wed, Sep 06, 2017 at 03:28:47PM +0900, Masahiko Sawada wrote:
> On Mon, Sep 4, 2017 at 11:43 PM, Arseny Sher  wrote:
> > Arseny Sher  writes:
> >
> >> Attached patch fixes this by stopping workers before RO drop, as
> >> already done in case when we drop replication slot.
> >
> > Sorry, here is the patch.
> >
> 
> I could reproduce this issue, it's a bug. Added this to the open item.
> The cause of this is commit 7e174fa7, which makes subscription DDLs
> kill the apply worker only at transaction commit. I haven't looked at
> the patch deeply yet, but I'm not sure the approach that kills the
> apply worker before commit would be good.

[Action required within three days.  This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item.  Peter,
since you committed the patch believed to have created it, you own this open
item.  If some other commit is more relevant or if this does not belong as a
v10 open item, please let us know.  Otherwise, please observe the policy on
open item ownership[1] and send a status update within three calendar days of
this message.  Include a date for your subsequent status update.  Testers may
discover new open items at any time, and I want to plan to get them all fixed
well in advance of shipping v10.  Consequently, I will appreciate your efforts
toward speedy resolution.  Thanks.

[1] 
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-09-06 Thread Masahiko Sawada
On Wed, Sep 6, 2017 at 4:01 PM, Fabien COELHO  wrote:
>
> Applies, compiles, works for me.
>
> Very very minor comments that I should have noticed before, sorry for this
> additional round trip.

Thank you for the dedicated review!

>
> In the help line, move -I just after -i, to put short options in
> alphabetical and decreasing importance order. On this line, also add the
> information about the default, something like:
>
>   -i, --ini...   
>   -I, --custom=[...]+  (default "ctgvp")
>  ...

Agreed. Attached latest patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v12.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Restricting maximum keep segments by repslots

2017-09-06 Thread Kyotaro HORIGUCHI
Hello,

At Fri, 1 Sep 2017 23:49:21 -0400, Peter Eisentraut 
 wrote in 
<751e09c4-93e0-de57-edd2-e64c4950f...@2ndquadrant.com>
> I'm still concerned about how the critical situation is handled.  Your
> patch just prints a warning to the log and then goes on -- doing what?
> 
> The warning rolls off the log, and then you have no idea what happened,
> or how to recover.

The victims should be complaining in their log files, but, yes, I
must admit that it closely resembles /dev/null. And the
catastrophe comes suddenly.

> I would like a flag in pg_replication_slots, and possibly also a
> numerical column that indicates how far away from the critical point
> each slot is.  That would be great for a monitoring system.

Great! I'll do that right now.

> -- 
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 

Thanks.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A design for amcheck heapam verification

2017-09-06 Thread Peter Geoghegan
On Wed, Aug 30, 2017 at 9:29 AM, Peter Geoghegan  wrote:
> On Wed, Aug 30, 2017 at 5:02 AM, Alvaro Herrera
>  wrote:
>> Eh, if you want to optimize it for the case where debug output is not
>> enabled, make sure to use ereport() not elog().  ereport()
>> short-circuits evaluation of arguments, whereas elog() does not.
>
> I should do that, but it's still not really noticeable.
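For illustration (expensive_fp_rate() stands in for any costly
argument; it is hypothetical):

    /* ereport(): errmsg() arguments are evaluated only if DEBUG2 is output */
    ereport(DEBUG2,
            (errmsg("bloom false positive rate: %.4f",
                    expensive_fp_rate(filter))));

    /* elog(): expensive_fp_rate() runs even when DEBUG2 is filtered out */
    elog(DEBUG2, "bloom false positive rate: %.4f",
         expensive_fp_rate(filter));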

Since this patch has now bit-rotted, I attach a new revision, V2.
Apart from fixing some Makefile bitrot, this revision also makes some
small tweaks as suggested by Thomas and Alvaro. The documentation is
also revised and expanded, and now discusses practical aspects of the
set membership being tested using a Bloom filter, how that relates to
maintenance_work_mem, and so on.

Note that this revision does not let the Bloom filter caller use their
own dynamic shared memory, which is something that Thomas asked about.
While that could easily be added, I think it should happen later. I
really just wanted to make sure that my Bloom filter was not in some
way fundamentally incompatible with Thomas' planned enhancements to
(parallel) hash join.
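For callers, usage looks roughly like this (function names per the
patch's lib/bloomfilter.h; the exact signatures here are assumptions):

    bloom_filter *filter = bloom_create(estimated_elems, work_mem, seed);

    /* build phase: hash each element into the bitset */
    bloom_add_element(filter, (unsigned char *) elem, elem_len);

    /* probe phase: "lacks" is definitive, membership is only probable */
    if (bloom_lacks_element(filter, (unsigned char *) probe, probe_len))
        report_corruption();    /* hypothetical caller action */

    bloom_free(filter);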

-- 
Peter Geoghegan
From d4dda95dd41204315dc12936fac83d2df8676992 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Thu, 24 Aug 2017 20:58:21 -0700
Subject: [PATCH 1/2] Add Bloom filter data structure implementation.

A Bloom filter is a space-efficient, probabilistic data structure that
can be used to test set membership.  Callers will sometimes incur false
positives, but never false negatives.  The rate of false positives is a
function of the total number of elements and the amount of memory
available for the Bloom filter.

Two classic applications of Bloom filters are cache filtering, and data
synchronization testing.  Any user of Bloom filters must accept the
possibility of false positives as a cost worth paying for the benefit in
space efficiency.
---
 src/backend/lib/Makefile  |   4 +-
 src/backend/lib/README|   2 +
 src/backend/lib/bloomfilter.c | 297 ++
 src/include/lib/bloomfilter.h |  26 
 4 files changed, 327 insertions(+), 2 deletions(-)
 create mode 100644 src/backend/lib/bloomfilter.c
 create mode 100644 src/include/lib/bloomfilter.h

diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index d1fefe4..191ea9b 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -12,7 +12,7 @@ subdir = src/backend/lib
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
-OBJS = binaryheap.o bipartite_match.o dshash.o hyperloglog.o ilist.o \
-	   knapsack.o pairingheap.o rbtree.o stringinfo.o
+OBJS = binaryheap.o bipartite_match.o bloomfilter.o dshash.o hyperloglog.o \
+   ilist.o knapsack.o pairingheap.o rbtree.o stringinfo.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/lib/README b/src/backend/lib/README
index 5e5ba5e..376ae27 100644
--- a/src/backend/lib/README
+++ b/src/backend/lib/README
@@ -3,6 +3,8 @@ in the backend:
 
 binaryheap.c - a binary heap
 
+bloomfilter.c - probabilistic, space-efficient set membership testing
+
 hyperloglog.c - a streaming cardinality estimator
 
 pairingheap.c - a pairing heap
diff --git a/src/backend/lib/bloomfilter.c b/src/backend/lib/bloomfilter.c
new file mode 100644
index 000..e93f9b0
--- /dev/null
+++ b/src/backend/lib/bloomfilter.c
@@ -0,0 +1,297 @@
+/*-
+ *
+ * bloomfilter.c
+ *		Minimal Bloom filter
+ *
+ * A Bloom filter is a probabilistic data structure that is used to test an
+ * element's membership of a set.  False positives are possible, but false
+ * negatives are not; a test of membership of the set returns either "possibly
+ * in set" or "definitely not in set".  This can be very space efficient when
+ * individual elements are larger than a few bytes, because elements are hashed
+ * in order to set bits in the Bloom filter bitset.
+ *
+ * Elements can be added to the set, but not removed.  The more elements that
+ * are added, the larger the probability of false positives.  Caller must hint
+ * an estimated total size of the set when its Bloom filter is initialized.
+ * This is used to balance the use of memory against the final false positive
+ * rate.
+ *
+ * Copyright (c) 2017, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *	  src/backend/lib/bloomfilter.c
+ *
+ *-
+ */
+#include "postgres.h"
+
+#include <math.h>
+
+#include "access/hash.h"
+#include "lib/bloomfilter.h"
+#include "utils/memutils.h"
+
+#define MAX_HASH_FUNCS	10
+#define NBITS(filt)		((1 << (filt)->bloom_power))
+
+typedef struct bloom_filter
+{
+	/* 2 ^ bloom_power is the size of the bitset (in bits) */
+	int			bloom_power;
+	unsigned char  *bitset;
+
+	/* K hash functions are used, which are randomly seeded */
+	int			k_hash_funcs;
+	

Re: [HACKERS] WIP: Failover Slots

2017-09-06 Thread Craig Ringer
On 14 August 2017 at 11:56, Craig Ringer  wrote:

>
> I don't want to block failover slots on decoding on standby just because
> decoding on standby would be nice to have.
>

However, during discussion with Thomas Munro a point has come up that does
block failover slots as currently envisioned - silent timeline divergence.
It's a solid reason why the current design and implementation are
insufficient to solve the problem. This issue exists both with the original
failover slots design and with the model Robert and I were discussing.

Say a decoding client has replayed from master up to commit of xid 42 at
1/1000 and confirmed flush, then a failover slots standby of the master is
promoted. The standby has only received WAL from the failed master up to
1/500 with most recent xid 20. Now the standby does some other new xacts,
pushing xid up to 30 at 1/1000 then continuing to insert until xid 50 at
lsn 1/2000.

Then the logical client reconnects. The logical client will connect to the
failover slot fine, and start replay. But it'll ask for replay to start at
1/1000. The standby will happily fast-forward the slot (as it should), and
start replay after 1/1000.

But now we have silent divergence in timelines. The logical replica has
received and committed xacts 20...42 at lsn 1/500 through 1/1000, but these
are not present on the promoted master. And the replica has skipped over
the new-master's xids 20...30 with lsns 1/500 through 1/1000, so they're
present on the new master but not the replica.

IMO, this shows that not including the timeline in replication origins was
a bit of a mistake, since we'd trivially detect this if it were included
- but it's a bit late now. And anyway, detection alone would just mean
logical rep would break, which doesn't help much.
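
To make that concrete, a purely hypothetical sketch of the check a
timeline-aware origin would allow. Neither the remote_tli field nor this
function exists today, and fork_lsn would have to come from the promoted
standby's timeline history file:

#include "postgres.h"

#include "access/xlogdefs.h"

/* hypothetical: origin progress extended with a timeline */
typedef struct OriginProgress
{
	XLogRecPtr	remote_lsn;		/* client's confirmed replay position */
	TimeLineID	remote_tli;		/* timeline that position belongs to */
} OriginProgress;

static bool
origin_diverged(const OriginProgress *progress,
				TimeLineID local_tli, XLogRecPtr fork_lsn)
{
	/* client replayed past the point where this standby forked off */
	return progress->remote_tli != local_tli &&
		progress->remote_lsn > fork_lsn;
}

In the scenario above, the client's 1/1000 on the old timeline is past
the 1/500 fork point, so the reconnect could be refused rather than
silently fast-forwarded.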

The simplest fix, though rather limited, is to require that failover
candidates be in synchronous_standby_names, and to delay ReorderBufferCommit
sending the actual commit message until all peers in s_s_n confirm flush of
the commit lsn. But that's not much good if you want sync rep for your
logical connections too, and it is generally a hack.
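
As an illustration of where that would hook in, a sketch only;
AllSyncRepStandbysFlushedPast() is a made-up stand-in for scanning the
walsender array against s_s_n:

#include "postgres.h"

#include "access/xlogdefs.h"
#include "miscadmin.h"
#include "pgstat.h"
#include "storage/latch.h"

static void
hold_commit_for_failover_candidates(XLogRecPtr commit_lsn)
{
	/* hold the decoded commit until all failover candidates have it */
	while (!AllSyncRepStandbysFlushedPast(commit_lsn))
	{
		WaitLatch(MyLatch, WL_LATCH_SET | WL_TIMEOUT, 10L,
				  WAIT_EVENT_SYNC_REP);
		ResetLatch(MyLatch);
		CHECK_FOR_INTERRUPTS();
	}
	/* ...only now let ReorderBufferCommit emit the commit message */
}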

A more general solution requires that masters be told which peers are
failover candidates, so they can ensure ordering between logical decoding
and physical failover candidates. Which effectively adds another kind of
sync rep, where we do "wait for physical failover candidates to flush, and
only then allow logical decoding". This actually seems pretty practical
with the design Robert and I discussed, but it's definitely an expansion in
scope.

Alternately, we could require the decoding clients to keep an eye on the
flush/replay positions of all failover candidates and delay commit+confirm
of decoded xacts until the upstream's failover candidates have received and
flushed up to that lsn. That starts to look a lot like a decoding-on-standby
based model for logical failover, where the downstream maintains
slots on each failover candidate upstream.

So yeah. More work needed here. Even if we suddenly decided the original
failover slots model was OK, it's not sufficient to fully solve the problem.

(It's something I'd thought about for BDR failover, but never applied to
failover slots: the problem of detecting or preventing divergence when the
logical client is ahead of physical receive at the time the physical standby
is promoted.)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Setting pd_lower in GIN metapage

2017-09-06 Thread Michael Paquier
On Thu, Sep 7, 2017 at 12:59 PM, Tom Lane  wrote:
> The idea I'd had was to apply the masking only if pd_lower >=
> SizeOfPageHeaderData, or if you wanted to be stricter, only if
> pd_lower != 0.

If we add a check, it seems to me that the stricter one makes the
most sense, since pd_lower should remain at 0 on pre-10 servers.
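
For concreteness, the stricter variant might look like this inside a
mask function (placement is a sketch only; mask_unused_space() is the
existing helper from bufmask.c):

	/*
	 * Only mask the unused hole when pd_lower has actually been
	 * maintained; metapages written by pre-10 servers leave it at 0.
	 */
	if (((PageHeader) page)->pd_lower != 0)
		mask_unused_space(page);
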
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

