Re: [HACKERS] hash partitioning based on v10Beta2

2017-08-27 Thread Rushabh Lathia
On Sat, Aug 26, 2017 at 10:10 AM, yang...@highgo.com wrote:

> Hi all,
>
> Now we have had the range / list partition, but hash partitioning is not
> implemented yet.
> Attached is a POC patch based on the v10Beta2 to add the
> hash partitioning feature.
> Although we will need more discussion about the syntax
> and other specifications before going ahead with the project,
> I think this runnable code might help the discussion of
> what we implement and how.
>
>
FYI, there is already an existing commitfest entry for this project.

https://commitfest.postgresql.org/14/1059/



> Description
>
> The hash partition implementation builds on the
> original range / list partitioning, and uses similar syntax.
>
> To create a partitioned table, use:
>
> CREATE TABLE h (id int) PARTITION BY HASH(id);
>
> The partition key currently supports only one column. I think
> the partition key could support multiple columns,
> which may be difficult to implement for querying, but
> it is not impossible.
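
A purely hypothetical sketch of what a multi-column key might look like (this POC does not support it):

  -- hypothetical syntax, not implemented by this POC
  CREATE TABLE m (a int, b int) PARTITION BY HASH (a, b);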
>
> A partition can be created as below:
>
>  CREATE TABLE h1 PARTITION OF h;
>  CREATE TABLE h2 PARTITION OF h;
>  CREATE TABLE h3 PARTITION OF h;
>
> The FOR VALUES clause cannot be used; the partition bound
> is calculated automatically as a partition index of a single integer value.
>
> An inserted record is stored in the partition whose index equals
> DatumGetUInt32(OidFunctionCall1(lookup_type_cache(key->parttypid[0],
> TYPECACHE_HASH_PROC)->hash_proc, values[0])) % nparts /* number of partitions */;
> In the above example, this is
> DatumGetUInt32(OidFunctionCall1(lookup_type_cache(key->parttypid[0],
> TYPECACHE_HASH_PROC)->hash_proc, id)) % 3;
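
The target partition for a given key can also be previewed from plain SQL; a minimal sketch, assuming the built-in hashint4() function (int4's TYPECACHE_HASH_PROC entry) and the three partitions created above:

  -- illustration only: compute the 0-based partition index for id = 1,
  -- emulating DatumGetUInt32(hash) % nparts with nparts = 3
  SELECT (hashint4(1)::bigint & 4294967295) % 3 AS partition_index;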
>
> postgres=# insert into h select generate_series(1,20);
> INSERT 0 20
> postgres=# select tableoid::regclass,* from h;
>  tableoid | id
> ----------+----
>  h1   |  3
>  h1   |  5
>  h1   | 17
>  h1   | 19
>  h2   |  2
>  h2   |  6
>  h2   |  7
>  h2   | 11
>  h2   | 12
>  h2   | 14
>  h2   | 15
>  h2   | 18
>  h2   | 20
>  h3   |  1
>  h3   |  4
>  h3   |  8
>  h3   |  9
>  h3   | 10
>  h3   | 13
>  h3   | 16
> (20 rows)
>
> Partitions can be added dynamically. If a new partition
> is created, the number of partitions changes, and so
> the calculated target partition for a value changes too;
> the same data must not end up in different
> partitions. So when a new partition is created, the existing
> data is re-hashed and re-inserted into its target partition.
>
> postgres=# create table h4 partition of h;
> CREATE TABLE
> postgres=# select tableoid::regclass,* from h;
>  tableoid | id
> ----------+----
>  h1   |  5
>  h1   | 17
>  h1   | 19
>  h1   |  6
>  h1   | 12
>  h1   |  8
>  h1   | 13
>  h2   | 11
>  h2   | 14
>  h3   |  1
>  h3   |  9
>  h3   |  2
>  h3   | 15
>  h4   |  3
>  h4   |  7
>  h4   | 18
>  h4   | 20
>  h4   |  4
>  h4   | 10
>  h4   | 16
> (20 rows)
>
> When querying the data, hash partitioning uses the same
> algorithm as insertion, and filters out the partitions
> that do not need to be scanned.
>
> postgres=# explain analyze select * from h where id = 1;
>                                             QUERY PLAN
> ----------------------------------------------------------------------------------------------------
>  Append  (cost=0.00..41.88 rows=13 width=4) (actual time=0.020..0.023 rows=1 loops=1)
>    ->  Seq Scan on h3  (cost=0.00..41.88 rows=13 width=4) (actual time=0.013..0.016 rows=1 loops=1)
>          Filter: (id = 1)
>          Rows Removed by Filter: 3
>  Planning time: 0.346 ms
>  Execution time: 0.061 ms
> (6 rows)
>
> postgres=# explain analyze select * from h where id in (1,5);;
>                                             QUERY PLAN
> ----------------------------------------------------------------------------------------------------
>  Append  (cost=0.00..83.75 rows=52 width=4) (actual time=0.016..0.028 rows=2 loops=1)
>    ->  Seq Scan on h1  (cost=0.00..41.88 rows=26 width=4) (actual time=0.015..0.018 rows=1 loops=1)
>          Filter: (id = ANY ('{1,5}'::integer[]))
>          Rows Removed by Filter: 6
>    ->  Seq Scan on h3  (cost=0.00..41.88 rows=26 width=4) (actual time=0.005..0.007 rows=1 loops=1)
>          Filter: (id = ANY ('{1,5}'::integer[]))
>          Rows Removed by Filter: 3
>  Planning time: 0.720 ms
>  Execution time: 0.074 ms
> (9 rows)
>
> postgres=# explain analyze select * from h where id = 1 or id = 5;;
>                                             QUERY PLAN
> ----------------------------------------------------------------------------------------------------
>  Append  (cost=0.00..96.50 rows=50 width=4) (actual time=0.017..0.078 rows=2 loops=1)
>    ->  Seq Scan on h1  (cost=0.00..48.25 rows=25 width=4) (actual time=0.015..0.019 rows=1 loops=1)
>          Filter: ((id = 1) OR (id

[HACKERS] Make pg_regress print a connstring with sockdir

2017-08-27 Thread Craig Ringer
Hi all

It's a pain having to find the postmaster command line to get the port
pg_regress started a server on. We print the port in the pg_regress output,
why not the socket directory / host?

How about

running on 'port=50848 host=/tmp/pg_regress-UMrcT3' with PID 16409

per the attached?

If you'd prefer nicer wording at the expense of two lines, maybe

running with PID 16409
connection string: 'port=50848 host=/tmp/blah'

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 5b242d9af911a88c084a74bf49904b94117347c1 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Mon, 28 Aug 2017 13:28:05 +0800
Subject: [PATCH v1] Show sockdir/hostname in pg_regress startup output

---
 src/test/regress/pg_regress.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index abb742b..11931db 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -2438,8 +2438,9 @@ regression_main(int argc, char *argv[], init_function ifunc, test_function tfunc
 #else
 #define ULONGPID(x) (unsigned long) (x)
 #endif
-		printf(_("running on port %d with PID %lu\n"),
-			   port, ULONGPID(postmaster_pid));
+		printf(_("running on 'port=%d host=%s' with PID %lu\n"),
+			   port, hostname ? hostname : sockdir,
+ 			   ULONGPID(postmaster_pid));
 	}
 	else
 	{
-- 
2.9.5




[HACKERS] psql --batch

2017-08-27 Thread Craig Ringer
Hi all

I find myself regurgitating the incantation

psql -qAtX -v ON_ERROR_STOP=1

quite a bit. It's not ... super friendly.

It strikes me that we could possibly benefit from a 'psql --batch' option.

Thoughts?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-27 Thread Masahiko Sawada
On Sun, Aug 27, 2017 at 5:12 PM, Fabien COELHO  wrote:
>
> Hello Masahiko-san,
>
>> Attached latest v4 patch. Please review it.

Thank you for reviewing this patch!

>
> Patch applies, compiles.
>
> The messages/options do not seem to work properly:
>
>  sh> ./pgbench -i -I t
>  done.

Fixed this so that it outputs "creating tables..." as you pointed out.

> Does not seem to have initialized the tables although it was requested...
>
>  sh> ./pgbench -i -I d
>  creating tables...
>  100000 of 100000 tuples (100%) done (elapsed 0.08 s, remaining 0.00 s)
>  done.
>
> It seems that "d" triggered table creation... In fact it seems that the
> work is done correctly, but the messages are not in the right place.

Fixed, but I just removed "creating tables..." from the -I d command. I
think it's not good if this patch changes the output messages.

> Also another issue:
>
>  sh> ./pgbench -i --foreign-keys
>  creating tables...
>  100000 of 100000 tuples (100%) done (elapsed 0.09 s, remaining 0.00 s)
>  vacuum...
>  set primary keys...
>  done.
>
> Foreign keys do not seem to have been set... Please check that all really
> work as expected.

Fixed.

>
> About the documentation:
>
> If a native English speaker could review the text, it would be great.
>
> At least: "Required to invoke" -> "Require to invoke".

Fixed.

>
>
> About the code:
>
> is_no_vacuum should be a bool?

We can change it, but I think there is actually no difference, so
keeping it would be better.

>
> I'm really hesitating about the out of order processing of options. If the
> user writes
>
>   sh> pgbench -i --no-vacuum -I v
>   done.
>
> Then does it make sense to ignore the last thing the user asked for? ISTM
> that processing options in order and keeping the last resulting spec is more
> natural. Appending contradictory options can happen easily when scripting,
> and usually what is meant is the last one.

Agreed. I changed it so that it processes options in order and keeps
the last resulting spec.

>
> Again, as pointed out in the previous review, I do not like much
> checkCustomCmds implementation: switch/case, fprintf and return on error
> which will trigger another fprintf and error above... ISTM that you should
> either take into account previous comments or explain why you disagree with
> them, but not send the same code without addressing them in any way.

Sorry, I didn't mean to ignore it; I'd just missed the comment. Fixed it.

Attached latest patch.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


pgbench_custom_initialization_v5.patch
Description: Binary data



Re: [HACKERS] WIP: Separate log file for extension

2017-08-27 Thread Craig Ringer
On 25 August 2017 at 15:12, Antonin Houska  wrote:

> Attached is a draft patch to allow extensions to write log messages to a
> separate file.


I like the idea a lot. I'm not so sure about the approach.

How will this play with syslog? csvlog? etc?

I wonder if a level of indirection is appropriate here, where extensions
(or postgres subsystems, even) provide a log stream label. Then the logging
backend takes care of using that appropriately for the logging mechanism in
use; for logging to file that'd generally be separate files.  Same for
CSVlog. Other mechanisms could be left as stubs initially.

So the outcome would be the same, just without the assumption of specific
file name and output mechanism baked in.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] pgbench: faster version of tpcb-like transaction

2017-08-27 Thread Michael Paquier
On Sun, Aug 27, 2017 at 12:16 PM, Tom Lane  wrote:
> Jeff Janes  writes:
>> It is easy to package 5 of those commands into a single PL/pgSQL function,
>> with the other two being implicit via the standard auto-commit behavior
>> when explicit transactions are not opened.  The attached patch does that,
>> under the name tpcb-func.  I first named it tpcb-like-func, but one builtin
>> name can't be a prefix of another so that won't work.
>
> I dunno, it seems like this proposal involves jacking up the test case
> and driving a completely different one underneath.  There is no reason
> to consider that you've improved the benchmark results --- you've just
> substituted a different benchmark, one with no historical basis, and
> not a lot of field justification either.

Performance comparison between major releases matters only if the same
set of tests is used.

>> Wanting to measure IPC overhead is a valid thing to do, but
>> certainly isn't the most common thing people want to do with pgbench.
>
> I think that's nonsense.  Measuring how fast PG can do client interactions
> is EXACTLY what this is about.  Certainly, pushing SQL operations into
> server-side functions is a great way to reduce network overhead, but it
> has nothing to do with what we choose as a benchmark.

This thread makes me think that it would be a good idea to add to the
documentation of pgbench a section that gives out a set of scripts
that can be used for emulating more patterns, instead of having them in
the code. The proposed test has value if one would like to check whether
it is better for an application to move more things server-side when
there is a lot of latency, compared with the existing tpcb test of pgbench,
but that's not the end of it.
-- 
Michael




Re: [HACKERS] Typo in insert.sgml

2017-08-27 Thread Peter Eisentraut
On 6/20/17 15:03, David G. Johnston wrote:
> On Tuesday, June 20, 2017, Robert Haas wrote:
> 
> On Tue, Jun 20, 2017 at 2:34 PM, Peter Eisentraut wrote:
> > On 6/18/17 03:16, Julien Rouhaud wrote:
> >> Patch attached.
> >
> > This was not a typo, this was intentional.
> 
> To me, Julien's change seems to make it easier to understand, but
> maybe I'm misunderstanding what it's trying to say in the first place.
> 
> 
> Maybe
> 
> [...] that are not identity columns in tbl2. Values
> for identity columns will come from the sequence generators for
> tbl2.
> 
> Otherwise I'd drop the word "continue" altogether and just say "but will
> use the sequence counters for".

I have committed something along those lines.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: Poor cost estimate with interaction between table correlation and partial indexes

2017-08-27 Thread Michael Malis
Hmm... It seems the command I used for generating a patch, which I got
from here https://wiki.postgresql.org/wiki/Working_with_Git, truncated
part of the patch. I've attached the file generated from git diff
--patience master improve-partial-index-correlation-calculation
--no-color > improve-correlated-partial-index-cost-v2.patch to this
email. What is the correct command for generating a context diff?
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 051a854..58224e6 100644
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -163,7 +163,10 @@ static void set_rel_width(PlannerInfo *root, RelOptInfo *rel);
 static double relation_byte_size(double tuples, int width);
 static double page_size(double tuples, int width);
 static double get_parallel_divisor(Path *path);
-
+static double ordered_page_read_cost(double pages_fetched,
+	   int baserel_pages,
+	   double spc_seq_page_cost,
+	   double spc_random_page_cost);
 
 /*
  * clamp_row_est
@@ -652,9 +655,26 @@ cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
 
 		if (pages_fetched > 0)
 		{
-			min_IO_cost = spc_random_page_cost;
-			if (pages_fetched > 1)
-min_IO_cost += (pages_fetched - 1) * spc_seq_page_cost;
+			if (index->indpred == NIL)
+			{
+min_IO_cost = spc_random_page_cost;
+if (pages_fetched > 1)
+	min_IO_cost += (pages_fetched - 1) * spc_seq_page_cost;
+			}
+			else
+			{
+/*
+ * For a partial index perfectly correlated with the table
+ * ordering, consecutive pages fetched are not guaranteed to
+ * be adjacent in the table. Instead use
+ * ordered_page_read_cost to estimate the cost of reading
+ * pages from the heap.
+ */
+min_IO_cost = ordered_page_read_cost(pages_fetched,
+	 baserel->pages,
+	 spc_seq_page_cost,
+	 spc_random_page_cost);
+			}
 		}
 		else
 			min_IO_cost = 0;
@@ -934,13 +954,11 @@ cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
 	Cost		indexTotalCost;
 	QualCost	qpqual_cost;
 	Cost		cpu_per_tuple;
-	Cost		cost_per_page;
 	Cost		cpu_run_cost;
 	double		tuples_fetched;
 	double		pages_fetched;
 	double		spc_seq_page_cost,
 spc_random_page_cost;
-	double		T;
 
 	/* Should only be applied to base relations */
 	Assert(IsA(baserel, RelOptInfo));
@@ -961,28 +979,16 @@ cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
 		 &tuples_fetched);
 
 	startup_cost += indexTotalCost;
-	T = (baserel->pages > 1) ? (double) baserel->pages : 1.0;
 
 	/* Fetch estimated page costs for tablespace containing table. */
 	get_tablespace_page_costs(baserel->reltablespace,
 			  &spc_random_page_cost,
 			  &spc_seq_page_cost);
 
-	/*
-	 * For small numbers of pages we should charge spc_random_page_cost
-	 * apiece, while if nearly all the table's pages are being read, it's more
-	 * appropriate to charge spc_seq_page_cost apiece.  The effect is
-	 * nonlinear, too. For lack of a better idea, interpolate like this to
-	 * determine the cost per page.
-	 */
-	if (pages_fetched >= 2.0)
-		cost_per_page = spc_random_page_cost -
-			(spc_random_page_cost - spc_seq_page_cost)
-			* sqrt(pages_fetched / T);
-	else
-		cost_per_page = spc_random_page_cost;
-
-	run_cost += pages_fetched * cost_per_page;
+	run_cost += ordered_page_read_cost(pages_fetched,
+	   baserel->pages,
+	   spc_seq_page_cost,
+	   spc_random_page_cost);
 
 	/*
 	 * Estimate CPU costs per tuple.
@@ -5183,3 +5189,34 @@ compute_bitmap_pages(PlannerInfo *root, RelOptInfo *baserel, Path *bitmapqual,
 
 	return pages_fetched;
 }
+
+
+/*
+ * Estimate the total cost of reading possibly nonconsecutive pages from the
+ * heap in order.
+ */
+static double
+ordered_page_read_cost(double pages_fetched, int baserel_pages,
+	   double spc_seq_page_cost, double spc_random_page_cost)
+{
+	double cost_per_page;
+	double T;
+
+	T = (baserel_pages > 1) ? (double) baserel_pages : 1.0;
+
+	/*
+	 * For small numbers of pages we should charge spc_random_page_cost
+	 * apiece, while if nearly all the table's pages are being read, it's more
+	 * appropriate to charge spc_seq_page_cost apiece.  The effect is
+	 * nonlinear, too. For lack of a better idea, interpolate like this to
+	 * determine the cost per page.
+	 */
+	if (pages_fetched >= 2.0)
+		cost_per_page = spc_random_page_cost -
+			(spc_random_page_cost - spc_seq_page_cost)
+			* sqrt(pages_fetched / T);
+	else
+		cost_per_page = spc_random_page_cost;
+
+	return pages_fetched * cost_per_page;
+}
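
For context, the scenario this patch targets can be reproduced with a sketch like the following (hypothetical table and data, illustration only):

  -- id is perfectly correlated with heap order, but the ~0.1% of rows
  -- matching the partial-index predicate are scattered across the heap,
  -- so consecutively fetched pages are far apart despite correlation ~1
  CREATE TABLE events (id serial, processed boolean);
  INSERT INTO events (processed)
      SELECT random() < 0.999 FROM generate_series(1, 1000000);
  CREATE INDEX ON events (id) WHERE NOT processed;
  ANALYZE events;
  EXPLAIN SELECT * FROM events WHERE NOT processed;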



Re: [HACKERS] Re: Poor cost estimate with interaction between table correlation and partial indexes

2017-08-27 Thread Michael Malis
(Sorry David. I initially replied only to you)

Ok. I've attached a proof-of-concept patch. I have a few
questions about tests.

What is the typical workflow for adding tests for planner changes? Also,
I ran make check and it appears one of the existing tests is failing.
What is a typical way of discovering why the query plan
for a specific query changed? And how should I go about changing the
old test? Should I replace the old test output with the new test
output, or modify the old test slightly to get it to produce the same
plan as before?

Thanks,
Michael
diff --git a/src/backend/optimizer/path/costsize.c b/src/backend/optimizer/path/costsize.c
index 051a854..58224e6 100644
*** a/src/backend/optimizer/path/costsize.c
--- b/src/backend/optimizer/path/costsize.c
***
*** 163,169  static void set_rel_width(PlannerInfo *root, RelOptInfo *rel);
  static double relation_byte_size(double tuples, int width);
  static double page_size(double tuples, int width);
  static double get_parallel_divisor(Path *path);
! 
  
  /*
   * clamp_row_est
--- 163,172 
  static double relation_byte_size(double tuples, int width);
  static double page_size(double tuples, int width);
  static double get_parallel_divisor(Path *path);
! static double ordered_page_read_cost(double pages_fetched,
! 	   int baserel_pages,
! 	   double spc_seq_page_cost,
! 	   double spc_random_page_cost);
  
  /*
   * clamp_row_est
***
*** 652,660  cost_index(IndexPath *path, PlannerInfo *root, double loop_count,
  
  		if (pages_fetched > 0)
  		{
! 			min_IO_cost = spc_random_page_cost;
! 			if (pages_fetched > 1)
! min_IO_cost += (pages_fetched - 1) * spc_seq_page_cost;
  		}
  		else
  			min_IO_cost = 0;
--- 655,680 
  
  		if (pages_fetched > 0)
  		{
! 			if (index->indpred == NIL)
! 			{
! min_IO_cost = spc_random_page_cost;
! if (pages_fetched > 1)
! 	min_IO_cost += (pages_fetched - 1) * spc_seq_page_cost;
! 			}
! 			else
! 			{
! /*
!  * For a partial index perfectly correlated with the table
!  * ordering, consecutive pages fetched are not guaranteed to
!  * be adjacent in the table. Instead use
!  * ordered_page_read_cost to estimate the cost of reading
!  * pages from the heap.
!  */
! min_IO_cost = ordered_page_read_cost(pages_fetched,
! 	 baserel->pages,
! 	 spc_seq_page_cost,
! 	 spc_random_page_cost);
! 			}
  		}
  		else
  			min_IO_cost = 0;
***
*** 934,946  cost_bitmap_heap_scan(Path *path, PlannerInfo *root, RelOptInfo *baserel,
  	Cost		indexTotalCost;
  	QualCost	qpqual_cost;
  	Cost		cpu_per_tuple;
- 	Cost		cost_per_page;
  	Cost		cpu_run_cost;
  	double		tuples_fetched;
  	double		pages_fetched;
  	double		spc_seq_page_cost,
  spc_random_page_cost;
- 	double		T;
  
  	/* Should only be applied to base relations */
  	Assert(IsA(baserel, RelOptInfo));
--- 954,964 



Re: [HACKERS] More replication race conditions

2017-08-27 Thread Petr Jelinek
On 28/08/17 01:36, Michael Paquier wrote:
> On Sun, Aug 27, 2017 at 6:32 PM, Petr Jelinek
>  wrote:
>> Attached should fix this.
> 
> +$node_master->poll_query_until('postgres',
> +"SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name =
> 'test_slot' AND active_pid IS NULL)"
> +)
> +  or die "slot never became inactive";
> +
>  $stdout_recv = $node_master->pg_recvlogical_upto(
> I am wondering if a similar check should actually go into
> pg_recvlogical_upto() instead. I tend to think that we've learned
> things the hard way with 3043c1dd and such.
> 

Putting it there can lead to tests that take forever to finish if
written incorrectly. This way they at least just fail.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] More replication race conditions

2017-08-27 Thread Michael Paquier
On Mon, Aug 28, 2017 at 8:33 AM, Tom Lane  wrote:
> Michael Paquier  writes:
>> Attached is a patch to make this code path wait until the transaction
>> has been replayed. We could also use synchronous_commit = apply,
>> but I prefer this patch's solution with a wait query.
>
> Petr proposed a different patch to fix the same problem at
> https://www.postgresql.org/message-id/1636c52e-c144-993a-6665-9358f322deda%402ndquadrant.com
>
> Which one is better?

Petr's patch fixes the race condition in test 006 for logical
decoding. My patch addresses the failures that dangomushi has seen since
yesterday with 2PC test 009. (Note: it took me a while to put this
animal in an environment with a more stable connection, so now I hope
that we'll again get daily reports.)
-- 
Michael




Re: [HACKERS] More replication race conditions

2017-08-27 Thread Michael Paquier
On Sun, Aug 27, 2017 at 6:32 PM, Petr Jelinek
 wrote:
> Attached should fix this.

+$node_master->poll_query_until('postgres',
+"SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name =
'test_slot' AND active_pid IS NULL)"
+)
+  or die "slot never became inactive";
+
 $stdout_recv = $node_master->pg_recvlogical_upto(
I am wondering if a similar check should actually go into
pg_recvlogical_upto() instead. I tend to think that we've learned
things the hard way with 3043c1dd and such.
-- 
Michael




Re: [HACKERS] More replication race conditions

2017-08-27 Thread Tom Lane
Michael Paquier  writes:
> Attached is a patch to make this code path wait until the transaction
> has been replayed. We could also use synchronous_commit = apply,
> but I prefer this patch's solution with a wait query.

Petr proposed a different patch to fix the same problem at
https://www.postgresql.org/message-id/1636c52e-c144-993a-6665-9358f322deda%402ndquadrant.com

Which one is better?

regards, tom lane




Re: [HACKERS] More replication race conditions

2017-08-27 Thread Michael Paquier
On Sun, Aug 27, 2017 at 3:34 PM, Michael Paquier
 wrote:
> On Sun, Aug 27, 2017 at 12:03 PM, Tom Lane  wrote:
>> contains exactly no means of ensuring that the master's transaction has
>> been replayed on the standby before we check for its results.  It's not
>> real clear why it seems to work 99.99% of the time, because, well, there
>> isn't any frickin' interlock there ...
>
> I have noticed this one this morning, and I am planning to address it
> with a proper patch soonishly. (I am still fighting a bit to get
> dangomushi into a more stable state, and things run slow on it, so it
> is good at catching race conditions of this kind).

Today's run has finished with the same failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dangomushi&dt=2017-08-27%2018%3A00%3A13
Attached is a patch to make this code path wait until the transaction
has been replayed. We could also use synchronous_commit = apply,
but I prefer this patch's solution with a wait query.
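
In SQL terms, the usual interlock looks something like the sketch below (an illustration of the approach, not necessarily the patch's exact query; the LSN shown is hypothetical):

  -- on the master, after the commit to be tested:
  SELECT pg_current_wal_lsn();
  -- on the standby, poll until this returns true for the captured LSN:
  SELECT pg_last_wal_replay_lsn() >= '0/3000060'::pg_lsn;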
-- 
Michael


tap-2pc-fix.patch
Description: Binary data



Re: [HACKERS] pgbench: faster version of tpcb-like transaction

2017-08-27 Thread Tatsuo Ishii
> Don't think anybody is proposing to remove the existing way to run
> pgbench, so I'm not sure what your point is?

I know. I just wanted to point out that the proposal is not good for
cluster environment tests.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] pgbench: faster version of tpcb-like transaction

2017-08-27 Thread Andres Freund
Hi,

On 2017-08-28 08:05:11 +0900, Tatsuo Ishii wrote:
> With the proposed implementation it is not possible to do that kind of
> test anymore since everything is packed into a function.

Don't think anybody is proposing to remove the existing way to run
pgbench, so I'm not sure what your point is?

Greetings,

Andres Freund




Re: [HACKERS] pgbench: faster version of tpcb-like transaction

2017-08-27 Thread Tatsuo Ishii
> Jeff Janes  writes:
>> If all the data is in memory and you have a system with fast fsyncs (or are
>> running with fsync off, or unlogged tables, or synchronous_commit off),
>> then the big bottleneck in pgbench is the amount of back and forth between
>> the pgbench program and the backend.  There are 7 commands per transaction.
> 
> Yeah ...
> 
>> It is easy to package 5 of those commands into a single PL/pgSQL function,
>> with the other two being implicit via the standard auto-commit behavior
>> when explicit transactions are not opened.  The attached patch does that,
>> under the name tpcb-func.  I first named it tpcb-like-func, but one builtin
>> name can't be a prefix of another so that won't work.
> 
> I dunno, it seems like this proposal involves jacking up the test case
> and driving a completely different one underneath.  There is no reason
> to consider that you've improved the benchmark results --- you've just
> substituted a different benchmark, one with no historical basis, and
> not a lot of field justification either.
> 
>> Wanting to measure IPC overhead is a valid thing to do, but
>> certainly isn't the most common thing people want to do with pgbench.
> 
> I think that's nonsense.  Measuring how fast PG can do client interactions
> is EXACTLY what this is about.  Certainly, pushing SQL operations into
> server-side functions is a great way to reduce network overhead, but it
> has nothing to do with what we choose as a benchmark.

The current implementation of pgbench allows Pgpool-II (or any proxy-type
middleware) to test the behavior on PostgreSQL clusters. For example,
it sends write queries to the master DB node and read queries to
standby nodes to distribute loads among DB nodes.

With the proposed implementation it is not possible to do that kind of
test anymore since everything is packed into a function.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-27 Thread Tom Lane
I wrote:
> I think that the correct fix probably involves marking each parallel scan
> plan node as dependent on a pseudo executor parameter, which the parent
> Gather or GatherMerge node would flag as being changed on each rescan.
> This would cue the plan layers in between that they cannot optimize on the
> assumption that the leader's instance of the parallel scan will produce
> exactly the same rows as it did last time, even when "nothing else
> changed".  The "wtParam" pseudo parameter that's used for communication
> between RecursiveUnion and its descendant WorkTableScan node is a good
> model for what needs to happen.

Here is a draft patch for this.  It's a bit different from wtParam in
that the special parameter isn't allocated until createplan.c time,
so that we don't eat a parameter slot if we end up choosing a non-parallel
plan; but otherwise things are comparable.

I could use some feedback on whether this is marking dependent child nodes
sanely.  As written, any plan node that's marked parallel_aware is assumed
to need a dependency on the parent Gather or GatherMerge's rescan param
--- and the planner will now bitch if a parallel_aware plan node is not
under any such Gather.  Is this reasonable?  I do not see any
documentation that defines the parallel_aware field with enough clarity
to be very sure about it.

I included the heap_parallelscan_nextpage hack I'm using to make the
original failure reproducible, but that hunk is not meant for commit.
Also, the regression test case is the same as in a2b70c89c.

regards, tom lane

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index ff03c68..c478897 100644
*** a/src/backend/access/heap/heapam.c
--- b/src/backend/access/heap/heapam.c
*** heap_parallelscan_nextpage(HeapScanDesc 
*** 1763,1768 
--- 1763,1770 
  			ss_report_location(scan->rs_rd, parallel_scan->phs_startblock);
  	}
  
+ 	pg_usleep(random()/100);
+ 
  	return page;
  }
  
diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index e8d94ee..3aa819f 100644
*** a/src/backend/executor/nodeGather.c
--- b/src/backend/executor/nodeGather.c
*** ExecShutdownGather(GatherState *node)
*** 430,435 
--- 430,438 
  void
  ExecReScanGather(GatherState *node)
  {
+ 	Gather	   *gather = (Gather *) node->ps.plan;
+ 	PlanState  *outerPlan = outerPlanState(node);
+ 
  	/*
  	 * Re-initialize the parallel workers to perform rescan of relation. We
  	 * want to gracefully shutdown all the workers so that they should be able
*** ExecReScanGather(GatherState *node)
*** 443,447 
  	if (node->pei)
  		ExecParallelReinitialize(node->pei);
  
! 	ExecReScan(node->ps.lefttree);
  }
--- 446,467 
  	if (node->pei)
  		ExecParallelReinitialize(node->pei);
  
! 	/*
! 	 * Set child node's chgParam to tell it that the next scan might deliver a
! 	 * different set of rows within the leader process.  (The overall rowset
! 	 * shouldn't change, but the leader process's subset might; hence nodes
! 	 * between here and the parallel table scan node mustn't optimize on the
! 	 * assumption of an unchanging rowset.)
! 	 */
! 	if (gather->rescan_param >= 0)
! 		outerPlan->chgParam = bms_add_member(outerPlan->chgParam,
! 			 gather->rescan_param);
! 
! 
! 	/*
! 	 * if chgParam of subnode is not null then plan will be re-scanned by
! 	 * first ExecProcNode.
! 	 */
! 	if (outerPlan->chgParam == NULL)
! 		ExecReScan(outerPlan);
  }
diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 64c6239..e8c70df 100644
*** a/src/backend/executor/nodeGatherMerge.c
--- b/src/backend/executor/nodeGatherMerge.c
*** ExecShutdownGatherMergeWorkers(GatherMer
*** 325,330 
--- 325,333 
  void
  ExecReScanGatherMerge(GatherMergeState *node)
  {
+ 	GatherMerge *gm = (GatherMerge *) node->ps.plan;
+ 	PlanState  *outerPlan = outerPlanState(node);
+ 
  	/*
  	 * Re-initialize the parallel workers to perform rescan of relation. We
  	 * want to gracefully shutdown all the workers so that they should be able
*** ExecReScanGatherMerge(GatherMergeState *
*** 339,345 
  	if (node->pei)
  		ExecParallelReinitialize(node->pei);
  
! 	ExecReScan(node->ps.lefttree);
  }
  
  /*
--- 342,365 
  	if (node->pei)
  		ExecParallelReinitialize(node->pei);
  
! 	/*
! 	 * Set child node's chgParam to tell it that the next scan might deliver a
! 	 * different set of rows within the leader process.  (The overall rowset
! 	 * shouldn't change, but the leader process's subset might; hence nodes
! 	 * between here and the parallel table scan node mustn't optimize on the
! 	 * assumption of an unchanging rowset.)
! 	 */
! 	if (gm->rescan_param >= 0)
! 		outerPlan->chgParam = bms_add_member(outerPlan->chgParam,
! 			 gm->rescan_param);
! 
! 
! 	/*
! 	 * if chgParam of subnode is not null then plan will be re-scanned by
! 	 * first ExecProcNode.
! 	 */
! 	if (outerPlan->chgParam == NULL)
! 		ExecReScan(outerPlan);
  }

Re: [HACKERS] [BUGS] [postgresql 10 beta3] unrecognized node type: 90

2017-08-27 Thread Tom Lane
I wrote:
> 4. The fact that the test succeeds on many machines implies that the
> leader process is usually doing *all* of the work.  This is in itself not
> very good.  Even on the machines where it fails, the fact that the tuple
> counts are usually a pretty large fraction of the expected values
> indicates that the leader usually did most of the work.  We need to take
> a closer look at why that is.

I've spent some time poking into this, and it seems the bottom line is
that the time needed to launch a parallel worker and get it started on
running the query plan is comparable to the time needed to scan all 10000
rows of tenk1.  Maybe this isn't surprising, I dunno; but it doesn't give
me a warm feeling about how much exercise the parallel scan machinery is
really getting in select_parallel.sql.  Not sure what to do about that ---
I don't really want to make the regression test case large enough (and
slow enough) to provide a more realistic scenario.

In the meantime, I've been able to make the failure reproducible by the
expedient of sticking "pg_usleep(1000)" into heap_parallelscan_nextpage(),
thus slowing down the leader's scan enough so that the workers can get
started.

> However, the bottom line here is that parallel scan is completely broken
> for rescans, and it's not (solely) the fault of nodeGatherMerge; rather,
> the issue is that nobody bothered to wire up parallelism to the rescan
> parameterization mechanism.  I imagine that related bugs can be
> demonstrated in 9.6 with little effort.

I've so far been unable to break it for cases involving only Gather.
The issue is triggered, basically, by having a Sort or HashAgg node
below Gather[Merge], since either of those can decide that they don't
need to rescan their child.  While it's not terribly hard to get the
planner to make such plans, you always end up with another Sort or
HashAgg above the Gather, and that masks the problem because the upper
node makes the same decision that it needn't rescan its child, protecting
the Gather from being run more than once.  The particular plan shape that
a2b70c89c used,

   ->  Finalize GroupAggregate
 ->  Gather Merge
   ->  Partial GroupAggregate
 ->  Sort
   ->  Parallel Seq Scan on tenk1

does exhibit the problem, but it requires GatherMerge so that no extra
sort is needed above the parallel subplan.

This may mean that we don't need to risk back-patching the fix into 9.6.
I'm not totally convinced of that yet, but I can't show that it's needed
given 9.6's limited support for parallelism.

regards, tom lane




Re: [HACKERS] WIP Patch: Pgbench Serialization and deadlock errors

2017-08-27 Thread Fabien COELHO


Hello,

Here is the third version of the patch for pgbench, updated thanks to Fabien
Coelho's comments. As in the previous one, transactions with serialization and
deadlock failures are rolled back and retried until they end successfully or
their number of tries reaches the maximum.


Here is some partial review.

Patch applies cleanly.

It compiles with warnings, please fix them:

  pgbench.c:2624:28: warning: ‘failure_status’ may be used uninitialized in this function
  pgbench.c:2697:34: warning: ‘command’ may be used uninitialized in this function

I do not think that the error handling feature needs preeminence in the
final report, compared to scale, number of clients and so on. The number
of tries should be put further on.

I would spell "number of tries" instead of "tries number" which seems to
suggest that each try is attributed a number. "sql" -> "SQL".

For the per statement latency final report, I do not think it is worth 
distinguishing the kind of retry at this level, because ISTM that 
serialization & deadlocks are unlikely to appear simultaneously. I would 
just report total failures and total tries on this report. We only have 2 
errors now, but if more are added I'm pretty sure that we would not want 
to have more columns... Moreover the 25 characters alignment is ugly, 
better use a much smaller alignment.


I'm okay with having details shown in the "log to file" group report.
The documentation does not seem consistent. It discusses "the very last fields"
and seems to suggest that there are two, but the example trace below just
adds one field.

If you want a paragraph you should add <para>, skipping a line does not
work (around "All values are computed for ...").

I do not understand the second note of the --max-tries documentation.
It seems to suggest that some scripts may not end their own transactions...
which should be an error in my opinion? Some explanations would be welcome.

I'm not sure that "Retries" deserves a type of its own for two counters.
The "retries" in RetriesState may be redundant with these.
The failures are counted on simple counters while retries have a type,
this is not consistent. I suggest to just use simple counters everywhere.

I'm ok with having the detail report tell about failures & retries only
when some occurred.

typo: sucessufully -> successfully

If a native English speaker could provide an opinion on that, and more
generally review the whole documentation, it would be great.

I think that the rand functions should really take a random_state pointer
argument, not a Thread or Client.

I'm at odds that FailureStatus does not have a clean NO_FAILURE state,
and that it is merged with misc failures.

I'm not sure that initRetries, mergeRetries, getAllRetries really
deserve a function.

I do not think that there should be two accum functions. Just extend
the existing one, and adding zero to zero is not a problem.

I guess that in the end pgbench & psql variables will have to be merged
if the pgbench expression engine is to be used by psql as well, but this is
not linked to this patch.

The TAP tests seem over-complicated and heavy, with two pgbench runs in
parallel... I'm not sure we really want all that complexity for this
somehow small feature. Moreover pgbench can run several scripts, I'm not
sure why two pgbench would need to be invoked. Could something much
simpler and lighter be proposed instead to test the feature?

The added code does not conform to Pg C style. For instance, an if's brace
should be aligned with the if. Please conform to the project style.

The is_transaction_block_end seems simplistic. ISTM that it would not
work with compound commands. It should be clearly documented somewhere.
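
For instance, a single line containing a compound command ends the transaction block but would not be detected by a naive check (a sketch of the kind of case meant):

  -- one pgbench script line, two statements sent as a single command:
  UPDATE pgbench_accounts SET abalance = abalance + 1 WHERE aid = 1; COMMIT;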

Also find attached two scripts I used for some testing:

  psql < dl_init.sql
  pgbench -f dl_trans.sql -c 8 -T 10 -P 1
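
For illustration, a deadlock-prone transaction of the kind such a script might exercise could look like this (hypothetical, not the content of the attachments):

  BEGIN;
  -- :id1 and :id2 are random ids; concurrent clients updating the
  -- same pair of rows in opposite orders will deadlock
  UPDATE dl_counter SET n = n + 1 WHERE id = :id1;
  UPDATE dl_counter SET n = n + 1 WHERE id = :id2;
  COMMIT;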

--
Fabien.

dl_init.sql
Description: application/sql


dl_trans.sql
Description: application/sql



Re: [HACKERS] Re: Poor cost estimate with interaction between table correlation and partial indexes

2017-08-27 Thread David Fetter
On Sat, Aug 26, 2017 at 05:50:26PM -0700, Michael Malis wrote:
> Do you think this is a reasonable approach? Should I start working
> on a patch based on the solution I described or is there some other
> approach I should look into?

You'll get more traction with a proof-of-concept patch accompanying
the plan than without.  Don't bother with any level of care past
proof-of-concept until you get positive feedback.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david(dot)fetter(at)gmail(dot)com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] More replication race conditions

2017-08-27 Thread Petr Jelinek
On 27/08/17 04:32, Noah Misch wrote:
> On Fri, Aug 25, 2017 at 12:09:00PM +0200, Petr Jelinek wrote:
>> On 24/08/17 19:54, Tom Lane wrote:
>>> sungazer just failed with
>>>
>>> pg_recvlogical exited with code '256', stdout '' and stderr 
>>> 'pg_recvlogical: could not send replication command "START_REPLICATION SLOT 
>>> "test_slot" LOGICAL 0/0 ("include-xids" '0', "skip-empty-xacts" '1')": 
>>> ERROR:  replication slot "test_slot" is active for PID 8913148
>>> pg_recvlogical: disconnected
>>> ' at 
>>> /home/nm/farm/gcc64/HEAD/pgsql.build/src/test/recovery/../../../src/test/perl/PostgresNode.pm
>>>  line 1657.
>>>
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sungazer&dt=2017-08-24%2015%3A16%3A10
>>>
>>> Looks like we're still not there on preventing replication startup
>>> race conditions.
>>
>> Hmm, that looks like "by design" behavior. Slot acquiring will throw an
>> error if the slot is already used by somebody else (slots use their own
>> locking mechanism that does not wait). In this case it seems the
>> walsender which was using the slot for the previous step didn't finish
>> releasing the slot by the time we started the new command. We can work
>> around this by changing the test to wait, perhaps.
> 
> [Action required within three days.  This is a generic notification.]
> 
> The above-described topic is currently a PostgreSQL 10 open item.  Simon,
> since you committed the patch believed to have created it, you own this open
> item.  If some other commit is more relevant or if this does not belong as a
> v10 open item, please let us know.  Otherwise, please observe the policy on
> open item ownership[1] and send a status update within three calendar days of
> this message.  Include a date for your subsequent status update.  Testers may
> discover new open items at any time, and I want to plan to get them all fixed
> well in advance of shipping v10.  Consequently, I will appreciate your efforts
> toward speedy resolution.  Thanks.
> 
> [1] 
> https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
> 

Attached should fix this.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
>From d66caa3532558828ecb89b4f153cc1f29332b918 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Sun, 27 Aug 2017 11:28:49 +0200
Subject: [PATCH] Fix race condition in logical decoding tap test

---
 src/test/recovery/t/006_logical_decoding.pl | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/test/recovery/t/006_logical_decoding.pl b/src/test/recovery/t/006_logical_decoding.pl
index 4a90e9a..8b35bc8 100644
--- a/src/test/recovery/t/006_logical_decoding.pl
+++ b/src/test/recovery/t/006_logical_decoding.pl
@@ -78,6 +78,11 @@ chomp($stdout_recv);
 is($stdout_recv, $expected,
 	'got same expected output from pg_recvlogical decoding session');
 
+$node_master->poll_query_until('postgres',
+"SELECT EXISTS (SELECT 1 FROM pg_replication_slots WHERE slot_name = 'test_slot' AND active_pid IS NULL)"
+)
+  or die "slot never became inactive";
+
 $stdout_recv = $node_master->pg_recvlogical_upto(
 	'postgres', 'test_slot', $endpos, 10,
 	'include-xids' => '0',
-- 
2.7.4




Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-27 Thread Fabien COELHO


A quick clarification to my previous review.


sh> ./pgbench -i -I t
done.


There should be "creating tables..."


Does not seem to have initialized the tables although it was requested...

sh> ./pgbench -i -I d
creating tables...


Probably "filling tables..." would be more appropriate.

--
Fabien.




Re: [HACKERS] pgbench: Skipping the creating primary keys after initialization

2017-08-27 Thread Fabien COELHO


Hello Masahiko-san,


Attached latest v4 patch. Please review it.


Patch applies, compiles.

The messages/options do not seem to work properly:

 sh> ./pgbench -i -I t
 done.

Does not seem to have initialized the tables although it was requested...

 sh> ./pgbench -i -I d
 creating tables...
 100000 of 100000 tuples (100%) done (elapsed 0.08 s, remaining 0.00 s)
 done.

It seems that "d" triggered table creation... In fact it seems that the
work is done correctly, but the messages are not in the right place.

Also another issue:

 sh> ./pgbench -i --foreign-keys
 creating tables...
 100000 of 100000 tuples (100%) done (elapsed 0.09 s, remaining 0.00 s)
 vacuum...
 set primary keys...
 done.

Foreign keys do not seem to have been set... Please check that all really 
work as expected.



About the documentation:

If a native English speaker could review the text, it would be great.

At least: "Required to invoke" -> "Require to invoke".


About the code:

is_no_vacuum should be a bool?

I'm really hesitating about the out of order processing of options. If the user 
writes

  sh> pgbench -i --no-vacuum -I v
  done.

Then does it make sense to ignore the last thing the user asked for? ISTM 
that processing options in order and keeping the last resulting spec is 
more natural. Appending contradictory options can happen easily when 
scripting, and usually what is meant is the last one.


Again, as pointed out in the previous review, I do not like much 
checkCustomCmds implementation: switch/case, fprintf and return on error 
which will trigger another fprintf and error above... ISTM that you should 
either take into account previous comments or explain why you disagree 
with them, but not send the same code without addressing them in any way.


--
Fabien.




Re: [HACKERS] type cache for concat functions

2017-08-27 Thread Pavel Stehule
Hi

2017-08-24 19:24 GMT+02:00 Alexander Kuzmenkov :

> Hi Pavel,
>
> I tried applying your patch, it applies and compiles fine, check and
> checkworld pass.
>
> I ran a simple performance test, select concat(generate_series(1,10),
> ... [x5 total]) vs select generate_series(1,10)::text || ... .
> Operator || runs in 60 ms, while unpatched concat takes 90 ms and
> patched 55 ms.
>
> About the code:
> * There seems to be an extra tab here:
> FmgrInfo	*typcache;
> It's not aligned with the previous declaration with tab size 4.
>
> * You could allocate the cache and store it into the context inside
> rebuildConcatCache. It would return the cache pointer, and the
> calling code would look cleaner -- just one line. This is a matter of taste
> though.
>
> * The nearby functions use snake_case, so it should be
> rebuild_concat_cache too.
>

all should be fixed.

Regards

Pavel


>
> Overall the patch looks good to me.
>
> --
> Alexander Kuzmenkov
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company
>
>
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index ebfb823fb8..24d7512320 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -4734,6 +4734,38 @@ string_agg_finalfn(PG_FUNCTION_ARGS)
 }
 
 /*
+ * Prepare cache with typeOutput fmgr info for any argument of
+ * concat-like function.
+ */
+static FmgrInfo *
+build_concat_typcache(FunctionCallInfo fcinfo, int argidx)
+{
+	FmgrInfo *typcache;
+	int		i;
+
+	typcache = MemoryContextAlloc(fcinfo->flinfo->fn_mcxt,
+		PG_NARGS() * sizeof(FmgrInfo));
+
+	for (i = argidx; i < PG_NARGS(); i++)
+	{
+		Oid			valtype;
+		Oid			typOutput;
+		bool		typIsVarlena;
+
+		valtype = get_fn_expr_argtype(fcinfo->flinfo, i);
+		if (!OidIsValid(valtype))
+			elog(ERROR, "could not determine data type of concat() input");
+
+		getTypeOutputInfo(valtype, &typOutput, &typIsVarlena);
+		fmgr_info_cxt(typOutput, &typcache[i-argidx], fcinfo->flinfo->fn_mcxt);
+	}
+
+	fcinfo->flinfo->fn_extra = typcache;
+
+	return typcache;
+}
+
+/*
  * Implementation of both concat() and concat_ws().
  *
  * sepstr is the separator string to place between values.
@@ -4748,6 +4780,7 @@ concat_internal(const char *sepstr, int argidx,
 	StringInfoData str;
 	bool		first_arg = true;
 	int			i;
+	FmgrInfo   *typcache;
 
 	/*
 	 * concat(VARIADIC some-array) is essentially equivalent to
@@ -4787,14 +4820,15 @@ concat_internal(const char *sepstr, int argidx,
 	/* Normal case without explicit VARIADIC marker */
 	initStringInfo(&str);
 
+	typcache = (FmgrInfo *) fcinfo->flinfo->fn_extra;
+	if (typcache == NULL)
+		typcache = build_concat_typcache(fcinfo, argidx);
+
 	for (i = argidx; i < PG_NARGS(); i++)
 	{
 		if (!PG_ARGISNULL(i))
 		{
 			Datum		value = PG_GETARG_DATUM(i);
-			Oid			valtype;
-			Oid			typOutput;
-			bool		typIsVarlena;
 
 			/* add separator if appropriate */
 			if (first_arg)
@@ -4802,13 +4836,8 @@ concat_internal(const char *sepstr, int argidx,
 			else
 appendStringInfoString(, sepstr);
 
-			/* call the appropriate type output function, append the result */
-			valtype = get_fn_expr_argtype(fcinfo->flinfo, i);
-			if (!OidIsValid(valtype))
-				elog(ERROR, "could not determine data type of concat() input");
-			getTypeOutputInfo(valtype, &typOutput, &typIsVarlena);
 			appendStringInfoString(&str,
-								   OidOutputFunctionCall(typOutput, value));
+								   OutputFunctionCall(&typcache[i-argidx], value));
 		}
 	}
 



Re: [HACKERS] pgbench: faster version of tpcb-like transaction

2017-08-27 Thread Fabien COELHO


About the patch:

I'm generally in favor of providing more options to pgbench, especially if 
it can give optimization ideas to the performance-conscious user.


I think that the name should be "tpcb-like-plfunc": the script does not 
implement tpcb per spec, and such a function could be written in another 
language with some performance benefit, or not.


Maybe that means relaxing the prefix condition to "take the first matching
name" when prefixes are used.


If you are reimplementing the transaction anyway, you could consider using 
UPDATE RETURNING instead of SELECT to get the balance. On the other hand 
the doc says that the "steps" are put in a PL function, so maybe it should 
reflect the original script.
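
For reference, a minimal sketch of such a function, mirroring the original tpcb-like script against the standard pgbench tables (an illustration, not Jeff's actual patch; only the function name is taken from the invocation quoted below):

  -- sketch only; parameter names are hypothetical
  CREATE FUNCTION pgbench_transaction(p_aid int, p_bid int, p_tid int, p_delta int)
  RETURNS int AS $$
  DECLARE
      v_balance int;
  BEGIN
      UPDATE pgbench_accounts SET abalance = abalance + p_delta WHERE aid = p_aid;
      SELECT abalance INTO v_balance FROM pgbench_accounts WHERE aid = p_aid;
      UPDATE pgbench_tellers SET tbalance = tbalance + p_delta WHERE tid = p_tid;
      UPDATE pgbench_branches SET bbalance = bbalance + p_delta WHERE bid = p_bid;
      INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
          VALUES (p_tid, p_bid, p_aid, p_delta, CURRENT_TIMESTAMP);
      RETURN v_balance;
  END;
  $$ LANGUAGE plpgsql;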


I'm surprised by:

  "select * from pgbench_transaction(:aid, :bid, :tid, :delta);\n"

Why not simply:

"select pgbench_transaction(:aid, :bid, :tid, :delta);\n"

I would suggest to use a more precise function name, in case other 
functions are thought of. Maybe "pgbench_tpcb_like_plfunc".


I would suggest indenting the PL function better, putting keywords and
types in capitals, and adding the properties of the function explicitly
(e.g. STRICT, VOLATILE?).


There is a spurious space at the end of the executeStatement call line.

The patch potentially interacts with other patches in the long and 
slow queue...


As usual with pgbench there are no regression tests.

--
Fabien.




Re: [HACKERS] More replication race conditions

2017-08-27 Thread Michael Paquier
On Sun, Aug 27, 2017 at 12:03 PM, Tom Lane  wrote:
> And *another* replication test race condition just now:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dangomushi&dt=2017-08-26%2019%3A37%3A08
>
> As best I can interpret this, it's pointing out that this bit in
> src/test/recovery/t/009_twophase.pl:
>
> $cur_master->psql(
> 'postgres', "
> BEGIN;
> CREATE TABLE t_009_tbl2 (id int, msg text);
> SAVEPOINT s1;
> INSERT INTO t_009_tbl2 VALUES (27, 'issued to ${cur_master_name}');
> PREPARE TRANSACTION 'xact_009_13';
> -- checkpoint will issue XLOG_STANDBY_LOCK that can conflict with lock
> -- held by 'create table' statement
> CHECKPOINT;
> COMMIT PREPARED 'xact_009_13';");
>
> $cur_standby->psql(
> 'postgres',
> "SELECT count(*) FROM t_009_tbl2",
> stdout => \$psql_out);
> is($psql_out, '1', "Replay prepared transaction with DDL");
>
> contains exactly no means of ensuring that the master's transaction has
> been replayed on the standby before we check for its results.  It's not
> real clear why it seems to work 99.99% of the time, because, well, there
> isn't any frickin' interlock there ...

I have noticed this one this morning, and I am planning to address it
with a proper patch soonishly. (I am still fighting a bit to get
dangomushi into a more stable state, and things run slow on it, so it
is good at catching race conditions of this kind).
-- 
Michael




Re: [HACKERS] Variable substitution in psql backtick expansion

2017-08-27 Thread Fabien COELHO



Spending developer time to write code for the hypothetical someone running
a psql version 11 linked to a libpq < 7.4, if it can even link, does not
look like a very good investment... Anyway, here is the required update.


The question is the server's version, not libpq.


Ok.

Modern psql does still talk to ancient servers (I tried 11devel against 
7.2 just now, to be sure).


Wow:-)


The stuff in describe.c may not work well, but basic functionality
is there and I don't want to break it.


Ok. Then the updated patch should work, although I do not have a setup to 
test that easily.


--
Fabien.




Re: [HACKERS] pgbench: faster version of tpcb-like transaction

2017-08-27 Thread Fabien COELHO


Hello Tom,


I dunno, it seems like this proposal involves jacking up the test case
and driving a completely different one underneath.  There is no reason
to consider that you've improved the benchmark results --- you've just
substituted a different benchmark, one with no historical basis, and
not a lot of field justification either.


ISTM that putting some SQL in a function and calling it is standard 
practice in some classes of applications, although probably not the most 
frequent.


Moreover, as far as the TPC-B benchmark is concerned, it looks like a
perfectly legitimate implementation of the benchmark: the transaction
profile (Section 1.2) is described as 4 inputs sent in and one result
returned. The fact that the SQL commands are sent one at a time by the
client to the server is a pgbench choice that I would not have made if I
wanted to show the greatest TPC-B numbers with Pg.


Nor does it mean that it is a bad idea to do so... For instance an ORM web
application might tend to generate simple unprepared CRUD queries and
interact a lot back and forth, and the default test is not too bad at
reflecting that particular kind of behavior.


Basically there are different kinds of application implementations and
different tests could reflect those. So I am fine with providing more 
benchmarking options to pgbench, and I intend to do so once its 
capabilities are improved.


A caveat I have with Jeff's patch is that "tpcb-func" is a misnomer because
pgbench does NOT implement tpcb per spec, and it is my intention to 
propose a variant which does implement the spec when possible. Now I think 
that I'm also responsible for the prefix constraint on names...


--
Fabien.

