Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension

2016-01-29 Thread Teodor Sigaev

Sure. I attached two patches. But notice that pg_trgm.limit has to be quoted when used with
this command:
SHOW "pg_trgm.limit";
If you use the unquoted form:
SHOW pg_trgm.limit;
you will get the error:
ERROR:  syntax error at or near "limit"
LINE 1: SHOW pg_trgm.limit;
  ^

This is because "limit" is a keyword, I think.


It's easy to fix in gram.y:
@@ -1499,7 +1499,7 @@ set_rest_more:/* Generic SET syntaxes: */
;

 var_name:  ColId   { $$ = $1; }
-   | var_name '.' ColId
+   | var_name '.' ColLabel
{ $$ = psprintf("%s.%s", $1, $3); }
;

ColId doesn't contain reserved_keyword. It's impossible to make the same change for the
initial part of var_name because of a lot of conflicts in the grammar, but it can easily
be done for the second part of var_name. It seems like an improvement in any case, but
sml_limit or similarity_limit or even similarity_threshold is a preferable name to a
plain limit. In the future we could introduce more thresholds/limits.
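
For example, with a rename along those lines (similarity_threshold is used here
only for illustration, nothing is decided yet), the GUC could be set and shown
without any quoting:

SET pg_trgm.similarity_threshold = 0.5;
SHOW pg_trgm.similarity_threshold;  -- no double quotes needed, not a keyword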


Also, should get/set_limit emit a warning about deprecation?

Some notes about the substring patch itself:
1. trgm2.data contains too many duplicates (like Barkala or Bakalan). Are they
really needed for testing?


2. I agree with Jeff Janes about the <<-> and <->> operators. They are needed.
(http://www.postgresql.org/message-id/CAMkU=1zynkqfkr-j2_uq8lzp0uho8i+ledfwgt77czk_tnt...@mail.gmail.com)


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension

2016-01-29 Thread Artur Zakirov

On 29.01.2016 17:15, Teodor Sigaev wrote:

The behavior of this function is surprising to me.

select substring_similarity('dog' ,  'hotdogpound') ;

  substring_similarity
--
  0.25


Substring search was designed to search for a similar word in a string:
contrib_regression=# select substring_similarity('dog' ,  'hot dogpound') ;
  substring_similarity
--
  0.75

contrib_regression=# select substring_similarity('dog' ,  'hot dog
pound') ;
  substring_similarity
--
 1
It seems to me that users search for words in a long string. But I agree
that a more detailed explanation is needed and, maybe, we need to change
the feature name to fuzzywordsearch or something else, though I can't come up with a good one.



Thank you for the review. I will rename the function. Maybe to
subword_similarity()?






Also, should we have a function which indicates the position in the
2nd string at which the most similar match to the 1st argument occurs?

select substring_similarity_pos('dog' ,  'hotdogpound') ;

answering: 4

Interesting, I think, it will be useful in some cases.



We could call them <<-> and <->> , where the first corresponds to <%
and the second to %>

Agree


I will add them.
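
Just to illustrate, a rough sketch of how the renamed function and the proposed
operators might look in use; none of this syntax exists yet, the function and
operator names follow the discussion above, and test_trgm is just a placeholder table:

SELECT subword_similarity('dog', 'hot dogpound');
SELECT t, 'dog' <<-> t AS dist
FROM test_trgm
WHERE 'dog' <% t        -- 'dog' is similar to some word in t
ORDER BY dist
LIMIT 10;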

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-01-29 Thread Ashutosh Bapat
On Fri, Jan 29, 2016 at 2:05 PM, Etsuro Fujita 
wrote:

> On 2016/01/29 1:26, Ashutosh Bapat wrote:
>
>> Here's an updated version of the previous patches, broken up like before
>>
>
> 2. pg_fdw_join_v3.patch: changes to postgres_fdw - more description below
>>
>
> Here is the summary of changes from the last set of patches
>>
>
> 2. Included fix for EvalPlanQual in postgres_fdw - an alternate local
>> path is obtained from the list of paths linked to the joinrel. Since
>> this is done before adding the ForeignPath, there should be a local path
>> available for the given join.
>>
>
> I looked at that code in the patch (ie, postgresRecheckForeignScan and the
> helper function that creates a local join path for a given foreign join
> path.), briefly.  Maybe I'm missing something, but I think that is
> basically the same as the fix I proposed for addressing this issue, posted
> before [1], right?


Yes, although I have added functions to copy the paths, ignore the
pathkeys, and change the relevant members of the paths. Sorry if I have
missed giving you due credit.


>   If so, my concern is, the helper function probably wouldn't extend to
> the parameterized-foreign-join-path cases, though that would work well for
> the unparameterized-foreign-join-path cases.  We don't support
> parameterized-foreign-join paths for 9.6?
>
>
If we do not find a local path with the given parameterization, it means there
are other local parameterized paths which are superior to it. This probably
indicates that there will be parameterized foreign-join paths which are
superior to this parameterized path as well, so we simply do not create a foreign
join path with that parameterization.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-01-29 Thread Anastasia Lubennikova

28.01.2016 20:03, Thom Brown:
On 28 January 2016 at 16:12, Anastasia Lubennikova wrote:



28.01.2016 18:12, Thom Brown:


On 28 January 2016 at 14:06, Anastasia Lubennikova wrote:


31.08.2015 10:41, Anastasia Lubennikova:

Hi, hackers!
I'm going to begin work on effective storage of duplicate
keys in B-tree index.
The main idea is to implement posting lists and posting
trees for B-tree index pages as it's already done for GIN.

In a nutshell, effective storing of duplicates in GIN is
organised as follows.
The index stores a single index tuple for each unique key. That
index tuple points to a posting list which contains pointers
to heap tuples (TIDs). If too many rows have the same key,
multiple pages are allocated for the TIDs, and these
constitute a so-called posting tree.
You can find wonderful detailed descriptions in the GIN readme
and related articles.
It also makes it possible to apply a compression algorithm to
the posting list/tree and significantly decrease the index size.
Read more in the presentation (part 1).

Currently, a new B-tree index tuple must be inserted for each table
row that we index.
Each insertion can cause a page split. Because of MVCC, even a
unique index can contain duplicates.
Storing duplicates in a posting list/tree helps to avoid
superfluous splits.


I'd like to share the progress of my work. So here is a WIP
patch.
It provides effective duplicate handling using posting lists
the same way as GIN does it.

Layout of the tuples on the page is changed in the following way:
before:
TID (ip_blkid, ip_posid) + key, TID (ip_blkid, ip_posid) +
key, TID (ip_blkid, ip_posid) + key
with patch:
TID (N item pointers, posting list offset) + key, TID
(ip_blkid, ip_posid), TID (ip_blkid, ip_posid), TID
(ip_blkid, ip_posid)

It seems that backward compatibility works well without any
changes. But I haven't tested it properly yet.

Here are some test results. They are obtained by test
functions test_btbuild and test_ginbuild, which you can find
in attached sql file.
i - number of distinct values in the index. So i=1 means that
all rows have the same key, and i=1000 means that all
keys are different.
The other columns contain the index size (MB).

i        B-tree Old    B-tree New    GIN
1        214,234375    87,7109375    10,2109375
10       214,234375    87,7109375    10,71875
100      214,234375    87,4375       15,640625
1000     214,234375    86,2578125    31,296875
1        214,234375    78,421875     104,3046875
10       214,234375    65,359375     49,078125
100      214,234375    90,140625     106,8203125
1000     214,234375    214,234375    534,0625


You can note that the last row contains the same index sizes
for B-tree, which is quite logical - there is no compression
if all the keys are distinct.
The other cases look really nice to me.
The next thing to say is that I haven't implemented posting list
compression yet, so there is still potential to decrease the size
of the compressed B-tree.

I'm almost sure there are still some tiny bugs and missing
functions, but on the whole the patch is ready for testing.
I'd like to get feedback from testing the patch on some
real datasets. Any bug reports and suggestions are welcome.

Here is a couple of useful queries to inspect the data inside
the index pages:
create extension pageinspect;
select * from bt_metap('idx');
select bt.* from generate_series(1,1) as n, lateral
bt_page_stats('idx', n) as bt;
select n, bt.* from generate_series(1,1) as n, lateral
bt_page_items('idx', n) as bt;
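
For anyone who wants to poke at the pages without the attached test
functions, a minimal made-up setup to run the queries above against could be:

create table dup_test as
    select (g % 100) as val from generate_series(1, 100000) as g;
create index idx on dup_test (val);
select pg_size_pretty(pg_relation_size('idx'));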

And at last, the list of items I'm going to complete in the
near future:
1. Add a storage parameter 'enable_compression' for the btree
access method, which specifies whether the index handles
duplicates; the default is 'off'.
2. Bring back microvacuum functionality for compressed indexes.
3. Improve insertion speed. Insertions became significantly
slower with the compressed btree, which is obviously not what we
want.
4. Clean the code and comments, add related 

Re: [HACKERS] pgbench stats per script & other stuff

2016-01-29 Thread Fabien COELHO


Hello Alvaro,

Thanks for the progress!


I pushed this, along with a few more tweaks, mostly adding comments and
moving functions so that related things are together.  I hope I didn't
break anything.


Looks ok.

Here is a rebase of the 3 remaining parts:
 - 15-c: per script stats
 - 15-d: weighted scripts
 - 15-e: prefix selection for -b

--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 42d0667..ade1b53 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1138,6 +1138,9 @@ number of transactions actually processed: 1/1
 tps = 618.764555 (including connections establishing)
 tps = 622.977698 (excluding connections establishing)
 SQL script 1: builtin: TPC-B (sort of)
+ - 1 transactions (100.0% of total, tps = 618.764555)
+ - latency average = 15.844 ms
+ - latency stddev = 2.715 ms
  - statement latencies in milliseconds:
 0.004386\set nbranches 1 * :scale
 0.001343\set ntellers 10 * :scale
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 44da3d1..64c7a6c 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -164,6 +164,7 @@ bool		use_log;			/* log transaction latencies to a file */
 bool		use_quiet;			/* quiet logging onto stderr */
 int			agg_interval;		/* log aggregates instead of individual
  * transactions */
+bool		per_script_stats = false;	/* whether to collect stats per script */
 int			progress = 0;		/* thread progress report every this seconds */
 bool		progress_timestamp = false; /* progress report with Unix time */
 int			nclients = 1;		/* number of clients */
@@ -299,6 +300,7 @@ static struct
 {
 	const char *name;
 	Command   **commands;
+	StatsData stats;
 }	sql_script[MAX_SCRIPTS];	/* SQL script files */
 static int	num_scripts;		/* number of scripts in sql_script[] */
 static int	num_commands = 0;	/* total number of Command structs */
@@ -1326,7 +1328,7 @@ top:
 		/* transaction finished: calculate latency and log the transaction */
 		if (commands[st->state + 1] == NULL)
 		{
-			if (progress || throttle_delay || latency_limit || logfile)
+			if (progress || throttle_delay || latency_limit || per_script_stats || logfile)
 				processXactStats(thread, st, &now, false, logfile, agg);
 			else
 thread->stats.cnt++;
@@ -1419,7 +1421,7 @@ top:
 	}
 
 	/* Record transaction start time under logging, progress or throttling */
-	if ((logfile || progress || throttle_delay || latency_limit) && st->state == 0)
+	if ((logfile || progress || throttle_delay || latency_limit || per_script_stats) && st->state == 0)
 	{
 		INSTR_TIME_SET_CURRENT(st->txn_begin);
 
@@ -1872,6 +1874,9 @@ processXactStats(TState *thread, CState *st, instr_time *now,
 
 	if (use_log)
 		doLog(thread, st, logfile, now, agg, skipped, latency, lag);
+
+	if (per_script_stats) /* mutex? hmmm... these are only statistics */
+		accumStats(& sql_script[st->use_file].stats, skipped, latency, lag);
 }
 
 
@@ -2661,6 +2666,7 @@ addScript(const char *name, Command **commands)
 
 	sql_script[num_scripts].name = name;
 	sql_script[num_scripts].commands = commands;
+	initStats(& sql_script[num_scripts].stats, 0.0);
 	num_scripts++;
 }
 
@@ -2744,22 +2750,40 @@ printResults(TState *threads, StatsData *total, instr_time total_time,
 	printf("tps = %f (including connections establishing)\n", tps_include);
 	printf("tps = %f (excluding connections establishing)\n", tps_exclude);
 
-	/* Report per-command latencies */
-	if (is_latencies)
+	/* Report per-script stats */
+	if (per_script_stats)
 	{
 		int			i;
 
 		for (i = 0; i < num_scripts; i++)
 		{
-			Command   **commands;
+			printf("SQL script %d: %s\n"
+   " - "INT64_FORMAT" transactions (%.1f%% of total, tps = %f)\n",
+   i + 1, sql_script[i].name,
+   sql_script[i].stats.cnt,
+   100.0 * sql_script[i].stats.cnt / total->cnt,
+   sql_script[i].stats.cnt / time_include);
 
-			printf("SQL script %d: %s\n", i + 1, sql_script[i].name);
-			printf(" - statement latencies in milliseconds:\n");
+			if (latency_limit)
+printf(" - number of transactions skipped: "INT64_FORMAT" (%.3f%%)\n",
+	   sql_script[i].stats.skipped,
+	   100.0 * sql_script[i].stats.skipped /
+	   (sql_script[i].stats.skipped + sql_script[i].stats.cnt));
 
-			for (commands = sql_script[i].commands; *commands != NULL; commands++)
-printf("   %11.3f  %s\n",
-  1000.0 * (*commands)->stats.sum / (*commands)->stats.count,
-	   (*commands)->line);
+			printSimpleStats(" - latency", & sql_script[i].stats.latency);
+
+			/* Report per-command latencies */
+			if (is_latencies)
+			{
+Command ** com;
+
+printf(" - statement latencies in milliseconds:\n");
+
+for (com = sql_script[i].commands; *com != NULL; com++)
+	printf("   %11.3f  %s\n",
+		   1000.0 * (*com)->stats.sum / (*com)->stats.count,
+		   (*com)->line);
+			}
 		}
 	}
 }
@@ -2945,6 +2969,7 @@ main(int argc, char **argv)
 			

Re: [HACKERS] extend pgbench expressions with functions

2016-01-29 Thread Fabien COELHO



Also, a comment is needed to explain why such a bizarre
condition is used/needed for just the INT64_MIN case.


The last patch I sent has this bit:
+  /*
+   * Some machines throw a floating-point exception
+   * for INT64_MIN % -1, the correct answer being
+   * zero in any case.
+   */
How would you reformulate that à-la-Fabien?


This one about modulo is fine.

I was referring to this other one in the division case:

+/* overflow check (needed for INT64_MIN) */
+if (lval != 0 && (*retval < 0 == lval < 0))

Why not use "if (lval == INT64_MIN)" instead of this complicated 
condition? If it is really needed for some reason, I think that a comment 
could help.
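
For what it's worth, here is a minimal standalone sketch (not the pgbench patch
itself) of the two special cases being discussed, INT64_MIN % -1 and
INT64_MIN / -1, both of which can trap on some machines if done with the bare
C operators:

#include <stdint.h>
#include <stdbool.h>

static bool
int64_div_mod(int64_t lval, int64_t rval, int64_t *quot, int64_t *rem)
{
	if (rval == 0)
		return false;		/* division by zero: caller reports the error */
	if (rval == -1)
	{
		*rem = 0;		/* x % -1 is always zero, even for INT64_MIN */
		if (lval == INT64_MIN)
			return false;	/* -INT64_MIN is not representable */
		*quot = -lval;		/* avoids the trap for INT64_MIN / -1 */
		return true;
	}
	*quot = lval / rval;
	*rem = lval % rval;
	return true;
}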


--
Fabien.


Re: [HACKERS] Additional role attributes && superuser review

2016-01-29 Thread Stephen Frost
Michael,

* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Fri, Jan 29, 2016 at 6:37 AM, Stephen Frost  wrote:
> > * Robert Haas (robertmh...@gmail.com) wrote:
> >> On Thu, Jan 28, 2016 at 11:04 AM, Stephen Frost 
> wrote:
> >> > Personally, I don't have any particular issue having both, but the
> >> > desire was stated that it would be better to have the regular
> >> > GRANT EXECUTE ON catalog_func() working before we consider having
> >> > default roles for same.  That moves the goal posts awful far though, if
> >> > we're to stick with that and consider how we might extend the GRANT
> >> > system in the future.
> >>
> >> I don't think it moves the goal posts all that far.  Convincing
> >> pg_dump to dump grants on system functions shouldn't be a crazy large
> >> patch.
> >
> > I wasn't clear as to what I was referring to here.  I've already written
> > a patch to pg_dump to support grants on system objects and agree that
> > it's at least reasonable.
> 
> Is it already posted somewhere? I don't recall seeing it. Robert and Noah
> have a point that this would be useful for users who would like to dump
> GRANT/REVOKE rights on system functions & all, using a new option in
> pg_dumpall, say --with-system-acl or --with-system-privileges. 

Multiple versions were posted on this thread.  I don't fault you for not
finding it, this thread is a bit long in the tooth.  The one I'm
currently working from is:

http://www.postgresql.org/message-id/attachment/38049/catalog_function_acls_v4.patch

I'm going to split up the pg_dump changes and the backend changes, as
they can logically go in independently (though without both, we're not
moving the needle very far with regards to what administrators can do).

> If at least
> the three of you are agreeing here I think that we should try to move at
> least toward this goal first. That seems a largely doable goal for 9.6. For
> the set of default roles, there is clearly no clear consensus regarding
> what each role should do or not, and under which limitation it should
> operate.

I'm trying to work towards a consensus on the default roles, hence the
questions and discussion posed in the email you replied to.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Sequence Access Method WIP

2016-01-29 Thread Petr Jelinek
On 29 January 2016 at 14:48, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> I would guess that the DDL boilerplate should come from Alexander
>> Korotkov's patch, right?  I think a first easy step may be to combine
>> parts of both patches so that we get the "amkind" column from this patch
>> and the DDL support from Alexander's patch (means that his proposed
>> command needs a new clause to specify the amkind);
>
> Uh, what?  Surely we would provide a bespoke command for each possible
> sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to check
> that the provided function has the right signature, and then it would put
> the correct amkind into the pg_am entry automatically.
>
> If we skimp on this infrastructure then we're just relying on the
> user not to make a typo, which doesn't seem like a good idea.
>

Yeah, it has to be a completely separate command for each, doing
completely different sanity checks. It can't even be called CREATE
INDEX/SEQUENCE ACCESS METHOD unless we are willing to make ACCESS a
keyword, due to the preexisting CREATE INDEX/SEQUENCE commands. The
previous version of the patch, which used a somewhat different underlying
catalog structure, already had DDL, and honestly writing the DDL part is
quite easy. I used CREATE ACCESS METHOD FOR SEQUENCE there.
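
For illustration only, that older syntax looked roughly like this (the HANDLER
clause and the names here are assumptions, not the patch's exact grammar):

CREATE ACCESS METHOD local_seqam FOR SEQUENCE HANDLER local_seqam_handler;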

> (Agreed though that a reasonable first step would be to add amkind to
> pg_am and make the appropriate adjustments to existing code, ie check
> that amkind is correct when fetching an index handler.  I considered
> putting that into the AM API refactor patch, but desisted.)
>

Well, I was wondering how to handle this as well. The submitted patch
currently just adds Asserts, because IMHO the actual ERROR should be
thrown by the DDL, not by the code that just uses it.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-01-29 Thread Alexander Korotkov
On Tue, Dec 29, 2015 at 11:56 AM, Amit Kapila 
wrote:

> On Wed, Dec 16, 2015 at 12:26 AM, Robert Haas 
> wrote:
> >
> >
> > In terms of this project overall, NumLWLocks() now knows about only
> > four categories of stuff: fixed lwlocks, backend locks (proc.c),
> > replication slot locks, and locks needed by extensions.  I think it'd
> > probably be fine to move the backend locks into PGPROC directly, and
> > the replication slot locks into ReplicationSlot.
> >
>
> Ildus has written a patch to move backend locks into PGPROC which
> I am reviewing and will do performance tests once he responds to
> my review comments and I have written a patch to move replication
> slot locks into ReplicationSlot which is attached with this mail.
>

This patch looks good to me.
It compiles without warnings and passes the regression tests.
I also did some light testing of replication slots to check that it
works correctly.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Sequence Access Method WIP

2016-01-29 Thread Tom Lane
Alvaro Herrera  writes:
> I would guess that the DDL boilerplate should come from Alexander
> Korotkov's patch, right?  I think a first easy step may be to combine
> parts of both patches so that we get the "amkind" column from this patch
> and the DDL support from Alexander's patch (means that his proposed
> command needs a new clause to specify the amkind);

Uh, what?  Surely we would provide a bespoke command for each possible
sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to check
that the provided function has the right signature, and then it would put
the correct amkind into the pg_am entry automatically.

If we skimp on this infrastructure then we're just relying on the
user not to make a typo, which doesn't seem like a good idea.

(Agreed though that a reasonable first step would be to add amkind to
pg_am and make the appropriate adjustments to existing code, ie check
that amkind is correct when fetching an index handler.  I considered
putting that into the AM API refactor patch, but desisted.)

regards, tom lane




Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension

2016-01-29 Thread Teodor Sigaev

The behavior of this function is surprising to me.

select substring_similarity('dog' ,  'hotdogpound') ;

  substring_similarity
--
  0.25


Substring search was designed to search for a similar word in a string:
contrib_regression=# select substring_similarity('dog' ,  'hot dogpound') ;
 substring_similarity
--
 0.75

contrib_regression=# select substring_similarity('dog' ,  'hot dog pound') ;
 substring_similarity
--
1
It seems to me that users search for words in a long string. But I agree that a more
detailed explanation is needed and, maybe, we need to change the feature name to
fuzzywordsearch or something else, though I can't come up with a good one.





Also, should we have a function which indicates the position in the
2nd string at which the most similar match to the 1st argument occurs?

select substring_similarity_pos('dog' ,  'hotdogpound') ;

answering: 4

Interesting, I think, it will be useful in some cases.



We could call them <<-> and <->> , where the first corresponds to <%
and the second to %>

Agree
--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-29 Thread Robert Haas
On Fri, Jan 29, 2016 at 7:02 AM, Rahila Syed  wrote:
> Apart from these, as suggested in [1] , finer grained reporting from index
> vacuuming phase can provide better insight. Currently we report number of
> blocks processed once at the end of vacuuming of each index.
> IIUC, what was suggested in [1] was instrumenting lazy_tid_reaped with a
> counter to count number of index tuples processed so far as lazy_tid_reaped
> is called for every index tuple to see if it matches any of the dead tuple
> tids.
>
> So additional parameters for each index can be,
> scanned_index_tuples
> total_index_tuples (from pg_class.reltuples entry)

Let's report blocks, not tuples.  The reason is that
pg_class.reltuples is only an estimate and might be wildly wrong on
occasion, but the length of the relation in blocks can be known with
certainty.

But other than that I agree with this.  Fine-grained is key.  If it's
not fine grained, then people really won't be able to tell what's
going on when VACUUM doesn't finish in a timely fashion.  And the
whole point is we want to be able to know that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-01-29 Thread Ashutosh Bapat
On Fri, Jan 29, 2016 at 9:51 AM, Robert Haas  wrote:

> On Thu, Jan 28, 2016 at 11:26 AM, Ashutosh Bapat
>  wrote:
> > 2. pg_fdw_join_v3.patch: changes to postgres_fdw - more description below
>
> This patch no longer quite applies because of conflicts with one of
> your other patches that I applied today (cf. commit
> fbe5a3fb73102c2cfec114a67943f4474383).
>
> And then I broke it some more by committing a patch to extract
> deparseLockingClause from postgresGetForeignPlan and move it to
> deparse.c, but that should pretty directly reduce the size of this
> patch.  I wonder if there are any other bits of refactoring of that
> sort that we can do in advance of landing the main patch, just to
> simplify review.  But I'm not sure there are: this patch removes very
> little existing code; it just adds a ton of new stuff.
>

PFA a patch that moves the code to deparse a SELECT statement into a function,
deparseSelectStmtForRel(). This code is duplicated in
estimate_path_cost_size() and postgresGetForeignPlan(), so moving it into a
function avoids that duplication. As a side note, estimate_path_cost_size()
doesn't add a FOR SHARE/UPDATE clause to the statement being EXPLAINed, even
if the actual statement during execution would have it. This difference
looks unintentional to me. This patch corrects it as well.

appendOrderByClause and appendWhereClause both create a context within
themselves and pass it to deparseExpr. This patch creates the context once
in deparseSelectStmtForRel() and then passes it to the other deparse
functions, avoiding repeated context creation.


> copy_path_for_epq_recheck() and friends are really ugly.  Should we
> consider just adding copyObject() support for those node types
> instead?
>

the note in copyfuncs.c says
 * We also do not support copying Path trees, mainly
 * because the circular linkages between RelOptInfo and Path nodes can't
 * be handled easily in a simple depth-first traversal.

We may avoid that by just copying the pointer to the RelOptInfo and not copying
the whole RelOptInfo. The other problem is that the paths in epq_paths will be
copied as many times as the number of 2-way joins pushed down. Let me give
it a try and produce a patch with that.

> I'm sure there's more -- this is a huge patch and I don't fully
> understand it yet -- but I'm out of energy for tonight.
>
>
Thanks a lot for your comments and moving this patch forward.
-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


pg_fdw_deparse_select.patch
Description: application/download



Re: [HACKERS] Template for commit messages

2016-01-29 Thread Andres Freund
On 2016-01-29 04:42:08 -0500, Bruce Momjian wrote:
> On Thu, Jan 28, 2016 at 07:30:58PM -0500, Andrew Dunstan wrote:
> > I have no prejudice in this area, other than one in favor of any
> > rules being fairly precise. As for nuances, I guess they can be
> > specified in the commit message. The one thing I do find annoying
> > from time to time is the limit on subject size. Sometimes it's very
> > difficult to be sufficiently communicative in, say, 50 characters.
> 
> Agreed, but that is a gitweb limit we can't control, I think.

That's not that hard to change. And neither git nor kernel people use 50
char limits. I'm *strongly* opposed to making 50 chars the limit for the
first line.




Re: [HACKERS] Mac OS: invalid byte sequence for encoding "UTF8"

2016-01-29 Thread Artur Zakirov

On 28.01.2016 17:42, Artur Zakirov wrote:

On 27.01.2016 15:28, Artur Zakirov wrote:

On 27.01.2016 14:14, Stas Kelvich wrote:

Hi.

I tried that and can confirm the strange behaviour. It seems that the problem is with
the small Cyrillic letter ‘х’. (simplest obscene language filter? =)

That can be reproduced with simpler test

Stas




The test program was corrected. Now it uses the wchar_t type, and it works
correctly and gives the right output.

I think NIImportOOAffixes() in spell.c should be corrected to avoid
this bug.



I have attached a patch. It adds the new functions parse_ooaffentry() and
get_nextentry() and fixes a couple of comments.

Now Russian and the other supported dictionaries can be used for text search
on Mac OS.

parse_ooaffentry() parses an affix file entry instead of using sscanf(). It
has a similar algorithm to the parse_affentry() function.
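
To make the idea concrete, here is a minimal sketch (not the patch's actual code;
the helper name and signature are made up) of reading a whitespace-separated field
byte by byte instead of relying on sscanf()'s conversions, which is what appears to
misbehave with this input on Mac OS:

#include <stddef.h>

static const char *
get_next_field(const char *s, char *buf, size_t buflen)
{
	size_t	n = 0;

	while (*s == ' ' || *s == '\t')
		s++;				/* skip leading whitespace */
	while (*s && *s != ' ' && *s != '\t' && n + 1 < buflen)
		buf[n++] = *s++;		/* copy bytes verbatim, no locale-aware scanning */
	buf[n] = '\0';
	return s;				/* caller continues parsing from here */
}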

Should I create a new patch to fix this bug (as I did), or should it
go together with the patch
http://www.postgresql.org/message-id/56aa02ee.6090...@postgrespro.ru ?



I have created a new entry in the commitfest for this patch 
https://commitfest.postgresql.org/9/496/


--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-01-29 Thread Robert Haas
On Fri, Jan 8, 2016 at 11:22 PM, Amit Kapila  wrote:
> That idea won't work as we need to separately register tranche for
> each process.  The other way out could be to do it in CreateSharedProcArray(),
> which will be quite similar to what we do for other tranches and
> will cover all kinds of processes.  The attached patch fixes this problem.
>
> I have considered to separately do it in InitProcessPhase2() and
> InitAuxiliaryProcess(), but then the registration will be done twice for
> some
> of the processes like bootstrap and same is true if we do this InitProcess()
> instead of InitProcessPhase2() and I think it won't be similar to what
> we do for other tranches.
>
> I have done the performance testing of the attached patch and the
> results are attached with this mail.  The main tests conducted are
> pgbench read-write and read-only tests and the results indicate that
> this patch doesn't introduce any regression, though you will see some
> cases where the performance is better with patch by ~5% and then
> regressed by 2~3%, but I think it is more of noise than anything
> else.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Template for commit messages

2016-01-29 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jan 28, 2016 at 9:52 AM, Tom Lane  wrote:
>> FWIW, I'm a bit suspicious of the idea that we need to make the commit
>> messages automated-tool-friendly.  What tools are there that would need
>> to extract this info, and would we trust them if they didn't understand
>> "nuances"?

> Well, I think what people are asking for is precisely a fixed format,
> and I do think there is value in that.  It's nice to capture the
> nuance, but the nuance is going to get flattened out anyway when the
> release notes are created.  If we all agree to use a fixed format,
> then a bunch of work there that probably has to be done manually can
> be automated.

I'm not particularly impressed by this argument --- as someone who
regularly prepares release notes, I see exactly zero value in this
to me.  I'm willing to go along with it if there's consensus, but the
argument that "Foo: Bar" format has more value than free-form hasn't
actually been made in any form that withstands any skepticism.

In any case, agreeing on a set of "Foo:" keywords isn't nearly enough
to allow any useful automated processing.  You have to standardize
the "Bar" part as well, and that hasn't been addressed at all.
Some example questions:

In "Bug:", bug number with or without "#"?  What format to use if
there's more than one related bug?

In headers referring to people: name? email? both? what if we have
incomplete information (not unusual in bug reports)? what about
referencing multiple people in same header? what can we do to avoid
variant spellings in different commit messages?

In "Backpatch-through:", how are we to indicate version numbers
exactly? what about special cases such as a patch applied
only to older branches?  And perhaps most importantly, why are
we bothering with that at all when git can tell us that much
more reliably?  (Personally, I only bother with mentioning this
in the commit message to the extent that I'm explaining *why*
I patched back so far and no further.  Which I think is useful
information that a "backpatch-through: N.N" kind of entry would
entirely fail to capture.)

regards, tom lane




Re: [HACKERS] Template for commit messages

2016-01-29 Thread Heikki Linnakangas


On 28 January 2016 15:57:15 CET, Robert Haas  wrote:
>On Thu, Jan 28, 2016 at 9:52 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> On Thu, Jan 28, 2016 at 8:04 AM, Tomas Vondra
>>>  wrote:
 Why can't we do both? That is, have a free-form text with the
>nuances, and
 then Reviewed-By listing the main reviewers? The first one is for
>humans,
 the other one for automated tools.
>>
>>> I'm not objecting to or endorsing any specific proposal, just asking
>>> what we want to do about this.  I think the trick if we do it that
>way
>>> will be to avoid having it seem like too much duplication, but there
>>> may be a way to manage that.
>>
>> FWIW, I'm a bit suspicious of the idea that we need to make the
>commit
>> messages automated-tool-friendly.  What tools are there that would
>need
>> to extract this info, and would we trust them if they didn't
>understand
>> "nuances"?
>>
>> I'm on board with Bruce's template as being a checklist of points to
>be
>> sure to cover when composing a commit message.  I'm not sure we need
>> fixed-format rules.
>
>Well, I think what people are asking for is precisely a fixed format,
>and I do think there is value in that.  It's nice to capture the
>nuance, but the nuance is going to get flattened out anyway when the
>release notes are created.  If we all agree to use a fixed format,
>then a bunch of work there that probably has to be done manually can
>be automated.  However, if we all agree to use a fixed format except
>for you, we might as well just forget the whole thing, because the
>percentage of commits that are done by you is quite high.

Before I agree to adding fixed format lines to my commits, I'd like to know 
exactly what people would want to do with the information. "Bunch of work that 
probably could be automated" doesn't cut it. For example, if I tag someone as 
"Reviewed-by", does it mean that his name will automatically appear in the 
release notes? Or if there's a bug, is the reviewer then somehow responsible 
for missing it?

As a way of saying "thank you", I like a personalised, nuanced, message much 
better. True, we can do both. A good thing about processing the commit messages 
manually, e.g. for compiling release notes, is that there's human judgement on 
what to include.  Of course, that's a lot of work. Which is why I'd like to 
hear from whoever wants to make use of these lines to explain in more detail 
what information they need,  and what they would do with it, to make sure that 
what we add is actually useful, and that we don't add noise to the commit 
messages unnecessarily.

- Heikki





Re: [HACKERS] Sequence Access Method WIP

2016-01-29 Thread Alvaro Herrera
Petr Jelinek wrote:
> On 18 January 2016 at 09:19, Craig Ringer  wrote:
> > Needs rework after the commit of https://commitfest.postgresql.org/8/336/
> 
> Here is version that applies to current master. There is some work to
> do (mostly cleanup) and the DDL is missing, but that's because I want
> to know what people think about the principle of how it works now and
> if it makes sense to finish it this way (explained in the original
> submission for Jan CF).

I would guess that the DDL boilerplate should come from Alexander
Korotkov's patch, right?  I think a first easy step may be to combine
parts of both patches so that we get the "amkind" column from this patch
and the DDL support from Alexander's patch (meaning that his proposed
command needs a new clause to specify the amkind); then the really tough
stuff in both Alexander's and this patch can be rebased on top of that.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extend pgbench expressions with functions

2016-01-29 Thread Michael Paquier
On Fri, Jan 29, 2016 at 6:05 PM, Fabien COELHO  wrote:
> (instead of < in my previous suggestion, if some processors return 0 on
> -INT64_MIN). Also, a comment is needed to explain why such a bizarre
> condition is used/needed for just the INT64_MIN case.

The last patch I sent has this bit:
+  /*
+   * Some machines throw a floating-point exception
+   * for INT64_MIN % -1, the correct answer being
+   * zero in any case.
+   */
How would you reformulate that à-la-Fabien?
-- 
Michael




Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Mithun Cy
On Fri, Jan 29, 2016 at 5:11 PM, Mithun Cy 
wrote
>
>
>
> >I just ran some tests on above patch. Mainly to compare
> >how "longer sort keys" would behave with new(Qsort) and old Algo(RS) for
> sorting.
> >I have 8GB of ram and ssd storage.
>
>
> Key length 520
>
> Number of records:   320          640          1280         2560
> Data size:           1.7 GB       3.5 GB       7 GB         14 GB
>
> CASE 1   RS         23654.677    35172.811    44965.442    106420.155
>          Qsort      14100.362    40612.829    101068.107   334893.391
>
> CASE 2   RS         13427.378    36882.898    98492.644    310670.15
>          Qsort      12475.133    32559.074    100772.531   322080.602
>
> CASE 3   RS         17202.966    45163.234    122323.299   337058.856
>          Qsort      12530.726    23343.753    59431.315    152862.837
>
> (work_mem = 64MB, 14 GB of data)
>
> CASE 1   RS         128822.735
>          Qsort      90857.496
> CASE 2   RS         105631.775
>          Qsort      105938.334
> CASE 3   RS         152301.054
>          Qsort      149649.347
>
>
Sorry, I forgot to mention that the data in the tables above is in units of ms, as
returned by the psql client.


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-01-29 Thread Alexander Korotkov
On Mon, Jan 4, 2016 at 5:58 PM, Robert Haas  wrote:

> On Mon, Jan 4, 2016 at 1:20 AM, Amit Kapila 
> wrote:
> > If we do that way, then user of API needs to maintain the interlock
> > guarantee that the requested number of locks is same as allocations
> > done from that tranche and also if it is not allocating all the
> > LWLocks in one function, it needs to save the base address of the
> > array and then allocate from it by maintaining some counter.
> > I agree that looking up for tranche names multiple times is not good,
> > if there are many number of lwlocks, but that is done at startup
> > time and not at operation-level.  Also if we look currently at
> > the extensions in contrib, then just one of them allocates one
> > LWLock in this fashion, now there could be extensions apart from
> > extensions in contrib, but not sure if they require a large number of
> > LWLocks, so I feel it is better to give an API which is more
> > convenient for the user to use.
>
> Well, I agree with you that we should provide the most convenient API
> possible, but I don't agree that yours is more convenient than the one
> I proposed.  I think it is less convenient.  In most cases, if the
> user asks for a large number of LWLocks, they aren't going to be each
> for a different purpose - they will all be for the same purpose, like
> one per buffer or one per hash partition.  The case where the user
> wants to allocate 8 lwlocks from an extension, each for a different
> purpose, and spread those allocations across a bunch of different
> functions probably does not exist.  *Maybe* there is somebody
> allocating lwlocks from an extension for unrelated purposes, but I'd
> be willing to bet that, if so, all of those allocations happen one
> right after the other in a single function, because anything else
> would be completely nuts.
>

I'm not sure if we should return LWLockPadded*.

+LWLockPadded *
> +GetLWLockAddinTranche(const char *tranche_name)


It would be nice to isolate extensions from knowledge about padding. Could
we one day change it from LWLockPadded * to LWLockMinimallyPadded * or anything else?

+LWLock **
> +GetLWLockAddinTranche(const char *tranche_name)


Could we use this signature?
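
To make the two shapes concrete, a rough usage sketch under the API being
discussed (the request-side call is a hypothetical stand-in; nothing here is a
committed interface):

#include "postgres.h"
#include "storage/lwlock.h"

static LWLockPadded *my_ext_locks;

void
_PG_init(void)
{
	RequestLWLockTranche("my_ext", 4);	/* hypothetical request call */
}

static void
my_ext_shmem_startup(void)
{
	my_ext_locks = GetLWLockAddinTranche("my_ext");
}

static void
my_ext_do_work(void)
{
	LWLockAcquire(&my_ext_locks[0].lock, LW_EXCLUSIVE);
	/* ... touch this extension's shared state ... */
	LWLockRelease(&my_ext_locks[0].lock);
}

With the LWLock ** variant above, the last two calls would take
my_ext_locks[0] directly instead of &my_ext_locks[0].lock.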


> >> > Also I have retained NUM_USER_DEFINED_LWLOCKS in
> >> > MainLWLockArray so that if any extensions want to assign a LWLock
> >> > after startup, it can be used from this pool.  The tranche for such
> >> > locks
> >> > will be reported as main.
> >>
> >> I don't like this.  I think we should get rid of
> >> NUM_USER_DEFINED_LWLOCKS altogether.  It's just an accident waiting to
> >> happen, and I don't think there are enough people using LWLocks from
> >> extensions that we'll annoy very many people by breaking backward
> >> compatibility here.  If we are going to care about backward
> >> compatibility, then I think the old-style lwlocks should go in their
> >> own tranche, not main.  But personally, I think it'd be better to just
> >> force a hard break and make people switch to using the new APIs.
> >
> > One point to think before removing it altogether, is that the new API's
> > provide a way to allocate LWLocks at the startup-time and if any user
> > wants to allocate a new LWLock after start-up, it will not be possible
> > with the proposed API's.  Do you think for such cases, we should ask
> > user to use it the way we have exposed or do you have something else
> > in mind?
>
> Allocating a new lock after startup mostly doesn't work, because there
> will be at most 3 available and sometimes less.  And even if it does
> work, it often isn't very useful because you probably need some other
> shared memory space as well - otherwise, what is the lock protecting?
> And that space might not be available either.
>
> I'd be interested in knowing whether there are cases where useful
> extensions can be loaded apart from shared_preload_libraries because
> of NUM_USER_DEFINED_LWLOCKS and our tendency to allocate a little bit
> of extra shared memory, but my suspicion is that it rarely works out
> and is too flaky to be useful to anybody.


+1 for dropping the old API.
I don't think it's useful to have LWLocks without having shared memory.

There is another thing I'd like extensions to be able to do. It would be
nice if extensions could use dynamic shared memory instead of static. Then
extensions could use shared memory without being in
shared_preload_libraries. But if an extension registers a DSM segment, there is no
way to tell other backends to use it for that extension. Also, the DSM segment would be
deallocated when all backends detach from it. This is out of scope for
this patch, though.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-01-29 Thread Rahila Syed
>Okay, I agree that reporting just the current blkno is simple and good
>enough. How about numbers of what we're going to report as the "Vacuuming
>Index and Heap" phase? I guess we can still keep the scanned_index_pages
>and index_scan_count So we have:
>+CREATE VIEW pg_stat_vacuum_progress AS
>+   SELECT
>+  S.pid,
>+  S.relid,
>+  S.phase,
>+  S.total_heap_blks,
>+  S.current_heap_blkno,
>+  S.total_index_pages,
>+  S.scanned_index_pages,
>+  S.index_scan_count
>+  S.percent_complete,
>+   FROM pg_stat_get_vacuum_progress() AS S;
>I guess it won't remain pg_stat_get_"vacuum"_progress(), though.

Apart from these, as suggested in [1], finer-grained reporting from the index
vacuuming phase can provide better insight. Currently we report the number of
blocks processed once, at the end of vacuuming each index.
IIUC, what was suggested in [1] was instrumenting lazy_tid_reaped() with a
counter of the index tuples processed so far, as lazy_tid_reaped() is called
for every index tuple to see if it matches any of the dead tuple TIDs.

So additional parameters for each index can be:
scanned_index_tuples
total_index_tuples (from pg_class.reltuples entry)
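
For illustration, a rough sketch of how such a view might be queried, using the
column names proposed upthread (none of this is a committed interface):

SELECT pid, relid::regclass, phase,
       round(100.0 * current_heap_blkno / nullif(total_heap_blks, 0), 1) AS heap_pct,
       round(100.0 * scanned_index_pages / nullif(total_index_pages, 0), 1) AS index_pct,
       index_scan_count
FROM pg_stat_vacuum_progress;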

Thank you,
Rahila Syed

[1]. http://www.postgresql.org/message-id/56500356.4070...@bluetreble.com


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-01-29 Thread Robert Haas
On Fri, Jan 29, 2016 at 6:59 AM, Alexander Korotkov
 wrote:
> On Thu, Jan 21, 2016 at 12:37 AM, Alvaro Herrera 
> wrote:
>> So far as I can tell, there are three patches in flight here:
>>
>> * replication slot IO lwlocks
>> * ability of extensions to request tranches dynamically
>> * PGPROC
>>
>> The first one hasn't been reviewed at all, but the other two have seen a
>> bit of discussion and evolution.  Is anyone doing any more reviewing?
>
> I'd like to add another one: fixed tranche id for each SLRU.

+1 for this.  The patch looks good and I will commit it if nobody objects.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Mithun Cy
On Tue, Dec 29, 2015 at 4:33 AM, Peter Geoghegan  wrote:
>Attached is a revision that significantly overhauls the memory patch,
>with several smaller changes.

I just ran some tests on the above patch, mainly to compare
how "longer sort keys" behave with the new algorithm (Qsort) and the old one (RS)
when sorting.
I have 8 GB of RAM and SSD storage.

Settings and results:

work_mem = DEFAULT (4 MB)
key width = 520

CASE 1: data is pre-sorted in sort key order.

CASE 2: data is sorted in the opposite order of the sort key.

CASE 3: data is randomly distributed.


Key length 520

Number of records:   320          640          1280         2560
Data size:           1.7 GB       3.5 GB       7 GB         14 GB

CASE 1   RS         23654.677    35172.811    44965.442    106420.155
         Qsort      14100.362    40612.829    101068.107   334893.391

CASE 2   RS         13427.378    36882.898    98492.644    310670.15
         Qsort      12475.133    32559.074    100772.531   322080.602

CASE 3   RS         17202.966    45163.234    122323.299   337058.856
         Qsort      12530.726    23343.753    59431.315    152862.837


If the data is already sorted in sort key order, the current code performs
better than the proposed patch as the sort size increases.

The new algorithm does not seem to have any major impact if the rows are
pre-sorted in the opposite order.

For randomly distributed data, quicksort performs well compared to
the current sort method (RS).


==
Now increase work_mem to 64 MB, with 14 GB of data to sort.

CASE 1: Qsort is able to catch up with the current sort method (RS).
CASE 2: no impact.
CASE 3: RS is able to catch up with Qsort.

CASE 1   RS         128822.735
         Qsort      90857.496
CASE 2   RS         105631.775
         Qsort      105938.334
CASE 3   RS         152301.054
         Qsort      149649.347

I think that for long keys both the old (RS) and the new (Qsort) sort methods
have their own characteristics depending on the data distribution. work_mem is
the key: if it is set properly, the new method (Qsort) fits most of the cases;
if work_mem is not tuned right, there are cases where it can regress.
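
Since the attached Test Queries.sql is not reproduced here, the following is
only a rough sketch of the kind of test described above; the table and column
names are made up:

CREATE TABLE sort_test AS
    SELECT lpad(g::text, 520, 'x') AS key_520
    FROM generate_series(1, 3000000) AS g;     -- row count sized to reach the volumes above

SET work_mem = '4MB';                          -- or '64MB' for the second set of runs
EXPLAIN ANALYZE SELECT * FROM sort_test ORDER BY key_520;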


-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


Test Queries.sql
Description: application/sql



Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension

2016-01-29 Thread Oleg Bartunov
On Fri, Jan 29, 2016 at 1:11 PM, Alvaro Herrera 
wrote:

> Artur Zakirov wrote:
>
> > What status of this patch? In commitfest it is "Needs review".
>
> "Needs review" means it needs a reviewer to go over it and, uh, review
> it.  Did I send an email to you prodding you to review patches?  I sent
> such an email to several people from PostgresPro, but I don't remember
> getting a response from anyone, and honestly I don't see you guys/gal
> doing much review on-list.  If you can please talk to your colleagues so
> that they look over your patch, while at the same time you review their
> patches, that would help not only this one patch but everyone else's
> patches as well.
>

I think Teodor is planning to review these patches.


>
> > Can this patch get the status "Ready for Commiter"?
>
> Sure, as soon as it has gotten enough review to say it's past the "needs
> review" phase.  Just like all patches.
>
> --
> Álvaro Herrerahttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
>


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-01-29 Thread Alexander Korotkov
On Thu, Jan 21, 2016 at 12:37 AM, Alvaro Herrera 
wrote:

> So far as I can tell, there are three patches in flight here:
>
> * replication slot IO lwlocks
> * ability of extensions to request tranches dynamically
> * PGPROC
>
> The first one hasn't been reviewed at all, but the other two have seen a
> bit of discussion and evolution.  Is anyone doing any more reviewing?


I'd like to add another one: fixed tranche id for each SLRU.
And I'm going to review the others.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


slru_tranche_ids_v1.patch
Description: Binary data



Re: [HACKERS] pgbench stats per script & other stuff

2016-01-29 Thread Alvaro Herrera
Fabien COELHO wrote:

> The answer is essentially yes, the field is needed for the "aggregated" mode
> where this specific behavior is used.

OK, thanks, that looks better to me.

Can you now appreciate why I asked for split patches?  If I had to go
over the big patch I probably wouldn't have been able to read through
each to make sense of it.

I pushed this, along with a few more tweaks, mostly adding comments and
moving functions so that related things are together.  I hope I didn't
break anything.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-29 Thread Michael Paquier
On Fri, Jan 29, 2016 at 5:13 PM, Andres Freund  wrote:
> On 2016-01-28 16:40:13 +0900, Michael Paquier wrote:
>> OK, so as a first step and after thinking about the whole for a while,
>> I have finished with the patch attached. This patch is aimed at
>> avoiding unnecessary checkpoints on idle systems when wal_level >=
>> hot_standby by centralizing the check to look at if there has some WAL
>> activity since the last checkpoint.
>
> That's not what I suggested.
>
>>  /*
>> + * XLOGHasActivity -- Check if XLOG had some significant activity or
>> + * if it is idle lately. This is primarily used to check if there has
>> + * been some WAL activity since the last checkpoint that occurred on
>> + * system to control the generation of XLOG records related to standbys.
>> + */
>> +bool
>> +XLOGHasActivity(void)
>> +{
>> +	XLogCtlInsert *Insert = &XLogCtl->Insert;
>> +	XLogRecPtr  redo_lsn = ControlFile->checkPointCopy.redo;
>> +	uint64  prev_bytepos;
>> +
>> +	/* Check if any activity has happened since last checkpoint */
>> +	SpinLockAcquire(&Insert->insertpos_lck);
>> +	prev_bytepos = Insert->PrevBytePos;
>> +	SpinLockRelease(&Insert->insertpos_lck);
>> +
>> +	return XLogBytePosToRecPtr(prev_bytepos) == redo_lsn;
>> +}
>>
>
> How is this actually supposed to work reliably, given we *want* to have
> included a standby snapshot after the last checkpoint?
>
> In CreateCheckPoint() we have
> /*
>  * Here we update the shared RedoRecPtr for future XLogInsert calls; this
>  * must be done while holding all the insertion locks.
>  *
> RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;
>
> computing the next redo rec ptr and then
> if (!shutdown && XLogStandbyInfoActive())
> LogStandbySnapshot();
> before the final
> XLogRegisterData((char *) (&checkPoint), sizeof(checkPoint));
> recptr = XLogInsert(RM_XLOG_ID,
> shutdown ? 
> XLOG_CHECKPOINT_SHUTDOWN :
> XLOG_CHECKPOINT_ONLINE);
>
> so the above condition isn't really something we want to rely on. Am I
> missing what you're trying to do?

Well, to put it short, I am just trying to find a way to make the
backend skip unnecessary checkpoints on an idle system, which results
in the following WAL pattern if system is completely idle:
CHECKPOINT_ONLINE
RUNNING_XACTS
RUNNING_XACTS
[etc..]

The thing is that I am lost with the meaning of this condition to
decide if a checkpoint should be skipped or not:
if (prevPtr == ControlFile->checkPointCopy.redo &&
prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
{
WALInsertLockRelease();
LWLockRelease(CheckpointLock);
return;
}
As at least one standby snapshot is logged before the checkpoint
record, the redo position is never going to match the previous insert
LSN, so checkpoints will never be skipped if wal_level >= hot_standby.
Skipping such unnecessary checkpoints is what you would like to
address, no? Because that's what I would like to do here first. And
once we get that right, we could think about addressing the case where
WAL segments are forcibly archived for idle systems.

Perhaps I simply do not grasp the meaning of what you defined as
"relevant LSN" upthread. As I understand it, it would be a LSN
position marker in shared memory that we could use for some
decision-making regarding if a checkpoint or a standby snapshot record
should be generated. I have for example played with an array in
XLogCtl->Insert made of NUM_XLOGINSERT_LOCKS elements, each one
protected by one insert lock that tracked the last insert LSN of a
checkpoint record (I did that in XLogInsertRecord because XLogRecData
makes that easy), then I used that to do this decision making in
CreateCheckpoint() and in the bgwriter to decide if a standby snapshot
should be logged or not.
But then I noticed that it would be actually easier and cleaner to do
this decision making using directly the checkpoint redo LSN and
compare that with the previous insert LSN, then use that for the
bgwriter code path, resulting in the WIP I just sent previously.

Is what I am trying to do here biased with what you have in mind? Do
we have a different meaning of the problem in mind? What are your
ideas regarding this relevant LSN?
-- 
Michael




Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-01-29 Thread Alexander Korotkov
On Tue, Jan 5, 2016 at 4:04 PM, Amit Kapila  wrote:

> On Mon, Jan 4, 2016 at 8:28 PM, Robert Haas  wrote:
> >
> > On Mon, Jan 4, 2016 at 1:20 AM, Amit Kapila 
> wrote:
> > > On Mon, Jan 4, 2016 at 4:49 AM, Robert Haas 
> wrote:
> > >> On Thu, Dec 31, 2015 at 3:36 AM, Amit Kapila  >
> > >> wrote:
> > >> > LWLock *LWLockAssignFromTranche(const char *tranche_name) will
> > >> > assign a lock for the specified tranche.  This also ensures that no
> > >> > more than requested number of LWLocks can be assigned from a
> > >> > specified tranche.
> > >>
> > >> However, this seems like an awkward API, because if there are many
> > >> LWLocks you're going to need to look up the tranche name many times,
> > >> and then you're going to need to make an array of pointers to them
> > >> somehow, introducing a thoroughly unnecessary layer of indirection.
> > >> Instead, I suggest just having a function LWLockPadded
> > >> *GetLWLockAddinTranche(const char *tranche_name) that returns a
> > >> pointer to the base of the array.
> > >
> > > If we do that way, then user of API needs to maintain the interlock
> > > guarantee that the requested number of locks is same as allocations
> > > done from that tranche and also if it is not allocating all the
> > > LWLocks in one function, it needs to save the base address of the
> > > array and then allocate from it by maintaining some counter.
> > > I agree that looking up for tranche names multiple times is not good,
> > > if there are many number of lwlocks, but that is done at startup
> > > time and not at operation-level.  Also if we look currently at
> > > the extensions in contrib, then just one of them allocates one
> > > LWLock in this fashion, now there could be extensions apart from
> > > extensions in contrib, but not sure if they require a large number of
> > > LWLocks, so I feel it is better to give an API which is more
> > > convenient for the user to use.
> >
> > Well, I agree with you that we should provide the most convenient API
> > possible, but I don't agree that yours is more convenient than the one
> > I proposed.  I think it is less convenient.  In most cases, if the
> > user asks for a large number of LWLocks, they aren't going to be each
> > for a different purpose - they will all be for the same purpose, like
> > one per buffer or one per hash partition.  The case where the user
> > wants to allocate 8 lwlocks from an extension, each for a different
> > purpose, and spread those allocations across a bunch of different
> > functions probably does not exist.
>
> Probably, but the point is to make the user of the API do what he or she wants
> to accomplish without much knowledge of the underlying stuff.  However,
> I think it is not too much detail for the user to know, so I have changed the
> API as per your suggestion.
>
> >
> >   *Maybe* there is somebody
> > allocating lwlocks from an extension for unrelated purposes, but I'd
> > be willing to bet that, if so, all of those allocations happen one
> > right after the other in a single function, because anything else
> > would be completely nuts.
> >
> > >> > Also I have retained NUM_USER_DEFINED_LWLOCKS in
> > >> > MainLWLockArray so that if any extensions want to assign a LWLock
> > >> > after startup, it can be used from this pool.  The tranche for such
> > >> > locks
> > >> > will be reported as main.
> > >>
> >
> > I'd be interested in knowing whether there are cases where useful
> > extensions can be loaded apart from shared_preload_libraries because
> > of NUM_USER_DEFINED_LWLOCKS and our tendency to allocate a little bit
> > of extra shared memory, but my suspicion is that it rarely works out
> > and is too flaky to be useful to anybody.
> >
>
> I am not aware of such cases, however the reason I have kept it was for
> backward-compatability, but now I have removed it in the attached patch.
>
> Apart from that, I have updated the docs to reflect the changes related
> to new API's.
>
> Few things to note -
> Some change is needed in LWLockCounter[1] if this goes after 2
> other patches (separate tranche for PGProcs and separate tranche
> for ReplicationSlots).  Also, LWLockAssign() can be removed after
> those patches
>

Also a couple of minor comments from me.

I think this

+ StrNCpy(LWLockTrancheRequestArray[LWLockTrancheRequestsCount].tranche_name,
> tranche_name, strlen(tranche_name) + 1);


should be

+ StrNCpy(LWLockTrancheRequestArray[LWLockTrancheRequestsCount].tranche_name,
> tranche_name,
> sizeof(LWLockTrancheRequestArray[LWLockTrancheRequestsCount].tranche_name));
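
(For context, the extension-facing usage of the new API would
presumably look something like the sketch below.  Only
GetLWLockAddinTranche is taken from the patch; the request-side
function name and the rest are my guesses.)

#include "postgres.h"
#include "fmgr.h"
#include "storage/lwlock.h"

void
_PG_init(void)
{
    /* ask for a tranche of 16 locks, e.g. one per hash partition */
    RequestLWLockTranche("my_extension", 16);    /* hypothetical name */
}

static void
my_shmem_startup(void)
{
    /* from the patch: returns the base of the tranche's lock array */
    LWLockPadded *locks = GetLWLockAddinTranche("my_extension");

    LWLockAcquire(&locks[0].lock, LW_EXCLUSIVE);
    /* ... touch the shared state protected by lock 0 ... */
    LWLockRelease(&locks[0].lock);
}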


And as far as I know English, "it's" should be "its" in the sentence below.

+ from _PG_init.  Tranche repersents an array of LWLocks
> and
> + can be accessed by it's name.  First parameter
> tranche_name


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension

2016-01-29 Thread Artur Zakirov

On 21.01.2016 00:25, Alvaro Herrera wrote:

Artur Zakirov wrote:


I don't quite understand why aren't we using a custom GUC variable here.
These already have SHOW and SET support ...



Added GUC variables:
- pg_trgm.limit
- pg_trgm.substring_limit
I added these variables to the documentation.
show_limit() and set_limit() functions work correctly and they are marked as
deprecated.


Thanks.  I'm willing to commit quickly a small patch that only changes
the existing function to GUCs, then have a look at a separate patch that
adds the new substring operator.  Would you split the patch that way?



What is the status of this patch? In the commitfest it is "Needs review".

Can this patch get the status "Ready for Committer"?

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2016-01-29 Thread Andres Freund
Hi,

so, I'm reviewing the output of:
> git diff $(git merge-base upstream/master 
> 2ndq/dev/pglogical-output)..2ndq/dev/pglogical-output
> diff --git a/contrib/Makefile b/contrib/Makefile
> index bd251f6..028fd9a 100644
> --- a/contrib/Makefile
> +++ b/contrib/Makefile
> @@ -35,6 +35,8 @@ SUBDIRS = \
>   pg_stat_statements \
>   pg_trgm \
>   pgcrypto\
> + pglogical_output \
> + pglogical_output_plhooks \

I'm doubtful we want these plhooks. You aren't allowed to access normal
(non-user-catalog) tables in output plugins. That seems too much to
expose to a plpgsql function, IMO.

> +++ b/contrib/pglogical_output/README.md

I don't think we have markdown in postgres so far - so let's just keep the
current content and remove the .md :P

> + Table metadata header
> +
> +|===
> +|*Message*|*Type/Size*|*Notes*
> +
> +|Message type|signed char|Literal ‘**R**’ (0x52)
> +|flags|uint8| * 0-6: Reserved, client _must_ ERROR if set and not recognised.
> +|relidentifier|uint32|Arbitrary relation id, unique for this upstream. In 
> practice this will probably be the upstream table’s oid, but the downstream 
> can’t assume anything.
> +|nspnamelength|uint8|Length of namespace name
> +|nspname|signed char[nspnamelength]|Relation namespace (null terminated)
> +|relnamelength|uint8|Length of relation name
> +|relname|char[relname]|Relation name (null terminated)
> +|attrs block|signed char|Literal: ‘**A**’ (0x41)
> +|natts|uint16|number of attributes
> +|[fields]|[composite]|Sequence of ‘natts’ column metadata blocks, each of 
> which begins with a column delimiter followed by zero or more column metadata 
> blocks, each with the same column metadata block header.

That's a fairly high overhead. Hm.


> +== JSON protocol
> +
> +If `proto_format` is set to `json` then the output plugin will emit JSON
> +instead of the custom binary protocol. JSON support is intended mainly for
> +debugging and diagnostics.
> +

I'm fairly strongly opposed to including two formats in one output
plugin. I think the demand for being able to look into the binary
protocol should instead be satisfied by having a function that "expands"
the binary data returned into something easier to understand.
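
(Roughly along these lines, say -- a sketch only, with an invented
function name and signature:)

#include "postgres.h"
#include "fmgr.h"
#include "lib/stringinfo.h"
#include "utils/builtins.h"

/*
 * Sketch: an SQL-callable function that renders one binary-protocol
 * message as text, so the plugin itself only ever emits the binary
 * format and humans can still inspect it.
 */
PG_FUNCTION_INFO_V1(pglogical_message_to_text);

Datum
pglogical_message_to_text(PG_FUNCTION_ARGS)
{
    bytea          *msg = PG_GETARG_BYTEA_PP(0);
    StringInfoData  out;

    initStringInfo(&out);
    /* walk the message and append a human-readable description */
    appendStringInfo(&out, "message type '%c'", VARDATA_ANY(msg)[0]);

    PG_RETURN_TEXT_P(cstring_to_text(out.data));
}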

> + * Copyright (c) 2012-2015, PostgreSQL Global Development Group

2016 ;)


> + case PARAM_BINARY_BASETYPES_MAJOR_VERSION:
> + val = get_param_value(elem, false, 
> OUTPUT_PARAM_TYPE_UINT32);
> + data->client_binary_basetypes_major_version = 
> DatumGetUInt32(val);
> + break;

Why is the major version tied to basetypes (by name)? Seems more
generally useful.


> + case PARAM_RELMETA_CACHE_SIZE:
> + val = get_param_value(elem, false, 
> OUTPUT_PARAM_TYPE_INT32);
> + data->client_relmeta_cache_size = 
> DatumGetInt32(val);
> + break;

I'm not convinced this a) should be optional, or b) should have a size
limit. Please argue for that choice. And how should the client, e.g.,
know about evictions in that cache?



> --- /dev/null
> +++ b/contrib/pglogical_output/pglogical_config.h
> @@ -0,0 +1,55 @@
> +#ifndef PG_LOGICAL_CONFIG_H
> +#define PG_LOGICAL_CONFIG_H
> +
> +#ifndef PG_VERSION_NUM
> +#error  must be included first
> +#endif

Huh?

> +#include "nodes/pg_list.h"
> +#include "pglogical_output.h"
> +
> +inline static bool
> +server_float4_byval(void)
> +{
> +#ifdef USE_FLOAT4_BYVAL
> + return true;
> +#else
> + return false;
> +#endif
> +}
> +
> +inline static bool
> +server_float8_byval(void)
> +{
> +#ifdef USE_FLOAT8_BYVAL
> + return true;
> +#else
> + return false;
> +#endif
> +}
> +
> +inline static bool
> +server_integer_datetimes(void)
> +{
> +#ifdef USE_INTEGER_DATETIMES
> + return true;
> +#else
> + return false;
> +#endif
> +}
> +
> +inline static bool
> +server_bigendian(void)
> +{
> +#ifdef WORDS_BIGENDIAN
> + return true;
> +#else
> + return false;
> +#endif
> +}

Not convinced these should exists, and even moreso exposed in a header.

> +/*
> + * Returns Oid of the hooks function specified in funcname.
> + *
> + * Error is thrown if function doesn't exist or doen't return correct 
> datatype
> + * or is volatile.
> + */
> +static Oid
> +get_hooks_function_oid(List *funcname)
> +{
> + Oid funcid;
> + Oid funcargtypes[1];
> +
> + funcargtypes[0] = INTERNALOID;
> +
> + /* find the the function */
> + funcid = LookupFuncName(funcname, 1, funcargtypes, false);
> +
> + /* Validate that the function returns void */
> + if (get_func_rettype(funcid) != VOIDOID)
> + {
> + ereport(ERROR,
> + (errcode(ERRCODE_WRONG_OBJECT_TYPE),
> +  errmsg("function %s must return void",
> +

Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-01-29 Thread Etsuro Fujita

On 2016/01/29 1:26, Ashutosh Bapat wrote:

Here's an updated version of the previous patches, broken up like before



2. pg_fdw_join_v3.patch: changes to postgres_fdw - more description below



Here is the summary of changes from the last set of patches



2. Included fix for EvalPlanQual in postgres_fdw - an alternate local
path is obtained from the list of paths linked to the joinrel. Since
this is done before adding the ForeignPath, there should be a local path
available for the given join.


I looked briefly at that code in the patch (i.e., postgresRecheckForeignScan 
and the helper function that creates a local join path for a given foreign 
join path).  Maybe I'm missing something, but I think that is 
basically the same as the fix I proposed for addressing this issue, 
posted before [1], right?  If so, my concern is that the helper function 
probably wouldn't extend to the parameterized-foreign-join-path cases, 
though it would work well for the unparameterized-foreign-join-path 
cases.  So we won't support parameterized foreign-join paths for 9.6?


Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/5666b59f.6010...@lab.ntt.co.jp




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Template for commit messages

2016-01-29 Thread Bruce Momjian
On Thu, Jan 28, 2016 at 07:30:58PM -0500, Andrew Dunstan wrote:
> I have no prejudice in this area, other than one in favor of any
> rules being fairly precise. As for nuances, I guess they can be
> specified in the commit message. The one thing I do find annoying
> from time to time is the limit on subject size. Sometimes it's very
> difficult to be sufficiently communicative in, say, 50 characters.

Agreed, but that is a gitweb limit we can't control, I think.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Template for commit messages

2016-01-29 Thread Bruce Momjian
On Thu, Jan 28, 2016 at 03:16:18PM -0500, Stephen Frost wrote:
> > OK, but keep in mind whatever script committers use should remove tags
> > that are empty after exiting the editor.  I can provide the grep regex
> > in git somewhere too:
> > 
> >   egrep -v 
> > "^(Author|Reported-by|Bug|Reviewed-by|Tested-by|Backpatch-through): *$"
> 
> If the template is there then, for my part at least, I wouldn't mind
> killing the empty lines.  Having a decent script would work too, of
> course... but I have to admit that I've not tried having a script modify
> my commit messages right before they're committed and, well, it'd take a
> bit for me to be comfortable with it.
> 
> No one wants garbled commit messages from a script that went awry. ;)

I have always used a script.  This removes trailing blank lines:

sed -e :a -e '/./,$!d;/^\n*$/{$d;N;};/\n$/ba'

and this removes adjacent blank lines:

cat --squeeze-blank

I can publish my script at the appropriate time.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Template for commit messages

2016-01-29 Thread Tom Lane
Andres Freund  writes:
> That's not that hard to change. And neither git nor kernel people use 50
> char limits. I'm *strongly* opposed to making 50 chars the limit for the
> first line.

Ditto.  It's hard enough to fit a useful headline in 75 characters.
I personally will ignore any rule that says it must be even less.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extend pgbench expressions with functions

2016-01-29 Thread Fabien COELHO



Hi Michaël,


I think it is overkill, but do as you feel.


Perhaps we could have Robert decide on this one first? That's a bug after
all that had better be backpatched.


Fine with me.

[modulo...] Right, forgot this one, we just need to check if rval is -1 
here, and return 0 as result. I am updating the fix as attached.


This looks to me like it works.

I still feel that the condition should be simplified, probably with:

  if (lval < 0 && *resval <= 0) ...

(instead of < in my previous suggestion, if some processors return 0 on 
-INT64_MIN). Also, a comment is needed to explain why such a bizarre 
condition is used/needed for just the INT64_MIN case.
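
For reference, the rval == -1 special case boils down to something
like this (variable names are illustrative, not the attached patch):

/*
 * lval % rval can only overflow for INT64_MIN % -1, so short-circuit
 * rval == -1; the mathematical result of x % -1 is always 0.
 */
if (rval == -1)
    *retval = 0;
else
    *retval = lval % rval;
return true;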


--
Fabien.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension

2016-01-29 Thread Alvaro Herrera
Artur Zakirov wrote:

> What is the status of this patch? In the commitfest it is "Needs review".

"Needs review" means it needs a reviewer to go over it and, uh, review
it.  Did I send an email to you prodding you to review patches?  I sent
such an email to several people from PostgresPro, but I don't remember
getting a response from anyone, and honestly I don't see you guys/gal
doing much review on-list.  If you can please talk to your colleagues so
that they look over your patch, while at the same time you review their
patches, that would help not only this one patch but everyone else's
patches as well.

> Can this patch get the status "Ready for Committer"?

Sure, as soon as it has gotten enough review to say it's past the "needs
review" phase.  Just like all patches.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-01-29 Thread Andres Freund
On 2016-01-28 16:40:13 +0900, Michael Paquier wrote:
> OK, so as a first step and after thinking about the whole for a while,
> I have finished with the patch attached. This patch is aimed at
> avoiding unnecessary checkpoints on idle systems when wal_level >=
> hot_standby by centralizing the check to look at if there has some WAL
> activity since the last checkpoint.

That's not what I suggested.

>  /*
> + * XLOGHasActivity -- Check if XLOG had some significant activity or
> + * if it is idle lately. This is primarily used to check if there has
> + * been some WAL activity since the last checkpoint that occurred on
> + * system to control the generaton of XLOG record related to standbys.
> + */
> +bool
> +XLOGHasActivity(void)
> +{
> + XLogCtlInsert *Insert = >Insert;
> + XLogRecPtr  redo_lsn = ControlFile->checkPointCopy.redo;
> + uint64  prev_bytepos;
> +
> + /* Check if any activity has happened since last checkpoint */
> + SpinLockAcquire(>insertpos_lck);
> + prev_bytepos = Insert->PrevBytePos;
> + SpinLockRelease(>insertpos_lck);
> +
> + return XLogBytePosToRecPtr(prev_bytepos) == redo_lsn;
> +}
>

How should this actually work reliably, given we *want* to have
included a standby snapshot after the last checkpoint?

In CreateCheckPoint() we have
/*
 * Here we update the shared RedoRecPtr for future XLogInsert calls; 
this
 * must be done while holding all the insertion locks.
 *
RedoRecPtr = XLogCtl->Insert.RedoRecPtr = checkPoint.redo;

computing the next redo rec ptr and then
if (!shutdown && XLogStandbyInfoActive())
LogStandbySnapshot();
before the final
XLogRegisterData((char *) (), sizeof(checkPoint));
recptr = XLogInsert(RM_XLOG_ID,
shutdown ? 
XLOG_CHECKPOINT_SHUTDOWN :
XLOG_CHECKPOINT_ONLINE);

so the above condition isn't really something we want to rely on. Am I
missing what you're trying to do?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-01-29 Thread Alvaro Herrera
Petr Jelinek wrote:
> On 29 January 2016 at 14:48, Tom Lane  wrote:
> > Alvaro Herrera  writes:

> > Uh, what?  Surely we would provide a bespoke command for each possible
> > sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to check
> > that the provided function has the right signature, and then it would put
> > the correct amkind into the pg_am entry automatically.

I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
like that.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-01-29 Thread Tom Lane
Alvaro Herrera  writes:
>> On 29 January 2016 at 14:48, Tom Lane  wrote:
> Uh, what?  Surely we would provide a bespoke command for each possible
> sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to check
> that the provided function has the right signature, and then it would put
> the correct amkind into the pg_am entry automatically.

> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
> like that.

That seems okay.  I had the impression you were proposing
CREATE ACCESS METHOD foobar TYPE 'i' USING functionname
or something like that, where it would be totally up to the user that
the amkind matched the function.  That seems unnecessarily error-prone,
not to mention that it would close the door forever on any hope that
we might allow non-superusers to execute such commands.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension

2016-01-29 Thread Artur Zakirov

On 29.01.2016 18:39, Alvaro Herrera wrote:

Teodor Sigaev wrote:

The behavior of this function is surprising to me.

select substring_similarity('dog' ,  'hotdogpound') ;

  substring_similarity
--
  0.25


Substring search was desined to search similar word in string:
contrib_regression=# select substring_similarity('dog' ,  'hot dogpound') ;
  substring_similarity
--
  0.75

contrib_regression=# select substring_similarity('dog' ,  'hot dog pound') ;
  substring_similarity
--
 1


Hmm, this behavior looks too much like magic to me.  I mean, a substring
is a substring -- why are we treating the space as a special character
here?



I think I can rename this function to subword_similarity() and correct 
the documentation.


The current behavior is designed to find the most similar word in a text. 
For example, if we search for just a substring (not a word), then we 
get the following result:


select substring_similarity('dog', 'dogmatist');
 substring_similarity
-
1
(1 row)

But this is wrong I think. They are completely different words.

For searching for a similar substring (not a word) in a text, maybe 
another function should be added?


--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Template for commit messages

2016-01-29 Thread Joshua D. Drake

On 01/29/2016 02:59 AM, Heikki Linnakangas wrote:


Well, I think what people are asking for is precisely a fixed format,
and I do think there is value in that.  It's nice to capture the
nuance, but the nuance is going to get flattened out anyway when the
release notes are created.  If we all agree to use a fixed format,
then a bunch of work there that probably has to be done manually can
be automated.  However, if we all agree to use a fixed format except
for you, we might as well just forget the whole thing, because the
percentage of commits that are done by you is quite high.


Before I agree to adding fixed format lines to my commits, I'd like to know exactly what people 
would want to do with the information. "Bunch of work that probably could be automated" 
doesn't cut it. For example, if I tag someone as "Reviewed-by", does it mean that his 
name will automatically appear in the release notes? Or if there's a bug, is the reviewer then 
somehow responsible for missing it?

As a way of saying "thank you", I like a personalised, nuanced message much 
better. True, we can do both. A good thing about processing the commit messages manually, 
e.g. for compiling release notes, is that there's human judgement on what to include.  Of 
course, that's a lot of work. Which is why I'd like to hear from whoever wants to make 
use of these lines to explain in more detail what information they need, and what they 
would do with it, to make sure that what we add is actually useful, and that we don't add 
noise to the commit messages unnecessarily.

- Heikki


I think the best question to ask is:

"What is the problem we are trying to solve?"

"A bunch of work that probably could be automated"

Does not answer that question.

Sincerely,

Joshua D. Drake


--
Command Prompt, Inc.  http://the.postgres.company/
 +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-01-29 Thread Anastasia Lubennikova

29.01.2016 19:01, Thom Brown:

On 29 January 2016 at 15:47, Aleksander Alekseev
 wrote:

I tested this patch on x64 and ARM servers for a few hours today. The
only problem I could find is that INSERT works considerably slower after
applying a patch. Beside that everything looks fine - no crashes, tests
pass, memory doesn't seem to leak, etc.
Thank you for testing. I rechecked that, and insertions are really very 
very very slow. It seems like a bug.

Okay, now for some badness.  I've restored a database containing 2
tables, one 318MB, another 24kB.  The 318MB table contains 5 million
rows with a sequential id column.  I get a problem if I try to delete
many rows from it:
# delete from contacts where id % 3 != 0 ;
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory

I didn't manage to reproduce this. Thom, could you describe exact steps
to reproduce this issue please?

Sure, I used my pg_rep_test tool to create a primary (pg_rep_test
-r0), which creates an instance with a custom config, which is as
follows:

shared_buffers = 8MB
max_connections = 7
wal_level = 'hot_standby'
cluster_name = 'primary'
max_wal_senders = 3
wal_keep_segments = 6

Then create a pgbench data set (I didn't originally use pgbench, but
you can get the same results with it):

createdb -p 5530 pgbench
pgbench -p 5530 -i -s 100 pgbench

And delete some stuff:

thom@swift:~/Development/test$ psql -p 5530 pgbench
Timing is on.
psql (9.6devel)
Type "help" for help.


  ➤ psql://thom@[local]:5530/pgbench

# DELETE FROM pgbench_accounts WHERE aid % 3 != 0;
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
...
WARNING:  out of shared memory
WARNING:  out of shared memory
DELETE 667
Time: 22218.804 ms

There were 358 lines of that warning message.  I don't get these
messages without the patch.

Thom


Thank you for this report.
I tried to reproduce it, but I couldn't. Debugging will be much easier now.

I hope I'll fix these issues within the next few days.

BTW, I found a silly mistake: the previous patch contained some unrelated 
changes. I fixed it in the new version (attached).


--
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c
index e3c55eb..3908cc1 100644
--- a/src/backend/access/nbtree/nbtinsert.c
+++ b/src/backend/access/nbtree/nbtinsert.c
@@ -24,6 +24,7 @@
 #include "storage/predicate.h"
 #include "utils/tqual.h"
 
+#include "catalog/catalog.h"
 
 typedef struct
 {
@@ -60,7 +61,8 @@ static void _bt_findinsertloc(Relation rel,
   ScanKey scankey,
   IndexTuple newtup,
   BTStack stack,
-  Relation heapRel);
+  Relation heapRel,
+  bool *updposing);
 static void _bt_insertonpg(Relation rel, Buffer buf, Buffer cbuf,
 			   BTStack stack,
 			   IndexTuple itup,
@@ -113,6 +115,7 @@ _bt_doinsert(Relation rel, IndexTuple itup,
 	BTStack		stack;
 	Buffer		buf;
 	OffsetNumber offset;
+	bool updposting = false;
 
 	/* we need an insertion scan key to do our search, so build one */
 	itup_scankey = _bt_mkscankey(rel, itup);
@@ -162,8 +165,9 @@ top:
 	{
 		TransactionId xwait;
 		uint32		speculativeToken;
+		bool fakeupdposting = false; /* Never update posting in unique index */
 
-		offset = _bt_binsrch(rel, buf, natts, itup_scankey, false);
+		offset = _bt_binsrch(rel, buf, natts, itup_scankey, false, );
 		xwait = _bt_check_unique(rel, itup, heapRel, buf, offset, itup_scankey,
  checkUnique, _unique, );
 
@@ -200,8 +204,54 @@ top:
 		CheckForSerializableConflictIn(rel, NULL, buf);
 		/* do the insertion */
 		_bt_findinsertloc(rel, , , natts, itup_scankey, itup,
-		  stack, heapRel);
-		_bt_insertonpg(rel, buf, InvalidBuffer, stack, itup, offset, false);
+		  stack, heapRel, );
+
+		if (IsSystemRelation(rel))
+			updposting = false;
+
+		/*
+		 * New tuple has the same key with tuple at the page.
+		 * Unite them into one posting.
+		 */
+		if (updposting)
+		{
+			Page		page;
+			IndexTuple olditup, newitup;
+			ItemPointerData *ipd;
+			int nipd;
+
+			page = BufferGetPage(buf);
+			olditup = (IndexTuple) PageGetItem(page, PageGetItemId(page, offset));
+
+			if (BtreeTupleIsPosting(olditup))
+nipd = BtreeGetNPosting(olditup);
+			else
+nipd = 1;
+
+			ipd = palloc0(sizeof(ItemPointerData)*(nipd + 1));
+			/* copy item pointers from old tuple into ipd */
+			if (BtreeTupleIsPosting(olditup))
+memcpy(ipd, BtreeGetPosting(olditup), sizeof(ItemPointerData)*nipd);
+			else
+memcpy(ipd, olditup, sizeof(ItemPointerData));
+
+			/* add item pointer of the new tuple into ipd */
+			memcpy(ipd+nipd, itup, sizeof(ItemPointerData));
+
+			/*
+			 * Form posting tuple, then delete old tuple and insert posting tuple.
+			 */

Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension

2016-01-29 Thread Alvaro Herrera
Teodor Sigaev wrote:
> >The behavior of this function is surprising to me.
> >
> >select substring_similarity('dog' ,  'hotdogpound') ;
> >
> >  substring_similarity
> >--
> >  0.25
> >
> Substring search was desined to search similar word in string:
> contrib_regression=# select substring_similarity('dog' ,  'hot dogpound') ;
>  substring_similarity
> --
>  0.75
> 
> contrib_regression=# select substring_similarity('dog' ,  'hot dog pound') ;
>  substring_similarity
> --
> 1

Hmm, this behavior looks too much like magic to me.  I mean, a substring
is a substring -- why are we treating the space as a special character
here?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-01-29 Thread Thom Brown
On 29 January 2016 at 15:47, Aleksander Alekseev
 wrote:
> I tested this patch on x64 and ARM servers for a few hours today. The
> only problem I could find is that INSERT works considerably slower after
> applying a patch. Beside that everything looks fine - no crashes, tests
> pass, memory doesn't seem to leak, etc.
>
>> Okay, now for some badness.  I've restored a database containing 2
>> tables, one 318MB, another 24kB.  The 318MB table contains 5 million
>> rows with a sequential id column.  I get a problem if I try to delete
>> many rows from it:
>> # delete from contacts where id % 3 != 0 ;
>> WARNING:  out of shared memory
>> WARNING:  out of shared memory
>> WARNING:  out of shared memory
>
> I didn't manage to reproduce this. Thom, could you describe exact steps
> to reproduce this issue please?

Sure, I used my pg_rep_test tool to create a primary (pg_rep_test
-r0), which creates an instance with a custom config, which is as
follows:

shared_buffers = 8MB
max_connections = 7
wal_level = 'hot_standby'
cluster_name = 'primary'
max_wal_senders = 3
wal_keep_segments = 6

Then create a pgbench data set (I didn't originally use pgbench, but
you can get the same results with it):

createdb -p 5530 pgbench
pgbench -p 5530 -i -s 100 pgbench

And delete some stuff:

thom@swift:~/Development/test$ psql -p 5530 pgbench
Timing is on.
psql (9.6devel)
Type "help" for help.


 ➤ psql://thom@[local]:5530/pgbench

# DELETE FROM pgbench_accounts WHERE aid % 3 != 0;
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
WARNING:  out of shared memory
...
WARNING:  out of shared memory
WARNING:  out of shared memory
DELETE 667
Time: 22218.804 ms

There were 358 lines of that warning message.  I don't get these
messages without the patch.

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Peter Geoghegan
On Fri, Jan 29, 2016 at 3:41 AM, Mithun Cy  wrote:
> I just ran some tests on above patch. Mainly to compare
> how "longer sort keys" would behave with new(Qsort) and old Algo(RS) for 
> sorting.
> I have 8GB of ram and ssd storage.
>
> Settings and Results.
> 
> Work_mem= DEFAULT (4mb).
> key width = 520.

> If data is sorted as same as sort key order then current code performs better 
> than proposed patch
> as sort size increases.
>
> It appears new algo do not seem have any major impact if rows are presorted 
> in opposite order.
>
> For randomly distributed order quick sort performs well when compared to 
> current sort method (RS).
>
>
> ==
> Now Increase the work_mem to 64MB and for 14 GB of data to sort.
>
> CASE 1: We can see Qsort is able to catchup with current sort method(RS).
> CASE 2:  No impact.
> CASE 3: RS is able to catchup with Qsort.


I think that the basic method you're using to do these tests may have
additional overhead:

-- sort in ascending order.
CREATE FUNCTION test_orderby_asc( ) RETURNS int
AS $$
#print_strict_params on
DECLARE
gs int;
jk text;
BEGIN
SELECT string_4k, generate_series INTO  jk, gs
FROM so order by string_4k, generate_series;

RETURN gs;
END
$$ LANGUAGE plpgsql;

Anyway, these test cases all remove much of the advantage of increased
cache efficiency.  No comparisons are *ever* resolved using the
leading attribute, which calls into question why anyone would sort on
that. It's 512 bytes, which artificially makes the comparisons themselves
the bottleneck, as opposed to cache efficiency. You can't even fit the
second attribute in the same cacheline as the first in the "tuple
proper" (MinimalTuple).

You are using a 4MB work_mem setting, but you almost certainly have a
CPU with an L3 cache size that's a multiple of that, even with cheap
consumer grade hardware. You have 8GB of RAM; a 4MB work_mem setting
is a very small setting (small in an absolute sense, and even more so
relative to the size of the data).

You mentioned "CASE 3: RS is able to catchup with Qsort", which
doesn't make much sense to me. The only way I think that is possible
is by making the increased work_mem sufficient to have much longer
runs, because there is in fact somewhat of a correlation in the data,
and an increased work_mem makes the critical difference, allowing
perhaps one long run to be used -- there is now enough memory to
"juggle" tuples without ever needing to start a new run. But, how
could that be? You said case 3 was totally random data, so I'd only
expect incremental improvement. It could also be some weird effect
from polyphase merge. A discontinuity.

I also don't understand why the patch ("Qsort") can be so much slower
between case 1 and case 3 on 3.5GB+ sizes, but not the 1.7GB size.
Even leaving aside the differences between "RS" and "Qsort", it makes
no sense to me that *both* are faster with random data ("CASE 3") than
with presorted data ("CASE 1").

Another weird thing is that the traditional best case for replacement
selection ("RS") is a strong correlation, and a traditional worst case
is an inverse correlation, where run size is bound strictly by memory.
But you show just the opposite here -- the inverse correlation is
faster with RS in the 1.7 GB data case. So, I have no idea what's
going on here, and find it all very confusing.

In order for these numbers to be useful, they need more detail --
"trace_sort" output. There are enough confounding factors in general,
and especially here, that not having that information makes raw
numbers very difficult to interpret.

> I think for long keys both the old (RS) and new (Qsort) sort methods have their own 
> characteristics
> based on data distribution. I think work_mem is the key: if properly set, the new 
> method (Qsort) will
> be able to fit most of the cases. If work_mem is not tuned right, there 
> are cases where it can regress.

work_mem is impossible to tune right with replacement selection.
That's a key advantage of the proposed new approach.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-01-29 Thread Aleksander Alekseev
I tested this patch on x64 and ARM servers for a few hours today. The
only problem I could find is that INSERT works considerably slower after
applying a patch. Beside that everything looks fine - no crashes, tests
pass, memory doesn't seem to leak, etc.

> Okay, now for some badness.  I've restored a database containing 2
> tables, one 318MB, another 24kB.  The 318MB table contains 5 million
> rows with a sequential id column.  I get a problem if I try to delete
> many rows from it:
> # delete from contacts where id % 3 != 0 ;
> WARNING:  out of shared memory
> WARNING:  out of shared memory
> WARNING:  out of shared memory

I didn't manage to reproduce this. Thom, could you describe exact steps
to reproduce this issue please?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-01-29 Thread Catalin Iacob
On Tue, Jan 26, 2016 at 5:42 PM, Pavel Stehule  wrote:
> I removed from previous patch all OOP related changes. New patch contains
> raise_ functions only. This interface is new generation of previous
> functions: info, notice, warning, error with keyword parameters interface. I
> didn't changed older functions due keeping compatibility.

Hi,

Even without the OOP changes, the exception classes are still there as
the underlying mechanism that error and raise_error use.

I looked at the patch and what I don't like about it is that
raise_error uses SPIError to transport detail, hint etc while error
uses Error. This is inconsistent and confusing.

Take this example:

CREATE OR REPLACE FUNCTION err()
  RETURNS int
AS $$
  plpy.error('only msg from plpy.error', 'arg2 for error is part of
tuple', 'arg3 also in tuple')
$$ LANGUAGE plpython3u;

SELECT err();

CREATE OR REPLACE FUNCTION raise_err()
  RETURNS int
AS $$
  plpy.raise_error('only msg from plpy.raise_error', 'arg2 for
raise_error is detail', 'arg3 is hint')
$$ LANGUAGE plpython3u;

SELECT raise_err();

With your patch, this results in:

CREATE FUNCTION
psql:plpy_error_raise_error_difference:9: ERROR:  plpy.Error: ('only
msg from plpy.error', 'arg2 for error is part of tuple', 'arg3 also in
tuple')
CONTEXT:  Traceback (most recent call last):
  PL/Python function "err", line 2, in 
plpy.error('only msg from plpy.error', 'arg2 for error is part of
tuple', 'arg3 also in tuple')
PL/Python function "err"
CREATE FUNCTION
psql:plpy_error_raise_error_difference:17: ERROR:  plpy.SPIError: only
msg from plpy.raise_error
DETAIL:  arg2 for raise_error is detail
HINT:  arg3 is hint
CONTEXT:  Traceback (most recent call last):
  PL/Python function "raise_err", line 2, in 
plpy.raise_error('only msg from plpy.raise_error', 'arg2 for
raise_error is detail', 'arg3 is hint')
PL/Python function "raise_err"

From the output you can already see that plpy.error and
plpy.raise_error (even with a single argument) don't use the same
exception: plpy.error uses Error while raise_error uses SPIError. I
think with a single argument they should be identical and with
multiple arguments raise_error should still use Error but use the
arguments as detail, hint etc. In code you had to export a function to
plpy_spi.h to get to the details in SPIError while plpy.error worked
just fine without that.

I've attached two patches on top of yours: first is a comment fix, the
next one is a rough proof of concept for using plpy.Error to carry the
details. This allows undoing the change to export
PLy_spi_exception_set to plpy_spi.h. And then I realized, the changes
you did to SPIError are actually a separate enhancement (report more
details like schema name, table name and so on for the query executed
by SPI). They should be split into a separate patch. With these
patches the output of the above test is:

CREATE FUNCTION
psql:plpy_error_raise_error_difference:9: ERROR:  plpy.Error: ('only
msg from plpy.error', 'arg2 for error is part of tuple', 'arg3 also in
tuple')
CONTEXT:  Traceback (most recent call last):
  PL/Python function "err", line 2, in 
plpy.error('only msg from plpy.error', 'arg2 for error is part of
tuple', 'arg3 also in tuple')
PL/Python function "err"
CREATE FUNCTION
psql:plpy_error_raise_error_difference:17: ERROR:  plpy.Error: only
msg from plpy.raise_error
DETAIL:  arg2 for raise_error is detail
HINT:  arg3 is hint
CONTEXT:  Traceback (most recent call last):
  PL/Python function "raise_err", line 2, in 
plpy.raise_error('only msg from plpy.raise_error', 'arg2 for
raise_error is detail', 'arg3 is hint')
PL/Python function "raise_err"

The two approaches are consistent now, both create a plpy.Error. The
patch is not complete, it only handles detail and hint not the others
but it should illustrate the idea.

Looking at the output above, I don't see who would rely on calling
plpy.error with multiple arguments and getting the tuple so I'm
actually in favor of just breaking backward compatibility. Note that
passing multiple arguments isn't even documented. So I would just
change debug, info, error and friends to do what raise_debug,
raise_info, raise_error do. With a single argument behavior stays the
same, with multiple arguments one gets more useful behavior (detail,
hint) instead of the useless tuple. That's my preference but we can
leave the patch with raise and leave the decision to the committer.

What do you think? Jim doesn't like the separate Error being raised. I
don't agree, but even if I did, it's not this patch's job to change
that and Error is already raised today. This patch should be
consistent with the status quo and therefore be similar to plpy.error.
If Error is bad, it should be replaced by SPIError everywhere
separately.

Next week I'll send a patch to improve the docs.


0002-use-plpy.Error-to-hold-the-data-not-plpy.SPIError.patch
Description: binary/octet-stream


0001-fix-comment.patch
Description: 

Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Robert Haas
On Wed, Jan 27, 2016 at 8:20 AM, Peter Geoghegan  wrote:
> Correct me if I'm wrong, but I think that the only outstanding issue
> with all patches posted here so far is the "quicksort with spillover"
> cost model. Hopefully this can be cleared up soon. As I've said, I am
> very receptive to other people's suggestions about how that should
> work.

I feel like this could be data driven.  I mean, the cost model is
based mainly on the tuple width and the size of the SortTuple array.
So, it should be possible to run tests of both algorithms on 32, 64, 96,
128, ... byte tuples with a SortTuple array that is 256MB, 512MB,
768MB, 1GB, ...  Then we can judge how closely the cost model comes to
mimicking the actual behavior.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Peter Geoghegan
On Fri, Jan 29, 2016 at 9:24 AM, Robert Haas  wrote:
> I feel like this could be data driven.  I mean, the cost model is
> based mainly on the tuple width and the size of the SortTuple array.
>> So, it should be possible to run tests of both algorithms on 32, 64, 96,
> 128, ... byte tuples with a SortTuple array that is 256MB, 512MB,
> 768MB, 1GB, ...  Then we can judge how closely the cost model comes to
> mimicking the actual behavior.

You would also need to represent how much of the input actually ended
up being sorted with the heap in each case. Maybe that could be tested
at 50% (bad for "quicksort with spillover"), 25% (better), and 5%
(good).

An alternative approach that might be acceptable is to add a generic,
conservative 90% threshold (so 10% of tuples sorted by heap).
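
Something like the following, say (a sketch only -- the names and the
exact bookkeeping are invented):

/*
 * Only pick "quicksort with spillover" when at least 90% of the
 * estimated input already fits in memory, i.e. at most 10% of the
 * tuples would have to be sorted via the heap.
 */
if ((double) state->memtupcount >= 0.90 * estimated_input_tuples)
    use_quicksort_with_spillover = true;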

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [WIP] Effective storage of duplicates in B-tree index.

2016-01-29 Thread Thom Brown
On 29 January 2016 at 16:50, Anastasia Lubennikova
 wrote:
> 29.01.2016 19:01, Thom Brown:
>>
>> On 29 January 2016 at 15:47, Aleksander Alekseev
>>  wrote:
>>>
>>> I tested this patch on x64 and ARM servers for a few hours today. The
>>> only problem I could find is that INSERT works considerably slower after
>>> applying a patch. Beside that everything looks fine - no crashes, tests
>>> pass, memory doesn't seem to leak, etc.
>
> Thank you for testing. I rechecked that, and insertions are really very very
> very slow. It seems like a bug.
>
 Okay, now for some badness.  I've restored a database containing 2
 tables, one 318MB, another 24kB.  The 318MB table contains 5 million
 rows with a sequential id column.  I get a problem if I try to delete
 many rows from it:
 # delete from contacts where id % 3 != 0 ;
 WARNING:  out of shared memory
 WARNING:  out of shared memory
 WARNING:  out of shared memory
>>>
>>> I didn't manage to reproduce this. Thom, could you describe exact steps
>>> to reproduce this issue please?
>>
>> Sure, I used my pg_rep_test tool to create a primary (pg_rep_test
>> -r0), which creates an instance with a custom config, which is as
>> follows:
>>
>> shared_buffers = 8MB
>> max_connections = 7
>> wal_level = 'hot_standby'
>> cluster_name = 'primary'
>> max_wal_senders = 3
>> wal_keep_segments = 6
>>
>> Then create a pgbench data set (I didn't originally use pgbench, but
>> you can get the same results with it):
>>
>> createdb -p 5530 pgbench
>> pgbench -p 5530 -i -s 100 pgbench
>>
>> And delete some stuff:
>>
>> thom@swift:~/Development/test$ psql -p 5530 pgbench
>> Timing is on.
>> psql (9.6devel)
>> Type "help" for help.
>>
>>
>>   ➤ psql://thom@[local]:5530/pgbench
>>
>> # DELETE FROM pgbench_accounts WHERE aid % 3 != 0;
>> WARNING:  out of shared memory
>> WARNING:  out of shared memory
>> WARNING:  out of shared memory
>> WARNING:  out of shared memory
>> WARNING:  out of shared memory
>> WARNING:  out of shared memory
>> WARNING:  out of shared memory
>> ...
>> WARNING:  out of shared memory
>> WARNING:  out of shared memory
>> DELETE 667
>> Time: 22218.804 ms
>>
>> There were 358 lines of that warning message.  I don't get these
>> messages without the patch.
>>
>> Thom
>
>
> Thank you for this report.
> I tried to reproduce it, but I couldn't. Debugging will be much easier now.
>
> I hope I'll fix these issues within the next few days.
>
> BTW, I found a silly mistake: the previous patch contained some unrelated
> changes. I fixed it in the new version (attached).

Thanks.  Well I've tested this latest patch, and the warnings are no
longer generated.  However, the index sizes show that the patch
doesn't seem to be doing its job, so I'm wondering if you removed too
much from it.

Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Additional role attributes && superuser review

2016-01-29 Thread Michael Paquier
On Fri, Jan 29, 2016 at 11:41 PM, Stephen Frost  wrote:
> Michael,
>
> * Michael Paquier (michael.paqu...@gmail.com) wrote:
>> On Fri, Jan 29, 2016 at 6:37 AM, Stephen Frost  wrote:
>> > * Robert Haas (robertmh...@gmail.com) wrote:
>> >> On Thu, Jan 28, 2016 at 11:04 AM, Stephen Frost 
>> wrote:
>> >> > Personally, I don't have any particular issue having both, but the
>> >> > desire was stated that it would be better to have the regular
>> >> > GRANT EXECUTE ON catalog_func() working before we consider having
>> >> > default roles for same.  That moves the goal posts awful far though, if
>> >> > we're to stick with that and consider how we might extend the GRANT
>> >> > system in the future.
>> >>
>> >> I don't think it moves the goal posts all that far.  Convincing
>> >> pg_dump to dump grants on system functions shouldn't be a crazy large
>> >> patch.
>> >
>> > I wasn't clear as to what I was referring to here.  I've already written
>> > a patch to pg_dump to support grants on system objects and agree that
>> > it's at least reasonable.
>>
>> Is it already posted somewhere? I don't recall seeing it. Robert and Noah
>> have a point that this would be useful for users who would like to dump
>> GRANT/REVOKE rights on system functions & all, using a new option in
>> pg_dumpall, say --with-system-acl or --with-system-privileges.
>
> Multiple versions were posted on this thread.  I don't fault you for not
> finding it, this thread is a bit long in the tooth.  The one I'm
> currently working from is:
>
> http://www.postgresql.org/message-id/attachment/38049/catalog_function_acls_v4.patch
>
> I'm going to split up the pg_dump changes and the backend changes, as
> they can logically go in independently (though without both, we're not
> moving the needle very far with regards to what administrators can do).

OK. Looks like a good way to move forward to me.

>> If at least
>> the three of you are agreeing here I think that we should try to move at
>> least toward this goal first. That seems a largely doable goal for 9.6. For
>> the set of default roles, there is clearly no clear consensus regarding
>> what each role should do or not, and under which limitation it should
>> operate.
>
> I'm trying to work towards a consensus on the default roles, hence the
> questions and discussion posed in the email you replied to.

So the current CF entry should be marked as returned with feedback
perhaps? What do you think? Once patches are ready for the default
roles in backend and for pg_dump, we could then work on reviewing them
separately.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Robert Haas
On Fri, Jan 29, 2016 at 12:46 PM, Peter Geoghegan  wrote:
> On Fri, Jan 29, 2016 at 9:24 AM, Robert Haas  wrote:
>> I feel like this could be data driven.  I mean, the cost model is
>> based mainly on the tuple width and the size of the SortTuple array.
>> So, it should be possible to run tests of both algorithms on 32, 64, 96,
>> 128, ... byte tuples with a SortTuple array that is 256MB, 512MB,
>> 768MB, 1GB, ...  Then we can judge how closely the cost model comes to
>> mimicking the actual behavior.
>
> You would also need to represent how much of the input actually ended
> up being sorted with the heap in each case. Maybe that could be tested
> at 50% (bad for "quicksort with spillover"), 25% (better), and 5%
> (good).
>
> An alternative approach that might be acceptable is to add a generic,
> conservative 90% threshold (so 10% of tuples sorted by heap).

I don't quite know what you mean by these numbers.  Add a generic,
conservative threshold to what?

Thinking about this some more, I really think we should think hard
about going back to the strategy which you proposed and discarded in
your original post: always generate the first run using replacement
selection, and every subsequent run by quicksorting.  In that post you
mention powerful advantages of this method: "even without a strong
logical/physical correlation, the algorithm tends to produce runs that
are about twice the size of work_mem. (It's also notable that
replacement selection only produces one run with mostly presorted
input, even where input far exceeds work_mem, which is a neat trick.)"
 You went on to dismiss that strategy, saying that "despite these
upsides, replacement selection is obsolete, and should usually be
avoided."  But I don't see that you've justified that statement.  It
seems pretty easy to construct cases where this technique regresses,
and a large percentage of those cases are precisely those where
replacement selection would have produced a single run, avoiding the
merge step altogether.  I think those cases are extremely important.
I'm quite willing to run somewhat more slowly than in other cases to
be certain of not regressing the case of completely or
almost-completely ordered input.  Even if that didn't seem like a
sufficient reason unto itself, I'd be willing to go that way just so
we don't have to depend on a cost model that might easily go wrong due
to bad input even if it were theoretically perfect in every other
respect (which I'm pretty sure is not true here anyway).

I also have another idea that might help squeeze more performance out
of your approach and avoid regressions.  Suppose that we add a new GUC
with a name like sort_mem_stretch_multiplier or something like that,
with a default value of 2.0 or 4.0 or whatever we think is reasonable.
When we've written enough runs that a polyphase merge will be
required, or when we're actually performing a polyphase merge, the
amount of memory we're allowed to use increases by this multiple.  The
idea is: we hope that people will set work_mem appropriately and
consequently won't experience polyphase merges at all, but it might happen anyway.
However, it's almost certain not to happen very frequently.
Therefore, using extra memory in such cases should be acceptable,
because while you might have every backend in the system using 1 or
more copies of work_mem for something if the system is very busy, it
is extremely unlikely that you will have more than a handful of
processes doing polyphase merges.
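
A sketch of what that might look like in tuplesort (the GUC and the
trigger condition here are only schematic):

/*
 * Once we know a polyphase merge is unavoidable, stretch the memory
 * budget by the configured multiplier.
 */
if (polyphase_merge_required)
    state->allowedMem = (int64) (work_mem * 1024L *
                                 sort_mem_stretch_multiplier);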

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-01-29 Thread Tom Lane
Alexander Korotkov  writes:
> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera 
> wrote:
>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
>> like that.

> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
> handler;", but I don't insist.

I think that Alvaro's idea is less likely to risk future grammar
conflicts.  We've had enough headaches from CREATE [ UNIQUE ] INDEX
[ CONCURRENTLY ] to make me really wary of variables in the statement-name
part of the syntax.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Sequence Access Method WIP

2016-01-29 Thread Robert Haas
On Fri, Jan 29, 2016 at 5:24 PM, Tom Lane  wrote:
> Alexander Korotkov  writes:
>> On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera 
>> wrote:
>>> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
>>> like that.
>
>> I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
>> handler;", but I don't insist.
>
> I think that Alvaro's idea is less likely to risk future grammar
> conflicts.  We've had enough headaches from CREATE [ UNIQUE ] INDEX
> [ CONCURRENTLY ] to make me really wary of variables in the statement-name
> part of the syntax.

Strong +1.  If we put the type of access method immediately after
CREATE, I'm almost positive we'll regret it for exactly that reason.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Releasing in September

2016-01-29 Thread Jim Nasby

On 1/22/16 12:14 PM, Andres Freund wrote:

On 2016-01-22 08:40:28 -0600, Jim Nasby wrote:

Ideally reviewers shouldn't be doing any testing, because the tests
that are part of the patch should answer every question they would
have, but I don't see that happening until we have a separate
automation-only target that we don't care how long it takes to run.


I think that's completely wrong.

Yes, more tests are good, and we need a place for longer running
tests. But assuming that every patch author will create a testsuite that
covers every angle is just about akin to assuming every submitter will
deliver perfect, bug free code. And we know how well that turns out.

I think actively trying to break a feature, and postgres in general, is
one of the most important tasks of reviewers and testers. And with that
I don't mean trying to run "make check". Look e.g. at the tests Jeff
Janes has performed, what the recent plug tests of Tomas Vondra brought
to light, or at what the full page write checker tool of Heikki's
showed.


IIRC Jeff's tests are scripted and obviously the page write checker is 
as well. I don't recall the exact methodology Tomas was using but I 
suspect it could also be scripted if you had a way to pull the plug via 
software (via a power management unit or maybe kill -9 of a VM). All of 
that is stuff that can and should be automated. Presumably it won't ever 
be part of the Makefile tests, but that's fine. Heck, the test scripts 
could stay in completely separate repos.


Where the code lives isn't the issue; it's getting stuff like this 
automated so humans can go back to doing things that can't be automated.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizer questions

2016-01-29 Thread Alexander Korotkov
On Fri, Jan 8, 2016 at 11:58 AM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> Attached please find improved version of the optimizer patch for LIMIT
> clause.
> Now I try to apply this optimization only for non-trivial columns
> requiring evaluation.
> May be it will be better to take in account cost of this columns
> evaluation but right now I just detect non-variable columns.
>

We may think about a more general feature: LIMIT pushdown. In Konstantin's
patch the planner pushes the LIMIT below the tlist calculation.
But there are other cases where applying the LIMIT first would be beneficial.
For instance, we can apply the LIMIT before JOINs. That is possible only for a
LEFT JOIN whose columns are not used in filter or ordering clauses. See the
example below.

# create table t1 as (select i, random() v from generate_series(1,100)
i);
SELECT 100

# create table t2 as (select i, random() v from generate_series(1,100)
i);
SELECT 100

# explain analyze select * from t1 left join t2 on t1.i = t2.i order by
t1.v limit 10;
  QUERY PLAN

 Limit  (cost=87421.64..87421.67 rows=10 width=24) (actual
time=1486.276..1486.278 rows=10 loops=1)
   ->  Sort  (cost=87421.64..89921.64 rows=100 width=24) (actual
time=1486.275..1486.275 rows=10 loops=1)
 Sort Key: t1.v
 Sort Method: top-N heapsort  Memory: 25kB
 ->  Hash Left Join  (cost=27906.00..65812.00 rows=100
width=24) (actual time=226.180..1366.238 rows=100
   Hash Cond: (t1.i = t2.i)
   ->  Seq Scan on t1  (cost=0.00..15406.00 rows=100
width=12) (actual time=0.010..77.041 rows=100 l
   ->  Hash  (cost=15406.00..15406.00 rows=100 width=12)
(actual time=226.066..226.066 rows=100 loop
 Buckets: 131072  Batches: 1  Memory Usage: 46875kB
 ->  Seq Scan on t2  (cost=0.00..15406.00 rows=100
width=12) (actual time=0.007..89.002 rows=100
 Planning time: 0.123 ms
 Execution time: 1492.118 ms
(12 rows)

# explain analyze select * from (select * from t1 order by t1.v limit 10)
t1 left join t2 on t1.i = t2.i;
  QUERY PLAN

 Hash Right Join  (cost=37015.89..56171.99 rows=10 width=24) (actual
time=198.478..301.278 rows=10 loops=1)
   Hash Cond: (t2.i = t1.i)
   ->  Seq Scan on t2  (cost=0.00..15406.00 rows=100 width=12) (actual
time=0.005..74.303 rows=100 loops=1)
   ->  Hash  (cost=37015.77..37015.77 rows=10 width=12) (actual
time=153.584..153.584 rows=10 loops=1)
 Buckets: 1024  Batches: 1  Memory Usage: 1kB
 ->  Limit  (cost=37015.64..37015.67 rows=10 width=12) (actual
time=153.579..153.580 rows=10 loops=1)
   ->  Sort  (cost=37015.64..39515.64 rows=100 width=12)
(actual time=153.578..153.578 rows=10 loops=1)
 Sort Key: t1.v
 Sort Method: top-N heapsort  Memory: 25kB
 ->  Seq Scan on t1  (cost=0.00..15406.00 rows=100
width=12) (actual time=0.012..78.828 rows=100
 Planning time: 0.132 ms
 Execution time: 301.308 ms
(12 rows)

In this example LIMIT pushdown makes the query 5 times faster. It would be very
nice if the optimizer did this automatically.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Fwd: Core dump with nested CREATE TEMP TABLE

2016-01-29 Thread Noah Misch
On Thu, Jan 28, 2016 at 11:12:15AM -0500, Robert Haas wrote:
> I'm honestly failing to understand why we should change anything at
> all.  I don't believe that doing something more severe than marking
> the portal failed actually improves anything.  I suppose if I had to
> pick between what you proposed before and elog(FATAL) I'd pick the
> latter, but I see no real reason to cut off future code (or
> third-party code) at the knees like that.  I don't see it as either
> necessary or desirable to forbid something just because there's no
> clear present use case for it.

As you say, forbidding things makes friction in the event that someone comes
along wanting to do the forbidden thing.  Forbidding things also simplifies
the system, making it easier to verify.  This decision should depend on the
API's audience.  We have prominently-public APIs like lsyscache.h,
stringinfo.h and funcapi.h.  They should cater to reasonably-foreseeable use
cases, not just what yesterday's users have actually used.  We then have
narrow-interest APIs like subselect.h and view.h.  For those, the value of a
simpler system exceeds the risk-adjusted cost of friction.  They should impose
strict constraints on their callers.

Portal status belongs to the second category.  PortalRun(), PortalRunFetch()
and PersistHoldablePortal() are the three core functions that place portals
into PORTAL_ACTIVE status.  No PGXN module does so; none so much as references
a PortalStatus constant or MarkPortal{Active,Done,Failed}() function.  If
someone adds a fourth core portal runner, the system will be simpler and
better if that function replicates the structure of the existing three.
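
Schematically, that shared structure looks roughly like this (a simplified,
hypothetical fourth runner; the real ones also save and restore the resource
owner, check the portal's status up front, and handle more end states):

#include "postgres.h"
#include "tcop/pquery.h"    /* ActivePortal */
#include "utils/portal.h"   /* Portal, PortalContext, MarkPortal*() */

/* Sketch only; not code from the tree. */
static void
RunMyPortal(Portal portal)
{
    Portal          saveActivePortal = ActivePortal;
    MemoryContext   savePortalContext = PortalContext;

    MarkPortalActive(portal);

    PG_TRY();
    {
        ActivePortal = portal;
        PortalContext = PortalGetHeapMemory(portal);

        /* ... actually run the portal here ... */
    }
    PG_CATCH();
    {
        /* Mark the portal failed and put the globals back, then re-throw. */
        MarkPortalFailed(portal);
        ActivePortal = saveActivePortal;
        PortalContext = savePortalContext;
        PG_RE_THROW();
    }
    PG_END_TRY();

    MarkPortalDone(portal);     /* or put it back to READY, as appropriate */
    ActivePortal = saveActivePortal;
    PortalContext = savePortalContext;
}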

> The code you quote emits a warning
> about a reasonably forseeable situation that can never be right, but
> there's no particular reason to think that MarkPortalFailed is the
> wrong thing to do here if that situation came up.

I have two reasons to expect these MarkPortalFailed() calls will be the wrong
thing for hypothetical code reaching them.  First, PortalRun() and peers reset
ActivePortal and PortalContext on error in addition to fixing portal status
with MarkPortalFailed().  If code forgets to update the status, there's a
substantial chance it forgot the other two.  My patch added a comment
explaining the second reason:

+   /*
+    * See similar code in AtSubAbort_Portals().  This would fire if code
+    * orchestrating multiple top-level transactions within a portal, such
+    * as VACUUM, caught errors and continued under the same portal with a
+    * fresh transaction.  No part of core PostgreSQL functions that way,
+    * though it's a fair thing to want.  Such code would wish the portal
+    * to remain ACTIVE, as in PreCommit_Portals(); we don't cater to
+    * that.  Instead, like AtSubAbort_Portals(), interpret this as a bug.
+    */


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Greg Stark
On 29 Jan 2016 11:58 pm, "Robert Haas"  wrote:
> It
> seems pretty easy to construct cases where this technique regresses,
> and a large percentage of those cases are precisely those where
> replacement selection would have produced a single run, avoiding the
> merge step altogether.

Note that avoiding the merge phase altogether doesn't necessarily represent
any actual advantage.

We don't find out we've avoided the merge phase until the entire run has
been spilled to disk. Then we need to read it back in from disk to serve up
those tuples.

If we have tapes to merge but can do them in a single pass, we do that
lazily and merge as needed when we serve up the tuples. I doubt there's any
speed difference in reading two sequential streams with our buffering over
one, especially in the midst of a query doing other i/o. And the N extra
comparisons cost less than the quicksort advantage.

If we could somehow predict that it'll be a single output run, that would be
a huge advantage. But having to spill all the tuples and then find out
isn't really helpful.


Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Greg Stark
On 30 Jan 2016 8:27 am, "Greg Stark"  wrote:
>
>
> On 29 Jan 2016 11:58 pm, "Robert Haas"  wrote:
> > It
> > seems pretty easy to construct cases where this technique regresses,
> > and a large percentage of those cases are precisely those where
> > replacement selection would have produced a single run, avoiding the
> > merge step altogether.
>
> Now that avoiding the merge phase altogether didn't necessarily represent
any actual advantage.
>
> We don't find out we've avoided the merge phase until the entire run has
been spiked to disk.

Hm, sorry about the phone typos. I thought I proofread it as I went but
obviously not that effectively...


Re: [HACKERS] Using quicksort for every external sort run

2016-01-29 Thread Peter Geoghegan
On Fri, Jan 29, 2016 at 2:58 PM, Robert Haas  wrote:
> I don't quite know what you mean by these numbers.  Add a generic,
> conservative threshold to what?

I meant use "quicksort with spillover" simply because an estimated
90%+ of all tuples have already been consumed. Don't consider the
tuple width, etc.

> Thinking about this some more, I really think we should think hard
> about going back to the strategy which you proposed and discarded in
> your original post: always generate the first run using replacement
> selection, and every subsequent run by quicksorting.  In that post you
> mention powerful advantages of this method: "even without a strong
> logical/physical correlation, the algorithm tends to produce runs that
> are about twice the size of work_mem. (It's also notable that
> replacement selection only produces one run with mostly presorted
> input, even where input far exceeds work_mem, which is a neat trick.)"
>  You went on to dismiss that strategy, saying that "despite these
> upsides, replacement selection is obsolete, and should usually be
> avoided."  But I don't see that you've justified that statement.

Really? Just try it with a heap that is not tiny. Performance tanks.
The fact that replacement selection can produce one long run then
becomes a liability, not a strength. With a work_mem of something like
1GB, it's *extremely* painful.

> It seems pretty easy to construct cases where this technique regresses,
> and a large percentage of those cases are precisely those where
> replacement selection would have produced a single run, avoiding the
> merge step altogether.

...*and* where many passes are otherwise required (otherwise, the
merge is still cheap enough to leave us ahead). Typically with very
small work_mem settings, like 4MB, and far larger data volumes. It's
easy to construct those cases, but that doesn't mean that they
particularly matter. Using 4MB of work_mem to sort 10GB of data is
penny wise and pound foolish. The cases we've seen regressed are
mostly a concern because misconfiguration happens.

A compromise that may be acceptable is to always do a "quicksort with
spillover" when there is a very low work_mem setting and the estimate
of the number of input tuples is less than 10x of what we've seen so
far. Maybe less than 20MB. That will achieve the same thing.
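
Roughly, and reading the 20MB figure as a work_mem threshold (hypothetical
names, just to sketch the heuristic):

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of the proposed compromise; not tuplesort.c code. */
static bool
use_quicksort_with_spillover(int64_t work_mem_bytes,
                             double estimated_input_tuples,
                             double tuples_seen_so_far)
{
    /* Only when work_mem is very low, reading "20MB" as that threshold. */
    if (work_mem_bytes >= (int64_t) 20 * 1024 * 1024)
        return false;

    /* ...and the estimated input is less than 10x what we've seen so far. */
    return estimated_input_tuples < 10.0 * tuples_seen_so_far;
}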

> I'm quite willing to run somewhat more slowly than in other cases to
> be certain of not regressing the case of completely or
> almost-completely ordered input.  Even if that didn't seem like a
> sufficient reason unto itself, I'd be willing to go that way just so
> we don't have to depend on a cost model that might easily go wrong due
> to bad input even if it were theoretically perfect in every other
> respect (which I'm pretty sure is not true here anyway).

The consequences of being wrong either way are not severe (note that
making one long run isn't a goal of the cost model currently).

> I also have another idea that might help squeeze more performance out
> of your approach and avoid regressions.  Suppose that we add a new GUC
> with a name like sort_mem_stretch_multiplier or something like that,
> with a default value of 2.0 or 4.0 or whatever we think is reasonable.
> When we've written enough runs that a polyphase merge will be
> required, or when we're actually performing a polyphase merge, the
> amount of memory we're allowed to use increases by this multiple.  The
> idea is: we hope that people will set work_mem appropriately and
> consequently won't experience polyphase merges at all, but it might happen.
> However, it's almost certain not to happen very frequently.
> Therefore, using extra memory in such cases should be acceptable,
> because while you might have every backend in the system using 1 or
> more copies of work_mem for something if the system is very busy, it
> is extremely unlikely that you will have more than a handful of
> processes doing polyphase merges.

I'm not sure that that's practical. Currently, tuplesort decides on a
number of tapes ahead of time. When we're constrained on those, the
stretch multiplier would apply, but I think that that could be
invasive because the number of tapes ("merge order" + 1) was a
function of non-stretched work_mem.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-01-29 Thread Amit Kapila
On Fri, Jan 29, 2016 at 6:21 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Mon, Jan 4, 2016 at 5:58 PM, Robert Haas  wrote:
>
>> On Mon, Jan 4, 2016 at 1:20 AM, Amit Kapila 
>> wrote:
>> > If we do it that way, then the user of the API needs to maintain the
>> > interlock guarantee that the requested number of locks is the same as
>> > the allocations done from that tranche, and also, if it is not
>> > allocating all the LWLocks in one function, it needs to save the base
>> > address of the array and then allocate from it by maintaining some
>> > counter.  I agree that looking up tranche names multiple times is not
>> > good if there are a large number of lwlocks, but that is done at
>> > startup time and not at operation level.  Also, if we look currently
>> > at the extensions in contrib, just one of them allocates one LWLock in
>> > this fashion; now there could be extensions apart from those in
>> > contrib, but I am not sure if they require a large number of LWLocks,
>> > so I feel it is better to give an API which is more convenient for the
>> > user to use.
>>
>> Well, I agree with you that we should provide the most convenient API
>> possible, but I don't agree that yours is more convenient than the one
>> I proposed.  I think it is less convenient.  In most cases, if the
>> user asks for a large number of LWLocks, they aren't going to be each
>> for a different purpose - they will all be for the same purpose, like
>> one per buffer or one per hash partition.  The case where the user
>> wants to allocate 8 lwlocks from an extension, each for a different
>> purpose, and spread those allocations across a bunch of different
>> functions probably does not exist.  *Maybe* there is somebody
>> allocating lwlocks from an extension for unrelated purposes, but I'd
>> be willing to bet that, if so, all of those allocations happen one
>> right after the other in a single function, because anything else
>> would be completely nuts.
>>
>
> I'm not sure if we should return LWLockPadded*.
>
> +LWLockPadded *
>> +GetLWLockAddinTranche(const char *tranche_name)
>
>
> It would be nice to isolate extensions from knowledge about padding. Could
> we one day change it from LWLockPadded * to LWLockMinimallyPadded * or
> anything else?
>
>
Valid point, but I think there is also an advantage in exposing the padded
structure: if an extension owner wants LWLockPadded for some performance
reason or otherwise (as SlruSharedData currently uses it), it can use this
API as it is.


> +LWLock **
>> +GetLWLockAddinTranche(const char *tranche_name)
>
>
> Could we use this signature?
>
>

I think we can do it this way as well.  There is some discussion
upthread [1] about the signature of the API, where the current API was
thought to be the better one.

We can expose it in several ways: as in the v1 version of the patch, as
discussed and done in the latest patch, or in the way you are suggesting,
and each has its pros and cons.  It seems to me we should leave this point
to the committer's discretion unless there is a clear win for one kind of
API or more people are in favour of one kind of signature over the other.
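
To put the two candidate shapes side by side (a sketch only; only one of
these declarations would exist in practice, and the guard is just so the
snippet stays self-consistent):

#include "storage/lwlock.h"

#ifdef EXPOSE_PADDED_ARRAY
/*
 * Shape in the current patch: return the padded array itself.  Callers
 * index it as tranche[i].lock and can rely on the padding when they want
 * it, as the SLRU code does with LWLockPadded today.
 */
extern LWLockPadded *GetLWLockAddinTranche(const char *tranche_name);
#else
/*
 * Suggested alternative: return an array of pointers, hiding the padding
 * so it could later change (e.g. to LWLockMinimallyPadded) without
 * breaking extensions.
 */
extern LWLock **GetLWLockAddinTranche(const char *tranche_name);
#endif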



[1] -
http://www.postgresql.org/message-id/ca+tgmozfbrjto3rnebprmmccbvysbjkrjndfqgckoaenh3v...@mail.gmail.com


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-01-29 Thread Amit Kapila
On Fri, Jan 29, 2016 at 6:55 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:
>
> Also couple of minor comments from me.
>
> I think this
>
> + StrNCpy(LWLockTrancheRequestArray[LWLockTrancheRequestsCount].tranche_name,
>> tranche_name, strlen(tranche_name) + 1);
>
>
> should be
>
> + StrNCpy(LWLockTrancheRequestArray[LWLockTrancheRequestsCount].tranche_name,
>> tranche_name,
>> sizeof(LWLockTrancheRequestArray[LWLockTrancheRequestsCount].tranche_name));
>
>
>
I think you are right; otherwise it might try to copy more than the
destination can hold.  How about:

StrNCpy(LWLockTrancheRequestArray[LWLockTrancheRequestsCount].tranche_name,
        tranche_name,
        Min(strlen(tranche_name) + 1,
            sizeof(LWLockTrancheRequestArray[LWLockTrancheRequestsCount].tranche_name)));



> And as far as I know english "it's" should be "its" in the sentence below.
>
> + from _PG_init.  Tranche repersents an array of LWLocks
>> and
>> + can be accessed by it's name.  First parameter
>> tranche_name
>
>
>
Right, will change.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-01-29 Thread Amit Kapila
On Fri, Jan 29, 2016 at 6:47 PM, Robert Haas  wrote:
>
> On Fri, Jan 29, 2016 at 6:59 AM, Alexander Korotkov
>  wrote:
> > On Thu, Jan 21, 2016 at 12:37 AM, Alvaro Herrera <
alvhe...@2ndquadrant.com>
> > wrote:
> >> So far as I can tell, there are three patches in flight here:
> >>
> >> * replication slot IO lwlocks
> >> * ability of extensions to request tranches dynamically
> >> * PGPROC
> >>
> >> The first one hasn't been reviewed at all, but the other two have seen
a
> >> bit of discussion and evolution.  Is anyone doing any more reviewing?
> >
> > I'd like to add another one: fixed tranche id for each SLRU.
>
> +1 for this.  The patch looks good and I will commit it if nobody objects.
>

+1. The patch looks good to me as well, but I have one related question:
is there a reason why we should not assign ReplicationOrigins a
fixed tranche id as well, and then perhaps even do away with
LWLockRegisterTranche()?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Template for commit messages

2016-01-29 Thread Alvaro Herrera
Joshua D. Drake wrote:

> I think the best question to ask is:
> 
> "What is the problem we are trying to solve?"

The problem is attracting more patch reviewers, beta testers and bug
reporters.  One of the offers is to credit them (I'm not exactly clear
on which group is meant to benefit from this, but the phrasing used in the
meeting was "contributors to the release") by having a section somewhere
in the release notes with a list of their names.  This proposal is
different from the previous proposal because their names wouldn't appear
next to each feature.

So the problem, of course, is collating that list of names, and the
point of having a commit template is to have a single, complete source
of truth from which to extract the info.

Personally I don't see value in having the commit message follow a
machine-parseable format, as if saying "Backpatch to" instead of
"Backpatch-through:" made your commit message wrong.  I think what was
being proposed is to have committers ensure that the commit messages
always carry the necessary info (which, as far as I know, they do).

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-01-29 Thread Merlin Moncure
On Fri, Jan 29, 2016 at 3:17 PM, Alexander Korotkov
 wrote:
> Hi, Dilip!
>
> On Tue, Jan 19, 2016 at 6:00 PM, Dilip Kumar  wrote:
>>
>>
>> On Tue, Jan 19, 2016 at 5:44 PM, Michael Paquier
>>  wrote:
>>>
>>> > Test3:
>>> > pgbench -i -s 100 postgres
>>> > pgbench -c$ -j$ -Mprepared -S postgres
>>> >
>>> > Client   Base     Patched
>>> >
>>> > 1        20555    19404
>>> > 32       375919   332670
>>> > 64       509067   440680
>>> > 128      431346   415121
>>> > 256      380926   379176
>>>
>>> It seems like you did a copy-paste of the results with s=100 and
>>> s=300. Both are showing the exact same numbers.
>>
>>
>> Oops, my mistake, re-pasting the correct results for s=100
>>
>> pgbench -i -s 100 postgres
>> pgbench -c$ -j$ -Mprepared -S postgres
>>
>> Client   Base     Patched
>>
>> 1        20548    20791
>> 32       372633   355356
>> 64       532052   552148
>> 128      412755   478826
>> 256      346701   372057
>
>
> Could you please re-run these tests a few times?
> Just to be sure it's a reproducible regression with s=300 and not a
> statistical error.

Probably want to run for at least 5 minutes via -T 300

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Generic WAL logical messages

2016-01-29 Thread Alexander Korotkov
Hi, Petr!

On Sat, Jan 23, 2016 at 1:22 AM, Petr Jelinek  wrote:

> here is an updated version of this patch, calling the messages logical
> (decoding) messages consistently everywhere and removing any connection to
> standby messages. Moving this to its own module gave me a place to write
> some brief explanation about this, so the code documentation has hopefully
> improved as well.
>
> The functionality itself didn't change.


I'd like to mention my upcoming patch, which is named generic WAL records:
http://www.postgresql.org/message-id/capphfdsxwzmojm6dx+tjnpyk27kt4o7ri6x_4oswcbyu1rm...@mail.gmail.com
But it has to be a distinct feature from your generic WAL logical messages.
Theoretically, we could have generic messages with arbitrary content that
both have a custom WAL replay function and are decoded by an output plugin.
But a custom WAL replay function would let an extension bug break recovery,
archiving and physical replication, and that doesn't seem acceptable.
This is why we have to develop these as separate features.

Should we think more about naming? Would having two kinds of generic records
confuse people?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Sequence Access Method WIP

2016-01-29 Thread Alexander Korotkov
On Fri, Jan 29, 2016 at 6:36 PM, Alvaro Herrera 
wrote:

> Petr Jelinek wrote:
> > On 29 January 2016 at 14:48, Tom Lane  wrote:
> > > Alvaro Herrera  writes:
>
> > > Uh, what?  Surely we would provide a bespoke command for each possible
> > > sort of handler.  As an example, CREATE INDEX ACCESS METHOD ought to
> check
> > > that the provided function has the right signature, and then it would
> put
> > > the correct amkind into the pg_am entry automatically.
>
> I'm thinking we'd do CREATE ACCESS METHOD foobar TYPE INDEX or something
> like that.
>

Ok! Let's nail down the syntax and I can integrate it into my createam
patch.
I would prefer "CREATE {INDEX | SEQUENCE | ... } ACCESS METHOD name HANDLER
handler;", but I don't insist.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] better systemd integration

2016-01-29 Thread Pavel Stehule
Hi

>
> >
> > You sent only rebased code of previous version. I didn't find additional
> > checks.
>
> Oops.  Here is the actual new code.
>
>
The new test is working as expected.

I did a lot of tests - this code works perfectly in single-server mode and
with a hot-standby slave.

It doesn't work in standby-only mode:

[root@dhcppc1 pavel]# systemctl start pg2.service
Job for pg2.service failed because a timeout was exceeded. See "systemctl
status pg2.service" and "journalctl -xe" for details.

The default timeout on FC is 90 seconds - it might not be enough for large
servers with large shared buffers and a high checkpoint_segments setting.
This should be mentioned in the service file.

Regards

Pavel


Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-01-29 Thread Alexander Korotkov
Hi, Dilip!

On Tue, Jan 19, 2016 at 6:00 PM, Dilip Kumar  wrote:

>
> On Tue, Jan 19, 2016 at 5:44 PM, Michael Paquier <
> michael.paqu...@gmail.com> wrote:
>
>> > Test3:
>> > pgbench -i -s 100 postgres
>> > pgbench -c$ -j$ -Mprepared -S postgres
>> >
>> > Client   Base     Patched
>> >
>> > 1        20555    19404
>> > 32       375919   332670
>> > 64       509067   440680
>> > 128      431346   415121
>> > 256      380926   379176
>>
>> It seems like you did a copy-paste of the results with s=100 and
>> s=300. Both are showing the exact same numbers.
>
>
> Oops, my mistake, re-pasting the correct results for s=100
>
> pgbench -i -s 100 postgres
> pgbench -c$ -j$ -Mprepared -S postgres
>
> Client   Base     Patched
>
> 1        20548    20791
> 32       372633   355356
> 64       532052   552148
> 128      412755   478826
> 256      346701   372057
>

Could you please re-run these tests a few times?
Just to be sure it's a reproducible regression with s=300 and not a
statistical error.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company