Re: [HACKERS] extend pgbench expressions with functions

2016-02-01 Thread Fabien COELHO


Yet another rebase, so as to propagate the same special-case checks into
/ and %.


--
Fabien.

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ade1b53..ebaf4e5 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -796,17 +796,21 @@ pgbench  options  dbname
   Sets variable varname to an integer value calculated
   from expression.
   The expression may contain integer constants such as 5432,
-  references to variables :variablename,
+  double constants such as 3.14159,
+  references to integer variables :variablename,
   and expressions composed of unary (-) or binary operators
-  (+, -, *, /, %)
-  with their usual associativity, and parentheses.
+  (+, -, *, /,
+  %) with their usual associativity, function calls and
+  parentheses.
+   shows the available
+  functions.
  
 
  
   Examples:
 
 \set ntellers 10 * :scale
-\set aid (1021 * :aid) % (10 * :scale) + 1
+\set aid (1021 * random(1, 10 * :scale)) % (10 * :scale) + 1
 
 

@@ -826,66 +830,35 @@ pgbench  options  dbname
  
 
  
-  By default, or when uniform is specified, all values in the
-  range are drawn with equal probability.  Specifying gaussian
-  or  exponential options modifies this behavior; each
-  requires a mandatory parameter which determines the precise shape of the
-  distribution.
- 
+  
+   
+
+ \setrandom n 1 10 or \setrandom n 1 10 uniform
+ is equivalent to \set n random(1, 10) and uses a uniform
+ distribution.
+
+   
 
- 
-  For a Gaussian distribution, the interval is mapped onto a standard
-  normal distribution (the classical bell-shaped Gaussian curve) truncated
-  at -parameter on the left and +parameter
-  on the right.
-  Values in the middle of the interval are more likely to be drawn.
-  To be precise, if PHI(x) is the cumulative distribution
-  function of the standard normal distribution, with mean mu
-  defined as (max + min) / 2.0, with
-
- f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
-(2.0 * PHI(parameter) - 1.0)
-
-  then value i between min and
-  max inclusive is drawn with probability:
-  f(i + 0.5) - f(i - 0.5).
-  Intuitively, the larger parameter, the more
-  frequently values close to the middle of the interval are drawn, and the
-  less frequently values close to the min and
-  max bounds. About 67% of values are drawn from the
-  middle 1.0 / parameter, that is a relative
-  0.5 / parameter around the mean, and 95% in the middle
-  2.0 / parameter, that is a relative
-  1.0 / parameter around the mean; for instance, if
-  parameter is 4.0, 67% of values are drawn from the
-  middle quarter (1.0 / 4.0) of the interval (i.e. from
-  3.0 / 8.0 to 5.0 / 8.0) and 95% from
-  the middle half (2.0 / 4.0) of the interval (second and
-  third quartiles). The minimum parameter is 2.0 for
-  performance of the Box-Muller transform.
- 
+  
+   
+\setrandom n 1 10 exponential 3.0 is equivalent to
+\set n random_exponential(1, 10, 3.0) and uses an
+exponential distribution.
+   
+  
 
- 
-  For an exponential distribution, parameter
-  controls the distribution by truncating a quickly-decreasing
-  exponential distribution at parameter, and then
-  projecting onto integers between the bounds.
-  To be precise, with
-
-f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1.0 - exp(-parameter))
-
-  Then value i between min and
-  max inclusive is drawn with probability:
-  f(x) - f(x + 1).
-  Intuitively, the larger parameter, the more
-  frequently values close to min are accessed, and the
-  less frequently values close to max are accessed.
-  The closer to 0 parameter, the flatter (more uniform)
-  the access distribution.
-  A crude approximation of the distribution is that the most frequent 1%
-  values in the range, close to min, are drawn
-  parameter% of the time.
-  parameter value must be strictly positive.
+  
+   
+\setrandom n 1 10 gaussian 2.0 is equivalent to
+\set n random_gaussian(1, 10, 2.0), and uses a gaussian
+distribution.
+   
+  
+ 
+
+   See the documentation of these functions below for further information
+   about the precise shape of these distributions, depending on the value
+   of the parameter.
  
 
  
@@ -965,18 +938,184 @@ f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1.0 - exp(-parameter))

   
 
+   
+   
+PgBench Functions
+
+ 
+  
+   Function
+   Return Type
+   Description
+   Example
+   Result
+  
+ 
+ 
+  
+   abs(a)
+   same as a
+   integer or double 

Re: [HACKERS] PostgreSQL Audit Extension

2016-02-01 Thread David Steele
On 1/31/16 8:07 AM, Alvaro Herrera wrote:
> David Steele wrote:
>> The attached patch implements audit logging for PostgreSQL as an
>> extension.  I believe I have addressed the concerns that were raised at
>> the end of the 9.5 development cycle.
> 
> This patch got no feedback at all during the commitfest.  I think there
> is some interest on auditing capabilities so it's annoying and
> surprising that this has no movement at all.

Yes, especially since I've had queries at conferences, by email, and
once on this list asking if it would be submitted again.  I would not
have taken the time to prepare it if I hadn't been getting positive
feedback.

> If the lack of activity means lack of interest, please can we all keep
> what we've been doing in this thread, which is to ignore it, so that we
> can just mark this as rejected.

If nobody comments or reviews then that would be the best course of action.

> I'm giving
> this patch some more time by moving it to next commitfest instead.

I was under the mistaken impression that the CF was two months long so I
appreciate getting more time.

Thanks,
-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] PostgreSQL Audit Extension

2016-02-01 Thread Alvaro Herrera
David Steele wrote:
> On 1/31/16 8:07 AM, Alvaro Herrera wrote:

> > I'm giving
> > this patch some more time by moving it to next commitfest instead.
> 
> I was under the mistaken impression that the CF was two months long so I
> appreciate getting a little more time.

Yeah, sorry to disappoint but my intention is to close shop as soon as
possible.  Commitfests are supposed to last one month only, leaving the
month before the start of the next one free so that committers and
reviewers can enjoy life.

Anyway, what would *you* do with any additional time?  It's reviewers
that need to chime in when a patch is in needs-review state.  You
already did your part by submitting.  Anyway I think the tests here are
massive and the code is not; perhaps people get the mistaken impression
that this is a huge amount of code which scares them.  Perhaps you could
split it up in (1) code and (2) tests, which wouldn't achieve any
technical benefit but would offer some psychological comfort to
potential reviewers.  You know it's all psychology in these parts.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] PostgreSQL Audit Extension

2016-02-01 Thread Tom Lane
Alvaro Herrera  writes:
>   Anyway I think the tests here are
> massive and the code is not; perhaps people get the mistaken impression
> that this is a huge amount of code which scares them.  Perhaps you could
> split it up in (1) code and (2) tests, which wouldn't achieve any
> technical benefit but would offer some psychological comfort to
> potential reviewers.  You know it's all psychology in these parts.

Perhaps the tests could be made less bulky.  We do not need massive
permanent regression tests for a single feature, IMO.

regards, tom lane




Re: [HACKERS] pl/pgSQL, get diagnostics and big data

2016-02-01 Thread Andreas 'ads' Scherbaum

On 01.02.2016 21:24, Andreas 'ads' Scherbaum wrote:


Attached patch expands the row_count to 64 bit.


Remembering an issue we had recently with clear text patches, attached 
is a gzipped version as well.



Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project


64bit_3.diff.gz
Description: application/gzip



Re: [HACKERS] WIP: Access method extendability

2016-02-01 Thread Alvaro Herrera
Alexander Korotkov wrote:
> Hi!
> 
> Patches were rebased against master.
> 
> In the attached version of patch access methods get support of pg_dump.

Thanks.  This patch came in just as the commitfest was ending, so I'm
moving it to the next one.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] pl/pgSQL, get diagnostics and big data

2016-02-01 Thread Andreas 'ads' Scherbaum


Hello,

one of our customers approached us and complained that GET DIAGNOSTICS
row_count returns invalid results if the number of rows is > 2^31. It's
a bit complicated to test this case, so I set up a separate instance
with this patch and inserted 2^32+x rows into a table. Internally,
row_count is a signed 32-bit integer, so the resulting number is negative:


diagnostics=# select testfunc_pg((2^31 + 5)::bigint);
 testfunc_pg
-------------
 -2147433648
(1 row)


Going over 2^32 wraps around:

diagnostics=# select testfunc_pg((2^32 + 5)::bigint);
 testfunc_pg
-------------
   5
(1 row)
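
The underlying mechanics are plain 32-bit truncation. A minimal standalone
illustration (this is not the server code, and the values reported above
are offset from the function argument by however many extra rows
testfunc_pg happens to touch):

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

int
main(void)
{
    uint64_t    processed = (UINT64_C(1) << 31) + 5;    /* 2^31 + 5 rows */

    /* row_count currently flows through 32-bit variables ... */
    uint32_t    truncated = (uint32_t) processed;

    /* ... and is exposed as a signed value, hence the negative result;
     * on two's-complement machines this prints -2147483643 */
    printf("%" PRId32 "\n", (int32_t) truncated);

    /* past 2^32 the count simply wraps around to a small number: 5 */
    processed = (UINT64_C(1) << 32) + 5;
    truncated = (uint32_t) processed;
    printf("%" PRIu32 "\n", truncated);

    return 0;
}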



Attached patch expands the row_count to 64 bit.

diagnostics=# select testfunc_pg((2^32 + 5)::bigint);
 testfunc_pg
-------------
  4295017296
(1 row)


I hope, I covered all the places which count the result set.


Regards,

--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
diff -ru postgresql-9.5.0.orig/src/backend/executor/spi.c postgresql-9.5.0/src/backend/executor/spi.c
--- postgresql-9.5.0.orig/src/backend/executor/spi.c	2016-01-04 22:29:34.0 +0100
+++ postgresql-9.5.0/src/backend/executor/spi.c	2016-01-27 01:30:06.099132294 +0100
@@ -36,7 +36,7 @@
 #include "utils/typcache.h"
 
 
-uint32		SPI_processed = 0;
+uint64		SPI_processed = 0;
 Oid			SPI_lastoid = InvalidOid;
 SPITupleTable *SPI_tuptable = NULL;
 int			SPI_result;
@@ -1994,7 +1994,7 @@
   bool read_only, bool fire_triggers, long tcount)
 {
 	int			my_res = 0;
-	uint32		my_processed = 0;
+	uint64		my_processed = 0;
 	Oid			my_lastoid = InvalidOid;
 	SPITupleTable *my_tuptable = NULL;
 	int			res = 0;
@@ -2562,7 +2562,7 @@
 static bool
 _SPI_checktuples(void)
 {
-	uint32		processed = _SPI_current->processed;
+	uint64		processed = _SPI_current->processed;
 	SPITupleTable *tuptable = _SPI_current->tuptable;
 	bool		failed = false;
 
diff -ru postgresql-9.5.0.orig/src/backend/tcop/pquery.c postgresql-9.5.0/src/backend/tcop/pquery.c
--- postgresql-9.5.0.orig/src/backend/tcop/pquery.c	2016-01-04 22:29:34.0 +0100
+++ postgresql-9.5.0/src/backend/tcop/pquery.c	2016-01-30 12:11:56.573841810 +0100
@@ -195,7 +195,7 @@
 		{
 			case CMD_SELECT:
 snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
-		 "SELECT %u", queryDesc->estate->es_processed);
+		 "SELECT %lu", queryDesc->estate->es_processed);
 break;
 			case CMD_INSERT:
 if (queryDesc->estate->es_processed == 1)
@@ -203,15 +203,15 @@
 else
 	lastOid = InvalidOid;
 snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
-   "INSERT %u %u", lastOid, queryDesc->estate->es_processed);
+   "INSERT %u %lu", lastOid, queryDesc->estate->es_processed);
 break;
 			case CMD_UPDATE:
 snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
-		 "UPDATE %u", queryDesc->estate->es_processed);
+		 "UPDATE %lu", queryDesc->estate->es_processed);
 break;
 			case CMD_DELETE:
 snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
-		 "DELETE %u", queryDesc->estate->es_processed);
+		 "DELETE %lu", queryDesc->estate->es_processed);
 break;
 			default:
 strcpy(completionTag, "???");
@@ -892,7 +892,7 @@
 {
 	QueryDesc  *queryDesc;
 	ScanDirection direction;
-	uint32		nprocessed;
+	uint64		nprocessed;
 
 	/*
 	 * NB: queryDesc will be NULL if we are fetching from a held cursor or a
diff -ru postgresql-9.5.0.orig/src/include/executor/spi.h postgresql-9.5.0/src/include/executor/spi.h
--- postgresql-9.5.0.orig/src/include/executor/spi.h	2016-01-04 22:29:34.0 +0100
+++ postgresql-9.5.0/src/include/executor/spi.h	2016-01-27 01:34:46.388245129 +0100
@@ -59,7 +59,7 @@
 #define SPI_OK_UPDATE_RETURNING 13
 #define SPI_OK_REWRITTEN		14
 
-extern PGDLLIMPORT uint32 SPI_processed;
+extern PGDLLIMPORT uint64 SPI_processed;
 extern PGDLLIMPORT Oid SPI_lastoid;
 extern PGDLLIMPORT SPITupleTable *SPI_tuptable;
 extern PGDLLIMPORT int SPI_result;
diff -ru postgresql-9.5.0.orig/src/include/executor/spi_priv.h postgresql-9.5.0/src/include/executor/spi_priv.h
--- postgresql-9.5.0.orig/src/include/executor/spi_priv.h	2016-01-04 22:29:34.0 +0100
+++ postgresql-9.5.0/src/include/executor/spi_priv.h	2016-01-27 01:34:55.220056918 +0100
@@ -21,7 +21,7 @@
 typedef struct
 {
 	/* current results */
-	uint32		processed;		/* by Executor */
+	uint64		processed;		/* by Executor */
 	Oid			lastoid;
 	SPITupleTable *tuptable;	/* tuptable currently being built */
 
diff -ru postgresql-9.5.0.orig/src/include/nodes/execnodes.h postgresql-9.5.0/src/include/nodes/execnodes.h
--- postgresql-9.5.0.orig/src/include/nodes/execnodes.h	2016-01-04 22:29:34.0 +0100
+++ postgresql-9.5.0/src/include/nodes/execnodes.h	2016-01-27 01:32:04.711625720 +0100
@@ -387,7 +387,7 @@
 
 	List	   *es_rowMarks;	/* List of ExecRowMarks */
 
-	uint32		es_processed;	/* # of tuples processed */
+	uint64		es_processed;	/* # of tuples 

Re: [HACKERS] Spurious standby query cancellations

2016-02-01 Thread Alvaro Herrera
Simon Riggs wrote:

> This looks good to me, apart from some WhitespaceCrime.
> 
> Header split applied, will test and apply the main patch this week.

Since the patch already appears to have a committer's attention, it's
okay to move it to the next commitfest; if Simon happens to commit ahead
of time, that would be a great outcome too.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-02-01 Thread Andres Freund
On 2016-02-01 13:06:57 +0300, Alexander Korotkov wrote:
> On Mon, Feb 1, 2016 at 11:34 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
> >> Client   Base     Patch
> >> 1        19744    19382
> >> 8        125923   126395
> >> 32       31393151
> >> 64       387339   496830
> >> 128      306412   350610
> >>
> >> Shared Buffer= 512MB
> >> max_connections=150
> >> Scale Factor=300
> >>
> >> ./pgbench  -j$ -c$ -T300 -M prepared -S postgres
> >>
> >> Client   Base     Patch
> >> 1        17169    16454
> >> 8        108547   105559
> >> 32       241619   262818
> >> 64       206868   233606
> >> 128      137084   217013

So, there's a small regression on low client counts. That's worth
addressing.

> Attached patch is rebased and has better comments.
> Also, there is one comment which survive since original version by Andres.
> 
> /* Add exponential backoff? Should seldomly be contended tho. */
> 
> 
> Andres, did you mean we should double the delay with each unsuccessful try
> to lock?

Spinning on a lock as fast as possible leads to rapid cacheline bouncing
without anybody making progress. See s_lock() in s_lock.c.
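
For illustration, the usual shape of spin-with-exponential-backoff looks
roughly like this (a sketch in portable C11, not PostgreSQL's actual
s_lock() code; the delay bounds are made up):

#include <stdatomic.h>
#include <unistd.h>

static void
spin_lock_with_backoff(atomic_flag *lock)
{
    int         delay_us = 1;

    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
    {
        /* back off instead of hammering the cache line */
        usleep(delay_us);
        if (delay_us < 1000)
            delay_us *= 2;      /* double the delay after each failed try */
    }
}

static void
spin_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}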

Greetings,

Andres Freund




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-02-01 Thread Andres Freund
On 2016-02-01 22:35:06 +0100, Alvaro Herrera wrote:
> I know Andres is already pretty busy with the checkpoint flush patch and
> I very much doubt he will be able to give this patch a lot of attention
> in the short term.  Moving to next CF.

Yea, there's no realistic chance I'll be able to take care of this in
the next couple days.




Re: [HACKERS] checkpointer continuous flushing

2016-02-01 Thread Alvaro Herrera
This patch got its fair share of reviewer attention this commitfest.
Moving to the next one.  Andres, if you want to commit ahead of time
you're of course encouraged to do so.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Removing Functionally Dependent GROUP BY Columns

2016-02-01 Thread Alvaro Herrera
This patch got a good share of review, so it's fair to move it to the
next commitfest.  I marked it as ready-for-committer there which seems
to be the right status.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Add links to commit fests to patch summary page

2016-02-01 Thread Jim Nasby
It would be nice if the patch summary page (i.e., [1]) had links to the
relevant entry in each CF. The specific need I see is that if you look up
a patch in the current CF and it's been moved to the next CF, you have to
manually go to that CF and search for the patch.


[1] https://commitfest.postgresql.org/9/353/
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-01 Thread Robert Haas
On Mon, Feb 1, 2016 at 8:27 AM, Ashutosh Bapat
 wrote:
> Here are patches rebased on recent commit
> cc592c48c58d9c1920f8e2063756dcbcce79e4dd. Renamed original deparseSelectSql
> as deparseSelectSqlForBaseRel and added deparseSelectSqlForJoinRel to
> construct SELECT and FROM clauses for base and join relations.
>
> pg_fdw_core_v5.patch GetUserMappingById changes
> pg_fdw_join_v5.patch: postgres_fdw changes for join pushdown including
> suggestions as described below
> pg_join_pd_v5.patch: combined patch for ease of testing.
>
> The patches also have following changes along with the changes described in
> my last mail.
> 1. Revised the way targetlists are handled. For a bare base relation the
> SELECT clause is deparsed from fpinfo->attrs_used but for a base relation
> which is part of join relation, the expected targetlist is passed down to
> deparseSelectSqlForBaseRel(). This change removed 75 odd lines in
> build_tlist_to_deparse() which were very similar to
> deparseTargetListFromAttrsUsed() in the previous patch.

Nice!

> 2. Refactored postgresGetForeignJoinPaths to be more readable moving the
> code to assess safety of join pushdown into a separate function.

That looks good.  But maybe call the function foreign_join_ok() or
something like that?  is_foreign_join() isn't terrible but it sounds a
little odd to me.

The path-copying stuff in get_path_for_epq_recheck() looks a lot
better now, but you neglected to add a comment explaining why you did
it that way (e.g. "Make a shallow copy of the join path, because the
planner might free the original structure after a future add_path().
We don't need to copy the substructure, though; that won't get freed.")
I would forget about setting merge_path->materialize_inner = false;
that doesn't seem essential.  Also, I would arrange things so that if
you hit an unrecognized path type (like a custom join, or a gather)
you skip that particular path instead of erroring out.  I think this
whole function should be moved to core, and I think the argument
should be a RelOptInfo * rather than a List *.
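
For what it's worth, a shallow copy of that kind is just a node-sized
memcpy; a sketch, with MergePath as the example node type and orig_path
standing in for the path found in the list:

    /*
     * Make a shallow copy of the join path, because the planner might
     * free the original structure after a future add_path().  The
     * substructure (subpaths, clauses) is shared, not copied.
     */
    MergePath  *copy = makeNode(MergePath);

    memcpy(copy, orig_path, sizeof(MergePath));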

+ * We can't know VERBOSE option is specified or not, so always add shcema

We can't know "whether" VERBOSE...
shcema -> schema

+ * the join relaiton is already considered, so that we won't waste time in

Typo.

+ * judging safety of join pushdow and adding the same paths again if found

Typo.

More when I have a bit more time to look at this...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2016-02-01 Thread Alvaro Herrera
Haribabu Kommi wrote:

> Hi, do you have any further comments on the patch that need to be
> taken care of?

I do.  I think the jsonb functions you added should be added to one of
the json .c files instead, since they seem of general applicability.
But actually, I don't think you have ever replied to the question of why
are you using JSON in the first place; isn't a normal array suitable?

The callback stuff is not documented in check_hba() at all.  Can you
please add an explanation just above the function, so that people trying
to use it know what the callback can be used for?  Also a few lines
above the callback itself would be good.  Also, the third argument of
check_hba() is a translatable message so you should enclose it in _() so
that it is picked up for translation.  The "skipped"/"matched" words
(and maybe others?) need to be marked similarly.

That "Failed" in the errmsg in pg_hba_lookup() should be lowercase.

Moving to next CF.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Freeze avoidance of very large table.

2016-02-01 Thread Alvaro Herrera
Masahiko Sawada wrote:

> Attached updated version patch.
> Please review it.

In pg_upgrade, the "page convert" functionality is there to abstract
rewrites of pages being copied; your patch is circumventing it and
AFAICS it makes the interface more complicated for no good reason.  I
think the real way to do that is to write your rewriteVisibilityMap as a
pageConverter routine.  That should reduce some duplication there.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-02-01 Thread Alvaro Herrera
Alexander Korotkov wrote:

> Attached patch is rebased and has better comments.
> Also, there is one comment which survive since original version by Andres.
> 
> /* Add exponential backoff? Should seldomly be contended tho. */
> 
> 
> Andres, did you mean we should double the delay with each unsuccessful try
> to lock?

This is probably a tough patch to review; trying to break it with low
number of shared buffers and high concurrency might be an interesting
exercise.

I know Andres is already pretty busy with the checkpoint flush patch and
I very much doubt he will be able to give this patch a lot of attention
in the short term.  Moving to next CF.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [POC] FETCH limited by bytes.

2016-02-01 Thread Alvaro Herrera
Corey Huinker wrote:

> Enjoy.

Didn't actually look into the patch but looks like there's progress
here and this might be heading for commit.  Moving to next one.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] snapshot too old, configured by time

2016-02-01 Thread Alvaro Herrera
Kevin Grittner wrote:

> > There has been a review but no replies for more than 1 month. Returned
> > with feedback?
> 
> I do intend to post another version of the patch to tweak the
> calculations again, after I can get a patch in to expand the
> testing capabilities to allow an acceptable way to test the patch
> -- so I put it into the next CF instead.

Two months passed since this, and no activity.  I'm closing this as
returned-with-feedback now; you're of course free to resubmit to
2016-03.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] proposal: add 'waiting for replication' to pg_stat_activity.state

2016-02-01 Thread Alvaro Herrera
Thomas Munro wrote:

> Thinking of other patches in flight, I think I'd want the proposed
> N-sync standbys feature to be able to explain in more detail what it's
> waiting for (and likewise my causal reads proposal which does that via
> the process title), though I realise that the details of how/where to
> do that are changing in the "replace pg_stat_activity.waiting" thread
> which this patch is waiting on.

Right, interesting thought.  I think that for the time being we should
close this one as returned with feedback, since there's no point in
keeping it open in commitfest.  I encourage the author (and everyone
else) to look at the other patch and help there, which is going to
ultimately achieve the outcome desired with this patch.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] "using previous checkpoint record at" maybe not the greatest idea?

2016-02-01 Thread Andres Freund
Hi,

Currently, if we can't read a checkpoint record when not in standby mode,
we automatically fall back to the previous checkpoint and start replay
from there.

Doing so without user intervention doesn't actually seem like a good
idea. While not super likely, it's entirely possible that doing so can
wreck a cluster that'd otherwise be easily recoverable. Imagine e.g. a
tablespace being dropped - going back to the previous checkpoint very
well could lead to replay not finishing, as the directory to create
files in doesn't even exist.

As there's, afaics, really no "legitimate" reason for needing to go
back to the previous checkpoint, I don't think we should do so in an
automated fashion.

All the cases where I could find logs containing "using previous
checkpoint record at" were when something else had already gone pretty
badly wrong. Now that obviously doesn't have a very large significance,
because in the situations where it "just worked" are unlikely to be
reported...

Am I missing a reason for doing this by default?

Greetings,

Andres Freund




Re: [HACKERS] Patch: Implement failover on libpq connect level.

2016-02-01 Thread Alvaro Herrera
Victor Wagner wrote:
> On Fri, 22 Jan 2016 16:36:15 -0300
> Alvaro Herrera  wrote:
> 
> > You're editing the expected file for the libpq-regress thingy, but you
> > haven't added any new lines to test the new capability.  I think it'd
> > be good to add some there.  (I already said this earlier in the
> > thread; is there any reason you ignored it the first time?)
> 
> I seriously doubt that this program can be used to test new
> capabilities.
> 
> All it does is call PQconninfoParse and then examine some fields of
> the PGconn structure.

Ah, you're right, I didn't remember that.

> If I add some new uris, then the only thing I can test is that the comma
> is properly copied from the URI to this field. And maybe that some syntax
> errors are properly detected.

Yeah, we should do that.
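
For reference, the comma-copy check boils down to a handful of libpq calls.
A sketch, where the comma-separated multi-host URI is the syntax proposed
by this patch, not something stock libpq accepts:

#include <stdio.h>
#include <libpq-fe.h>

int
main(void)
{
    char       *errmsg = NULL;
    PQconninfoOption *opts;
    PQconninfoOption *o;

    opts = PQconninfoParse("postgresql://host1:5432,host2:5433/db", &errmsg);
    if (opts == NULL)
    {
        fprintf(stderr, "parse failed: %s\n",
                errmsg ? errmsg : "out of memory");
        return 1;
    }

    /* check that the comma-separated host list survived parsing */
    for (o = opts; o->keyword != NULL; o++)
    {
        if (o->val != NULL)
            printf("%s = %s\n", o->keyword, o->val);
    }

    PQconninfoFreeOptions(opts);
    return 0;
}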

> So, I think that the new functionality needs a different approach for
> testing. There should be a test of a real connection against a real
> temporary cluster, probably based on the Perl TAP framework which has been
> actively used in Postgres recently.

Yes, agreed.  So please have a look at that one and share your opinion
about it.  It'd be useful.

Meanwhile I'm moving the patch to the next commitfest.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [PROPOSAL] Client Log Output Filtering

2016-02-01 Thread Alvaro Herrera
David Steele wrote:
> On 1/11/16 6:50 PM, Alvaro Herrera wrote:
> >David Steele wrote:
> >>The patch creates a new counter to separate the log filtering from the
> >>authentication functionality.  This makes it possible to get the same
> >>filtering in other parts of the code (or extensions) without abusing the
> >>ClientAuthInProgress variable or using the log hook.
> >
> >Hmm, okay, but this would need a large blinking comment explaining that
> >the calling code have better set a PG_TRY block when incrementing, so
> >that on errors it resets to zero again.  Or maybe some handling in
> >AbortOutOfAnyTransaction/AbortCurrentTransaction.  or both.
> 
> In the case of pgaudit only the ereport call is wrapped in the
> LimitClientLogOutput() calls which I thought was safe - am I wrong about
> that?

Yeah, errstart itself could fail.  It's not common but it does happen.

> 1) Unless pgaudit is committed there wouldn't be any code calling the
> errhidefromclient() function and that seemed like a bad plan.

Well, you could add a test module to ensure it continues to work.

> 2) There would be two different ways to suppress client messages but I was
> hoping to only have one.

I think they are two different things actually.

I'm closing this as returned with feedback.

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Add links to commit fests to patch summary page

2016-02-01 Thread Alvaro Herrera
Jim Nasby wrote:
> It would be nice if the patch summary page (i.e., [1]) had links to the
> relevant entry in each CF. The specific need I see is that if you look up
> a patch in the current CF and it's been moved to the next CF, you have to
> manually go to that CF and search for the patch.

Agreed, I could use that.  In the "status" row, each commitfest entry
(the "2015-11" text) could be a link to that patch in that commitfest.

(You can actually construct the URL easily just by changing the
commitfest ID, which is the first number in the URL; for example 2016-01
is /8/, so the patch in [1] would be at /8/353/ in that commitfest.)

-- 
Álvaro Herrera        http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] WIP: Access method extendability

2016-02-01 Thread Jim Nasby
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, failed
Spec compliant:   not tested
Documentation:not tested

There are currently no docs or unit tests. I suspect this patch is still WIP?

create-am.5.patch:
General notes:
Needs catversion bump.

IndexQualInfo and GenericCosts have been moved to src/include/utils/selfuncs.h.

METHOD becomes an unreserved keyword.

generic-xlog.5.patch:
generic_xlog.c: At least needs a bunch of explanatory comments, if not info in 
a README. Since I don't really understand what the design here is my review is 
just cursory.

static memoryMove() - seems like an overly-generic function name to me...

writeCopyFlagment(), writeMoveFlagment(): s/Fl/Fr/?

bloom-control.5:
README:
+ CREATE INDEX bloomidx ON tbloom(i1,i2,i3) 
+WITH (length=5, col1=2, col2=2, col3=4);
+ 
+ Here, we create bloom index with signature length 80 bits and attributes
+ i1, i2  mapped to 2 bits, attribute i3 - to 4 bits.

It's not clear to me where 80 bits is coming from...
bloom.h:
+ #define BITBYTE   (8)
ISTR seeing this defined somewhere in the Postgres headers; dunno if it's worth 
using that definition instead.

Testing:
I ran the SELECT INTO from the README, as well as CREATE INDEX bloomidx. I then 
ran

insert into tbloom select
(generate_series(1,1000)*random())::int as i1,
(generate_series(1,1000)*random())::int as i2,
(generate_series(1,1000)*random())::int as i3,
(generate_series(1,1000)*random())::int as i4,
(generate_series(1,1000)*random())::int as i5,
(generate_series(1,1000)*random())::int as i6,
(generate_series(1,1000)*random())::int as i7,
(generate_series(1,1000)*random())::int as i8,
(generate_series(1,1000)*random())::int as i9,
(generate_series(1,1000)*random())::int as i10,
(generate_series(1,1000)*random())::int as i11,
(generate_series(1,1000)*random())::int as i12,
(generate_series(1,1000)*random())::int as i13
 from generate_series(1,999);

and kill -9'd the backend. After restart I did VACUUM VERBOSE without issue. I 
ran the INSERT INTO, kill -9 and VACUUM VERBOSE sequence again. This time I got 
an assert:

TRAP: FailedAssertion("!(((bool) (((const void*)((ItemPointer) left) != 
((void*)0)) && (((ItemPointer) left)->ip_posid != 0", File: "vacuumlazy.c", 
Line: 1823)

That line is

lblk = ItemPointerGetBlockNumber((ItemPointer) left);

which does

AssertMacro(ItemPointerIsValid(pointer)), \

which is the assert that's failing.

I've squirreled this install away for now, in case you can't repro this failure.


Re: [HACKERS] statistics for shared catalogs not updated when autovacuum is off

2016-02-01 Thread Peter Eisentraut
On 2/1/16 9:49 AM, Jim Nasby wrote:
> On 1/30/16 5:05 PM, Peter Eisentraut wrote:
>> When autovacuum is off, the statistics in pg_stat_sys_tables for shared
>> catalogs (e.g., pg_authid, pg_database) never update.  So seq_scan
>> doesn't update when you read the table, last_analyze doesn't update when
>> you run analyze, etc.
>>
>> But when you shut down the server and restart it with autovacuum on, the
> 
> What about with autovacuum still off?

nothing

>> updated statistics magically appear right away.  So seq_scan is updated
>> with the number of reads you did before the shutdown, last_analyze
>> updates with the time of the analyze you did before the shutdown, etc.
>> So the data is saved, just not propagated to the view properly.
>>
>> I can reproduce this back to 9.3, but not 9.2.  Any ideas?
> 
> ISTR that there's some code in the autovac launcher that ensures certain
> stats have been loaded from the file into memory; I'm guessing that the
> functions implementing the shared catalog views need something similar.

That's probably right.  Even with autovacuum on, the statistics for
shared catalogs do not appear as updated right away.  That is, if you
run VACUUM and then look at pg_stat_sys_tables right away, you will see
the stats for shared catalogs to be slightly out of date until the
minutely autovacuum check causes them to update.

So the problem exists in general, but the autovacuum launcher papers
over it every minute.





Re: [HACKERS] PostgreSQL Audit Extension

2016-02-01 Thread David Steele
[I pulled this back in from the other thread that got started.  This
happened because my email to pgsql-hackers bounced but was also sent to
Alvaro, who replied to that message, not the later one that went through
to hackers.]

On 2/1/16 3:59 PM, Alvaro Herrera wrote:
> Anyway, what would *you* do with any additional time? It's reviewers
> that need to chime in when a patch is in needs-review state. You
> already did your part by submitting. Anyway I think the tests
> here are massive and the code is not; perhaps people get the mistaken
> impression that this is a huge amount of code which scares them.
> Perhaps you could split it up in (1) code and (2) tests, which
> wouldn't achieve any technical benefit but would offer some
> psychological comfort to potential reviewers.

Good suggestion, so that's exactly what I've done.  Attached are a set
of patches that apply cleanly against 1d0c3b3.

* 01: Add errhidefromclient() macro for ereport

This is the same patch that I submitted here:

https://commitfest.postgresql.org/9/431/

but since pgaudit requires it I added it to this set for convenience.

* 02: The pgaudit code

And the Makefile, etc.  Everything needed to get it compiled and installed
as an extension.

* 03: Docs and regression tests

To run regression tests this patch will need to be applied.  They have
been separated to make the code patch not look so large and hopefully
encourage reviews.

-- 
-David
da...@pgmasters.net
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 9005b26..c53ef95 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -1091,6 +1091,23 @@ errhidecontext(bool hide_ctx)
return 0;   /* return value does not matter */
 }
 
+/*
+ * errhidefromclient --- optionally suppress output of message to client
+ *
+ * Only log levels below ERROR can be hidden from the client.
+ */
+int
+errhidefromclient(bool hide_from_client)
+{
+   ErrorData  *edata = &errordata[errordata_stack_depth];
+
+   /* we don't bother incrementing recursion_depth */
+   CHECK_STACK_DEPTH();
+
+   edata->hide_from_client = hide_from_client;
+
+   return 0;   /* return value does not matter */
+}
 
 /*
  * errfunction --- add reporting function name to the current error
@@ -1467,7 +1484,7 @@ EmitErrorReport(void)
send_message_to_server_log(edata);
 
/* Send to client, if enabled */
-   if (edata->output_to_client)
+   if (edata->output_to_client && !edata->hide_from_client)
send_message_to_frontend(edata);
 
MemoryContextSwitchTo(oldcontext);
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 326896f..14b87b7 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -179,6 +179,7 @@ extern int  errcontext_msg(const char *fmt,...) pg_attribute_printf(1, 2);
 
 extern int errhidestmt(bool hide_stmt);
 extern int errhidecontext(bool hide_ctx);
+extern int errhidefromclient(bool hide_from_client);
 
 extern int errfunction(const char *funcname);
 extern int errposition(int cursorpos);
@@ -343,6 +344,7 @@ typedef struct ErrorData
boolshow_funcname;  /* true to force funcname inclusion */
boolhide_stmt;  /* true to prevent STATEMENT: inclusion */
boolhide_ctx;   /* true to prevent CONTEXT: inclusion */
+   boolhide_from_client;   /* true to prevent client output */
const char *filename;   /* __FILE__ of ereport() call */
int lineno; /* __LINE__ of ereport() call */
const char *funcname;   /* __func__ of ereport() call */
diff --git a/contrib/Makefile b/contrib/Makefile
index bd251f6..0046610 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -34,6 +34,7 @@ SUBDIRS = \
pg_standby  \
pg_stat_statements \
pg_trgm \
+   pgaudit \
pgcrypto\
pgrowlocks  \
pgstattuple \
diff --git a/contrib/pgaudit/.gitignore b/contrib/pgaudit/.gitignore
new file mode 100644
index 000..e8d2612
--- /dev/null
+++ b/contrib/pgaudit/.gitignore
@@ -0,0 +1,8 @@
+log/
+results/
+tmp_check/
+regression.diffs
+regression.out
+*.o
+*.so
+.vagrant
diff --git a/contrib/pgaudit/LICENSE b/contrib/pgaudit/LICENSE
new file mode 100644
index 000..998f814
--- /dev/null
+++ b/contrib/pgaudit/LICENSE
@@ -0,0 +1,4 @@
+This code is released under the PostgreSQL licence, as given at
+http://www.postgresql.org/about/licence/
+
+Copyright is novated to the PostgreSQL Global Development Group.
diff --git a/contrib/pgaudit/Makefile b/contrib/pgaudit/Makefile
new file mode 100644
index 000..b3a488b
--- /dev/null
+++ b/contrib/pgaudit/Makefile
@@ -0,0 +1,22 @@
+# 

Re: [HACKERS] Freeze avoidance of very large table.

2016-02-01 Thread Masahiko Sawada
On Tue, Feb 2, 2016 at 10:15 AM, Jim Nasby  wrote:
> On 2/1/16 4:59 PM, Alvaro Herrera wrote:
>>
>> Masahiko Sawada wrote:
>>
>>> Attached updated version patch.
>>> Please review it.
>>
>>
>> In pg_upgrade, the "page convert" functionality is there to abstract
>> rewrites of pages being copied; your patch is circumventing it and
>> AFAICS it makes the interface more complicated for no good reason.  I
>> think the real way to do that is to write your rewriteVisibilityMap as a
>> pageConverter routine.  That should reduce some duplication there.
>

Does this mean that users always have to set a pageConverter plug-in when
upgrading? I was thinking that this conversion is required for all users who
want to upgrade to 9.6, so we should support it in core, not as a plug-in.

>
> IIRC this is about the third problem that's been found with pg_upgrade in
> this patch. That concerns me given the potential for disaster if freeze bits
> are set incorrectly.

Yeah, I intend to have a diagnostic tool for the visibility map, to
exactly compare the VM between the old and new clusters after upgrading
the postgres server.
I've implemented such a tool in my github repository[1].
I'm thinking of adding this tool into core (e.g., the pg_upgrade
directory, not a contrib module) as a testing function.

[1] https://github.com/MasahikoSawada/pg_visibilitymap

Regards,

--
Masahiko Sawada




Re: [HACKERS] statistics for shared catalogs not updated when autovacuum is off

2016-02-01 Thread Jim Nasby

On 2/1/16 7:20 PM, Peter Eisentraut wrote:

That's probably right.  Even with autovacuum on, the statistics for
shared catalogs do not appear as updated right away.  That is, if you
run VACUUM and then look at pg_stat_sys_tables right away, you will see
the stats for shared catalogs to be slightly out of date until the
minutely autovacuum check causes them to update.

So the problem exists in general, but the autovacuum launcher papers
over it every minute.


I suspect the issue is in backend_read_statsfile(). Presumably the if 
just needs a call to AutoVacuumingActive() added:



/*
 * Autovacuum launcher wants stats about all databases, but a shallow 
read
 * is sufficient.
 */
if (IsAutoVacuumLauncherProcess())
pgStatDBHash = pgstat_read_statsfiles(InvalidOid, false, false);
else
pgStatDBHash = pgstat_read_statsfiles(MyDatabaseId, false, 
true);


The interesting thing is that we always start the launcher one time, to 
protect against wraparound, but apparently that path doesn't call 
anything that calls backend_read_statsfile() (which is static).

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Raising the checkpoint_timeout limit

2016-02-01 Thread Euler Taveira
On 01-02-2016 21:13, Andres Freund wrote:
> is there any reason for the rather arbitrary and low checkpoint_timeout
> limit?
> 
AFAICS the only reason is to make recovery run quickly. This setting has
had the same value since day 1.

> A high timeout has the advantage that the total amount of full page
> writes reduces and, especially if the whole system fits into s_b, that
> the total amount of writes to disk is drastically reduced.
> 
This statement could be added to the documentation. Given this use case, I
want to propose raising the c_t upper limit to one day or even a week.

> I'm not sure what'd actually be a good upper limit. I'd be inclined to
> even go to as high as a week or so. A lot of our settings have
> upper/lower limits that aren't a good idea in general.
> 
A week is an insane default value. However, I'm fine with 10 to 20
minutes (those are the most common values I use for c_t).


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




[HACKERS] Ununsed member in printQueryOpt

2016-02-01 Thread Dickson S. Guedes
I found the following code in src/bin/psql/print.h:155 (master 1d0c3b3f8a)


boolquote;  /* quote all values as much as possible */


That code has been there since:

commit a45195a191eec367a4f305bb71ab541d17a3b9f9
Author: Bruce Momjian 
Date:   Thu Nov 4 21:56:02 1999 +

Major psql overhaul by Peter Eisentraut.



But I didn't find any other references to that "quote", and after
removing that line the code still compiles without any errors or
warnings.

Did I overlook something?

Best regards,
-- 
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://github.com/guedes - http://guedesoft.net
http://www.postgresql.org.br




Re: [HACKERS] PostgreSQL Audit Extension

2016-02-01 Thread David Steele
On 2/1/16 4:19 PM, Tom Lane wrote:
> Alvaro Herrera  writes:
>>   Anyway I think the tests here are
>> massive and the code is not; perhaps people get the mistaken impression
>> that this is a huge amount of code which scares them.  Perhaps you could
>> split it up in (1) code and (2) tests, which wouldn't achieve any
>> technical benefit but would offer some psychological comfort to
>> potential reviewers.  You know it's all psychology in these parts.
> 
> Perhaps the tests could be made less bulky.  We do not need massive
> permanent regression tests for a single feature, IMO.

I'd certainly like to, but pgaudit uses a lot of different techniques to
log various commands and there are a number of GUCs.  Each test provides
coverage for a different code path.

I'm sure they could be reorganized and tightened up a bit, but I don't think
by a whole lot.

-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


[HACKERS] Raising the checkpoint_timeout limit

2016-02-01 Thread Andres Freund
Hi,

is there any reason for the rather arbitrary and low checkpoint_timeout
limit? Obviously it's not appropriate to have a 10h timeout on all
that many systems. But if the reaction to a server crashing is going to
be a failover to one of several standbys anyway, setting the timeout to
a high value can make sense.

A high timeout has the advantage that the total amount of full page
writes reduces and, especially if the whole system fits into s_b, that
the total amount of writes to disk is drastically reduced.

I'm not sure what'd actually be a good upper limit. I'd be inclined to
even go to as high as a week or so. A lot of our settings have
upper/lower limits that aren't a good idea in general.

I'm also wondering if it'd not make sense to raise the default timeout
to 15min or so. The upper ceiling for that really is recovery time, and
that has really shrunk rather drastically due to faster cpus and
architectural improvements in postgres (bgwriter, separate
checkpointer/bgwriter, restartpoints, ...).

Greetings,

Andres Freund




Re: [HACKERS] [PROPOSAL] Client Log Output Filtering

2016-02-01 Thread David Steele
On 2/1/16 5:25 PM, Alvaro Herrera wrote:
> David Steele wrote:

>> 2) There would be two different ways to suppress client messages but I was
>> hoping to only have one.
> 
> I think they are two different things actually.

Fair enough - that was my initial reaction as well but then I thought
the other way would be better.

> I'm closing this as returned with feedback.

I have attached a patch that adds an ereport() macro to suppress client
output for a single report call (applies cleanly on 1d0c3b3).  I'll also
move it to the next CF.
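
As a usage sketch, the call pattern mirrors errhidestmt(); the audit
message text here is made up:

    /* write an audit line to the server log without echoing it to the
     * client; errhidefromclient() is the new call from the patch below */
    ereport(LOG,
            (errmsg("AUDIT: SESSION,1,1,READ,SELECT,,,<not logged>"),
             errhidestmt(true),
             errhidefromclient(true)));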

Thanks!
-- 
-David
da...@pgmasters.net
diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c
index 9005b26..c53ef95 100644
--- a/src/backend/utils/error/elog.c
+++ b/src/backend/utils/error/elog.c
@@ -1091,6 +1091,23 @@ errhidecontext(bool hide_ctx)
return 0;   /* return value does not matter */
 }
 
+/*
+ * errhidefromclient --- optionally suppress output of message to client
+ *
+ * Only log levels below ERROR can be hidden from the client.
+ */
+int
+errhidefromclient(bool hide_from_client)
+{
+   ErrorData  *edata = &errordata[errordata_stack_depth];
+
+   /* we don't bother incrementing recursion_depth */
+   CHECK_STACK_DEPTH();
+
+   edata->hide_from_client = hide_from_client;
+
+   return 0;   /* return value does not matter */
+}
 
 /*
  * errfunction --- add reporting function name to the current error
@@ -1467,7 +1484,7 @@ EmitErrorReport(void)
send_message_to_server_log(edata);
 
/* Send to client, if enabled */
-   if (edata->output_to_client)
+   if (edata->output_to_client && !edata->hide_from_client)
send_message_to_frontend(edata);
 
MemoryContextSwitchTo(oldcontext);
diff --git a/src/include/utils/elog.h b/src/include/utils/elog.h
index 326896f..14b87b7 100644
--- a/src/include/utils/elog.h
+++ b/src/include/utils/elog.h
@@ -179,6 +179,7 @@ extern int  errcontext_msg(const char *fmt,...) pg_attribute_printf(1, 2);
 
 extern int errhidestmt(bool hide_stmt);
 extern int errhidecontext(bool hide_ctx);
+extern int errhidefromclient(bool hide_from_client);
 
 extern int errfunction(const char *funcname);
 extern int errposition(int cursorpos);
@@ -343,6 +344,7 @@ typedef struct ErrorData
boolshow_funcname;  /* true to force funcname inclusion */
boolhide_stmt;  /* true to prevent STATEMENT: inclusion */
boolhide_ctx;   /* true to prevent CONTEXT: inclusion */
+   boolhide_from_client;   /* true to prevent client output */
const char *filename;   /* __FILE__ of ereport() call */
int lineno; /* __LINE__ of ereport() call */
const char *funcname;   /* __func__ of ereport() call */


signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Trigger.sgml

2016-02-01 Thread Tatsuo Ishii
> On 1/28/16 8:02 PM, Tatsuo Ishii wrote:
>> I am working as a volunteer to translate docs to Japanese. I have been
>> having hard time to parse the following sentence in
>> doc/src/sgml/trigger.sgml.
>>
>> 
>> The possibility of surprising outcomes should be considered when there
>> are both BEFORE INSERT and
>> BEFORE UPDATE row-level triggers that
>> both affect a row being inserted/updated (this can still be
>> problematic if the modifications are more or less equivalent if
>> they're not also idempotent).
>> 
>>
>> Especially I don't understand this part:
>>
>>(this can still be problematic if the modifications are more or less
>>equivalent if they're not also idempotent).
>>
>> It would be great if someone could enligntend me.
> 
> I believe the idea here is that thanks to UPSERT you can now get very
> strange behavior if you have BEFORE triggers that aren't
> idempotent. IE:
> 
> CREATE TABLE test(
>   a int PRIMARY KEY
> );
> 
> BEFORE INSERT a = a - 1
> BEFORE UPDATE a = a + 1
> 
> INSERT (1) -- Results in 0
> INSERT (2) -- Results in 1
> 
> Now if you try to UPSERT (1), the before insert will give you a=0,
> which conflicts. So then you end up with an UPDATE, which gives you
> a=1 again. Things are even worse when you try to UPSERT (2), because
> the insert conflicts but then you try to update a row that doesn't
> exist (a=2).
> 
> Obviously this is a ridiculous example, but hopefully it shows the
> problem.

Thank you for the info!

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




[HACKERS] checkpoints after database start/immediate checkpoints

2016-02-01 Thread Andres Freund
Hi,

Right now it takes checkpoint_timeout till we start a checkpoint, and
checkpoint_timeout + checkpoint_timeout * checkpoint_completion_target
till we complete the first checkpoint after shutdown/forced checkpoints.

That means a) that such a checkpoint will often be bigger/more heavyweight
than the following ones, not what you want after a restart/create
database/basebackup... b) reliable benchmarks measuring steady state
have to run for even longer.

There's some logic to that behaviour though: With a target of 0 it'd be
weird to start a checkpoint directly after startup, and even otherwise
there'll be less to write at the beginning.

As an example:
2016-02-02 01:32:58.053 CET 21361 LOG:  checkpoint complete: wrote 186041 
buffers (71.0%); 1 transaction log file(s) added, 0 removed, 0 recycled; 
sort=0.071 s, write=9.504 s, sync=0.000 s, total=9.532 s; sync files=0, 
longest=0.000 s, average=0.000 s; distance=1460260 kB, estimate=1460260 kB
2016-02-02 01:32:58.053 CET 21361 LOG:  checkpoint starting: time
2016-02-02 01:33:08.061 CET 21361 LOG:  checkpoint complete: wrote 127249 
buffers (48.5%); 0 transaction log file(s) added, 0 removed, 89 recycled; 
sort=0.045 s, write=9.987 s, sync=0.000 s, total=10.007 s; sync files=0, 
longest=0.000 s, average=0.000 s; distance=1187558 kB, estimate=1432989 kB
2016-02-02 01:33:58.216 CET 21361 LOG:  checkpoint complete: wrote 124649 
buffers (47.5%); 0 transaction log file(s) added, 0 removed, 69 recycled; 
sort=0.048 s, write=10.160 s, sync=0.000 s, total=10.176 s; sync files=0, 
longest=0.000 s, average=0.000 s; distance=1135086 kB, estimate=1337776 kB
2016-02-02 01:33:58.216 CET 21361 LOG:  checkpoint starting: time
2016-02-02 01:34:08.060 CET 21361 LOG:  checkpoint complete: wrote 123298 
buffers (47.0%); 0 transaction log file(s) added, 0 removed, 69 recycled; 
sort=0.049 s, write=9.838 s, sync=0.000 s, total=9.843 s; sync files=0, 
longest=0.000 s, average=0.000 s; distance=1184077 kB, estimate=1322406 kB
2016-02-02 01:34:08.060 CET 21361 LOG:  checkpoint starting: time
2016-02-02 01:34:18.052 CET 21361 LOG:  checkpoint complete: wrote 118165 
buffers (45.1%); 0 transaction log file(s) added, 0 removed, 72 recycled; 
sort=0.046 s, write=9.987 s, sync=0.000 s, total=9.992 s; sync files=0, 
longest=0.000 s, average=0.000 s; distance=1198018 kB, estimate=1309967 kB
2016-02-02 01:34:18.053 CET 21361 LOG:  checkpoint starting: time
2016-02-02 01:34:28.089 CET 21361 LOG:  checkpoint complete: wrote 120814 
buffers (46.1%); 0 transaction log file(s) added, 0 removed, 73 recycled; 
sort=0.051 s, write=10.020 s, sync=0.000 s, total=10.036 s; sync files=0, 
longest=0.000 s, average=0.000 s; distance=1203691 kB, estimate=1299339 kB
2016-02-02 01:34:28.090 CET 21361 LOG:  checkpoint starting: time
2016-02-02 01:34:39.182 CET 21361 LOG:  checkpoint complete: wrote 110411 
buffers (42.1%); 0 transaction log file(s) added, 0 removed, 74 recycled; 
sort=0.047 s, write=9.908 s, sync=0.000 s, total=11.092 s; sync files=0, 
longest=0.000 s, average=0.000 s; distance=1073612 kB, estimate=1276767 kB

We wrote roughly 1/3 more wal/buffers during the first checkpoint than
the following ones (and yes, the above is with an impossibly low timeout,
but that doesn't change anything). You can make that much more extreme
with larger shared buffers and larger timeouts.


I wonder if this essentially points at checkpoint_timeout being wrongly
defined: Currently it means we'll try to finish a checkpoint
(1-checkpoint_completion_target) * timeout before the next one - but
perhaps it should instead be that we start checkpoint_timeout * _target
before the next timeout? Afaics that'd work more gracefully in the face of
restarts and forced checkpoints.
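
To make the scheduling arithmetic concrete (back-of-the-envelope only,
with made-up GUC values; this is not code from the tree):

#include <stdio.h>

int
main(void)
{
	double	timeout = 300.0;	/* checkpoint_timeout, in seconds */
	double	target = 0.5;		/* checkpoint_completion_target */

	/* current definition: the first checkpoint starts at t = timeout
	 * and spreads its writes over target * timeout seconds */
	printf("current:  start at %.0fs, done by %.0fs\n",
		   timeout, timeout + target * timeout);

	/* alternative: start checkpoint_timeout * target before the next
	 * timeout, so the first checkpoint after startup begins earlier */
	printf("proposed: start at %.0fs, done by %.0fs\n",
		   timeout - target * timeout, timeout);
	return 0;
}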

Greetings,

Andres Freund




Re: [HACKERS] "using previous checkpoint record at" maybe not the greatest idea?

2016-02-01 Thread Andres Freund
On 2016-02-01 17:29:39 -0700, David G. Johnston wrote:
> Learning by reading here...
> 
> http://www.postgresql.org/docs/current/static/wal-internals.html
> """
> ​After a checkpoint has been made and the log flushed, the checkpoint's
> position is saved in the file pg_control. Therefore, at the start of
> recovery, the server first reads pg_control and then the checkpoint record;
> then it performs the REDO operation by scanning forward from the log
> position indicated in the checkpoint record. Because the entire content of
> data pages is saved in the log on the first page modification after a
> checkpoint (assuming full_page_writes is not disabled), all pages changed
> since the checkpoint will be restored to a consistent state.

> The above comment appears out-of-date if this post describes what
> presently happens.

Where do you see a conflict with what I wrote about? We store both the
last and the previous checkpoint's location in pg_control. Or are you
talking about:

> To deal with the case where pg_control is corrupt, we should support the
> possibility of scanning existing log segments in reverse order — newest to
> oldest — in order to find the latest checkpoint. This has not been
> implemented yet. pg_control is small enough (less than one disk page) that
> it is not subject to partial-write problems, and as of this writing there
> have been no reports of database failures due solely to the inability to
> read pg_control itself. So while it is theoretically a weak spot,
> pg_control does not seem to be a problem in practice.

if so, no, that's not out-of-date, as we simply store two checkpoint
locations:
 $ pg_controldata /srv/dev/pgdev-dev/|grep 'checkpoint location'
Latest checkpoint location:   B3/2A730028
Prior checkpoint location:    B3/2A72FFA0

> Also, I was under the impression that tablespace commands resulted in
> checkpoints so that the state of the file system could be presumed
> current...

That actually doesn't really make it any better - it forces the *latest*
checkpoint, but if we can't read that, we'll start with the previous
one...

> I don't know enough internals but it seems like we'd need to distinguish
> between an interrupted checkpoint (pull the plug during checkpoint) and one
> that supposedly completed without interruption but then was somehow
> corrupted (solar flares).  The former seems legitimate for auto-skip while
> the latter does not.

I don't think such a distinction is really possible (or necessary). If
pg_control is corrupted we won't even start, and if WAL is corrupted
that badly we won't finish replay...
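
(For context, a rough sketch of the kind of check that makes a corrupt
pg_control fatal at startup - modeled loosely on ReadControlFile(), not
quoted from it:)

/* recompute the CRC over everything before the stored crc field */
pg_crc32c	crc;

INIT_CRC32C(crc);
COMP_CRC32C(crc, (char *) ControlFile, offsetof(ControlFileData, crc));
FIN_CRC32C(crc);

if (!EQ_CRC32C(crc, ControlFile->crc))
	ereport(FATAL,
			(errmsg("incorrect checksum in control file")));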

Greetings,

Andres Freund




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-02-01 Thread Michael Paquier
On Tue, Feb 2, 2016 at 1:08 AM, Andres Freund  wrote:
> On 2016-02-01 16:49:46 +0100, Alvaro Herrera wrote:
>> Yeah.  On 9.4 there are already some conflicts, and I'm sure there will
>> be more in almost each branch.  Does anyone want to volunteer for
>> producing per-branch versions?
>
>> The next minor release is to be tagged next week and it'd be good to put
>> this fix there.
>
> I don't think this is going to be ready for that. The risk of hurrying
> this through seems higher than the loss risk at this point.

Agreed. And there is no actual risk of data loss, so let's not hurry
and be sure that the right thing is pushed.
-- 
Michael




Re: [HACKERS] Raising the checkpoint_timeout limit

2016-02-01 Thread Jim Nasby

On 2/1/16 6:13 PM, Andres Freund wrote:
> I'm not sure what'd actually be a good upper limit. I'd be inclined to
> even go to as high as a week or so. A lot of our settings have
> upper/lower limits that aren't a good idea in general.


The only reason I can see for the 1 hour limit is to try and prevent
footguns. I think that's a valid goal, but there should be a way to
override it. And if we don't want that kind of protection then I'd say
just yank the upper limit.



> I'm also wondering if it'd not make sense to raise the default timeout
> to 15min or so. The upper ceiling for that really is recovery time, and
> that has really shrunk rather drastically due to faster cpus and
> architectural improvements in postgres (bgwriter, separate
> checkpointer/bgwriter, restartpoints, ...).


It would be interesting if someone had a large-ish 9.4 or 9.5 install
that they could test recovery timing on. My suspicion is that as long as
FPWs are on, you'd generally end up limited by how fast you could
read WAL unless you exceeded the FS cache. (I'm assuming a BBU and that
the FS and controller will do a nice job of ordering writes optimally so
that you'll get performance similar to reads when it's time to fsync.)

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Add links to commit fests to patch summary page

2016-02-01 Thread Jim Nasby

On 2/1/16 6:15 PM, Alvaro Herrera wrote:
> Jim Nasby wrote:
>
>> It would be nice if the patch summary page (ie, [1]) had links to the
>> relevant entry in that CF. The specific need I see is if you look up a
>> patch in the current CF and it's been moved to the next CF, you have to
>> manually go to that CF and search for the patch.
>
> Agreed, I could use that.  In the "status" row, each commitfest entry
> (the "2015-11" text) could be a link to that patch in that commitfest.


Yeah, what I was thinking.


> (You can actually construct the URL easily just by changing the
> commitfest ID, which is the first number in the URL; for example 2016-01
> is /8/).


*waits for someone to comment on how surrogate keys are bad*

;P
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-02-01 Thread Daniel Verite
Pavel Stehule wrote:

> 1. maybe we can shorten the name to "crossview"? I am happy with
> crosstabview too, just crossview is correct too, and shorter

I'm in favor of keeping crosstabview. It's more explicit, only 
3 characters longer and we have tab completion anyway.

> 2. Columns used for ordering should not be displayed by default. I can live
> with the current behavior, but hiding ordering columns is much more
> practical for me

I can see why, but I'm concerned about a consequence:
say we have 4 columns A,B,C,D and the user does \crosstabview +A:B +C:D.
If B and D are excluded by default, then there's nothing
left to display inside the grid.
It doesn't feel quite right. There's something counter-intuitive
in the fact that values in the grid would disappear depending on
whether and how headers are sorted.
With the 3rd argument, we let the user decide what they want
to see.

> 3. This code is longer, so some regression tests are recommended - attached
> is a simple test case

I've added a few regression tests to the psql testsuite
based on your sample data. New patch with these tests
included is attached, make check passes.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 6d0cb3d..a242ec4 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -990,6 +990,123 @@ testdb=
   
 
   
+\crosstabview [
+[-|+]colV
+[:scolV]
+[-|+]colH
+[:scolH]
+[colG1[,colG2...]]
+] 
+
+
+Executes the current query buffer (like \g) and shows
+the results inside a crosstab grid.
+The output column colV
+becomes a vertical header, optionally sorted by scolV,
+and the output column colH
+becomes a horizontal header, optionally sorted by
+scolH.
+
+colG1[,colG2...]
+is the list of output columns to project into the grid.
+By default, all output columns of the query except 
+colV and
+colH
+are included in this list.
+
+
+
+All columns can be referred to by their position (starting at 1), or by
+their name. Normal case folding and quoting rules apply on column
+names. By default,
+colV corresponds to column 1
+and colH to column 2.
+A query having only one output column cannot be viewed in crosstab, and
+colH must differ from
+colV.
+
+
+
+The vertical header, displayed as the leftmost column,
+contains the set of all distinct values found in
+column colV, in the order
+of their first appearance in the query results,
+or in ascending order if a plus (+) sign precedes
+colV,
+or in descending order if it's a minus (-) sign.
+
+
+The horizontal header, displayed as the first row,
+contains the set of all distinct non-null values found in
+column colH.  They come
+by default in their order of appearance in the query results,
+or in ascending order if a plus (+) sign precedes
+colH,
+or in descending order if it's a minus (-) sign.
+
+Also, they can optionally be sorted by another column, if
+colV
+(respectively colH) is
+immediately followed by a colon and a reference to another column
+scolV
+(respectively scolH).
+
+
+
+Inside the crosstab grid,
+given a query output with N columns
+(including colV and
+colH),
+for each distinct value x of
+colH
+and each distinct value y of
+colV,
+the contents of a cell located at the intersection
+(x,y) is determined by these rules:
+
+
+
+ if there is no corresponding row in the query results such that the
+ value for colH
+ is x and the value
+ for colV
+ is y, the cell is empty.
+
+
+
+
+
+ if there is exactly one row such that the value
+ for colH
+ is x and the value
+ for colV
+ is y, then the N-2 other
+ columns or the columns listed in
+ colG1[,colG2...]
+ are displayed in the cell, separated from each other by
+ a space character if needed.
+
+ If N=2, the letter X is displayed
+ in the cell as if a virtual third column contained that character.
+
+
+
+
+
+ if there are several corresponding rows, the behavior is identical to
+ the case of one row except that the values coming from different rows
+ are stacked vertically, the different source rows being separated by
+ newline characters inside the cell.
+

Re: [HACKERS] Several problems in tab-completions for SET/RESET

2016-02-01 Thread Michael Paquier
On Mon, Feb 1, 2016 at 10:22 PM, Fujii Masao  wrote:

> Pushed. Thanks!
>

OK. And attached is the promised patch for ALTER FUNCTION.
-- 
Michael
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5f27120..3369a3d 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1370,7 +1370,7 @@ psql_completion(const char *text, int start, int end)
 	else if (Matches3("ALTER", "AGGREGATE|FUNCTION", MatchAny))
 		COMPLETE_WITH_CONST("(");
 	/* ALTER AGGREGATE,FUNCTION  (...) */
-	else if (Matches4("ALTER", "AGGREGATE|FUNCTION", MatchAny, MatchAny))
+	else if (Matches4("ALTER", "AGGREGATE", MatchAny, MatchAny))
 	{
 		if (ends_with(prev_wd, ')'))
 			COMPLETE_WITH_LIST3("OWNER TO", "RENAME TO", "SET SCHEMA");
@@ -1378,6 +1378,18 @@ psql_completion(const char *text, int start, int end)
 			COMPLETE_WITH_FUNCTION_ARG(prev2_wd);
 	}
 
+	/* ALTER FUNCTION  (...) */
+	else if (Matches4("ALTER", "FUNCTION", MatchAny, MatchAny))
+	{
+		if (ends_with(prev_wd, ')'))
+			COMPLETE_WITH_LIST4("OWNER TO", "RENAME TO", "SET", "RESET");
+		else
+			COMPLETE_WITH_FUNCTION_ARG(prev2_wd);
+	}
+	/* ALTER FUNCTION  (...) SET */
+	else if (Matches5("ALTER", "FUNCTION", MatchAny, MatchAny, "SET"))
+		COMPLETE_WITH_QUERY(Query_for_list_of_set_vars "UNION SELECT 'SCHEMA'");
+
 	/* ALTER SCHEMA  */
 	else if (Matches3("ALTER", "SCHEMA", MatchAny))
 		COMPLETE_WITH_LIST2("OWNER TO", "RENAME TO");
@@ -2705,7 +2717,9 @@ psql_completion(const char *text, int start, int end)
 
 /* SET, RESET, SHOW */
 	/* Complete with a variable name */
-	else if (TailMatches1("SET|RESET") && !TailMatches3("UPDATE", MatchAny, "SET"))
+	else if (TailMatches1("SET|RESET") &&
+			 !TailMatches3("UPDATE", MatchAny, "SET") &&
+			 !TailMatches2("ALTER", "FUNCTION"))
 		COMPLETE_WITH_QUERY(Query_for_list_of_set_vars);
 	else if (Matches1("SHOW"))
 		COMPLETE_WITH_QUERY(Query_for_list_of_show_vars);



Re: [HACKERS] "using previous checkpoint record at" maybe not the greatest idea?

2016-02-01 Thread David G. Johnston
On Mon, Feb 1, 2016 at 4:58 PM, Andres Freund  wrote:

> Hi,
>
> currently if, when not in standby mode, we can't read a checkpoint
> record, we automatically fall back to the previous checkpoint, and start
> replay from there.
>
> Doing so without user intervention doesn't actually seem like a good
> idea. While not super likely, it's entirely possible that doing so can
> wreck a cluster that'd otherwise be easily recoverable. Imagine e.g. a
> tablespace being dropped - going back to the previous checkpoint very
> well could lead to replay not finishing, as the directory to create
> files in doesn't even exist.


> As there's, afaics, really no "legitimate" reason for needing to go
> back to the previous checkpoint I don't think we should do so in an
> automated fashion.
>
> All the cases where I could find logs containing "using previous
> checkpoint record at" were when something else had already gone pretty
> badly wrong. Now that obviously doesn't have a very large significance,
> because in the situations where it "just worked" are unlikely to be
> reported...
>
> Am I missing a reason for doing this by default?
>

Learning by reading here...

http://www.postgresql.org/docs/current/static/wal-internals.html
"""
After a checkpoint has been made and the log flushed, the checkpoint's
position is saved in the file pg_control. Therefore, at the start of
recovery, the server first reads pg_control and then the checkpoint record;
then it performs the REDO operation by scanning forward from the log
position indicated in the checkpoint record. Because the entire content of
data pages is saved in the log on the first page modification after a
checkpoint (assuming full_page_writes is not disabled), all pages changed
since the checkpoint will be restored to a consistent state.

To deal with the case where pg_control is corrupt, we should support the
possibility of scanning existing log segments in reverse order — newest to
oldest — in order to find the latest checkpoint. This has not been
implemented yet. pg_control is small enough (less than one disk page) that
it is not subject to partial-write problems, and as of this writing there
have been no reports of database failures due solely to the inability to
read pg_control itself. So while it is theoretically a weak spot,
pg_control does not seem to be a problem in practice.
"""

The above comment appears out-of-date if this post describes what
presently happens.

Also, I was under the impression that tablespace commands resulted in
checkpoints so that the state of the file system could be presumed
current...

I don't know enough internals but it seems like we'd need to distinguish
between an interrupted checkpoint (pull the plug during checkpoint) and one
that supposedly completed without interruption but then was somehow
corrupted (solar flares).  The former seems legitimate for auto-skip while
the latter does not.

David J.


Re: [HACKERS] silent data loss with ext4 / all current versions

2016-02-01 Thread Andres Freund
On 2016-02-02 09:56:40 +0900, Michael Paquier wrote:
> And there is no actual risk of data loss

Huh?

- Andres




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-02-01 Thread Michael Paquier
On Tue, Feb 2, 2016 at 12:49 AM, Alvaro Herrera
 wrote:
> Michael Paquier wrote:
>> On Mon, Jan 25, 2016 at 6:50 PM, Tomas Vondra
>>  wrote:
>> > Seems OK to me. Thanks for the time and improvements!
>>
>> Thanks. Perhaps a committer could have a look then? I have switched
>> the patch as such in the CF app. Seeing the accumulated feedback
>> upthread that's something that should be backpatched.
>
> Yeah.  On 9.4 there are already some conflicts, and I'm sure there will
> be more in almost each branch.  Does anyone want to volunteer for
> producing per-branch versions?

I don't mind doing it once we have something fully-bloomed for master,
something that I guess is very likely to apply easily to 9.5 as well.
-- 
Michael




[HACKERS] PostgreSQL Auditing

2016-02-01 Thread Curtis Ruck
So, auditing: it seems that some people want auditing (myself, David Steele,
2ndQuadrant, and probably others).  I personally love PostgreSQL, but
until it can meet my annoying compliance requirements, I can't leverage it
fully, as my organization spends more time on meeting compliance than
actually doing development and engineering.

Sadly, due to the incumbent solutions in the database arena, we are also
wasting idiotic amounts of time and money, and increasing system complexity,
because we have to use alternative solutions to get things like
auditing.

If David's auditing patch isn't sufficient, what is?  Are we waiting on the
holy grail of auditing, which implements an entirely new logging subsystem,
and hooks so deeply into the innards of PostgreSQL that it's perfect?  Does
this mailing list just not care about the potential customers (and potential
financial benefits) of providing a complete database solution?  Or does the
PostgreSQL community just want to stay a hobbyist database that never
broaches the enterprise or compliance arenas?

I've worked with many database vendors, and honestly auditing is fairly
bland; it's boring, and no one really likes it except for the lawyers, and
then only when someone was actually caught doing something wrong, which,
let's face it, is quite infrequent given the number of databases that exist
out there.

Just because auditing isn't sexy sharding, parallel partitioning, creative
indexing (BRIN), or hundreds of thousands of transactions a second doesn't
make it any less of a requirement for countless organizations that would
like to use PostgreSQL, but find the audit requirement a must-have.

So, in summary, what would it take to get the core PostgreSQL team to
actually let auditing patches into the next version?

P.S. Do you know what sucks?  Having a highly performant PostGIS database
that works great, and being told to move to Oracle or SQL Server (because
they have auditing), even though they charge extra for geospatial support
(seriously?) or didn't even have geospatial support (10 years ago).  My
customer would prefer to re-engineer software designed around PostgreSQL
and pay the overpriced licenses rather than go without auditing.  I agree
that their cost analysis is probably way off; even 10 years later, my only
solution would be to move to Oracle, SQL Server, a NoSQL solution, or pay
EnterpriseDB for their 2-year-old version that doesn't have all the
cool/modern jsonb support.


Re: [HACKERS] Freeze avoidance of very large table.

2016-02-01 Thread Jim Nasby

On 2/1/16 4:59 PM, Alvaro Herrera wrote:
> Masahiko Sawada wrote:
>
>> Attached updated version patch.
>> Please review it.
>
> In pg_upgrade, the "page convert" functionality is there to abstract
> rewrites of pages being copied; your patch is circumventing it and
> AFAICS it makes the interface more complicated for no good reason.  I
> think the real way to do that is to write your rewriteVisibilityMap as a
> pageConverter routine.  That should reduce some duplication there.


IIRC this is about the third problem that's been found with pg_upgrade 
in this patch. That concerns me given the potential for disaster if 
freeze bits are set incorrectly.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] "using previous checkpoint record at" maybe not the greatest idea?

2016-02-01 Thread David G. Johnston
On Mon, Feb 1, 2016 at 5:48 PM, Andres Freund  wrote:

> On 2016-02-01 17:29:39 -0700, David G. Johnston wrote:
> > Learning by reading here...
> >
> > http://www.postgresql.org/docs/current/static/wal-internals.html
> > """
> > ​After a checkpoint has been made and the log flushed, the checkpoint's
> > position is saved in the file pg_control. Therefore, at the start of
> > recovery, the server first reads pg_control and then the checkpoint
> record;
> > then it performs the REDO operation by scanning forward from the log
> > position indicated in the checkpoint record. Because the entire content
> of
> > data pages is saved in the log on the first page modification after a
> > checkpoint (assuming full_page_writes is not disabled), all pages changed
> > since the checkpoint will be restored to a consistent state.
>
> > The above comment appears out-of-date if this post describes what
> > presently happens.
>
> Where do you see a conflict with what I wrote about? We store both the
> last and the previous checkpoint's location in pg_control. Or are you
> talking about:
>

Mainly the following... but the word I used was "out-of-date" and not
"conflict".  The present state seems to do the above, and then some.


> > To deal with the case where pg_control is corrupt, we should support the
> > possibility of scanning existing log segments in reverse order — newest
> to
> > oldest — in order to find the latest checkpoint. This has not been
> > implemented yet. pg_control is small enough (less than one disk page)
> that
> > it is not subject to partial-write problems, and as of this writing there
> > have been no reports of database failures due solely to the inability to
> > read pg_control itself. So while it is theoretically a weak spot,
> > pg_control does not seem to be a problem in practice.
>
> if so, no, that's not a out-of-date, as we simply store two checkpoint
> locations:
>  $ pg_controldata /srv/dev/pgdev-dev/|grep 'checkpoint location'
> Latest checkpoint location:   B3/2A730028
> Prior checkpoint location:    B3/2A72FFA0
>
>
The quote implies that only a single checkpoint is noted and that no
"searching" is performed - whether by scanning or by being told the
position of a previous one so that it can jump there immediately without
scanning backwards.  It isn't strictly the fact that we do not "scan"
backwards but the implications that arise in making that statement.  Maybe
this is being picky but if you cannot trust the value of "Latest checkpoint
location" then pg_control is arguably corrupt.  Corruption is not strictly
limited to "unable to be read" but does include "contains invalid data".

> > Also, I was under the impression that tablespace commands resulted in
> > checkpoints so that the state of the file system could be presumed
> > current...
>
> That actually doesn't really make it any better - it forces the *latest*
> checkpoint, but if we can't read that, we'll start with the previous
> one...


> > I don't know enough internals but it seems like we'd need to distinguish
> > between an interrupted checkpoint (pull the plug during checkpoint) and one
> > that supposedly completed without interruption but then was somehow
> > corrupted (solar flares).  The former seems legitimate for auto-skip while
> > the latter does not.
>
> I don't think such a distinction is really possible (or necessary). If
> pg_control is corrupted we won't even start, and if WAL is corrupted
> that badly we won't finish replay...
>

My takeaway from the above is that we should only record what we think is a
usable/readable/valid checkpoint location in "Latest checkpoint location"
(LCL), and if the system is not able to use that information to perform a
successful recovery it should be allowed to die without using the value in
"Previous checkpoint location" - which becomes effectively ignored during
master recovery.

David J.


Re: [HACKERS] extend pgbench expressions with functions

2016-02-01 Thread Michael Paquier
On Mon, Feb 1, 2016 at 10:34 PM, Robert Haas  wrote:
> On Sat, Jan 30, 2016 at 7:36 AM, Michael Paquier
>  wrote:
>> On Fri, Jan 29, 2016 at 11:21 PM, Fabien COELHO  wrote:
>>> +/* overflow check (needed for INT64_MIN) */
>>> +if (lval != 0 && (*retval < 0 == lval < 0))
>>>
>>> Why not use "if (lval == INT64_MIN)" instead of this complicated condition?
>>> If it is really needed for some reason, I think that a comment could help.
>>
>> Checking for PG_INT64_MIN only would be fine as well, so let's do so.
>> I thought honestly that we had better check if the result and the left
>> argument are not of the same sign, but well.
>
> Committed and back-patched to 9.5.  Doesn't apply further back.

OK, here are patches for 9.1~9.4. The main differences are that in
9.3/9.4 int64 is used for the division operations, and in 9.2/9.1
that's int32. In the latter case pgbench blows up the same way with
that:
\set i -2147483648
\set i :i / -1
select :i;
In those patches INT32_MIN/INT64_MIN need to be explicitly set as well
at the top of pgbench.c. I think that's fine.
-- 
Michael
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 2111f16..84b6303 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -56,6 +56,10 @@
 #ifndef INT64_MAX
 #define INT64_MAX	INT64CONST(0x7FFFFFFFFFFFFFFF)
 #endif
+#ifndef INT32_MIN
+#define INT32_MIN	(-0x7FFFFFFF-1)
+#endif
+
 
 /*
  * Multi-platform pthread implementations
@@ -1152,13 +1156,37 @@ top:
 	snprintf(res, sizeof(res), "%d", ope1 * ope2);
 else if (strcmp(argv[3], "/") == 0)
 {
+	int		operes;
+
 	if (ope2 == 0)
 	{
 		fprintf(stderr, "%s: division by zero\n", argv[0]);
 		st->ecnt++;
 		return true;
 	}
-	snprintf(res, sizeof(res), "%d", ope1 / ope2);
+	/*
+	 * INT32_MIN / -1 is problematic, since the result can't
+	 * be represented on a two's-complement machine. Some
+	 * machines produce INT32_MIN, some produce zero, some
+	 * throw an exception. We can dodge the problem by
+	 * recognizing that division by -1 is the same as
+	 * negation.
+	 */
+	if (ope2 == -1)
+	{
+		operes = -ope1;
+
+		/* overflow check (needed for INT32_MIN) */
+		if (ope1 == INT32_MIN)
+		{
+			fprintf(stderr, "integer out of range\n");
+			st->ecnt++;
+			return true;
+		}
+	}
+	else
+		operes = ope1 / ope2;
+	snprintf(res, sizeof(res), "%d", operes);
 }
 else
 {
diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 4e22695..062a32f 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -57,6 +57,10 @@
 #ifndef INT64_MAX
 #define INT64_MAX	INT64CONST(0x7FFFFFFFFFFFFFFF)
 #endif
+#ifndef INT64_MIN
+#define INT64_MIN	(-INT64CONST(0x7FFFFFFFFFFFFFFF) - 1)
+#endif
+
 
 /*
  * Multi-platform pthread implementations
@@ -1331,13 +1335,37 @@ top:
 	snprintf(res, sizeof(res), INT64_FORMAT, ope1 * ope2);
 else if (strcmp(argv[3], "/") == 0)
 {
+	int64	operes;
+
 	if (ope2 == 0)
 	{
 		fprintf(stderr, "%s: division by zero\n", argv[0]);
 		st->ecnt++;
 		return true;
 	}
-	snprintf(res, sizeof(res), INT64_FORMAT, ope1 / ope2);
+	/*
+	 * INT64_MIN / -1 is problematic, since the result can't
+	 * be represented on a two's-complement machine. Some
+	 * machines produce INT64_MIN, some produce zero, some
+	 * throw an exception. We can dodge the problem by
+	 * recognizing that division by -1 is the same as
+	 * negation.
+	 */
+	if (ope2 == -1)
+	{
+		operes = -ope1;
+
+		/* overflow check (needed for INT64_MIN) */
+		if (ope1 == INT64_MIN)
+		{
+			fprintf(stderr, "bigint out of range\n");
+			st->ecnt++;
+			return true;
+		}
+	}
+	else
+		operes = ope1 / ope2;
+	snprintf(res, sizeof(res), INT64_FORMAT, operes);
 }
 else
 {



Re: [HACKERS] [PATCH] Refactoring of LWLock tranches

2016-02-01 Thread Alexander Korotkov
On Mon, Feb 1, 2016 at 10:26 AM, Amit Kapila 
wrote:

> On Sat, Jan 30, 2016 at 2:15 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
> >
> > On Sat, Jan 30, 2016 at 10:58 AM, Amit Kapila 
> wrote:
> >>
> >> On Fri, Jan 29, 2016 at 6:47 PM, Robert Haas 
> wrote:
> >> >
> >> > On Fri, Jan 29, 2016 at 6:59 AM, Alexander Korotkov
> >> >  wrote:
> >> > > On Thu, Jan 21, 2016 at 12:37 AM, Alvaro Herrera <
> alvhe...@2ndquadrant.com>
> >> > > wrote:
> >> > >> So far as I can tell, there are three patches in flight here:
> >> > >>
> >> > >> * replication slot IO lwlocks
> >> > >> * ability of extensions to request tranches dynamically
> >> > >> * PGPROC
> >> > >>
> >> > >> The first one hasn't been reviewed at all, but the other two have
> seen a
> >> > >> bit of discussion and evolution.  Is anyone doing any more
> reviewing?
> >> > >
> >> > > I'd like to add another one: fixed tranche id for each SLRU.
> >> >
> >> > +1 for this.  The patch looks good and I will commit it if nobody
> objects.
> >> >
> >>
> >> +1. Patch looks good to me as well, but I have one related question:
> >> Is there a reason why we should not assign ReplicationOrigins a
> >> fixed tranche id  and then we might want to even get away with
> >> LWLockRegisterTranche()?
> >
> >
> > +1. I think we should do this.
> >
>
> Okay, attached patch assigns a fixed trancheid for ReplicationOrigins.
> I don't think we can remove LWLockRegisterTranche(), as that will be
> required for assigning trancheids for extensions, so I didn't change that
> part of the code.
>

OK. This one looks good to me too.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-01 Thread Masahiko Sawada
On Sun, Jan 31, 2016 at 8:58 PM, Michael Paquier
 wrote:
> On Sun, Jan 31, 2016 at 5:28 PM, Masahiko Sawada  
> wrote:
>> On Sun, Jan 31, 2016 at 5:18 PM, Michael Paquier
>>  wrote:
>>> On Sun, Jan 31, 2016 at 5:08 PM, Masahiko Sawada  
>>> wrote:
 On Sun, Jan 31, 2016 at 1:17 PM, Michael Paquier
  wrote:
> On Thu, Jan 28, 2016 at 10:10 PM, Masahiko Sawada wrote:
>> By the discussions so far, I'm planning to have several replication
>> methods such as 'quorum', 'complex' in the feature, and the each
>> replication method specifies the syntax of s_s_names.
>> It means that s_s_names could have the number of sync standbys like
>> what current patch does.
>
> What if the application_name of a standby node has the format of an 
> integer?

 Even if the standby has an integer as application_name, we can set
 s_s_names like '2,1,2,3'.
 The leading '2' is always handled as the number of sync standbys when
 s_r_method = 'priority'.
>>>
>>> Hm. I agree with Fujii-san here, having the number of sync standbys
>>> defined in a parameter that should have a list of names is a bit
>>> confusing. I'd rather have a separate GUC, which brings us back to one
>>> of the first patches that I came up with, and a couple of people,
>>> including Josh were not happy with that because this did not support
>>> real quorum. Perhaps the final answer would be really to get a set of
>>> hooks, and a contrib module making use of that.
>>
>> Yeah, I agree with having a set of hooks, with postgres core having a simple
>> multi sync replication mechanism like you suggested in the first version.
>
> If there are hooks, I don't think that we should really bother about
> having in core anything more complicated than what we have now. The
> trick will be to come up with a hook design modular enough to support
> the kind of configurations mentioned on this thread. Roughly perhaps a
> refactoring of the syncrep code so as it is possible to wait for
> multiple targets some of them being optional,, one modular way in
> pg_stat_get_wal_senders to represent the status of a node to user, and
> another hook to return to decide which are the nodes to wait for. Some
> of the nodes being waited for may be based on conditions for quorum
> support. That's a hard problem to do that in a flexible enough way.

Hm, I think non-nested quorum and priority are not complicated, and we
should support at least one (or both) of these simple methods in core
postgres.
More complicated methods, like a json-style or dedicated language, could
be supported by an external module.

Regards,

--
Masahiko Sawada




Re: [HACKERS] extend pgbench expressions with functions

2016-02-01 Thread Fabien COELHO


Hello Michaël,


 - remove the short macros (although IMO it is a code degradation)


FWIW, I like this suggestion from Robert.


I'm not especially fond of macros; my main reservation is that because of the
length of the function names this necessarily creates lines longer than 80
columns, or awkward and repeated newlines within expressions, which are
both ugly.



+make_op(const char *operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr)
+{
+   return make_func(find_func(operator),
+/* beware that the list is
reversed in make_func */
+make_elist(rexpr,
make_elist(lexpr, NULL)));
+}
I think that this should use as argument the function ID, aka PGBENCH_ADD
or similar instead of find_func() with an operator character. This saves a
couple of instructions.


Not that simple: the number is the index in the array of functions, not
the enum value; they are currently off by one, and there may someday be the
same function under different names for some reason, so I do not think
it would be a good idea to enforce the order so that they are identical.

Also this minor search is only executed when parsing the script.


+ * - nargs: number of arguments (-1 is a special value for min & max)
My fault perhaps, it may be better to mention here that -1 means that min
and max need at least one argument, and that the number of arguments is not
fixed.


Ok for a better comment.


For mod() there is no need to have an error, returning 0 is fine. You can
actually do it unconditionally when rval == -1.


Ok.
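
(I.e., roughly the following, using the lval/rval/retval names from the
patch - just a sketch of that special case:)

/* x % -1 is always 0, and this also dodges the INT64_MIN % -1 trap */
if (rval == -1)
	*retval = 0;
else
	*retval = lval % rval;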


+   setDoubleValue(retval, d < 0.0? -d: d);
Nitpick: lack of spaces around the question mark.


Ok.


NONE is used nowhere, but I think that you could use it for an assertion
check here: [...]
Just replace the "none" message by Assert(type != PGBT_NONE) for example.


I've added a use of the macro.


Another remaining item: should support for \setrandom be dropped? As the
patch is presented this remains intact.


As you know my opinion is "yes", but I have not received a clear go on
that from a committer, and I'm not motivated to remove and then re-add
code to the patch.


If nothing clear is said, I'll do a patch just for that once the functions
are added, and submit it to the next CF.


--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 42d0667..d42208a 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -796,17 +796,21 @@ pgbench  options  dbname
   Sets variable varname to an integer value calculated
   from expression.
   The expression may contain integer constants such as 5432,
-  references to variables :variablename,
+  double constants such as 3.14159,
+  references to integer variables :variablename,
   and expressions composed of unary (-) or binary operators
-  (+, -, *, /, %)
-  with their usual associativity, and parentheses.
+  (+, -, *, /,
+  %) with their usual associativity, function calls and
+  parentheses.
+   shows the available
+  functions.
  
 
  
   Examples:
 
 \set ntellers 10 * :scale
-\set aid (1021 * :aid) % (10 * :scale) + 1
+\set aid (1021 * random(1, 10 * :scale)) % (10 * :scale) + 1
 
 

@@ -826,66 +830,35 @@ pgbench  options  dbname
  
 
  
-  By default, or when uniform is specified, all values in the
-  range are drawn with equal probability.  Specifying gaussian
-  or  exponential options modifies this behavior; each
-  requires a mandatory parameter which determines the precise shape of the
-  distribution.
- 
+  
+   
+
+ \setrandom n 1 10 or \setrandom n 1 10 uniform
+ is equivalent to \set n random(1, 10) and uses a uniform
+ distribution.
+
+   
 
- 
-  For a Gaussian distribution, the interval is mapped onto a standard
-  normal distribution (the classical bell-shaped Gaussian curve) truncated
-  at -parameter on the left and +parameter
-  on the right.
-  Values in the middle of the interval are more likely to be drawn.
-  To be precise, if PHI(x) is the cumulative distribution
-  function of the standard normal distribution, with mean mu
-  defined as (max + min) / 2.0, with
-
- f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
-(2.0 * PHI(parameter) - 1.0)
-
-  then value i between min and
-  max inclusive is drawn with probability:
-  f(i + 0.5) - f(i - 0.5).
-  Intuitively, the larger parameter, the more
-  frequently values close to the middle of the interval are drawn, and the
-  less frequently values close to the min and
-  max bounds. About 67% of values are drawn from the
-  middle 1.0 / parameter, that is a relative
-  0.5 / parameter around the mean, and 95% in the middle
-  2.0 / parameter, that is a relative
-  1.0 / parameter 

Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-02-01 Thread Alexander Korotkov
On Mon, Feb 1, 2016 at 7:05 AM, Dilip Kumar  wrote:

>
> On Sun, Jan 31, 2016 at 11:44 AM, Dilip Kumar 
> wrote:
>
>> By looking at the results with scale factor 1000 and 100 i don't see any
>> reason why it will regress with scale factor 300.
>>
>> So I will run the test again with scale factor 300 and this time i am
>> planning to run 2 cases.
>> 1. when data fits in shared buffer
>> 2. when data doesn't fit in shared buffer.
>>
>
> I have run the test again with 300 S.F and found no regression, in fact
> there is improvement with the patch like we saw with 1000 scale factor.
>
> Shared Buffer= 8GB
> max_connections=150
> Scale Factor=300
>
> ./pgbench  -j$ -c$ -T300 -M prepared -S postgres
>
> Client   Base     Patch
> 1        19744    19382
> 8        125923   126395
> 3231393151
> 64       387339   496830
> 128      306412   350610
>
> Shared Buffer= 512MB
> max_connections=150
> Scale Factor=300
>
> ./pgbench  -j$ -c$ -T300 -M prepared -S postgres
>
> Client   Base     Patch
> 1        17169    16454
> 8        108547   105559
> 32       241619   262818
> 64       206868   233606
> 128      137084   217013
>

Great, thanks!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Proposal: Generic WAL logical messages

2016-02-01 Thread Petr Jelinek
On 1 February 2016 at 09:45, Alexander Korotkov
 wrote:
> On Sat, Jan 30, 2016 at 11:58 AM, Simon Riggs  wrote:
>>
>> On 29 January 2016 at 21:11, Alexander Korotkov
>>  wrote:
>>>
>>> Should we think more about naming? Does two kinds of generic records
>>> confuse people?
>>
>>
>> Logical messages
>>
>> Generic WAL records
>>
>> Seems like I can tell them apart. Worth checking, but I think we're OK.
>
>
> I was worrying because topic name is "Generic WAL logical messages". But if
> we name them just "Logical messages" then it's OK for me.
>

Yeah, the patch talks about logical messages; I use a different title on the
mailing list and in the CF to make it more clear at first sight what this
actually is technically.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Template for commit messages

2016-02-01 Thread Alvaro Herrera
Joshua D. Drake wrote:
> On 01/31/2016 04:34 PM, Michael Paquier wrote:

> >This page would need a refresh IMO. I think it has not been touched
> >for the last couple of years.
> 
> No doubt, but if we can't be bothered to keep that refreshed, what makes us
> think that a structured format in commit messages (which works magically
> only through a lot of hard work and extra parsing anyway) is going to be
> any more accurate?

If you're proposing to keep that page updated, please propose that in
pgsql-www instead of hijacking this thread.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Copy-pasto in the ExecForeignDelete documentation

2016-02-01 Thread Etsuro Fujita
Hi,

While working on FDW DML pushdown, I ran into a copy-pasto in the
ExecForeignDelete documentation in fdwhandler.sgml:

 The data in the returned slot is used only if the DELETE
 query has a RETURNING clause or the foreign table has
 an AFTER ROW trigger.  Triggers require all columns,
but the

I don't think the data is referenced by the AFTER ROW DELETE triggers.
Attached is a patch to fix that.  The patch also avoids adding an
unnecessary RETURNING clause to DELETE when deparsing a remote DELETE
statement in postgres_fdw.

I'll add this to the next CF.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 1078,1085  deparseDeleteSql(StringInfo buf, PlannerInfo *root,
  	deparseRelation(buf, rel);
  	appendStringInfoString(buf, " WHERE ctid = $1");
  
! 	deparseReturningList(buf, root, rtindex, rel,
! 	   rel->trigdesc && rel->trigdesc->trig_delete_after_row,
  		 returningList, retrieved_attrs);
  }
  
--- 1078,1085 
  	deparseRelation(buf, rel);
  	appendStringInfoString(buf, " WHERE ctid = $1");
  
! 	/* No need to retrieve columns for AFTER ROW DELETE triggers */
! 	deparseReturningList(buf, root, rtindex, rel, false,
  		 returningList, retrieved_attrs);
  }
  
*** a/doc/src/sgml/fdwhandler.sgml
--- b/doc/src/sgml/fdwhandler.sgml
***
*** 606,613  ExecForeignDelete (EState *estate,
  
  
   The data in the returned slot is used only if the DELETE
!  query has a RETURNING clause or the foreign table has
!  an AFTER ROW trigger.  Triggers require all columns, but the
   FDW could choose to optimize away returning some or all columns depending
   on the contents of the RETURNING clause.  Regardless, some
   slot must be returned to indicate success, or the query's reported row
--- 606,614 
  
  
   The data in the returned slot is used only if the DELETE
!  query has a RETURNING clause.  (Note that the data is not
!  referenced by AFTER ROW triggers on the foreign table in
!  the DELETE case.)  The
   FDW could choose to optimize away returning some or all columns depending
   on the contents of the RETURNING clause.  Regardless, some
   slot must be returned to indicate success, or the query's reported row



Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-02-01 Thread Alexander Korotkov
On Mon, Feb 1, 2016 at 11:34 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Mon, Feb 1, 2016 at 7:05 AM, Dilip Kumar  wrote:
>
>>
>> On Sun, Jan 31, 2016 at 11:44 AM, Dilip Kumar 
>> wrote:
>>
>>> By looking at the results with scale factor 1000 and 100 i don't see any
>>> reason why it will regress with scale factor 300.
>>>
>>> So I will run the test again with scale factor 300 and this time i am
>>> planning to run 2 cases.
>>> 1. when data fits in shared buffer
>>> 2. when data doesn't fit in shared buffer.
>>>
>>
>> I have run the test again with 300 S.F and found no regression, in fact
>> there is improvement with the patch like we saw with 1000 scale factor.
>>
>> Shared Buffer= 8GB
>> max_connections=150
>> Scale Factor=300
>>
>> ./pgbench  -j$ -c$ -T300 -M prepared -S postgres
>>
>> Client   Base     Patch
>> 1        19744    19382
>> 8        125923   126395
>> 3231393151
>> 64       387339   496830
>> 128      306412   350610
>>
>> Shared Buffer= 512MB
>> max_connections=150
>> Scale Factor=300
>>
>> ./pgbench  -j$ -c$ -T300 -M prepared -S postgres
>>
>> Client   Base     Patch
>> 1        17169    16454
>> 8        108547   105559
>> 32       241619   262818
>> 64       206868   233606
>> 128      137084   217013
>>
>
> Great, thanks!
>

Attached patch is rebased and has better comments.
Also, there is one comment which survives from the original version by Andres.

/* Add exponential backoff? Should seldomly be contended tho. */


Andres, did you mean we should double the delay with each unsuccessful try
to lock?
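
(Purely as a sketch of the question being asked, where try_pin() and buf
stand in for the patch's CAS attempt on the buffer - not real functions:)

long	delay_us = 1;

while (!try_pin(buf))
{
	pg_usleep(delay_us);
	if (delay_us < 1000)		/* cap the backoff somewhere */
		delay_us *= 2;
}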

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



pinunpin-cas-2.patch
Description: Binary data



Re: [HACKERS] How can I build OSSP UUID support on Windows to avoid duplicate UUIDs?

2016-02-01 Thread Alvaro Herrera
高增琦 wrote:
> : (
> 
> still don't know how to build ossp-uuid on windows with MSVC.
> Saito san's patch doesn't fix all errors during compilation...

I don't understand why you want to build that specific module on Windows.
Doesn't Windows have its own UUID generator that you can access, using a
smaller extension?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Comment typos in source code: s/thats/that is/

2016-02-01 Thread Magnus Hagander
On Thu, Jan 28, 2016 at 1:39 PM, Michael Paquier 
wrote:

> Hi all,
>
> I found a couple of typos as per the $subject.
> A patch is attached.
>

Applied, thanks.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Several problems in tab-completions for SET/RESET

2016-02-01 Thread Fujii Masao
On Mon, Feb 1, 2016 at 9:23 PM, Michael Paquier
 wrote:
> On Mon, Feb 1, 2016 at 9:15 PM, Fujii Masao wrote:
>> If we do that, we also should change the tab-completion for the SET command
>> so that "=" is suggested. But I'm afraid that might decrease the usability
>> of that tab-completion.
>>
>> Imagine the case of "SET work_mem ". If "TO" and "=" are suggested,
>> we need to type either "T" or "=" and then Tab to input the setting value.
>> Otherwise, "SET work_mem " suggests only "TO" and we can input
>> the setting value by just typing Tab again.
>> This extra step is very small, but the SET command is used very often,
>> so I'd like to avoid such an extra step.
>
> OK, that's fine.

Pushed. Thanks!

Regards,

-- 
Fujii Masao




Re: [HACKERS] extend pgbench expressions with functions

2016-02-01 Thread Robert Haas
On Sat, Jan 30, 2016 at 7:36 AM, Michael Paquier
 wrote:
> On Fri, Jan 29, 2016 at 11:21 PM, Fabien COELHO  wrote:
>> +/* overflow check (needed for INT64_MIN) */
>> +if (lval != 0 && (*retval < 0 == lval < 0))
>>
>> Why not use "if (lval == INT64_MIN)" instead of this complicated condition?
>> If it is really needed for some reason, I think that a comment could help.
>
> Checking for PG_INT64_MIN only would be fine as well, so let's do so.
> I thought honestly that we had better check if the result and the left
> argument are not of the same sign, but well.

Committed and back-patched to 9.5.  Doesn't apply further back.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Transactions involving multiple postgres foreign servers

2016-02-01 Thread Alvaro Herrera
Alvaro Herrera wrote:
> Ashutosh Bapat wrote:
> 
> > Here's updated patch. I didn't use version numbers in file names in my
> > previous patches. I am starting from this onwards.
> 
> Um, I tried this patch and it doesn't apply at all.  There's a large
> number of conflicts.  Please update it and resubmit to the next
> commitfest.

Also, please run "git show --check" or "git diff origin/master --check"
and fix the whitespace problems that it shows.  It's an easy thing but
there are a lot of red squares on my screen.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Fwd: [DOCS] pgbench doc typos

2016-02-01 Thread Fujii Masao
On Wed, Jan 27, 2016 at 8:32 PM, Erik Rijkers  wrote:
> On 2016-01-27 11:06, Erik Rijkers wrote:
>>
>> Two trivial changes to  doc/src/sgml/ref/pgbench.sgml

-   Here is example outputs:
+   Here is example output:

"Here are example outputs" is better?
In the 9.4 documentation, that description was used for the example outputs
of per-transaction logging.

> I did not understand this sentence:
>
>"Notice that while the plain (unaggregated) log file contains index
>of the custom script files, the aggregated log does not."
>
> "contains index"?  what does that mean?
>
> Not understanding, I have not changed it. I think it could be made more
> clear.

What about the following?


When multiple custom script files are specified with the -f option,
while the plain (unaggregated) log file contains a file_no field
which identifies which script file was used, the aggregated log does not.
Therefore if you need per-script data, the --aggregate-interval option
should not be specified and you need to aggregate the data on your own
from the plain log file.


BTW, I found another problem in pgbench.sgml: skipped_transactions is
documented as the last field of the plain log, but the format of the log
doesn't include it, as follows:

client_id transaction_no time file_no time_epoch time_us [schedule_lag]

Regards,

-- 
Fujii Masao




Re: [HACKERS] RFC: replace pg_stat_activity.waiting with something more descriptive

2016-02-01 Thread Alexander Korotkov
On Sun, Jan 31, 2016 at 6:55 AM, Amit Kapila 
wrote:

> On Thu, Jan 28, 2016 at 9:16 AM, Amit Kapila 
> wrote:
> >
> > On Thu, Jan 28, 2016 at 2:12 AM, Robert Haas 
> wrote:
> > >
> > > On Tue, Jan 26, 2016 at 3:10 AM, and...@anarazel.de <
> and...@anarazel.de> wrote:
> > > > I do think there's a considerable benefit in improving the
> > > > instrumentation here, but this strikes me as making life more complex
> for
> > > > more users than it makes it easier. At the very least this should be
> > > > split into two fields (type & what we're actually waiting on). I also
> > > > strongly suspect we shouldn't use in band signaling ("process not
> > > > waiting"), but rather make the field NULL if we're not waiting on
> > > > anything.
> > >
> > > +1 for splitting it into two fields.
> > >
> >
> > I will take care of this.
> >
>
> As discussed, I have added a new field wait_event_type along with
> wait_event in pg_stat_activity.  Changed the code to return NULL if the
> backend is not waiting.  Updated the docs as well.
>

I wonder if we can use the 4-byte wait_event_info more efficiently.
The LWLock number in the tranche would also be useful information to expose.
Using the lwlock number the user can determine if there is high concurrency
on a single lwlock in the tranche or if it is spread over multiple lwlocks.
I think it would be enough to have 6 bits for the event class id and 10 bits
for the event id. So, it would be a maximum of 64 event classes and a maximum
of 1024 events per class. These limits seem fair enough to me.
And then we save 16 bits for the lock number. That's certainly not enough for
some tranches. For instance, the number of buffers could easily be more than
2^16. However, we could expose at least the lower 16 bits. It would be at
least something. Using this information the user can at least make a
conclusion like "It MIGHT be high concurrency on a single buffer content
lock. Otherwise it is a coincidence that a lot of different buffers have the
same 16 lower bits."
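
A toy encoding of that layout, just to make the bit budget concrete (an
assumed layout for illustration, nothing like it exists in the tree):

#include <stdint.h>

/* 6 bits of class id, 10 bits of event id, low 16 bits of lock number */
static inline uint32_t
encode_wait_event_info(uint32_t classid, uint32_t eventid, uint32_t locknum)
{
	return ((classid & 0x3F) << 26) |
		   ((eventid & 0x3FF) << 16) |
		   (locknum & 0xFFFF);
}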

Any thoughts?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] silent data loss with ext4 / all current versions

2016-02-01 Thread Andres Freund
On 2016-01-25 16:30:47 +0900, Michael Paquier wrote:
> diff --git a/src/backend/access/transam/xlog.c 
> b/src/backend/access/transam/xlog.c
> index a2846c4..b124f90 100644
> --- a/src/backend/access/transam/xlog.c
> +++ b/src/backend/access/transam/xlog.c
> @@ -3278,6 +3278,14 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
>   tmppath, path)));
>   return false;
>   }
> +
> + /*
> +  * Make sure the rename is permanent by fsyncing the parent
> +  * directory.
> +  */
> + START_CRIT_SECTION();
> + fsync_fname(XLOGDIR, true);
> + END_CRIT_SECTION();
>  #endif

Hm. I'm seriously doubtful that using critical sections for this is a
good idea. What's the scenario you're protecting against by declaring
this one?  We intentionally don't error out in the isdir case in
fsync_fname() anyway.

Afaik we need to fsync tmppath before and after the rename, and only
then the directory, to actually be safe.
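
(As a free-standing POSIX sketch of that sequence - error handling
omitted and the function name made up:)

#include <fcntl.h>
#include <unistd.h>

/* flush the file under its temporary name, rename, flush it under the
 * new name, and only then fsync the containing directory so the rename
 * itself is durable */
static void
durable_rename_sketch(const char *tmppath, const char *path,
					  const char *dir)
{
	int			fd;

	fd = open(tmppath, O_RDWR);		/* fsync tmppath before ... */
	fsync(fd);
	close(fd);

	rename(tmppath, path);

	fd = open(path, O_RDWR);		/* ... and after the rename */
	fsync(fd);
	close(fd);

	fd = open(dir, O_RDONLY);		/* and only then the directory */
	fsync(fd);
	close(fd);
}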

> @@ -5297,6 +5313,9 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr 
> endOfLog)
>errmsg("could not rename file \"%s\" to 
> \"%s\": %m",
>   RECOVERY_COMMAND_FILE, 
> RECOVERY_COMMAND_DONE)));
>  
> + /* Make sure the rename is permanent by fsyncing the data directory. */
> + fsync_fname(".", true);
> +

Shouldn't RECOVERY_COMMAND_DONE be fsynced first here?

>   ereport(LOG,
>   (errmsg("archive recovery complete")));
>  }
> @@ -6150,6 +6169,12 @@ StartupXLOG(void)
>   TABLESPACE_MAP, 
> BACKUP_LABEL_FILE),
>errdetail("Could not rename 
> file \"%s\" to \"%s\": %m.",
>  
> TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
> +
> + /*
> +  * Make sure the rename is permanent by fsyncing the 
> data
> +  * directory.
> +  */
> + fsync_fname(".", true);
>   }

Is it just me, or are the repeated four-line comments a bit grating?

>   /*
> @@ -6525,6 +6550,13 @@ StartupXLOG(void)
>   TABLESPACE_MAP, 
> TABLESPACE_MAP_OLD)));
>   }
>  
> + /*
> +  * Make sure the rename is permanent by fsyncing the parent
> +  * directory.
> +  */
> + if (haveBackupLabel || haveTblspcMap)
> + fsync_fname(".", true);
> +

Isn't that redundant with the haveTblspcMap case above?

>   /* Check that the GUCs used to generate the WAL allow recovery 
> */
>   CheckRequiredParameterValues();
>  
> @@ -7305,6 +7337,12 @@ StartupXLOG(void)
>errmsg("could not rename file 
> \"%s\" to \"%s\": %m",
>   origpath, 
> partialpath)));
>   XLogArchiveNotify(partialfname);
> +
> + /*
> +  * Make sure the rename is permanent by 
> fsyncing the parent
> +  * directory.
> +  */
> + fsync_fname(XLOGDIR, true);

.partial should be fsynced first.

>   }
>   }
>   }
> @@ -10905,6 +10943,9 @@ CancelBackup(void)
>  BACKUP_LABEL_FILE, 
> BACKUP_LABEL_OLD,
>  TABLESPACE_MAP, 
> TABLESPACE_MAP_OLD)));
>   }
> +
> + /* fsync the data directory to persist the renames */
> + fsync_fname(".", true);
>  }

Same.

>  /*
> diff --git a/src/backend/access/transam/xlogarchive.c 
> b/src/backend/access/transam/xlogarchive.c
> index 277c14a..8dda80b 100644
> --- a/src/backend/access/transam/xlogarchive.c
> +++ b/src/backend/access/transam/xlogarchive.c
> @@ -477,6 +477,12 @@ KeepFileRestoredFromArchive(char *path, char *xlogfname)
>   path, xlogfpath)));
>  
>   /*
> +  * Make sure the renames are permanent by fsyncing the parent
> +  * directory.
> +  */
> + fsync_fname(XLOGDIR, true);

Afaics the file under the temporary filename has not been fsynced up to
here.

> + /*
>* Create .done file forcibly to prevent the restored segment from being
>* archived again later.
>*/
> @@ -586,6 +592,11 @@ XLogArchiveForceDone(const char *xlog)
>errmsg("could not rename file \"%s\" 
> to \"%s\": %m",
>   archiveReady, 
> archiveDone)));
>  
> + /*
> +  * Make sure the rename is permanent by fsyncing the 

Re: [HACKERS] pgbench stats per script & other stuff

2016-02-01 Thread Fabien COELHO


Hello Michaël,

Two rebases attached.


 - 15-e: prefix selection for -b



-   if (strncmp(builtin_script[i].name, name,
-   strlen(builtin_script[i].name)) == 0)
+   if (strncmp(builtin_script[i].name, name, len) == 0)

I agree with Alvaro here: this should remain unchanged. It seems to be
a rebase mistake.


I do not understand. I tested it and it works as expected. If I put the
above strlen back instead, the prefix selection does not work:



 fabien@sto:bin/pgbench> git diff
 diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
 index 783cbf9..0c33012 100644
 --- a/src/bin/pgbench/pgbench.c
 +++ b/src/bin/pgbench/pgbench.c
 @@ -2684,7 +2684,7 @@ findBuiltin(const char *name, char **desc)

 for (i = 0; i < N_BUILTIN; i++)
 {
 -   if (strncmp(builtin_script[i].name, name, len) == 0)
 +   if (strncmp(builtin_script[i].name, name, 
strlen(builtin_script[i].name)) == 0)
 {
 *desc = builtin_script[i].desc;
 commands = builtin_script[i].commands;


 ./pgbench -b t
 no builtin script found for name "t"
 Available builtin scripts:
 tpcb-like
 simple-update
 select-only

Indeed, with strlen() it can only match if the provided "name" is as long as
the recorded builtin name. The point of the prefix selection is to compare
only against the short supplied string.
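
A standalone illustration of the idea (simplified, not the actual pgbench
code):

#include <stdio.h>
#include <string.h>

int
main(void)
{
	const char *builtins[] = {"tpcb-like", "simple-update", "select-only"};
	const char *name = "se";		/* short user-supplied prefix */
	size_t		len = strlen(name);
	int			i;

	for (i = 0; i < 3; i++)
	{
		/* compare only the first len bytes, so "se" matches "select-only" */
		if (strncmp(builtins[i], name, len) == 0)
			printf("matched: %s\n", builtins[i]);
	}
	return 0;
}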


--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ade1b53..ca3e158 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -262,11 +262,13 @@ pgbench  options  dbname
 
 
  
-  -b scriptname
-  --builtin scriptname
+  -b scriptname[@weight]
+  --builtin scriptname[@weight]
   

 Add the specified builtin script to the list of executed scripts.
+An optional integer weight after @ allows adjusting the
+probability of drawing the test.
 Available builtin scripts are: tpcb-like,
 simple-update and select-only.
 With special name list, show the list of builtin scripts
@@ -321,12 +323,14 @@ pgbench  options  dbname
  
 
  
-  -f filename
-  --file=filename
+  -f filename[@weight]
+  --file=filename[@weight]
   

 Add a transaction script read from filename to
 the list of executed scripts.
+An optional integer weight after @ allows adjusting the
+probability of drawing the test.
 See below for details.

   
@@ -689,6 +693,9 @@ pgbench  options  dbname
Pgbench executes test scripts chosen randomly from a specified list.
They include built-in scripts with -b and
user-provided custom scripts with -f.
+   Each script may be given a relative weight specified after a
+   @ so as to change its drawing probability.
+   The default weight is 1.
  
 
   
@@ -1137,7 +1144,7 @@ number of transactions per client: 1000
 number of transactions actually processed: 1/1
 tps = 618.764555 (including connections establishing)
 tps = 622.977698 (excluding connections establishing)
-SQL script 1: builtin: TPC-B (sort of)
+SQL script 1, weight 1: builtin: TPC-B (sort of)
  - 1 transactions (100.0% of total, tps = 618.764555)
  - latency average = 15.844 ms
  - latency stddev = 2.715 ms
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 7eb6a2d..5d24768 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -179,6 +179,8 @@ char	   *login = NULL;
 char	   *dbName;
 const char *progname;
 
+#define WSEP '@' /* weight separator */
+
 volatile bool timer_exceeded = false;	/* flag from signal handler */
 
 /* variable definitions */
@@ -299,11 +301,14 @@ typedef struct
 static struct
 {
 	const char *name;
+	int weight;
 	Command   **commands;
 	StatsData stats;
 }	sql_script[MAX_SCRIPTS];	/* SQL script files */
 static int	num_scripts;		/* number of scripts in sql_script[] */
 static int	num_commands = 0;	/* total number of Command structs */
+static int  total_weight = 0;
+
 static int	debug = 0;			/* debug flag */
 
 /* Define builtin test scripts */
@@ -389,9 +394,9 @@ usage(void)
 	 "  --tablespace=TABLESPACE  create tables in the specified tablespace\n"
 		   "  --unlogged-tablescreate tables as unlogged tables\n"
 		   "\nOptions to select what to run:\n"
-		   "  -b, --builtin=NAME   add builtin script (use \"-b list\" to display\n"
-		   "   available scripts)\n"
-		   "  -f, --file=FILENAME  add transaction script from FILENAME\n"
+		   "  -b, --builtin=NAME[@W]   add weighted builtin script (use \"-b list\"\n"
+		   "   to display available scripts)\n"
+		   "  -f, --file=FILENAME[@W]  add weighted transaction script from FILENAME\n"
 		   "  -N, --skip-some-updates  skip updates of pgbench_tellers and pgbench_branches\n"
 		   "

Re: [HACKERS] [PATCH] better systemd integration

2016-02-01 Thread Pavel Stehule
Hi

> index 2e7f1d7..d983a50 100644
> --- a/src/backend/postmaster/postmaster.c
> +++ b/src/backend/postmaster/postmaster.c
> @@ -4933,6 +4933,11 @@ sigusr1_handler(SIGNAL_ARGS)
> if (XLogArchivingAlways())
> PgArchPID = pgarch_start();
>
> +#ifdef USE_SYSTEMD
> +   if (!EnableHotStandby)
> +   sd_notify(0, "READY=1");
> +#endif
> +
> pmState = PM_RECOVERY;
> }
> if (CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) &&
>

I cannot check it this week, but this should work. The "ready" state for
standby-only mode starts when the slave is able to get WAL records or
segments from the server. I will test it next week.

regards

Pavel

> > The default timeout on FC is 90 sec - that may not be enough for
> > large servers with large shared buffers and high checkpoint segments. It
> > should be mentioned in the service file.
>
> Good point.  I think we should set TimeoutSec=0 in the suggested service
file.
>


Re: [HACKERS] silent data loss with ext4 / all current versions

2016-02-01 Thread Andres Freund
On 2016-02-01 16:49:46 +0100, Alvaro Herrera wrote:
> Yeah.  On 9.4 there are already some conflicts, and I'm sure there will
> be more in almost each branch.  Does anyone want to volunteer for
> producing per-branch versions?

> The next minor release is to be tagged next week and it'd be good to put
> this fix there.

I don't think this is going to be ready for that. The risk of hurrying
this through seems higher than the loss risk at this point.

Andres




Re: [HACKERS] Fuzzy substring searching with the pg_trgm extension

2016-02-01 Thread Artur Zakirov

On 29.01.2016 18:58, Artur Zakirov wrote:

On 29.01.2016 18:39, Alvaro Herrera wrote:

Teodor Sigaev wrote:

The behavior of this function is surprising to me.

select substring_similarity('dog' ,  'hotdogpound') ;

  substring_similarity
--
  0.25


Substring search was designed to search for a similar word in a string:
contrib_regression=# select substring_similarity('dog' ,  'hot dogpound') ;
  substring_similarity
--
  0.75

contrib_regression=# select substring_similarity('dog' ,  'hot dog pound') ;
  substring_similarity
--
 1


Hmm, this behavior looks too much like magic to me.  I mean, a substring
is a substring -- why are we treating the space as a special character
here?



I think I can rename this function to subword_similarity() and correct
the documentation.

The current behavior is designed to find the most similar word in a text.
For example, if we search for just a substring (not a word) then we get
the following result:

select substring_similarity('dog', 'dogmatist');
  substring_similarity
-
 1
(1 row)

But this is wrong I think. They are completely different words.

For searching for a similar substring (not a word) in a text, maybe another
function should be added?



I have changed the patch:
1 - trgm2.data was corrected, duplicates were deleted.
2 - I have added the operators <<-> and <->> with GiST index support. The 
regression test will pass only with the patch 
http://www.postgresql.org/message-id/capphfdt19fwqxaryjkzxb3oxmv-kan3fluzrooare_u3h3c...@mail.gmail.com

3 - the function substring_similarity() was renamed to subword_similarity().

But there is no substring_similarity_pos() function yet. It is not
trivial.


--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
*** a/contrib/pg_trgm/pg_trgm--1.2.sql
--- b/contrib/pg_trgm/pg_trgm--1.2.sql
***
*** 3,13 
--- 3,15 
  -- complain if script is sourced in psql, rather than via CREATE EXTENSION
  \echo Use "CREATE EXTENSION pg_trgm" to load this file. \quit
  
+ -- Deprecated function
  CREATE FUNCTION set_limit(float4)
  RETURNS float4
  AS 'MODULE_PATHNAME'
  LANGUAGE C STRICT VOLATILE;
  
+ -- Deprecated function
  CREATE FUNCTION show_limit()
  RETURNS float4
  AS 'MODULE_PATHNAME'
***
*** 26,32  LANGUAGE C STRICT IMMUTABLE;
  CREATE FUNCTION similarity_op(text,text)
  RETURNS bool
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT STABLE;  -- stable because depends on trgm_limit
  
  CREATE OPERATOR % (
  LEFTARG = text,
--- 28,34 
  CREATE FUNCTION similarity_op(text,text)
  RETURNS bool
  AS 'MODULE_PATHNAME'
! LANGUAGE C STRICT STABLE;  -- stable because depends on pg_trgm.sml_limit
  
  CREATE OPERATOR % (
  LEFTARG = text,
*** a/contrib/pg_trgm/trgm.h
--- b/contrib/pg_trgm/trgm.h
***
*** 105,111  typedef char *BITVECP;
  
  typedef struct TrgmPackedGraph TrgmPackedGraph;
  
! extern float4 trgm_limit;
  
  extern uint32 trgm2int(trgm *ptr);
  extern void compact_trigram(trgm *tptr, char *str, int bytelen);
--- 105,111 
  
  typedef struct TrgmPackedGraph TrgmPackedGraph;
  
! extern double trgm_sml_limit;
  
  extern uint32 trgm2int(trgm *ptr);
  extern void compact_trigram(trgm *tptr, char *str, int bytelen);
*** a/contrib/pg_trgm/trgm_gin.c
--- b/contrib/pg_trgm/trgm_gin.c
***
*** 206,212  gin_trgm_consistent(PG_FUNCTION_ARGS)
  			 * similarity is just c / len1.
  			 * So, independly on DIVUNION the upper bound formula is the same.
  			 */
! 			res = (nkeys == 0) ? false : (((((float4) ntrue) / ((float4) nkeys)) >= trgm_limit) ? true : false);
  			break;
  		case ILikeStrategyNumber:
  #ifndef IGNORECASE
--- 206,213 
  			 * similarity is just c / len1.
  			 * So, independly on DIVUNION the upper bound formula is the same.
  			 */
! 			res = (nkeys == 0) ? false :
! 				(((((float4) ntrue) / ((float4) nkeys)) >= trgm_sml_limit) ? true : false);
  			break;
  		case ILikeStrategyNumber:
  #ifndef IGNORECASE
***
*** 283,289  gin_trgm_triconsistent(PG_FUNCTION_ARGS)
  			/*
  			 * See comment in gin_trgm_consistent() about * upper bound formula
  			 */
! 			res = (nkeys == 0) ? GIN_FALSE : (((((float4) ntrue) / ((float4) nkeys)) >= trgm_limit) ? GIN_MAYBE : GIN_FALSE);
  			break;
  		case ILikeStrategyNumber:
  #ifndef IGNORECASE
--- 284,291 
  			/*
  			 * See comment in gin_trgm_consistent() about * upper bound formula
  			 */
! 			res = (nkeys == 0) ? GIN_FALSE :
! 				(((((float4) ntrue) / ((float4) nkeys)) >= trgm_sml_limit) ? GIN_MAYBE : GIN_FALSE);
  			break;
  		case ILikeStrategyNumber:
  #ifndef IGNORECASE
*** a/contrib/pg_trgm/trgm_gist.c
--- b/contrib/pg_trgm/trgm_gist.c
***
*** 294,300  gtrgm_consistent(PG_FUNCTION_ARGS)
  float4		tmpsml = cnt_sml(key, qtrg);
  
  /* strange bug at 

[HACKERS] Fix KNN GiST ordering type

2016-02-01 Thread Alexander Korotkov
Hi!

KNN GiST detects which type it should return by returning type of ordering
operator.
But it appears that type of sk_func is detected after it was replaced with
distance function. That is wrong: distance function should always return
float8.
I think it is just a typo.
Should be backpatched to 9.5.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


fix-knn-gist-ordering-type.patch
Description: Binary data



[HACKERS] [PATCH] Phrase search ported to 9.6

2016-02-01 Thread Dmitry Ivanov
Hi Hackers,

Although PostgreSQL is capable of performing some FTS (full text search) 
queries, there's still room for improvement. Phrase search support could 
become a great addition to the existing set of features.


Introduction


It is no secret that one can make Google search for an exact phrase (instead 
of an unordered lexeme set) simply by enclosing it within double quotes. This 
is a really nice feature which helps to save the time that would otherwise be 
spent on annoying result filtering.

One weak spot of the current FTS implementation is that there is no way to 
specify the desired lexeme order (since it would not make any difference at 
all). In the end, the search engine will look for each lexeme individually, 
which means that a hypothetical end user would have to discard documents not 
including the search phrase all by himself. This problem is solved by the patch 
below (should apply cleanly to 61ce1e8f1).


Problem description
===

The problem comes from the lack of a lexeme ordering operator. Consider the 
following example:

select q @@ plainto_tsquery('fatal error') from 
unnest(array[to_tsvector('fatal error'), to_tsvector('error is not fatal')]) 
as q;
?column?
--
t
t
(2 rows)

Clearly the latter match is not the best result in case we wanted to find 
exactly the "fatal error" phrase. That's when the need for a lexeme ordering 
operator arises:

select q @@ to_tsquery('fatal ? error') from unnest(array[to_tsvector('fatal 
error'), to_tsvector('error is not fatal')]) as q;
?column?
--
t
f
(2 rows)


Implementation
==

The ? (FOLLOWED BY) binary operator takes the form "?" or "?[N]", where
0 <= N < ~16K. If N is provided, the distance between the left and right
operands must be no greater than N. For example:

select to_tsvector('postgres has taken severe damage') @@ to_tsquery('postgres 
? (severe ? damage)');
 ?column?
--
 f
(1 row)

select to_tsvector('postgres has taken severe damage') @@ to_tsquery('postgres 
?[4] (severe ? damage)');
 ?column?
--
 t
(1 row)

The new function phraseto_tsquery([ regconfig, ] text) takes advantage of
the "?[N]" operator in order to facilitate phrase search:

select to_tsvector('postgres has taken severe damage') @@ 
phraseto_tsquery('severely damaged');
 ?column?
--
 t
(1 row)


This patch was originally developed by Teodor Sigaev and Oleg Bartunov in 
2009, so all credit goes to them. Any feedback is welcome.


-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Companydiff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index f70cfe7..9b0a703 100644
--- a/doc/src/sgml/datatype.sgml
+++ b/doc/src/sgml/datatype.sgml
@@ -3925,8 +3925,9 @@ SELECT to_tsvector('english', 'The Fat Rats');
 
  A tsquery value stores lexemes that are to be
  searched for, and combines them honoring the Boolean operators
-  & (AND), | (OR), and
- ! (NOT).  Parentheses can be used to enforce grouping
+  & (AND), | (OR),
+ ! (NOT) and ? (FOLLOWED BY) phrase search
+ operator.  Parentheses can be used to enforce grouping
  of the operators:
 
 
SELECT 'fat & rat & ! cat'::tsquery;
 
 
  In the absence of parentheses, ! (NOT) binds most tightly,
- and & (AND) binds more tightly than
- | (OR).
+ and & (AND) and ? (FOLLOWED BY)
+ both bind more tightly than | (OR).
 
 
 
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 139aa2b..7585f6c 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -9060,6 +9060,12 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple
 !'cat'


+ ? 
+tsquery followed by tsquery
+to_tsquery('fat') ? to_tsquery('rat')
+'fat' ? 'rat'
+   
+   
 @> 
tsquery contains another ?
'cat'::tsquery @> 'cat & rat'::tsquery
@@ -9154,6 +9160,18 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple

 
  
+  phraseto_tsquery
+ 
+ phraseto_tsquery( config regconfig ,  query text)
+
+tsquery
+produce tsquery ignoring punctuation
+phraseto_tsquery('english', 'The Fat Rats')
+'fat' ? 'rat'
+   
+   
+
+ 
   querytree
  
  querytree(query tsquery)
@@ -9177,6 +9195,15 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple


 
+ setweight(tsquery, "char")
+
+tsquery
+add weight to each element of tsquery
+setweight('fat ? cat & rat:B'::tsquery, 'A')
+( 'fat':A ? 'cat':A ) & 'rat':AB
+   
+   
+
  
   strip
  
@@ -9269,6 +9296,27 @@ CREATE TYPE rainbow AS ENUM ('red', 'orange', 'yellow', 'green', 'blue', 'purple

 
  

Re: [HACKERS] Template for commit messages

2016-02-01 Thread Andres Freund
On 2016-02-01 12:56:21 +0100, Magnus Hagander wrote:
> Let's just assume that we can fix that part. As in, we can expose either an
> internal db id or a short hash or something, from the archives server. We
> could then have that visible/linkable directly on the archives page, and
> one could copy/paste that link into the commit message. That way we can
> avoid the long messageids.

Raw messageids have the big big advantage that you can get them locally
in your mailreader, and you can dereference them locally... I type
esc-X id:$id and 20ms later I have the whole thread open.  Maybe
I just travel too much, but travel time is often long and uninterrupted
and thus pretty nice to work on patches.

I don't really see the problem, even with gmail style messageids
Discussion: cabuevez_uja0ddrw7apjh3nm4p0kghwopm0wpw4mh_gt2-n...@mail.gmail.com
isn't that bad.

Andres




Re: [HACKERS] Patch: make behavior of all versions of the "isinf" function be similar

2016-02-01 Thread Vitaly Burovoy
On 1/31/16, Tom Lane  wrote:
> 2. POSIX:2008 only requires that "The isinf() macro shall return a
> non-zero value if and only if its argument has an infinite value."
> Therefore, any assumption about the sign of the result is wrong anyway,
> and any patch that depends on it will be rejected, regardless of what
> isinf.c does.  Or else, if you can convince us that such an assumption
> is really valuable, we'd need isinf.c more not less so that we can
> replace isinf() on platforms where it doesn't meet the stronger spec.
>
>   regards, tom lane

Ok, then I'll use "is_infinite" from "float.c".
But why is the behavior of the functions (in "src/port/isinf.c")
different? It is a bit confusing…
-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] Template for commit messages

2016-02-01 Thread Alvaro Herrera
Andres Freund wrote:
> On 2016-02-01 12:56:21 +0100, Magnus Hagander wrote:
> > Let's just assume that we can fix that part. As in, we can expose either an
> > internal db id or a short hash or something, from the archives server. We
> > could then have that visible/linkable directly on the archives page, and
> > one could copy/paste that link into the commit message. That way we can
> > avoid the long messageids.
> 
> Raw messageids have the big big advantage that you can get them locally
> in your mailreader, and you can dereference them locally... I type
> esc-X id:$id and 20ms later I have the whole thread open.

Yes, +1 on that.

> Maybe I just travel too much, but travel time is often long and
> uninterrupted and thus pretty nice to work on patches.

I don't travel as much as you or Magnus do (!) but I do offline work a
lot, so I appreciate full message-ids.

> I don't really see the problem, even with gmail style messageids
> Discussion: cabuevez_uja0ddrw7apjh3nm4p0kghwopm0wpw4mh_gt2-n...@mail.gmail.com
> isn't that bad.

Yes, yes, I prefer to use the full URL because then a machine can tell
whether it's an email address and not a message-id, which is useful for
mangling addresses while at the same time not mangling message-ids.  But
I'm not going to force that on anybody.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Several problems in tab-completions for SET/RESET

2016-02-01 Thread Michael Paquier
On Mon, Feb 1, 2016 at 9:15 PM, Fujii Masao wrote:
> If we do that, we also should change the tab-completion for the SET command
> so that "=" is suggested. But I'm afraid that might decrease the
> usability of tab-completion.
>
> Imagine the case of "SET work_mem <tab>". If "TO" and "=" are suggested,
> we need to type either "T" or "=" and then <tab> to input the setting value.
> Otherwise, "SET work_mem <tab>" suggests only "TO" and we can input
> the setting value by just typing <tab> again.
> This extra step is very small, but the SET command is used very often,
> so I'd like to avoid such an extra step.

OK, that's fine.
-- 
Michael




Re: [HACKERS] Template for commit messages

2016-02-01 Thread Heikki Linnakangas

On 01/02/16 13:36, Andres Freund wrote:

On 2016-01-28 06:03:02 -0500, Bruce Momjian wrote:

Reported-by:
Bug:
Author:
Reviewed-by:
Tested-by:
Backpatch-through:


I personally, and I realize that I'm likely alone on that, would really
like to see references to relevant threads. A year after a commit or
such it often starts to get hard to know which threads a commit was
about.  Often it's easy enough if it's about a single feature, but
bugfixes often originate in threads that have no directly corresponding
thread. And often the commit happens a while after there's been activity
in a thread.  I spent days searching for the thread that "per discussion" in
commit messages refers to.


+1. I also quite often look at a commit message, and then hunt the "per 
discussion". A message-id would save some time there. I've seen you add 
annotations like that to your commits, which is nice. I'll start doing 
that myself. You've used the "Discussion: " format for that, 
so I'll go with that too. (Even if the message-id is given in free-form 
in the paragraphs, that's helpful, but might as well use a fixed form.)


And that's a proposal that's not "we must all do it or it's useless". If 
half of commit messages include that, then hunting the "per discussion" 
for those commits becomes easier, and the rest are the same as now.


- Heikki





Re: [HACKERS] Template for commit messages

2016-02-01 Thread Andres Freund
On 2016-02-01 20:53:56 +0900, Michael Paquier wrote:
> Last time this was suggested though there were concerns regarding the
> 72-80 character limit per line in a commit message.

That seems like a misdirect. The limit is there to keep things readable,
not because there's a hard technical limit. So if a messageid is longer,
it doesn't have much consequence.

Andres




Re: [HACKERS] Template for commit messages

2016-02-01 Thread Michael Paquier
On Mon, Feb 1, 2016 at 6:13 PM, Alvaro Herrera 
wrote:

> Joshua D. Drake wrote:
> > On 01/31/2016 04:34 PM, Michael Paquier wrote:
>
> > >This page would need a refresh IMO. I think it has not been touched
> > >for the last couple of years.
> >
> > No doubt but if we can't bother to keep that refreshed what makes us
> think
> > that a structured format in commit messages that magically (through a
> lot of
> > hard work and extra parsing anyway) is going to be any more accurate?
>
> If you're proposing to keep that page updated, please propose that in
> pgsql-www instead of hijacking this thread.
>

Oops. Sorry about that.
-- 
Michael


Re: [HACKERS] Template for commit messages

2016-02-01 Thread Andres Freund
On 2016-01-28 06:03:02 -0500, Bruce Momjian wrote:
>   Reported-by:
>   Bug:
>   Author:
>   Reviewed-by:
>   Tested-by:
>   Backpatch-through:

I personally, and I realize that I'm likely alone on that, would really
like to see references to relevant threads. A year after a commit or
such it often starts to get hard to know which threads a commit was
about.  Often it's easy enough if it's about a single feature, but
bugfixes often originate in threads that have no directly corresponding
thread. And often the commit happens a while after there's been activity
in a thread.  I spent days searching for the thread that "per discussion" in
commit messages refers to.




Re: [HACKERS] Template for commit messages

2016-02-01 Thread Michael Paquier
On Mon, Feb 1, 2016 at 8:36 PM, Andres Freund  wrote:
> I personally, and I realize that I'm likely alone on that, would really
> like to see references to relevant threads. A year after a commit or
> such it often starts to get hard to know which threads a commit was
> about.  Often it's easy enough if it's about a single feature, but
> bugfixes often originate in threads that have no directly corresponding
> thread. And often the commit happens a while after there's been activity
> in a thread.  I spent days searching for the thread that "per discussion" in
> commit messages refers to.

+1. Having to look at the archives to find to which discussion a
commit is attached is a pain. Last time this was suggested though
there were concerns regarding the 72-80 character limit per line in a
commit message.
-- 
Michael




Re: [HACKERS] Several problems in tab-completions for SET/RESET

2016-02-01 Thread Fujii Masao
On Mon, Feb 1, 2016 at 3:14 PM, Michael Paquier
 wrote:
> On Mon, Feb 1, 2016 at 1:21 PM, Fujii Masao  wrote:
>> On Fri, Jan 29, 2016 at 1:02 PM, Michael Paquier
>>  wrote:
>>> On Fri, Jan 29, 2016 at 11:53 AM, Fujii Masao  wrote:
 I removed the above and added the following for that case.

 +/* Complete ALTER DATABASE|FUNCTION|ROLE|USER ... SET  */
 +else if (Matches2("ALTER", "DATABASE|FUNCTION|ROLE|USER") &&
 + TailMatches2("SET", MatchAny))
 +COMPLETE_WITH_LIST2("FROM CURRENT", "TO");

 Attached is the updated version of the patch.
>>
>> Thanks for the review!
>>
>>> "ALTER FUNCTION foo(bar)" suggests OWNER TO, RENAME TO and SET SCHEMA.
>>> I think that we had better suggesting SET instead of SET SCHEMA, and
>>> add SCHEMA in the list of things suggested by SET.
>>
>> Maybe, and it should suggest other keywords like RESET. That said, it's
>> better to overhaul the tab-completion of ALTER FUNCTION. But that's not
>> the task of this patch. IMO it's better to fix that as a separate patch.
>
> Er, OK... I thought that both problems seem rather linked per the
> $subject but I can send an extra patch on this thread if necessary.
> Never mind.
>
>>> "ALTER DATABASE foodb SET foo_param" should suggest TO/= but that's
>>> not the case. After adding TO/= manually, a list of values is
>>> suggested though. Same problem with ALTER ROLE and ALTER FUNCTION.
>>
>> Fixed. Attached is the updated version of the patch.
>
> +   /* Complete ALTER DATABASE|FUNCTION|ROLE|USER ... SET  */
> +   else if (HeadMatches2("ALTER", "DATABASE|FUNCTION|ROLE|USER") &&
> +TailMatches2("SET", MatchAny))
> +   COMPLETE_WITH_LIST2("FROM CURRENT", "TO");
> Small thing: adding "=" to the list of things that can be completed?

If we do that, we also should change the tab-completion for the SET command
so that "=" is suggested. But I'm afraid that might decrease the
usability of tab-completion.

Imagine the case of "SET work_mem <tab>". If "TO" and "=" are suggested,
we need to type either "T" or "=" and then <tab> to input the setting value.
Otherwise, "SET work_mem <tab>" suggests only "TO" and we can input
the setting value by just typing <tab> again.
This extra step is very small, but the SET command is used very often,
so I'd like to avoid such an extra step.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Comment typos in source code: s/thats/that is/

2016-02-01 Thread Michael Paquier
On Mon, Feb 1, 2016 at 7:44 PM, Magnus Hagander  wrote:
> On Thu, Jan 28, 2016 at 1:39 PM, Michael Paquier 
> wrote:
>> I found a couple of typos as per the $subject.
>> A patch is attached.
>
> Applied, thanks.

Thanks.
-- 
Michael




Re: [HACKERS] Template for commit messages

2016-02-01 Thread Magnus Hagander
On Mon, Feb 1, 2016 at 1:04 PM, Andres Freund  wrote:

> On 2016-02-01 12:56:21 +0100, Magnus Hagander wrote:
> > Let's just assume that we can fix that part. As in, we can expose either
> an
> > internal db id or a short hash or something, from the archives server. We
> > could then have that visible/linkable directly on the archives page, and
> > one could copy/paste that link into the commit message. That way we can
> > avoid the long messageids.
>
> Raw messageids have the big big advantage that you can get them locally
> in your mailreader, and you can dereference them locally... I type
> esc-X id:$id and 20ms later I have the whole thread open.  Maybe
> I just travel too much, but travel time is often long and uninterrupted
> and thus pretty nice to work on patches.
>

If it was a short hash of the same thing, you could equally find them
locally though, couldn't you?



> I don't really see the problem, even with gmail style messageids
> Discussion:
> cabuevez_uja0ddrw7apjh3nm4p0kghwopm0wpw4mh_gt2-n...@mail.gmail.com
> isn't that bad.
>

If people don't actually have a problem with it, that of course makes it
even easier :)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] proposal: PL/Pythonu - function ereport

2016-02-01 Thread Pavel Stehule
hi

I am sorry, I am writing from a mobile phone and I cannot cut the text
well.

On 29. 1. 2016 at 18:09, "Catalin Iacob" <
iacobcata...@gmail.com> wrote:
> > This interface is a new generation of the previous functions: info,
> > notice, warning, error, with a keyword parameter interface. I didn't
> > change the older functions, to keep compatibility.
>
> Hi,
>
> Even without the OOP changes, the exception classes are still there as
> the underlying mechanism that error and raise_error use.
>
> I looked at the patch and what I don't like about it is that
> raise_error uses SPIError to transport detail, hint etc while error
> uses Error. This is inconsistent and confusing.
>
> Take this example:
>
> CREATE OR REPLACE FUNCTION err()
>   RETURNS int
> AS $$
>   plpy.error('only msg from plpy.error', 'arg2 for error is part of
> tuple', 'arg3 also in tuple')
> $$ LANGUAGE plpython3u;
>
> SELECT err();
>
> CREATE OR REPLACE FUNCTION raise_err()
>   RETURNS int
> AS $$
>   plpy.raise_error('only msg from plpy.raise_error', 'arg2 for
> raise_error is detail', 'arg3 is hint')
> $$ LANGUAGE plpython3u;
>
> SELECT raise_err();
>
> With your patch, this results in:
>
> CREATE FUNCTION
> psql:plpy_error_raise_error_difference:9: ERROR:  plpy.Error: ('only
> msg from plpy.error', 'arg2 for error is part of tuple', 'arg3 also in
> tuple')
> CONTEXT:  Traceback (most recent call last):
  PL/Python function "err", line 2, in <module>
> plpy.error('only msg from plpy.error', 'arg2 for error is part of
> tuple', 'arg3 also in tuple')
> PL/Python function "err"
> CREATE FUNCTION
> psql:plpy_error_raise_error_difference:17: ERROR:  plpy.SPIError: only
> msg from plpy.raise_error
> DETAIL:  arg2 for raise_error is detail
> HINT:  arg3 is hint
> CONTEXT:  Traceback (most recent call last):
  PL/Python function "raise_err", line 2, in <module>
> plpy.raise_error('only msg from plpy.raise_error', 'arg2 for
> raise_error is detail', 'arg3 is hint')
> PL/Python function "raise_err"
>
> From the output you can already see that plpy.error and
> plpy.raise_error (even with a single argument) don't use the same
> exception: plpy.error uses Error while raise_error uses SPIError. I
> think with a single argument they should be identical and with
> multiple arguments raise_error should still use Error but use the
> arguments as detail, hint etc. In code you had to export a function to
> plpy_spi.h to get to the details in SPIError while plpy.error worked
> just fine without that.
>

Yes, using SPIError is unfortunate. I used it because this exception
already supported structured exceptions, but in Python code it makes no
sense, and raising Error is better.

> I've attached two patches on top of yours: first is a comment fix, the
> next one is a rough proof of concept for using plpy.Error to carry the
> details. This allows undoing the change to export
> PLy_spi_exception_set to plpy_spi.h. And then I realized, the changes
> you did to SPIError are actually a separate enhancement (report more
> details like schema name, table name and so on for the query executed
> by SPI). They should be split into a separate patch.

This patch is simple, and maintaining more patches makes no sense.

With these
> patches the output of the above test is:
>
> CREATE FUNCTION
> psql:plpy_error_raise_error_difference:9: ERROR:  plpy.Error: ('only
> msg from plpy.error', 'arg2 for error is part of tuple', 'arg3 also in
> tuple')
> CONTEXT:  Traceback (most recent call last):
  PL/Python function "err", line 2, in <module>
> plpy.error('only msg from plpy.error', 'arg2 for error is part of
> tuple', 'arg3 also in tuple')
> PL/Python function "err"
> CREATE FUNCTION
> psql:plpy_error_raise_error_difference:17: ERROR:  plpy.Error: only
> msg from plpy.raise_error
> DETAIL:  arg2 for raise_error is detail
> HINT:  arg3 is hint
> CONTEXT:  Traceback (most recent call last):
  PL/Python function "raise_err", line 2, in <module>
> plpy.raise_error('only msg from plpy.raise_error', 'arg2 for
> raise_error is detail', 'arg3 is hint')
> PL/Python function "raise_err"
>
> The two approaches are consistent now, both create a plpy.Error. The
> patch is not complete, it only handles detail and hint not the others
> but it should illustrate the idea.
>
> Looking at the output above, I don't see who would rely on calling
> plpy.error with multiple arguments and getting the tuple so I'm
> actually in favor of just breaking backward compatibility. Note that
> passing multiple arguments isn't even documented. So I would just
> change debug, info, error and friends to do what raise_debug,
> raise_info, raise_error do. With a single argument behavior stays the
> same, with multiple arguments one gets more useful behavior (detail,
> hint) instead of the useless tuple. That's my preference but we can
> leave the patch with raise and leave the decision to the committer.
>

If we break compatibility, then the raise* functions are useless and should
be removed.

> What do you think? Jim doesn't like the 

Re: [HACKERS] silent data loss with ext4 / all current versions

2016-02-01 Thread Michael Paquier
On Tue, Feb 2, 2016 at 1:07 AM, Andres Freund  wrote:
> On 2016-01-25 16:30:47 +0900, Michael Paquier wrote:
>> diff --git a/src/backend/access/transam/xlog.c 
>> b/src/backend/access/transam/xlog.c
>> index a2846c4..b124f90 100644
>> --- a/src/backend/access/transam/xlog.c
>> +++ b/src/backend/access/transam/xlog.c
>> @@ -3278,6 +3278,14 @@ InstallXLogFileSegment(XLogSegNo *segno, char 
>> *tmppath,
>>   tmppath, path)));
>>   return false;
>>   }
>> +
>> + /*
>> +  * Make sure the rename is permanent by fsyncing the parent
>> +  * directory.
>> +  */
>> + START_CRIT_SECTION();
>> + fsync_fname(XLOGDIR, true);
>> + END_CRIT_SECTION();
>>  #endif
>
> Hm. I'm seriously doubtful that using critical sections for this is a
> good idea. What's the scenario you're protecting against by declaring
> this one?  We intentionally don't error out in the isdir cases in
> fsync_fname() cases anyway.
>
> Afaik we need to fsync tmppath before and after the rename, and only
> then the directory, to actually be safe.

Regarding the fsync call on the new file before the rename, would it
be better to extend fsync_fname() with some kind of noerror flag to
work around the case of a file that does not exist, or do you think it
is better just to use pg_fsync in this case after getting an fd? Using
pg_fsync() directly looks redundant with what fsync_fname() already
does.
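
For instance, the noerror variant could look roughly like this (a
hypothetical sketch, not a proposed patch):

/* Sketch: fsync a file if it exists, silently skipping e.g. ENOENT. */
static void
fsync_fname_noerror(const char *fname, bool isdir)
{
	int		fd = BasicOpenFile((char *) fname,
							   (isdir ? O_RDONLY : O_RDWR) | PG_BINARY,
							   S_IRUSR | S_IWUSR);

	if (fd < 0)
		return;				/* nothing to fsync */

	if (pg_fsync(fd) != 0)
		ereport(WARNING,
				(errcode_for_file_access(),
				 errmsg("could not fsync file \"%s\": %m", fname)));
	close(fd);
}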

>> @@ -5297,6 +5313,9 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr 
>> endOfLog)
>>errmsg("could not rename file \"%s\" to 
>> \"%s\": %m",
>>   RECOVERY_COMMAND_FILE, 
>> RECOVERY_COMMAND_DONE)));
>>
>> + /* Make sure the rename is permanent by fsyncing the data directory. */
>> + fsync_fname(".", true);
>> +
>
> Shouldn't RECOVERY_COMMAND_DONE be fsynced first here?

Check.

>>   /*
>> @@ -6525,6 +6550,13 @@ StartupXLOG(void)
>>   
>> TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
>>   }
>>
>> + /*
>> +  * Make sure the rename is permanent by fsyncing the parent
>> +  * directory.
>> +  */
>> + if (haveBackupLabel || haveTblspcMap)
>> + fsync_fname(".", true);
>> +
>
> Isn't that redundant with the haveTblspcMap case above?

I am not sure I get your point here. Are you referring to the fact
that fsync should be done after each rename in this case?

>>   /* Check that the GUCs used to generate the WAL allow recovery 
>> */
>>   CheckRequiredParameterValues();
>>
>> @@ -7305,6 +7337,12 @@ StartupXLOG(void)
>>errmsg("could not rename file 
>> \"%s\" to \"%s\": %m",
>>   origpath, 
>> partialpath)));
>>   XLogArchiveNotify(partialfname);
>> +
>> + /*
>> +  * Make sure the rename is permanent by 
>> fsyncing the parent
>> +  * directory.
>> +  */
>> + fsync_fname(XLOGDIR, true);
>
> .partial should be fsynced first.

Check.

>>   }
>>   }
>>   }
>> @@ -10905,6 +10943,9 @@ CancelBackup(void)
>>  BACKUP_LABEL_FILE, 
>> BACKUP_LABEL_OLD,
>>  TABLESPACE_MAP, 
>> TABLESPACE_MAP_OLD)));
>>   }
>> +
>> + /* fsync the data directory to persist the renames */
>> + fsync_fname(".", true);
>>  }
>
> Same.

Re-check.

>>  /*
>> diff --git a/src/backend/access/transam/xlogarchive.c 
>> b/src/backend/access/transam/xlogarchive.c
>> index 277c14a..8dda80b 100644
>> --- a/src/backend/access/transam/xlogarchive.c
>> +++ b/src/backend/access/transam/xlogarchive.c
>> @@ -477,6 +477,12 @@ KeepFileRestoredFromArchive(char *path, char *xlogfname)
>>   path, xlogfpath)));
>>
>>   /*
>> +  * Make sure the renames are permanent by fsyncing the parent
>> +  * directory.
>> +  */
>> + fsync_fname(XLOGDIR, true);
>
> Afaics the file under the temporary filename has not been fsynced up to
> here.

Yes, true, the old file...

>> + /*
>>* Create .done file forcibly to prevent the restored segment from 
>> being
>>* archived again later.
>>*/
>> @@ -586,6 +592,11 @@ XLogArchiveForceDone(const char *xlog)
>>errmsg("could not rename file \"%s\" 
>> to \"%s\": %m",
>>   archiveReady, 
>> archiveDone)));
>>
>> + /*
>> +  * Make sure the rename is permanent by fsyncing the parent
>> +  * directory.
>> +  

Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-02-01 Thread Andreas Joseph Krogh
On Tuesday, 02 February 2016 at 04:22:57, Michael Paquier <
michael.paqu...@gmail.com> wrote:
> On Mon, Feb 1, 2016 at 8:21 PM, Dmitry Ivanov wrote:
> > This patch was originally developed by Teodor Sigaev and Oleg Bartunov
> > in 2009, so all credit goes to them. Any feedback is welcome.
>
> This is not a small patch:
>  28 files changed, 2441 insertions(+), 380 deletions(-)
> And the last CF of 9.6 should not contain rather large patches.
> --
> Michael

OTOH, it would be extremely nice to get this into 9.6.

--
Andreas Joseph Krogh




Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-01 Thread Kouhei Kaigai
> KaiGai-san,
> 
> On 2016/02/01 10:38, Kouhei Kaigai wrote:
> > As an aside, background of my motivation is the slide below:
> > http://www.slideshare.net/kaigai/sqlgpussd-english
> > (LT slides in JPUG conference last Dec)
> >
> > I'm under investigation of SSD-to-GPU direct feature on top of
> > the custom-scan interface. It intends to load a bunch of data
> > blocks on NVMe-SSD to GPU RAM using P2P DMA, prior to the data
> > loading onto CPU/RAM, to preprocess the data to be filtered out.
> > It only makes sense if the target blocks are not loaded to the
> > CPU/RAM yet, because SSD device is essentially slower than RAM.
> > So, I like to have a reliable way to check the latest status of
> > the shared buffer, to know whether a particular block is already
> > loaded or not.
> 
> Quite interesting stuff, thanks for sharing!
> 
> I'm in no way expert on this but could this generally be attacked from the
> smgr API perspective? Currently, we have only one implementation - md.c
> (the hard-coded RelationData.smgr_which = 0). If we extended that and
> provided end-to-end support so that there would be md.c alternatives to
> storage operations, I guess that would open up opportunities for
> extensions to specify smgr_which as an argument to ReadBufferExtended(),
> provided there is already support in place to install md.c alternatives
> (perhaps in .so). Of course, these are just musings and, perhaps does not
> really concern the requirements of custom scan methods you have been
> developing.
>
Thanks for your idea. Indeed, smgr hooks are a good candidate to implement
the feature; however, what I need is a thin intermediation layer rather
than an alternative storage engine.

It becomes clear we need two features here.
1. A feature to check whether a particular block is already on the shared
   buffer pool.
   It is available. BufTableLookup() under the BufMappingPartitionLock
   gives us the information we want.

2. A feature to suspend I/O write-out of particular blocks
   that have been registered by another concurrent backend, until they are
   unregistered (usually at the end of the P2P DMA).
   ==> to be discussed.

When we call smgrwrite(), like FlushBuffer(), it fetches function pointer
from the 'smgrsw' array, then calls smgr_write.

  void
  smgrwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
char *buffer, bool skipFsync)
  {
  (*(smgrsw[reln->smgr_which].smgr_write)) (reln, forknum, blocknum,
buffer, skipFsync);
  }

If the extension overwrote the smgrsw[] array and then called the original
function under its own control, it could suspend the call of the original
smgr_write until completion of the P2P DMA.

It may be a minimally invasive way to implement this, and portable to any
further storage layers.
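
For illustration, such an interception could look roughly like this (the
typedef and the wait function are hypothetical, purely a sketch of the
idea):

/* Sketch: save the original md.c callback and interpose our own. */
typedef void (*smgr_write_hook) (SMgrRelation reln, ForkNumber forknum,
								 BlockNumber blocknum, char *buffer,
								 bool skipFsync);

static smgr_write_hook prev_smgr_write = NULL;

static void
extension_smgr_write(SMgrRelation reln, ForkNumber forknum,
					 BlockNumber blocknum, char *buffer, bool skipFsync)
{
	/* block while a registered P2P DMA covers this block (hypothetical) */
	wait_for_p2p_dma_completion(reln->smgr_rnode, forknum, blocknum);

	prev_smgr_write(reln, forknum, blocknum, buffer, skipFsync);
}

/* at load time:
 *   prev_smgr_write = smgrsw[0].smgr_write;
 *   smgrsw[0].smgr_write = extension_smgr_write;
 */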

What do you think? It is a bit different from your original proposition,
though.
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 





Re: [HACKERS] Support for N synchronous standby servers - take 2

2016-02-01 Thread Michael Paquier
On Mon, Feb 1, 2016 at 11:28 PM, Fujii Masao  wrote:

> [first version]
> Add only synchronous_standby_num which specifies the number of standbys
> that the master must wait for before marking sync replication as completed.
> This version supports simple use cases like "I want to have two synchronous
> standbys".
>
> [second version]
> Add synchronous_replication_method: 'priority' and 'quorum'. This version
> additionally supports simple quorum commit case like "I want to ensure
> that WAL is replicated synchronously to at least two standbys from five
> ones listed in s_s_names".
>
> Or
>
> Add something like quorum_replication_num and quorum_standby_names, i.e.,
> the master must wait for at least q_r_num standbys from ones listed in
> q_s_names before marking sync replication as completed. Also the master
> must wait for sync replication according to s_s_num and s_s_names.
> That is, this approach separates 'priority' and 'quorum' to each
> parameters.
> This increases the number of GUC parameters, but ISTM less confusing, and
> it supports a bit complicated case like "there is one local standby and
> three
> remote standbys, then I want to ensure that WAL is replicated synchronously
> to the local standby and at least two remote one", e.g.,
>
>   s_s_num = 1, s_s_names = 'local'
>   q_s_num = 2, q_s_names = 'remote1, remote2, remote3'
>
> [third version]
> Add the hooks for more complicated sync replication cases.
>
> I'm thinking that the realistic target for 9.6 might be the first one.
>

If we want to get something out for this release, clearly yes, and being
able to specify 2 sync targets is already a win when the two sync standbys
are not exactly at the same location. FWIW, I don't mind doing coding
and/or review work; that's basically my first patch that needs a bit more
love and polishing, *and* test cases, but I am used enough to perl and
PostgresNode these days to produce something based on sanity checks of
pg_stat_replication and my other set of patches that have more basic
routines.

Now I would not mind if we actually jumped into the 3rd case if we are fine
with doing nothing for this release, but that requires a lot of design and
background work, so it's not plausible for 9.6. Of course, if there are
voices against the scenario proposed by Fujii-san, feel free to speak
up.
-- 
Michael


Re: [HACKERS] extend pgbench expressions with functions

2016-02-01 Thread Michael Paquier
On Tue, Feb 2, 2016 at 1:35 PM, Michael Paquier
 wrote:
> And now there are patches. Well, nobody has complained about that until now 
> except me... So we could live without patching back-branches, but it don't 
> think it hurts much to fix those holes.

Meh, s/it don't/I don't/
-- 
Michael




Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-01 Thread Robert Haas
On Mon, Feb 1, 2016 at 6:48 PM, Robert Haas  wrote:
> More when I have a bit more time to look at this...

Why does deparseSelectStmtForRel change the order of the existing
arguments?  I have no issue with adding new arguments as required, but
rearranging the existing argument order doesn't serve any useful
purpose that is immediately apparent.  There's also no real need for
the rel -> foreignrel renaming.

+/*
+ * If we have constructed the SELECT clause from the targetlist, construct
+ * the retrieved attributes list as continuously increasing list of
+ * integers.
+ */
+if (retrieved_attrs && tlist)
+{
+int i;
+*retrieved_attrs = NIL;
+for (i = 1; i <= list_length(tlist); i++)
+*retrieved_attrs = lappend_int(*retrieved_attrs, i);
+}

This is really wonky.  First, you pass retrieved_attrs to
deparseSelectSqlForBaseRel, but then you have this code which blows it
away and replaces it if tlist != NIL.  So I guess this will run always
for a join relation, and for a base relation only sometimes.  But
that's certainly not at all clear.  I think you need to find some way
of rearranging this so that it's more obvious what is going on here.

I suggest not renaming the existing deparseTargetList() and instead
coming up with a different name for the new thing you need, maybe
deparseExplicitTargetList().

How about adding another sentence to the header comment for
appendConditions() saying something like "This is used for both WHERE
clauses and for JOIN .. ON"?

+ * The targetlist is list of TargetEntry's which in turn contains Var nodes.

contain.

+/*
+ * Output join name for given join type */

Formatting.  This patch, overall, is badly in need of a pgindent run.

+/*
+ * XXX
+ * Since the query is being built in recursive manner from bottom up,
+ * the FOR UPDATE/SHARE clause referring the base relations can
not be added
+ * at the top level. They need to be added to the subqueries corresponding
+ * to the base relations. This has an undesirable effect of locking more
+ * rows than specified by user, as it locks even those rows from base
+ * relations which are not part of the final join result. To avoid this
+ * undesirable effect, we need to build the join query without the
+ * subqueries, which for now, seems hard.
+ */

This is a second good reason that we should actually do that work
instead of complaining that it's too hard.  The first one is that the
queries that come out of this right now are too long and hard to read.
I actually don't see why this is all that hard.  Deparsing the target
list is simple enough; you just need to emit tab.col using varno to
determine what tab looks like and varattno to determine what col looks
like.  The trickier part is emitting the FROM clause. But this doesn't
seem all that hard either.  Suppose that when we construct the fpinfo
(PgFdwRelationInfo *) for a relation, we include in it a FROM clause
appropriate to that rel.  So, for a baserel, it's something like "foo
r4" where 4 is foo's RTI.  For a joinrel, do this:

1. Emit the FROM clause constructed for the outer relation,
surrounding it with parentheses if the outer relation is a joinrel.
2. Emit " JOIN ", " LEFT JOIN ", " RIGHT JOIN ", or " FULL JOIN "
according to the join type.
3. Emit the FROM clause constructed for the inner relation,
surrounding it with parentheses if the inner relation is a joinrel.
4. Emit " ON ".
5. Emit the joinqual.

This will produce nice things like (foo r3 JOIN bar r4 ON r3.x = r4.x)
JOIN baz r2 ON r3.y = r2.y
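
To make that shape concrete, a rough sketch (the fpinfo fields and the
helpers IS_JOIN_REL()/join_type_name() here are hypothetical, just for
illustration, not postgres_fdw code):

static void
deparse_from_clause(StringInfo buf, PgFdwRelationInfo *fpinfo)
{
	appendStringInfo(buf, IS_JOIN_REL(fpinfo->outerrel) ? "(%s)" : "%s",
					 fpinfo->outer_from_clause);			/* step 1 */
	appendStringInfo(buf, " %s ",
					 join_type_name(fpinfo->jointype));		/* step 2 */
	appendStringInfo(buf, IS_JOIN_REL(fpinfo->innerrel) ? "(%s)" : "%s",
					 fpinfo->inner_from_clause);			/* step 3 */
	appendStringInfoString(buf, " ON ");					/* step 4 */
	appendStringInfoString(buf, fpinfo->deparsed_joinqual);	/* step 5 */
}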

Then, you'd also need to stuff the conditions into the
PgFdwRelationInfo so that those could be added to paths constructed at
higher levels.  But that's not too hard either.  Basically you'd end
up with mostly the same stuff you have now, but the PgFdwRelationInfo
would store a join clause and a set of deparsed quals to be included
in the FROM and WHERE clauses respectively.  And then you'd pull the
information from the inner and outer sides to build up what you need
at the joinrel level.

This would actually be faster than what you've got right now, because
right now you're recursing down the whole join tree all over again at
each new join level, maybe not the best plan.

+if (foreignrel->reloptkind == RELOPT_JOINREL)
+return;
+
 /* Add any necessary FOR UPDATE/SHARE. */
 deparseLockingClause();

Generally, I think in these kinds of cases it is better to reverse the
test and get rid of the return statement, like this:

if (foreignrel->reloptkind != RELOPT_JOINREL)
deparseLockingClause();

The way you wrote it, somebody who wants to add more code to the end
of the function will probably have to make that change anyhow.
Really, your intention here was to skip that code for joins, not to
skip the rest of the function for baserels.

@@ -1424,9 +1875,7 @@ deparseVar(Var *node, deparse_expr_cxt *context)
 

Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-01 Thread Amit Kapila
On Sat, Jan 30, 2016 at 7:38 PM, Michael Paquier 
wrote:
>
> On Fri, Jan 29, 2016 at 9:13 PM, Michael Paquier wrote:
> > Well, to put it short, I am just trying to find a way to make the
> > backend skip unnecessary checkpoints on an idle system, which results
> > in the following WAL pattern if system is completely idle:
> > CHECKPOINT_ONLINE
> > RUNNING_XACTS
> > RUNNING_XACTS
> > [etc..]
> >
> > The thing is that I am lost with the meaning of this condition to
> > decide if a checkpoint should be skipped or not:
> > if (prevPtr == ControlFile->checkPointCopy.redo &&
> > prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
> > {
> > WALInsertLockRelease();
> > LWLockRelease(CheckpointLock);
> > return;
> > }
> > As at least one standby snapshot is logged before the checkpoint
> > record, the redo position is never going to match the previous insert
> > LSN, so checkpoints will never be skipped if wal_level >= hot_standby.
> > Skipping such unnecessary checkpoints is what you would like to
> > address, no? Because that's what I would like to do here first. And
> > once we got that right, we could think about addressing the case where
> > WAL segments are forcibly archived for idle systems.
>
> I have put a bit more brain power into that, and finished with the
> patch attached. A new field called chkpProgressLSN is attached to
> XLogCtl, which is set to the current insert position of the checkpoint
> when wal_level <= archive, or the LSN position of the standby snapshot
> taken after a checkpoint. The bgwriter code is modified as well so as
> it uses this progress LSN and compares it with the current insert LSN
> to determine if a standby snapshot should be logged or not. On an
> instance of Postgres completely idle, no useless checkpoints, and no
> useless standby snapshots are generated anymore.
>


@@ -8239,7 +8262,7 @@ CreateCheckPoint(int flags)
  if ((flags & (CHECKPOINT_IS_SHUTDOWN |
CHECKPOINT_END_OF_RECOVERY |
   CHECKPOINT_FORCE)) == 0)
  {
- if
(prevPtr == ControlFile->checkPointCopy.redo &&
+ if (GetProgressRecPtr() == prevPtr &&

prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
  {

I think such a check won't consider all the WAL activity that happened
during the checkpoint (after checkpoint start, till it finishes), which was
not the intention of this code without your patch.

I think neither this approach nor the previous patch's (hs-checkpoints-v1)
will fix the issue in all kinds of scenarios.

Let me try to explain what I think should fix this issue based on
discussion above, suggestions by Andres and some of my own
thoughts:

Have a new variable in WALInsertLock that indicates the last insertion
position (last_insert_pos) written while holding that lock.
Ensure that we don't update last_insert_pos while logging a standby
snapshot (a simple way is to pass a flag to XLogInsert, or to distinguish
it based on the type of record (XLOG_RUNNING_XACTS), or whatever smarter
way you can think of).  Now, both during checkpoint and
in bgwriter, to find out whether there has been any activity since last
time, we check all the WALInsertLocks for the latest insert position
(by referring to last_insert_pos), compare it with the latest position
we noted during the last such check, and decide whether to proceed
or not based on the comparison result.  If you think it is not adequate to
store last_insert_pos in WALInsertLock, then we can think of having
it in PGPROC.
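
In rough pseudo-C (last_insert_pos and the surrounding bookkeeping are
hypothetical, just to make the idea concrete):

/* last_insert_pos would be updated by every WAL insertion except
 * standby-snapshot records. */
static XLogRecPtr last_checked_pos = 0;

static bool
wal_activity_since_last_check(void)
{
	XLogRecPtr	latest = 0;
	int			i;

	for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
	{
		XLogRecPtr	pos = WALInsertLocks[i].l.last_insert_pos;

		if (pos > latest)
			latest = pos;
	}

	if (latest == last_checked_pos)
		return false;		/* idle: skip the checkpoint / standby snapshot */

	last_checked_pos = latest;
	return true;
}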

Yet another idea that occurs to me this morning: why not have a variable
in shared memory in XLogCtlInsert or XLogCtl, similar to
CurrBytePos/PrevBytePos, which is updated on each XLogInsert apart from
the XLogInsert for standby snapshots, and use that in a patch somewhat
close to what you have in hs-checkpoints-v1.

One related point: don't we need to consider activity on
unlogged relations for checkpoints to occur, given the
case where the only activity after the last checkpoint
is on unlogged relations?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-01 Thread Kouhei Kaigai
> On 1/31/16 7:38 PM, Kouhei Kaigai wrote:
> > I'm investigating an SSD-to-GPU direct feature on top of
> > the custom-scan interface. It intends to load a bunch of data
> > blocks from NVMe-SSD to GPU RAM using P2P DMA, prior to the data
> > being loaded onto CPU/RAM, to preprocess the data to be filtered out.
> > It only makes sense if the target blocks are not loaded into
> > CPU/RAM yet, because the SSD device is essentially slower than RAM.
> > So, I'd like to have a reliable way to check the latest status of
> > the shared buffer, to know whether a particular block is already
> > loaded or not.
> 
> That completely ignores the OS cache though... wouldn't that be a major
> issue?
>
Once we can ensure the target block is not cached in the shared buffer,
it is the job of the driver that supports P2P DMA to handle the OS page
cache. Once the driver gets a P2P DMA request from PostgreSQL, it checks
the OS page cache status and determines the DMA source: either the OS
buffer or the SSD block.

> To answer your direct question, I'm no expert, but I haven't seen any
> functions that do exactly what you want. You'd have to pull relevant
> bits from ReadBuffer_*. Or maybe a better method would just be to call
> BufTableLookup() without any locks and if you get a result > -1 just
> call the relevant ReadBuffer function. Sometimes you'll end up calling
> ReadBuffer even though the buffer isn't in shared buffers, but I would
> think that would be a rare occurrence.
>
Thanks; indeed, an extension can call BufTableLookup(). PrefetchBuffer()
has a good example of this.

If it returns a valid buf_id, there is nothing difficult; we just call
ReadBuffer() to pin the buffer.
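
As an illustration, here is a minimal sketch of that lookup pattern,
modeled on PrefetchBuffer() in bufmgr.c; the function name is invented,
and note that the answer can become stale the moment the partition lock
is released:

/*
 * Check whether a block is currently present in shared buffers.
 * The result is only a hint: another backend may load or evict the
 * block immediately after the lock is released.
 */
static bool
block_is_in_shared_buffers(Relation reln, ForkNumber forkNum,
                           BlockNumber blockNum)
{
    BufferTag   tag;
    uint32      hash;
    LWLock     *partitionLock;
    int         buf_id;

    /* make sure the smgr relation is open */
    RelationOpenSmgr(reln);

    /* build the tag identifying the block */
    INIT_BUFFERTAG(tag, reln->rd_smgr->smgr_rnode.node, forkNum, blockNum);

    /* find the hash code and the buffer-mapping partition lock */
    hash = BufTableHashCode(&tag);
    partitionLock = BufMappingPartitionLock(hash);

    LWLockAcquire(partitionLock, LW_SHARED);
    buf_id = BufTableLookup(&tag, hash);
    LWLockRelease(partitionLock);

    return buf_id >= 0;
}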

Otherwise, when BufTableLookup() returns a negative value, it means the
(relation, forknum, blocknum) pair does not exist in shared buffers.
So, the extension enqueues a P2P DMA request for asynchronous transfer,
and the driver processes the P2P DMA shortly afterwards.
Concurrent access may always happen. PostgreSQL uses MVCC, so the backend
that issued the P2P DMA does not need to pay attention to new tuples that
didn't exist at executor start time, even if another backend loads and
updates the same buffer just after the above BufTableLookup().

On the other hand, we have to pay attention to whether a fraction of
the buffer page has been partially written to the OS buffer or storage.
That is in the scope of the operating system, so it is not controllable
from our side.

One idea I can come up with is a temporary suspension of FlushBuffer()
for particular (relation, forknum, blocknum) pairs until the P2P DMA
completes. Even if a concurrent backend updates the buffer page after the
BufTableLookup(), this prevents the OS cache and storage from becoming
dirty during the P2P DMA. A rough sketch follows.
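
A very rough sketch of what such a suspension mechanism might look like;
every name here is hypothetical, and registration and cleanup of entries
around the DMA request are omitted:

#define MAX_INFLIGHT_DMA    64

/* hypothetical shared-memory table of buffer tags under P2P DMA */
typedef struct
{
    slock_t     mutex;
    int         ndma;
    BufferTag   tags[MAX_INFLIGHT_DMA];
} InflightDmaTable;

static InflightDmaTable *inflight;  /* lives in shared memory */

/*
 * Would be consulted by FlushBuffer(): true means this page is the
 * source of an in-flight P2P DMA and the flush must be postponed.
 */
static bool
DmaInProgress(const BufferTag *tag)
{
    bool    found = false;
    int     i;

    SpinLockAcquire(&inflight->mutex);
    for (i = 0; i < inflight->ndma; i++)
    {
        if (BUFFERTAGS_EQUAL(inflight->tags[i], *tag))
        {
            found = true;
            break;
        }
    }
    SpinLockRelease(&inflight->mutex);
    return found;
}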

What are people's thoughts?
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 





Re: [HACKERS] Raising the checkpoint_timeout limit

2016-02-01 Thread Michael Paquier
On Tue, Feb 2, 2016 at 1:16 PM, Noah Misch  wrote:
> On Tue, Feb 02, 2016 at 01:13:20AM +0100, Andres Freund wrote:
>> is there any reason for the rather arbitrary and low checkpoint_timeout
>> limit?
>
> Not that I know, and it is inconvenient.
>
>> I'm not sure what'd actually be a good upper limit. I'd be inclined to
>> even go to as high as a week or so. A lot of our settings have
>> upper/lower limits that aren't a good idea in general.
>
> In general, I favor having limits reflect fundamental system limitations
> rather than paternalism.  Therefore, I would allow INT_MAX (68 years).

+1. This way users can play as they wish.

>> I'm also wondering if it'd not make sense to raise the default timeout
>> to 15min or so. The upper ceiling for that really is recovery time, and
>> that has really shrunk rather drastically due to faster cpus and
>> architectural improvements in postgres (bgwriter, separate
>> checkpointer/bgwriter, restartpoints, ...).
>
> Have those recovery improvements outpaced the increases in max recovery time
> from higher core counts generating more WAL per minute?

Perhaps having some numbers showing that the architectural improvements
in Postgres really matter at constant checkpoint_segments for a pgbench
load would help in settling on a better default value. I would tend to
agree that as things speed up it makes sense to increase this value a
bit, even if Postgres is usually conservative in its default parameters,
which are aimed at low-spec machines.
-- 
Michael




Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-01 Thread Amit Langote

KaiGai-san,

On 2016/02/01 10:38, Kouhei Kaigai wrote:
> As an aside, the background of my motivation is the slides below:
> http://www.slideshare.net/kaigai/sqlgpussd-english
> (LT slides in JPUG conference last Dec)
> 
> I'm investigating an SSD-to-GPU direct feature on top of
> the custom-scan interface. It intends to load a bunch of data
> blocks from NVMe-SSD to GPU RAM using P2P DMA, prior to the data
> being loaded onto CPU/RAM, to preprocess the data to be filtered out.
> It only makes sense if the target blocks are not loaded into
> CPU/RAM yet, because the SSD device is essentially slower than RAM.
> So, I'd like to have a reliable way to check the latest status of
> the shared buffer, to know whether a particular block is already
> loaded or not.

Quite interesting stuff, thanks for sharing!

I'm in no way an expert on this, but could this generally be attacked from
the smgr API perspective? Currently, we have only one implementation - md.c
(the hard-coded RelationData.smgr_which = 0). If we extended that and
provided end-to-end support so that there were alternatives to md.c for
storage operations, I guess that would open up opportunities for
extensions to specify smgr_which as an argument to ReadBufferExtended(),
provided there is already support in place to install md.c alternatives
(perhaps in a .so); see the sketch below. Of course, these are just
musings and perhaps do not really concern the requirements of the custom
scan methods you have been developing.
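
To illustrate the musing, an abbreviated, hypothetical rendering of the
dispatch-table idea; the real f_smgr struct in smgr.c has many more
callbacks, and the nvmeread/nvmewrite names are invented:

/* trimmed-down stand-in for smgr.c's f_smgr function table */
typedef struct f_smgr_sketch
{
    void    (*smgr_read) (SMgrRelation reln, ForkNumber forknum,
                          BlockNumber blocknum, char *buffer);
    void    (*smgr_write) (SMgrRelation reln, ForkNumber forknum,
                           BlockNumber blocknum, char *buffer,
                           bool skipFsync);
    /* ... create, extend, nblocks, truncate, sync, etc. omitted ... */
} f_smgr_sketch;

static const f_smgr_sketch smgrsw_sketch[] = {
    {mdread, mdwrite},      /* smgr_which = 0: md.c, the only one today */
    {nvmeread, nvmewrite},  /* smgr_which = 1: hypothetical P2P-DMA-aware
                             * storage manager */
};

ReadBufferExtended() (or a variant of it) would then route through
smgrsw_sketch[smgr_which] instead of always assuming entry zero.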

Thanks,
Amit






Re: [HACKERS] PostgreSQL Audit Extension

2016-02-01 Thread Joshua D. Drake

On 02/01/2016 08:23 PM, Robert Haas wrote:


> It also appears to me that if we did want to do that, it would need
> quite a lot of additional cleanup.  I haven't dug in enough to have a
> list of specific issues, but it does look to me like there would be
> quite a bit.  Maybe that'd be worth doing if there were other
> advantages of having this be in core, but I don't see them.



Thus... it is a great extension and doesn't need to be in core.

JD


--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] PostgreSQL Audit Extension

2016-02-01 Thread Noah Misch
On Mon, Feb 01, 2016 at 09:59:01PM +0100, Alvaro Herrera wrote:
> Anyway I think the tests here are massive

I would not want to see fewer tests.  When I reviewed a previous incarnation
of pgaudit, the tests saved me hours of writing my own.

> and the code is not;

Nah; the patch size category is the same regardless of how you arrange the
tests.  The tests constitute less than half of the overall change.

> perhaps people get the mistaken impression
> that this is a huge amount of code which scares them.  Perhaps you could
> split it up in (1) code and (2) tests, which wouldn't achieve any
> technical benefit but would offer some psychological comfort to
> potential reviewers.  You know it's all psychology in these parts.

That's harmless.  If it makes someone feel better, great.




Re: [HACKERS] Raising the checkpoint_timeout limit

2016-02-01 Thread Noah Misch
On Tue, Feb 02, 2016 at 01:13:20AM +0100, Andres Freund wrote:
> is there any reason for the rather arbitrary and low checkpoint_timeout
> limit?

Not that I know, and it is inconvenient.

> I'm not sure what'd actually be a good upper limit. I'd be inclined to
> even go to as high as a week or so. A lot of our settings have
> upper/lower limits that aren't a good idea in general.

In general, I favor having limits reflect fundamental system limitations
rather than paternalism.  Therefore, I would allow INT_MAX (68 years).

> I'm also wondering if it'd not make sense to raise the default timeout
> to 15min or so. The upper ceiling for that really is recovery time, and
> that has really shrunk rather drastically due to faster cpus and
> architectural improvements in postgres (bgwriter, separate
> checkpointer/bgwriter, restartpoints, ...).

Have those recovery improvements outpaced the increases in max recovery time
from higher core counts generating more WAL per minute?

nm




Re: [HACKERS] extend pgbench expressions with functions

2016-02-01 Thread Robert Haas
On Mon, Feb 1, 2016 at 9:46 PM, Michael Paquier
 wrote:
> On Mon, Feb 1, 2016 at 10:34 PM, Robert Haas  wrote:
>> On Sat, Jan 30, 2016 at 7:36 AM, Michael Paquier
>>  wrote:
>>> On Fri, Jan 29, 2016 at 11:21 PM, Fabien COELHO  wrote:
 +/* overflow check (needed for INT64_MIN) */
 +if (lval != 0 && (*retval < 0 == lval < 0))

 Why not use "if (lval == INT64_MIN)" instead of this complicated condition?
 If it is really needed for some reason, I think that a comment could help.
>>>
>>> Checking for PG_INT64_MIN only would be fine as well, so let's do so.
>>> I thought honestly that we had better check if the result and the left
>>> argument are not of the same sign, but well.
>>
>> Committed and back-patched to 9.5.  Doesn't apply further back.
>
> OK, here are patches for 9.1~9.4. The main differences are that in
> 9.3/9.4 int64 is used for the division operations, and in 9.2/9.1
> that's int32. In the latter case pgbench blows up the same way with
> that:
> \set i -2147483648
> \set i :i / -1
> select :i;
> In those patches INT32_MIN/INT64_MIN need to be explicitly set as well
> at the top of pgbench.c. I think that's fine.

Oh, gosh, I should have said more clearly that I didn't really see a
need to fix this all the way back.  But I guess we could.
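
As an illustration, the hazard being patched is easy to demonstrate in a
standalone program; this sketch mirrors the shape of the quoted check
but is not the committed pgbench code verbatim:

#include <stdio.h>
#include <stdint.h>

/* divide lval by rval, refusing the two undefined cases */
static int
safe_div(int64_t lval, int64_t rval, int64_t *retval)
{
    if (rval == 0)
    {
        fprintf(stderr, "division by zero\n");
        return 0;
    }
    if (rval == -1)
    {
        /* INT64_MIN / -1 is unrepresentable in two's complement */
        if (lval == INT64_MIN)
        {
            fprintf(stderr, "bigint out of range\n");
            return 0;
        }
        *retval = -lval;
        return 1;
    }
    *retval = lval / rval;
    return 1;
}

int
main(void)
{
    int64_t r;

    /* reports "bigint out of range" instead of crashing */
    if (safe_div(INT64_MIN, -1, &r))
        printf("%lld\n", (long long) r);
    return 0;
}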

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] extend pgbench expressions with functions

2016-02-01 Thread Michael Paquier
On Tue, Feb 2, 2016 at 1:24 PM, Robert Haas  wrote:

> On Mon, Feb 1, 2016 at 9:46 PM, Michael Paquier
>  wrote:
> > On Mon, Feb 1, 2016 at 10:34 PM, Robert Haas 
> wrote:
> >> On Sat, Jan 30, 2016 at 7:36 AM, Michael Paquier
> >>  wrote:
> >>> On Fri, Jan 29, 2016 at 11:21 PM, Fabien COELHO 
> wrote:
>  +/* overflow check (needed for INT64_MIN) */
>  +if (lval != 0 && (*retval < 0 == lval < 0))
> 
>  Why not use "if (lval == INT64_MIN)" instead of this complicated
> condition?
>  If it is really needed for some reason, I think that a comment could
> help.
> >>>
> >>> Checking for PG_INT64_MIN only would be fine as well, so let's do so.
> >>> I thought honestly that we had better check if the result and the left
> >>> argument are not of the same sign, but well.
> >>
> >> Committed and back-patched to 9.5.  Doesn't apply further back.
> >
> > OK, here are patches for 9.1~9.4. The main differences are that in
> > 9.3/9.4 int64 is used for the division operations, and in 9.2/9.1
> > that's int32. In the latter case pgbench blows up the same way with
> > that:
> > \set i -2147483648
> > \set i :i / -1
> > select :i;
> > In those patches INT32_MIN/INT64_MIN need to be explicitly set as well
> > at the top of pgbench.c. I think that's fine.
>
> Oh, gosh, I should have said more clearly that I didn't really see a
> need to fix this all the way back.  But I guess we could.
>

And now there are patches. Well, nobody has complained about that until now
except me... So we could live without patching the back-branches, but I
don't think it hurts much to fix those holes.
-- 
Michael


Re: [HACKERS] [PATCH] Phrase search ported to 9.6

2016-02-01 Thread Michael Paquier
On Mon, Feb 1, 2016 at 8:21 PM, Dmitry Ivanov 
wrote:

> This patch was originally developed by Teodor Sigaev and Oleg Bartunov in
> 2009, so all credit goes to them. Any feedback is welcome.
>

This is not a small patch:
28 files changed, 2441 insertions(+), 380 deletions(-)
And the last CF of 9.6 should not take on such a large patch.
-- 
Michael

