Re: [HACKERS] printing table in asciidoc with psql

2014-10-30 Thread Pavel Stehule
2014-10-29 12:23 GMT+01:00 Szymon Guz mabew...@gmail.com:



 On 17 October 2014 09:01, Pavel Stehule pavel.steh...@gmail.com wrote:

 Hi Szymon

 I found a small bug - it doesn't escape | well

 postgres=# select * from mytab ;
    a   | numeric_b |     c
 -------+-----------+------------
  Ahoj  |        10 | 2014-10-17
  Hello |        20 | 2014-10-18
  Hi    |        30 | 2014-10-19
  aaa|  |           | 2014-10-17
 (4 rows)

 result


 [options="header",cols="literal,literal,literal",frame="all",grid="all"]
 |====
 ^| +++a+++ ^| +++numeric_b+++ ^| +++c+++
 | Ahoj | 10 | 2014-10-17
 | Hello | 20 | 2014-10-18
 | Hi | 30 | 2014-10-19
 | aaa| |  | 2014-10-17
 |====


 Next, I tested it with asciidoc and asciidoctor and I have a problem with
 asciidoctor - it doesn't respect alignment .. so numbers are aligned to the
 left instead of to the right.

 When you use the option "header", then the +++ formatting is useless.


 Hi Pavel,
 thanks for the remarks. I've attached another version of the patch. It
 works a little better now, including escaping | and asciidoctor alignment
 support.


it is fixed. Thank you.

I fixed formatting - please, recheck it.

I don't see any issue - it should be ready for committer

Regards

Pavel







 thanks,
 Szymon

commit f8ffd152aaa78ac1c96f808761680b8e593528a2
Author: Pavel Stehule pavel.steh...@gooddata.com
Date:   Thu Oct 30 09:02:59 2014 +0100

fix formatting

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index e7fcc73..cd64b88 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2092,8 +2092,8 @@ lo_import 152801
   <literal>aligned</literal>, <literal>wrapped</literal>,
   <literal>html</literal>,
   <literal>latex</literal> (uses <literal>tabular</literal>),
-  <literal>latex-longtable</literal>, or
-  <literal>troff-ms</literal>.
+  <literal>latex-longtable</literal>,
+  <literal>troff-ms</literal>, or <literal>asciidoc</literal>.
   Unique abbreviations are allowed.  (That would mean one letter
   is enough.)
   </para>
@@ -2120,7 +2120,8 @@ lo_import 152801
 
   <para>
   The <literal>html</>, <literal>latex</>,
-  <literal>latex-longtable</literal>, and <literal>troff-ms</>
+  <literal>latex-longtable</literal>, <literal>troff-ms</>,
+  and <literal>asciidoc</>
   formats put out tables that are intended to
   be included in documents using the respective mark-up
   language. They are not complete documents! This might not be
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 26089352..e00e47b 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -2248,6 +2248,9 @@ _align2string(enum printFormat in)
 		case PRINT_TROFF_MS:
 			return "troff-ms";
 			break;
+		case PRINT_ASCIIDOC:
+			return "asciidoc";
+			break;
 	}
 	return "unknown";
 }
@@ -2321,9 +2324,11 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
 			popt->topt.format = PRINT_LATEX_LONGTABLE;
 		else if (pg_strncasecmp("troff-ms", value, vallen) == 0)
 			popt->topt.format = PRINT_TROFF_MS;
+		else if (pg_strncasecmp("asciidoc", value, vallen) == 0)
+			popt->topt.format = PRINT_ASCIIDOC;
 		else
 		{
-			psql_error("\\pset: allowed formats are unaligned, aligned, wrapped, html, latex, troff-ms\n");
+			psql_error("\\pset: allowed formats are unaligned, aligned, wrapped, html, latex, troff-ms, asciidoc\n");
 			return false;
 		}
 
diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
index ae5fe88..b14b313 100644
--- a/src/bin/psql/help.c
+++ b/src/bin/psql/help.c
@@ -351,7 +351,7 @@ helpVariables(unsigned short int pager)
 	fprintf(output, _("  expanded (or x)          toggle expanded output\n"));
 	fprintf(output, _("  fieldsep                 field separator for unaligned output (default '|')\n"));
 	fprintf(output, _("  fieldsep_zero            set field separator in unaligned mode to zero\n"));
-	fprintf(output, _("  format                   set output format [unaligned, aligned, wrapped, html, latex, ..]\n"));
+	fprintf(output, _("  format                   set output format [unaligned, aligned, wrapped, html, latex, asciidoc ..]\n"));
 	fprintf(output, _("  footer                   enable or disable display of the table footer [on, off]\n"));
 	fprintf(output, _("  linestyle                set the border line drawing style [ascii, old-ascii, unicode]\n"));
 	fprintf(output, _("  null                     set the string to be printed in place of a null value\n"));
diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c
index 3b3c3b7..ae98ff9 100644
--- a/src/bin/psql/print.c
+++ b/src/bin/psql/print.c
@@ -2475,6 +2475,216 @@ print_troff_ms_vertical(const printTableContent *cont, FILE *fout)
 	}
 }
 
+/*************************/
+/* ASCIIDOC              */
+/*************************/
+
+static void
+asciidoc_escaped_print(const char *in, FILE *fout)
+{
+	const char *p;
+	for (p = in; *p; p++)
+	{
+		switch (*p)
+		{
+			case '|':
+				fputs("\\|", fout);
+				break;
+			default:
+				fputc(*p, fout);
+				break;
+		}
+	}
+}

Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2014-10-30 Thread Michael Paquier
Thanks for your input, Jim!

On Wed, Oct 29, 2014 at 7:59 AM, Jim Nasby jim.na...@bluetreble.com wrote:
 Patch applies against current HEAD and builds, but I'm getting 37 failed
 tests (mostly parallel, but also misc and WITH; results attached). Is that
 expected?
This is caused by the recent commit 7b1c2a0 (that I actually participated
in reviewing :p) because of a missing inclusion of ruleutils.h in index.c.

 The "mark the concurrent index" bit is rather confusing; it sounds like
 it's referring to the new index instead of the old. Now that I've read
 the code I understand what's going on here between the concurrent index
 *entry* and the filenode swap, but I don't think the docs make this
 sufficiently clear to users.

 How about something like this:

 The following steps occur in a concurrent index build, each in a separate
 transaction. Note that if there are multiple indexes to be rebuilt, then
 each step loops through all the indexes we're rebuilding, using a
 separate transaction for each one.

 1. [blah]
Definitely a good idea! I took your text and made it more precise, listing
the actions done for each step and the pg_index flags switched, and using an
orderedlist to make the list of steps, described in a separate paragraph,
more exhaustive for the user. At the same time I reworked the docs, removing
a part that was somewhat duplicated about dealing with the constraints
having invalid index entries and how to drop them.

 + * index_concurrent_create
 + *
 + * Create an index based on the given one that will be used for concurrent
 + * operations. The index is inserted into catalogs and needs to be built
 + * later on. This is called during concurrent index processing. The heap
 + * relation on which is based the index needs to be closed by the caller.

 Last bit presumably should be "on which the index is based".
What about "Create a concurrent index based on the definition of the one
provided by caller"?

 +   /* Build the list of column names, necessary for index_create */
 Instead of all this work wouldn't it be easier to create a version of
 index_create/ConstructTupleDescriptor that will use the IndexInfo for
 the old index? ISTM index_concurrent_create() is doing a heck of a lot
 of work to marshal data into one form just to have it get marshaled yet
 again. Worst case, if we do have to play this game, there should be a
 stand-alone function to get the columns/expressions for an existing
 index; you're duplicating a lot of code from pg_get_indexdef_worker().
Yes, this definitely sucks, and the approach of creating a function to get
all the column names is not productive either. Then let's define an
additional argument in index_create to pass a potential TupleDesc instead
of this whole wart. I noticed as well that we need to actually reset
attcacheoff to be able to use a fresh version of the tuple descriptor of
the old index. I added a small API for this purpose in tupdesc.h called
ResetTupleDescCache. Would it make sense instead to extend
CreateTupleDescCopyConstr or CreateTupleDescCopy with a boolean flag?
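
For illustration, a minimal sketch of what such a helper could look like --
the name ResetTupleDescCache comes from the mail above, but the body here is
an assumption, not the patch's code:

/*
 * Hypothetical sketch: attcacheoff caches an attribute's offset for a
 * particular tuple layout, so invalidate it (set to -1) for every
 * attribute before reusing the descriptor for a new index.
 */
void
ResetTupleDescCache(TupleDesc tupdesc)
{
	int			i;

	for (i = 0; i < tupdesc->natts; i++)
		tupdesc->attrs[i]->attcacheoff = -1;
}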

 index_concurrent_swap(): Perhaps it'd be better to create
 index_concurrent_swap_setup() and index_concurrent_swap_cleanup() and
 refactor the duplicated code out... the actual function would then become:
This sentence is not finished :) IMO, index_concurrent_swap looks good as
is, taking as arguments the index and its concurrent entry, and swapping
their relfilenodes after taking an AccessExclusiveLock that will be held
until the end of this transaction.
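
For illustration only, a hedged sketch of the swap step described above
(names and details are assumptions, not the patch's actual code):

static void
swap_index_filenodes(Oid oldIndexOid, Oid newIndexOid)
{
	Relation	pg_class;
	HeapTuple	oldTup;
	HeapTuple	newTup;
	Form_pg_class oldForm;
	Form_pg_class newForm;
	Oid			tmpNode;

	/* lock out all concurrent use of both indexes */
	LockRelationOid(oldIndexOid, AccessExclusiveLock);
	LockRelationOid(newIndexOid, AccessExclusiveLock);

	pg_class = heap_open(RelationRelationId, RowExclusiveLock);

	oldTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(oldIndexOid));
	newTup = SearchSysCacheCopy1(RELOID, ObjectIdGetDatum(newIndexOid));
	if (!HeapTupleIsValid(oldTup) || !HeapTupleIsValid(newTup))
		elog(ERROR, "cache lookup failed for index to swap");

	oldForm = (Form_pg_class) GETSTRUCT(oldTup);
	newForm = (Form_pg_class) GETSTRUCT(newTup);

	/* exchange the physical storage of the two indexes */
	tmpNode = oldForm->relfilenode;
	oldForm->relfilenode = newForm->relfilenode;
	newForm->relfilenode = tmpNode;

	simple_heap_update(pg_class, &oldTup->t_self, oldTup);
	CatalogUpdateIndexes(pg_class, oldTup);
	simple_heap_update(pg_class, &newTup->t_self, newTup);
	CatalogUpdateIndexes(pg_class, newTup);

	heap_close(pg_class, RowExclusiveLock);
}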

 ReindexRelationConcurrently()

 + * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
 + * either an index or a table. If a table is specified, each step of REINDEX
 + * CONCURRENTLY is done in parallel with all the table's indexes as well as
 + * its dependent toast indexes.
 This comment is a bit misleading; we're not actually doing anything in
 parallel, right? AFAICT index_concurrent_build is going to block while
 each index is built the first time.
Yes, "parallel" may be misleading. What is meant here is that each step of
the process is done one by one on all the valid indexes a table may have.

 +    * relkind is an index, this index itself will be rebuilt. The locks taken
 +    * parent relations and involved indexes are kept until this transaction
 +    * is committed to protect against schema changes that might occur until
 +    * the session lock is taken on each relation.

 This comment is a bit unclear to me... at minimum I think it should be
 "* on parent relations" instead of "* parent relations", but I think it
 needs to elaborate on why/when we're also taking session level locks.
Hum, done as follows:
@@ -896,9 +896,11 @@ ReindexRelationConcurrently(Oid relationOid)
 * If the relkind of given relation Oid is a table, all its valid
indexes
 * will be rebuilt, including its associated toast table indexes. If
 * relkind is an index, this index itself will be rebuilt. The
locks 

Re: [HACKERS] WITH CHECK and Column-Level Privileges

2014-10-30 Thread Dean Rasheed
On 29 October 2014 13:04, Stephen Frost sfr...@snowman.net wrote:
 * Robert Haas (robertmh...@gmail.com) wrote:
 On Wed, Oct 29, 2014 at 8:16 AM, Stephen Frost sfr...@snowman.net wrote:
  suggestions.  If the user does not have table-level SELECT rights,
  then for the "Failing row contains" case they'll get:
 
  Failing row contains (col1, col2, col3) = (1, 2, 3).
 
  Or, if they have no access to any columns:
 
  Failing row contains () = ()

 I haven't looked at the code, but that sounds nice, except that if
 they have no access to any columns, I'd leave the message out
 altogether instead of emitting it with no useful content.

 Alright, I can change things around to make that happen without too much
 trouble.


Yes, skim-reading the patch, it looks like a good approach to me, and
also +1 for not emitting anything if no values are visible.

I fear, however, that for updatable views, in the most common case,
the user won't have any permissions on the underlying table, and so
the error detail will always be omitted. I wonder if one way we can
improve upon that is to include the RTE's modifiedCols set in the set
of columns the user can see, since for those columns what we would be
reporting are the new user-supplied values, and so there is no
information leakage.
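
To make the quoted example concrete, a hedged SQL illustration (table,
column, role and constraint names are assumptions):

CREATE TABLE t (col1 int, col2 int, col3 int CHECK (col3 < 3));
GRANT INSERT (col1, col2, col3) ON t TO alice;   -- no SELECT granted

-- as alice, whose INSERT supplied all three values:
INSERT INTO t VALUES (1, 2, 3);
-- ERROR:  new row for relation "t" violates check constraint "t_col3_check"
-- DETAIL:  Failing row contains (col1, col2, col3) = (1, 2, 3).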

Regards,
Dean




Re: [HACKERS] printing table in asciidoc with psql

2014-10-30 Thread Szymon Guz
On 30 October 2014 09:04, Pavel Stehule pavel.steh...@gmail.com wrote:



 2014-10-29 12:23 GMT+01:00 Szymon Guz mabew...@gmail.com:



 On 17 October 2014 09:01, Pavel Stehule pavel.steh...@gmail.com wrote:

 Hi Szymon

 I found a small bug - it doesn't escape | well

 postgres=# select * from mytab ;
     a   | numeric_b |     c
  -------+-----------+------------
   Ahoj  |        10 | 2014-10-17
   Hello |        20 | 2014-10-18
   Hi    |        30 | 2014-10-19
   aaa|  |           | 2014-10-17
 (4 rows)

 result


  [options="header",cols="literal,literal,literal",frame="all",grid="all"]
  |====
  ^| +++a+++ ^| +++numeric_b+++ ^| +++c+++
  | Ahoj | 10 | 2014-10-17
  | Hello | 20 | 2014-10-18
  | Hi | 30 | 2014-10-19
  | aaa| |  | 2014-10-17
  |====


  Next, I tested it with asciidoc and asciidoctor and I have a problem
  with asciidoctor - it doesn't respect alignment .. so numbers are aligned
  to the left instead of to the right.

  When you use the option "header", then the +++ formatting is useless.


 Hi Pavel,
  thanks for the remarks. I've attached another version of the patch. It
 works a little better now, including escaping | and asciidoctor alignment
 support.


 it is fixed. Thank you.

 I fixed formatting - please, recheck it.

  I don't see any issue - it should be ready for committer

 Regards

 Pavel



Hi Pavel,
thanks for the review and reformatting. It looks much better after the
reformatting.

thanks,
Szymon


Re: [HACKERS] Index scan optimization

2014-10-30 Thread Haribabu Kommi
On Mon, Oct 27, 2014 at 10:38 PM, Rajeev rastogi
rajeev.rast...@huawei.com wrote:
 On 26 October 2014 10:42, Haribabu Kommi wrote:
 I have a question regarding setting of key flags matched. Only the
 first key was set as matched even if we have multiple index conditions.
 Is there any reason behind that?

  Actually the keysCount can be more than one only if the key strategy is 
  BTEqualStrategyNumber (i.e. the '=' condition).
  But our optimization is only for the '>' or '>=' conditions. So adding 
  code to set the flag for all keys is of no use.

 Let me know if I am missing anything here?

Thanks. I understood the point.
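
A hedged illustration of the two cases (table and index names assumed):

CREATE TABLE t (a int, b int);
CREATE INDEX t_a_b_idx ON t (a, b);

SELECT * FROM t WHERE a = 1 AND b = 2;  -- stacked '=' keys, keysCount > 1
SELECT * FROM t WHERE a >= 100;         -- the optimized case: once the first
                                        -- tuple satisfies the key, every
                                        -- later tuple in the scan does too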

  If any volatile function is present in the index condition, the index
  scan itself is not chosen. Is there any need to handle the same
  similarly to NULLs?

  Do you mean to say that we should add any special handling for
  volatile functions?
  If yes, then as you said, since the index scan itself is not selected for
  such functions, I feel we don't need to add anything for that.

I also have the same opinion. I marked the patch as ready for committer.

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-30 Thread Simon Riggs
On 30 October 2014 04:24, Amit Kapila amit.kapil...@gmail.com wrote:

 Locking the toast table of any main tables we access seems easily
 done. Though perhaps we should make weak locking of the toast table
 presumed. Do we have cases where the toast table can be accessed when
 the main table is not also strong locked first?

 I think it is possible to have a strong lock on toast table before
 main table.
 Cluster pg_toast.toast_table_name using toast_index;
 Vacuum Full pg_toast.toast_table_name;
 Reindex Index pg_toast.toast_index_name;
 ..

  Now if we take the lock on the toast table in the main task, it will block
  some of the operations before they actually need to be blocked.

Very strange.

Those commands are not common operations; I doubt many people even
know they exist.

All of those commands would block SELECTs on the main table, so there
is no significant benefit in that behaviour.

In fact it would be more sensible to lock the toast table earlier.


 As for locking the enums table or collation table, that's simple stuff
 also. They're just AccessShareLocks.

  Yeah, taking AccessShareLocks is not a problem; however, the main
  problem is that they block any other operation on those tables which
  requires an AccessExclusiveLock or any other strong lock, before it is
  actually required to block any such operation.

What operations are those? How frequently do they occur?

I can't see any need for major maintenance operations on small catalog tables.
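
For concreteness, the strong-lock maintenance commands quoted above would
look like this when aimed at a small catalog table (a hedged illustration;
pg_enum is just an example):

VACUUM FULL pg_catalog.pg_enum;      -- wants AccessExclusiveLock
REINDEX TABLE pg_catalog.pg_enum;    -- likewise takes a strong lock

An AccessShareLock taken early by a parallel group would make these block
until the whole parallel operation finishes.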


 I can't imagine we'll be able to presume that all functions of every
 kind are parallel-safe.

  Yeah, that's right, so we need to either block such functions from being
  executed in a parallel worker or make arrangements so that they can be
  safely executed; I think for the first version blocking might be better.

I think that also. That way we won't need complex locking.

  Now, I think we can find ways to avoid all such things by hacking code such
  that conflicting operations can be banned/blocked, however eventually we
  need to have some kind of group locking to avoid deadlocks which can arise
  due to operations in parallel workers.  So it seems to me it is not a bad
  idea to have a strong infrastructure first for the things (like group
  locking) which we think are required for any non-trivial useful parallel
  operation, without putting too many restrictions on users for using the
  feature.

We are all agreed we need to block some functions from executing in
parallel mode.

Why don't we start by making a list of the functions that might cause
problems in parallel workers, and why. That way we might find out
easier ways of doing things.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] initdb -S and tablespaces

2014-10-30 Thread Abhijit Menon-Sen
At 2014-09-29 11:54:10 +0200, and...@2ndquadrant.com wrote:

 On 2014-09-29 14:09:01 +0530, Abhijit Menon-Sen wrote:
  
  I just noticed that initdb -S (Safely write all database files to disk
  and exit) does (only) the following in perform_fsync:
  
  pre_sync_fname(pdir, true);
  walkdir(pg_data, pre_sync_fname);
  
  fsync_fname(pdir, true);
  walkdir(pg_data, fsync_fname);
  
  walkdir() reads the directory and calls itself recursively for S_ISDIR
  entries, or calls the function for S_ISREG entries… which means it
  doesn't follow links.
  
  Which means it doesn't fsync the contents of tablespaces.
 
 Which means at least pg_upgrade is unsafe right
 now... c.f. 630cd14426dc1daf85163ad417f3a224eb4ac7b0.

Here's a proposed patch to initdb to make initdb -S fsync everything
under pg_tblspc. It introduces a new function that calls walkdir on
every entry under pg_tblspc. This is only one approach: I could have
also changed walkdir to follow links, but that would have required a
bunch of #ifdefs for Windows (because it doesn't have symlinks), and I
guessed a separate function with two calls might be easier to patch into
back branches. I've tested this patch under various conditions on Linux,
but it could use some testing on Windows.

-- Abhijit
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index dc1f1df..4fdc9eb 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -218,6 +218,7 @@ static char **filter_lines_with_token(char **lines, const char *token);
 static char **readfile(const char *path);
 static void writefile(char *path, char **lines);
 static void walkdir(char *path, void (*action) (char *fname, bool isdir));
+static void walktblspc_links(char *path, void (*action) (char *fname, bool isdir));
 static void pre_sync_fname(char *fname, bool isdir);
 static void fsync_fname(char *fname, bool isdir);
 static FILE *popen_check(const char *command, const char *mode);
@@ -588,6 +589,53 @@ walkdir(char *path, void (*action) (char *fname, bool isdir))
 }
 
 /*
+ * walktblspc_links: call walkdir on each entry under the given
+ * pg_tblspc directory, or do nothing if pg_tblspc doesn't exist.
+ */
+static void
+walktblspc_links(char *path, void (*action) (char *fname, bool isdir))
+{
+	DIR		   *dir;
+	struct dirent *direntry;
+	char		subpath[MAXPGPATH];
+
+	dir = opendir(path);
+	if (dir == NULL)
+	{
+		if (errno == ENOENT)
+			return;
+		fprintf(stderr, _("%s: could not open directory \"%s\": %s\n"),
+				progname, path, strerror(errno));
+		exit_nicely();
+	}
+
+	while (errno = 0, (direntry = readdir(dir)) != NULL)
+	{
+		if (strcmp(direntry->d_name, ".") == 0 ||
+			strcmp(direntry->d_name, "..") == 0)
+			continue;
+
+		snprintf(subpath, MAXPGPATH, "%s/%s", path, direntry->d_name);
+
+		walkdir(subpath, action);
+	}
+
+	if (errno)
+	{
+		fprintf(stderr, _("%s: could not read directory \"%s\": %s\n"),
+				progname, path, strerror(errno));
+		exit_nicely();
+	}
+
+	if (closedir(dir))
+	{
+		fprintf(stderr, _("%s: could not close directory \"%s\": %s\n"),
+				progname, path, strerror(errno));
+		exit_nicely();
+	}
+}
+
+/*
  * Hint to the OS that it should get ready to fsync() this file.
  */
 static void
@@ -2377,6 +2425,7 @@ static void
 perform_fsync(void)
 {
 	char		pdir[MAXPGPATH];
+	char		pg_tblspc[MAXPGPATH];
 
 	fputs(_("syncing data to disk ... "), stdout);
 	fflush(stdout);
@@ -2398,6 +2447,11 @@ perform_fsync(void)
 	/* then recursively through the directory */
 	walkdir(pg_data, pre_sync_fname);
 
+	/* now do the same thing for everything under pg_tblspc */
+
+	snprintf(pg_tblspc, MAXPGPATH, "%s/pg_tblspc", pg_data);
+	walktblspc_links(pg_tblspc, pre_sync_fname);
+
 	/*
 	 * Now, do the fsync()s in the same order.
 	 */
@@ -2408,6 +2462,8 @@ perform_fsync(void)
 	/* then recursively through the directory */
 	walkdir(pg_data, fsync_fname);
 
+	walktblspc_links(pg_tblspc, fsync_fname);
+
 	check_ok();
 }
 



Re: [HACKERS] WIP: multivariate statistics / proof of concept

2014-10-30 Thread David Rowley
On Thu, Oct 30, 2014 at 12:48 AM, Tomas Vondra t...@fuzzy.cz wrote:

 On 29 October 2014, 12:31, Petr Jelinek wrote:
  I've not really gotten around to looking at the patch yet, but I'm also
  wondering if it would be simple to include allowing functional statistics
  too. The pg_mv_statistic name seems to indicate multi columns, but how
  about stats on date(datetime_column), or perhaps any non-volatile
  function? This would help to solve the problem highlighted here:
 http://www.postgresql.org/message-id/CAApHDvp2vH=7O-gp-zAf7aWy+A-WHWVg7h3Vc6=5pf9uf34...@mail.gmail.com
  Without giving it too much thought, perhaps any expression that can be
  indexed should be allowed to have stats? Would that be really difficult
  to implement in comparison to what you've already done with the patch so
  far?
 
 
  I would not over-complicate requirements for the first version of this,
  I think it's already complicated enough.

 My thoughts, exactly. I'm not willing to put more features into the
 initial version of the patch. Actually, I'm thinking about ripping out
 some experimental features (particularly hashed MCV and associative
 rules).


That's fair, but I didn't really mean to imply that you should go work on
that too and that it should be part of this patch.
I was thinking more along the lines that I don't really agree with the
table name for the new stats, and that at some later date someone will want
to add expression stats, so we'd better come up with a design that would
be friendly towards that. At this time I can only think that the name of
the table might not suit expression stats well; I'd hate to see someone
have to invent a 3rd table to support these when we could likely come up
with something that could be extended later and still make sense both today
and in the future.

I was just looking at how expression indexes are stored in pg_index, and I
see that if it's an expression index, the expression is stored in
the indexprs column, which is of type pg_node_tree. So quite possibly at
some point in the future the new stats table could just have an extra
column added, and for today we'd just need to come up with a future-proof
name... Perhaps pg_statistic_ext or pg_statisticx, and name functions and
source files something along those lines instead?
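
A hedged illustration of what is described above (table and index names are
assumptions):

CREATE TABLE events (id int, created_at timestamp);
CREATE INDEX events_date_idx ON events (date(created_at));

-- the expression lives in pg_index.indexprs as a pg_node_tree:
SELECT indexrelid::regclass, indexprs IS NOT NULL AS has_exprs
FROM pg_index
WHERE indrelid = 'events'::regclass;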

Regards

David Rowley


Re: [HACKERS] WIP: multivariate statistics / proof of concept

2014-10-30 Thread David Rowley
On Thu, Oct 30, 2014 at 12:21 AM, Tomas Vondra t...@fuzzy.cz wrote:

 On 29 October 2014, 10:41, David Rowley wrote:
  I'm quite interested in reviewing your work on this, but it appears that
  some of your changes are not C89:
 
   src\backend\commands\analyze.c(3774): error C2057: expected constant
  expression [D:\Postgres\a\postgres.vcxproj]
   src\backend\commands\analyze.c(3774): error C2466: cannot allocate an
  array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
   src\backend\commands\analyze.c(3774): error C2133: 'indexes' : unknown
  size [D:\Postgres\a\postgres.vcxproj]
   src\backend\commands\analyze.c(4302): error C2057: expected constant
  expression [D:\Postgres\a\postgres.vcxproj]
   src\backend\commands\analyze.c(4302): error C2466: cannot allocate an
  array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
   src\backend\commands\analyze.c(4302): error C2133: 'ndistincts' :
 unknown
  size [D:\Postgres\a\postgres.vcxproj]
   src\backend\commands\analyze.c(4775): error C2057: expected constant
  expression [D:\Postgres\a\postgres.vcxproj]
   src\backend\commands\analyze.c(4775): error C2466: cannot allocate an
  array of constant size 0 [D:\Postgres\a\postgres.vcxproj]
   src\backend\commands\analyze.c(4775): error C2133: 'keys' : unknown size
  [D:\Postgres\a\postgres.vcxproj]
 

 I'll look into that. The thing is I don't have access to MSVC, so it's a bit
 difficult to spot / fix those issues :-(


It should be a pretty simple fix, just using the files and line numbers from
the above. It's just a problem that in those 3 places you're declaring an
array of a variable size, which is not allowed in C89. The thing to do
instead would just be to palloc() the size you need and then pfree() it when
you're done.
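
A minimal sketch of the suggested rewrite (function and variable names are
assumptions, not taken from the patch):

/* A C99 VLA such as "Datum values[natts];" is what MSVC rejects in C89
 * mode; the C89-safe equivalent allocates and frees explicitly: */
static void
process_attrs(int natts)
{
	Datum	   *values = (Datum *) palloc(natts * sizeof(Datum));
	int			i;

	for (i = 0; i < natts; i++)
		values[i] = (Datum) 0;	/* fill and use as before */

	pfree(values);
}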

Regards

David Rowley


Re: [HACKERS] printing table in asciidoc with psql

2014-10-30 Thread Pavel Stehule
2014-10-30 9:30 GMT+01:00 Szymon Guz mabew...@gmail.com:

 On 30 October 2014 09:04, Pavel Stehule pavel.steh...@gmail.com wrote:



 2014-10-29 12:23 GMT+01:00 Szymon Guz mabew...@gmail.com:



 On 17 October 2014 09:01, Pavel Stehule pavel.steh...@gmail.com wrote:

 Hi Szymon

 I found a small bug - it doesn't escape | well

 postgres=# select * from mytab ;
     a   | numeric_b |     c
  -------+-----------+------------
   Ahoj  |        10 | 2014-10-17
   Hello |        20 | 2014-10-18
   Hi    |        30 | 2014-10-19
   aaa|  |           | 2014-10-17
 (4 rows)

 result


  [options="header",cols="literal,literal,literal",frame="all",grid="all"]
  |====
  ^| +++a+++ ^| +++numeric_b+++ ^| +++c+++
  | Ahoj | 10 | 2014-10-17
  | Hello | 20 | 2014-10-18
  | Hi | 30 | 2014-10-19
  | aaa| |  | 2014-10-17
  |====


  Next, I tested it with asciidoc and asciidoctor and I have a problem
  with asciidoctor - it doesn't respect alignment .. so numbers are aligned
  to the left instead of to the right.

  When you use the option "header", then the +++ formatting is useless.


 Hi Pavel,
  thanks for the remarks. I've attached another version of the patch. It
 works a little better now, including escaping | and asciidoctor alignment
 support.


 it is fixed. Thank you.

 I fixed formatting - please, recheck it.

  I don't see any issue - it should be ready for committer

 Regards

 Pavel



 Hi Pavel,
 thanks for the review and reformatting. It looks much better after the
 reformatting.


ok

so

1. There are no objections against the proposed and implemented feature.
This patch contains just the implementation of the asciidoc format and
nothing else. It has zero impact on current code.

2. There are no problems with patching and compilation. All current
regression tests passed.

3. The patch contains docs and a small set of regression tests.

4. I tested the output against asciidoc and asciidoctor; I didn't find any
problems.

This patch is ready for committer.

Thank you for the patch.

Regards

Pavel



 thanks,
 Szymon



Re: [HACKERS] alter user/role CURRENT_USER

2014-10-30 Thread Kyotaro HORIGUCHI
Hello,

At Tue, 28 Oct 2014 09:05:20 -0400, Stephen Frost sfr...@snowman.net wrote in 
20141028130520.gl28...@tamriel.snowman.net
  As well, the
  originally proposed RoleId_or_curruser suffers from the same issue.  I'm
  going to go out on a limb here, but is it not possible to consider
  current_user, etc. reserved in the same sense that we do with PUBLIC and
  NONE and disallow users/roles from being created as them?
 
 Maybe we could in some future release, but we can't for back-branches so
 I'm afraid we're going to have to figure out how to fix this to work
 regardless.

Zero-length identifiers are rejected in scanner so RoleId cannot
be a valid pointer to a zero-length string. (NULL is used as
PUBLIC in auth_ident)

| postgres=# create role "";
| ERROR:  zero-length delimited identifier at or near """"
| postgres=# create role U&"\00";
| ERROR:  invalid Unicode escape value at or near "\00"

As a dirty and quick hack we can use some integer values prefixed
by a single zero byte to represent special roles such as
CURRENT_USER.

| RoleId_or_curruser: RoleId  { $$ = $1; }
|   | CURRENT_USER  { $$ = "\x00\x01"; };

| Oid ResolveRoleId(const char *name, bool missing_ok)
| {
|   Oid roleid;
| 
|   if (name[0] == 0 && name[1] == 1)
|   roleid = GetUserId();

This is ugly but needs no additional struct member or special
logics. (Macros could make them look better.)
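
For instance (hypothetical macros, not part of any posted patch):

#define ROLEID_CURRENT_USER "\x00\x01"
#define IsRoleIdCurrentUser(name) \
	((name)[0] == '\0' && (name)[1] == '\x01')

/* then, in ResolveRoleId(): */
if (IsRoleIdCurrentUser(name))
	roleid = GetUserId();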

  Taking a step back, I'm still not sure I understand the need for this
  feature or the use case.  It seems to have started as a potential fix to an
  inconsistency between ALTER USER and ALTER ROLE syntax (which I think I
  could see some value in).  However, I think it has been taken beyond just
  resolving the inconsistency and started to cross over into feature creep.
  Is the intent simply to resolve inconsistencies between what is now an
  alias of another command?  Or is it to add new functionality?  I think the
  original proposal needs to be revisited and more time needs to be spent
  defining the scope and purpose of this patch.
 
 I agree that we should probably seperate the concerns here.  Personally,
 I like the idea of being able to say CURRENT_USER in utility commands
 to refer to the current user where a role would normally be expected, as
 I could see it simplifying things for some applications, but that's a
 new feature and independent of making role-vs-user cases more
 consistent.

I agree that role-vs-user compatibility is another problem.

In a sense, the CURRENT_USER problem is also a separate problem,
but simply spreading the current current_user mechanism (which
does not allow using the words literally even if double-quoted) out
to other commands is a kind of pandemic, so it should be fixed
before making current_user usable in other commands.

From a view of compatibility (specification stability), fixing
this problem could break someone's application if he/she uses
current_user in the meaning of CURRENT_USER intentionally, but I
think that is very unlikely.


As another problem, in the definition of grantee, there is the
following comment.

|   /* This hack lets us avoid reserving PUBLIC as a keyword*/
|   if (strcmp($1, "public") == 0)

Please let me know if there is any particular reason to avoid registering
a new keyword, which makes the word unusable as a literal identifier.

PUBLIC and NONE ought to be usable as identifiers, but in reality they
are unusable because they are handled as keywords in literal
form. These would be fixed by making them real keywords.

So if there's no particular reason, I will register new keywords
PUBLIC and NONE as another patch.

# However, I don't think that a considerable number of people want
# to do that..

On the other hand, pg_* as schema names and operators are simply cases
of special names, which cannot be identifiers in the first place, so
they should stay as they are, I think.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] WIP: multivariate statistics / proof of concept

2014-10-30 Thread Tomas Vondra
On 30 October 2014, 10:17, David Rowley wrote:
 On Thu, Oct 30, 2014 at 12:48 AM, Tomas Vondra t...@fuzzy.cz wrote:

 On 29 October 2014, 12:31, Petr Jelinek wrote:
  I've not really gotten around to looking at the patch yet, but I'm also
  wondering if it would be simple to include allowing functional statistics
  too. The pg_mv_statistic name seems to indicate multi columns, but how
  about stats on date(datetime_column), or perhaps any non-volatile
  function? This would help to solve the problem highlighted here:
 http://www.postgresql.org/message-id/CAApHDvp2vH=7O-gp-zAf7aWy+A-WHWVg7h3Vc6=5pf9uf34...@mail.gmail.com
  Without giving it too much thought, perhaps any expression that can be
  indexed should be allowed to have stats? Would that be really difficult
  to implement in comparison to what you've already done with the patch so
  far?
 
 
  I would not over-complicate requirements for the first version of
 this,
  I think it's already complicated enough.

 My thoughts, exactly. I'm not willing to put more features into the
 initial version of the patch. Actually, I'm thinking about ripping out
 some experimental features (particularly hashed MCV and associative
 rules).


 That's fair, but I didn't really mean to imply that you should go work on
 that too and that it should be part of this patch.
 I was thinking more along the lines that I don't really agree with the
 table name for the new stats, and that at some later date someone will want
 to add expression stats, so we'd better come up with a design that would
 be friendly towards that. At this time I can only think that the name of
 the table might not suit expression stats well; I'd hate to see someone
 have to invent a 3rd table to support these when we could likely come up
 with something that could be extended later and still make sense both today
 and in the future.

 I was just looking at how expression indexes are stored in pg_index and I
 see that if it's an expression index that the expression is stored in
 the indexprs column which is of type pg_node_tree, so quite possibly at
 some point in the future the new stats table could just have an extra
 column added, and for today, we'd just need to come up with a future proof
 name... Perhaps pg_statistic_ext or pg_statisticx, and name functions and
 source files something along those lines instead?

Ah, OK. I don't think the catalog name pg_mv_statistic is somehow
inappropriate for this purpose, though. IMHO "multivariate" does not
mean "only columns" or "no expressions"; it simply describes that the
approximated density function has multiple input variables, be they
attributes or expressions.

But maybe there's a better name.

Tomas





Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index

2014-10-30 Thread Etsuro Fujita

(2014/10/09 11:49), Etsuro Fujita wrote:

(2014/10/08 22:51), Fujii Masao wrote:

On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:

On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:

Fujii Masao wrote:

On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:



PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting.
Wouldn't it be easy-to-use to have only one parameter,
PENDING_LIST_CLEANUP_SIZE?  How about setting
PENDING_LIST_CLEANUP_SIZE
to
work_mem as the default value when running the CREATE INDEX command?



So what about introducing pending_list_cleanup_size also as a GUC?
That is, users can set the threshold by using either the reloption or
the GUC.



Yes, I think having both a GUC and a reloption makes sense -- the GUC
applies to all indexes, and can be tweaked for individual indexes
using
the reloption.



OK, I'd vote for your idea of having both the GUC and the reloption.
So, I
think the patch needs to be updated.  Fujii-san, what plan do you
have about
the patch?



Please see the attached patch. In this patch, I introduced the GUC
parameter,
pending_list_cleanup_size. I chose 4MB as the default value of the
parameter.
But do you have any better idea about that default value?



It seems reasonable to me that the GUC has the same default value as
work_mem.  So, +1 for the default value of 4MB.



BTW, I moved the CommitFest entry of this patch to next CF 2014-10.



OK, I'll review the patch in the CF.


Thank you for updating the patch!  Here are my review comments.

* The patch applies cleanly and make and make check run successfully.  I 
think that the patch is mostly good.


* In src/backend/utils/misc/guc.c:
+   {
+       {"pending_list_cleanup_size", PGC_USERSET, CLIENT_CONN_STATEMENT,
+           gettext_noop("Sets the maximum size of the pending list for GIN index."),
+           NULL,
+           GUC_UNIT_KB
+       },
+       &pending_list_cleanup_size,
+       4096, 0, MAX_KILOBYTES,
+       NULL, NULL, NULL
+   },

ISTM it'd be better to use RESOURCES_MEM, not CLIENT_CONN_STATEMENT. No?
Also, why not set min to 64, not to 0, in accordance with that of
work_mem?


* In src/backend/utils/misc/postgresql.conf.sample:
Likewise, why not put this variable in the section of RESOURCE USAGE,
not in that of CLIENT CONNECTION DEFAULTS?


* In src/backend/access/common/reloptions.c:
+   {
+       {
+           "pending_list_cleanup_size",
+           "Maximum size of the pending list for this GIN index, in kilobytes.",
+           RELOPT_KIND_GIN
+       },
+       -1, 0, MAX_KILOBYTES
+   },

As in guc.c, why not set min to 64, not to 0?

* In src/include/access/gin.h:
  /* GUC parameter */
  extern PGDLLIMPORT int GinFuzzySearchLimit;
+ extern int pending_list_cleanup_size;

The comment should be "GUC parameters".

* In src/backend/access/gin/ginfast.c:
+ /* GUC parameter */
+ int   pending_list_cleanup_size = 0;

Do we need to substitute 0 for pending_list_cleanup_size?

* In doc/src/sgml/config.sgml:
+  <varlistentry id="guc-pending-list-cleanup-size" xreflabel="pending_list_cleanup_size">
+   <term><varname>pending_list_cleanup_size</varname> (<type>integer</type>)


As in postgresql.conf.sample, ISTM it'd be better to explain this 
variable in the section of Resource Consumption (maybe in Memory), not 
in that of Client Connection Defaults.


* In doc/src/sgml/gin.sgml:
!  <literal>pending_list_cleanup_size</>. To avoid fluctuations in observed

ISTM it'd be better to use <varname> for pending_list_cleanup_size, not
<literal>, here.


* In doc/src/sgml/ref/create_index.sgml:
+ <term><literal>PENDING_LIST_CLEANUP_SIZE</></term>

IMHO, it seems to me better for this variable to be in lowercase in 
accordance with the GUC version.  Also, I think that the words "GIN 
indexes accept a different parameter:" in the section of Index Storage 
Parameters in this reference page would be "GIN indexes accept 
different parameters:".
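
For context, the two knobs under review would be used like this (a hedged
illustration; table and index names are assumptions):

SET pending_list_cleanup_size = '8MB';           -- per-session GUC

CREATE INDEX t_gin_idx ON t USING gin (words)
    WITH (pending_list_cleanup_size = 4096);     -- per-index reloption, in kB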


Sorry for the delay in reviewing the patch.

Best regards,
Etsuro Fujita




Re: [HACKERS] Lockless StrategyGetBuffer() clock sweep

2014-10-30 Thread Andres Freund
On 2014-10-30 10:23:56 +0530, Amit Kapila wrote:
 I have a feeling that this might also have some regression at higher
 loads (like scale_factor = 5000, shared_buffers = 8GB,
 client_count = 128, 256) for reasons similar to the bgreclaimer patch:
 although both reduce contention around the spinlock, they may move the
 contention somewhere else.  I have yet to take data before
 concluding anything (I am just waiting for your other patch (wait free
 LW_SHARED) to be committed).

I have a hard time seeing how this could be. In the uncontended case the
number of cachelines touched and the number of atomic operations is
exactly the same. In the contended case the new implementation does far
fewer atomic ops - and doesn't do spinning.

What's your theory?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] foreign data wrapper option manipulation during Create foreign table time?

2014-10-30 Thread Ronan Dunklau
On Wednesday, 29 October 2014 14:16:23, Demai Ni wrote:
 Robert and Ronan,
 
 many thanks for your response.
 
 I realized there is no clean way/API for it. Maybe a hack of the parse tree
 can do the trick.. :-)
 
 I will also take a look at IMPORT FOREIGN SCHEMA. However, for this
 requirement, I still need the user to input a filename or file folder, and
 I'd like to process the file(s) at create-foreign-table time, and save the
 processed result somewhere like the ftoptions column in pg_foreign_table.
 Maybe there is some other way I can save the processed result and make it
 accessible during query time?

You can pass options to IMPORT FOREIGN SCHEMA. So, you could maybe implement 
it so that the user can do that:

IMPORT FOREIGN SCHEMA public FROM SERVER file_server INTO local_schema OPTIONS  
(filename '/path/to/the/file');

Your FDW would then issue several CREATE FOREIGN TABLE statements, with all 
the necessary options.

Or even, to allow importing multiple files at once:

IMPORT FOREIGN SCHEMA public FROM SERVER file_server INTO local_schema OPTIONS  
(directory '/path/to/the/file_dir/');



 
 Demai
 
 On Wed, Oct 29, 2014 at 10:01 AM, Ronan Dunklau ronan.dunk...@dalibo.com
 wrote:
  On Wednesday, 29 October 2014 12:49:12, Robert Haas wrote:
   On Tue, Oct 28, 2014 at 5:26 PM, Demai Ni nid...@gmail.com wrote:
    I am looking for a couple pointers here about fdw, and how to change
    the option values during CREATE table time.

    I am using postgres-xc-1.2.1 right now. For example, it contains
    file_fdw, whose create-table-stmt looks like:
    CREATE FOREIGN TABLE t1()
    SERVER file_server
    OPTIONS(format 'text', filename 'testing.txt');

    I would like to replace the 'testing.txt' with an absolute path like
    '/user/testor1/testing.txt', and make sure the new value is saved in
    pg_foreign_table; the file_fdw_validator is used to validate the
    options, but is there a way to replace the optionValue here? And get
    the new value stored in pg_foreign_table?

    Thanks

    BTW, in my real use case, I am trying to interpret a hdfs file and
    would need to save some hostname/port information in the option value,
    which is not necessarily specified by the user.

   I don't think there's going to be a clean way to do this.  The
   intention of the system is that the user-provided options are simply
   stored, not that the FDW author is going to use the options list to
   store their own bits and pieces.

  I would do that during the IMPORT FOREIGN SCHEMA statement. That way, the
  user doesn't have to specify those options: they would be generated at
  IMPORT time, and the user could change them later if really needed.

  --
  Ronan Dunklau
  http://dalibo.com - http://dalibo.org

-- 
Ronan Dunklau
http://dalibo.com - http://dalibo.org



[HACKERS] Converting an expression of one data type to another

2014-10-30 Thread rohtodeveloper
Dear all,

I'm doing a job about converting an expression of one data type to another.
In SQL Server, there are two functions to do this job:

1. CAST ( expression AS data_type [ ( length ) ] )
2. CONVERT ( data_type [ ( length ) ] , expression )

However, in PostgreSQL, there's only the CAST ( expression AS data_type
[ ( length ) ] ) function. I have tried the following two ways to implement
the CONVERT ( data_type [ ( length ) ] , expression ) function, but both
failed.

1. CREATE FUNCTION. The function's arguments can only be expressions but
not a data_type.
2. Modifying gram.y. The CONVERT ( data_type [ ( length ) ] , expression )
grammar conflicts with PostgreSQL's own
convert(data, src_encoding_name, dest_encoding_name) function, which then
cannot be used.

I wonder whether there's a better way to solve this problem. Any help will
be appreciated.

Best Regards,
Rohtodeveloper
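
For reference, the PostgreSQL-native spellings of the conversion (a hedged
illustration; SQL Server's two-argument CONVERT has no direct equivalent):

SELECT CAST('123' AS integer);       -- standard SQL cast
SELECT '123'::integer;               -- PostgreSQL shorthand
SELECT CAST('abcd' AS varchar(2));   -- with a length, yields 'ab'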

Re: [HACKERS] Lockless StrategyGetBuffer() clock sweep

2014-10-30 Thread Robert Haas
On Wed, Oct 29, 2014 at 3:09 PM, Andres Freund and...@2ndquadrant.com wrote:
 But if it is, then how about
 adding a flag that is 4 bytes wide or less alongside bgwriterLatch,
 and just checking the flag instead of checking bgwriterLatch itself?

 Yea, that'd be nicer. I didn't do it because it made the patch slightly
 more invasive... The complexity really is only needed because we're not
 guaranteed that 64bit reads are atomic. Although we actually can be
 sure, because there's no platform with nonatomic intptr_t reads - so
 64bit platforms actually *do* have atomic 64bit reads/writes.

 So if we just have a integer 'setBgwriterLatch' somewhere we're good. We
 don't even need to take a lock to set it. Afaics the worst that can
 happen is that several processes set the latch...

OK, that design is fine with me.

 Actually, the same approach would work for INT_ACCESS_ONCE.  So I
 propose this approach instead: Add a new atomic flag to
 StrategyControl, useSlowPath.  If StrategyGetBuffer() finds
 useSlowPath set, call StrategyGetBufferSlow(), which will acquire the
 spinlock, check whether bgwriterLatch is set and/or the freelist is
 non-empty and return a buffer ID if able to allocate from the
 freelist; otherwise -1.  If StrategyGetBufferSlow() returns -1, or we
 decide not to call it in the first place because useSlowPath isn't
 set, then fall through to your clock-sweep logic.  We can set
 useSlowPath at startup and whenever we put buffers on the free list.

 I don't like the idea to to conflate the slowpath for the bgwriter latch
 with the freelist slowpath. And I don't really see what that'd buy us
 over what's in the patch right now?

Well, I was thinking that with a single flag check we could skip two
paths that, as of today, are both uncommonly taken.  Moving all that
logic off into a separate function might help the compiler optimize
for the fast case, too.

 I wonder if we could make a trylock for spinlocks work - then we could
 look at the freelist if the lock is free and just go to the clock cycle
 otherwise. My guess is that that'd be quite performant.  IIRC it's just
 the spinlock semaphore fallback that doesn't know how to do trylock...

I could go for that.  I don't think the semaphore thing is really a
problem.  If the spinlock semaphore implementation indeed can't
support that, then just have it fall back to waiting for the lock and
always returning true.  Semaphores are so slow that it's not going to
make any difference to performance.
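
A hedged sketch of the trylock idea (PostgreSQL's spinlocks expose only
S_LOCK/S_UNLOCK today; SpinTryAcquire, the freelist helper and the
buffer_strategy_lock field are assumed names):

static inline bool
SpinTryAcquire(volatile slock_t *lock)
{
	return TAS(lock) == 0;		/* true if the lock was acquired */
}

/* in StrategyGetBuffer(), instead of unconditionally spinning: */
if (SpinTryAcquire(&StrategyControl->buffer_strategy_lock))
{
	buf = TryGetBufferFromFreelist();	/* hypothetical helper */
	S_UNLOCK(&StrategyControl->buffer_strategy_lock);
	if (buf != NULL)
		return buf;
}
/* else fall through to the lockless clock sweep */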

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-30 Thread Andres Freund
On 2014-10-21 12:40:56 +0530, Amit Kapila wrote:
 While doing performance tests, I noticed a hang at higher client
 counts with patch. I have tried to check call stack for few of
 processes and it is as below:
 
 #0  0x008010933e54 in .semop () from /lib64/libc.so.6
 #1  0x10286e48 in .PGSemaphoreLock ()
 #2  0x102f68bc in .LWLockAcquire ()
 #3  0x102d1ca0 in .ReadBuffer_common ()
 #4  0x102d2ae0 in .ReadBufferExtended ()
 #5  0x100a57d8 in ._bt_getbuf ()
 #6  0x100a6210 in ._bt_getroot ()
 #7  0x100aa910 in ._bt_search ()
 #8  0x100ab494 in ._bt_first ()
 #9  0x100a8e84 in .btgettuple ()
 ..
 
 #0  0x008010933e54 in .semop () from /lib64/libc.so.6
 #1  0x10286e48 in .PGSemaphoreLock ()
 #2  0x102f68bc in .LWLockAcquire ()
 #3  0x102d1ca0 in .ReadBuffer_common ()
 #4  0x102d2ae0 in .ReadBufferExtended ()
 #5  0x100a57d8 in ._bt_getbuf ()
 #6  0x100a6210 in ._bt_getroot ()
 #7  0x100aa910 in ._bt_search ()
 #8  0x100ab494 in ._bt_first ()
 ...
 
 The test configuration is as below:
 Test env - Power - 7 (hydra)
 scale_factor - 3000
 shared_buffers - 8GB
 test mode - pgbench read only
 
 test execution -
 ./pgbench -c 128 -j 128 -T 1800 -S -M prepared postgres
 
  I ran it for half an hour, but it didn't come out even after
  ~2 hours.  It doesn't get reproduced every time; currently I am
  able to reproduce it and the m/c is in the same state. If you want any
  info, let me know (unfortunately the binaries are in release mode, so
  you might not get enough information).

Hm. What commit did you apply the series on top of? I managed to reproduce a
hang, but it was just something that Heikki had already fixed...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index

2014-10-30 Thread Fujii Masao
On Thu, Oct 30, 2014 at 7:30 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 (2014/10/09 11:49), Etsuro Fujita wrote:

 (2014/10/08 22:51), Fujii Masao wrote:

 On Wed, Sep 24, 2014 at 11:10 AM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:

 On Wed, Sep 10, 2014 at 10:37 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:

 Fujii Masao wrote:

 On Wed, Sep 10, 2014 at 12:15 PM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp wrote:


 PENDING_LIST_CLEANUP_SIZE and work_mem, for this setting.
 Wouldn't it be easy-to-use to have only one parameter,
 PENDING_LIST_CLEANUP_SIZE?  How about setting
 PENDING_LIST_CLEANUP_SIZE
 to
 work_mem as the default value when running the CREATE INDEX command?


  So what about introducing pending_list_cleanup_size also as a GUC?
  That is, users can set the threshold by using either the reloption or
  the GUC.


 Yes, I think having both a GUC and a reloption makes sense -- the GUC
 applies to all indexes, and can be tweaked for individual indexes
 using
 the reloption.


 OK, I'd vote for your idea of having both the GUC and the reloption.
 So, I
 think the patch needs to be updated.  Fujii-san, what plan do you
 have about
 the patch?


 Please see the attached patch. In this patch, I introduced the GUC
 parameter,
 pending_list_cleanup_size. I chose 4MB as the default value of the
 parameter.
 But do you have any better idea about that default value?


 It seems reasonable to me that the GUC has the same default value as
 work_mem.  So, +1 for the default value of 4MB.


 BTW, I moved the CommitFest entry of this patch to next CF 2014-10.


 OK, I'll review the patch in the CF.


 Thank you for updating the patch!  Here are my review comments.

 * The patch applies cleanly and make and make check run successfully.  I
 think that the patch is mostly good.

Thanks! Attached is the updated version of the patch.

 * In src/backend/utils/misc/guc.c:
  +   {
  +       {"pending_list_cleanup_size", PGC_USERSET, CLIENT_CONN_STATEMENT,
  +           gettext_noop("Sets the maximum size of the pending list for GIN index."),
  +           NULL,
  +           GUC_UNIT_KB
  +       },
  +       &pending_list_cleanup_size,
  +       4096, 0, MAX_KILOBYTES,
  +       NULL, NULL, NULL
  +   },

 ISTM it'd be better to use RESOURCES_MEM, not CLIENT_CONN_STATEMENT. No?

Yes, if the pending list always existed in memory. But it doesn't, IIUC. Thoughts?

 Also why not set min to 64, not to 0, in accoradance with that of work_mem?

I'm OK with using 64. But I just chose 0 because I could not think of any
reasonable reason why 64kB is suitable as the minimum size of the pending
list. IOW, I have no idea whether it's reasonable to use the min value of
work_mem as the min size of the pending list.

 * In src/backend/utils/misc/postgresql.conf.sample:
 Likewise, why not put this variable in the section of RESOURCE USAGE, not in
 that of CLIENT CONNECTION DEFAULTS.

Same as above.


 * In src/backend/access/common/reloptions.c:
  +   {
  +       {
  +           "pending_list_cleanup_size",
  +           "Maximum size of the pending list for this GIN index, in kilobytes.",
  +           RELOPT_KIND_GIN
  +       },
  +       -1, 0, MAX_KILOBYTES
  +   },

 As in guc.c, why not set min to 64, not to 0?

Same as above.

 * In src/include/access/gin.h:
   /* GUC parameter */
   extern PGDLLIMPORT int GinFuzzySearchLimit;
 + extern int pending_list_cleanup_size;

The comment should be "GUC parameters".

Yes, fixed.

 * In src/backend/access/gin/ginfast.c:
 + /* GUC parameter */
 + int   pending_list_cleanup_size = 0;

 Do we need to substitute 0 for pending_list_cleanup_size?

Same as above.


 * In doc/src/sgml/config.sgml:
  +  <varlistentry id="guc-pending-list-cleanup-size"
  xreflabel="pending_list_cleanup_size">
  +   <term><varname>pending_list_cleanup_size</varname>
  (<type>integer</type>)

 As in postgresql.conf.sample, ISTM it'd be better to explain this variable
 in the section of Resource Consumption (maybe in Memory), not in that of
 Client Connection Defaults.

Same as above.

 * In doc/src/sgml/gin.sgml:
  !  <literal>pending_list_cleanup_size</>. To avoid fluctuations in
  observed

  ISTM it'd be better to use <varname> for pending_list_cleanup_size, not
  <literal>, here.

Yes, fixed.

 * In doc/src/sgml/ref/create_index.sgml:
  + <term><literal>PENDING_LIST_CLEANUP_SIZE</></term>

 IMHO, it seems to me better for this variable to be in lowercase in
 accordance with the GUC version.

Using lowercase only for pending_list_cleanup_size and uppercase for
other options
looks strange to me. What about using lowercase for all the storage options?
I changed the document in that way.

  Also, I think that the words "GIN indexes
  accept a different parameter:" in the section of Index Storage Parameters
  in this reference page would be "GIN indexes accept different parameters:".

Yes, 

Re: [HACKERS] Review of Refactoring code for sync node detection

2014-10-30 Thread Michael Paquier
Thanks for your review! (No worries for creating a new thread, I don't
mind as this is not a huge patch. I think you could have downloaded
the mbox from postgresql.org btw).

On Thu, Oct 30, 2014 at 8:24 AM, Jim Nasby j...@nasby.net wrote:
 SyncRepGetSynchronousNode():
 IMHO, the top comment should mention that if there are multiple standbys of
 the same priority it returns the first one.
OK. Corrected.

 This switches from using a single if() with multiple conditions &&'d
 together to a bunch of if() continue's. I don't know if those will perform
 the same, and AFAIK this is pretty performance critical.
Well, we could still use the old notation with a single if(). That's
not much complicated to change.

 <grammarZealot>In the comments, "Process" should be
 "Proceed".</grammarZealot>
Noted. Thanks for correcting my Frenglish.

 pg_stat_get_wal_senders():
 The original version only loops through walsnds[] one time; the new code
 loops twice, both times while holding SyncRepLock(shared). This will block
 transaction commit if syncrep is enabled. That's not great, but presumably
 this function isn't going to be called terribly often, so maybe it's OK. (On
 a side note, perhaps we should add a warning to the documentation for
 pg_stat_replication warning not to hammer it...)
Hm. For code readability I did not really want to do so, but let's do
this: let's add a new argument to SyncRepGetSynchronousNode to
retrieve the priority of each WAL sender and do a single pass through
the array, so that there is no performance difference.

Updated patch attached.
Regards,
-- 
Michael
From 1a02c982de0aeb2bd7dcf0bbf34605017619a417 Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Thu, 30 Oct 2014 22:01:10 +0900
Subject: [PATCH] Refactor code to detect synchronous node in WAL sender array

This patch is made to remove code duplication in walsender.c and syncrep.c
in order to detect what is the node with the lowest strictly-positive
priority, facilitating maintenance of this code.
---
 src/backend/replication/syncrep.c   | 99 +++--
 src/backend/replication/walsender.c | 36 +++---
 src/include/replication/syncrep.h   |  1 +
 3 files changed, 80 insertions(+), 56 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..70c799b 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -5,7 +5,7 @@
  * Synchronous replication is new as of PostgreSQL 9.1.
  *
  * If requested, transaction commits wait until their commit LSN is
- * acknowledged by the sync standby.
+ * acknowledged by the synchronous standby.
  *
  * This module contains the code for waiting and release of backends.
  * All code in this module executes on the primary. The core streaming
@@ -357,6 +357,70 @@ SyncRepInitConfig(void)
 	}
 }
 
+
+/*
+ * Obtain position of synchronous standby in the array referencing all
+ * the WAL senders, or -1 if no synchronous node can be found. The caller
+ * of this function should take a lock on SyncRepLock. If there are
+ * multiple nodes with the same lowest priority value, the first node
+ * found is selected.
+ * node_priority is a preallocated array of size max_wal_senders that
+ * can be used to retrieve the priority of each WAL sender. Filling it
+ * here limits the scan of the WAL sender array to a single pass,
+ * limiting the number of times SyncRepLock must be taken.
+ */
+int
+SyncRepGetSynchronousNode(int *node_priority)
+{
+	int		sync_node = -1;
+	int		priority = 0;
+	int		i;
+
+	/* Scan WAL senders and find synchronous node if any */
+	for (i = 0; i < max_wal_senders; i++)
+	{
+		/* Use volatile pointer to prevent code rearrangement */
+		volatile WalSnd *walsnd = &WalSndCtl->walsnds[i];
+
+		/* First get the priority of this WAL sender */
+		if (node_priority)
+			node_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
+				0 : walsnd->sync_standby_priority;
+
+		/*
+		 * Proceed immediately to the next WAL sender if one of these
+		 * conditions is satisfied:
+		 * - Not active, with PID equal to 0.
+		 * - Not streaming.
+		 * - Priority conditions not satisfied. If priority is not 0 it
+		 *   means that one potential synchronous node has already been
+		 *   found. The node selected needs to have the lowest priority
+		 *   in the list of WAL senders as well.
+		 * - Stream flush position is invalid.
+		 */
+		if (walsnd->pid == 0 ||
+			walsnd->state != WALSNDSTATE_STREAMING ||
+			walsnd->sync_standby_priority == 0 ||
+			(priority != 0 &&
+			 priority <= walsnd->sync_standby_priority) ||
+			XLogRecPtrIsInvalid(walsnd->flush))
+			continue;
+
+		/*
+		 * We have a potential synchronous candidate, choose it as the
+		 * synchronous node for the time being before going through the
+		 * other nodes listed in the WAL sender array.
+		 */
+		sync_node = i;
+
+		/* Update priority to current value of WAL sender */
+		priority = walsnd->sync_standby_priority;
+	}
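
(For illustration only, not part of the posted patch: a hypothetical
caller sketch of the single-pass usage described above, where
pg_stat_get_wal_senders() obtains both the synchronous node and every
sender's priority while taking SyncRepLock only once.)

	int	   *sync_priority;
	int		sync_node;

	sync_priority = (int *) palloc(sizeof(int) * max_wal_senders);

	/* one lock acquisition, one scan of the WAL sender array */
	LWLockAcquire(SyncRepLock, LW_SHARED);
	sync_node = SyncRepGetSynchronousNode(sync_priority);
	LWLockRelease(SyncRepLock);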

Re: [HACKERS] Converting an expression of one data type to another

2014-10-30 Thread Craig Ringer
I think you're looking for the pgsql-general mailing list. This list is
for PostgreSQL extensions and core database engine software development.

On 10/30/2014 07:44 PM, rohtodeveloper wrote:
 Dear
 
 I'm doing a job about converting an expression of one data type to another.
 In SQLServer, there'are two functions to do this job.
 
 1. CAST ( expression AS data_type [ ( length ) ] )
 2. CONVERT ( data_type [ ( length ) ] , expression )
 
 However, In PostgreSQL, there's only the CAST ( expression AS data_type
 [ ( length ) ] ) function. I have tried the following two ways to
 implenting the CONVERT ( data_type [ ( length ) ] , expression )
 function, but both are failed.
 
 1. CREATE FUNCTION . 
 The function's arguments can only be expressions but not data_type . 
 2. Modifying the gram.y .
 The CONVERT ( data_type [ ( length ) ] , expression ) is in grammer
 conflict with the PostgreSQL self's
 convert(data,src_encoding_name,dest_encoding_name) function. And the
 PostgreSQL self's convert(data,src_encoding_name,dest_encoding_name)
 function cannot be used.
 
 I wonder whether there's a better way to solve this problem. 
 Any help will be appreciated.
 
 Best Regards
 Rohtodeveloper


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-30 Thread Amit Kapila
On Thu, Oct 30, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com
wrote:

 On 2014-10-21 12:40:56 +0530, Amit Kapila wrote:
  While doing performance tests, I noticed a hang at higher client
  counts with patch. I have tried to check call stack for few of
  processes and it is as below:
 
  #0  0x008010933e54 in .semop () from /lib64/libc.so.6
  #1  0x10286e48 in .PGSemaphoreLock ()
  #2  0x102f68bc in .LWLockAcquire ()
  #3  0x102d1ca0 in .ReadBuffer_common ()
  #4  0x102d2ae0 in .ReadBufferExtended ()
  #5  0x100a57d8 in ._bt_getbuf ()
  #6  0x100a6210 in ._bt_getroot ()
  #7  0x100aa910 in ._bt_search ()
  #8  0x100ab494 in ._bt_first ()
  #9  0x100a8e84 in .btgettuple ()
  ..
 
  #0  0x008010933e54 in .semop () from /lib64/libc.so.6
  #1  0x10286e48 in .PGSemaphoreLock ()
  #2  0x102f68bc in .LWLockAcquire ()
  #3  0x102d1ca0 in .ReadBuffer_common ()
  #4  0x102d2ae0 in .ReadBufferExtended ()
  #5  0x100a57d8 in ._bt_getbuf ()
  #6  0x100a6210 in ._bt_getroot ()
  #7  0x100aa910 in ._bt_search ()
  #8  0x100ab494 in ._bt_first ()
  ...
 
  The test configuration is as below:
  Test env - Power - 7 (hydra)
  scale_factor - 3000
  shared_buffers - 8GB
  test mode - pgbench read only
 
  test execution -
  ./pgbench -c 128 -j 128 -T 1800 -S -M prepared postgres
 
  I ran it for half an hour, but it didn't come out even after
  ~2 hours.  It doesn't get reproduced every time; currently I am
  able to reproduce it and the m/c is in the same state. If you want any
  info, let me know (unfortunately the binaries are in release mode, so
  we might not get enough information).

 Hm. What commit did you apply the series on top of? I managed to reproduce a
 hang, but it was just something that heikki had already fixed...


commit 494affbd900d1c90de17414a575af1a085c3e37a
Author: Noah Misch n...@leadboat.com
Date:   Sun Oct 12 23:33:37 2014 -0400

And I think you are saying that heikki's commit e0d97d has fixed
this issue; in that case I will check again after including that fix?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Escaping from blocked send() reprised.

2014-10-30 Thread Heikki Linnakangas

On 10/03/2014 06:29 PM, Heikki Linnakangas wrote:

On 10/03/2014 05:26 PM, Andres Freund wrote:

On 2014-10-03 17:12:18 +0300, Heikki Linnakangas wrote:

On 09/28/2014 01:54 AM, Andres Freund wrote:

0003 Sinval/notify processing got simplified further. There really isn't
   any need for DisableNotifyInterrupt/DisableCatchupInterrupt
   anymore. Also begin_client_read/client_read_ended don't make much
   sense anymore. Instead introduce ProcessClientReadInterrupt (which
   wants a better name).
There's also a very WIP
0004 Allows secure_read/write be interrupted when ProcDiePending is
   set. All of that happens via the latch mechanism, nothing happens
   inside signal handlers. So I do think it's quite an improvement
   over what's been discussed in this thread.
   But it (and the other approaches) do noticeably increase the
   likelihood of clients not getting the error message if the client
   isn't actually dead. The likelihood of write() being blocked
   *temporarily* due to normal bandwidth constraints is quite high
   when you consider COPY FROM and similar. Right now we'll wait till
   we can put the error message into the socket afaics.

1-3 need some serious comment work, but I think the approach is
basically sound. I'm much, much less sure about allowing send() to be
interrupted.


Yeah, 1-3 seem sane.


I think 3 also needs a careful look. Have you looked through it? While
imo much less complex than before, there are some complex interactions in
the touched code. And we have terrible coverage of both catchup
interrupts and notify stuff...


I only looked at the .patch, I didn't apply it, so I didn't look at the
context much. But I don't see any fundamental problem with it. I would
like to have a closer look before it's committed, though.


About patch 3:

Looking closer, this design still looks OK to me. As you said yourself, 
the comments need some work (e.g. the step 5. in the top comment in 
async.c needs updating). And then there are a couple of FIXME and XXX 
comments that need to be addressed.


The comment on PGPROC.procLatch in storage/proc.h says just this:


Latch   procLatch;  /* generic latch for process */


This needs a lot more explaining. It's now used by signal handlers to 
interrupt a read or write to the socket; that should be mentioned. What 
else is it used for? (for waking up a backend in synchronous 
replication, at least) What are the rules on when to arm it and when to 
reset it?
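
(For reference, a minimal sketch of the single-latch wait protocol as
documented in the latch header comments; work_to_do is a placeholder
for whatever condition the waiter actually cares about:)

	for (;;)
	{
		ResetLatch(&MyProc->procLatch);

		/* recheck the wakeup condition only after resetting the latch */
		if (work_to_do)
			break;

		WaitLatch(&MyProc->procLatch, WL_LATCH_SET, -1L);
	}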


Would it be more clear to use a separate, backend-private, latch, for 
the signals? I guess that won't work, though, because sometimes we need 
to wait for a wakeup from a different process or from a signal at the 
same time (SyncRepWaitForLSN() in particular). Not without adding a 
variant of WaitLatch that can wait on two latches simultaneously, anyway.


The assumption in secure_raw_read that MyProc exists is pretty 
surprising. I understand why it's that way, and there's a comment in 
PostgresMain explaining why the socket cannot be put into non-blocking 
mode earlier, but it's still a bit whacky. Not sure what to do about that.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-30 Thread Andres Freund
On 2014-10-30 18:54:57 +0530, Amit Kapila wrote:
 On Thu, Oct 30, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com
 wrote:
  On 2014-10-21 12:40:56 +0530, Amit Kapila wrote:
   I ran it for half an hour, but it didn't come out even after
   ~2 hours.  It doesn't get reproduced every time; currently I am
   able to reproduce it and the m/c is in the same state. If you want any
   info, let me know (unfortunately the binaries are in release mode, so
   we might not get enough information).
 
  Hm. What commit did you apply the series on top of? I managed to reproduce a
  hang, but it was just something that heikki had already fixed...
 
 
 commit 494affbd900d1c90de17414a575af1a085c3e37a
 Author: Noah Misch n...@leadboat.com
 Date:   Sun Oct 12 23:33:37 2014 -0400
 
 And I think you are saying that heikki's commit e0d97d has fixed
 this issue; in that case I will check again after including that fix?

Well, the hang I was able to reproduce was originally hanging because of
that. I saw a lot of content locks waiting as well, but the origin seems
to be a backend waiting for an xloginsert.

The way I could trigger it quite fast was by first running a read/write
pgbench and then switching to a readonly one.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-30 Thread Amit Kapila
On Thu, Oct 30, 2014 at 6:58 PM, Andres Freund and...@2ndquadrant.com
wrote:
 On 2014-10-30 18:54:57 +0530, Amit Kapila wrote:
  On Thu, Oct 30, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com
  wrote:
   Hm. What commit did you apply the series on top of? I managed to
   reproduce a hang, but it was just something that heikki had already
   fixed...
  
 
  commit 494affbd900d1c90de17414a575af1a085c3e37a
  Author: Noah Misch n...@leadboat.com
  Date:   Sun Oct 12 23:33:37 2014 -0400
 
   And I think you are saying that heikki's commit e0d97d has fixed
   this issue; in that case I will check again after including that fix?

  Well, the hang I was able to reproduce was originally hanging because of
  that. I saw a lot of content locks waiting as well, but the origin seems
  to be a backend waiting for an xloginsert.

  The way I could trigger it quite fast was by first running a read/write
  pgbench and then switching to a readonly one.


So what exactly do you mean by 'switch to'?
Were both the read-write and readonly pgbench runs running together,
or did you reproduce it by running the read-only pgbench after the
read-write one had finished?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Escaping from blocked send() reprised.

2014-10-30 Thread Andres Freund
On 2014-10-30 15:27:13 +0200, Heikki Linnakangas wrote:
 The comment on PGPROC.procLatch in storage/proc.h says just this:
 
  Latch   procLatch;  /* generic latch for process */
 
 This needs a lot more explaining. It's now used by signal handlers to
 interrupt a read or write to the socket; that should be mentioned. What else
 is it used for? (for waking up a backend in synchronous replication, at
 least) What are the rules on when to arm it and when to reset it?

Hm. I agree it could use expanded commentary, but I'm unsure if that much
detail is really warranted. Any such documentation seems to be almost
guaranteed to be out of date. As evidenced by the fact that there's none to date...

 Would it be more clear to use a separate, backend-private, latch, for the
  signals? I guess that won't work, though, because sometimes we need to
 wait for a wakeup from a different process or from a signal at the same time
 (SyncRepWaitForLSN() in particular). Not without adding a variant of
 WaitLatch that can wait on two latches simultaneously, anyway.

I wondered the same, but I don't really see what it'd buy us during
normal running. It seems like it'd just make the code more complex without
leading to meaningfully fewer wakeups.

 The assumption in secure_raw_read that MyProc exists is pretty surprising. I
 understand why it's that way, and there's a comment in PostgresMain
 explaining why the socket cannot be put into non-blocking mode earlier, but
 it's still a bit whacky. Not sure what to do about that.

It makes me quite unhappy too. I looked at what it'd take to make the proc
latch available earlier, but it wasn't pretty. I wondered whether we could
use an 'early proc latch' in MyProcLatch that is used until we're fully
attached to shared memory. Then, when attaching to shared memory, we'd set
the old latch once, and reset MyProcLatch to the shared memory one.

But that's pretty ugly.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Wait free LW_SHARED acquisition - v0.9

2014-10-30 Thread Andres Freund
On 2014-10-30 19:05:06 +0530, Amit Kapila wrote:
 On Thu, Oct 30, 2014 at 6:58 PM, Andres Freund and...@2ndquadrant.com
 wrote:
  On 2014-10-30 18:54:57 +0530, Amit Kapila wrote:
   On Thu, Oct 30, 2014 at 5:52 PM, Andres Freund and...@2ndquadrant.com
   wrote:
    Hm. What commit did you apply the series on top of? I managed to
    reproduce a hang, but it was just something that heikki had already
    fixed...
   
  
   commit 494affbd900d1c90de17414a575af1a085c3e37a
   Author: Noah Misch n...@leadboat.com
   Date:   Sun Oct 12 23:33:37 2014 -0400
  
    And I think you are saying that heikki's commit e0d97d has fixed
    this issue; in that case I will check again after including that fix?
 
   Well, the hang I was able to reproduce was originally hanging because of
   that. I saw a lot of content locks waiting as well, but the origin seems
   to be a backend waiting for an xloginsert.
 
   The way I could trigger it quite fast was by first running a read/write
   pgbench and then switching to a readonly one.
 
 
 So what exactly do you mean by 'switch to'?
 Were both the read-write and readonly pgbench runs running together,
 or did you reproduce it by running the read-only pgbench after the
 read-write one had finished?

I don't think that matters all that much. In this case I first had a
read-write one (accidentally, by leaving of -S), and then aborted and
ran a readonly pgbench. That turned out to trigger it relatively fast.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] CREATE IF NOT EXISTS INDEX

2014-10-30 Thread Fujii Masao
On Tue, Oct 7, 2014 at 2:42 AM, Fabrízio de Royes Mello
fabriziome...@gmail.com wrote:
 On Mon, Oct 6, 2014 at 11:13 AM, Marti Raudsepp ma...@juffo.org wrote:

 On Mon, Oct 6, 2014 at 4:27 PM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  create_index_if_not_exists_v7.patch

 Looks good to me. Marking ready for committer.


 Thanks.

The patch looks good to me except the following minor comments.

+  <term><literal>IF NOT EXISTS</literal></term>

It's better to place this after the paragraph on CONCURRENTLY,
for consistency with the syntax.

+Do not throw an error if the index already exists.

I think that this should be

Do not throw an error if a relation with the same name already exists.

+Index name is required when IF NOT EXISTS is specified.

IF NOT EXISTS should be enclosed in a <literal> tag.

@@ -60,7 +60,8 @@ extern Oid index_create(Relation heapRelation,
  bool allow_system_table_mods,
  bool skip_build,
  bool concurrent,
- bool is_internal);
+ bool is_internal,
+ bool if_not_exists);

You forgot to add a comment for the if_not_exists argument at the top of
the index_create function.

+	bool		if_not_exists;	/* just do nothing if index already exists */

You forgot the trailing "?" in the above comment. There are similar
comments in parsenodes.h, and they have such a "?".

Regards,

-- 
Fujii Masao




[HACKERS] Change in HEAP_NEWPAGE logging makes diagnosis harder

2014-10-30 Thread Andres Freund
Hi,

I've just once more looked at the WAL stream and was briefly confused
about all the XLOG_FPI records. Since 54685338e3,
log_newpage/log_newpage_buffer and XLogSaveBufferForHint() use the same
WAL record. I personally find that a bad idea because they're used in
quite different situations.

Can we use different IDs again?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2014-10-30 Thread Amit Kapila
On Fri, Sep 12, 2014 at 6:12 PM, Amit Kapila amit.kapil...@gmail.com
wrote:

 On Fri, Sep 12, 2014 at 5:07 PM, Dilip kumar dilip.ku...@huawei.com
wrote:
 
  On 12 September 2014 14:34, Amit Kapila Wrote

  Please find updated patch to include those documentation changes.
 
 
 
  Looks fine, Moved to Ready for committer.

 Thanks a lot for the review.


This patch has been in the Ready for committer stage for more than 1.5 months.
I believe this is important functionality: without it, the tar
format of pg_basebackup is not usable on Windows.  I feel this
will add value to the pg_basebackup utility, and moreover the need
and design were agreed upon on the list before development.

Can any Committer please have a look at this patch?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 Ick; I concur with your judgment on those aspects of the IPC::Cmd design.
 Thanks for investigating.  So, surviving options include:

 1. Require IPC::Run.
 2. Write our own run() that reports the raw exit code.
 3. Distill the raw exit code from the IPC::Cmd::run() error string.
 4. Pass IPC::Run::run_forked() a subroutine that execs an argument list.

 Any others worth noting?

From a user-inconvenience standpoint, I think requiring IPC::Run is not
horrid, so long as we add an --enable-something configure option to drive
whether the TAP tests are run or skipped.  However, it has to work with
whatever version of IPC::Run the platform's Perl installation comes up
with, and in that regard I'm a bit worried.  I'm sitting here watching a
trial run on prairiedog (OSX 10.4.11, perl 5.8.6), and there seems to be
something extremely wrong from a performance standpoint.  For example:

t/001_pg_controldata.pl .. 
1..13
ok 1 - pg_controldata --help exit code 0
ok 2 - pg_controldata --help goes to stdout
ok 3 - pg_controldata --help nothing to stderr
ok 4 - pg_controldata --version exit code 0
ok 5 - pg_controldata --version goes to stdout
ok 6 - pg_controldata --version nothing to stderr
ok 7 - pg_controldata with invalid option nonzero exit code
ok 8 - pg_controldata with invalid option prints error message
ok 9 - pg_controldata without arguments fails
ok 10 - pg_controldata with nonexistent directory fails
ok 11 - pg_controldata 
/Users/tgl/pgsql/src/bin/pg_controldata/tmp_testOtBj/data exit code 0
ok 12 - pg_controldata 
/Users/tgl/pgsql/src/bin/pg_controldata/tmp_testOtBj/data no stderr
ok 13 - pg_controldata produces output: matches
ok
All tests successful.
Files=1, Tests=13, 32 wallclock secs ( 0.23 usr  0.05 sys + 21.97 cusr  7.82 
csys = 30.07 CPU)
Result: PASS

Yup, you read that right, it took 32 seconds to run those dozen utterly
trivial tests.  As far as I could tell by eyeball, pretty much all of the
time went into test 11, which is odd since it seems not significantly
different from the others.  I think there's something wacky about IPC::Run
on this platform.

regards, tom lane




Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Andres Freund
On 2014-10-30 01:57:15 -0400, Noah Misch wrote:
 On Wed, Oct 29, 2014 at 08:14:07PM -0400, Peter Eisentraut wrote:
  On 10/28/14 9:09 PM, Peter Eisentraut wrote:
   I have looked into IPC::Cmd, but the documentation keeps telling me that
   to do anything interesting I have to have IPC::Run anyway.  I'll give it
   a try, though.
  
  I tried this, but I'm not optimistic about it.  While parts of IPC::Cmd
  are actually a bit nicer than IPC::Run, other parts are weird.  For
  example, with most packages and functions in Perl that run a command,
  you can pass the command as a string or as a list (or array reference).
   The latter is preferred because it avoids issues with quoting, word
  splitting, spaces, etc.  In IPC::Run, I can use the run function in
  the latter way, but I cannot use the run_forked function like that,
  and I need that one to get the exit code of a command.  It's possible to
  work around that, but I'm getting the feeling that this is not very well
  designed.
 
 Ick; I concur with your judgment on those aspects of the IPC::Cmd design.
 Thanks for investigating.  So, surviving options include:
 
 1. Require IPC::Run.
 2. Write our own run() that reports the raw exit code.
 3. Distill the raw exit code from the IPC::Cmd::run() error string.
 4. Pass IPC::Run::run_forked() a subroutine that execs an argument list.
 
 Any others worth noting?

5. Include a copy of IPC::Run in the repository till it's common enough.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] WAL format and API changes (9.5)

2014-10-30 Thread Andres Freund
On 2014-10-29 10:24:20 +0200, Heikki Linnakangas wrote:
 On 10/06/2014 02:29 PM, Andres Freund wrote:
 On 2014-10-06 14:19:39 +0300, Heikki Linnakangas wrote:
 Barring objections, I'll commit this, and then continue benchmarking the
 second patch with the WAL format and API changes.
 
 I'd like to have a look at it beforehand.
 
Ping? Here's a rebased patch. I'd like to proceed with this.

Doing so.

 I've not yet really looked,
 but on a quick readthrough XLogInsertRecData() staying in xlog.c doesn't
 make me happy...
 
 Ok.. Can you elaborate?

To me the split between xloginsert.c doing some of the record assembly,
and xlog.c doing the lower level part of the assembly is just wrong.

And the fact that the already complicated logic to retry after
detecting missing fpws is now split across two files seems to confirm
that. What happens right now is that XLogInsert() (with some helper
routines) assembles the record. Then it hands that off to
XLogInsertRecData(), which sometimes returns InvalidXLogRecPtr and
returns control back to XLogInsert(), which reverses *some* of its work
and then retries. Brr.

Similarly the 'page_writes_omitted' logic doesn't make me particularly
happy. Previously we retried when there actually was a page affected by
the different RedoRecPtr. Now we do it as soon as our copy of RedoRecPtr
is out of date? Won't that quite often spuriously trigger retries? Am I
missing something?
Arguably this doesn't happen often enough to matter, but it's still
something that we should explicitly discuss.

The implementation of the split seems to change the meaning of
TRACE_POSTGRESQL_XLOG_INSERT - it's now counting retries. I don't
particularly care about those tracepoints, but I see no simplification
due to changing its meaning.

Aside: In C11 function local statics become a bit more expensive - they
essentially require atomics and/or spinlocks on the first few calls.

Next thing: The patch doesn't actually compile. It misses an #include
"storage/proc.h" for MyPgXact.

I'm not a big fan of the naming for the new split. We have
* XLogInsert() which is the external interface
* XLogRecordAssemble() which builds the rdata chain that will be
  inserted
* XLogInsertRecData() which actually inserts the data into the xl buffers.

To me that's pretty inconsistent.

I'm also somewhat confused about the new split of the headers. I'm not
adverse to splitting them, but it's getting a bit hard to understand:
* xlog.h - generic mishmash
* xlogdefs.h - Three typedefs and some #defines
* xlog_fn.h - SQL functions
* xlog_internal.h - Pretty random collection of stuff
* xlogutils.h - Utilities for replaying WAL records
* xlogreader.h - generic XLog reading facility
* xloginsert.h - Functions for generating WAL records
* xlogrecord.h - Definitions for the WAL record format.

Isn't that a bit too much? Pretty much the only files that have a
recognizable separation to me are xlog_fn.h and xlogreader.h.


You've removed the
- *
- * NB: this routine feels free to scribble on the XLogRecData structs,
- * though not on the data they reference.  This is OK since the XLogRecData
- * structs are always just temporaries in the calling code.
comment, but we still do, no?


While not this patch's blame, I see that currently we finish the critical
section in XLogInsert/XLogInsertRecData pretty early:
END_CRIT_SECTION();

/*
 * Update shared LogwrtRqst.Write, if we crossed page boundary.
 */
	if (StartPos / XLOG_BLCKSZ != EndPos / XLOG_BLCKSZ)
	{
		SpinLockAcquire(&XLogCtl->info_lck);
		/* advance global request to include new block(s) */
		if (XLogCtl->LogwrtRqst.Write < EndPos)
			XLogCtl->LogwrtRqst.Write = EndPos;
		/* update local result copy while I have the chance */
		LogwrtResult = XLogCtl->LogwrtResult;
		SpinLockRelease(&XLogCtl->info_lck);
	}
I don't think this is particularly critical, but it still looks wrong to me.


Hm. I think that's what I see for now.

Greetings,

Andres Freund




Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Tom Lane
I wrote:
 Yup, you read that right, it took 32 seconds to run those dozen utterly
 trivial tests.  As far as I could tell by eyeball, pretty much all of the
 time went into test 11, which is odd since it seems not significantly
 different from the others.  I think there's something wacky about IPC::Run
 on this platform.

Hm ... on my RHEL6 machine, it takes about 2.5 seconds to run the
pg_controldata TAP tests, and again it looks like practically all of that
is going into test 11.  Given the speed differential between the two
machines, the time taken by prairiedog is roughly in line with that.
So the real question seems to be why is IPC::Run's performance so
inconsistent?  Is there something I'm not understanding that would cause
test 11 to require much more time than the others?

regards, tom lane




[HACKERS] infinite loop in _bt_getstackbuf

2014-10-30 Thread Robert Haas
A colleague at EnterpriseDB today ran into a situation on PostgreSQL
9.3.5 where the server went into an infinite loop while attempting a
VACUUM FREEZE; it couldn't escape _bt_getstackbuf(), and it couldn't
be killed with ^C.   I think we should add a check for interrupts into
that loop somewhere; and possibly make some attempt to notice if we've
been iterating for longer than, say, the lifetime of the universe
until now.

The fundamental structure of that function is an infinite loop.  We
break out of that loop when BTEntrySame(item, stack->bts_btentry) or
P_RIGHTMOST(opaque), and I'm sure that it's correct to think that, in
theory, one of those things will eventually happen.  But the index
could be corrupted, most obviously by having a page where
opaque->btpo_next points back to the current block number.  If that
happens, you need an immediate shutdown (or some clever gdb hackery)
to terminate the VACUUM.  That's unfortunate and unnecessary.

It also looks like something we can fix, at a minimum by adding a
CHECK_FOR_INTERRUPTS() at the top of that loop, or in some function
that it calls, like _bt_getbuf(), so that if it goes into an infinite
loop, it can at least be killed.  We could also consider adding a check
at the bottom of the loop, just before setting blkno =
opaque->btpo_next, that those values are unequal.  If they are,
elog().  Clearly it's possible to have a cycle of length >1, and such
a check wouldn't catch that, but it might still be worth checking for
the trivial case.  Or, we could try to put an upper bound on the
number of iterations that are reasonable and error out if we exceed
that value.  That might be tricky, though; it's not obvious to me that
there's any comfortably small upper bound.
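
(A rough sketch of those two cheap guards; this is hypothetical, not a
committed fix, with the loop shape following the 9.3-era
_bt_getstackbuf():)

	for (;;)
	{
		/* make a corrupted-index loop respond to cancel requests */
		CHECK_FOR_INTERRUPTS();

		buf = _bt_getbuf(rel, blkno, access);
		/* ... existing search logic ... */

		/* catch the trivial self-referencing right-link before following it */
		if (opaque->btpo_next == blkno)
			elog(ERROR, "right-link of block %u in index \"%s\" points to itself",
				 blkno, RelationGetRelationName(rel));
		blkno = opaque->btpo_next;
	}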

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] WAL format and API changes (9.5)

2014-10-30 Thread Heikki Linnakangas

On 10/30/2014 06:02 PM, Andres Freund wrote:

On 2014-10-29 10:24:20 +0200, Heikki Linnakangas wrote:

On 10/06/2014 02:29 PM, Andres Freund wrote:

I've not yet really looked,
but on a quick readthrough XLogInsertRecData() staying in xlog.c doesn't
make me happy...


Ok.. Can you elaborate?


To me the split between xloginsert.c doing some of the record assembly,
and xlog.c doing the lower level part of the assembly is just wrong.


Really? To me, that feels like a natural split. xloginsert.c is 
responsible for constructing the final WAL record, with the backup 
blocks and all. It doesn't access any shared memory (related to WAL; it 
does look at Buffers, to determine what to back up). The function in 
xlog.c inserts the assembled record to the WAL, as is. It handles the 
locking and WAL buffer management related to that.


What would you suggest? I don't think putting everything in one 
XLogInsert function, like we have today, is better. Note that the second 
patch makes xloginsert.c a lot more complicated.



And it leads to things like the already complicated logic to retry after
detecting missing fpws is now split across two files seems to confirm
that. What happens right now is that XLogInsert() (with some helper
routines) assembles the record. Then hands that off to
XLogInsertRecData(). Which sometimes returns InvalidXLogRecPtr, and
returns back to XLogInsert(), which reverses *some* of its work and then
retries. Brr.


Modifying the chain that was already constructed is not pretty, but it's 
not this patch's fault. I don't think splitting the functions makes that 
any worse. (BTW, the follow-up patch to change the WAL format fixes 
that; the per-buffer data is kept separately from the main data chain).



Similary the 'page_writes_omitted' logic doesn't make me particularly
happy. Previously we retried when there actually was a page affected by
the different RedoRecPtr. Now we do it as soon as our copy of RedoRecPtr
is out of date? Won't that quite often spuriously trigger retries? Am I
missing something?
Arguably this doesn't happen often enough to matter, but it's still
something that we should explicitly discuss.


The situation where it leads to a spurious retry is when the constructed 
WAL record skipped the FPW of a buffer, and RedoRecPtr was updated, but 
the buffer still doesn't need an FPW according to the updated 
RedoRecPtr. That only happens if the same buffer was updated (by another 
backend) after the new checkpoint began. I believe that's extremely rare.


We could make that more fine-grained, though. Instead of passing a 
boolean 'fpw_sensitive' flag to XLogInsertRecData, pass the lowest LSN 
among the pages whose FPW was skipped. If that's less than the new 
RedoRecPtr, then retry is needed. That would eliminate the spurious retries.
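
(A minimal sketch of that finer-grained test, with hypothetical variable
names; fpw_lsn stands for the lowest LSN among the pages whose FPW was
skipped:)

	/* inside the insertion path, after RedoRecPtr has been (re)read */
	if (!XLogRecPtrIsInvalid(fpw_lsn) && fpw_lsn <= RedoRecPtr)
	{
		/*
		 * A skipped full-page image is required after all under the
		 * updated RedoRecPtr; tell the caller to reassemble and retry.
		 */
		return InvalidXLogRecPtr;
	}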



The implementation of the split seems to change the meaning of
TRACE_POSTGRESQL_XLOG_INSERT - it's now counting retries. I don't
particularly care about those tracepoints, but I see no simplification
due to changing its meaning.


I wasn't sure what to do about that. I don't use tracepoints, and I 
don't know what others use them for.



I'm not a big fan of the naming for the new split. We have
* XLogInsert() which is the external interface
* XLogRecordAssemble() which builds the rdata chain that will be
   inserted
* XLogInsertRecData() which actually inserts the data into the xl buffers.

To me that's pretty inconsistent.


Got a better suggestion? I wanted to keep the name XLogInsert() 
unchanged, to avoid code churn, and because it's still a good name for 
that. XLogRecordAssemble is pretty descriptive for what it does, 
although XLogRecord implies that it constructs an XLogRecord struct. It 
does fill in that too, but the main purpose is to build the XLogRecData 
chain. Perhaps XLogAssembleRecord() would be better.


I'm not very happy with XLogInsertRecData myself. XLogInsertRecord?


I'm also somewhat confused about the new split of the headers. I'm not
adverse to splitting them, but it's getting a bit hard to understand:
* xlog.h - generic mishmash
* xlogdefs.h - Three typedefs and some #defines
* xlog_fn.h - SQL functions
* xlog_internal.h - Pretty random collection of stuff
* xlogutils.h - Utilities for replaying WAL records
* xlogreader.h - generic XLog reading facility
* xloginsert.h - Functions for generating WAL records
* xlogrecord.h - Definitions for the WAL record format.

Isn't that a bit too much? Pretty much the only files that have a
recognizable separation to me are xlog_fn.h and xlogreader.h.


This is a matter of taste, I guess. I find xlog.h, xlogdefs.h and 
xlog_internal.h fairly random, while the rest have a clear function. 
xlogutils.h is needed by redo functions, and xloginsert.h is needed for 
generating WAL.


Note that the second patch adds more stuff to xloginsert.h and xlogrecord.h.


You've removed the
- *
- * NB: this routine feels free to scribble on the XLogRecData structs,
- * though not on the data 

Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-10-30 Thread Robert Haas
+   <acronym>BRIN</acronym> indexes can satisfy queries via the bitmap
+   scanning facility, and will return all tuples in all pages within

The bitmap scanning facility?  Does this mean a bitmap index scan?
Or something novel to BRIN?  I think this could be clearer.

+   This enables them to work as very fast sequential scan helpers to avoid
+   scanning blocks that are known not to contain matching tuples.

Hmm, but they don't actually do anything about sequential scans per
se, right?  I'd say something like: Because a BRIN index is very
small, scanning the index adds little overhead compared to a
sequential scan, but may avoid scanning large parts of the table that
are known not to contain matching tuples.

+   depend on the operator class selected for the data type.

The operator class is selected for the index, not the data type.

+   The size of the block range is determined at index creation time with
+   the <literal>pages_per_range</literal> storage parameter.
+   The smaller the number, the larger the index becomes (because of the need to
+   store more index entries), but at the same time the summary data stored can
+   be more precise and more data blocks can be skipped during an index scan.

I would insert a sentence something like this: The number of index
entries will be equal to the size of the relation in pages divided by
the selected value for pages_per_range.  Therefore, the smaller the
number...  At least, I would insert that if it's actually true.  My
point is that I think the effect of pages_per_range could be made more
clear.
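
(For concreteness, a worked example of that relationship, assuming it
holds as stated; the numbers are illustrative only:)

    index entries = relation size in pages / pages_per_range
                  = 131072 / 128        (a 1 GB table of 8 kB pages)
                  = 1024 range summaries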

+   The core <productname>PostgreSQL</productname> distribution includes
+   includes the <acronym>BRIN</acronym> operator classes shown in
+   <xref linkend="gin-builtin-opclasses-table">.

Shouldn't that say "brin", not "gin"?

+   requiring the access method implementer only to implement the semantics

The naming of the reverse range map seems a little weird.  It seems
like most operations go through it, so it feels more like the forward
direction.  Maybe I'm misunderstanding.  (I doubt it's worth renaming
it at this point either way, but I thought I'd mention it.)

+  errmsg("unlogged BRIN indexes are not supported")));

Why not?  Shouldn't be particularly hard, I wouldn't think.

I'm pretty sure you need to create a pageinspect--1.3.sql, not just
update the 1.2 file.  Because that's in 9.4, and this won't be.

I'm pretty excited about this feature.  I think it's going to be very
good for PostgreSQL.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] alter user/role CURRENT_USER

2014-10-30 Thread Adam Brightwell
Kyotaro,

Zero-length identifiers are rejected in scanner so RoleId cannot
 be a valid pointer to a zero-length string. (NULL is used as
 PUBLIC in auth_ident)

 | postgres=# create role "";
 | ERROR:  zero-length delimited identifier at or near """"
 | postgres=# create role U&"\00";
 | ERROR:  invalid Unicode escape value at or near "\00"


Err... what?  I'm not sure what you are getting at here?  Why would we ever
have/want a zero-length identifier for a RoleId?  Seems to me that anything
requiring a RoleId should be explicit.

As a dirty and quick hack we can use some integer values prefixed
 by a single zero byte to represent special roles such as
 CURRENT_USER.

 | RoleId_or_curruser: RoleId{ $$ = $1; }
 |   | CURRENT_USER  { $$ = "\x00\x01"; };
 
 | Oid ResolveRoleId(const char *name, bool missing_ok)
 | {
 |   Oid roleid;
 |
 |   if (name[0] == 0 && name[1] == 1)
 |   roleid = GetUserId();

 This is ugly but needs no additional struct member or special
 logic. (Macros could make them look better.)


Yeah, that's pretty ugly.  I think Alvaro's recommendation of having the
production return a node with a name or flag is the better approach.

|   /* This hack lets us avoid reserving PUBLIC as a keyword */
 |   if (strcmp($1, "public") == 0)

  Please let me know the reason to avoid registering a new keyword,
  making the word unusable as a literal identifier, if any?


FWIW, I found the following in the archives:

http://www.postgresql.org/message-id/15516.1038718...@sss.pgh.pa.us

Now this is from 2002 and it appears it wasn't necessary to change at the
time, but I haven't yet found anything else related (it's a big archive).
Though, as I understand it, PUBLIC is now non-reserved as of SQL:2011, which
might make a compelling argument to leave it as is?

-Adam

-- 
Adam Brightwell - adam.brightw...@crunchydatasolutions.com
Database Engineer - www.crunchydatasolutions.com


Re: [HACKERS] infinite loop in _bt_getstackbuf

2014-10-30 Thread Alvaro Herrera
Robert Haas wrote:
 A colleague at EnterpriseDB today ran into a situation on PostgreSQL
 9.3.5 where the server went into an infinite loop while attempting a
 VACUUM FREEZE; it couldn't escape _bt_getstackbuf(), and it couldn't
 be killed with ^C.   I think we should add a check for interrupts into
 that loop somewhere;

Our design principle in this area is that all loops should have
CHECK_FOR_INTERRUPTS() calls somewhere, so that even if data is horribly
corrupted you can get out of it.  (Trivial loops where the exit
condition cannot possibly fail don't need to apply --- surely we don't
need to cope with hardware that makes i+1 go back to i-1 or whatever.)
Therefore I don't think you need to argue very hard in order to add more
interrupt checks if you see a loop that's missing them.

For example, Andres was saying to me just this morning that
GetMultiXactIdMembers is lacking one such check, and I was considering
pushing a patch to add it without any discussion.

 and possibly make some attempt to notice if we've
 been iterating for longer than, say, the lifetime of the universe
 until now.

This I'm not so sure about.  Adding extra logic in all nontrivial loops
to detect whether they have been running for too long is likely to
cause too much overhead.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Faster relevance ranking didn't make it into PostgreSQL 9.4

2014-10-30 Thread Arthur Silva
This is slightly off topic but please bear with me.

I came across this post:
http://pauleveritt.wordpress.com/2014/10/29/faster-relevance-ranking-didnt-make-it-into-postgresql-9-4/
I was curious about it so I checked several commit fest pages and searched
the mailing lists but I wasn't able to locate the relevant discussion.

Can someone point me to it?

--
Arthur Silva


Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT

2014-10-30 Thread Robert Haas
On Wed, Oct 29, 2014 at 2:36 PM, Tomas Vondra t...@fuzzy.cz wrote:
 I would tend not to worry too much about this case. I'm skeptical
 that there are a lot of people using large template databases. But
 if there are, or if some particular one of those people hits this
 problem, then they can raise checkpoint_segments to avoid it. The
 reverse problem, which you are encountering, cannot be fixed by
 adjusting settings.

 That however solves only the checkpoint, not the double amount of I/O
 due to writing both the files and WAL, no? But maybe that's OK.

I mean, it's not unimaginable that it's going to hurt somebody, but
the current situation is pretty bad too.  You don't have to be the
world's foremost PostgreSQL performance expert to know that extra
checkpoints are really bad for performance.  Write volume is of course
also a problem, but I bet there are a lot more people using small
template databases (where the write volume isn't really an issue,
because as Heikki points out the checkpoint wastes half a segment
 anyway, but the checkpoint may very well be an issue) than large ones
(where either could be an issue).

 Also, all this is a concern only with 'wal_level != minimal', but ISTM 'on
 wal_level=minimal it's fast' is a rather poor argument.

I think the argument here is more that there's no such thing as a free
lunch.  If someone wants to come up with a way to make this work
without WAL-logging every block -AND- without triggering a checkpoint,
great.  If that works well, perhaps we can apply it to other cases
like ALTER TABLE .. SET TABLESPACE which are currently quite painful
and which seem to me to have more or less the same problem.  But I
don't really know why we should expect that such a solution exists at
all or is easy to engineer correctly.  All we're doing here is
applying the same solution that's been found to be robust in other
situations to some old, crufty code that isn't.

 (This reminds me, yet again, that it would be really nice to something
 smarter than checkpoint_segments.  If there is little WAL activity
 between one checkpoint and the next, we should reduce the number of
 segments we're keeping around to free up disk space and ensure that
 we're recycling a file new enough that it's likely to still be in
 cache.  Recycling files long-since evicted from cache is poor.  But
 then we should also let the number of WAL files ratchet back up if the
 system again becomes busy.  Isn't this more or less what Heikki's
 soft-WAL-limit patch did?  Why did we reject that, again?)

 What about simply reusing the files in a different way? Instead of
 looping through the files in a round-robin manner, couldn't we just use
 the last recently used file, instead of going all the way back to the
 first one? This won't free the disk space, but IMHO that's not a problem
 because no one is going to use that space anyway (as it would be a risk
 once all the segments will be used again).

Hmm, interesting idea.  I have no idea whether that would work out to
a win or not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT

2014-10-30 Thread Andres Freund
On 2014-10-30 14:51:54 -0400, Robert Haas wrote:
 On Wed, Oct 29, 2014 at 2:36 PM, Tomas Vondra t...@fuzzy.cz wrote:
  I would tend not to worry too much about this case. I'm skeptical
  that there are a lot of people using large template databases. But
  if there are, or if some particular one of those people hits this
  problem, then they can raise checkpoint_segments to avoid it. The
  reverse problem, which you are encountering, cannot be fixed by
  adjusting settings.
 
  That however solves only the checkpoint, not the double amount of I/O
  due to writing both the files and WAL, no? But maybe that's OK.
 
 I mean, it's not unimaginable that it's going to hurt somebody, but
 the current situation is pretty bad too.  You don't have to be the
 world's foremost PostgreSQL performance expert to know that extra
 checkpoints are really bad for performance.  Write volume is of course
 also a problem, but I bet there are a lot more people using small
 template databases (where the write volume isn't really an issue,
 because as Heikki points out the checkpoint wastes half a segment
  anyway, but the checkpoint may very well be an issue) than large ones
 (where either could be an issue).

Agreed. The current behaviour is a pretty ugly one that just failed to fail
recently. I actually think we should *always* use the new code and not
add a separate wal_level=minimal branch. Maintaining this twice just
isn't worth the effort. minimal is used *far* less these days.

Greetings,

Andres Freund




Re: [HACKERS] Faster relevance ranking didn't make it into PostgreSQL 9.4

2014-10-30 Thread Alvaro Herrera
Arthur Silva wrote:
 This is slight off topic but please bear with me.
 
 I came across this post:
 http://pauleveritt.wordpress.com/2014/10/29/faster-relevance-ranking-didnt-make-it-into-postgresql-9-4/
 I was curious about it so I checked several commit fest pages and searched
 the mailing lists but I wasn't able to locate the relevant discussion.
 
 Can someone point me to to it?

Maybe this one?
http://www.postgresql.org/message-id/CAPpHfdscOX5an71nHd8WSUH6GNOCf=V7wgDaTXdDd9=gon-...@mail.gmail.com

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] alter user/role CURRENT_USER

2014-10-30 Thread Stephen Frost
* Adam Brightwell (adam.brightw...@crunchydatasolutions.com) wrote:
  | RoleId_or_curruser: RoleId{ $$ = $1; }
  |   | CURRENT_USER  { $$ = "\x00\x01"; };
[...]
  This is ugly but needs no additional struct member or special
  logics. (Macros could make them look better.)
 
 Yeah, that's pretty ugly.  I think Alvaro's recommendation of having the
 production return a node with a name or flag is the better approach.

That's more than just 'ugly', in my view.  I don't think there's any
reason to avoid making this into a node with a field that can be set to
indicate it's something special if we're going to support this.

The other idea which comes to mind is- could we try to actually resolve
what the 'right' answer is here, instead of setting a special value and
then having to detect and fix it later?  Perhaps have an Oid+Rolename
structure and then fill in the Oid w/ GetUserId(), if we're called with
CURRENT_USER, and otherwise just populate the Rolename field and have
code later which fills in the Oid if it's InvalidOid.
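
(A sketch of the structure this idea implies; the node name and fields
are hypothetical, not taken from any posted patch:)

	typedef struct RoleIdent
	{
		NodeTag		type;
		char	   *rolename;	/* role name as spelled, or NULL for CURRENT_USER */
		Oid			roleid;		/* filled in with GetUserId() or resolved later;
								 * InvalidOid means "not yet resolved" */
	} RoleIdent;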

  Please let me know the reason to avoid registering new keyword
  making the word unusable as an literal identifier, if any?

We really don't want to introduce new keywords without very good reason,
and adding to the list of "can't be used even if quoted" is all but
completely forbidden.

Thanks,

Stephen




Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT

2014-10-30 Thread Heikki Linnakangas

On 10/30/2014 08:51 PM, Robert Haas wrote:

On Wed, Oct 29, 2014 at 2:36 PM, Tomas Vondra t...@fuzzy.cz wrote:

I would tend not to worry too much about this case. I'm skeptical
that there are a lot of people using large template databases. But
if there are, or if some particular one of those people hits this
problem, then they can raise checkpoint_segments to avoid it. The
reverse problem, which you are encountering, cannot be fixed by
adjusting settings.


That however solves only the checkpoint, not the double amount of I/O
due to writing both the files and WAL, no? But maybe that's OK.


I mean, it's not unimaginable that it's going to hurt somebody, but
the current situation is pretty bad too.  You don't have to be the
world's foremost PostgreSQL performance expert to know that extra
checkpoints are really bad for performance.  Write volume is of course
also a problem, but I bet there are a lot more people using small
template databases (where the write volume isn't really an issue,
because as Heikki points out the checkpoint wastes half a segment
anyway, but the checkpoint may very well be a issue) than large ones
(where either could be an issue).


Nitpick: I didn't say that a checkpoint wastes half a segment. An xlog 
switch does, but a checkpoint doesn't automatically cause an xlog switch.


But I agree with the sentiment in general.

- Heikki





Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT

2014-10-30 Thread Heikki Linnakangas

On 10/30/2014 08:56 PM, Andres Freund wrote:

I actually think we should *always* use the new code and not
add a separate wal_level=minimal branch. Maintaining this twice just
isn't worth the effort. minimal is used *far* less these days.


I wouldn't go that far. Doing the wal_level=minimal optimization should 
be pretty straightforward. Note that it would be implemented more like 
CREATE INDEX et al with wal_level=minimal, not the way CREATE DATABASE 
currently works. It would not involve any extra checkpoints.


- Heikki





Re: [HACKERS] WAL format and API changes (9.5)

2014-10-30 Thread Andres Freund
Hi,

On 2014-09-15 15:41:22 +0300, Heikki Linnakangas wrote:
 The second patch contains the interesting changes.

Heikki's pushed the newest version of this to the git tree.

Some things I noticed while reading the patch:
* potential mismerge:
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -408,7 +408,7 @@ main(int argc, char **argv)
}
}
 
-   while ((c = getopt_long(argc, argv, "D:d:h:p:U:s:S:nF:wWv",
+   while ((c = getopt_long(argc, argv, "D:d:h:p:U:s:nF:wWv",
long_options, option_index)) != -1)
{
switch (c)

* Can't you just have REGBUF_WILL_INIT include REGBUF_NO_IMAGE? Then you
  don't need to test both.

* Is it really a good idea to separate XLogRegisterBufData() from
  XLogRegisterBuffer() the way you've done it? If we ever actually get
  a record with a large number of blocks touched this essentially is
  O(touched_buffers*data_entries).

* At least in Assert mode XLogRegisterBuffer() should ensure we're not
  registering the same buffer twice. It's a pretty easy mistake to make
  for some kinds of actions (think UPDATE), and the
  consequences will be far from obvious afaics.

* There's lots of functions like XLogRecHasBlockRef() that dig through
  the wal record. A common pattern is something like:
if (XLogRecHasBlockRef(record, 1))
		XLogRecGetBlockTag(record, 1, NULL, NULL, &oldblk);
else
oldblk = newblk;

  I think doing that repeatedly is quite a bad idea. We should parse the
  record once and then use it in a sensible format. Not do it in pieces,
  over and over again. It's not like we ignore backup blocks - so doing
  this lazily doesn't make sense to me.
  Especially as ValidXLogRecord() *already* has parsed the whole damn
  thing once.
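
  (One hypothetical shape of that "parse once" idea; the struct and
  names are invented here for illustration, not taken from the patch:)

	typedef struct ParsedBlockRef
	{
		bool		in_use;		/* is this block reference present? */
		RelFileNode	rnode;		/* relation the block belongs to */
		ForkNumber	forknum;	/* fork within the relation */
		BlockNumber	blkno;		/* block number */
	} ParsedBlockRef;

	/* decoded once, right after record validation, then consulted by redo */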

* Maybe XLogRegisterData() and XLogRegisterBufData() should accept void*
  instead of char*? Right now there's lots of pointless casts in callers
  needed that don't add any safety. The one additional line that's then
  needed in these functions seems well worth it.
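
  (To illustrate the cast churn being pointed out; a hypothetical call
  site, assuming the signatures as reviewed:)

	/* with a char * parameter, every caller needs a cast ... */
	XLogRegisterData((char *) &xlrec, SizeOfHeapInsert);

	/* ... whereas a void * parameter would allow simply */
	XLogRegisterData(&xlrec, SizeOfHeapInsert);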

* There's a couple record types (e.g. XLOG_SMGR_TRUNCATE) that only
  refer to the relation, but not to the block number. These still log
  their rnode manually. Shouldn't we somehow deal with those in a
  similar way to how explicit block references are dealt with?

* You've added an assert in RestoreBackupBlockContents() that ensures
  the page isn't new. That wasn't there before, instead there was a
  special case for it. I can't immediately think how that previously
  could happen, but I do wonder why it was there.

* The following comment in +RestoreBackupBlock probably is two lines too
  early
+   /* Found it, apply the update */
+   if (!(bkpb->fork_flags & BKPBLOCK_HAS_IMAGE))
+   return InvalidBuffer;
  This new InvalidBuffer case probably could use a comment in either
  XLogReadBufferForRedoExtended or here.

* Most of the commentary above RestoreBackupBlock() probably doesn't
  belong there anymore.

* Maybe XLogRecordBlockData.fork_flags should be renamed? Maybe just
  into flags_fork? Right now it makes at least me think that it's fork
  specific flags, which really isn't the actual meaning.

* I wonder if it wouldn't be worthwile, for the benefit of the FPI
  compression patch, to keep the bkp block data after *all* the
  headers. That'd make it easier to just compress the data.

* +InitXLogRecordConstruct() seems like a remainder of the
  xlogconstruct.c naming. I'd just go for InitXLogInsert()

* Hm. At least WriteMZeroPageXlogRec (and probably the same for all the
  other slru stuff) doesn't include a reference to the page. Isn't that
  bad? Shouldn't we make XLogRegisterBlock() usable for that case?
  Otherwise I fail to see how pg_rewind-like tools can sanely deal with this?

* Why did you duplicate the identical cases in btree_desc()? I guess due
  to rebasing on top of the xlogdump stats series?

* I haven't actually looked at the xlogdump output so I might be
  completely wrong, but won't the output of e.g. updates be harder to
  read now because it's not clear which of the references old/new
  buffers are? Not that it matters that much.

* s/recdataend/recdata_end/? The former is somewhat hard to parse for
  me.

* I think heap_xlog_update is buggy for wal_level=logical because it
  computes the length of the tuple using
  tuplen = recdataend - recdata;
  But the old primary key/old tuple value might be stored there as
  well. Afaics that code has to continue to use xl_heap_header_len.
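
  I.e. something like this sketch, using the explicit length carried by
  xl_heap_header_len (assuming the struct keeps its 9.4 shape here):

    xl_heap_header_len xlhdr;

    memcpy(&xlhdr, recdata, SizeOfHeapHeaderLen);
    recdata += SizeOfHeapHeaderLen;
    /* the old key/tuple may follow the new tuple when wal_level=logical,
     * so don't assume the tuple runs to the end of the record */
    tuplen = xlhdr.t_len;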

* It looks to me like insert wal logging could just use REGBUF_KEEP_DATA
  to get rid of:
+   /*
+* The new tuple is normally stored as buffer 0's data. But if
+* XLOG_HEAP_CONTAINS_NEW_TUPLE flag is set, it's part of the main
+* data, after the xl_heap_insert struct.
+*/
+   if (xlrec->flags & XLOG_HEAP_CONTAINS_NEW_TUPLE)
+   {
+   data = XLogRecGetData(record) + SizeOfHeapInsert;
+   datalen = record->xl_len - SizeOfHeapInsert;
+   }
+   else
+   data 

Re: [HACKERS] infinite loop in _bt_getstackbuf

2014-10-30 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Robert Haas wrote:
 A colleague at EnterpriseDB today ran into a situation on PostgreSQL
 9.3.5 where the server went into an infinite loop while attempting a
 VACUUM FREEZE; it couldn't escape _bt_getstackbuf(), and it couldn't
 be killed with ^C.   I think we should add a check for interrupts into
 that loop somewhere;

 Our design principle in this area is that all loops should have
 CHECK_FOR_INTERRUPTS() calls somewhere, so that even if data is horribly
 corrupted you can get out of it.

FWIW, I concur with Alvaro that adding a CHECK_FOR_INTERRUPTS() needn't
require much discussion.  Given the lack of prior complaints about this
loop, I'm not sure I see the need to work harder than that; corruption
of this sort must be quite rare.
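
A sketch of the shape of that fix, against the retry loop in
_bt_getstackbuf():

    for (;;)
    {
        /* make even a corrupted-index loop cancellable with ^C */
        CHECK_FOR_INTERRUPTS();

        buf = _bt_getbuf(rel, blkno, BT_READ);
        ...
    }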

regards, tom lane




Re: [HACKERS] alter user/role CURRENT_USER

2014-10-30 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
 The other idea which comes to mind is- could we try to actually resolve
 what the 'right' answer is here, instead of setting a special value and
 then having to detect and fix it later?

No, absolutely not.  Read the NOTES at the head of gram.y.  Or if you
need it spelled out: when we run the bison parser, we may not be inside a
transaction at all, and even if we are, we aren't necessarily going to
be seeing the same current user that will apply when the parsetree is
ultimately executed.  (Consider a query inside a plpgsql function, which
might be called under multiple userids over the course of a session.)

I think Alvaro's suggestion is a perfectly appropriate solution.

regards, tom lane




Re: [HACKERS] alter user/role CURRENT_USER

2014-10-30 Thread Tom Lane
Adam Brightwell adam.brightw...@crunchydatasolutions.com writes:
 FWIW, I found the following in the archives:

 http://www.postgresql.org/message-id/15516.1038718...@sss.pgh.pa.us

 Now this is from 2002 and it appears it wasn't necessary to change at the
 time, but I haven't yet found anything else related (it's a big archive).
 Though, as I understand it, PUBLIC is now non-reserved as of SQL:2011 which
 might make a compelling argument to leave it as is?

The current spec does list PUBLIC as a non-reserved keyword, but it also
says (5.4 Names and identifiers syntax rules)

 20) No authorization identifier shall specify PUBLIC.

which, oddly enough, seems to license us to handle PUBLIC the way
we are doing.  OTOH, it lists CURRENT_USER as a reserved word, suggesting
that they don't think the same type of hack should be used for that.

I'd be inclined to leave the grammar as such alone (ie CURRENT_USER is
a keyword, PUBLIC isn't).  Changing that has more downside than upside,
and we do have justification in the spec for treating the two cases
differently.  However, I agree that we should fix the subsequent
processing so that current_user is not confused with CURRENT_USER.

regards, tom lane




Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Tom Lane
I wrote:
 Yup, you read that right, it took 32 seconds to run those dozen utterly
 trivial tests.  As far as I could tell by eyeball, pretty much all of the
 time went into test 11, which is odd since it seems not significantly
 different from the others.  I think there's something wacky about IPC::Run
 on this platform.

Oh, never mind: after digging into the test source I eventually realized
that there's an initdb happening between tests 10 and 11, and that's
what's eating the time.  It might be a thought to print something to
indicate that a time-consuming step is happening; but the lack of any
feedback here isn't exactly IPC::Run's fault.

Anyway, I can confirm Peter's statement that the current tests work even
on quite old platforms, as long as you install IPC::Run.

regards, tom lane




[HACKERS] SET TABLESPACE ... NOWAIT

2014-10-30 Thread Tomas Vondra
Hi,

while preparing an overview of features new in 9.4, I noticed that while
we provide NOWAIT for the ALTER ... ALL IN TABLESPACE commands, we
don't support that for the 'single object' case. Is that on purpose? I
assume it makes sense, as with a single object you can't get stuck half-way
through, but I don't see it mentioned in any of the discussions (both
for the original patch in January and the summer reworks).

Also, the current phrasing If the NOWAIT option is specified then the
command will fail if it is unable to acquire all of the locks required
immediately. seems a bit ambiguous to me. Maybe it's just me, but I
wasn't sure if that means locks for all objects immediately, before
actually starting moving them to the new tablespace, and I had to check
the code to see that I understood it correctly. But maybe that's just me and
it's unambiguous to everyone else. Well, it would be really strange if
it behaved differently ...

Tomas




Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2014-10-30 Thread Jim Nasby

On 10/30/14, 3:19 AM, Michael Paquier wrote:

On Wed, Oct 29, 2014 at 7:59 AM, Jim Nasby jim.na...@bluetreble.com wrote:
  Patch applies against current HEAD and builds, but I'm getting 37 failed
  tests (mostly parallel, but also misc and WITH; results attached). Is that
  expected?
This is caused by the recent commit 7b1c2a0 (that I actually participated in 
reviewing :p) because of a missing inclusion of ruleutils.h in index.c.


Sorry, forgot to mention patch now passes make check cleanly.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] REINDEX CONCURRENTLY 2.0

2014-10-30 Thread Jim Nasby

On 10/30/14, 3:19 AM, Michael Paquier wrote:

Thanks for your input, Jim!

On Wed, Oct 29, 2014 at 7:59 AM, Jim Nasby jim.na...@bluetreble.com wrote:
  Patch applies against current HEAD and builds, but I'm getting 37 failed
  tests (mostly parallel, but also misc and WITH; results attached). Is that
  expected?
This is caused by the recent commit 7b1c2a0 (that I actually participated in 
reviewing :p) because of a missing inclusion of ruleutils.h in index.c.



  The "mark the concurrent index" bit is rather confusing; it sounds like it's
  referring to the new index instead of the old. Now that I've read the code I
  understand what's going on here between the concurrent index *entry* and the
  filenode swap, but I don't think the docs make this sufficiently clear to
  users.
 
  How about something like this:
 
  The following steps occur in a concurrent index build, each in a separate
  transaction. Note that if there are multiple indexes to be rebuilt then each
  step loops through all the indexes we're rebuilding, using a separate
  transaction for each one.
 
  1. [blah]
Definitely a good idea! I took your text and made it more precise, listing the 
actions done for each step, the pg_index flags switched, using orderedlist to 
make the list of steps described in a separate paragraph more exhaustive for the 
user. At the same time I reworked the docs removing a part that was somewhat 
duplicated about dealing with the constraints having invalid index entries and how to 
drop them.


Awesome! Just a few items here:

+   Then a second pass is performed to add tuples that were added while
+   the first pass build was running. One the validation of the index

s/One the/Once the/


  + * index_concurrent_create
  + *
  + * Create an index based on the given one that will be used for concurrent
  + * operations. The index is inserted into catalogs and needs to be built
  later
  + * on. This is called during concurrent index processing. The heap relation
  + * on which is based the index needs to be closed by the caller.
 
  Last bit presumably should be "on which the index is based".
What about "Create a concurrent index based on the definition of the one provided by
caller"?


That's good too, but my comment was on the last sentence, not the first.


  +   /* Build the list of column names, necessary for index_create */
  Instead of all this work wouldn't it be easier to create a version of
  index_create/ConstructTupleDescriptor that will use the IndexInfo for the
  old index? ISTM index_concurrent_create() is doing a heck of a lot of work
  to marshal data into one form just to have it get marshaled yet again. Worst
  case, if we do have to play this game, there should be a stand-alone
  function to get the columns/expressions for an existing index; you're
  duplicating a lot of code from pg_get_indexdef_worker().
Yes, this definitely sucks and the approach of creating a function to get all the
column names is not productive either. Then let's define an additional
argument in index_create to pass a potential TupleDesc instead of this whole 
wart. I noticed as well that we need to actually reset attcacheoff to be able 
to use a fresh version of the tuple descriptor of the old index. I added a 
small API for this purpose in tupdesc.h called ResetTupleDescCache. Would it 
make sense instead to extend CreateTupleDescCopyConstr or CreateTupleDescCopy 
with a boolean flag?


Perhaps there'd be other places that would want to reset the stats, so I lean 
slightly that direction.

The comment at the beginning of index_create needs to be modified to mention 
tupdesc and especially that setting tupdesc overrides indexColNames.


  index_concurrent_swap(): Perhaps it'd be better to create
  index_concurrent_swap_setup() and index_concurrent_swap_cleanup() and
  refactor the duplicated code out... the actual function would then become:
This sentence is not finished :) IMO, index_concurrent_swap looks good as is, 
taking as arguments the index and its concurrent entry, and swapping their 
relfilenodes after taking an AccessExclusiveLock that will be held until the end of
this transaction.


Fair enough.


  ReindexRelationConcurrently()
 
  + * Process REINDEX CONCURRENTLY for given relation Oid. The relation can be
  + * either an index or a table. If a table is specified, each step of
  REINDEX
  + * CONCURRENTLY is done in parallel with all the table's indexes as well as
  + * its dependent toast indexes.
  This comment is a bit misleading; we're not actually doing anything in
  parallel, right? AFAICT index_concurrent_build is going to block while each
  index is built the first time.
Yes, parallel may be misleading. What is meant here is that each step of the 
process is done one by one on all the valid indexes a table may have.


New comment looks good.


  +* relkind is an index, this index itself will be rebuilt. The locks
  taken
  +* parent relations 

Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Andrew Dunstan


On 10/30/2014 05:23 PM, Tom Lane wrote:

I wrote:

Yup, you read that right, it took 32 seconds to run those dozen utterly
trivial tests.  As far as I could tell by eyeball, pretty much all of the
time went into test 11, which is odd since it seems not significantly
different from the others.  I think there's something wacky about IPC::Run
on this platform.

Oh, never mind: after digging into the test source I eventually realized
that there's an initdb happening between tests 10 and 11, and that's
what's eating the time.  It might be a thought to print something to
indicate that a time-consuming step is happening; but the lack of any
feedback here isn't exactly IPC::Run's fault.

Anyway, I can confirm Peter's statement that the current tests work even
on quite old platforms, as long as you install IPC::Run.





So, I'm a bit confused. Is the --enable-tap-tests config setting still 
on the table?


cheers

andrew




Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 On 10/30/2014 05:23 PM, Tom Lane wrote:
 Anyway, I can confirm Peter's statement that the current tests work even
 on quite old platforms, as long as you install IPC::Run.

 So, I'm a bit confused. Is the --enable-tap-tests config setting still 
 on the table?

I think it should be.  You should not have to have either prove or
IPC::Run (or, IIRC, even Perl) in order to do make check-world.

regards, tom lane




Re: [HACKERS] Review of Refactoring code for sync node detection

2014-10-30 Thread Jim Nasby

On 10/30/14, 8:05 AM, Michael Paquier wrote:

This switches from using a single if() with multiple conditions &&'d
together to a bunch of if() continue's. I don't know if those will perform
the same, and AFAIK this is pretty performance critical.

Well, we could still use the old notation with a single if(). That's
not that complicated to change.


I actually prefer the multiple if's; it reads a LOT cleaner. I don't know what 
the compiler will do with it though.

If we stick with this version I'd argue it makes more sense to just stick the 
sync_node = and priority = statements into the if block and ditch the continue. 
</nitpick>
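
Roughly like this (sketch of the loop's tail under that suggestion):

    /* was: if (XLogRecPtrIsInvalid(walsnd->flush)) continue; ... */
    if (!XLogRecPtrIsInvalid(walsnd->flush))
    {
        sync_node = i;
        priority = walsnd->sync_standby_priority;
    }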
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] pg_background (and more parallelism infrastructure patches)

2014-10-30 Thread Robert Haas
On Wed, Oct 29, 2014 at 4:58 PM, Andres Freund and...@2ndquadrant.com wrote:
 That's true.  I don't know what to do about it.  I'm somewhat inclined
 to think that, if this remains in contrib, it's OK to ignore those
 issues until such time as people complain about them, because anybody
 who dislikes the things that can be done with this extension doesn't
 have to install it.  Also, the people complaining might have useful
 ideas about what a good fix would look like, which I currently don't.
 There's some push to move this into core, which I think is overkill,
 but if we do it then we'd better have a good solution to this problem.

 At the very least it need to be clearly documented. Another solution
 would be to simply not give out PUBLIC rights, and restrict it to the
 owner/superuesers lest somebody makes explicit grants. I favor
 combining those two.

I don't think it's appropriate to put superuser() checks in the code,
if that's what you are proposing.  Forcing this to be super-user only
is hitting a mouse with a battleship.  If you're saying we should put
REVOKE commands into the install script as we do for some other
modules, like dblink, that makes sense to me.

 We could try to make connection limits apply to pg_background, and we
 could also check CONNECT permission when starting a background worker.
 Both of those things feel slightly odd because there's no actual
 server connection.  There *might* be a connection to the user backend
 that started it, but it's sort of a virtual connection through
 shared memory, and the background process continues running unimpeded
 if it goes away, so there might be no actual connection at all.

 I think that'd not be bad.

Looks like those checks happen in InitializeSessionUserId(), which is
called from InitPostgres(), which is called from
BackgroundWorkerInitializeConnection().  That makes me think we're
already applying these checks.

rhaas=# select * from
pg_background_result(pg_background_launch('vacuum pg_enum')) as (x
text);
   x
--------
 VACUUM
(1 row)

rhaas=# alter role rhaas nologin;
ALTER ROLE
rhaas=# select * from
pg_background_result(pg_background_launch('vacuum pg_enum')) as (x
text);
ERROR:  role "rhaas" is not permitted to log in
CONTEXT:  background worker, pid 64311
rhaas=# alter role rhaas login;
ALTER ROLE
rhaas=# select * from
pg_background_result(pg_background_launch('vacuum pg_enum')) as (x
text);
   x
--------
 VACUUM
(1 row)

 Hm. I'm unconvinced. It looks almost trivial to fall back to the text
 based protocol.

It's hard to argue with "I'm unconvinced."  What specifically about
that argument do you think isn't valid?

While I am sure the problem can be worked around, it isn't trivial.
Right now, execute_sql_string() just requests binary format
unconditionally.  To do what you're talking about, we'd need to
iterate through all of the types and figure out which ones have
typsend/typreceive functions.  If there's a convenience function that
will do that for us, I don't see it, probably because I believe there
are exactly zero situations where we do that kind of inference today.
Then, the user backend has to save the format codes from the
RowDescription message and decide whether to use text or binary.  That
just seems like a silly waste of code and cycles.

I think this actually matters, too, because the question is what we're
going to do with full-blown parallelism.  Best would be to actually
shuttle entire raw tuples between backends; second best, binary
format; third best, text format or mixture of text and binary.  I'm
not sure what it's reasonable to try to get away with there, but I
certainly think minimizing IPC costs is going to be an important goal.
I didn't try to get around with shipping raw tuples here because we
don't lock types while they're in use, and I'm worried that Bad Things
Could Happen.  But I'm sure somebody's going to care about the
overhead of converting back and forth at some point.

 I don't see how that follows. The error context logic is there to make
 it clearer where an error originated from. It'll be really confusing if
 there's ERRORs jumping out of a block of code without emitting context
 that has set a error context set.

I don't think I was proposing that, but I think I may have gotten a
little off-track here.  See what you think of the attached, which
seems to work.

 It does mean that if a security definer function
 starts a worker, and returns without detaching it or cleaning it up,
 the unprivileged user could then read the data back from that worker.
 That's more insidious than it may at first appear, because the
 security definer function could have been intending to read back the
 data before returning, and then a transaction abort happens.  We could
 add a guard requiring that the data be read by the same effective user
 ID that started the worker, although that might also foreclose useful
 things people would otherwise be able to do with this.

 I think such a restriction would be a good idea 

Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Jim Nasby

On 10/30/14, 4:55 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

On 10/30/2014 05:23 PM, Tom Lane wrote:

Anyway, I can confirm Peter's statement that the current tests work even
on quite old platforms, as long as you install IPC::Run.



So, I'm a bit confused. Is the --enable-tap-tests config setting still
on the table?


I think it should be.  You should not have to have either prove or
IPC::Run (or, IIRC, even Perl) in order to do make check-world.


Could configure detect if we have IPC::Run? ISTM it'd be nice to make this 
automatic instead of requiring it to be specifically enabled.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] SET TABLESPACE ... NOWAIT

2014-10-30 Thread David G Johnston
Tomas Vondra wrote
 Also, the current phrasing If the NOWAIT option is specified then the
 command will fail if it is unable to acquire all of the locks required
 immediately. seems a bit ambiguous to me. Maybe it's just me, but I
 wasn't sure if that means locks for all objects immediately, before
 actually starting moving them to the new tablespace, and I had to check
 to code that I understand it correctly. But maybe that's just me and
 it's unambiguous to everyone else. Well, it would be really strange if
 it behaved differently ...

If you weren't sure that it meant locks for all objects [first]... what
did you think it could have meant instead?

NOWAIT semantics seem fairly straightforward: at no point do I want
to wait to get a lock, so unless I can get all my locks right now, give up and
let me try again when I am ready.  Once you have all the locks, feel free to
do your thing, because the locks you hold will prevent you from having to wait
for someone else during your processing (aside from resource contention,
anyway).
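
In code terms that boils down to something like this sketch, built on
the existing conditional lock primitive (the error details here are
illustrative):

    ListCell   *lc;

    foreach(lc, relids)
    {
        Oid         relid = lfirst_oid(lc);

        /* NOWAIT: take every lock up front, or fail immediately */
        if (!ConditionalLockRelationOid(relid, AccessExclusiveLock))
            ereport(ERROR,
                    (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
                     errmsg("could not obtain lock on relation \"%s\"",
                            get_rel_name(relid))));
    }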

Note from the ALTER TABLE documentation section:

All tables in the current database in a tablespace can be moved by using
the ALL IN TABLESPACE form, which will lock all tables to be moved first and
then move each one.

This immediately precedes the comment about NOWAIT so there is no need to
reason things out...

Is there inconsistency between the different objects types with regard to
the wording in the documentation?  I may nitpick the wording itself but at
least for TABLE there is no ambiguity that I can see.

David J.










Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Tom Lane
Jim Nasby jim.na...@bluetreble.com writes:
 On 10/30/14, 4:55 PM, Tom Lane wrote:
 I think it should be.  You should not have to have either prove or
 IPC::Run (or, IIRC, even Perl) in order to do make check-world.

 Could configure detect if we have IPC::Run? ISTM it'd be nice to make this 
 automatic instead of requiring it to be specifically enabled.

The general philosophy we have about features enabled by configure is
exactly opposite to that: we do not for example look for Perl and then
build or not build plperl dependent on that.  If you want plperl, you
tell us so, and then we either build it or fail because we can't.

You could argue that test coverage is different from features of the
completed package, but I don't really buy that.

regards, tom lane




Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT

2014-10-30 Thread Jim Nasby

On 10/30/14, 2:13 PM, Heikki Linnakangas wrote:

On 10/30/2014 08:56 PM, Andres Freund wrote:

I actually think we should *always* use the new code and not
add a separate wal_level=minimal branch. Maintaining this twice just
isn't worth the effort. minimal is used *far* less these days.


I wouldn't go that far. Doing the wal_level=minimal optimization should be 
pretty straightforward. Note that it would be implemented more like CREATE 
INDEX et al with wal_level=minimal, not the way CREATE DATABASE currently 
works. It would not involve any extra checkpoints.


+1

At my previous job, we used createdb -T copy_from_production new_dev_database, 
because that was far faster than re-loading the raw SQL dump all the time. It'd 
be a shame to have that need to write the copied data 2x. IIRC that database 
was around 20MB.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Jim Nasby

On 10/30/14, 5:32 PM, Tom Lane wrote:

Jim Nasby jim.na...@bluetreble.com writes:

On 10/30/14, 4:55 PM, Tom Lane wrote:

I think it should be.  You should not have to have either prove or
IPC::Run (or, IIRC, even Perl) in order to do make check-world.



Could configure detect if we have IPC::Run? ISTM it'd be nice to make this 
automatic instead of requiring it to be specifically enabled.


The general philosophy we have about features enabled by configure is
exactly opposite to that: we do not for example look for Perl and then
build or not build plperl dependent on that.  If you want plperl, you
tell us so, and then we either build it or fail because we can't.

You could argue that test coverage is different from features of the
completed package, but I don't really buy that.


If our policy is that tests are there primarily for developers then I agree 
with you.

If not, then would we be OK with make check being a no-op unless you'd 
configured with --enable-make-check?

Making this something you have to enable will seriously limit the number of 
people running the TAP tests, simply because few will know to enable them.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] pg_background (and more parallelism infrastructure patches)

2014-10-30 Thread Andres Freund
Hi,

On 2014-10-30 18:03:27 -0400, Robert Haas wrote:
 On Wed, Oct 29, 2014 at 4:58 PM, Andres Freund and...@2ndquadrant.com wrote:
  That's true.  I don't know what to do about it.  I'm somewhat inclined
  to think that, if this remains in contrib, it's OK to ignore those
  issues until such time as people complain about them, because anybody
  who dislikes the things that can be done with this extension doesn't
  have to install it.  Also, the people complaining might have useful
  ideas about what a good fix would look like, which I currently don't.
  There's some push to move this into core, which I think is overkill,
  but if we do it then we'd better have a good solution to this problem.
 
  At the very least it need to be clearly documented. Another solution
  would be to simply not give out PUBLIC rights, and restrict it to the
  owner/superuesers lest somebody makes explicit grants. I favor
  combining those two.
 
 I don't think it's appropriate to put superuser() checks in the code,
 if that's what you are proposing.

That's not at all what I'm thinking of... The superuser reference was
only that explicit EXECUTE rights aren't required for superusers.

 Forcing this to be super-user only is hitting a mouse with a
 battleship.  If you're saying we should put REVOKE commands into the
 install script as we do for some other modules, like dblink, that
 makes sense to me.

But instead this.

  We could try to make connection limits apply to pg_background, and we
  could also check CONNECT permission when starting a background worker.
  Both of those things feel slightly odd because there's no actual
  server connection.  There *might* be a connection to the user backend
  that started it, but it's sort of a virtual connection through
  shared memory, and the background process continues running unimpeded
  if it goes away, so there might be no actual connection at all.
 
  I think that'd not be bad.
 
 Looks like those checks happen in InitializeSessionUserId(), which is
 called from InitPostgres(), which is called from
 BackgroundWorkerInitializeConnection().  That makes me think we're
 already applying these checks.

Oh, neat.

  Hm. I'm unconvinced. It looks almost trivial to fall back to the text
  based protocol.
 
 It's hard to argue with I'm unconvinced.  What specifically about
 that argument do you think isn't valid?

We don't normally just say this is unsupported for things that are
perfectly valid. And types without binary send/recv functions imo are
perfectly valid. You've made that kind of argument yourself more than
once, no?
It's really not an academic thing. Popular extensions like postgis don't
provide send/recv for all their types. Same with a couple of our own
contrib types (intarray, chkpass, seg, cube).

I guess we need input from others on this point.

 While I am sure the problem can be worked around, it isn't trivial.
 Right now, execute_sql_string() just requests binary format
 unconditionally.  To do what you're talking about, we'd need to
 iterate through all of the types and figure out which ones have
 typsend/typreceive functions.  If there's a convenience function that
 will do that for us, I don't see it, probably because I believe there
 are exactly zero situations where we do that kind of inference today.
 Then, the user backend has to save the format codes from the
 RowDescription message and decide whether to use text or binary.  That
 just seems like a silly waste of code and cycles.

The current client/server protocol definitely can do it. You can specify
per column whether you want binary/textual data. I'm pretty sure that at
least the jdbc driver does that.

It's also what I've decided to do for BDR's output plugin. Some builtin
types will be transferred 'as is' if the architecture/version
matches. Otherwise, if available and the version matches, send/recv will
be used (exluding some corner case, but ..). Only if neither is the case
if falls back to the textual protocol.

 I think this actually matters, too, because the question is what we're
 going to do with full-blown parallelism.

I think for parallelism it'll most definitely not be acceptable to not
support types without send/recv. So building some convenience
infrastructure here imo is going to pay off. It's not like it's that
hard to detect - just check OidIsValid(pg_type->typsend). The pg_type
tuple has to be looked up anyway to find the send/recv functions...
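
For what it's worth, that check could look about like this (sketch;
formats[] stands in for the per-column result-format array the user
backend would send):

    HeapTuple   tup = SearchSysCache1(TYPEOID, ObjectIdGetDatum(atttypid));
    Form_pg_type typform;

    if (!HeapTupleIsValid(tup))
        elog(ERROR, "cache lookup failed for type %u", atttypid);
    typform = (Form_pg_type) GETSTRUCT(tup);
    /* request binary only for columns whose type can actually send it */
    formats[i] = OidIsValid(typform->typsend) ? 1 : 0;
    ReleaseSysCache(tup);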

 Best would be to actually
 shuttle entire raw tuples between backends; second best, binary
 format; third best, text format or mixture of text and binary.  I'm
 not sure what it's reasonable to try to get away with there, but I
 certainly think minimizing IPC costs is going to be an important goal.
 I didn't try to get around with shipping raw tuples here because we
 don't lock types while they're in use, and I'm worried that Bad Things
 Could Happen.

Yea, because of type changes and such I went with using the 'as-is'
format only for builtin types 

Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Tom Lane
Jim Nasby jim.na...@bluetreble.com writes:
 If our policy is that tests are there primarily for developers then I agree 
 with you.

 If not, then would we be OK with make check being a no-op unless you'd 
 configured with --enable-make-check?

 Making this something you have to enable will seriously limit the number of 
 people running the TAP tests, simply because few will know to enable them.

Well, TBH I have no problem with that at the moment, because as Robert has
pointed out the current TAP tests are of close to zero value.  The odds
that they'll find anything in the hands of Joe Random User are far lower
than the odds that they'll break Joe's build.

At some point down the road that value judgement might (hopefully will)
reverse, and then we could deal with it by making --enable-tap-tests the
default.  But even then there would be a place for --disable-tap-tests.
The current situation, where the only way to disable the TAP tests is to
not do make check-world, is utterly unacceptable given their low present
usefulness and lack of proven portability.

I opined before that we should rip those tests out of the 9.4 branch
altogether.  I'm willing to leave them there if we have an --enable
switch controlling them, though.

regards, tom lane




Re: [HACKERS] proposal: CREATE DATABASE vs. (partial) CHECKPOINT

2014-10-30 Thread Andres Freund
On 2014-10-30 18:06:02 -0500, Jim Nasby wrote:
 On 10/30/14, 2:13 PM, Heikki Linnakangas wrote:
 On 10/30/2014 08:56 PM, Andres Freund wrote:
 I actually think we should *always* use the new code and not
 add a separate wal_level=minimal branch. Maintaining this twice just
 isn't worth the effort. minimal is used *far* less these days.
 
 I wouldn't go that far. Doing the wal_level=minimal optimization should be 
 pretty straightforward. Note that it would be implemented more like CREATE 
 INDEX et al with wal_level=minimal, not the way CREATE DATABASE currently 
 works. It would not involve any extra checkpoints.

It's probably not that hard. I agree. Imo it's up to the person doing
this conversion. We imo shouldn't require that person to develop both
versions, but if they're interested in doing it: fine with me.

 At my previous job, we used createdb -T copy_from_production 
 new_dev_database, because that was far faster than re-loading the raw SQL 
 dump all the time. It'd be a shame to have that need to write the copied data 
 2x. IIRC that database was around 20MB.

At that size, not doing the two immediate checkpoints will still be a win
that's an order of magnitude or so bigger.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Andres Freund
On 2014-10-30 19:30:04 -0400, Tom Lane wrote:
 Jim Nasby jim.na...@bluetreble.com writes:
  If our policy is that tests are there primarily for developers then I agree 
  with you.
 
  If not, then would we be OK with make check being a no-op unless you'd 
  configured with --enable-make-check?
 
  Making this something you have to enable will seriously limit the number of 
  people running the TAP tests, simply because few will know to enable them.
 
 Well, TBH I have no problem with that at the moment, because as Robert has
 pointed out the current TAP tests are of close to zero value.  The odds
 that they'll find anything in the hands of Joe Random User are far lower
 than the odds that they'll break Joe's build.

I already have a couple more ready once this has stabilized...

I personally don't agree that they have no value at this point. At least
the pg_basebackup tests test paths that are not executed *at all*
otherwise and which are not trivial. To my knowledge it's the only thing
in our tests that exercises walsender and wal_level != minimal.

 At some point down the road that value judgement might (hopefully will)
 reverse, and then we could deal with it by making --enable-tap-tests the
 default.  But even then there would be a place for
 --disable-tap-tests.

Which would be what exactly?

 The current situation, where the only way to disable the TAP tests is to
 not do make check-world, is utterly unacceptable given their low present
 usefulness and lack of proven portability.

Agreed.

 I opined before that we should rip those tests out of the 9.4 branch
 altogether.  I'm willing to leave them there if we have an --enable
 switch controlling them, though.

Hm. I'm not convinced that that's the best way, but I'm not going fight
hard.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-10-30 19:30:04 -0400, Tom Lane wrote:
 At some point down the road that value judgement might (hopefully will)
 reverse, and then we could deal with it by making --enable-tap-tests the
 default.  But even then there would be a place for
 --disable-tap-tests.

 Which would be what exactly?

Well, for example, you don't have and don't want to install IPC::Run.
Or you've tripped over some failure in the TAP tests (which might or
might not be an actual bug) but would like to complete your build anyway.

regards, tom lane




Re: [HACKERS] SET TABLESPACE ... NOWAIT

2014-10-30 Thread Tomas Vondra
On 30.10.2014 23:19, David G Johnston wrote:
 Tomas Vondra wrote
 Also, the current phrasing If the NOWAIT option is specified then
 the command will fail if it is unable to acquire all of the locks
 required immediately. seems a bit ambiguous to me. Maybe it's just
 me, but I wasn't sure if that means locks for all objects
 immediately, before actually starting moving them to the new
 tablespace, and I had to check to code that I understand it
 correctly. But maybe that's just me and it's unambiguous to
 everyone else. Well, it would be really strange if it behaved
 differently ...
 
 If you weren't sure that it meant locks for all objects [first]...
 what did you think it could have meant instead?
 
 NOWAIT semantics seem fairly straight-forward in that at no point do
 I want to wait to get a lock so unless I can get all my locks right 
 now give up and let me try again when I am ready. Once you have all
 the locks feel free to do your thing cause the locks you hold will
 prevent you from having to wait for someone else during your
 processing (aside from resource contention anyways).
 
 Note from the ALTER TABLE documentation section:
 
 All tables in the current database in a tablespace can be moved by
 using the ALL IN TABLESPACE form, which will lock all tables to be
 moved first and then move each one.
 
 This immediately precedes the comment about NOWAIT so there is not 
 need to reason things out...

EINSUFFICIENTCAFFEINE I guess ... I was already in slightly confused mode
because of something else, and apparently I skipped the sentence about
locking all the tables first.

So nevermind, let's forget this ever happened, OK?

Tomas




Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Andres Freund
On 2014-10-30 19:53:54 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-10-30 19:30:04 -0400, Tom Lane wrote:
  At some point down the road that value judgement might (hopefully will)
  reverse, and then we could deal with it by making --enable-tap-tests the
  default.  But even then there would be a place for
  --disable-tap-tests.
 
  Which would be what exactly?
 
 Well, for example, you don't have and don't want to install IPC::Run.

Well, that's what the hypothetical configure test is for. I see little
reason in this specific case to do anything more complicated than check
for prove and IPC::Run in configure and use them if necessary.

 Or you've tripped over some failure in the TAP tests (which might or
 might not be an actual bug) but would like to complete your build anyway.

Well. That's just the same with plain regression tests. But I guess
adding --disable is essentially free, so whatever.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] how to handle missing prove

2014-10-30 Thread Peter Eisentraut
On 10/28/14 10:01 PM, Peter Eisentraut wrote:
 On 10/28/14 9:16 PM, Tom Lane wrote:
 ISTM that the project policy for external components like this has been
 don't rely on them unless user says to use them, in which case fail if
 they aren't present.  So perhaps what we ought to have is a configure
 switch along the lines of --enable-tap-tests.  If you don't specify it,
 prove_check expands to nothing.  If you do specify it, we fail if we
 lack any of the expected support, both prove and whatever the agreed-on
 set of Perl modules is.
 
 That's also a good idea.

Here is a patch.


From 27b303938e4472c1a28f216978c847aed5750ced Mon Sep 17 00:00:00 2001
From: Peter Eisentraut pete...@gmx.net
Date: Thu, 30 Oct 2014 20:08:45 -0400
Subject: [PATCH] Add configure --enable-tap-tests option

---
 configure                      | 41 ++++++++++++++++++++++++++++++++++++++++-
 configure.in                   | 17 ++++++++++++++++-
 doc/src/sgml/installation.sgml | 10 ++++++++++
 src/Makefile.global.in         |  8 ++++++++
 4 files changed, 74 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 1248b06..1b29be6 100755
--- a/configure
+++ b/configure
@@ -729,6 +729,7 @@ CPPFLAGS
 LDFLAGS
 CFLAGS
 CC
+enable_tap_tests
 enable_dtrace
 DTRACEFLAGS
 DTRACE
@@ -808,6 +809,7 @@ enable_debug
 enable_profiling
 enable_coverage
 enable_dtrace
+enable_tap_tests
 with_blocksize
 with_segsize
 with_wal_blocksize
@@ -1477,6 +1479,7 @@ Optional Features:
   --enable-profiling  build with profiling enabled
   --enable-coverage   build with coverage testing instrumentation
   --enable-dtrace build with DTrace support
+  --enable-tap-tests  enable TAP tests (requires Perl)
   --enable-depend turn on automatic dependency tracking
   --enable-cassertenable assertion checks (for debugging)
   --disable-thread-safety disable thread-safety in client libraries
@@ -3466,6 +3469,34 @@ fi
 
 
 #
+# TAP tests
+#
+
+
+# Check whether --enable-tap-tests was given.
if test "${enable_tap_tests+set}" = set; then :
+  enableval=$enable_tap_tests;
+  case $enableval in
+yes)
+  :
+  ;;
+no)
+  :
+  ;;
+*)
+  as_fn_error $? "no argument expected for --enable-tap-tests option" "$LINENO" 5
+  ;;
+  esac
+
+else
+  enable_tap_tests=no
+
+fi
+
+
+
+
+#
 # Block size
 #
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for block size" >&5
@@ -14785,7 +14816,8 @@ done
 #
 # Check for test tools
 #
-for ac_prog in prove
+if test $enable_tap_tests = yes; then
+  for ac_prog in prove
 do
  # Extract the first word of "$ac_prog", so it can be a program name with args.
 set dummy $ac_prog; ac_word=$2
@@ -14827,6 +14859,13 @@ fi
  test -n "$PROVE" && break
 done
 
+  if test -z "$PROVE"; then
+    as_fn_error $? "prove not found" "$LINENO" 5
+  fi
+  if test -z "$PERL"; then
+    as_fn_error $? "Perl not found" "$LINENO" 5
+  fi
+fi
 
 # Thread testing
 
diff --git a/configure.in b/configure.in
index 0a3725f..fe6f735 100644
--- a/configure.in
+++ b/configure.in
@@ -229,6 +229,13 @@ AC_SUBST(DTRACEFLAGS)])
 AC_SUBST(enable_dtrace)
 
 #
+# TAP tests
+#
+PGAC_ARG_BOOL(enable, tap-tests, no,
+  [enable TAP tests (requires Perl)])
+AC_SUBST(enable_tap_tests)
+
+#
 # Block size
 #
 AC_MSG_CHECKING([for block size])
@@ -1876,7 +1883,15 @@ AC_CHECK_PROGS(OSX, [osx sgml2xml sx])
 #
 # Check for test tools
 #
-AC_CHECK_PROGS(PROVE, prove)
+if test "$enable_tap_tests" = yes; then
+  AC_CHECK_PROGS(PROVE, prove)
+  if test -z "$PROVE"; then
+    AC_MSG_ERROR([prove not found])
+  fi
+  if test -z "$PERL"; then
+    AC_MSG_ERROR([Perl not found])
+  fi
+  fi
+fi
 
 # Thread testing
 
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 68931d2..adde5b3 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -1271,6 +1271,16 @@ titleConfiguration/title
/listitem
   /varlistentry
 
+  varlistentry
+   termoption--enable-tap-tests/option/term
+   listitem
+para
+ Enable tests using the Perl TAP tools.  This requires a Perl
+ installation and the Perl module literalIPC::Run/literal.
+ See xref linkend=regress-tap for more information.
+/para
+   /listitem
+  /varlistentry
  /variablelist
 /para
 
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index b04d005..63ff50b 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -174,6 +174,7 @@ enable_nls	= @enable_nls@
 enable_debug	= @enable_debug@
 enable_dtrace	= @enable_dtrace@
 enable_coverage	= @enable_coverage@
+enable_tap_tests	= @enable_tap_tests@
 enable_thread_safety	= @enable_thread_safety@
 
 python_enable_shared	= @python_enable_shared@
@@ -310,6 +311,8 @@ define ld_library_path_var
 $(if $(filter $(PORTNAME),darwin),DYLD_LIBRARY_PATH,$(if $(filter $(PORTNAME),aix),LIBPATH,LD_LIBRARY_PATH))
 endef
 
+ifeq ($(enable_tap_tests),yes)
+
 define prove_installcheck
cd $(srcdir) && TESTDIR='$(CURDIR)' 

Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-10-30 19:53:54 -0400, Tom Lane wrote:
 Well, for example, you don't have and don't want to install IPC::Run.

 Well, that's what the hypothetical configure test is for. I see little
 reason in this specific case to do anything more complicated than check
 for prove and IPC::Run in configure and use them if necessary.

As I said upthread, that approach seems to me to be contrary to the
project policy about how configure should behave.  If you have selected
(or, someday, defaulted to) --enable-tap-tests, configure should *fail*
if you don't have the tools to run the tests.  Not silently disable tests
that we have decided are valuable.  How exactly would that be different
from silently omitting readline support if we don't find that library?

regards, tom lane




Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Noah Misch
On Thu, Oct 30, 2014 at 04:54:25PM +0100, Andres Freund wrote:
 On 2014-10-30 01:57:15 -0400, Noah Misch wrote:
  On Wed, Oct 29, 2014 at 08:14:07PM -0400, Peter Eisentraut wrote:
   On 10/28/14 9:09 PM, Peter Eisentraut wrote:
I have looked into IPC::Cmd, but the documentation keeps telling me that
to do anything interesting I have to have IPC::Run anyway.  I'll give it
a try, though.
   
   I tried this, but I'm not optimistic about it.  While parts of IPC::Cmd
   are actually a bit nicer than IPC::Run, other parts are weird.  For
   example, with most packages and functions in Perl that run a command,
   you can pass the command as a string or as a list (or array reference).
The latter is preferred because it avoids issues with quoting, word
   splitting, spaces, etc.  In IPC::Run, I can use the run function in
   the latter way, but I cannot use the run_forked function like that,
   and I need that one to get the exit code of a command.  It's possible to
   work around that, but I'm getting the feeling that this is not very well
   designed.
  
  Ick; I concur with your judgment on those aspects of the IPC::Cmd design.
  Thanks for investigating.  So, surviving options include:
  
  1. Require IPC::Run.
  2. Write our own run() that reports the raw exit code.
  3. Distill the raw exit code from the IPC::Cmd::run() error string.
  4. Pass IPC::Run::run_forked() a subroutine that execs an argument list.
  
  Any others worth noting?
 
 5. Include a copy of IPC::Run in the repository till it's common enough.

True.  I eliminated that one because the license of IPC::Run is, like Perl
itself, GPL+Artistic.  Right now, we presume that the entire PostgreSQL
tarball is distributable under the PostgreSQL License.  The benefit of
bundling IPC::Run is not strong enough to justify muddying that.




Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Andres Freund
On 2014-10-30 20:13:53 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-10-30 19:53:54 -0400, Tom Lane wrote:
  Well, for example, you don't have and don't want to install IPC::Run.
 
  Well, that's what the hypothetical configure test is for. I see little
  reason in this specific case to do anything more complicated than check
  for prove and IPC::Run in configure and use them if necessary.
 
 As I said upthread, that approach seems to me to be contrary to the
 project policy about how configure should behave.

I don't think that holds much water. There's a fair amount of things
that configure detects automatically. I don't think the comparison to
plperl or such is meaningful - that's a runtime/install time
difference. These tests are not.

We e.g. detect compiler support for certain features that result in
possible speedups and/or better warnings. We detect whether bison is
available...

 If you have selected
 (or, someday, defaulted to) --enable-tap-tests, configure should *fail*
 if you don't have the tools to run the tests.  Not silently disable tests
 that we have decided are valuable.  How exactly would that be different
 from silently omitting readline support if we don't find that library?

Because it doesn't result in a user visible regression?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Review of Refactoring code for sync node detection

2014-10-30 Thread Michael Paquier
On Fri, Oct 31, 2014 at 6:59 AM, Jim Nasby jim.na...@bluetreble.com wrote:

 If we stick with this version I'd argue it makes more sense to just stick
 the sync_node = and priority = statements into the if block and ditch the
 continue. /nitpick

Let's go with the cleaner version then, I'd prefer code that can be read
easily for this code path. Switching back is not that complicated either.
-- 
Michael
From 00d85f046d187056a85c6d6dc82e9a79674b132d Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Thu, 30 Oct 2014 22:01:10 +0900
Subject: [PATCH] Refactor code to detect synchronous node in WAL sender array

This patch removes code duplication between walsender.c and syncrep.c
when detecting the node with the lowest strictly-positive priority,
facilitating maintenance of this code.
---
 src/backend/replication/syncrep.c   | 101 ++--
 src/backend/replication/walsender.c |  36 +++--
 src/include/replication/syncrep.h   |   1 +
 3 files changed, 82 insertions(+), 56 deletions(-)

diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index aa54bfb..848c783 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -5,7 +5,7 @@
  * Synchronous replication is new as of PostgreSQL 9.1.
  *
  * If requested, transaction commits wait until their commit LSN is
- * acknowledged by the sync standby.
+ * acknowledged by the synchronous standby.
  *
  * This module contains the code for waiting and release of backends.
  * All code in this module executes on the primary. The core streaming
@@ -357,6 +357,72 @@ SyncRepInitConfig(void)
 	}
 }
 
+
+/*
+ * Obtain position of synchronous standby in the array referencing all
+ * the WAL senders, or -1 if no synchronous node can be found. The caller
+ * of this function should take a lock on SyncRepLock. If there are
+ * multiple nodes with the same lowest priority value, the first node
+ * found is selected.
+ * node_priority is a preallocated array of size max_wal_senders that
+ * can be used to retrieve the priority of each WAL sender. Its
+ * inclusion in this function has the advantage of limiting the scan
+ * of the WAL sender array to a single pass, reducing the time for
+ * which SyncRepLock is taken.
+ */
+int
+SyncRepGetSynchronousNode(int *node_priority)
+{
+	int		sync_node = -1;
+	int		priority = 0;
+	int		i;
+
+	/* Scan WAL senders and find synchronous node if any */
+	for (i = 0; i  max_wal_senders; i++)
+	{
+		/* Use volatile pointer to prevent code rearrangement */
+		volatile WalSnd *walsnd = WalSndCtl->walsnds[i];
+
+		/* First get the priority of this WAL sender */
+		if (node_priority)
+			node_priority[i] = XLogRecPtrIsInvalid(walsnd->flush) ?
+				0 : walsnd->sync_standby_priority;
+
+		/* Proceed to next if not active */
+		if (walsnd->pid == 0)
+			continue;
+
+		/* Proceed to next if not streaming */
+		if (walsnd->state != WALSNDSTATE_STREAMING)
+			continue;
+
+		/* Proceed to next one if asynchronous */
+		if (walsnd->sync_standby_priority == 0)
+			continue;
+
+		/* Proceed to next one if priority conditions not satisfied */
+		if (priority != 0 &&
+			priority <= walsnd->sync_standby_priority)
+			continue;
+
+		/* Proceed to next one if flush position is invalid */
+		if (XLogRecPtrIsInvalid(walsnd->flush))
+			continue;
+
+		/*
+		 * We have a potential synchronous candidate, choose it as the
+		 * synchronous node for the time being before going through the
+		 * other nodes listed in the WAL sender array.
+		 */
+		sync_node = i;
+
+		/* Update priority to current value of WAL sender */
+		priority = walsnd->sync_standby_priority;
+	}
+
+	return sync_node;
+}
+
 /*
  * Update the LSNs on each queue based upon our latest state. This
  * implements a simple policy of first-valid-standby-releases-waiter.
@@ -369,10 +435,9 @@ SyncRepReleaseWaiters(void)
 {
 	volatile WalSndCtlData *walsndctl = WalSndCtl;
 	volatile WalSnd *syncWalSnd = NULL;
+	int			sync_node;
 	int			numwrite = 0;
 	int			numflush = 0;
-	int			priority = 0;
-	int			i;
 
 	/*
 	 * If this WALSender is serving a standby that is not on the list of
@@ -388,32 +453,14 @@ SyncRepReleaseWaiters(void)
 	/*
 	 * We're a potential sync standby. Release waiters if we are the highest
 	 * priority standby. If there are multiple standbys with same priorities
-	 * then we use the first mentioned standby. If you change this, also
-	 * change pg_stat_get_wal_senders().
+	 * then we use the first mentioned standby.
 	 */
 	LWLockAcquire(SyncRepLock, LW_EXCLUSIVE);
+	sync_node = SyncRepGetSynchronousNode(NULL);
 
-	for (i = 0; i  max_wal_senders; i++)
-	{
-		/* use volatile pointer to prevent code rearrangement */
-		volatile WalSnd *walsnd = walsndctl->walsnds[i];
-
-		if (walsnd->pid != 0 &&
-			walsnd->state == WALSNDSTATE_STREAMING &&
-			walsnd->sync_standby_priority > 0 &&
-			(priority == 0 ||
-			 priority > walsnd->sync_standby_priority) &&
-			!XLogRecPtrIsInvalid(walsnd->flush))
-		{

Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-10-30 20:13:53 -0400, Tom Lane wrote:
 As I said upthread, that approach seems to me to be contrary to the
 project policy about how configure should behave.

 I don't think that holds much water. There's a fair amount of things
 that configure detects automatically. I don't think the comparison to
 plperl or such is meaningful - that's a runtime/install time
 difference. These tests are not.

Meh.  Right now, it's easy to dismiss these tests as unimportant,
figuring that they play little part in whether the completed build
is reliable.  But that may not always be true.  If they do become
a significant part of our test arsenal, silently omitting them will
not be cool for configure to do.

We think that it's okay to autoconfigure things like which semaphore
technology to use in part because we believe we can test the results.
I think if someone asks for make check-world, it should be pretty
deterministic what that means.

Historical note: I was not originally very much on board with the strict
enable-what-you-want policy for configure behavior, but I got religion
after working at Red Hat for awhile.  Nondeterministic package build
behaviors *suck*.  Here's one example:
https://bugzilla.redhat.com/show_bug.cgi?id=427063

 If you have selected
 (or, someday, defaulted to) --enable-tap-tests, configure should *fail*
 if you don't have the tools to run the tests.  Not silently disable tests
 that we have decided are valuable.  How exactly would that be different
 from silently omitting readline support if we don't find that library?

 Because it doesn't result in a user visible regression?

Uncaught bugs become user-visible regressions.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] how to handle missing prove

2014-10-30 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On 10/28/14 10:01 PM, Peter Eisentraut wrote:
 On 10/28/14 9:16 PM, Tom Lane wrote:
 ISTM that the project policy for external components like this has been
 don't rely on them unless user says to use them, in which case fail if
 they aren't present.  So perhaps what we ought to have is a configure
 switch along the lines of --enable-tap-tests.  If you don't specify it,
 prove_check expands to nothing.  If you do specify it, we fail if we
 lack any of the expected support, both prove and whatever the agreed-on
 set of Perl modules is.

 That's also a good idea.

 Here is a patch.

Looks generally reasonable, but I thought you were planning to choose a
different option name?

One minor nitpick: perhaps the --help description of the option should
read

+  --enable-tap-tests  enable TAP tests (requires Perl and IPC::Run)

because in practice it'll be much more likely that people will be missing
IPC::Run than that they'll be missing Perl altogether.

Also, shouldn't we have it fail rather than just skipping tests if
IPC::Run is missing?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Andres Freund
On 2014-10-30 21:24:04 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-10-30 21:03:43 -0400, Tom Lane wrote:
  Meh.  Right now, it's easy to dismiss these tests as unimportant,
  figuring that they play little part in whether the completed build
  is reliable.  But that may not always be true.  If they do become
  a significant part of our test arsenal, silently omitting them will
  not be cool for configure to do.
 
  Well, I'm all for erroring out if somebody passed --enable-foo-tests and
  the prerequisites aren't there. What I *am* against is requiring an
  explicit flag to enable them because then they'll just not be run in
  enough environments. And that's what's much more likely to cause
  unnoticed bugs.
 
 Once they're at the point where they're actually likely to catch stuff
 of interest, I'll be all for enabling them by default.

Great. We already are at that point due to the pg_basebackup
tests.
If we slightly extend it to also start up the newly made base backups we
will have the first minimal automated test of recovery...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CREATE IF NOT EXISTS INDEX

2014-10-30 Thread Fabrízio de Royes Mello
On Thu, Oct 30, 2014 at 12:11 PM, Fujii Masao masao.fu...@gmail.com wrote:

 On Tue, Oct 7, 2014 at 2:42 AM, Fabrízio de Royes Mello
 fabriziome...@gmail.com wrote:
  On Mon, Oct 6, 2014 at 11:13 AM, Marti Raudsepp ma...@juffo.org wrote:
 
  On Mon, Oct 6, 2014 at 4:27 PM, Fabrízio de Royes Mello
  fabriziome...@gmail.com wrote:
   create_index_if_not_exists_v7.patch
 
  Looks good to me. Marking ready for committer.
 
 
  Thanks.

 The patch looks good to me except the following minor comments.

 +  <term><literal>IF NOT EXISTS</literal></term>

 It's better to place this after the paragraph on CONCURRENTLY,
 for consistency with the syntax.


Fixed.


 +Do not throw an error if the index already exists.

 I think that this should be

 Do not throw an error if a relation with the same name already exists.


Fixed.


 +Index name is required when IF NOT EXISTS is specified.

 IF NOT EXISTS should be enclosed with a <literal> tag.


Fixed.


 @@ -60,7 +60,8 @@ extern Oid index_create(Relation heapRelation,
   bool allow_system_table_mods,
   bool skip_build,
   bool concurrent,
 - bool is_internal);
 + bool is_internal,
 + bool if_not_exists);

 You forgot to add a comment for the if_not_exists argument at the top of the
 index_create function.


Fixed.


 +	bool		if_not_exists;	/* just do nothing if index already exists */

 You forgot to add the trailing "?" at the above comment. There are similar
 comments in parsenodes.h, and they have such "?".


Fixed.


Thanks for your review!


Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index 43df32f..0414c26 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/ref/create_index.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ <replaceable class="parameter">name</replaceable> ] ON <replaceable class="parameter">table_name</replaceable> [ USING <replaceable class="parameter">method</replaceable> ]
+CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [ IF NOT EXISTS ] <replaceable class="parameter">name</replaceable> ] ON <replaceable class="parameter">table_name</replaceable> [ USING <replaceable class="parameter">method</replaceable> ]
     ( { <replaceable class="parameter">column_name</replaceable> | ( <replaceable class="parameter">expression</replaceable> ) } [ COLLATE <replaceable class="parameter">collation</replaceable> ] [ <replaceable class="parameter">opclass</replaceable> ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
     [ WITH ( <replaceable class="PARAMETER">storage_parameter</replaceable> = <replaceable class="PARAMETER">value</replaceable> [, ... ] ) ]
     [ TABLESPACE <replaceable class="parameter">tablespace_name</replaceable> ]
@@ -97,7 +97,6 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ <replaceable class="parameter">name</
  <refsect1>
   <title>Parameters</title>
 
-<variablelist>
 <varlistentry>
  <term><literal>UNIQUE</literal></term>
  <listitem>
@@ -126,6 +125,19 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ <replaceable class="parameter">name</
  </listitem>
 </varlistentry>
 
+<variablelist>
+ <varlistentry>
+  <term><literal>IF NOT EXISTS</literal></term>
+  <listitem>
+   <para>
+    Do not throw an error if a relation with the same name already exists.
+    A notice is issued in this case. Note that there is no guarantee that
+    the existing index is anything like the one that would have been created.
+    Index name is required when <literal>IF NOT EXISTS</literal> is specified.
+   </para>
+  </listitem>
+ </varlistentry>
+
 <varlistentry>
  <term><replaceable class="parameter">name</replaceable></term>
  <listitem>
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 01ed880..c886f74 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -674,6 +674,8 @@ UpdateIndexRelation(Oid indexoid,
  *		will be marked invalid and the caller must take additional steps
  *		to fix it up.
  * is_internal: if true, post creation hook for new index
+ * if_not_exists: if true, issue a notice instead of an error if the index with
+ *  the same name already exists
  *
  * Returns the OID of the created index.
  */
@@ -697,7 +699,8 @@ index_create(Relation heapRelation,
 			 bool allow_system_table_mods,
 			 bool skip_build,
 			 bool concurrent,
-			 bool is_internal)
+			 bool is_internal,
+			 bool if_not_exists)
 {
 	Oid			heapRelationId = RelationGetRelid(heapRelation);
 	Relation	pg_class;
@@ -773,10 +776,22 @@ index_create(Relation heapRelation,
 		elog(ERROR, "shared relations must be placed in pg_global tablespace");
 
 	if (get_relname_relid(indexRelationName, 
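
The quoted hunk is cut off here. The check presumably continues along these
lines -- a hedged sketch for readers following along, not the literal patch
text:

	if (get_relname_relid(indexRelationName, namespaceId) != InvalidOid)
	{
		if (if_not_exists)
		{
			ereport(NOTICE,
					(errcode(ERRCODE_DUPLICATE_TABLE),
					 errmsg("relation \"%s\" already exists, skipping",
							indexRelationName)));
			heap_close(pg_class, RowExclusiveLock);
			return InvalidOid;
		}

		ereport(ERROR,
				(errcode(ERRCODE_DUPLICATE_TABLE),
				 errmsg("relation \"%s\" already exists",
						indexRelationName)));
	}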

Re: [HACKERS] Add CREATE support to event triggers

2014-10-30 Thread Michael Paquier
On Thu, Oct 30, 2014 at 2:40 AM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Oct 28, 2014 at 6:00 AM, Andres Freund and...@2ndquadrant.com
 wrote:
  Uhm.  Obviously we didn't have jsonb when I started this and we do have
  them now, so I could perhaps see about updating the patch to do things
  this way; but I'm not totally sold on that idea, as my ObjTree stuff is
  a lot easier to manage and the jsonb API is pretty ugly.
 
  I looked at this as well, and I think trying to do so would not result
  in readable code.

 That doesn't speak very well of jsonb.  :-(


Just did the same and played a bit with the APIs. My impression is that the
jsonb API is currently focused on deparsing and parsing jsonb strings
to/from containers, but there is no real interface for easily manipulating
the containers where the values live. What I think is missing is a friendly
interface to manipulate JsonbContainers directly, and we are not far from it
with roughly this set:
- Initialization of an empty container
- Set of APIs to directly push a value to a container (boolean, array,
null, string, numeric or other jsonb object)
- Initialization of JsonbValue objects
With this basic set of APIs, patch 4 could for example use JsonbToCString to
convert the jsonb container back to the string it sends to the client. Note
as well that findJsonbValueFromContainer already exists to fetch a value
back out of a container.
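
For illustration, the kind of interface being asked for might look roughly
like the sketch below. Every jsonb_container_* name here is hypothetical;
of the functions used, only JsonbToCString() exists today:

	/* all helper names hypothetical -- a sketch of the proposed API */
	JsonbContainer *c = jsonb_container_init();

	jsonb_container_add_string(c, "objtype", "table");
	jsonb_container_add_bool(c, "persistence", true);
	jsonb_container_add_null(c, "of_type");

	/* the result could then be handed straight to the client,
	 * using the existing deparsing entry point */
	char	   *s = JsonbToCString(NULL, c, 64 /* estimated length */);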

In short, my point is: instead of reinventing the wheel, as this patch
series does with ObjTree, it would be more fruitful to build solid in-place
JSONB infrastructure that allows direct manipulation of JSONB objects. This
feature, as well as future extensions, could benefit from that.
Feel free to comment.
Regards,
-- 
Michael


Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Andrew Dunstan


On 10/30/2014 09:37 PM, Andres Freund wrote:

On 2014-10-30 21:24:04 -0400, Tom Lane wrote:

Andres Freund and...@2ndquadrant.com writes:

On 2014-10-30 21:03:43 -0400, Tom Lane wrote:

Meh.  Right now, it's easy to dismiss these tests as unimportant,
figuring that they play little part in whether the completed build
is reliable.  But that may not always be true.  If they do become
a significant part of our test arsenal, silently omitting them will
not be cool for configure to do.

Well, I'm all for erroring out if somebody passed --enable-foo-tests and
the prerequisites aren't there. What I *am* against is requiring an
explicit flag to enable them because then they'll just not be run in
enough environments. And that's what's much more likely to cause
unnoticed bugs.

Once they're at the point where they're actually likely to catch stuff
of interest, I'll be all for enabling them by default.

Great. We already are at that point due to the pg_basebackup
tests.
If we slightly extend it to also start up the newly made base backups we
will have the first minimal automated test of recovery...





There are other issues. I am not going to enable this in the buildfarm 
until the check test can work from a single install. It's insane for the 
bin tests to take an order of magnitude longer than the main regression 
suite.


cheers

andrew




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapshot too large error when initializing logical replication (9.4)

2014-10-30 Thread Steve Singer

On 10/28/2014 01:27 PM, Andres Freund wrote:

Hi,

On 2014-10-25 18:09:36 -0400, Steve Singer wrote:

I sometimes get the error "snapshot too large" from my logical replication
walsender process in response to a CREATE_REPLICATION_SLOT.

Yes. That's possible if 'too much' was going on until a consistent point
was reached.  I think we can just use a much larger size for the array
if necessary.

I've attached a patch for this. Could you try whether that helps? I don't
have a testcase handy that reproduces the problem.


This patch seems to fix things.
I've done numerous runs of the test I was doing before, now with your patch
applied, and don't seem to be hitting this issue anymore.




This is in SnapBuildExportSnapshot in snapbuild.c

newxcnt is 212 at that point

I have max_connections = 200

procArray->maxProcs = 212

Should we be testing
newxcnt > GetMaxSnapshotXidCount()

instead of
newxcnt >= GetMaxSnapshotXidCount()

It actually looks correct to me now - newxcnt is used as an offset into
an array of size GetMaxSnapshotXidCount().
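
For readers following along, the boundary logic in question is roughly this
(a sketch, not the exact source):

	/* newxip has GetMaxSnapshotXidCount() slots and newxcnt is the
	 * next free slot, so newxcnt must stay strictly below the count
	 * before being used as an index -- hence the >= in the guard */
	if (newxcnt >= GetMaxSnapshotXidCount())
		elog(ERROR, "snapshot too large");

	newxip[newxcnt++] = xid;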

Greetings,

Andres Freund







--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Tom Lane
Andrew Dunstan and...@dunslane.net writes:
 There are other issues. I am not going to enable this in the buildfarm 
 until the check test can work from a single install. It's insane for the 
 bin tests to take an order of magnitude longer than the main regression 
 suite.

I think the installs as such aren't the only reason for the sucky
performance.  We need to also reduce the number of initdb cycles incurred
by the TAP tests.  It's useless for example that the pg_controldata test
creates its very own $PGDATA rather than sharing one with other tests.

This line of thought implies that the tests will become less independent
of each other, which will probably result in them being a bit harder to
maintain.  Still, we are paying an awful lot of cycles for not much, as
things stand at the moment.

A couple other random ideas for shaving cycles:

* Use initdb -N (no fsync) where we do need to initdb.

* We probably don't need a full install tree for these types of tests;
it's tempting for instance to omit installing the include/ tree.  That
wouldn't save a large number of megabytes but it is a sizable number of
files, so it might cut the install/rm -rf time noticeably.

* In the same line, suppressing install of the timezone database file
tree would possibly save a useful number of cycles.  We do need to have
that data for functionality, but buildfarm owners could be encouraged 
to use --with-system-tzdata to shave install cycles.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CINE in CREATE TABLE AS ... and CREATE MATERIALIZED VIEW ...

2014-10-30 Thread Fabrízio de Royes Mello
On Mon, Oct 27, 2014 at 4:15 AM, Rushabh Lathia rushabh.lat...@gmail.com
wrote:

 Hi All,

 - Patch got applied cleanly.
 - Regression make check run fine.
 - Patch covered the documentation changes

 Here are few comments:

 1) What's the need of the following change:

 diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
 index bcec173..9fe6855 100644
 --- a/src/backend/storage/lmgr/lwlock.c
 +++ b/src/backend/storage/lmgr/lwlock.c
 @@ -1005,12 +1005,6 @@ LWLockWaitForVar(LWLock *lock, uint64 *valptr, uint64 oldval, uint64 *newval)
  	lock->tail = proc;
  	lock->head = proc;
 
 -	/*
 -	 * Set releaseOK, to make sure we get woken up as soon as the lock is
 -	 * released.
 -	 */
 -	lock->releaseOK = true;
 -
  	/* Can release the mutex now */
  	SpinLockRelease(&lock->mutex);


 It doesn't look related to this patch.


Sorry... my mistake when diffing against master (which is more up to date than my branch).

Fixed.


 2)

 diff --git a/src/bin/psql/help.c b/src/bin/psql/help.c
 index ae5fe88..4d11952 100644
 --- a/src/bin/psql/help.c
 +++ b/src/bin/psql/help.c
 @@ -247,7 +247,7 @@ slashUsage(unsigned short int pager)
  	fprintf(output, _("  \\f [STRING]            show or set field separator for unaligned query output\n"));
  	fprintf(output, _("  \\H                     toggle HTML output mode (currently %s)\n"),
  			ON(pset.popt.topt.format == PRINT_HTML));
 -	fprintf(output, _("  \\pset [NAME [VALUE]]   set table output option\n"
 +	fprintf(output, _("  \\pset [NAME [VALUE]] set table output option\n"
  			"  (NAME := {format|border|expanded|fieldsep|fieldsep_zero|footer|null|\n"
  			"  numericlocale|recordsep|recordsep_zero|tuples_only|title|tableattr|pager|\n"
  			"  unicode_border_linestyle|unicode_column_linestyle|unicode_header_linestyle})\n"));


 Why are the above changes required?


Same previous mistake.

Fixed.


 3) This patch adds IF NOT EXISTS for CREATE TABLE AS and CREATE MATERIALIZED
 VIEW, but test-case coverage for CREATE MATERIALIZED VIEW is missing.

 Apart from this changes looks good to me.


Fixed.

Thanks for your review.

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello
diff --git a/doc/src/sgml/ref/create_materialized_view.sgml b/doc/src/sgml/ref/create_materialized_view.sgml
index 2c73852..8a0fb4d 100644
--- a/doc/src/sgml/ref/create_materialized_view.sgml
+++ b/doc/src/sgml/ref/create_materialized_view.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-CREATE MATERIALIZED VIEW <replaceable>table_name</replaceable>
+CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
     [ WITH ( <replaceable class="PARAMETER">storage_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] ) ]
     [ TABLESPACE <replaceable class="PARAMETER">tablespace_name</replaceable> ]
@@ -53,6 +53,18 @@ CREATE MATERIALIZED VIEW <replaceable>table_name</replaceable>
  <refsect1>
   <title>Parameters</title>
 
+   <varlistentry>
+<term><literal>IF NOT EXISTS</></term>
+<listitem>
+ <para>
+  Do not throw an error if a materialized view with the same name already
+  exists. A notice is issued in this case.  Note that there is no guarantee
+  that the existing materialized view is anything like the one that would
+  have been created.
+ </para>
+</listitem>
+   </varlistentry>
+
   <variablelist>
 <varlistentry>
 <term><replaceable>table_name</replaceable></term>
diff --git a/doc/src/sgml/ref/create_table_as.sgml b/doc/src/sgml/ref/create_table_as.sgml
index 60300ff..8e4ada7 100644
--- a/doc/src/sgml/ref/create_table_as.sgml
+++ b/doc/src/sgml/ref/create_table_as.sgml
@@ -21,7 +21,7 @@ PostgreSQL documentation
 
  <refsynopsisdiv>
 <synopsis>
-CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE <replaceable>table_name</replaceable>
+CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] <replaceable>table_name</replaceable>
     [ (<replaceable>column_name</replaceable> [, ...] ) ]
     [ WITH ( <replaceable class="PARAMETER">storage_parameter</replaceable> [= <replaceable class="PARAMETER">value</replaceable>] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
     [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
@@ -91,6 +91,17 @@ CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE <replaceable
   </varlistentry>
 
   <varlistentry>
+<term><literal>IF NOT EXISTS</></term>
+<listitem>
+ <para>
+  Do not throw an error if a relation with the same name already exists.
+  A notice is issued in this case. Refer to <xref linkend="sql-createtable">
+  for details.
+ </para>
+

Re: [HACKERS] pg_basebackup fails with long tablespace paths

2014-10-30 Thread Peter Eisentraut
On 10/29/14 10:48 AM, Robert Haas wrote:
 On Tue, Oct 28, 2014 at 8:29 PM, Peter Eisentraut pete...@gmx.net wrote:
 On 10/20/14 2:59 PM, Tom Lane wrote:
 My Salesforce colleague Thomas Fanghaenel observed that the TAP tests
 for pg_basebackup fail when run in a sufficiently deeply-nested directory
 tree.

 As for the test, we can do something like the attached to mark the test
 as TODO.
 
 What does this actually do?  It doesn't appear that it's just
 disabling the test.

It still runs the tests, but doesn't count the results in whether the
suite passes.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On 10/7/14 1:57 PM, Tom Lane wrote:
 Peter had a patch to eliminate the overhead of multiple subinstalls;
 not sure where that stands, but presumably it would address your issue.

 It will need some cleverness to sort out the parallel make issues that
 were brought up in the review thread.

I took a quick look.  I concur with Fabien that the dependency on
MAKELEVEL seems pretty horrid: in particular, will that not break the
ability to initiate make check from somewhere below the top directory?

I wonder whether it could be solved by having code in the toplevel
Makefile that actually makes the test install tree, and not as a .PHONY
target but along the lines of

tmp-install-stamp:
	rm -rf tmp_install   # in case we failed during a previous attempt
	$(MKDIR_P) tmp_install/log
	$(MAKE) ... etc etc ...
	touch tmp-install-stamp

and then also at top level, put tmp-install-stamp as a prerequisite
for check-world, and then in every subdirectory that has a check
target, add something like

$(abs_top_builddir)/tmp-install-stamp:
	$(MAKE) -C $(abs_top_builddir) tmp-install-stamp

check: $(abs_top_builddir)/tmp-install-stamp


The way this solves the parallel make problem is that no matter
where you invoke make check, the first thing it would have to
do is create the tmp_install directory if it's not done already,
before it can launch any parallel operations.  Or at least I hope
it'd work like that; I've not actually tried it.

Possibly some of these rules could be kept in Makefile.global
so as to avoid having to touch so many individual Makefiles.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] infinite loop in _bt_getstackbuf

2014-10-30 Thread Noah Misch
On Thu, Oct 30, 2014 at 03:52:01PM -0400, Tom Lane wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  Robert Haas wrote:
  A colleague at EnterpriseDB today ran into a situation on PostgreSQL
  9.3.5 where the server went into an infinite loop while attempting a
  VACUUM FREEZE; it couldn't escape _bt_getstackbuf(), and it couldn't
  be killed with ^C.   I think we should add a check for interrupts into
  that loop somewhere;
 
  Our design principle in this area is that all loops should have
  CHECK_FOR_INTERRUPTS() calls somewhere, so that even if data is horribly
  corrupted you can get out of it.
 
 FWIW, I concur with Alvaro that adding a CHECK_FOR_INTERRUPTS() needn't
 require much discussion.

+1

 Given the lack of prior complaints about this
 loop, I'm not sure I see the need to work harder than that; corruption
 of this sort must be quite rare.

Looks like _bt_getstackbuf() is always called with some buffer lock held, so
CHECK_FOR_INTERRUPTS() alone would not help:

http://www.postgresql.org/message-id/flat/16519.1401395...@sss.pgh.pa.us
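
To spell out why it would not help: LWLockAcquire() does HOLD_INTERRUPTS(),
and ProcessInterrupts() returns immediately while the holdoff count is
nonzero, so a check inside the loop would compile but never fire as long as
the caller's buffer lock is held. Illustration only:

	for (;;)
	{
		/* never reaches ProcessInterrupts() here: the buffer lock
		 * taken by our caller bumped InterruptHoldoffCount */
		CHECK_FOR_INTERRUPTS();

		/* ... move right, looking for the parent item ... */
	}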


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] infinite loop in _bt_getstackbuf

2014-10-30 Thread Tom Lane
Noah Misch n...@leadboat.com writes:
 Looks like _bt_getstackbuf() is always called with some buffer lock held, so
 CHECK_FOR_INTERRUPTS() alone would not help:
 http://www.postgresql.org/message-id/flat/16519.1401395...@sss.pgh.pa.us

Oooh, good point.  I never followed up on that idea, but we would have to
in order to fix the case Robert is on about.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Tom Lane
I wrote:
 I took a quick look.  I concur with Fabien that the dependency on
 MAKELEVEL seems pretty horrid: in particular, will that not break the
 ability to initiate make check from somewhere below the top directory?

Another use-case that seems to be broken both by Peter's patch and my
proposed variant is

 configure ...
 make ...
 cd src/bin/something
 hack hack hack
 make
 make check
 ooops ... hack some more
 make
 make check

Neither proposed approach would result in reinstalling the updated binary
you just built into the common temp install tree, so the second make
check step would just retest the old binary.

The simplest solution that comes to mind is to define a "temp-install"
target that is the same as "make install" except it installs the stuff
built in the current directory into the temp install tree instead of the
configured installation target tree.  You'd have to remember to do "make
temp-install" before a "make check"; but it would work, and it's not so
different from "make install" then "make installcheck", which is what you
usually do for this use-case if you've got the tree configured to install
into a throwaway installation location.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-10-30 Thread Amit Kapila
On Thu, Oct 30, 2014 at 2:28 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 30 October 2014 04:24, Amit Kapila amit.kapil...@gmail.com wrote:

  Locking the toast table of any main tables we access seems easily
  done. Though perhaps we should make weak locking of the toast table
  presumed. Do we have cases where the toast table can be accessed when
  the main table is not also strong locked first?
 
  I think it is possible to have a strong lock on toast table before
  main table.
  Cluster pg_toast.toast_table_name using toast_index;
  Vacuum Full pg_toast.toast_table_name;
  Reindex Index pg_toast.toast_index_name;
  ..
 
  Now if we take the lock on the toast table in the main task, it will
  block some of these operations before they actually need to be blocked.

 Very strange.

 Those commands are not common operations? I doubt many people even
 know they exist.

 All of those commands would block SELECTs on the main table, so there
 is no significant benefit in that behaviour.

 In fact it would be more sensible to lock the toast table earlier.


It might make some sense to lock the toast table earlier for this
particular case, but I don't think it will be feasible in general to lock
all the tables (including catalog tables) that might get used in the
overall operation before starting the parallel operation.  I believe going
that route will make parallelism difficult, as we would always have to do
this for any operation we want to make parallel.  It always carries the
risk that we miss something, and it might not be feasible in all kinds of
cases.

If I understand correctly, you are suggesting that, to get the first
version of parallel operation out earlier, we should avoid all that
infrastructure (group locking, ...), restrict the cases in which CREATE
INDEX can run in parallel, do some hackery in the CREATE INDEX path to
avoid locking problems, parallelize only the external sort (which might
avoid the need for a shared memory allocator), and call it a day, doing
the things other parallel operations need as and when they are required.
That might be a good approach in itself if somebody wants to target an
early release; however, in the medium to long term, when we want to
provide non-trivial parallel operations, we will have to solve those
problems eventually, and delaying will only make things worse.

In short, I agree that there is merit in your idea w.r.t. getting the
first version of a parallel operation out early; however, looking at the
medium to long term, I feel that solving the group locking problem (or
building the other general infrastructure Robert mentioned upthread) can
yield better results, unless you feel it is not required for parallel
operations at all.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] TAP test breakage on MacOS X

2014-10-30 Thread Noah Misch
On Thu, Oct 30, 2014 at 10:49:33PM -0400, Tom Lane wrote:
 Andrew Dunstan and...@dunslane.net writes:
  There are other issues. I am not going to enable this in the buildfarm 
  until the check test can work from a single install. It's insane for the 
  bin tests to take an order of magnitude longer than the main regression 
  suite.
 
 I think the installs as such aren't the only reason for the sucky
 performance.  We need to also reduce the number of initdb cycles incurred
 by the TAP tests.  It's useless for example that the pg_controldata test
 creates its very own $PGDATA rather than sharing one with other tests.
 
 This line of thought implies that the tests will become less independent
 of each other, which will probably result in them being a bit harder to
 maintain.  Still, we are paying an awful lot of cycles for not much, as
 things stand at the moment.

One could memoize initdb within the suite.  Call it once per distinct command
line, caching the resulting data directory.  Copy the cached data directory
for each test desiring one.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format and API changes (9.5)

2014-10-30 Thread Michael Paquier
On Fri, Oct 31, 2014 at 2:52 AM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 10/30/2014 06:02 PM, Andres Freund wrote:

 On 2014-10-29 10:24:20 +0200, Heikki Linnakangas wrote:

 On 10/06/2014 02:29 PM, Andres Freund wrote:

 I've not yet really looked,
 but on a quick readthrough XLogInsertRecData() staying in xlog.c doesn't
 make me happy...


 Ok.. Can you elaborate?


 To me the split between xloginsert.c doing some of the record assembly,
 and xlog.c doing the lower level part of the assembly is just wrong.


 Really? To me, that feels like a natural split. xloginsert.c is
 responsible for constructing the final WAL record, with the backup blocks
 and all. It doesn't access any shared memory (related to WAL; it does look
 at Buffers, to determine what to back up). The function in xlog.c inserts
 the assembled record to the WAL, as is. It handles the locking and WAL
 buffer management related to that.

FWIW, I tend to the same opinion here.

What would you suggest? I don't think putting everything in one XLogInsert
 function, like we have today, is better. Note that the second patch makes
 xloginsert.c a lot more complicated.

I recall some time ago seeing complaints that xlog.c is too complicated and
should be refactored. Any effort in this area is a good thing IMO, and this
split made sense when I went through the code.


  I'm not a big fan of the naming for the new split. We have
 * XLogInsert() which is the external interface
 * XLogRecordAssemble() which builds the rdata chain that will be inserted
 * XLogInsertRecData() which actually inserts the data into the xl buffers.

 To me that's pretty inconsistent.


 Got a better suggestion? I wanted to keep the name XLogInsert() unchanged,
 to avoid code churn, and because it's still a good name for that.
 XLogRecordAssemble is pretty descriptive for what it does, although
 "XLogRecord" implies that it constructs an XLogRecord struct. It does fill
 in that too, but the main purpose is to build the XLogRecData chain.
 Perhaps XLogAssembleRecord() would be better.

 I'm not very happy with XLogInsertRecData myself. XLogInsertRecord?

+1 for XLogInsertRecord.
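
As an illustration of the division being discussed (signatures simplified
and partly hypothetical; only the three function names come from this
thread):

	XLogRecPtr
	XLogInsert(RmgrId rmid, uint8 info, XLogRecData *rdata)
	{
		XLogRecData *rec;

		/* xloginsert.c: look at the buffers, decide on full-page
		 * images, assemble the final record and rdata chain; no WAL
		 * shared memory is touched here */
		rec = XLogRecordAssemble(rmid, info, rdata /* , ... */);

		/* xlog.c: take the insertion locks, reserve space in the WAL
		 * buffers, and copy the assembled record in */
		return XLogInsertRecData(rec /* , ... */);
	}
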
-- 
Michael


Re: [HACKERS] tracking commit timestamps

2014-10-30 Thread Michael Paquier
On Tue, Oct 28, 2014 at 9:25 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On 13 October 2014 10:05, Petr Jelinek p...@2ndquadrant.com wrote:

  I worked a bit on this patch to bring it closer to a committable state.

  Here is an updated version that works with current HEAD for the October
  commitfest.

 I've reviewed this and it looks good to me. Clean, follows existing
 code neatly, documented and tested.

 Please could you document a few things

 * ExtendCommitTS() works only because commit_ts_enabled can only be
 set at server start.
 We need that documented so somebody doesn't make it more easily
 enabled and break something.
 (Could we make it enabled at next checkpoint or similar?)

 * The SLRU tracks timestamps of both xids and subxids. We need to
 document that it does this because Subtrans SLRU is not persistent. If
 we made Subtrans persistent we might need to store only the top level
 xid's commitTS, but that's very useful for typical use cases and
 wouldn't save much time at commit.
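
To spell out the subxid point (a sketch; the commit-ts lookup name is
hypothetical, SubTransGetTopmostTransaction() is real):

	TimestampTz ts;

	/* with subxid timestamps stored, a lookup is one SLRU read,
	 * valid for top-level xids and subxids alike: */
	ts = CommitTsGetTimestamp(xid);

	/* without them, we would first have to map a subxid to its
	 * top-level xid -- but pg_subtrans is not kept across restarts: */
	ts = CommitTsGetTimestamp(SubTransGetTopmostTransaction(xid));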


Hm. What is the performance impact of this feature using the latest version
of this patch? I imagine that the penalty of the additional operations this
feature introduces is not zero, particularly for loads with lots of short
transactions.
-- 
Michael