Re: [HACKERS] strncpy is not a safe version of strcpy

2014-08-15 Thread Noah Misch
On Thu, Aug 14, 2014 at 02:50:02AM -0400, Tom Lane wrote:
 Noah Misch n...@leadboat.com writes:
  On Wed, Aug 13, 2014 at 10:21:50AM -0400, Tom Lane wrote:
  I believe that we deal with this by the expedient of checking the lengths
  of tablespace paths in advance, when the tablespace is created.
 
  The files under scrutiny here are not located in a tablespace.  Even if they
  were, isn't the length of $PGDATA/pg_tblspc the important factor?
 
 The length of $PGDATA is of no relevance whatsoever; we chdir into that
 directory at startup, and subsequently all paths are implicitly relative
 to there.  If there is any backend code that's prepending $PGDATA to
 something else, it's wrong to start with.

Ah; quite right.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers

2014-08-15 Thread Michael Paquier
On Thu, Aug 14, 2014 at 8:34 PM, Fujii Masao masao.fu...@gmail.com wrote:
 +At any one time there will be at a number of active
 synchronous standbys
 +defined by varnamesynchronous_standby_num/; transactions waiting

 It's better to use xref linkend=guc-synchronous-standby-num, instead.
Fixed.

 +for commit will be allowed to proceed after those standby servers
 +confirms receipt of their data. The synchronous standbys will be

 Typo: confirms - confirm

Fixed.

 +   para
 +Specifies the number of standbys that support
 +firsttermsynchronous replication/, as described in
 +xref linkend=synchronous-replication, and listed as the first
 +elements of xref linkend=guc-synchronous-standby-names.
 +   /para
 +   para
 +Default value is 1.
 +   /para

 synchronous_standby_num is defined with PGC_SIGHUP. So the following
 should be added into the document.

 This parameter can only be set in the postgresql.conf file or on
 the server command line.
Fixed.

 The name of the parameter synchronous_standby_num sounds to me that
 the transaction must wait for its WAL to be replicated to s_s_num standbys.
 But that's not true in your patch. If s_s_names is empty, replication works
 asynchronously whether the value of s_s_num is. I'm afraid that it's 
 confusing.
 The description of s_s_num is not sufficient. I'm afraid that users can easily
 misunderstand that they can use quorum commit feature by using s_s_names
 and s_s_num. That is, the transaction waits for its WAL to be replicated to
 any s_s_num standbys listed in s_s_names.

I reworked the docs to mention all that. Yes things are a bit
different than any quorum commit facility (how to parametrize that
simply without a parameter mapping one to one the items of
s_s_names?), as this facility relies on the order of the items of
s_s_names and the fact that stansbys are connected at a given time.

 When s_s_num is set to larger value than max_wal_senders, we should warn that?
Actually I have done a bit more than that by forbidding setting
s_s_num to a value higher than max_wal_senders. Thoughts?

Now that we discuss the interactions with other parameters. Another
thing that I am wondering about now is: what should we do if we
specify s_s_num to a number higher than the elements in s_s_names?
Currently, the patch gives the priority to s_s_num, in short if we set
s_s_num to 100, server will wait for 100 servers to confirm commit
even if there are less than 100 elements in s_s_names. I chose this
way because it looks saner particularly if s_s_names = '*'. Thoughts
once again?

 +for (i = 0; i  num_sync; i++)
 +{
 +volatile WalSnd *walsndloc = WalSndCtl-walsnds[sync_standbys[i]];
 +
 +if (min_write_pos  walsndloc-write)
 +min_write_pos = walsndloc-write;
 +if (min_flush_pos  walsndloc-flush)
 +min_flush_pos = walsndloc-flush;
 +}

 I don't think that it's safe to see those shared values without spinlock.
Looking at walsender.c you are right. I have updated the code to use
the mutex lock of the walsender whose values are being read from.

Regards,
-- 
Michael

On Thu, Aug 14, 2014 at 4:34 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Wed, Aug 13, 2014 at 4:10 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 On Wed, Aug 13, 2014 at 2:10 PM, Fujii Masao masao.fu...@gmail.com wrote:
 I sent the SIGSTOP signal to the walreceiver process in one of sync 
 standbys,
 and then ran write transactions again. In this case, they must not be 
 completed
 because their WAL cannot be replicated to the standby that its walreceiver
 was stopped. But they were successfully completed.

 At the end of SyncRepReleaseWaiters, SYNC_REP_WAIT_WRITE and
 SYNC_REP_WAIT_FLUSH in walsndctl were able to update with only one wal
 sender in sync, making backends wake up even if other standbys did not
 catch up. But we need to scan all the synchronous wal senders and find
 the minimum write and flush positions and update walsndctl with those
 values. Well that's a code path I forgot to cover.

 Attached is an updated patch fixing the problem you reported.

 +At any one time there will be at a number of active
 synchronous standbys
 +defined by varnamesynchronous_standby_num/; transactions waiting

 It's better to use xref linkend=guc-synchronous-standby-num, instead.

 +for commit will be allowed to proceed after those standby servers
 +confirms receipt of their data. The synchronous standbys will be

 Typo: confirms - confirm

 +   para
 +Specifies the number of standbys that support
 +firsttermsynchronous replication/, as described in
 +xref linkend=synchronous-replication, and listed as the first
 +elements of xref linkend=guc-synchronous-standby-names.
 +   /para
 +   para
 +Default value is 1.
 +   /para

 synchronous_standby_num is defined with 

Re: [HACKERS] Minmax indexes

2014-08-15 Thread Heikki Linnakangas

On 08/15/2014 02:02 AM, Alvaro Herrera wrote:

Alvaro Herrera wrote:

Heikki Linnakangas wrote:



I'm sure this still needs some cleanup, but here's the patch, based
on your v14. Now that I know what this approach looks like, I still
like it much better. The insert and update code is somewhat more
complicated, because you have to be careful to lock the old page,
new page, and revmap page in the right order. But it's not too bad,
and it gets rid of all the complexity in vacuum.


It seems there is some issue here, because pageinspect tells me the
index is not growing properly for some reason.  minmax_revmap_data gives
me this array of TIDs after a bunch of insert/vacuum/delete/ etc:


I fixed this issue, and did a lot more rework and bugfixing.  Here's
v15, based on v14-heikki2.


Thanks!


I think remaining issues are mostly minimal (pageinspect should output
block number alongside each tuple, now that we have it, for example.)


There's this one issue I left in my patch version that I think we should 
do something about:



+   /*
+* No luck. Assume that the revmap was updated concurrently.
+*
+* XXX: it would be nice to add some kind of a sanity check 
here to
+* avoid looping infinitely, if the revmap points to wrong 
tuple for
+* some reason.
+*/


This happens when we follow the revmap to a tuple, but find that the 
tuple points to a different block than what the revmap claimed. 
Currently, we just assume that it's because the tuple was updated 
concurrently, but while hacking, I frequently had a broken index where 
the revmap pointed to bogus tuples or the tuples had a missing/wrong 
block number on them, and ran into infinite loop here. It's clearly a 
case of a corrupt index and shouldn't happen, but I would imagine that 
it's a fairly typical way this would fail in production too because of 
hardware issues or bugs. So I think we need to work a bit harder to stop 
the looping and throw an error instead.


Perhaps something as simple as keeping a loop counter and giving up 
after 1000 attempts would be good enough. The window between releasing 
the lock on the revmap, and acquiring the lock on the page containing 
the MMTuple is very narrow, so the chances of losing that race to a 
concurrent update more than 1-2 times in a row is vanishingly small.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] option -T in pg_basebackup doesn't work on windows

2014-08-15 Thread MauMau
Thank you.  The code looks correct.  I confirmed that the pg_basebackup 
could relocate the tablespace directory on Windows.


I marked this patch as ready for committer.

Regards
MauMau




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing dependency to wsock32.lib when compiling code on WIndows

2014-08-15 Thread Michael Paquier
On Mon, Aug 11, 2014 at 10:48 PM, Noah Misch n...@leadboat.com wrote:
 Consistency with the MSVC build is desirable, either HAVE_GETADDRINFO for both
 or !HAVE_GETADDRINFO for both.

Hm. Looking here getattrinfo has been added in ws2_32 and was not
present in wsock32:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms738520%28v=vs.85%29.aspx

And this change in configure.in is the root cause of the regression:
-AC_SEARCH_LIBS(socket, [socket wsock32])
+AC_SEARCH_LIBS(socket, [socket ws2_32])

Btw, how do you determine if MSVC is using HAVE_GETADDRINFO? Is it
decided by the inclusion of getaddrinfo.c in @pgportfiles of
Mkvdbuild.pm?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-15 Thread Heikki Linnakangas

On 08/15/2014 10:26 AM, Heikki Linnakangas wrote:

On 08/15/2014 02:02 AM, Alvaro Herrera wrote:

Alvaro Herrera wrote:

Heikki Linnakangas wrote:



I'm sure this still needs some cleanup, but here's the patch, based
on your v14. Now that I know what this approach looks like, I still
like it much better. The insert and update code is somewhat more
complicated, because you have to be careful to lock the old page,
new page, and revmap page in the right order. But it's not too bad,
and it gets rid of all the complexity in vacuum.


It seems there is some issue here, because pageinspect tells me the
index is not growing properly for some reason.  minmax_revmap_data gives
me this array of TIDs after a bunch of insert/vacuum/delete/ etc:


I fixed this issue, and did a lot more rework and bugfixing.  Here's
v15, based on v14-heikki2.


Thanks!


I think remaining issues are mostly minimal (pageinspect should output
block number alongside each tuple, now that we have it, for example.)


There's this one issue I left in my patch version that I think we should
do something about:


+   /*
+* No luck. Assume that the revmap was updated concurrently.
+*
+* XXX: it would be nice to add some kind of a sanity check 
here to
+* avoid looping infinitely, if the revmap points to wrong 
tuple for
+* some reason.
+*/


This happens when we follow the revmap to a tuple, but find that the
tuple points to a different block than what the revmap claimed.
Currently, we just assume that it's because the tuple was updated
concurrently, but while hacking, I frequently had a broken index where
the revmap pointed to bogus tuples or the tuples had a missing/wrong
block number on them, and ran into infinite loop here. It's clearly a
case of a corrupt index and shouldn't happen, but I would imagine that
it's a fairly typical way this would fail in production too because of
hardware issues or bugs. So I think we need to work a bit harder to stop
the looping and throw an error instead.

Perhaps something as simple as keeping a loop counter and giving up
after 1000 attempts would be good enough. The window between releasing
the lock on the revmap, and acquiring the lock on the page containing
the MMTuple is very narrow, so the chances of losing that race to a
concurrent update more than 1-2 times in a row is vanishingly small.


Reading the patch more closely, I see that you added a check that when 
we loop, we throw an error if the new item pointer in the revmap is the 
same as before. In theory, it's possible that two concurrent updates 
happen: one that moves the tuple we're looking for elsewhere, and 
another that moves it back again. The probability of that is also 
vanishingly small, so maybe that's OK. Or we could check the LSN; if the 
revmap has been updated, its LSN must've changed.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog and replication slots

2014-08-15 Thread furuyao
 Actually I came up with the same need as Magnus, but a bit later, so...
 Attached is a patch to support physical slot creation and drop in
 pg_receivexlog with the addition of new options --create and --drop. It
 would be nice to have that in 9.4, but I will not the one deciding that
 at the end :) Code has been refactored with what is already available
 in pg_recvlogical for the slot creation and drop.

I think that create/drop options of a slot is unnecessary. 
It is not different from performing pg_create_physical_replication_slot() by 
psql. 

At consistency with pg_recvlogical, do you think about --start?

Initial review.

The patch was not able to be applied to the source file at this morning time. 

commit-id:5ff5bfb5f0d83a538766903275b230499fa9ebe1

[postgres postgresql-5ff5bfb]$ patch -p1  
20140812_physical_slot_receivexlog.patch
patching file doc/src/sgml/ref/pg_receivexlog.sgml
patching file src/bin/pg_basebackup/pg_receivexlog.c
Hunk #2 FAILED at 80.
1 out of 7 hunks FAILED -- saving rejects to file 
src/bin/pg_basebackup/pg_receivexlog.c.rej
patching file src/bin/pg_basebackup/pg_recvlogical.c
patching file src/bin/pg_basebackup/streamutil.c
patching file src/bin/pg_basebackup/streamutil.h

The patch was applied to the source file before August 12. 
warning comes out then make.

commit-id:6aa61580e08d58909b2a8845a4087b7699335ee0

[postgres postgresql-6aa6158]$ make  /dev/null
streamutil.c: In function ‘CreateReplicationSlot’:
streamutil.c:244: warning: suggest parentheses around ‘’ within ‘||’

Regards,

--
Furuya Osamu


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_shmem_allocations view

2014-08-15 Thread Marti Raudsepp
Hi

On Thu, May 8, 2014 at 4:28 PM, Andres Freund and...@2ndquadrant.com wrote:
 Ok. A new version of the patches implementing that are
 attached. Including a couple of small fixups and docs. The latter aren't
 extensive, but that doesn't seem to be warranted anyway.

Is it really actually useful to expose the segment off(set) to users?
Seems to me like unnecessary internal details leaking out.

Regards,
Marti


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_receivexlog and replication slots

2014-08-15 Thread Michael Paquier
Thanks for your review.

On Fri, Aug 15, 2014 at 12:56 AM,  furu...@pm.nttdata.co.jp wrote:
 At consistency with pg_recvlogical, do you think about --start?
I did not add that for the sake of backward-compatibility as in
pg_recvlogical an action is mandatory. It is not the case now of
pg_receivexlog.

 [postgres postgresql-6aa6158]$ make  /dev/null
 streamutil.c: In function 'CreateReplicationSlot':
 streamutil.c:244: warning: suggest parentheses around '' within '||'
I see. Here is a rebased patch.
Regards,
-- 
Michael
From 18e754569c1fd97f4346be935341b27e228775a4 Mon Sep 17 00:00:00 2001
From: Michael Paquier mich...@otacoo.com
Date: Fri, 15 Aug 2014 17:15:31 +0900
Subject: [PATCH] Add support for physical slot creation/deletion in
 pg_receivexlog

Physical slot creation can be done with --create and drop with --drop.
In both cases --slot is needed. Code for replication slot creation and
drop is refactored with what was available in pg_recvlogical.
---
 doc/src/sgml/ref/pg_receivexlog.sgml   | 29 ++
 src/bin/pg_basebackup/pg_receivexlog.c | 73 +
 src/bin/pg_basebackup/pg_recvlogical.c | 66 +++
 src/bin/pg_basebackup/streamutil.c | 97 ++
 src/bin/pg_basebackup/streamutil.h |  7 +++
 5 files changed, 203 insertions(+), 69 deletions(-)

diff --git a/doc/src/sgml/ref/pg_receivexlog.sgml b/doc/src/sgml/ref/pg_receivexlog.sgml
index 5916b8f..7e46005 100644
--- a/doc/src/sgml/ref/pg_receivexlog.sgml
+++ b/doc/src/sgml/ref/pg_receivexlog.sgml
@@ -72,6 +72,35 @@ PostgreSQL documentation
   titleOptions/title
 
para
+applicationpg_receivexlog/application can run in one of two following
+modes, which control physical replication slot:
+
+variablelist
+
+ varlistentry
+  termoption--create/option/term
+  listitem
+   para
+Create a new physical replication slot with the name specified in
+option--slot/option, then exit.
+   /para
+  /listitem
+ /varlistentry
+
+ varlistentry
+  termoption--drop/option/term
+  listitem
+   para
+Drop the replication slot with the name specified
+in option--slot/option, then exit.
+   /para
+  /listitem
+ /varlistentry
+/variablelist
+
+   /para
+
+   para
 The following command-line options control the location and format of the
 output.
 
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index a8b9ad3..98d034a 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -38,6 +38,8 @@ static int	noloop = 0;
 static int	standby_message_timeout = 10 * 1000;		/* 10 sec = default */
 static int	fsync_interval = 0; /* 0 = default */
 static volatile bool time_to_abort = false;
+static bool do_create_slot = false;
+static bool do_drop_slot = false;
 
 
 static void usage(void);
@@ -78,6 +80,9 @@ usage(void)
 	printf(_(  -w, --no-password  never prompt for password\n));
 	printf(_(  -W, --password force password prompt (should happen automatically)\n));
 	printf(_(  -S, --slot=SLOTNAMEreplication slot to use\n));
+	printf(_(\nOptional actions:\n));
+	printf(_(  --create   create a new replication slot (for the slot's name see --slot)\n));
+	printf(_(  --drop drop the replication slot (for the slot's name see --slot)\n));
 	printf(_(\nReport bugs to pgsql-b...@postgresql.org.\n));
 }
 
@@ -261,14 +266,6 @@ StreamLog(void)
 	uint32		hi,
 lo;
 
-	/*
-	 * Connect in replication mode to the server
-	 */
-	conn = GetConnection();
-	if (!conn)
-		/* Error message already written in GetConnection() */
-		return;
-
 	if (!CheckServerVersionForStreaming(conn))
 	{
 		/*
@@ -370,6 +367,9 @@ main(int argc, char **argv)
 		{status-interval, required_argument, NULL, 's'},
 		{slot, required_argument, NULL, 'S'},
 		{verbose, no_argument, NULL, 'v'},
+/* action */
+		{create, no_argument, NULL, 1},
+		{drop, no_argument, NULL, 2},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -453,6 +453,13 @@ main(int argc, char **argv)
 			case 'v':
 verbose++;
 break;
+/* action */
+			case 1:
+do_create_slot = true;
+break;
+			case 2:
+do_drop_slot = true;
+break;
 			default:
 
 /*
@@ -477,10 +484,26 @@ main(int argc, char **argv)
 		exit(1);
 	}
 
+	if (replication_slot == NULL  (do_drop_slot || do_create_slot))
+	{
+		fprintf(stderr, _(%s: replication slot needed with action --create or --drop\n), progname);
+		fprintf(stderr, _(Try \%s --help\ for more information.\n),
+progname);
+		exit(1);
+	}
+
+	if (do_drop_slot  do_create_slot)
+	{
+		fprintf(stderr, _(%s: cannot use --create together with --drop\n), progname);
+		fprintf(stderr, _(Try \%s --help\ for more information.\n),
+progname);
+		exit(1);
+	}
+
 	/*
 	 * Required arguments
 	 */
-	if (basedir == NULL)
+	if (basedir == NULL  !do_create_slot  !do_drop_slot)
 	{
 		fprintf(stderr, _(%s: no target 

Re: [HACKERS] pg_shmem_allocations view

2014-08-15 Thread Andres Freund
On 2014-08-15 11:12:11 +0300, Marti Raudsepp wrote:
 Hi
 
 On Thu, May 8, 2014 at 4:28 PM, Andres Freund and...@2ndquadrant.com wrote:
  Ok. A new version of the patches implementing that are
  attached. Including a couple of small fixups and docs. The latter aren't
  extensive, but that doesn't seem to be warranted anyway.
 
 Is it really actually useful to expose the segment off(set) to users?
 Seems to me like unnecessary internal details leaking out.

Yes. This is clearly developer oriented and I'd more than once wished I
could see where some stray pointer is pointing to... That's not really
possible without something like this.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_shmem_allocations view

2014-08-15 Thread Andres Freund
On 2014-08-14 22:16:31 -0700, Michael Paquier wrote:
 And here are some comments about patch 2:
 - Patch applies with some hunks.
 - Some typos are present
 s#memory segments..#memory segments. (double dots)
 s#NULL#literalNULL/ (in the docs as this refers to a value)

Will fix.

 - Your thoughts about providing separate patches for each view? What
 this patch does is straight-forward, but pg_shmem_allocations does not
 actually depend on the first patch adding size and name to the dsm
 fields. So IMO it makes sense to separate each feature properly.

I don't know, seems a bit like busywork to me. Postgres doesn't really
very granular commits...

 - off should be renamed to offset for pg_get_shmem_allocations.

ok.

 - Is it really worth showing unused shared memory? I'd rather rip out
 the last portion of pg_get_shmem_allocations.

It's actually really helpful. There's a couple situations where you
possibly can run out of that spare memory and into troubles. Which
currently aren't diagnosable. Similarly we currently can't diagnose
whether we're superflously allocate too much 'reserve' shared memory.

 - For refcnt in pg_get_dynamic_shmem_allocations, could you add a
 comment mentioning that refcnt = 1 means that the item is moribund and
 0 is unused, and that reference count for active dsm segments only
 begins from 2? I would imagine that this is enough, instead of using
 some define's defining the ID from which a dsm item is considered as
 active.

Ok.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-15 Thread Andres Freund
On 2014-08-14 12:21:38 -0400, Robert Haas wrote:
 On Sat, Aug 9, 2014 at 12:54 PM, Andres Freund and...@2ndquadrant.com wrote:
  I don't think we really need to embed it at that level. And it doesn't
  have to be always on - only clients that ask for it need to get the
  answer. Something like COMMIT WITH (report_commit_lsn ON); or similar
  might do the trick?
 
 And what does that actually do?  Send back a result-set, or a new
 protocol message?

What I was thinking of was to return COMMIT X/X instead of
COMMIT. Since that's only sent when COMMIT WITH (report_commit_lsn ON)
was set it won't break clients/libraries that don't need it.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgcrypto: PGP armor headers

2014-08-15 Thread Marko Tiikkaja

Hi,

On 8/8/14 3:18 PM, I wrote:

Currently there's no way to generate or extract armor headers from the
PGP armored format in pgcrypto.  I've written a patch to add the
support.


Latest version of the patch here, having fixed some small coding issues.


.marko
*** a/contrib/pgcrypto/Makefile
--- b/contrib/pgcrypto/Makefile
***
*** 26,32  MODULE_big   = pgcrypto
  OBJS  = $(SRCS:.c=.o) $(WIN32RES)
  
  EXTENSION = pgcrypto
! DATA = pgcrypto--1.1.sql pgcrypto--1.0--1.1.sql pgcrypto--unpackaged--1.0.sql
  PGFILEDESC = pgcrypto - cryptographic functions
  
  REGRESS = init md5 sha1 hmac-md5 hmac-sha1 blowfish rijndael \
--- 26,32 
  OBJS  = $(SRCS:.c=.o) $(WIN32RES)
  
  EXTENSION = pgcrypto
! DATA = pgcrypto--1.2.sql pgcrypto--1.1--1.2.sql pgcrypto--1.0--1.1.sql 
pgcrypto--unpackaged--1.0.sql
  PGFILEDESC = pgcrypto - cryptographic functions
  
  REGRESS = init md5 sha1 hmac-md5 hmac-sha1 blowfish rijndael \
*** a/contrib/pgcrypto/expected/pgp-armor.out
--- b/contrib/pgcrypto/expected/pgp-armor.out
***
*** 102,104  em9va2E=
--- 102,362 
  -END PGP MESSAGE-
  ');
  ERROR:  Corrupt ascii-armor
+ -- corrupt
+ select pgp_armor_header('
+ -BEGIN PGP MESSAGE-
+ foo:
+ 
+ em9va2E=
+ =
+ -END PGP MESSAGE-
+ ', 'foo');
+ ERROR:  Corrupt ascii-armor
+ -- empty
+ select pgp_armor_header('
+ -BEGIN PGP MESSAGE-
+ foo: 
+ 
+ em9va2E=
+ =
+ -END PGP MESSAGE-
+ ', 'foo');
+  pgp_armor_header 
+ --
+  
+ (1 row)
+ 
+ -- simple
+ select pgp_armor_header('
+ -BEGIN PGP MESSAGE-
+ foo: bar
+ 
+ em9va2E=
+ =
+ -END PGP MESSAGE-
+ ', 'foo');
+  pgp_armor_header 
+ --
+  bar
+ (1 row)
+ 
+ -- uninteresting keys, part 1
+ select pgp_armor_header('
+ -BEGIN PGP MESSAGE-
+ foo: found
+ bar: ignored
+ 
+ em9va2E=
+ =
+ -END PGP MESSAGE-
+ ', 'foo');
+  pgp_armor_header 
+ --
+  found
+ (1 row)
+ 
+ -- uninteresting keys, part 2
+ select pgp_armor_header('
+ -BEGIN PGP MESSAGE-
+ bar: ignored
+ foo: found
+ 
+ em9va2E=
+ =
+ -END PGP MESSAGE-
+ ', 'foo');
+  pgp_armor_header 
+ --
+  found
+ (1 row)
+ 
+ -- uninteresting keys, part 3
+ select pgp_armor_header('
+ -BEGIN PGP MESSAGE-
+ bar: ignored
+ foo: found
+ bar: ignored
+ 
+ em9va2E=
+ =
+ -END PGP MESSAGE-
+ ', 'foo');
+  pgp_armor_header 
+ --
+  found
+ (1 row)
+ 
+ -- insane keys, part 1
+ select pgp_armor_header('
+ -BEGIN PGP MESSAGE-
+ insane:key : 
+ 
+ em9va2E=
+ =
+ -END PGP MESSAGE-
+ ', 'insane:key ');
+  pgp_armor_header 
+ --
+  
+ (1 row)
+ 
+ -- insane keys, part 2
+ select pgp_armor_header('
+ -BEGIN PGP MESSAGE-
+ insane:key : text value here
+ 
+ em9va2E=
+ =
+ -END PGP MESSAGE-
+ ', 'insane:key ');
+  pgp_armor_header 
+ --
+  text value here
+ (1 row)
+ 
+ -- long value
+ select pgp_armor_header('
+ -BEGIN PGP MESSAGE-
+ long: this value is more than 76 characters long, but it should still parse 
correctly as that''s permitted by RFC 4880
+ 
+ em9va2E=
+ =
+ -END PGP MESSAGE-
+ ', 'long');
+ pgp_armor_header  
   
+ 
-
+  this value is more than 76 characters long, but it should still parse 
correctly as that's permitted by RFC 4880
+ (1 row)
+ 
+ -- long value, split up
+ select pgp_armor_header('
+ -BEGIN PGP MESSAGE-
+ long: this value is more than 76 characters long, but it should still 
+ long: parse correctly as that''s permitted by RFC 4880
+ 
+ em9va2E=
+ =
+ -END PGP MESSAGE-
+ ', 'long');
+ pgp_armor_header  
   
+ 
-
+  this value is more than 76 characters long, but it should still parse 
correctly as that's permitted by RFC 4880
+ (1 row)
+ 
+ -- long value, split up, part 2
+ select pgp_armor_header('
+ -BEGIN PGP MESSAGE-
+ long: this value is more than 
+ long: 76 characters long, but it should still 
+ long: parse correctly as that''s permitted by RFC 4880
+ 
+ em9va2E=
+ =
+ -END PGP MESSAGE-
+ ', 'long');
+ pgp_armor_header  
   
+ 
-
+  this value is more than 76 characters long, but it should still parse 
correctly as that's permitted by RFC 4880
+ (1 row)
+ 
+ -- long value, split up, part 3
+ select pgp_armor_header('
+ -BEGIN PGP 

Re: [HACKERS] Unwanted LOG during recovery of DROP TABLESPACE REDO

2014-08-15 Thread Marko Tiikkaja

On 7/16/14 4:33 PM, Tom Lane wrote:

Rajeev rastogi rajeev.rast...@huawei.com writes:

I found and fixed a bug that causes recovery (crash recovery , PITR) to throw 
unwanted LOG message if the tablespace symlink is not found during the 
processing of DROP TABLESPACE redo.
 LOG:  could not remove symbolic link 
pg_tblspc/16384: No such file or directory


I don't think that's a bug: it's the designed behavior.  Why should we
complicate the code to not print a log message in a situation where
it's unclear if the case is expected or not?


I agree with Tom here; this doesn't seem like an improvement.



.marko


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pgbench --tuple-size option

2014-08-15 Thread Fabien COELHO


After publishing some test results with pgbench on SSD with varying page 
size, Josh Berkus pointed out that pgbench uses small 100-bytes tuples, 
and that results may be different with other tuple sizes.


This patch adds an option to change the default tuple size, so that this 
can be tested easily.


--
Fabien.diff --git a/contrib/pgbench/pgbench.c b/contrib/pgbench/pgbench.c
index 2f7d80e..709022c 100644
--- a/contrib/pgbench/pgbench.c
+++ b/contrib/pgbench/pgbench.c
@@ -119,6 +119,10 @@ int			scale = 1;
  */
 int			fillfactor = 100;
 
+/* approximate number of bytes for rows
+ */
+int			tupsize = 100;
+
 /*
  * create foreign key constraints on the tables?
  */
@@ -359,6 +363,7 @@ usage(void)
 	   create indexes in the specified tablespace\n
 	   --tablespace=TABLESPACE  create tables in the specified tablespace\n
 		 --unlogged-tablescreate tables as unlogged tables\n
+		 --tuple-size=NUM target tuple size (default: 100)\n
 		   \nBenchmarking options:\n
 		 -c, --client=NUM number of concurrent database clients (default: 1)\n
 		 -C, --connectestablish new connection for each transaction\n
@@ -1704,32 +1709,37 @@ init(bool is_no_vacuum)
 		const char *table;		/* table name */
 		const char *smcols;		/* column decls if accountIDs are 32 bits */
 		const char *bigcols;	/* column decls if accountIDs are 64 bits */
-		int			declare_fillfactor;
+		int			declare_fillfactor; /* whether to add a fillfactor */
+		int			tupsize_correction; /* tupsize correction */
 	};
 	static const struct ddlinfo DDLs[] = {
 		{
 			pgbench_history,
-			tid int,bid int,aidint,delta int,mtime timestamp,filler char(22),
-			tid int,bid int,aid bigint,delta int,mtime timestamp,filler char(22),
-			0
+			tid int,bid int,aidint,delta int,mtime timestamp,filler char,
+			tid int,bid int,aid bigint,delta int,mtime timestamp,filler char,
+			0,
+			78
 		},
 		{
 			pgbench_tellers,
-			tid int not null,bid int,tbalance int,filler char(84),
-			tid int not null,bid int,tbalance int,filler char(84),
-			1
+			tid int not null,bid int,tbalance int,filler char,
+			tid int not null,bid int,tbalance int,filler char,
+			1,
+			16
 		},
 		{
 			pgbench_accounts,
-			aidint not null,bid int,abalance int,filler char(84),
-			aid bigint not null,bid int,abalance int,filler char(84),
-			1
+			aidint not null,bid int,abalance int,filler char,
+			aid bigint not null,bid int,abalance int,filler char,
+			1,
+			16
 		},
 		{
 			pgbench_branches,
-			bid int not null,bbalance int,filler char(88),
-			bid int not null,bbalance int,filler char(88),
-			1
+			bid int not null,bbalance int,filler char,
+			bid int not null,bbalance int,filler char,
+			1,
+			12
 		}
 	};
 	static const char *const DDLINDEXes[] = {
@@ -1767,6 +1777,9 @@ init(bool is_no_vacuum)
 		char		buffer[256];
 		const struct ddlinfo *ddl = DDLs[i];
 		const char *cols;
+		int 		ts = tupsize - ddl-tupsize_correction;
+
+		if (ts  1) ts = 1;
 
 		/* Remove old table, if it exists. */
 		snprintf(buffer, sizeof(buffer), drop table if exists %s, ddl-table);
@@ -1790,9 +1803,9 @@ init(bool is_no_vacuum)
 
 		cols = (scale = SCALE_32BIT_THRESHOLD) ? ddl-bigcols : ddl-smcols;
 
-		snprintf(buffer, sizeof(buffer), create%s table %s(%s)%s,
+		snprintf(buffer, sizeof(buffer), create%s table %s(%s(%d))%s,
  unlogged_tables ?  unlogged : ,
- ddl-table, cols, opts);
+ ddl-table, cols, ts, opts);
 
 		executeStatement(con, buffer);
 	}
@@ -2504,6 +2517,7 @@ main(int argc, char **argv)
 		{unlogged-tables, no_argument, unlogged_tables, 1},
 		{sampling-rate, required_argument, NULL, 4},
 		{aggregate-interval, required_argument, NULL, 5},
+		{tuple-size, required_argument, NULL, 6},
 		{rate, required_argument, NULL, 'R'},
 		{NULL, 0, NULL, 0}
 	};
@@ -2811,6 +2825,15 @@ main(int argc, char **argv)
 }
 #endif
 break;
+			case 6:
+initialization_option_set = true;
+tupsize = atoi(optarg);
+if (tupsize = 0)
+{
+	fprintf(stderr, invalid tuple size: %d\n, tupsize);
+	exit(1);
+}
+break;
 			default:
 fprintf(stderr, _(Try \%s --help\ for more information.\n), progname);
 exit(1);
diff --git a/doc/src/sgml/pgbench.sgml b/doc/src/sgml/pgbench.sgml
index 23bfa9e..e6210e7 100644
--- a/doc/src/sgml/pgbench.sgml
+++ b/doc/src/sgml/pgbench.sgml
@@ -515,6 +515,15 @@ pgbench optional replaceableoptions/ /optional replaceabledbname/
  /varlistentry
 
  varlistentry
+  termoption--tuple-size=/optionreplaceablesize//term
+  listitem
+   para
+Set target size of tuples. Default is 100 bytes.
+   /para
+  /listitem
+ /varlistentry
+
+ varlistentry
   termoption-v/option/term
   termoption--vacuum-all/option/term
   listitem

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench --tuple-size option

2014-08-15 Thread Andres Freund
On 2014-08-15 11:46:52 +0200, Fabien COELHO wrote:
 
 After publishing some test results with pgbench on SSD with varying page
 size, Josh Berkus pointed out that pgbench uses small 100-bytes tuples, and
 that results may be different with other tuple sizes.
 
 This patch adds an option to change the default tuple size, so that this can
 be tested easily.

I don't think it's beneficial to put this into pgbench. There really
isn't a relevant benefit over using a custom script here.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench --tuple-size option

2014-08-15 Thread Fabien COELHO


Hello Andres,


This patch adds an option to change the default tuple size, so that this can
be tested easily.


I don't think it's beneficial to put this into pgbench. There really
isn't a relevant benefit over using a custom script here.


The scripts to run are the standard ones. The difference is in the 
*initialization* phase (-i), namely the filler attribute size. There is no 
custom script for initialization in pgbench, so ISTM that this argument 
does not apply here.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench --tuple-size option

2014-08-15 Thread Andres Freund
On 2014-08-15 11:58:41 +0200, Fabien COELHO wrote:
 
 Hello Andres,
 
 This patch adds an option to change the default tuple size, so that this can
 be tested easily.
 
 I don't think it's beneficial to put this into pgbench. There really
 isn't a relevant benefit over using a custom script here.
 
 The scripts to run are the standard ones. The difference is in the
 *initialization* phase (-i), namely the filler attribute size. There is no
 custom script for initialization in pgbench, so ISTM that this argument does
 not apply here.

The custom initialization is to run a manual ALTER after the
initialization.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench --tuple-size option

2014-08-15 Thread Fabien COELHO



I don't think it's beneficial to put this into pgbench. There really
isn't a relevant benefit over using a custom script here.


The scripts to run are the standard ones. The difference is in the
*initialization* phase (-i), namely the filler attribute size. There is no
custom script for initialization in pgbench, so ISTM that this argument does
not apply here.


The custom initialization is to run a manual ALTER after the
initialization.


Sure, it can be done this way.

I'm not sure about the implication of ALTER on the table storage, thus I 
prefer all benchmarks to run exactly the same straightforward way in all 
cases so as to avoid unwanted effects on what I'm trying to measure, which 
is already noisy and unstable enough.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench --tuple-size option

2014-08-15 Thread Andres Freund
On 2014-08-15 12:17:31 +0200, Fabien COELHO wrote:
 
 I don't think it's beneficial to put this into pgbench. There really
 isn't a relevant benefit over using a custom script here.
 
 The scripts to run are the standard ones. The difference is in the
 *initialization* phase (-i), namely the filler attribute size. There is no
 custom script for initialization in pgbench, so ISTM that this argument does
 not apply here.
 
 The custom initialization is to run a manual ALTER after the
 initialization.
 
 Sure, it can be done this way.
 
 I'm not sure about the implication of ALTER on the table storage,

Should be fine in this case. But if that's what you're concerned about -
understandably - it seems to make more sense to split -i into two. One
to create the tables, and another to fill them. That'd allow to do
manual stuff inbetween.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Function to know last log write timestamp

2014-08-15 Thread Fujii Masao
On Fri, Aug 15, 2014 at 3:40 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-08-14 14:37:22 -0400, Robert Haas wrote:
 On Thu, Aug 14, 2014 at 2:21 PM, Andres Freund and...@2ndquadrant.com 
 wrote:
  On 2014-08-14 14:19:13 -0400, Robert Haas wrote:
  That's about the idea. However, what you've got there is actually
  unsafe, because shmem-counter++ is not an atomic operation.  It reads
  the counter (possibly even as two separate 4-byte loads if the counter
  is an 8-byte value), increments it inside the CPU, and then writes the
  resulting value back to memory.  If two backends do this concurrently,
  one of the updates might be lost.
 
  All these are only written by one backend, so it should be safe. Note
  that that coding pattern, just without memory barriers, is all over
  pgstat.c

 Ah, OK.  If there's a separate slot for each backend, I agree that it's safe.

 We should probably add barriers to pgstat.c, too.

 Yea, definitely. I think this is rather borked on weaker
 architectures. It's just that the consequences of an out of date/torn
 value are rather low, so it's unlikely to be noticed.

 Imo we should encapsulate the changecount modifications/checks somehow
 instead of repeating the barriers, Asserts, comments et al everywhere.

So what about applying the attached patch first, which adds the macros
to load and store the changecount with the memory barries, and changes
pgstat.c use them. Maybe this patch needs to be back-patch to at least 9.4?

After applying the patch, I will rebase the pg_last_xact_insert_timestamp
patch and post it again.

Regards,

-- 
Fujii Masao
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***
*** 2563,2569  pgstat_bestart(void)
  	beentry = MyBEEntry;
  	do
  	{
! 		beentry-st_changecount++;
  	} while ((beentry-st_changecount  1) == 0);
  
  	beentry-st_procpid = MyProcPid;
--- 2563,2569 
  	beentry = MyBEEntry;
  	do
  	{
! 		pgstat_increment_changecount_before(beentry);
  	} while ((beentry-st_changecount  1) == 0);
  
  	beentry-st_procpid = MyProcPid;
***
*** 2588,2595  pgstat_bestart(void)
  	beentry-st_appname[NAMEDATALEN - 1] = '\0';
  	beentry-st_activity[pgstat_track_activity_query_size - 1] = '\0';
  
! 	beentry-st_changecount++;
! 	Assert((beentry-st_changecount  1) == 0);
  
  	/* Update app name to current GUC setting */
  	if (application_name)
--- 2588,2594 
  	beentry-st_appname[NAMEDATALEN - 1] = '\0';
  	beentry-st_activity[pgstat_track_activity_query_size - 1] = '\0';
  
! 	pgstat_increment_changecount_after(beentry);
  
  	/* Update app name to current GUC setting */
  	if (application_name)
***
*** 2624,2635  pgstat_beshutdown_hook(int code, Datum arg)
  	 * before and after.  We use a volatile pointer here to ensure the
  	 * compiler doesn't try to get cute.
  	 */
! 	beentry-st_changecount++;
  
  	beentry-st_procpid = 0;	/* mark invalid */
  
! 	beentry-st_changecount++;
! 	Assert((beentry-st_changecount  1) == 0);
  }
  
  
--- 2623,2633 
  	 * before and after.  We use a volatile pointer here to ensure the
  	 * compiler doesn't try to get cute.
  	 */
! 	pgstat_increment_changecount_before(beentry);
  
  	beentry-st_procpid = 0;	/* mark invalid */
  
! 	pgstat_increment_changecount_after(beentry);
  }
  
  
***
*** 2666,2672  pgstat_report_activity(BackendState state, const char *cmd_str)
  			 * non-disabled state.  As our final update, change the state and
  			 * clear fields we will not be updating anymore.
  			 */
! 			beentry-st_changecount++;
  			beentry-st_state = STATE_DISABLED;
  			beentry-st_state_start_timestamp = 0;
  			beentry-st_activity[0] = '\0';
--- 2664,2670 
  			 * non-disabled state.  As our final update, change the state and
  			 * clear fields we will not be updating anymore.
  			 */
! 			pgstat_increment_changecount_before(beentry);
  			beentry-st_state = STATE_DISABLED;
  			beentry-st_state_start_timestamp = 0;
  			beentry-st_activity[0] = '\0';
***
*** 2674,2681  pgstat_report_activity(BackendState state, const char *cmd_str)
  			/* st_xact_start_timestamp and st_waiting are also disabled */
  			beentry-st_xact_start_timestamp = 0;
  			beentry-st_waiting = false;
! 			beentry-st_changecount++;
! 			Assert((beentry-st_changecount  1) == 0);
  		}
  		return;
  	}
--- 2672,2678 
  			/* st_xact_start_timestamp and st_waiting are also disabled */
  			beentry-st_xact_start_timestamp = 0;
  			beentry-st_waiting = false;
! 			pgstat_increment_changecount_after(beentry);
  		}
  		return;
  	}
***
*** 2695,2701  pgstat_report_activity(BackendState state, const char *cmd_str)
  	/*
  	 * Now update the status entry
  	 */
! 	beentry-st_changecount++;
  
  	beentry-st_state = state;
  	beentry-st_state_start_timestamp = current_timestamp;
--- 2692,2698 
  	/*
  	 * Now update the status entry
  	 */
! 	pgstat_increment_changecount_before(beentry);
  
  	

Re: [HACKERS] pgbench --tuple-size option

2014-08-15 Thread Fabien COELHO



I'm not sure about the implication of ALTER on the table storage,


Should be fine in this case. But if that's what you're concerned about -
understandably -


Indeed, my (long) experience with benchmarks is that it is a much more 
complicated that it looks if you want to really understand what you are 
getting, and to get anything meaningful.


it seems to make more sense to split -i into two. One to create the 
tables, and another to fill them. That'd allow to do manual stuff 
inbetween.


Hmmm. This would mean much more changes than the pretty trivial patch I 
submitted: more options (2 parts init + compatibility with the previous 
case), splitting the init function, having a dependency and new error 
cases to check (you must have the table to fill them), some options apply 
to first part while other apply to second part, which would lead in any 
case to a signicantly more complicated documentation... a lot of trouble 
for my use case to answer Josh pertinent comments, and to be able to test 
the tuple size factor easily. Moreover, I would reject it myself as too 
much trouble for a small benefit.


Feel free to reject the patch if you do not want it. I think that its 
cost/benefit is reasonable (one small option, small code changes, some 
benefit for people who want to measure performance in various cases).


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench --tuple-size option

2014-08-15 Thread Andres Freund
On 2014-08-15 13:33:20 +0200, Fabien COELHO wrote:
 it seems to make more sense to split -i into two. One to create the
 tables, and another to fill them. That'd allow to do manual stuff
 inbetween.
 
 Hmmm. This would mean much more changes than the pretty trivial patch I
 submitted

FWIW, I find that patch really ugly. Adding the filler's with in a
printf, after the actual DDL declaration. Without so much as a
comment. Brr.

: more options (2 parts init + compatibility with the previous
 case), splitting the init function, having a dependency and new error
 cases to check (you must have the table to fill them), some options apply to
 first part while other apply to second part, which would lead in any case to
 a signicantly more complicated documentation... a lot of trouble for my use
 case to answer Josh pertinent comments, and to be able to test the tuple
 size factor easily. Moreover, I would reject it myself as too much trouble
 for a small benefit.

Well, it's something more generic, because it allows you do do more...

 Feel free to reject the patch if you do not want it. I think that its
 cost/benefit is reasonable (one small option, small code changes, some
 benefit for people who want to measure performance in various cases).

I personally think this isn't worth the price. But I'm just one guy.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Support for N synchronous standby servers

2014-08-15 Thread Fujii Masao
On Fri, Aug 15, 2014 at 4:05 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Thu, Aug 14, 2014 at 8:34 PM, Fujii Masao masao.fu...@gmail.com wrote:
 +At any one time there will be at a number of active
 synchronous standbys
 +defined by varnamesynchronous_standby_num/; transactions waiting

 It's better to use xref linkend=guc-synchronous-standby-num, instead.
 Fixed.

 +for commit will be allowed to proceed after those standby servers
 +confirms receipt of their data. The synchronous standbys will be

 Typo: confirms - confirm

 Fixed.

 +   para
 +Specifies the number of standbys that support
 +firsttermsynchronous replication/, as described in
 +xref linkend=synchronous-replication, and listed as the first
 +elements of xref linkend=guc-synchronous-standby-names.
 +   /para
 +   para
 +Default value is 1.
 +   /para

 synchronous_standby_num is defined with PGC_SIGHUP. So the following
 should be added into the document.

 This parameter can only be set in the postgresql.conf file or on
 the server command line.
 Fixed.

 The name of the parameter synchronous_standby_num sounds to me that
 the transaction must wait for its WAL to be replicated to s_s_num standbys.
 But that's not true in your patch. If s_s_names is empty, replication works
 asynchronously whether the value of s_s_num is. I'm afraid that it's 
 confusing.
 The description of s_s_num is not sufficient. I'm afraid that users can 
 easily
 misunderstand that they can use quorum commit feature by using s_s_names
 and s_s_num. That is, the transaction waits for its WAL to be replicated to
 any s_s_num standbys listed in s_s_names.

 I reworked the docs to mention all that. Yes things are a bit
 different than any quorum commit facility (how to parametrize that
 simply without a parameter mapping one to one the items of
 s_s_names?), as this facility relies on the order of the items of
 s_s_names and the fact that stansbys are connected at a given time.

 When s_s_num is set to larger value than max_wal_senders, we should warn 
 that?
 Actually I have done a bit more than that by forbidding setting
 s_s_num to a value higher than max_wal_senders. Thoughts?

You added check_synchronous_standby_num() as the GUC check function for
synchronous_standby_num, and checked that there. But that seems to be wrong.
You can easily see the following error messages even if synchronous_standby_num
is smaller than max_wal_senders. The point is that synchronous_standby_num
should be located before max_wal_senders in postgresql.conf.

LOG:  invalid value for parameter synchronous_standby_num: 0
DETAIL:  synchronous_standby_num cannot be higher than max_wal_senders.

 Now that we discuss the interactions with other parameters. Another
 thing that I am wondering about now is: what should we do if we
 specify s_s_num to a number higher than the elements in s_s_names?
 Currently, the patch gives the priority to s_s_num, in short if we set
 s_s_num to 100, server will wait for 100 servers to confirm commit
 even if there are less than 100 elements in s_s_names. I chose this
 way because it looks saner particularly if s_s_names = '*'. Thoughts
 once again?

I'm fine with this. As you gave an example, the number of entries in s_s_names
can be smaller than the number of actual active sync standbys. For example,
when s_s_names is set to 'hoge', more than one standbys with the name 'hoge'
can connect to the server with sync mode.



I still think that it's strange that replication can be async even when
s_s_num is larger than zero. That is, I think that the transaction must
wait for s_s_num sync standbys whether s_s_names is empty or not.
OTOH, if s_s_num is zero, replication must be async whether s_s_names
is empty or not. At least for me, it's intuitive to use s_s_num primarily
to control the sync mode. Of course, other hackers may have different
thoughts, so we need to keep our ear open for them.

In the above design, one problem is that the number of parameters
that those who want to set up only one sync replication need to change is
incremented by one. That is, they need to change s_s_num additionally.
If we are really concerned about this, we can treat a value of -1 in
s_s_num as the special value, which allows us to control sync replication
only by s_s_names as we do now. That is, if s_s_names is empty,
replication would be async. Otherwise, only one standby with
high-priority in s_s_names becomes sync one. Probably the default of
s_s_num should be -1. Thought?

The source code comments at the top of syncrep.c need to be udpated.
It's worth checking whether there are other comments to be updated.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench --tuple-size option

2014-08-15 Thread Fujii Masao
On Fri, Aug 15, 2014 at 8:36 PM, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-08-15 13:33:20 +0200, Fabien COELHO wrote:
 it seems to make more sense to split -i into two. One to create the
 tables, and another to fill them. That'd allow to do manual stuff
 inbetween.

 Hmmm. This would mean much more changes than the pretty trivial patch I
 submitted

 FWIW, I find that patch really ugly. Adding the filler's with in a
 printf, after the actual DDL declaration. Without so much as a
 comment. Brr.

: more options (2 parts init + compatibility with the previous
 case), splitting the init function, having a dependency and new error
 cases to check (you must have the table to fill them), some options apply to
 first part while other apply to second part, which would lead in any case to
 a signicantly more complicated documentation... a lot of trouble for my use
 case to answer Josh pertinent comments, and to be able to test the tuple
 size factor easily. Moreover, I would reject it myself as too much trouble
 for a small benefit.

 Well, it's something more generic, because it allows you do do more...

 Feel free to reject the patch if you do not want it. I think that its
 cost/benefit is reasonable (one small option, small code changes, some
 benefit for people who want to measure performance in various cases).

 I personally think this isn't worth the price. But I'm just one guy.

I also don't like this feature. The benefit of this option seems too small.
If we apply this, we might want to support other options, for example,
option to change the data type of each column, option to create new
index using minmax, option to change the fillfactor of each table, ...etc.
There are countless such options, but I'm afraid that it's really hard to
support so many options.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak

2014-08-15 Thread Robert Haas
On Thu, Aug 14, 2014 at 6:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 We really should not be making changes of this type less than a month
 from our ostensible release date.  That is not enough time for us to
 notice if the changes turn out to be not as good as we think they are.

 If it weren't for the fact that we'll be wedded forevermore to a bad
 choice of syntax, I might agree with you.  But at this point, the
 alternatives we have are to fix it now, or fix it never.  I don't
 like #2.

Or postpone the release for another month or two.  There's still a few
other unresolved issues, too, like the problems with psql and expanded
mode; and the JSONB toast problems.  The latter is relatively new, but
we don't have a proposed patch for it yet unless there's on in an
email I haven't read yet, and the former has been lingering for many
months without getting appreciably closer to a resolution.

I like releasing in September as much as anyone, but that contemplates
people taking care to get known issues fixed before the second half of
August.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Another logical decoding assertion failure

2014-08-15 Thread Antonin Houska
http://www.postgresql.org/message-id/blu436-smtp12682d628f61ab9736099c3dc...@phx.gbl
recalls me that I also saw an assertion failure recently. Although I
wanted to isolate and report my issue when my vacation is over, this
report made me curious whether I saw the same. Eventually it seems to be
a different symptom.

Following are the steps that trigger the failure in my environment
(9.5devel), although it's not always that straightforward - sometimes I
need to create and populate one more table. (I use the 'test_decoding'
contrib module.)


postgres=# SELECT pg_create_logical_replication_slot('my_slot',
'test_decoding');
 pg_create_logical_replication_slot

 (my_slot,0/16F3B30)
(1 row)

postgres=# CREATE TABLE a AS SELECT * FROM
pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA;
SELECT 0
postgres=# INSERT INTO a SELECT * FROM
pg_logical_slot_get_changes('my_slot', NULL, NULL);
INSERT 0 2
postgres=# INSERT INTO a SELECT * FROM
pg_logical_slot_get_changes('my_slot', NULL, NULL);
The connection to the server was lost. Attempting reset: Failed.
!

The stack trace (should I upload / send the whole core file anywhere?):

(gdb) bt
#0  0x7f12d0640849 in raise () from /lib64/libc.so.6
#1  0x7f12d0641cd8 in abort () from /lib64/libc.so.6
#2  0x008c2d51 in ExceptionalCondition (conditionName=0xa926a0
!(((bool)((relation)-rd_refcnt == 0))),
errorType=0xa92210 FailedAssertion, fileName=0xa92104
relcache.c, lineNumber=1892) at assert.c:54
#3  0x008b2776 in RelationDestroyRelation
(relation=0x7f12c766d240, remember_tupdesc=0 '\000') at relcache.c:1892
#4  0x008b2c28 in RelationClearRelation
(relation=0x7f12c766d240, rebuild=1 '\001') at relcache.c:2106
#5  0x008b2fa3 in RelationFlushRelation
(relation=0x7f12c766d240) at relcache.c:2204
#6  0x008b30b5 in RelationCacheInvalidateEntry
(relationId=16384) at relcache.c:2256
#7  0x008abc65 in LocalExecuteInvalidationMessage
(msg=0x28c1260) at inval.c:567
#8  0x0073e1f5 in ReorderBufferExecuteInvalidations
(rb=0x28b0318, txn=0x28b0430) at reorderbuffer.c:1798
#9  0x0073ddbf in ReorderBufferForget (rb=0x28b0318, xid=1279,
lsn=24154944) at reorderbuffer.c:1645
#10 0x007383f8 in DecodeCommit (ctx=0x2894658,
buf=0x7fff3e658c30, xid=1279, dboid=12736, commit_time=461418357793855,
nsubxacts=0, sub_xids=0x28af650, ninval_msgs=69, msgs=0x28af650) at
decode.c:539
#11 0x00737c91 in DecodeXactOp (ctx=0x2894658,
buf=0x7fff3e658c30) at decode.c:207
#12 0x007379ce in LogicalDecodingProcessRecord (ctx=0x2894658,
record=0x28af610) at decode.c:103
#13 0x0073b0d6 in pg_logical_slot_get_changes_guts
(fcinfo=0x7fff3e658f50, confirm=1 '\001', binary=0 '\000') at
logicalfuncs.c:432
#14 0x0073b221 in pg_logical_slot_get_changes
(fcinfo=0x7fff3e658f50) at logicalfuncs.c:478
#15 0x0063e1a3 in ExecMakeTableFunctionResult
(funcexpr=0x288be78, econtext=0x288b758, argContext=0x28b5580,
expectedDesc=0x288ccd8, randomAccess=0 '\000') at execQual.c:2157
#16 0x0065e302 in FunctionNext (node=0x288ba18) at
nodeFunctionscan.c:95
#17 0x0064537e in ExecScanFetch (node=0x288ba18,
accessMtd=0x65e24b FunctionNext, recheckMtd=0x65e630 FunctionRecheck)
at execScan.c:82
#18 0x006453ed in ExecScan (node=0x288ba18, accessMtd=0x65e24b
FunctionNext, recheckMtd=0x65e630 FunctionRecheck)
at execScan.c:132
#19 0x0065e665 in ExecFunctionScan (node=0x288ba18) at
nodeFunctionscan.c:269
#20 0x0063a649 in ExecProcNode (node=0x288ba18) at
execProcnode.c:426
#21 0x0065ca53 in ExecModifyTable (node=0x288b8c0) at
nodeModifyTable.c:926
#22 0x0063a577 in ExecProcNode (node=0x288b8c0) at
execProcnode.c:377
#23 0x00638465 in ExecutePlan (estate=0x288b518,
planstate=0x288b8c0, operation=CMD_INSERT, sendTuples=0 '\000',
numberTuples=0,
direction=ForwardScanDirection, dest=0x2815ac0) at execMain.c:1475
#24 0x0063667a in standard_ExecutorRun (queryDesc=0x288f948,
direction=ForwardScanDirection, count=0) at execMain.c:308
#25 0x00636512 in ExecutorRun (queryDesc=0x288f948,
direction=ForwardScanDirection, count=0) at execMain.c:256
#26 0x007998ec in ProcessQuery (plan=0x28159c8,
sourceText=0x2854d18 INSERT INTO a SELECT * FROM
pg_logical_slot_get_changes('my_slot', NULL, NULL);, params=0x0,
dest=0x2815ac0,
completionTag=0x7fff3e659920 ) at pquery.c:185
#27 0x0079b16e in PortalRunMulti (portal=0x2890638, isTopLevel=1
'\001', dest=0x2815ac0, altdest=0x2815ac0,
completionTag=0x7fff3e659920 ) at pquery.c:1279
#28 0x0079a7b1 in PortalRun (portal=0x2890638,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x2815ac0,
altdest=0x2815ac0,
completionTag=0x7fff3e659920 ) at pquery.c:816
#29 0x0079487b in exec_simple_query (
query_string=0x2854d18 INSERT INTO a SELECT * FROM
pg_logical_slot_get_changes('my_slot', NULL, NULL);) at 

Re: [HACKERS] Minmax indexes

2014-08-15 Thread Fujii Masao
On Fri, Aug 15, 2014 at 8:02 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Alvaro Herrera wrote:
 Heikki Linnakangas wrote:

  I'm sure this still needs some cleanup, but here's the patch, based
  on your v14. Now that I know what this approach looks like, I still
  like it much better. The insert and update code is somewhat more
  complicated, because you have to be careful to lock the old page,
  new page, and revmap page in the right order. But it's not too bad,
  and it gets rid of all the complexity in vacuum.

 It seems there is some issue here, because pageinspect tells me the
 index is not growing properly for some reason.  minmax_revmap_data gives
 me this array of TIDs after a bunch of insert/vacuum/delete/ etc:

 I fixed this issue, and did a lot more rework and bugfixing.  Here's
 v15, based on v14-heikki2.

I've not read the patch yet. But while testing the feature, I found that

* Brin index cannot be created on CHAR(n) column.
   Maybe other data types have the same problem.

* FILLFACTOR cannot be set in brin index.

Are these intentional?

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another logical decoding assertion failure

2014-08-15 Thread Andres Freund
On 2014-08-15 14:53:45 +0200, Antonin Houska wrote:
 http://www.postgresql.org/message-id/blu436-smtp12682d628f61ab9736099c3dc...@phx.gbl
 recalls me that I also saw an assertion failure recently. Although I
 wanted to isolate and report my issue when my vacation is over, this
 report made me curious whether I saw the same. Eventually it seems to be
 a different symptom.

That's something separate, yes.

 Following are the steps that trigger the failure in my environment
 (9.5devel), although it's not always that straightforward - sometimes I
 need to create and populate one more table. (I use the 'test_decoding'
 contrib module.)
 
 
 postgres=# SELECT pg_create_logical_replication_slot('my_slot',
 'test_decoding');
  pg_create_logical_replication_slot
 
  (my_slot,0/16F3B30)
 (1 row)
 
 postgres=# CREATE TABLE a AS SELECT * FROM
 pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA;
 SELECT 0
 postgres=# INSERT INTO a SELECT * FROM
 pg_logical_slot_get_changes('my_slot', NULL, NULL);
 INSERT 0 2
 postgres=# INSERT INTO a SELECT * FROM
 pg_logical_slot_get_changes('my_slot', NULL, NULL);
 The connection to the server was lost. Attempting reset: Failed.
 !

Is this just to reproduce the issue, or are you really storing the
results of logical decoding in a table again?

Why? That'll just result in circularity, no?  It's not something I
really thought about allowing when writing this. Possibly we can make it
work, but I don't really see the point. We obviously shouldn't just
crash, but that's not my point.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench --tuple-size option

2014-08-15 Thread Fabien COELHO


Hmmm. This would mean much more changes than the pretty trivial patch I 
submitted


FWIW, I find that patch really ugly. Adding the filler's with in a
printf, after the actual DDL declaration. Without so much as a
comment. Brr.


Indeed. I'm not too proud of that very point either:-) You are right that 
it deserves at the minimum a clear comment. To put the varying size in the 
DDL string means vsprintf and splitting the query building some more, 
which I do not find desirable.



[...]
Well, it's something more generic, because it allows you do do more...


Apart from I do not need it (at least right now), and that it is more 
work, my opinion is that it would be rejected. Not a strong insentive to 
spend time in that direction.


--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL

2014-08-15 Thread Robert Haas
On Thu, Aug 14, 2014 at 12:08 PM, Baker, Keith [OCDUS Non-JJ]
kbak...@its.jnj.com wrote:
 I tried a combination of PIPE lock and file lock (fcntl) as Tom had suggested.
 Attached experimental patch has this logic...

 Postmaster :
 - get exclusive fcntl lock (to guard against race condition in PIPE-based 
 lock)
 - check PIPE for any existing readers
 - open PIPE for read

 All other backends:
 - get shared fcnlt lock
 - open PIPE for read

Hmm.  This seems like it might almost work.  But I don't see why the
other backends need to care about fcntl() at all.  How about this
locking protocol:

Postmaster:
1. Acquire an exclusive lock on some file in the data directory, maybe
the control file, using fcntl().
2. Open the named pipe for read.
3. Open the named pipe for write.
4. Close the named pipe for read.
5. Install a signal handler for SIGPIPE which sets a global variable.
6. Try to write to the pipe.
7. Check that the variable is set; if not, FATAL.
8. Revert SIGPIPE handler.
9. Close the named pipe for write.
10. Open the named pipe for read.
11. Release the fcntl() lock acquired in step 1.

Regular backends don't need to do anything special, except that they
need to make sure that the file descriptor opened in step 8 gets
inherited by the right set of processes.  That means that the
close-on-exec flag should be turned on in the postmaster; except in
EXEC_BACKEND builds, where it should be turned off but then turned on
again by child processes before they do anything that might fork.

It's impossible for two postmasters to start up at the same time
because the fcntl() lock acquired at step 1 will block any
newly-arriving postmaster until step 11 is completel.  The
first-to-close semantics of fcntl() aren't a problem for this purpose
because we only execute a very limited amount of code over which we
have full control while holding the lock.  By the time the postmaster
that gets the lock first completes step 10, any later-arriving
postmaster is guaranteed to fall out at step 7 while that postmaster
or any children who inherit the pipe descriptor remain alive.  No
process holds any resource that will survive its exit, so cleanup is
fully automatic.

This seems solid to me, but watch somebody find a problem with it...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Fix search_path default value separator.

2014-08-15 Thread Fujii Masao
On Tue, Jul 15, 2014 at 12:20 AM, Christoph Martin
christoph.r.mar...@gmail.com wrote:
 True. Both variants are legal, and most people won't ever notice. I stumbled
 across this while writing a test case for a transaction helper that
 sets/restores search_path before committing. The test was to simply compare
 the string values of `SHOW search_path;` before `BEGIN TRANSACTION;` and
 after `COMMIT;`.

 It's a non-issue, really, but since there's a patch and I cannot come up
 with a more common use case that would depend on the use of just-comma
 separators in the default value, I'd say it's more of a question of why
 not instead of why, isn't it?


 On 14 July 2014 16:58, Robert Haas robertmh...@gmail.com wrote:

 On Fri, Jul 11, 2014 at 6:09 AM, Christoph Martin
 christoph.r.mar...@gmail.com wrote:
  I noticed a minor inconsistency with the search_path separator used in
  the
  default configuration.
 
  The schemas of any search_path set using `SET search_path TO...` are
  separated by ,  (comma, space), while the default value is only
  separated
  by , (comma).
 
  The attached patch against master changes the separator of the default
  value
  to be consistent with the usual comma-space separators, and updates the
  documentation of `SHOW search_path;` accordingly.
 
  This massive three-byte change passes all 144 tests of make check.

 Heh.  I'm not particularly averse to changing this, but I guess I
 don't see any particular benefit of changing it either.  Either comma
 or comma-space is a legal separator, so why worry about it?

This change might cause me to update the existing documents (which
I need to maintain in my company) including the output example of
default search_path. If the change is for the improvement, I'd be
happy to do that, but it seems not.

Also there might be some PostgreSQL extensions which their regression test
shows the default search_path. This patch would make their developers
spend the time to update the test. I'm sure that they are fine with that if
it's for an improvement. But not.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 logical decoding assertion

2014-08-15 Thread Andres Freund
On 2014-08-14 16:03:08 -0400, Steve Singer wrote:
 I hit the following on 9.4 testing logical decoding.
 
 
 TRAP: FailedAssertion(!(prev_first_lsn  cur_txn-first_lsn), File:
 reorderbuffer.c, Line: 618)
 LOG:  server process (PID 3801) was terminated by signal 6: Aborted

I saw that recently while hacking around, but I thought it was because
of stuff I'd added. But apparently not.

Hm. I think I see how that might happen. It might be possible (and
harmless) if two subxacts of the same toplevel xact have the same
first_lsn. But if it's not just = vs  it'd be worse.

 Unfortunately I don't have a core file and I haven't been able to reproduce
 this.

Any information about the workload? Any chance you still have the data
directory around?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-15 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-08-14 12:21:38 -0400, Robert Haas wrote:
 And what does that actually do?  Send back a result-set, or a new
 protocol message?

 What I was thinking of was to return COMMIT X/X instead of
 COMMIT. Since that's only sent when COMMIT WITH (report_commit_lsn ON)
 was set it won't break clients/libraries that don't need it.

Au contraire: it will break any piece of code that is expecting a COMMIT
command tag to look like exactly COMMIT and not COMMIT something.
If you think there isn't any such, a look into libpq should disabuse you
of that notion.  (Admittedly, libpq's instance is only in the protocol-V2
code paths, but I'm sure that similar code exists elsewhere client-side.)

The risk still remains, therefore, that one layer of the client-side
software stack might try to enable this feature even though another
layer is not prepared for it.  Changing the command tag might actually
be *more* dangerous than a new protocol message, rather than less so,
because command tags are usually exposed at multiple layers of the stack
--- libpq for instance happily returns them up to its caller.  So it
will be somewhere between difficult and impossible to be sure that one
has fixed everything that needs fixing.

And, again, I think that controlling this via something as widely
changeable as a GUC is sheer folly, potentially even reaching the point
of being a security bug.  (Applications that fail to recognize when
their transactions have committed would be broken in very nasty ways.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak

2014-08-15 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Aug 14, 2014 at 6:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 If it weren't for the fact that we'll be wedded forevermore to a bad
 choice of syntax, I might agree with you.  But at this point, the
 alternatives we have are to fix it now, or fix it never.  I don't
 like #2.

 Or postpone the release for another month or two.  There's still a few
 other unresolved issues, too, like the problems with psql and expanded
 mode; and the JSONB toast problems.  The latter is relatively new, but
 we don't have a proposed patch for it yet unless there's on in an
 email I haven't read yet, and the former has been lingering for many
 months without getting appreciably closer to a resolution.

Yeah, if we do anything about JSONB we are certainly going to need another
beta release, which means final couldn't be before end of Sept. at the
very earliest.  Still, it's always been project policy that we ship when
it's ready, not when the calendar hits some particular point.

As for the expanded-mode changes, I thought there was consensus to
revert that from 9.4.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Fix search_path default value separator.

2014-08-15 Thread Bruce Momjian
On Fri, Aug 15, 2014 at 10:40:59PM +0900, Fujii Masao wrote:
  Heh.  I'm not particularly averse to changing this, but I guess I
  don't see any particular benefit of changing it either.  Either comma
  or comma-space is a legal separator, so why worry about it?
 
 This change might cause me to update the existing documents (which
 I need to maintain in my company) including the output example of
 default search_path. If the change is for the improvement, I'd be
 happy to do that, but it seems not.
 
 Also there might be some PostgreSQL extensions which their regression test
 shows the default search_path. This patch would make their developers
 spend the time to update the test. I'm sure that they are fine with that if
 it's for an improvement. But not.

Well, rename GUC often too for clearity, so I don't see adjusting
white-space as something to avoid either.  It is always about short-term
adjustments vs. long-term clarity.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reporting the commit LSN at commit time

2014-08-15 Thread Andres Freund
On 2014-08-15 09:54:01 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-08-14 12:21:38 -0400, Robert Haas wrote:
  And what does that actually do?  Send back a result-set, or a new
  protocol message?
 
  What I was thinking of was to return COMMIT X/X instead of
  COMMIT. Since that's only sent when COMMIT WITH (report_commit_lsn ON)
  was set it won't break clients/libraries that don't need it.
 
 Au contraire: it will break any piece of code that is expecting a COMMIT
 command tag to look like exactly COMMIT and not COMMIT something.

Well, if your code doesn't support it. Don't use it.

 The risk still remains, therefore, that one layer of the client-side
 software stack might try to enable this feature even though another
 layer is not prepared for it.

Well, then the user will have to fix that. It's not like the feature
will magically start to be used by itself.

One alternative would be to expose a pg_get_last_commit_lsn(); function
that'd return the the last commit's lsn stored in a static variable if
set. That'll increase the window in which the connection can break, but
that window already exists.

 And, again, I think that controlling this via something as widely
 changeable as a GUC is sheer folly, potentially even reaching the point
 of being a security bug.  (Applications that fail to recognize when
 their transactions have committed would be broken in very nasty ways.)

*I*'ve never suggested making this depend on a guc. I think that'd a
major PITA.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 logical decoding assertion

2014-08-15 Thread Steve Singer

On 08/15/2014 09:42 AM, Andres Freund wrote:

On 2014-08-14 16:03:08 -0400, Steve Singer wrote:

I hit the following on 9.4 testing logical decoding.


TRAP: FailedAssertion(!(prev_first_lsn  cur_txn-first_lsn), File:
reorderbuffer.c, Line: 618)
LOG:  server process (PID 3801) was terminated by signal 6: Aborted

I saw that recently while hacking around, but I thought it was because
of stuff I'd added. But apparently not.

Hm. I think I see how that might happen. It might be possible (and
harmless) if two subxacts of the same toplevel xact have the same
first_lsn. But if it's not just = vs  it'd be worse.


Unfortunately I don't have a core file and I haven't been able to reproduce
this.

Any information about the workload? Any chance you still have the data
directory around?


I was running the slony regression tests  but  I ran the same tests 
script after a number of times after and the problem didn't reproduce 
itself.


The last thing the tests did before the crash was part of the slony 
failover process.


I am doing my testing running with all 5 nodes/databases under the same 
postmaster (giving something like 20 replication slots open)


A few milliseconds before the one of the connections had just done a
START_REPLICATION SLOT slon_4_2 LOGICAL 0/32721A58

and then that connection reported the socket being closed,

but because so much was going on concurrently I can't say for sure if 
that connection experienced the assert or was closed because another 
backend asserted.



I haven't done an initdb since, so I have the data directory but I've 
dropped and recreated all of my slots many times since so the wal files 
are long gone.




Greetings,

Andres Freund





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-15 Thread Fujii Masao
On Thu, Aug 14, 2014 at 1:52 PM, MauMau maumau...@gmail.com wrote:
 I fixed some minor mistakes.

What's the main purpose of this tool? If it's for WAL archiving, the tool name
pg_copy sounds too generic. We already have pg_archivecleanup, so maybe
pg_archivecopy or something is better for the consistency?

pg_copy in the patch copies the file to the destination in a
straightforward way,
i.e., directly copies the file to the dest file with actual name. This can cause
the problem which some people reported. The problem is that, when the server
crashes while WAL file is being archived by cp command, its partially-filled
WAL file remains at the archival area. This half-baked archive file can cause
various troubles. To address this, WAL file needs to be copied to the temporary
file at first, then renamed to the actual name. I think that pg_copy should
copy the WAL file in that way.

Currently pg_copy always syncs the archive file, and there is no way to disable
that. But I'm sure that not everyone want to sync the archive file. So I think
that it's better to add the option specifying whether to sync the file
or not, into
pg_copy.

Some users might want to specify whether to call posix_fadvise or not because
they might need to re-read the archvied files just after the archiving.
For example, network copy of the archived files from the archive area to
remote site for disaster recovery.

Do you recommend to use pg_copy for restore_command? If yes, it also should
be documented. And in the WAL restore case, the restored WAL files are re-read
soon by recovery, so posix_fadvise is not good in that case.

Direct I/O and posix_fadvise are used only for destination file. But why not
source file? That might be useful especially for restore_command case.

At last, the big question is, is there really no OS command which provides
the same functionality as pg_copy does? If there is, I'd like to avoid duplicate
work basically.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump bug in 9.4beta2 and HEAD

2014-08-15 Thread Pavel Stehule
Hi


2014-08-14 9:03 GMT+02:00 Heikki Linnakangas hlinnakan...@vmware.com:

 On 08/14/2014 06:53 AM, Joachim Wieland wrote:

 I'm seeing an assertion failure with pg_dump -c --if-exists which is
 not ready to handle BLOBs it seems:

 pg_dump: pg_backup_archiver.c:472: RestoreArchive: Assertion `mark !=
 ((void *)0)' failed.

 To reproduce:

 $ createdb test
 $ pg_dump -c --if-exists test  (works, dumps empty database)
 $ psql test -c select lo_create(1);
 $ pg_dump -c --if-exists test  (fails, with the above mentioned assertion)


 The code tries to inject an IF EXISTS into the already-construct DROP
 command, but it doesn't work for large objects, because the deletion
 command looks like SELECT pg_catalog.lo_unlink(xxx). There is no DROP
 there.

 I believe we could use SELECT pg_catalog.lo_unlink(loid) FROM
 pg_catalog.pg_largeobject_metadata WHERE loid = xxx.
 pg_largeobject_metadata table didn't exist before version 9.0, but we don't
 guarantee pg_dump's output to be compatible in that direction anyway, so I
 think that's fine.

 The quick fix would be to add an exception for blobs, close to where
 Assert is. There are a few exceptions there already. A cleaner solution
 would be to add a new argument to ArchiveEntry and make the callers
 responsible for providing an DROP IF EXISTS query, but that's not too
 appetizing because for most callers it would be boring boilerplate code.
 Perhaps add an argument, but if it's NULL, ArchiveEntry would form the
 if-exists query automatically from the DROP query.


I am sending two patches

first is fast fix

second fix is implementation of Heikki' idea.




 - Heikki



 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers

commit 68ba9d3b682c6e56b08d623ebc58e351ce1a6c55
Author: Pavel Stehule pavel.steh...@gooddata.com
Date:   Fri Aug 15 13:41:24 2014 +0200

heikki-lo-if-exists-fix

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index 3aebac8..0049be7 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -429,60 +429,100 @@ RestoreArchive(Archive *AHX)
 	}
 	else
 	{
-		char		buffer[40];
-		char	   *mark;
-		char	   *dropStmt = pg_strdup(te-dropStmt);
-		char	   *dropStmtPtr = dropStmt;
-		PQExpBuffer ftStmt = createPQExpBuffer();
-
 		/*
-		 * Need to inject IF EXISTS clause after ALTER TABLE
-		 * part in ALTER TABLE .. DROP statement
+		 * LO doesn't use DDL stmts, instead it uses lo_unlink
+		 * functions. But it should be injected too.
 		 */
-		if (strncmp(dropStmt, ALTER TABLE, 11) == 0)
+		if (strncmp(te-desc, BLOB, 4) == 0)
 		{
-			appendPQExpBuffer(ftStmt,
-			  ALTER TABLE IF EXISTS);
-			dropStmt = dropStmt + 11;
+			Oid	   loid = InvalidOid;
+			char	   *mark;
+
+			/* find and take foid */
+			if (strncmp(te-dropStmt, SELECT pg_catalog.lo_unlink(', 29) == 0)
+			{
+unsigned long cvt;
+char	   *endptr;
+			
+mark = te-dropStmt + 29;
+
+if (*mark != '\0')
+{
+	errno = 0;
+	cvt = strtoul(mark, endptr, 10);
+
+	/* Ensure we have a valid Oid */
+	if (errno == 0  mark != endptr  *endptr == '\'')
+		loid = (Oid) cvt;
+}
+			}
+
+			if (loid == InvalidOid)
+exit_horribly(modulename, cannot to identify oid of large object\n);
+
+			/*
+			 * Now we have valid foid and we can generate
+			 * fault tolerant (not existency) SQL statement.
+			 */
+			DropBlobIfExists(AH, loid);
 		}
-
-		/*
-		 * ALTER TABLE..ALTER COLUMN..DROP DEFAULT does not
-		 * support the IF EXISTS clause, and therefore we
-		 * simply emit the original command for such objects.
-		 * For other objects, we need to extract the first
-		 * part of the DROP which includes the object type.
-		 * Most of the time this matches te-desc, so search
-		 * for that; however for the different kinds of
-		 * CONSTRAINTs, we know to search for hardcoded DROP
-		 * CONSTRAINT instead.
-		 */
-		if (strcmp(te-desc, DEFAULT) == 0)
-			appendPQExpBuffer(ftStmt, %s, dropStmt);
 		else
 		{
-			if (strcmp(te-desc, CONSTRAINT) == 0 ||
-strcmp(te-desc, CHECK CONSTRAINT) == 0 ||
-strcmp(te-desc, FK CONSTRAINT) == 0)
-strcpy(buffer, DROP CONSTRAINT);
+			char		buffer[40];
+			char	   *mark;
+			char	   *dropStmt = pg_strdup(te-dropStmt);
+			char	   *dropStmtPtr = dropStmt;
+			PQExpBuffer ftStmt = createPQExpBuffer();
+
+			/*
+			 * Need to inject IF EXISTS clause after ALTER TABLE
+			 * part in ALTER TABLE .. DROP statement
+			 */
+			if (strncmp(dropStmt, ALTER TABLE, 11) == 0)
+			{
+appendPQExpBuffer(ftStmt,
+  ALTER TABLE IF 

Re: [HACKERS] [patch] pg_copy - a command for reliable WAL archiving

2014-08-15 Thread Kevin Grittner
Fujii Masao masao.fu...@gmail.com wrote:

 Direct I/O and posix_fadvise are used only for destination file.
 But why not source file? That might be useful especially for
 restore_command case.

That would prevent people from piping the file through a
compression utility.  We should support piped I/O for input (and if
we want to use it on the recovery side, for the output, too), at
least as an option.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump bug in 9.4beta2 and HEAD

2014-08-15 Thread David Fetter
On Thu, Aug 14, 2014 at 10:03:57AM +0300, Heikki Linnakangas wrote:
 On 08/14/2014 06:53 AM, Joachim Wieland wrote:
 I'm seeing an assertion failure with pg_dump -c --if-exists which is
 not ready to handle BLOBs it seems:
 
 pg_dump: pg_backup_archiver.c:472: RestoreArchive: Assertion `mark !=
 ((void *)0)' failed.
 
 To reproduce:
 
 $ createdb test
 $ pg_dump -c --if-exists test  (works, dumps empty database)
 $ psql test -c select lo_create(1);
 $ pg_dump -c --if-exists test  (fails, with the above mentioned assertion)
 
 The code tries to inject an IF EXISTS into the already-construct DROP
 command, but it doesn't work for large objects, because the deletion command
 looks like SELECT pg_catalog.lo_unlink(xxx). There is no DROP there.

The lo_* functions are probably too entrenched to be deprecated, but
maybe we could come up with DML (or DDL, although that seems like a
bridge too far) equivalents and use those.  Not for 9.4, obviously.

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL

2014-08-15 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 How about this locking protocol:

 Postmaster:
 1. Acquire an exclusive lock on some file in the data directory, maybe
 the control file, using fcntl().
 2. Open the named pipe for read.
 3. Open the named pipe for write.
 4. Close the named pipe for read.
 5. Install a signal handler for SIGPIPE which sets a global variable.
 6. Try to write to the pipe.
 7. Check that the variable is set; if not, FATAL.
 8. Revert SIGPIPE handler.
 9. Close the named pipe for write.
 10. Open the named pipe for read.
 11. Release the fcntl() lock acquired in step 1.

Hm, this seems like it would work.  A couple other thoughts:

* I think 5..8 are overly complex: we can just set SIGPIPE to SIG_IGN
(which is its usual setting in the postmaster already) and check for
EPIPE from the write().

* There might be some benefit to swapping steps 9 and 10; at the
very least, this would eliminate the need to use O_NONBLOCK while
re-opening for read.

* We talked about combining this technique with a plain file lock
so that we would have belt-and-suspenders protection, in particular
something that would have a chance of working across NFS clients.
This would suggest leaving the fcntl lock in place, ie, don't do
step 11, and also that the file-to-be-locked *not* have any other
purpose (which would only increase the risk of losing the lock
through careless open/close).

 Regular backends don't need to do anything special, except that they
 need to make sure that the file descriptor opened in step 8 gets
 inherited by the right set of processes.  That means that the
 close-on-exec flag should be turned on in the postmaster; except in
 EXEC_BACKEND builds, where it should be turned off but then turned on
 again by child processes before they do anything that might fork.

Meh.  Do we really want to allow a new postmaster to start if there
are any processes remaining that were launched by backends?  I'd
be inclined to just suppress close-on-exec, period.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another logical decoding assertion failure

2014-08-15 Thread Antonin Houska
On 08/15/2014 03:16 PM, Andres Freund wrote:
 On 2014-08-15 14:53:45 +0200, Antonin Houska wrote:
 postgres=# SELECT pg_create_logical_replication_slot('my_slot',
 'test_decoding');
  pg_create_logical_replication_slot
 
  (my_slot,0/16F3B30)
 (1 row)

 postgres=# CREATE TABLE a AS SELECT * FROM
 pg_logical_slot_get_changes('my_slot', NULL, NULL) WITH NO DATA;
 SELECT 0
 postgres=# INSERT INTO a SELECT * FROM
 pg_logical_slot_get_changes('my_slot', NULL, NULL);
 INSERT 0 2
 postgres=# INSERT INTO a SELECT * FROM
 pg_logical_slot_get_changes('my_slot', NULL, NULL);
 The connection to the server was lost. Attempting reset: Failed.
 !
 
 Is this just to reproduce the issue, or are you really storing the
 results of logical decoding in a table again?
 
 Why? That'll just result in circularity, no?  It's not something I
 really thought about allowing when writing this. Possibly we can make it
 work, but I don't really see the point. We obviously shouldn't just
 crash, but that's not my point.

It was basically an experiment. I tried to capture changes into a table.
I don't know whether it's legal (i.e. whether the current call of
pg_logical_slot_get_changes() should include changes that the current
transaction did). Unloged / temporary table may be necessary for case
like this.

The reason I report it is that (I think), if such an use case is not
legal, it should be detected somehow and handled using ereport / elog.

// Tony



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement

2014-08-15 Thread Jeff Janes
On Tue, Aug 12, 2014 at 10:52 AM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:

 On 08/06/2014 08:37 PM, Jeff Janes wrote:

 But now it looks like 0002 needs a rebase


 I've committed the refactoring patch, and here's a rebased and improved
 version of the Windows SChannel implementation over that.


On MinGW, I get the following error when compiling with options
--host=x86_64-w64-mingw32 --without-zlib:

be-secure.c: In function 'secure_open_server':
be-secure.c:106:2: error: 'Port' has no member named 'peer_cn'
be-secure.c:106:2: error: 'Port' has no member named 'peer_cn'
make[3]: *** [be-secure.o] Error 1
make[2]: *** [libpq-recursive] Error 2
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2

Should the ereport DEBUG2 be inside the #ifdef USE_SSL?

Thanks,

Jeff


Re: [HACKERS] Minmax indexes

2014-08-15 Thread Heikki Linnakangas

On 08/15/2014 02:02 AM, Alvaro Herrera wrote:

Alvaro Herrera wrote:

Heikki Linnakangas wrote:



I'm sure this still needs some cleanup, but here's the patch, based
on your v14. Now that I know what this approach looks like, I still
like it much better. The insert and update code is somewhat more
complicated, because you have to be careful to lock the old page,
new page, and revmap page in the right order. But it's not too bad,
and it gets rid of all the complexity in vacuum.


It seems there is some issue here, because pageinspect tells me the
index is not growing properly for some reason.  minmax_revmap_data gives
me this array of TIDs after a bunch of insert/vacuum/delete/ etc:


I fixed this issue, and did a lot more rework and bugfixing.  Here's
v15, based on v14-heikki2.


So, the other design change I've been advocating is to store the revmap 
in the first N blocks, instead of having the two-level structure with 
array pages and revmap pages.


Attached is a patch for that, to be applied after v15. When the revmap 
needs to be expanded, all the tuples on it are moved elsewhere 
one-by-one. That adds some latency to the unfortunate guy who needs to 
do that, but as the patch stands, the revmap is only ever extended by 
VACUUM or CREATE INDEX, so I think that's fine. Like with my previous 
patch, the point is to demonstrate how much simpler the code becomes 
this way; I'm sure there are bugs and cleanup still necessary.


PS. Spotted one oversight in patch v15: callers of mm_doupdate must 
check the return value, and retry the operation if it returns false.


- Heikki

commit ce4df0e9dbd43f7e3d4fdf3f7920301f81f17d63
Author: Heikki Linnakangas heikki.linnakan...@iki.fi
Date:   Fri Aug 15 18:32:19 2014 +0300

Get rid of array pages. Instead, move all tuples out of the way

diff --git a/contrib/pageinspect/mmfuncs.c b/contrib/pageinspect/mmfuncs.c
index 6cd559a..51cc9e2 100644
--- a/contrib/pageinspect/mmfuncs.c
+++ b/contrib/pageinspect/mmfuncs.c
@@ -74,9 +74,6 @@ minmax_page_type(PG_FUNCTION_ARGS)
 		case MINMAX_PAGETYPE_META:
 			type = meta;
 			break;
-		case MINMAX_PAGETYPE_REVMAP_ARRAY:
-			type = revmap array;
-			break;
 		case MINMAX_PAGETYPE_REVMAP:
 			type = revmap;
 			break;
@@ -343,11 +340,9 @@ minmax_metapage_info(PG_FUNCTION_ARGS)
 	Page		page;
 	MinmaxMetaPageData *meta;
 	TupleDesc	tupdesc;
-	Datum		values[3];
-	bool		nulls[3];
-	ArrayBuildState *astate = NULL;
+	Datum		values[4];
+	bool		nulls[4];
 	HeapTuple	htup;
-	int			i;
 
 	page = verify_minmax_page(raw_page, MINMAX_PAGETYPE_META, metapage);
 
@@ -361,22 +356,8 @@ minmax_metapage_info(PG_FUNCTION_ARGS)
 	MemSet(nulls, 0, sizeof(nulls));
 	values[0] = CStringGetTextDatum(psprintf(0x%08X, meta-minmaxMagic));
 	values[1] = Int32GetDatum(meta-minmaxVersion);
-
-	/* Extract (possibly empty) list of revmap array page numbers. */
-	for (i = 0; i  MAX_REVMAP_ARRAYPAGES; i++)
-	{
-		BlockNumber	blkno;
-
-		blkno = meta-revmapArrayPages[i];
-		if (blkno == InvalidBlockNumber)
-			break;	/* XXX or continue? */
-		astate = accumArrayResult(astate, Int64GetDatum((int64) blkno),
-  false, INT8OID, CurrentMemoryContext);
-	}
-	if (astate == NULL)
-		nulls[2] = true;
-	else
-		values[2] = makeArrayResult(astate, CurrentMemoryContext);
+	values[2] = Int32GetDatum(meta-pagesPerRange);
+	values[3] = Int64GetDatum(meta-lastRevmapPage);
 
 	htup = heap_form_tuple(tupdesc, values, nulls);
 
@@ -384,34 +365,6 @@ minmax_metapage_info(PG_FUNCTION_ARGS)
 }
 
 /*
- * Return the BlockNumber array stored in a revmap array page
- */
-Datum
-minmax_revmap_array_data(PG_FUNCTION_ARGS)
-{
-	bytea	   *raw_page = PG_GETARG_BYTEA_P(0);
-	Page		page;
-	ArrayBuildState *astate = NULL;
-	RevmapArrayContents *contents;
-	Datum		blkarr;
-	int			i;
-
-	page = verify_minmax_page(raw_page, MINMAX_PAGETYPE_REVMAP_ARRAY,
-			  revmap array);
-
-	contents = (RevmapArrayContents *) PageGetContents(page);
-
-	for (i = 0; i  contents-rma_nblocks; i++)
-		astate = accumArrayResult(astate,
-  Int64GetDatum((int64) contents-rma_blocks[i]),
-  false, INT8OID, CurrentMemoryContext);
-	Assert(astate != NULL);
-
-	blkarr = makeArrayResult(astate, CurrentMemoryContext);
-	PG_RETURN_DATUM(blkarr);
-}
-
-/*
  * Return the TID array stored in a minmax revmap page
  */
 Datum
@@ -437,7 +390,7 @@ minmax_revmap_data(PG_FUNCTION_ARGS)
 	/* Extract values from the revmap page */
 	contents = (RevmapContents *) PageGetContents(page);
 	MemSet(nulls, 0, sizeof(nulls));
-	values[0] = Int64GetDatum((uint64) contents-rmr_logblk);
+	values[0] = Int64GetDatum((uint64) 0);
 
 	/* Extract (possibly empty) list of TIDs in this page. */
 	for (i = 0; i  REGULAR_REVMAP_PAGE_MAXITEMS; i++)
diff --git a/contrib/pageinspect/pageinspect--1.2.sql b/contrib/pageinspect/pageinspect--1.2.sql
index 56c9ba8..cba90ca 100644
--- a/contrib/pageinspect/pageinspect--1.2.sql
+++ b/contrib/pageinspect/pageinspect--1.2.sql
@@ -110,7 +110,7 @@ LANGUAGE C STRICT;
 -- 

Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement

2014-08-15 Thread Heikki Linnakangas

On 08/15/2014 08:16 PM, Jeff Janes wrote:

On Tue, Aug 12, 2014 at 10:52 AM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:


On 08/06/2014 08:37 PM, Jeff Janes wrote:


But now it looks like 0002 needs a rebase


I've committed the refactoring patch, and here's a rebased and improved
version of the Windows SChannel implementation over that.


On MinGW, I get the following error when compiling with options
--host=x86_64-w64-mingw32 --without-zlib:

be-secure.c: In function 'secure_open_server':
be-secure.c:106:2: error: 'Port' has no member named 'peer_cn'
be-secure.c:106:2: error: 'Port' has no member named 'peer_cn'
make[3]: *** [be-secure.o] Error 1
make[2]: *** [libpq-recursive] Error 2
make[1]: *** [all-backend-recurse] Error 2
make: *** [all-src-recurse] Error 2

Should the ereport DEBUG2 be inside the #ifdef USE_SSL?


Yeah.

I've been thinking though, perhaps we should always have the ssl_in_use, 
peer_cn and peer_cert_valid members in the Port struct. If not compiled 
with USE_SSL, they would just always be false/NULL. Then we wouldn't 
need #ifdefs around all the places that check hose fields either.


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-15 Thread Robert Haas
On Thu, Aug 14, 2014 at 2:21 PM, Jeff Davis pg...@j-davis.com wrote:
 On Thu, 2014-08-14 at 12:53 -0400, Tom Lane wrote:
 Oh?  So if we have aggregates like array_agg whose memory footprint
 increases over time, the patch completely fails to avoid bloat?

 Yes, in its current form.

 I might think a patch with such a limitation was useful, if it weren't
 for the fact that aggregates of that nature are a large part of the
 cases where the planner misestimates the table size in the first place.
 Any complication that we add to nodeAgg should be directed towards
 dealing with cases that the planner is likely to get wrong.

 In my experience, the planner has a lot of difficulty estimating the
 cardinality unless it's coming from a base table without any operators
 above it (other than maybe a simple predicate). This is probably a lot
 more common than array_agg problems, simply because array_agg is
 relatively rare compared with GROUP BY in general.

I think that's right, and I rather like your (Jeff's) approach.  It's
definitely true that we could do better if we have a mechanism for
serializing and deserializing group states, but (1) I think an awful
lot of cases would get an awful lot better even just with the approach
proposed here and (2) I doubt we would make the
serialization/deserialization interfaces mandatory, so even if we had
that we'd probably want a fallback strategy anyway.

Furthermore, I don't really see that we're backing ourselves into a
corner here.  If prohibiting creation of additional groups isn't
sufficient to control memory usage, but we have
serialization/deserialization functions, we can just pick an arbitrary
subset of the groups that we're storing in memory and spool their
transition states off to disk, thus reducing memory even further.  I
understand Tomas' point to be that this is quite different from what
we do for hash joins, but I think it's a different problem.  In the
case of a hash join, there are two streams of input tuples, and we've
got to batch them in compatible ways.  If we were to, say, exclude an
arbitrary subset of tuples from the hash table instead of basing it on
the hash code, we'd have to test *every* outer tuple against the hash
table for *every* batch.  That would incur a huge amount of additional
cost vs. being able to discard outer tuples once the batch to which
they pertain has been processed.

But the situation here isn't comparable, because there's only one
input stream.  I'm pretty sure we'll want to keep track of which
transition states we've spilled due to lack of memory as opposed to
those which were never present in the table at all, so that we can
segregate the unprocessed tuples that pertain to spilled transition
states from the ones that pertain to a group we haven't begun yet.
And it might be that if we know (or learn as we go along) that we're
going to vastly blow out work_mem it makes sense to use batching to
segregate the tuples that we decide not to process onto N tapes binned
by hash code, so that we have a better chance that future batches will
be the right size to fit in memory.  But I'm not convinced that
there's a compelling reason why the *first* batch has to be chosen by
hash code; we're actually best off picking any arbitrary set of groups
that does the best job reducing the amount of data remaining to be
processed, at least if the transition states are fixed size and maybe
even if they aren't.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak

2014-08-15 Thread Robert Haas
On Fri, Aug 15, 2014 at 9:57 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Aug 14, 2014 at 6:12 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 If it weren't for the fact that we'll be wedded forevermore to a bad
 choice of syntax, I might agree with you.  But at this point, the
 alternatives we have are to fix it now, or fix it never.  I don't
 like #2.

 Or postpone the release for another month or two.  There's still a few
 other unresolved issues, too, like the problems with psql and expanded
 mode; and the JSONB toast problems.  The latter is relatively new, but
 we don't have a proposed patch for it yet unless there's on in an
 email I haven't read yet, and the former has been lingering for many
 months without getting appreciably closer to a resolution.

 Yeah, if we do anything about JSONB we are certainly going to need another
 beta release, which means final couldn't be before end of Sept. at the
 very earliest.  Still, it's always been project policy that we ship when
 it's ready, not when the calendar hits some particular point.

Absolutely.

 As for the expanded-mode changes, I thought there was consensus to
 revert that from 9.4.

Me too.  In fact, I think that's been the consensus for many months,
but unless I'm mistaken it ain't done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minmax indexes

2014-08-15 Thread Alvaro Herrera
Fujii Masao wrote:

 I've not read the patch yet. But while testing the feature, I found that
 
 * Brin index cannot be created on CHAR(n) column.
Maybe other data types have the same problem.

Yeah, it's just a matter of adding an opclass for it -- pretty simple
stuff really, because you don't need to write any code, just add a bunch
of catalog entries and an OPCINFO line in mmsortable.c.

Right now there are opclasses for the following types:

int4
numeric
text
date
timestamp with time zone
timestamp
time with time zone
time
char

We can eventually extend to cover all types that have btree opclasses,
but we can do that in a separate commit.  I'm also considering removing
the opclass for time with time zone, as it's a pretty useless type.  I
mostly added the ones that are there as a way to test that it behaved
reasonably in the various cases (pass by val vs. not, variable width vs.
fixed, different alignment requirements)

Of course, the real interesting part is adding a completely different
opclass, such as one that stores bounding boxes.

 * FILLFACTOR cannot be set in brin index.

I hadn't added this one because I didn't think there was much point
previously, but I think it might now be useful to allow same-page
updates.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal to add a QNX 6.5 port to PostgreSQL

2014-08-15 Thread Robert Haas
On Fri, Aug 15, 2014 at 12:02 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 * I think 5..8 are overly complex: we can just set SIGPIPE to SIG_IGN
 (which is its usual setting in the postmaster already) and check for
 EPIPE from the write().

wfm.

 * There might be some benefit to swapping steps 9 and 10; at the
 very least, this would eliminate the need to use O_NONBLOCK while
 re-opening for read.

Also wfm.

 * We talked about combining this technique with a plain file lock
 so that we would have belt-and-suspenders protection, in particular
 something that would have a chance of working across NFS clients.
 This would suggest leaving the fcntl lock in place, ie, don't do
 step 11, and also that the file-to-be-locked *not* have any other
 purpose (which would only increase the risk of losing the lock
 through careless open/close).

I'd be afraid that a secondary mechanism that mostly-but-not-really
works could do more harm by allowing us to miss bugs in the primary,
pipe-based locking mechanism than the good it would accomplish.

 Regular backends don't need to do anything special, except that they
 need to make sure that the file descriptor opened in step 8 gets
 inherited by the right set of processes.  That means that the
 close-on-exec flag should be turned on in the postmaster; except in
 EXEC_BACKEND builds, where it should be turned off but then turned on
 again by child processes before they do anything that might fork.

 Meh.  Do we really want to allow a new postmaster to start if there
 are any processes remaining that were launched by backends?  I'd
 be inclined to just suppress close-on-exec, period.

Seems like a pretty weird and artificial restriction.  Anything that
has done exec() will not be connected to shared memory, so it really
doesn't matter whether it's still alive or not.  People can and do
write extensions that launch processes from PostgreSQL backends via
fork()+exec(), and we've taken pains in the past not to break such
cases.  I don't see a reason to impose now (for no
data-integrity-related reason) the rule that any such processes must
not be daemons.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How about a proper TEMPORARY TABLESPACE?

2014-08-15 Thread Fujii Masao
On Sun, Jun 29, 2014 at 3:19 AM, Matheus de Oliveira
matioli.math...@gmail.com wrote:
 Hi Hackers,

 I have worked on that patch a little more. So now I have functional patch
 (although still WIP) attached. The feature works as following:

 - Added a boolean parameter only_temp_files to pg_tablespace.spcoptions;
 - This parameter can be set to true only during CREATE TABLESPACE, not on
 ALTER TABLESPACE (I have thought of ways of implementing the latter, and I'd
 like to discuss it more latter);
 - On the creation of relations, it is checked if it is a
 temporary-tablespace, and an error occurs when it is and the relation is not
 temporary (temp table or index on a temp table);
 - When a temporary file (either relation file or sort/agg file) is created
 inside a temporary-tablespace, the entire directories structure is created
 on-demand (e.g. if pg_tblspc/oid/TABLESPACE_VERSION_DIRECTORY is
 missing, it is created on demand) it is done on
 OpenTemporaryFileInTablespace, at fd.c (I wonder if shouldn't we do that for
 any tablespace) and on TablespaceCreateDbspace, at tablespace.c.

 I still haven't change documentation, as I think I need some insights about
 the changes. I have some more thoughts about the syntax and I still think
 that TEMP LOCATION syntax is better suited for this patch. First because
 of the nature of the changes I made, it seems more suitable to a column on
 pg_tablespace rather than an option. Second because no ALTER is available
 (so far) and I think it is odd to have an option that can't be changed.
 Third, I think TEMP keyword is more clear and users can be more used to
 it.

 Thoughts?

 I'm going to add the CF app entry next. Could I get some review now or after
 discussion about how things are going (remember I'm a newbie on this, so I'm
 a little lost)?

I got the following warning messages at the compilation.

dbcommands.c:420: warning: implicit declaration of function
'is_tablespace_temp_only'
indexcmds.c:437: warning: implicit declaration of function
'is_tablespace_temp_only'
tablecmds.c:527: warning: implicit declaration of function
'is_tablespace_temp_only'

When I created temporary tablespace and executed pg_dumpall,
it generated the SQL ALTER TABLESPACE hoge SET (only_temp_files=on);.
This is clearly a bug because that SQL is not allowed to be executed.

 ERROR:  this tablespace only allows temporary files

I think that this ERROR message can be improved, for example, for the
database creation case, what about something like the following?

ERROR:  database cannot be created on temporary tablespace
HINT:  temporary tablespace is allowed to store only temporary files.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-15 Thread Josh Berkus
On 08/14/2014 07:24 PM, Tom Lane wrote:

 We can certainly reduce that.  The question was whether it would be
 worth the effort to try.  At this point, with three different test
 data sets having shown clear space savings, I think it is worth
 the effort.  I'll poke into it tomorrow or over the weekend, unless
 somebody beats me to it.

Note that I specifically created that data set to be a worst case: many
top-level keys, no nesting, and small values.  However, I don't think
it's an unrealistic worst case.

Interestingly, even on the unpatched, 1GB table case, the *index* on the
JSONB is only 60MB.  Which shows just how terrific the improvement in
GIN index size/performance is.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-15 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 On 08/14/2014 07:24 PM, Tom Lane wrote:
 We can certainly reduce that.  The question was whether it would be
 worth the effort to try.  At this point, with three different test
 data sets having shown clear space savings, I think it is worth
 the effort.  I'll poke into it tomorrow or over the weekend, unless
 somebody beats me to it.

 Note that I specifically created that data set to be a worst case: many
 top-level keys, no nesting, and small values.  However, I don't think
 it's an unrealistic worst case.

 Interestingly, even on the unpatched, 1GB table case, the *index* on the
 JSONB is only 60MB.  Which shows just how terrific the improvement in
 GIN index size/performance is.

I've been poking at this, and I think the main explanation for your result
is that with more JSONB documents being subject to compression, we're
spending more time in pglz_decompress.  There's no free lunch in that
department: if you want compressed storage it's gonna cost ya to
decompress.  The only way I can get decompression and TOAST access to not
dominate the profile on cases of this size is to ALTER COLUMN SET STORAGE
PLAIN.  However, when I do that, I do see my test patch running about 25%
slower overall than HEAD on an explain analyze select jfield - 'key'
from table type of query with 200-key documents with narrow fields (see
attached perl script that generates the test data).

It seems difficult to improve much on that for this test case.  I put some
logic into findJsonbValueFromContainer to calculate the offset sums just
once not once per binary-search iteration, but that only improved matters
5% at best.  I still think it'd be worth modifying the JsonbIterator code
to avoid repetitive offset calculations, but that's not too relevant to
this test case.

Having said all that, I think this test is something of a contrived worst
case.  More realistic cases are likely to have many fewer keys (so that
speed of the binary search loop is less of an issue) or else to have total
document sizes large enough that inline PLAIN storage isn't an option,
meaning that detoast+decompression costs will dominate.

regards, tom lane

#! /usr/bin/perl

for (my $i = 0; $i  10; $i++) {
print {;
for (my $k = 1; $k = 200; $k++) {
	print ,  if $k  1;
	printf \k%d\: %d, $k, int(rand(10));
}
print }\n;
}

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Supporting Windows SChannel as OpenSSL replacement

2014-08-15 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 08/15/2014 08:16 PM, Jeff Janes wrote:
 Should the ereport DEBUG2 be inside the #ifdef USE_SSL?

 Yeah.

 I've been thinking though, perhaps we should always have the ssl_in_use, 
 peer_cn and peer_cert_valid members in the Port struct. If not compiled 
 with USE_SSL, they would just always be false/NULL. Then we wouldn't 
 need #ifdefs around all the places that check hose fields either.

+1.  This would also make it less risky for add-on code to touch the Port
struct.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-15 Thread Arthur Silva
I'm still getting up to speed on postgres development but I'd like to leave
an opinion.

We should add some sort of versionning to the jsonb format. This can be
explored in the future in many ways.

As for the current problem, we should explore the directory at the end
option. It should improve compression and keep good access performance.

A 4 byte header is sufficient to store the directory offset and some
versionning bits.
 Em 15/08/2014 17:39, Tom Lane t...@sss.pgh.pa.us escreveu:

 Josh Berkus j...@agliodbs.com writes:
  On 08/14/2014 07:24 PM, Tom Lane wrote:
  We can certainly reduce that.  The question was whether it would be
  worth the effort to try.  At this point, with three different test
  data sets having shown clear space savings, I think it is worth
  the effort.  I'll poke into it tomorrow or over the weekend, unless
  somebody beats me to it.

  Note that I specifically created that data set to be a worst case: many
  top-level keys, no nesting, and small values.  However, I don't think
  it's an unrealistic worst case.

  Interestingly, even on the unpatched, 1GB table case, the *index* on the
  JSONB is only 60MB.  Which shows just how terrific the improvement in
  GIN index size/performance is.

 I've been poking at this, and I think the main explanation for your result
 is that with more JSONB documents being subject to compression, we're
 spending more time in pglz_decompress.  There's no free lunch in that
 department: if you want compressed storage it's gonna cost ya to
 decompress.  The only way I can get decompression and TOAST access to not
 dominate the profile on cases of this size is to ALTER COLUMN SET STORAGE
 PLAIN.  However, when I do that, I do see my test patch running about 25%
 slower overall than HEAD on an explain analyze select jfield - 'key'
 from table type of query with 200-key documents with narrow fields (see
 attached perl script that generates the test data).

 It seems difficult to improve much on that for this test case.  I put some
 logic into findJsonbValueFromContainer to calculate the offset sums just
 once not once per binary-search iteration, but that only improved matters
 5% at best.  I still think it'd be worth modifying the JsonbIterator code
 to avoid repetitive offset calculations, but that's not too relevant to
 this test case.

 Having said all that, I think this test is something of a contrived worst
 case.  More realistic cases are likely to have many fewer keys (so that
 speed of the binary search loop is less of an issue) or else to have total
 document sizes large enough that inline PLAIN storage isn't an option,
 meaning that detoast+decompression costs will dominate.

 regards, tom lane



 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




[HACKERS] GIST create index very very slow

2014-08-15 Thread worthy7
CREATE INDEX USING GIST(timerange);

On 1.3 million rows this took only 30 seconds.
on 70 million its already taken over a day. I swear it didn't take this long
on version 9.3


Is there some kind of known bug with GIST? CPU is at 4% or less and ram is
at 150mbs
IO usage is at 100% but most of it is writes? (like 3.5mbps!) which looks
good but actually the size of the disk is only increasing by like 8 BYTES
per second.

This is really odd and I don't want to wait an indefinite amount of time.





--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/GIST-create-index-very-very-slow-tp5815011.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] New Minmax index for geometry data type?

2014-08-15 Thread Stefan Keller
Hi Alvaro

2014-08-15 20:16 GMT+02:00 you answered:
 (...)
 Yeah, it's just a matter of adding an opclass for it -- pretty simple
(...)
 Right now there are opclasses for the following types:
(...)
 Of course, the real interesting part is adding a completely different
 opclass, such as one that stores bounding boxes.

That was exactly what I was going to ask you regarding support of
minmax (block range) index for GEOMETRY types (as defined in PostGIS):

1. What would be the advantage of such a minmax index over GiST besides size?
2. Are the plans to implement this?
3. If no, how large would you estimate the efforts to implement this
in days for an experienced programmer like you?

Yours, Stefan


2014-08-15 20:16 GMT+02:00 Alvaro Herrera alvhe...@2ndquadrant.com:
 Fujii Masao wrote:

 I've not read the patch yet. But while testing the feature, I found that

 * Brin index cannot be created on CHAR(n) column.
Maybe other data types have the same problem.

 Yeah, it's just a matter of adding an opclass for it -- pretty simple
 stuff really, because you don't need to write any code, just add a bunch
 of catalog entries and an OPCINFO line in mmsortable.c.

 Right now there are opclasses for the following types:

 int4
 numeric
 text
 date
 timestamp with time zone
 timestamp
 time with time zone
 time
 char

 We can eventually extend to cover all types that have btree opclasses,
 but we can do that in a separate commit.  I'm also considering removing
 the opclass for time with time zone, as it's a pretty useless type.  I
 mostly added the ones that are there as a way to test that it behaved
 reasonably in the various cases (pass by val vs. not, variable width vs.
 fixed, different alignment requirements)

 Of course, the real interesting part is adding a completely different
 opclass, such as one that stores bounding boxes.

 * FILLFACTOR cannot be set in brin index.

 I hadn't added this one because I didn't think there was much point
 previously, but I think it might now be useful to allow same-page
 updates.

 --
 Álvaro Herrerahttp://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [Bad Attachment] Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-15 Thread Josh Berkus
On 08/15/2014 01:38 PM, Tom Lane wrote:
 I've been poking at this, and I think the main explanation for your result
 is that with more JSONB documents being subject to compression, we're
 spending more time in pglz_decompress.  There's no free lunch in that
 department: if you want compressed storage it's gonna cost ya to
 decompress.  The only way I can get decompression and TOAST access to not
 dominate the profile on cases of this size is to ALTER COLUMN SET STORAGE
 PLAIN.  However, when I do that, I do see my test patch running about 25%
 slower overall than HEAD on an explain analyze select jfield - 'key'
 from table type of query with 200-key documents with narrow fields (see
 attached perl script that generates the test data).

Ok, that probably falls under the heading of acceptable tradeoffs then.

 Having said all that, I think this test is something of a contrived worst
 case.  More realistic cases are likely to have many fewer keys (so that
 speed of the binary search loop is less of an issue) or else to have total
 document sizes large enough that inline PLAIN storage isn't an option,
 meaning that detoast+decompression costs will dominate.

This was intended to be a worst case.  However, I don't think that it's
the last time we'll see the case of having 100 to 200 keys each with
short values.  That case was actually from some XML data which I'd
already converted into a regular table (hence every row having 183
keys), but if JSONB had been available when I started the project, I
might have chosen to store it as JSONB instead.  It occurs to me that
the matching data from a personals website would very much fit the
pattern of having between 50 and 200 keys, each of which has a short value.

So we don't need to *optimize* for that case, but it also shouldn't be
disastrously slow or 300% of the size of comparable TEXT.  Mind you, I
don't find +80% to be disastrously slow (especially not with a space
savings of 60%), so maybe that's good enough.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strncpy is not a safe version of strcpy

2014-08-15 Thread David Rowley
On Thu, Aug 14, 2014 at 4:13 PM, Noah Misch n...@leadboat.com wrote:

 I share your (Kevin's) discomfort with our use of strlcpy().  I wouldn't
 mind
 someone replacing most strlcpy()/snprintf() calls with calls to wrappers
 that
 ereport(ERROR) on truncation.  Though as reliability problems go, this one
 has
 been minor.


Or maybe it would be better to just remove the restriction and just palloc
something of the correct size?
Although, that sounds like a much larger patch. I'd vote that the strlcpy
should be used in the meantime.

Regards

David Rowley


Re: [HACKERS] jsonb format is pessimal for toast compression

2014-08-15 Thread Tom Lane
Arthur Silva arthur...@gmail.com writes:
 We should add some sort of versionning to the jsonb format. This can be
 explored in the future in many ways.

If we end up making an incompatible change to the jsonb format, I would
support taking the opportunity to stick a version ID in there.  But
I don't want to force a dump/reload cycle *only* to do that.

 As for the current problem, we should explore the directory at the end
 option. It should improve compression and keep good access performance.

Meh.  Pushing the directory to the end is just a band-aid, and since it
would still force a dump/reload, it's not a very enticing band-aid.
The only thing it'd really fix is the first_success_by issue, which
we could fix *without* a dump/reload by using different compression
parameters for jsonb.  Moving the directory to the end, by itself,
does nothing to fix the problem that the directory contents aren't
compressible --- and we now have pretty clear evidence that that is a
significant issue.  (See for instance Josh's results that increasing
first_success_by did very little for the size of his dataset.)

I think the realistic alternatives at this point are either to
switch to all-lengths as in my test patch, or to use the hybrid approach
of Heikki's test patch.  IMO the major attraction of Heikki's patch
is that it'd be upward compatible with existing beta installations,
ie no initdb required (but thus, no opportunity to squeeze in a version
identifier either).  It's not showing up terribly well in the performance
tests I've been doing --- it's about halfway between HEAD and my patch on
that extract-a-key-from-a-PLAIN-stored-column test.  But, just as with my
patch, there are things that could be done to micro-optimize it by
touching a bit more code.

I did some quick stats comparing compressed sizes for the delicio.us
data, printing quartiles as per Josh's lead:

all-lengths {440,569,609,655,1257}
Heikki's patch  {456,582,624,671,1274}
HEAD{493,636,684,744,1485}

(As before, this is pg_column_size of the jsonb within a table whose rows
are wide enough to force tuptoaster.c to try to compress the jsonb;
otherwise many of these values wouldn't get compressed.)  These documents
don't have enough keys to trigger the first_success_by issue, so that
HEAD doesn't look too awful, but still there's about an 11% gain from
switching from offsets to lengths.  Heikki's method captures much of
that but not all.

Personally I'd prefer to go to the all-lengths approach, but a large
part of that comes from a subjective assessment that the hybrid approach
is too messy.  Others might well disagree.

In case anyone else wants to do measurements on some more data sets,
attached is a copy of Heikki's patch updated to apply against git tip.

regards, tom lane

diff --git a/src/backend/utils/adt/jsonb_util.c b/src/backend/utils/adt/jsonb_util.c
index 04f35bf..47b2998 100644
*** a/src/backend/utils/adt/jsonb_util.c
--- b/src/backend/utils/adt/jsonb_util.c
*** convertJsonbArray(StringInfo buffer, JEn
*** 1378,1385 
  	 errmsg(total size of jsonb array elements exceeds the maximum of %u bytes,
  			JENTRY_POSMASK)));
  
! 		if (i  0)
  			meta = (meta  ~JENTRY_POSMASK) | totallen;
  		copyToBuffer(buffer, metaoffset, (char *) meta, sizeof(JEntry));
  		metaoffset += sizeof(JEntry);
  	}
--- 1378,1387 
  	 errmsg(total size of jsonb array elements exceeds the maximum of %u bytes,
  			JENTRY_POSMASK)));
  
! 		if (i % JBE_STORE_LEN_STRIDE == 0)
  			meta = (meta  ~JENTRY_POSMASK) | totallen;
+ 		else
+ 			meta |= JENTRY_HAS_LEN;
  		copyToBuffer(buffer, metaoffset, (char *) meta, sizeof(JEntry));
  		metaoffset += sizeof(JEntry);
  	}
*** convertJsonbObject(StringInfo buffer, JE
*** 1430,1440 
  	 errmsg(total size of jsonb array elements exceeds the maximum of %u bytes,
  			JENTRY_POSMASK)));
  
! 		if (i  0)
  			meta = (meta  ~JENTRY_POSMASK) | totallen;
  		copyToBuffer(buffer, metaoffset, (char *) meta, sizeof(JEntry));
  		metaoffset += sizeof(JEntry);
  
  		convertJsonbValue(buffer, meta, pair-value, level);
  		len = meta  JENTRY_POSMASK;
  		totallen += len;
--- 1432,1445 
  	 errmsg(total size of jsonb array elements exceeds the maximum of %u bytes,
  			JENTRY_POSMASK)));
  
! 		if (i % JBE_STORE_LEN_STRIDE == 0)
  			meta = (meta  ~JENTRY_POSMASK) | totallen;
+ 		else
+ 			meta |= JENTRY_HAS_LEN;
  		copyToBuffer(buffer, metaoffset, (char *) meta, sizeof(JEntry));
  		metaoffset += sizeof(JEntry);
  
+ 		/* put value */
  		convertJsonbValue(buffer, meta, pair-value, level);
  		len = meta  JENTRY_POSMASK;
  		totallen += len;
*** convertJsonbObject(StringInfo buffer, JE
*** 1445,1451 
  	 errmsg(total size of jsonb array elements exceeds the maximum of %u bytes,
  			JENTRY_POSMASK)));
  
! 		meta = (meta  ~JENTRY_POSMASK) | totallen;
  		copyToBuffer(buffer, metaoffset, (char *) meta, 

Re: [HACKERS] Removing dependency to wsock32.lib when compiling code on WIndows

2014-08-15 Thread Noah Misch
On Fri, Aug 15, 2014 at 12:49:36AM -0700, Michael Paquier wrote:
 Btw, how do you determine if MSVC is using HAVE_GETADDRINFO? Is it
 decided by the inclusion of getaddrinfo.c in @pgportfiles of
 Mkvdbuild.pm?

src/include/pg_config.h.win32 dictates it, and we must keep @pgportfiles
synchronized with the former's verdict.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] strncpy is not a safe version of strcpy

2014-08-15 Thread Noah Misch
On Sat, Aug 16, 2014 at 10:38:39AM +1200, David Rowley wrote:
 On Thu, Aug 14, 2014 at 4:13 PM, Noah Misch n...@leadboat.com wrote:
 
  I share your (Kevin's) discomfort with our use of strlcpy().  I wouldn't
  mind
  someone replacing most strlcpy()/snprintf() calls with calls to wrappers
  that
  ereport(ERROR) on truncation.  Though as reliability problems go, this one
  has
  been minor.
 
 
 Or maybe it would be better to just remove the restriction and just palloc
 something of the correct size?
 Although, that sounds like a much larger patch. I'd vote that the strlcpy
 should be used in the meantime.

I agree that, in principle, dynamic allocation might be better still.  I also
agree that it would impose more code churn, for an awfully-narrow benefit.

Barring objections, I will commit your latest patch with some comments about
why truncation is harmless for those two particular calls.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Sample LDIF for pg_service.conf no longer works

2014-08-15 Thread Noah Misch
When using pg_service.conf with LDAP, we document[1] the following sample LDIF
for populating the LDAP server:

version:1
dn:cn=mydatabase,dc=mycompany,dc=com
changetype:add
objectclass:top
objectclass:groupOfUniqueNames
cn:mydatabase
uniqueMember:host=dbserver.mycompany.com
uniqueMember:port=5439
uniqueMember:dbname=mydb
uniqueMember:user=mydb_user
uniqueMember:sslmode=require

That presumably worked at one point, but OpenLDAP 2.4.23 and OpenLDAP 2.4.39
both reject it cryptically:

ldap_add: Invalid syntax (21)
additional info: uniqueMember: value #0 invalid per syntax

uniqueMember is specified to bear a distinguished name.  While OpenLDAP does
not verify that uniqueMember values correspond to known DNs, it does verify
that the value syntactically could be a DN.  To give examples, o=foobar is
always accepted, but xyz=foobar is always rejected: xyz is not an LDAP DN
attribute type.  Amid the LDAP core schema, device is the best-fitting
objectClass having the generality required.  Let's convert to that, as
attached.  I have verified that this works end-to-end.

Thanks,
nm

[1] http://www.postgresql.org/docs/devel/static/libpq-ldap.html

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com
commit 957b58f (HEAD)
Author: Noah Misch n...@leadboat.com
AuthorDate: Mon Aug 4 00:03:21 2014 -0400
Commit: Noah Misch n...@leadboat.com
CommitDate: Mon Aug 4 00:03:21 2014 -0400

Make pg_service.conf sample LDIF more portable.

The aboriginal sample placed connection parameters in
groupOfUniqueNames/uniqueMember.  OpenLDAP, at least as early as version
2.4.23, rejects uniqueMember entries that do not conform to the syntax
for a distinguished name.  Use device/description, which is free-form.
Back-patch to 9.4 for web site visibility.

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index db634a8..ef45fbf 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -7043,17 +7043,17 @@ version:1
 dn:cn=mydatabase,dc=mycompany,dc=com
 changetype:add
 objectclass:top
-objectclass:groupOfUniqueNames
+objectclass:device
 cn:mydatabase
-uniqueMember:host=dbserver.mycompany.com
-uniqueMember:port=5439
-uniqueMember:dbname=mydb
-uniqueMember:user=mydb_user
-uniqueMember:sslmode=require
+description:host=dbserver.mycompany.com
+description:port=5439
+description:dbname=mydb
+description:user=mydb_user
+description:sslmode=require
 /programlisting
might be queried with the following LDAP URL:
 programlisting
-ldap://ldap.mycompany.com/dc=mycompany,dc=com?uniqueMember?one?(cn=mydatabase)
+ldap://ldap.mycompany.com/dc=mycompany,dc=com?description?one?(cn=mydatabase)
 /programlisting
   /para
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers