Re: [HACKERS] pg_receivexlog and replication slots

2014-08-17 Thread Fujii Masao
On Mon, Aug 18, 2014 at 3:48 PM, Fujii Masao  wrote:
> On Mon, Aug 18, 2014 at 2:38 PM, Michael Paquier
>  wrote:
>> On Fri, Aug 15, 2014 at 5:17 PM, Michael Paquier
>>  wrote:
>>> Thanks for your review.
>>>
>>> On Fri, Aug 15, 2014 at 12:56 AM,   wrote:
 For consistency with pg_recvlogical, what do you think about --start?
>>> I did not add that, for the sake of backward compatibility: in
>>> pg_recvlogical an action is mandatory, which is currently not the
>>> case for pg_receivexlog.
>>>
 [postgres postgresql-6aa6158]$ make > /dev/null
 streamutil.c: In function 'CreateReplicationSlot':
 streamutil.c:244: warning: suggest parentheses around '&&' within '||'
>>> I see. Here is a rebased patch.
>>
>> Looking more at the code, IDENTIFY_SYSTEM management can be unified
>> into a single function. So I have written a newer version of the patch
>> grouping the IDENTIFY_SYSTEM call into a single function for both
>> utilities, fixing at the same time a couple of other issues:
>> - correct use of TimelineID in code
>> - IDENTIFY_SYSTEM checks were incorrect (even in HEAD). The following
>> check was done but in 9.4 this command returns 4 fields:
>> (PQntuples(res) != 1 || PQnfields(res) < 3)
>> That's not directly related to this patch, but making some corrections
>> is not going to hurt..
>
> Good catch! I found that libpqwalreceiver.c, etc have the same problem.
> It's better to fix this separately. Patch attached.

BTW, the name of the fourth column in IDENTIFY_SYSTEM is "dbname". I don't
much like this name because we usually use "datname" as the name of a column
which identifies the database name. For example, pg_database, pg_stat_activity,
etc. use "datname".

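For reference, the four-column response on 9.4 looks roughly like this over
a replication connection (the values here are illustrative, not from a real
run):

$ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM"
      systemid       | timeline |  xlogpos  | dbname
---------------------+----------+-----------+----------
 6051088924504284638 |        1 | 0/16D6958 | postgres
(1 row)
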
Regards,

-- 
Fujii Masao




Re: [HACKERS] pg_receivexlog and replication slots

2014-08-17 Thread Fujii Masao
On Mon, Aug 18, 2014 at 2:38 PM, Michael Paquier
 wrote:
> On Fri, Aug 15, 2014 at 5:17 PM, Michael Paquier
>  wrote:
>> Thanks for your review.
>>
>> On Fri, Aug 15, 2014 at 12:56 AM,   wrote:
>>> For consistency with pg_recvlogical, what do you think about --start?
>> I did not add that, for the sake of backward compatibility: in
>> pg_recvlogical an action is mandatory, which is currently not the
>> case for pg_receivexlog.
>>
>>> [postgres postgresql-6aa6158]$ make > /dev/null
>>> streamutil.c: In function 'CreateReplicationSlot':
>>> streamutil.c:244: warning: suggest parentheses around '&&' within '||'
>> I see. Here is a rebased patch.
>
> Looking more at the code, IDENTIFY_SYSTEM management can be unified
> into a single function. So I have written a newer version of the patch
> grouping the IDENTIFY_SYSTEM call into a single function for both
> utilities, fixing at the same time a couple of other issues:
> - correct use of TimelineID in code
> - IDENTIFY_SYSTEM checks were incorrect (even in HEAD). The following
> check was done but in 9.4 this command returns 4 fields:
> (PQntuples(res) != 1 || PQnfields(res) < 3)
> That's not directly related to this patch, but making some corrections
> is not going to hurt..

Good catch! I found that libpqwalreceiver.c, etc have the same problem.
It's better to fix this separately. Patch attached.

Regards,

-- 
Fujii Masao
*** a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
--- b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
***
*** 131,137  libpqrcv_identify_system(TimeLineID *primary_tli)
  		"the primary server: %s",
  		PQerrorMessage(streamConn))));
  	}
! 	if (PQnfields(res) < 3 || PQntuples(res) != 1)
  	{
  		int			ntuples = PQntuples(res);
  		int			nfields = PQnfields(res);
--- 131,137 
  		"the primary server: %s",
  		PQerrorMessage(streamConn))));
  	}
! 	if (PQnfields(res) < 4 || PQntuples(res) != 1)
  	{
  		int			ntuples = PQntuples(res);
  		int			nfields = PQnfields(res);
***
*** 140,146  libpqrcv_identify_system(TimeLineID *primary_tli)
  		ereport(ERROR,
  (errmsg("invalid response from primary server"),
   errdetail("Could not identify system: got %d rows and %d fields, expected %d rows and %d or more fields.",
! 		   ntuples, nfields, 3, 1)));
  	}
  	primary_sysid = PQgetvalue(res, 0, 0);
  	*primary_tli = pg_atoi(PQgetvalue(res, 0, 1), 4, 0);
--- 140,146 
  		ereport(ERROR,
  (errmsg("invalid response from primary server"),
   errdetail("Could not identify system: got %d rows and %d fields, expected %d rows and %d or more fields.",
! 		   ntuples, nfields, 4, 1)));
  	}
  	primary_sysid = PQgetvalue(res, 0, 0);
  	*primary_tli = pg_atoi(PQgetvalue(res, 0, 1), 4, 0);
*** a/src/bin/pg_basebackup/pg_basebackup.c
--- b/src/bin/pg_basebackup/pg_basebackup.c
***
*** 1644,1654  BaseBackup(void)
  progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
  		disconnect_and_exit(1);
  	}
! 	if (PQntuples(res) != 1 || PQnfields(res) < 3)
  	{
  		fprintf(stderr,
  _("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d or more fields\n"),
! progname, PQntuples(res), PQnfields(res), 1, 3);
  		disconnect_and_exit(1);
  	}
  	sysidentifier = pg_strdup(PQgetvalue(res, 0, 0));
--- 1644,1654 
  progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
  		disconnect_and_exit(1);
  	}
! 	if (PQntuples(res) != 1 || PQnfields(res) < 4)
  	{
  		fprintf(stderr,
  _("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d or more fields\n"),
! progname, PQntuples(res), PQnfields(res), 1, 4);
  		disconnect_and_exit(1);
  	}
  	sysidentifier = pg_strdup(PQgetvalue(res, 0, 0));
*** a/src/bin/pg_basebackup/pg_receivexlog.c
--- b/src/bin/pg_basebackup/pg_receivexlog.c
***
*** 290,300  StreamLog(void)
  progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
  		disconnect_and_exit(1);
  	}
! 	if (PQntuples(res) != 1 || PQnfields(res) < 3)
  	{
  		fprintf(stderr,
  _("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d or more fields\n"),
! progname, PQntuples(res), PQnfields(res), 1, 3);
  		disconnect_and_exit(1);
  	}
  	servertli = atoi(PQgetvalue(res, 0, 1));
--- 290,300 
  progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
  		disconnect_and_exit(1);
  	}
! 	if (PQntuples(res) != 1 || PQnfields(res) < 4)
  	{
  		fprintf(stderr,
  _("%s: could not identify system: got %d rows and %d fields, expected %d rows and %d or more fields\n"),
! progname, PQntuples(res), PQnfields(res), 1, 4);
  		disconnect_and_exit(1);
  	}
  	servertli = atoi(PQgetvalue(res, 0, 1));
*** a/src/bin/pg_basebackup/receivelog.c
--- b/src/bin/pg_basebackup/receivelog.c
***
*** 499,509  ReceiveXlogStream(PGconn *conn, XLogRecPtr startpos, uint32 timeline,
  			PQclear(res);
  			return false;
  		}
! 		if (PQntuples(res) != 1 

Re: [HACKERS] Index-only scans for GIST

2014-08-17 Thread Anastasia Lubennikova
Updated patch
* Compiler, merge and regression failures checked
* Regression tests were improved
* GiST and amcanreturn docs updated
-- 
Best regards,
Lubennikova Anastasia


indexonlyscan_gist2.patch
Description: Binary data


indexonlyscan_gist_docs.patch
Description: Binary data



Re: [HACKERS] proposal for 9.5: monitoring lock time for slow queries

2014-08-17 Thread Pavel Stehule
2014-08-18 7:42 GMT+02:00 Alvaro Herrera :

> Pavel Stehule wrote:
> > 2014-08-13 15:22 GMT+02:00 MauMau :
>
> > > I didn't mean performance statistics data to be stored in database
> > > tables.
> > > I just meant:
> > >
> > > * pg_stat_system_events is a view to show data on memory, which returns
> > > one row for each event across the system.  This is similar to
> > > V$SYSTEM_EVENT in Oracle.
> > >
> > > * pg_stat_session_events is a view to show data on memory, which
> > > returns
> > > one row for each event on one session.  This is similar to
> > > V$SESSION_EVENT
> > > in Oracle.
> > >
> > > * The above views represent the current accumulated data like other
> > > pg_stat_xxx views.
> > >
> > > * EXPLAIN ANALYZE and auto_explain shows all events for one query.  The
> > > lock waits you are trying to record in the server log is one of the
> > > events.
> >
> > I am a little bit sceptical about a memory-only structure. Is this
> > concept acceptable for committers?
>
> Is this supposed to be session-local data, or is it visible from remote
> sessions too?  How durable is it supposed to be?  Keep in mind that in
> case of a crash, all pgstats data is erased.
>

Surely it should be visible from all sessions, and it should last at least
48 hours.

I have no problem with pgstats being cleaned after a crash - that is the
cost of keeping the overhead minimal. And on server-grade hardware crashes
are (or should be) rare.
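
For what it's worth, a purely hypothetical sketch of the kind of view MauMau
describes - every column name below is invented, this is not a proposal for
the actual catalog shape:

=# SELECT * FROM pg_stat_system_events;
     event_name      | total_waits | total_wait_time
---------------------+-------------+-----------------
 lwlock:WALWriteLock |       10234 |    00:00:12.345
 lock:relation       |         208 |    00:00:01.002
(2 rows)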

Regards

Pavel


>
> --
> Álvaro Herrera                http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] delta relations in AFTER triggers

2014-08-17 Thread Amit Khandekar
On 15 August 2014 04:04, Kevin Grittner  wrote:
> Amit Khandekar  wrote:
>
 The execution level itself was almost trivial; it's getting the
 tuplestore reference through the parse analysis and planning
 phases that is painful for me.
>>> I am not sure why you think we would need to refer the
>>> tuplestore in the parse analysis and planner phases. It seems
>>> that we would need them only in execution phase. Or may be I
>>> didn't understand your point.
>> Ah I think I understand now. That might be because you are
>> thinking of having an infrastructure common to triggers and
>> materialized views, right?
>
> Well, it's more immediate than that.  The identifiers in the
> trigger function are not resolved to particular objects until there
> is a request to fire the trigger.
Ah ok, you are talking about changes specific to the PL language
handlers. Yes, I agree that in the plpgsql parser (and in any PL
handler), we need to parse such table references in the SQL construct,
and transform them into something else.
> At that time the parse analysis
> needs to find the name defined somewhere.  It's not defined in the
> catalogs like a table or view, and it's not defined in the query
> itself like a CTE or VALUES clause.  The names specified in trigger
> creation must be recognized as needing to resolve to the new
> TuplestoreScan, and it needs to match those to the tuplestores
> themselves.
One approach that comes to mind is to transform such transition
table references into a RangeFunction reference in the plpgsql
parser/lexer. This RangeFunction would point to a set-returning
catalog function that would return rows from the delta tuplestore. So
the tuplestore itself would remain a black box. Once we have such a
function, any language handler can re-use the same interface.
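
A minimal sketch of what such a set-returning function could look like,
using the ordinary materialize-mode SRF protocol (the delta-lookup call is
invented for illustration; nothing like it exists yet):

#include "postgres.h"
#include "funcapi.h"
#include "nodes/execnodes.h"
#include "utils/tuplestore.h"

/* hypothetical: returns the delta tuplestore (and its tuple descriptor)
 * for the trigger currently being fired */
extern Tuplestorestate *get_current_trigger_delta(TupleDesc *tupdesc);

Datum
trigger_delta_rows(PG_FUNCTION_ARGS)
{
	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;

	/* hand the executor the pre-built tuplestore directly */
	rsinfo->returnMode = SFRM_Materialize;
	rsinfo->setResult = get_current_trigger_delta(&rsinfo->setDesc);

	return (Datum) 0;
}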

> Row counts, costing, etc. needs to be provided so the
> optimizer can pick a good plan in what might be a complex query
> with many options.
I am also not sure about the costing, but I guess it may be possible
to supply some costs to the FunctionScan plan node.




Re: [HACKERS] psql \watch versus \timing

2014-08-17 Thread Michael Paquier
On Thu, Aug 14, 2014 at 11:10 PM, Fujii Masao  wrote:
> Attached patch changes \watch so that it displays how long the query takes
> if \timing is enabled.
>
> I didn't refactor PSQLexec and SendQuery into one routine because
> the contents of those functions are not quite the same. I'm not sure how much
> it's worth doing that refactoring. Anyway this feature is quite useful
> even without that refactoring, I think.

The patch applies correctly and does what it is meant to do:
=# \timing
Timing is on.
=# select 1;
 ?column?
----------
        1
(1 row)

Time: 0.407 ms
=# \watch 1
Watch every 1s     Mon Aug 18 15:17:41 2014

 ?column?
----------
        1
(1 row)

Time: 0.397 ms
Watch every 1s     Mon Aug 18 15:17:42 2014

 ?column?
----------
        1
(1 row)

Time: 0.615 ms

Refactoring it would be worth it in the long term... And printing
the timing in the PSQLexec code path duplicates what SendQuery already
does, so that's doing the same thing twice IMHO.

Now, looking at the patch: introducing the new function
PSQLexecInternal with an additional parameter to control the timing is
the correct choice if we take the non-refactoring route. But I don't think
that printing the time outside PSQLexecInternal is consistent with
SendQuery. Why not simply control the timing with a boolean flag and
print it directly in PSQLexecInternal?
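
To illustrate, a minimal sketch of that shape - PSQLexecInternal is the
patch's function, but this body is mine, not the actual code, and it omits
the transaction and cancel handling the real function needs:

#include "postgres_fe.h"
#include "settings.h"					/* pset */
#include "portability/instr_time.h"

static PGresult *
PSQLexecInternal(const char *query, bool start_xact, bool timing)
{
	PGresult   *res;
	instr_time	before,
				after;

	if (timing)
		INSTR_TIME_SET_CURRENT(before);

	res = PQexec(pset.db, query);

	/* print the elapsed time here, exactly as SendQuery does */
	if (timing)
	{
		INSTR_TIME_SET_CURRENT(after);
		INSTR_TIME_SUBTRACT(after, before);
		printf(_("Time: %.3f ms\n"), INSTR_TIME_GET_MILLISEC(after));
	}

	return res;
}
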
Regards,
-- 
Michael




Re: [HACKERS] proposal for 9.5: monitoring lock time for slow queries

2014-08-17 Thread Alvaro Herrera
Pavel Stehule wrote:
> 2014-08-13 15:22 GMT+02:00 MauMau :

> > I didn't mean performance statistics data to be stored in database tables.
> > I just meant:
> >
> > * pg_stat_system_events is a view to show data on memory, which returns
> > one row for each event across the system.  This is similar to
> > V$SYSTEM_EVENT in Oracle.
> >
> > * pg_stat_session_events is a view to show data on memory, which returns
> > one row for each event on one session.  This is similar to V$SESSION_EVENT
> > in Oracle.
> >
> > * The above views represent the current accumulated data like other
> > pg_stat_xxx views.
> >
> > * EXPLAIN ANALYZE and auto_explain shows all events for one query.  The
> > lock waits you are trying to record in the server log is one of the events.
> 
> I am a little bit sceptical about a memory-only structure. Is this
> concept acceptable for committers?

Is this supposed to be session-local data, or is it visible from remote
sessions too?  How durable is it supposed to be?  Keep in mind that in
case of a crash, all pgstats data is erased.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_receivexlog and replication slots

2014-08-17 Thread Michael Paquier
On Fri, Aug 15, 2014 at 5:17 PM, Michael Paquier
 wrote:
> Thanks for your review.
>
> On Fri, Aug 15, 2014 at 12:56 AM,   wrote:
>> For consistency with pg_recvlogical, what do you think about --start?
> I did not add that, for the sake of backward compatibility: in
> pg_recvlogical an action is mandatory, which is currently not the
> case for pg_receivexlog.
>
>> [postgres postgresql-6aa6158]$ make > /dev/null
>> streamutil.c: In function 'CreateReplicationSlot':
>> streamutil.c:244: warning: suggest parentheses around '&&' within '||'
> I see. Here is a rebased patch.

Looking more at the code, IDENTIFY_SYSTEM management can be unified
into a single function. So I have written a newer version of the patch
grouping the IDENTIFY_SYSTEM call into a single function for both
utilities, fixing at the same time a couple of other issues:
- correct use of TimelineID in code
- IDENTIFY_SYSTEM checks were incorrect (even in HEAD). The following
check was done but in 9.4 this command returns 4 fields:
(PQntuples(res) != 1 || PQnfields(res) < 3)
That's not directly related to this patch, but making some corrections
is not going to hurt..
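
For reference, the unified helper boils down to something like this (a
sketch with illustrative names and minimal error handling; see the attached
patch for the real code):

#include "postgres_fe.h"
#include "libpq-fe.h"
#include "access/xlogdefs.h"		/* TimeLineID */
#include "common/fe_memutils.h"		/* pg_strdup */

static bool
RunIdentifySystem(PGconn *conn, char **sysid, TimeLineID *tli)
{
	PGresult   *res = PQexec(conn, "IDENTIFY_SYSTEM");

	/* in 9.4 this command returns one row with 4 fields */
	if (PQresultStatus(res) != PGRES_TUPLES_OK ||
		PQntuples(res) != 1 || PQnfields(res) < 4)
	{
		PQclear(res);
		return false;
	}

	if (sysid)
		*sysid = pg_strdup(PQgetvalue(res, 0, 0));
	if (tli)
		*tli = (TimeLineID) atoi(PQgetvalue(res, 0, 1));

	PQclear(res);
	return true;
}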

Regards,
-- 
Michael
From 160b00e595dab739aa4da7d18a81aff38d0a837f Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Mon, 18 Aug 2014 14:32:44 +0900
Subject: [PATCH] Add support for physical slot creation/deletion in
 pg_receivexlog

A physical slot can be created with --create and dropped with --drop.
In both cases --slot is needed. Code for replication slot creation and
removal is refactored using what was already available in pg_recvlogical;
the same is done with IDENTIFY_SYSTEM to simplify the whole set.
---
 doc/src/sgml/ref/pg_receivexlog.sgml   |  29 +++
 src/bin/pg_basebackup/pg_receivexlog.c | 108 -
 src/bin/pg_basebackup/pg_recvlogical.c | 119 +---
 src/bin/pg_basebackup/streamutil.c | 140 +
 src/bin/pg_basebackup/streamutil.h |   9 +++
 5 files changed, 265 insertions(+), 140 deletions(-)

diff --git a/doc/src/sgml/ref/pg_receivexlog.sgml b/doc/src/sgml/ref/pg_receivexlog.sgml
index 5916b8f..7e46005 100644
--- a/doc/src/sgml/ref/pg_receivexlog.sgml
+++ b/doc/src/sgml/ref/pg_receivexlog.sgml
@@ -72,6 +72,35 @@ PostgreSQL documentation
   Options
 

pg_receivexlog can run in one of the two following
modes, which control the physical replication slot:
+
+
+
+ 
+  --create
+  
+   
+Create a new physical replication slot with the name specified in
+--slot, then exit.
+   
+  
+ 
+
+ 
+  --drop
+  
+   
+Drop the replication slot with the name specified
+in --slot, then exit.
+   
+  
+ 
+
+
+   
+
+   
 The following command-line options control the location and format of the
 output.
 
diff --git a/src/bin/pg_basebackup/pg_receivexlog.c b/src/bin/pg_basebackup/pg_receivexlog.c
index a8b9ad3..a56cd9e 100644
--- a/src/bin/pg_basebackup/pg_receivexlog.c
+++ b/src/bin/pg_basebackup/pg_receivexlog.c
@@ -38,6 +38,8 @@ static int	noloop = 0;
 static int	standby_message_timeout = 10 * 1000;		/* 10 sec = default */
 static int	fsync_interval = 0; /* 0 = default */
 static volatile bool time_to_abort = false;
+static bool do_create_slot = false;
+static bool do_drop_slot = false;
 
 
 static void usage(void);
@@ -78,6 +80,9 @@ usage(void)
 	printf(_("  -w, --no-password  never prompt for password\n"));
 	printf(_("  -W, --password force password prompt (should happen automatically)\n"));
 	printf(_("  -S, --slot=SLOTNAMEreplication slot to use\n"));
+	printf(_("\nOptional actions:\n"));
+	printf(_("  --create   create a new replication slot (for the slot's name see --slot)\n"));
+	printf(_("  --drop drop the replication slot (for the slot's name see --slot)\n"));
 	printf(_("\nReport bugs to .\n"));
 }
 
@@ -253,21 +258,10 @@ FindStreamingStart(uint32 *tli)
 static void
 StreamLog(void)
 {
-	PGresult   *res;
 	XLogRecPtr	startpos;
-	uint32		starttli;
+	TimeLineID	starttli;
 	XLogRecPtr	serverpos;
-	uint32		servertli;
-	uint32		hi,
-lo;
-
-	/*
-	 * Connect in replication mode to the server
-	 */
-	conn = GetConnection();
-	if (!conn)
-		/* Error message already written in GetConnection() */
-		return;
+	TimeLineID	servertli;
 
 	if (!CheckServerVersionForStreaming(conn))
 	{
@@ -280,33 +274,11 @@ StreamLog(void)
 	}
 
 	/*
-	 * Run IDENTIFY_SYSTEM so we can get the timeline and current xlog
-	 * position.
+	 * Identify server, obtaining start LSN position and current timeline ID
+	 * at the same time.
 	 */
-	res = PQexec(conn, "IDENTIFY_SYSTEM");
-	if (PQresultStatus(res) != PGRES_TUPLES_OK)
-	{
-		fprintf(stderr, _("%s: could not send replication command \"%s\": %s"),
-progname, "IDENTIFY_SYSTEM", PQerrorMessage(conn));
-		disconnect_and_exit(1);
-	}
-	if (PQntuples(res) != 1 || PQnfields(res) < 3)
-	{
-		fprintf(stderr,
-_("%s: could not identify system:

Re: [HACKERS] [PATCH] Incremental backup: add backup profile to base backup

2014-08-17 Thread Alvaro Herrera
Marco Nenciarini wrote:

> To calculate the md5 checksum I've used the md5 code present in pgcrypto
> contrib as the code in src/include/libpq/md5.h is not suitable for large
> files. Since a core feature cannot depend on a piece of contrib, I've
> moved the files
> 
> contrib/pgcrypto/md5.c
> contrib/pgcrypto/md5.h
> 
> to
> 
> src/backend/utils/hash/md5.c
> src/include/utils/md5.h
> 
> changing the pgcrypto extension to use them.

We already have the FNV checksum implementation in the backend -- can't
we use that one for this and avoid messing with MD5?

(I don't think we're looking for a cryptographic hash here.  Am I wrong?)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-08-17 Thread Fujii Masao
On Sat, Aug 16, 2014 at 6:51 PM, Rahila Syed  wrote:
>>So, you're compressing backup blocks one by one. I wonder if that's the
>>right idea and if we shouldn't instead compress all of them in one run to
>>increase the compression ratio
>
> Please find attached patch for compression of all blocks of a record
> together.
>
> Following are the measurement results:
>
>
> Benchmark:
>
> Scale : 16
> Command  :java JR  /home/postgres/jdbcrunner-1.2/scripts/tpcc.js  -sleepTime
> 550,250,250,200,200
>
> Warmup time  : 1 sec
> Measurement time : 900 sec
> Number of tx types   : 5
> Number of agents : 16
> Connection pool size : 16
> Statement cache size : 40
> Auto commit  : false
>
>
> Checkpoint segments: 1024
> Checkpoint timeout: 5 mins
>
>                           Multiple Blocks in one run   Single Block in one run
>
> OFF      Bytes saved      0                            0
>          WAL generated    1265150984 (~1265MB)         1264771760 (~1265MB)
>          % Compression    NA                           NA
>
> LZ4      Bytes saved      215215079 (~215MB)           285675622 (~286MB)
>          WAL generated    125118783 (~1251MB)          1329031918 (~1329MB)
>          % Compression    17.2%                        21.49%
>
> Snappy   Bytes saved      203705959 (~204MB)           271009408 (~271MB)
>          WAL generated    1254505415 (~1254MB)         1329628352 (~1330MB)
>          % Compression    16.23%                       20.38%
>
> pglz     Bytes saved      155910177 (~156MB)           182804997 (~182MB)
>          WAL generated    1259773129 (~1260MB)         1286670317 (~1287MB)
>          % Compression    12.37%                       14.21%
>
> As per the measurement results of this benchmark, compression of multiple
> blocks didn't improve the compression ratio over compression of a single block.

According to the measurement results, the amount of WAL generated in
"Multiple Blocks in one run" is smaller than that in "Single Block in one run".
So ISTM that compressing multiple blocks in one run can improve
the compression ratio. Am I missing something?

Regards,

-- 
Fujii Masao




Re: [HACKERS] option -T in pg_basebackup doesn't work on windows

2014-08-17 Thread Michael Paquier
On Mon, Aug 18, 2014 at 11:51 AM, Amit Kapila  wrote:
> On Mon, Aug 18, 2014 at 7:08 AM, Michael Paquier 
> wrote:
>>
>> On Mon, Aug 18, 2014 at 9:37 AM, Peter Eisentraut  wrote:
>> > It's not ready for committer if the current patch does not apply.
>> FWIW, the latest version sent by Amit here applies correctly:
>>
>> http://www.postgresql.org/message-id/caa4ek1+cc9rb1s9q4+nsokfas1yufvfgfxuhxy_6wlbq1re...@mail.gmail.com
>> I haven't tested it myself though.
>>
>> However, independently of this patch, and as pointed out by MauMau, the
>> code that has been committed in fb05f3c defines the tablespace path
>> incorrectly; this:
>> psprintf("%s/pg_tblspc/%d", basedir, oid);
>> should be this:
>> psprintf("%s/pg_tblspc/%u", basedir, oid);
>> I am separating this fix (which should be backpatched to REL9_4_STABLE
>> as well) in the attached patch, if this helps.
>
> I think the patch provided by me needs to be back-patched to
> 9.4, as this is a new option introduced in 9.4 which is not working
> on Windows.
Yep. Definitely. This will fix both issues at the same time.
-- 
Michael




Re: [HACKERS] Why not ISO 8601 format for date values rendered into JSON?

2014-08-17 Thread Andrew Dunstan


On 08/17/2014 10:50 PM, Andrew Dunstan wrote:


On 08/17/2014 09:53 PM, Tom Lane wrote:


OK.  I think I can fix it, if you don't have time.





[offlist]

Thanks. FYI I am still recovering from treatment for prostate cancer I 
had not long after pgcon ... it's taken more out of me than I
expected, so time is limited.






Darn it. Well, I guess everyone knows now.  *sigh*

cheers

andrew





Re: [HACKERS] option -T in pg_basebackup doesn't work on windows

2014-08-17 Thread Amit Kapila
On Mon, Aug 18, 2014 at 7:08 AM, Michael Paquier 
wrote:
>
> On Mon, Aug 18, 2014 at 9:37 AM, Peter Eisentraut  wrote:
> > It's not ready for committer if the current patch does not apply.
> FWIW, the latest version sent by Amit here applies correctly:
>
http://www.postgresql.org/message-id/caa4ek1+cc9rb1s9q4+nsokfas1yufvfgfxuhxy_6wlbq1re...@mail.gmail.com
> I haven't tested it myself though.
>
> However, independently of this patch, and as pointed out by MauMau, the
> code that has been committed in fb05f3c defines the tablespace path
> incorrectly; this:
> psprintf("%s/pg_tblspc/%d", basedir, oid);
> should be this:
> psprintf("%s/pg_tblspc/%u", basedir, oid);
> I am separating this fix (which should be backpatched to REL9_4_STABLE
> as well) in the attached patch, if this helps.

I think the patch provided by me needs to be back-patched to
9.4, as this is a new option introduced in 9.4 which is not working
on Windows.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Why not ISO 8601 format for date values rendered into JSON?

2014-08-17 Thread Andrew Dunstan


On 08/17/2014 09:53 PM, Tom Lane wrote:


OK.  I think I can fix it, if you don't have time.





[offlist]

Thanks. FYI I am still recovering from treatment for prostate cancer I 
had not long after pgcon ... it's taken more out of me than I expected,
so time is limited.


cheers

andrew






Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index

2014-08-17 Thread Fujii Masao
On Sat, Aug 16, 2014 at 4:23 PM, Sawada Masahiko  wrote:
> On Fri, Aug 8, 2014 at 11:45 PM, Fujii Masao  wrote:
>> On Fri, Aug 8, 2014 at 11:43 PM, Fujii Masao  wrote:
>>> On Mon, Apr 14, 2014 at 11:40 PM, Tom Lane  wrote:
 Fujii Masao  writes:
> On Tue, Apr 1, 2014 at 1:41 AM, Robert Haas  wrote:
>> Should we try to install some hack around fastupdate for 9.4?  I fear
>> the divergence between reasonable values of work_mem and reasonable
>> sizes for that list is only going to continue to get bigger.  I'm sure
>> there's somebody out there who has work_mem = 16GB, and stuff like
>> 263865a48973767ce8ed7b7788059a38a24a9f37 is only going to increase the
>> appeal of large values.

> Controlling the threshold of the pending list size only by a GUC doesn't
> seem reasonable. Users may want to increase the threshold only for a
> GIN index which is updated heavily, and decrease it otherwise. So
> I think that it's better to add a new storage parameter for GIN indexes
> to control the threshold, or both a storage parameter and a GUC.

 Yeah, -1 for a GUC.  A GIN-index-specific storage parameter seems more
 appropriate.
>>>
>>> The attached patch introduces the GIN index storage parameter
>>> "PENDING_LIST_CLEANUP_SIZE" which specifies the maximum size of
>>> the GIN pending list. If it's not set, work_mem is used as that maximum
>>> size instead. So this patch doesn't break existing applications that
>>> currently use work_mem as the threshold for the cleanup operation of
>>> the pending list; they simply don't set PENDING_LIST_CLEANUP_SIZE.
>>>
>>> This is an index storage parameter, and allows us to specify each
>>> threshold per GIN index. So, for example, we can increase the threshold
>>> only for the GIN index which can be updated heavily, and decrease it 
>>> otherwise.
>>>
>>> This patch uses another patch that I proposed (*1) as an infrastructure.
>>> Please apply that infrastructure patch first if you apply this patch.
>>>
>>> (*1)
>>> http://www.postgresql.org/message-id/CAHGQGwEanQ_e8WLHL25=bm_8z5zkyzw0k0yir+kdmv2hgne...@mail.gmail.com
>>>
>>> Regards,
>>>
>>> --
>>> Fujii Masao
>>
>> Sorry, I forgot to attach the patch. This time, attached.
>>
>
> I think that this patch should be rebased.
> It contains garbage code.

Thanks for reviewing the patch! ISTM that I failed to make the patch from
my git repository... Attached is the rebased version.
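
For illustration, the per-index threshold would be set roughly like this
(table/index names invented; I'm assuming the value is in kilobytes, like
work_mem):

=# CREATE INDEX docs_fts_idx ON docs USING gin (to_tsvector('english', body))
-#   WITH (pending_list_cleanup_size = 4096);
CREATE INDEX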

Regards,

-- 
Fujii Masao
diff --git a/doc/src/sgml/gin.sgml b/doc/src/sgml/gin.sgml
index 80a578d..24c8dc1 100644
--- a/doc/src/sgml/gin.sgml
+++ b/doc/src/sgml/gin.sgml
@@ -728,8 +728,9 @@
from the indexed item). As of PostgreSQL 8.4,
GIN is capable of postponing much of this work by inserting
new tuples into a temporary, unsorted list of pending entries.
-   When the table is vacuumed, or if the pending list becomes too large
-   (larger than ), the entries are moved to the
+   When the table is vacuumed, or if the pending list becomes larger than
+   PENDING_LIST_CLEANUP_SIZE (or
+if not set), the entries are moved to the
main GIN data structure using the same bulk insert
techniques used during initial index creation.  This greatly improves
GIN index update speed, even counting the additional
@@ -812,18 +813,27 @@
   
 
   
-   
+   PENDING_LIST_CLEANUP_SIZE and
+   

 
  During a series of insertions into an existing GIN
  index that has FASTUPDATE enabled, the system will clean up
  the pending-entry list whenever the list grows larger than
- work_mem.  To avoid fluctuations in observed response time,
- it's desirable to have pending-list cleanup occur in the background
- (i.e., via autovacuum).  Foreground cleanup operations can be avoided by
- increasing work_mem or making autovacuum more aggressive.
- However, enlarging work_mem means that if a foreground
- cleanup does occur, it will take even longer.
+ PENDING_LIST_CLEANUP_SIZE (if not set, work_mem
+ is used as that threshold, instead). To avoid fluctuations in observed
+ response time, it's desirable to have pending-list cleanup occur in the
+ background (i.e., via autovacuum).  Foreground cleanup operations
+ can be avoided by increasing PENDING_LIST_CLEANUP_SIZE
+ (or work_mem) or making autovacuum more aggressive.
+ However, enlarging the threshold of the cleanup operation means that
+ if a foreground cleanup does occur, it will take even longer.
+
+
+ PENDING_LIST_CLEANUP_SIZE is an index storage parameter,
+ and allows each GIN index to have its own cleanup threshold.
+ For example, it's possible to increase the threshold only for the GIN
+ index which can be updated heavily, and decrease it otherwise.
 

   
diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml
index e469b17..1589812 100644
--- a/doc/src/sgml/ref/create_index.sgml
+++ b/doc/src/sgml/re

Re: [HACKERS] replication commands and index terms

2014-08-17 Thread Fujii Masao
On Fri, Aug 15, 2014 at 11:25 AM, Fujii Masao  wrote:
> On Fri, Aug 15, 2014 at 4:59 AM, Robert Haas  wrote:
>> On Thu, Aug 14, 2014 at 12:59 PM, Fujii Masao  wrote:
>>> Since I sometimes try to search the replication command in the index,
>>> I'd feel inclined to expose all those commands as index terms...
>>
>> +1.
>
> Attached patch exposes replication commands as index terms.
> Barring any objection, I will commit this.

Done.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Why not ISO 8601 format for date values rendered into JSON?

2014-08-17 Thread Tom Lane
Andrew Dunstan  writes:
> On 08/17/2014 08:42 PM, Tom Lane wrote:
>> I was just going over the release notes, and noticed the bit about
>> timestamp and timestamptz now being rendered in a fixed ISO-8601-compliant
>> format rather than whatever random DateStyle is in use.  That's fine,
>> but I wonder why the same approach wasn't applied to type date?

> But yes, I agree it should be fixed. Whatever we output should be 
> suitable as input for the string-argument constructor of class Date.

OK.  I think I can fix it, if you don't have time.

regards, tom lane




Re: [HACKERS] Why not ISO 8601 format for date values rendered into JSON?

2014-08-17 Thread Andrew Dunstan


On 08/17/2014 08:42 PM, Tom Lane wrote:

I was just going over the release notes, and noticed the bit about
timestamp and timestamptz now being rendered in a fixed ISO-8601-compliant
format rather than whatever random DateStyle is in use.  That's fine,
but I wonder why the same approach wasn't applied to type date?

regression=# set datestyle to postgres;
SET

regression=# select row_to_json(row(now()));
                row_to_json
--------------------------------------------
 {"f1":"2014-08-17T20:34:54.424237-04:00"}
(1 row)

regression=# select row_to_json(row(current_date));
     row_to_json
---------------------
 {"f1":"08-17-2014"}
(1 row)

Doesn't seem real consistent ...





Good point. Probably because I didn't get a complaint about it, which in 
turn is probably because JavaScript's builtin Date class is in fact 
(from our POV) more or less a timestamp(tz) type.


See 



But yes, I agree it should be fixed. Whatever we output should be 
suitable as input for the string-argument constructor of class Date.
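
Presumably that means always emitting the ISO form regardless of DateStyle,
i.e. something like {"f1":"2014-08-17"} for the example above.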


cheers

andrew




Re: [HACKERS] option -T in pg_basebackup doesn't work on windows

2014-08-17 Thread Michael Paquier
On Mon, Aug 18, 2014 at 9:37 AM, Peter Eisentraut  wrote:
> It's not ready for committer if the current patch does not apply.
FWIW, the latest version sent by Amit here applies correctly:
http://www.postgresql.org/message-id/caa4ek1+cc9rb1s9q4+nsokfas1yufvfgfxuhxy_6wlbq1re...@mail.gmail.com
I haven't tested it myself though.

However, independently of this patch, and as pointed out by MauMau, the
code that has been committed in fb05f3c defines the tablespace path
incorrectly; this:
psprintf("%s/pg_tblspc/%d", basedir, oid);
should be this:
psprintf("%s/pg_tblspc/%u", basedir, oid);
I am separating this fix (which should be backpatched to REL9_4_STABLE
as well) in the attached patch, if this helps.
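
A tiny standalone illustration of why this matters (the value is made up,
but any OID above 2^31-1 misbehaves the same way with %d):

#include <stdio.h>

typedef unsigned int Oid;		/* as in PostgreSQL's c.h */

int
main(void)
{
	Oid			oid = 3000000000u;	/* a valid OID, but above INT_MAX */

	printf("%d\n", oid);	/* prints -1294967296: wrong symlink path */
	printf("%u\n", oid);	/* prints 3000000000: correct */
	return 0;
}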
Regards,
-- 
Michael
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 3d26e22..ef7cd94 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1119,7 +1119,7 @@ update_tablespace_symlink(Oid oid, const char *old_dir)
 
 	if (strcmp(old_dir, new_dir) != 0)
 	{
-		char	   *linkloc = psprintf("%s/pg_tblspc/%d", basedir, oid);
+		char	   *linkloc = psprintf("%s/pg_tblspc/%u", basedir, oid);
 
 		if (unlink(linkloc) != 0 && errno != ENOENT)
 		{



[HACKERS] Why not ISO 8601 format for date values rendered into JSON?

2014-08-17 Thread Tom Lane
I was just going over the release notes, and noticed the bit about
timestamp and timestamptz now being rendered in a fixed ISO-8601-compliant
format rather than whatever random DateStyle is in use.  That's fine,
but I wonder why the same approach wasn't applied to type date?

regression=# set datestyle to postgres;
SET

regression=# select row_to_json(row(now()));
                row_to_json
--------------------------------------------
 {"f1":"2014-08-17T20:34:54.424237-04:00"}
(1 row)

regression=# select row_to_json(row(current_date));
     row_to_json
---------------------
 {"f1":"08-17-2014"}
(1 row)

Doesn't seem real consistent ...

regards, tom lane




Re: [HACKERS] option -T in pg_basebackup doesn't work on windows

2014-08-17 Thread Peter Eisentraut
On 8/16/14 8:46 AM, Amit Kapila wrote:
> On Fri, Aug 15, 2014 at 1:03 PM, MauMau  wrote:
>>
>> Thank you.  The code looks correct.  I confirmed that the
> pg_basebackup could relocate the tablespace directory on Windows.
>>
>> I marked this patch as ready for committer.
> 
> Thanks for the review.

It's not ready for committer if the current patch does not apply.






Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak

2014-08-17 Thread Peter Eisentraut
On 8/17/14 5:19 PM, Stephen Frost wrote:
> Alvaro, all,
> 
> * Stephen Frost (sfr...@snowman.net) wrote:
>> As mentioned, I'll add this to the ALTER TABLE documentation and remove
>> it from the TABLESPACE docs.  That's not done yet but I should have time
>> in the next few days to get that done also and will then commit it all
>> to master and back-patch to 9.4, barring objections.
> 
> Here's a first pass with the documentation changes included.  Feedback
> welcome, and I'll review it again and plan to commit it on Tuesday.

One thing that is not clear from this (before or after) is whether "all
{tables|indexes|etc} in the tablespace" actually means "in this
tablespace and in the current database".  Presumably it does.






[HACKERS] [PATCH] Incremental backup: add backup profile to base backup

2014-08-17 Thread Marco Nenciarini
Hi Hackers,

This is the first piece of file level incremental backup support, as
described on wiki page https://wiki.postgresql.org/wiki/Incremental_backup

It is not yet complete, but I wish to share it on the list to receive
comments and suggestions.

The point of the patch is to add an option to pg_basebackup and to the
replication protocol's BASE_BACKUP command to generate a backup_profile file.

When taking a full backup with pg_basebackup, the user can request
Postgres to generate a backup_profile file through the --profile option
(-B short option, which I picked arbitrarily because both -P and
-p were already taken).

At the moment the backup profile consists of a file with one line per
file detailing modification time, md5, size, tablespace and path
relative to the tablespace root (PGDATA or the tablespace).
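
Each entry would render along these lines (a purely hypothetical example;
the actual field order and separators are whatever the patch defines):

1408294441	d41d8cd98f00b204e9800998ecf8427e	8192	pg_default	base/12345/16384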

To calculate the md5 checksum I've used the md5 code present in pgcrypto
contrib as the code in src/include/libpq/md5.h is not suitable for large
files. Since a core feature cannot depend on a piece of contrib, I've
moved the files

contrib/pgcrypto/md5.c
contrib/pgcrypto/md5.h

to

src/backend/utils/hash/md5.c
src/include/utils/md5.h

changing the pgcrypto extension to use them.

There are still some TODOs:

* User documentation

* Remove the pg_basebackup code duplication I've introduced with the
ReceiveAndUnpackTarFileToDir function, which is almost the same as
ReceiveAndUnpackTarFile but does not expect to handle a tablespace; it
instead simply extracts a tar stream into a destination directory. The
latter could probably be rewritten using the former as a component, but it
needs some adjustment to the "progress reporting" part, which is not
present at the moment in ReceiveAndUnpackTarFileToDir.

* Add a header section to the backup_profile file, which at the moment
contains only the body part. I'm thinking of changing the original design
and putting the whole backup label in the header, which is IMHO clearer
and well known. I would use something like:

START WAL LOCATION: 0/E28 (file 0001000E)
CHECKPOINT LOCATION: 0/E60
BACKUP METHOD: streamed
BACKUP FROM: master
START TIME: 2014-08-14 18:54:01 CEST
LABEL: pg_basebackup base backup
END LABEL

I've attached the current patch based on master.

Any comment will be appreciated.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
diff --git a/contrib/pgcrypto/Makefile b/contrib/pgcrypto/Makefile
index b6c9484..daf2ba3 100644
*** a/contrib/pgcrypto/Makefile
--- b/contrib/pgcrypto/Makefile
***
*** 1,6 
  # contrib/pgcrypto/Makefile
  
! INT_SRCS = md5.c sha1.c sha2.c internal.c internal-sha2.c blf.c rijndael.c \
fortuna.c random.c pgp-mpi-internal.c imath.c
  INT_TESTS = sha2
  
--- 1,6 
  # contrib/pgcrypto/Makefile
  
! INT_SRCS = sha1.c sha2.c internal.c internal-sha2.c blf.c rijndael.c \
fortuna.c random.c pgp-mpi-internal.c imath.c
  INT_TESTS = sha2
  
diff --git a/contrib/pgcrypto/internal.c b/contrib/pgcrypto/internal.c
index cb8ba26..0d2db23 100644
*** a/contrib/pgcrypto/internal.c
--- b/contrib/pgcrypto/internal.c
***
*** 30,40 
   */
  
  #include "postgres.h"
  
  #include 
  
  #include "px.h"
- #include "md5.h"
  #include "sha1.h"
  #include "blf.h"
  #include "rijndael.h"
--- 30,40 
   */
  
  #include "postgres.h"
+ #include "utils/md5.h"
  
  #include 
  
  #include "px.h"
  #include "sha1.h"
  #include "blf.h"
  #include "rijndael.h"
diff --git a/contrib/pgcrypto/md5.c b/contrib/pgcrypto/md5.c
index cac4e40..e69de29 .
*** a/contrib/pgcrypto/md5.c
--- b/contrib/pgcrypto/md5.c
***
*** 1,397 
- /*   $KAME: md5.c,v 1.3 2000/02/22 14:01:17 itojun Exp $ */
- 
- /*
-  * Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project.
-  * All rights reserved.
-  *
-  * Redistribution and use in source and binary forms, with or without
-  * modification, are permitted provided that the following conditions
-  * are met:
-  * 1. Redistributions of source code must retain the above copyright
-  *  notice, this list of conditions and the following disclaimer.
-  * 2. Redistributions in binary form must reproduce the above copyright
-  *  notice, this list of conditions and the following disclaimer in the
-  *  documentation and/or other materials provided with the distribution.
-  * 3. Neither the name of the project nor the names of its contributors
-  *  may be used to endorse or promote products derived from this software
-  *  without specific prior written permission.
-  *
-  * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
-  * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
-  * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
-  * ARE DISCLAIMED.  IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
-  * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,

Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak

2014-08-17 Thread Greg Stark
On 15 Aug 2014 19:06, "Robert Haas"  wrote:
>
> > As for the expanded-mode changes, I thought there was consensus to
> > revert that from 9.4.
>
> Me too.  In fact, I think that's been the consensus for many months,
> but unless I'm mistaken it ain't done.

Yeah, this is entirely my fault. I was traveling, busy, not feeling well,
in various permutations, but I should have gotten to it earlier.

I'll take care of it tomorrow morning.


Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak

2014-08-17 Thread Stephen Frost
Alvaro, all,

* Stephen Frost (sfr...@snowman.net) wrote:
> As mentioned, I'll add this to the ALTER TABLE documentation and remove
> it from the TABLESPACE docs.  That's not done yet but I should have time
> in the next few days to get that done also and will then commit it all
> to master and back-patch to 9.4, barring objections.

Here's a first pass with the documentation changes included.  Feedback
welcome, and I'll review it again and plan to commit it on Tuesday.
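
For anyone skimming, the new forms look like this (tablespace and role
names invented; syntax as in the docs below):

ALTER TABLE ALL IN TABLESPACE old_space OWNED BY app_role
    SET TABLESPACE new_space NOWAIT;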

Thanks,

Stephen
diff --git a/doc/src/sgml/ref/alter_index.sgml b/doc/src/sgml/ref/alter_index.sgml
index 94a7af0..85159a0 100644
--- a/doc/src/sgml/ref/alter_index.sgml
+++ b/doc/src/sgml/ref/alter_index.sgml
@@ -25,6 +25,8 @@ ALTER INDEX [ IF EXISTS ] name RENA
 ALTER INDEX [ IF EXISTS ] name SET TABLESPACE tablespace_name
 ALTER INDEX [ IF EXISTS ] name SET ( storage_parameter = value [, ... ] )
 ALTER INDEX [ IF EXISTS ] name RESET ( storage_parameter [, ... ] )
+ALTER INDEX ALL IN TABLESPACE name [ OWNED BY role_name [, ... ] ]
+SET TABLESPACE new_tablespace [ NOWAIT ]
 
  
 
@@ -63,6 +65,17 @@ ALTER INDEX [ IF EXISTS ] name RESE
  
   This form changes the index's tablespace to the specified tablespace and
   moves the data file(s) associated with the index to the new tablespace.
+  To change the tablespace of an index, you must own the index and have
+  CREATE privilege on the new tablespace.
+  All indexes in a tablespace can be moved by using the
+  ALL IN TABLESPACE form, which will lock all indexes
+  to be moved and then move each one.  This form also supports
+  OWNED BY, which will only move indexes owned by the
+  roles specified.  If the NOWAIT option is specified
+  then the command will fail if it is unable to acquire all of the locks
+  required immediately.  Note that system catalogs will not be moved by
+  this command, use ALTER DATABASE or explicit
+  ALTER INDEX invocations instead if desired.
   See also
   .
  
diff --git a/doc/src/sgml/ref/alter_materialized_view.sgml b/doc/src/sgml/ref/alter_materialized_view.sgml
index 1932eeb..b0759fc 100644
--- a/doc/src/sgml/ref/alter_materialized_view.sgml
+++ b/doc/src/sgml/ref/alter_materialized_view.sgml
@@ -29,6 +29,8 @@ ALTER MATERIALIZED VIEW [ IF EXISTS ] namenew_name
 ALTER MATERIALIZED VIEW [ IF EXISTS ] name
 SET SCHEMA new_schema
+ALTER MATERIALIZED VIEW ALL IN TABLESPACE name [ OWNED BY role_name [, ... ] ]
+SET TABLESPACE new_tablespace [ NOWAIT ]
 
 where action is one of:
 
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 69a1e14..83d230c 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -31,6 +31,8 @@ ALTER TABLE [ IF EXISTS ] name
 RENAME TO new_name
 ALTER TABLE [ IF EXISTS ] name
 SET SCHEMA new_schema
+ALTER TABLE ALL IN TABLESPACE name [ OWNED BY role_name [, ... ] ]
+SET TABLESPACE new_tablespace [ NOWAIT ]
 
 where action is one of:
 
@@ -597,6 +599,17 @@ ALTER TABLE [ IF EXISTS ] name
   moves the data file(s) associated with the table to the new tablespace.
   Indexes on the table, if any, are not moved; but they can be moved
   separately with additional SET TABLESPACE commands.
+  All tables in a tablespace can be moved by using the
+  ALL IN TABLESPACE form, which will lock all tables
+  to be moved first and then move each one.  This form also supports
+  OWNED BY, which will only move tables owned by the
+  roles specified.  If the NOWAIT option is specified
+  then the command will fail if it is unable to acquire all of the locks
+  required immediately.  Note that system catalogs are not moved by this
+  command, use ALTER DATABASE or explicit
+  ALTER TABLE invocations instead if desired.  The
+  information_schema relations are not considered part
+  of the system catalogs and will be moved.
   See also
   .
  
@@ -649,7 +662,8 @@ ALTER TABLE [ IF EXISTS ] name
   
 
   
-   All the actions except RENAME and SET SCHEMA
+   All the actions except RENAME,
+   SET TABLESPACE and SET SCHEMA
can be combined into
a list of multiple alterations to apply in parallel.  For example, it
is possible to add several columns and/or alter the type of several
@@ -659,8 +673,8 @@ ALTER TABLE [ IF EXISTS ] name
 
   
You must own the table to use ALTER TABLE.
-   To change the schema of a table, you must also have
-   CREATE privilege on the new schema.
+   To change the schema or tablespace of a table, you must also have
+   CREATE privilege on the new schema or tablespace.
To add the table as a new child of a parent table, you must own the
parent table as well.
To alter the owner, you must also be a direct or indirect member of the new
diff --git a/doc/src/sgml/ref/alter_tablespace.sgml b/doc/src/sgml/ref/alter_tablespace.sgml
index bd1afb4..7c4aabc 100644
--- a/doc/src/

Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-08-17 Thread Fabrízio de Royes Mello
On Mon, Jul 28, 2014 at 2:24 PM, Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:
>
>
> On Mon, Jul 28, 2014 at 1:41 PM, Christoph Berg  wrote:
> >
> > Re: Fabrízio de Royes Mello 2014-07-28

> > > Is there something I should still do on this patch?
> >
> > I haven't got around to have a look at the newest incarnation yet, but
> > I plan to do that soonish. (Of course that shouldn't stop others from
> > doing that as well if they wish.)
> >
>
> Thanks!
>

Updated version.
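
For context, the forms this patch adds (table name invented):

ALTER TABLE my_table SET LOGGED;	-- rewrite an unlogged table as permanent
ALTER TABLE my_table SET UNLOGGED;	-- rewrite a permanent table as unlogged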

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog sobre TI: http://fabriziomello.blogspot.com
>> Perfil Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 69a1e14..2d131df 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -59,16 +59,17 @@ ALTER TABLE [ IF EXISTS ] name
 ENABLE ALWAYS RULE rewrite_rule_name
 CLUSTER ON index_name
 SET WITHOUT CLUSTER
+SET {LOGGED | UNLOGGED}
 SET WITH OIDS
 SET WITHOUT OIDS
 SET ( storage_parameter = value [, ... ] )
+SET TABLESPACE new_tablespace
 RESET ( storage_parameter [, ... ] )
 INHERIT parent_table
 NO INHERIT parent_table
 OF type_name
 NOT OF
 OWNER TO new_owner
-SET TABLESPACE new_tablespace
 REPLICA IDENTITY {DEFAULT | USING INDEX index_name | FULL | NOTHING}
 
 and table_constraint_using_index is:
@@ -447,6 +448,20 @@ ALTER TABLE [ IF EXISTS ] name

 

+SET {LOGGED | UNLOGGED}
+
+ 
+  This form changes the table persistence type from unlogged to permanent or
+  from permanent to unlogged (see ).
+ 
+ 
+  Changing the table persistence type acquires an ACCESS EXCLUSIVE lock
+  and rewrites the table contents and associated indexes into new disk files.
+ 
+
+   
+
+   
 SET WITH OIDS
 
  
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index b1c411a..7f497c7 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -574,7 +574,8 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	heap_close(OldHeap, NoLock);
 
 	/* Create the transient table that will receive the re-ordered data */
-	OIDNewHeap = make_new_heap(tableOid, tableSpace, false,
+	OIDNewHeap = make_new_heap(tableOid, tableSpace,
+			   OldHeap->rd_rel->relpersistence,
 			   AccessExclusiveLock);
 
 	/* Copy the heap data into the new table in the desired order */
@@ -601,7 +602,7 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
  * data, then call finish_heap_swap to complete the operation.
  */
 Oid
-make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
+make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, char relpersistence,
 			  LOCKMODE lockmode)
 {
 	TupleDesc	OldHeapDesc;
@@ -613,7 +614,6 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
 	Datum		reloptions;
 	bool		isNull;
 	Oid			namespaceid;
-	char		relpersistence;
 
 	OldHeap = heap_open(OIDOldHeap, lockmode);
 	OldHeapDesc = RelationGetDescr(OldHeap);
@@ -636,16 +636,10 @@ make_new_heap(Oid OIDOldHeap, Oid NewTableSpace, bool forcetemp,
 	if (isNull)
 		reloptions = (Datum) 0;
 
-	if (forcetemp)
-	{
+	if (relpersistence == RELPERSISTENCE_TEMP)
 		namespaceid = LookupCreationNamespace("pg_temp");
-		relpersistence = RELPERSISTENCE_TEMP;
-	}
 	else
-	{
 		namespaceid = RelationGetNamespace(OldHeap);
-		relpersistence = OldHeap->rd_rel->relpersistence;
-	}
 
 	/*
 	 * Create the new heap, using a temporary name in the same namespace as
@@ -1146,6 +1140,7 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 	Oid			relfilenode1,
 relfilenode2;
 	Oid			swaptemp;
+	char		swaprelpersistence;
 	CatalogIndexState indstate;
 
 	/* We need writable copies of both pg_class tuples. */
@@ -1177,6 +1172,10 @@ swap_relation_files(Oid r1, Oid r2, bool target_is_pg_class,
 		relform1->reltablespace = relform2->reltablespace;
 		relform2->reltablespace = swaptemp;
 
+		swaprelpersistence = relform1->relpersistence;
+		relform1->relpersistence = relform2->relpersistence;
+		relform2->relpersistence = swaprelpersistence;
+
 		/* Also swap toast links, if we're swapping by links */
 		if (!swap_toast_by_content)
 		{
diff --git a/src/backend/commands/matview.c b/src/backend/commands/matview.c
index 5130d51..a49e66f 100644
--- a/src/backend/commands/matview.c
+++ b/src/backend/commands/matview.c
@@ -147,6 +147,7 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 	DestReceiver *dest;
 	bool		concurrent;
 	LOCKMODE	lockmode;
+	char		relpersistence;
 
 	/* Determine strength of lock needed. */
 	concurrent = stmt->concurrent;
@@ -233,9 +234,15 @@ ExecRefreshMatView(RefreshMatViewStmt *stmt, const char *queryString,
 
 	/* Concurrent refresh builds new data in temp tablespace, an

Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-17 Thread Tomas Vondra
On 15.8.2014 19:53, Robert Haas wrote:
> On Thu, Aug 14, 2014 at 2:21 PM, Jeff Davis  wrote:
>> On Thu, 2014-08-14 at 12:53 -0400, Tom Lane wrote:
>>> Oh?  So if we have aggregates like array_agg whose memory footprint
>>> increases over time, the patch completely fails to avoid bloat?
>>
>> Yes, in its current form.
>>
>>> I might think a patch with such a limitation was useful, if it weren't
>>> for the fact that aggregates of that nature are a large part of the
>>> cases where the planner misestimates the table size in the first place.
>>> Any complication that we add to nodeAgg should be directed towards
>>> dealing with cases that the planner is likely to get wrong.
>>
>> In my experience, the planner has a lot of difficulty estimating the
>> cardinality unless it's coming from a base table without any operators
>> above it (other than maybe a simple predicate). This is probably a lot
>> more common than array_agg problems, simply because array_agg is
>> relatively rare compared with GROUP BY in general.
> 
> I think that's right, and I rather like your (Jeff's) approach.  It's
> definitely true that we could do better if we have a mechanism for
> serializing and deserializing group states, but (1) I think an awful
> lot of cases would get an awful lot better even just with the approach
> proposed here and (2) I doubt we would make the
> serialization/deserialization interfaces mandatory, so even if we had
> that we'd probably want a fallback strategy anyway.

I certainly agree that we need Jeff's approach even if we can do better
in some cases (when we are able to serialize/deserialize the states).

And yes, (mis)estimating the cardinalities is a big issue, and certainly
a source of many problems.


> Furthermore, I don't really see that we're backing ourselves into a
> corner here.  If prohibiting creation of additional groups isn't
> sufficient to control memory usage, but we have
> serialization/deserialization functions, we can just pick an arbitrary
> subset of the groups that we're storing in memory and spool their
> transition states off to disk, thus reducing memory even further.  I
> understand Tomas' point to be that this is quite different from what
> we do for hash joins, but I think it's a different problem.  In the
> case of a hash join, there are two streams of input tuples, and we've
> got to batch them in compatible ways.  If we were to, say, exclude an
> arbitrary subset of tuples from the hash table instead of basing it on
> the hash code, we'd have to test *every* outer tuple against the hash
> table for *every* batch.  That would incur a huge amount of additional
> cost vs. being able to discard outer tuples once the batch to which
> they pertain has been processed.

Being able to batch inner and outer relations in a matching way is
certainly one of the reasons why hashjoin uses that particular scheme.
There are other reasons, though - for example being able to answer 'Does
this group belong to this batch?' quickly, and to automatically update
the number of batches.

I'm not saying the lookup is extremely costly, but I'd be very surprised
if it was as cheap as modulo on a 32-bit integer. Not saying it's the
dominant cost here, but memory bandwidth is quickly becoming one of the
main bottlenecks.


> But the situation here isn't comparable, because there's only one
> input stream.  I'm pretty sure we'll want to keep track of which
> transition states we've spilled due to lack of memory as opposed to
> those which were never present in the table at all, so that we can
> segregate the unprocessed tuples that pertain to spilled transition
> states from the ones that pertain to a group we haven't begun yet.

Why would that be necessary or useful? I don't see a reason for tracking
that / segregating the tuples.

> And it might be that if we know (or learn as we go along) that we're
> going to vastly blow out work_mem it makes sense to use batching to
> segregate the tuples that we decide not to process onto N tapes binned
> by hash code, so that we have a better chance that future batches will
> be the right size to fit in memory.  But I'm not convinced that
> there's a compelling reason why the *first* batch has to be chosen by
> hash code; we're actually best off picking any arbitrary set of groups
> that does the best job reducing the amount of data remaining to be
> processed, at least if the transition states are fixed size and maybe
> even if they aren't.

If you don't choose the first batch by hash code, it's over, IMHO. You
can't easily redo that later, because the HashWork items are already
treated separately.

regards
Tomas




Re: [HACKERS] 9.5: Memory-bounded HashAgg

2014-08-17 Thread Tomas Vondra
On 10.8.2014 23:26, Jeff Davis wrote:
> This patch requires the Memory Accounting patch, or something similar
> to track memory usage.
> 
> The attached patch enables hashagg to spill to disk, which means that
> hashagg will contain itself to work_mem even if the planner makes a
> bad misestimate of the cardinality.
> 
> This is a well-known concept; there's even a Berkeley homework
> assignment floating around to implement it -- in postgres 7.2, no
> less. I didn't take the exact same approach as the homework assignment
> suggests, but it's not much different, either. My apologies if some
> classes are still using this as a homework assignment, but postgres
> needs to eventually have an answer to this problem.
> 
> Included is a GUC, "enable_hashagg_disk" (default on), which allows
> the planner to choose hashagg even if it doesn't expect the hashtable
> to fit in memory. If it's off, and the planner misestimates the
> cardinality, hashagg will still use the disk to contain itself to
> work_mem.
> 
> One situation that might surprise the user is if work_mem is set too
> low, and the user is *relying* on a misestimate to pick hashagg. With
> this patch, it would end up going to disk, which might be
> significantly slower. The solution for the user is to increase
> work_mem.
> 
> Rough Design:
> 
> Change the hash aggregate algorithm to accept a generic "work item",
> which consists of an input file as well as some other bookkeeping
> information.
> 
> Initially prime the algorithm by adding a single work item where the
> file is NULL, indicating that it should read from the outer plan.
> 
> If the memory is exhausted during execution of a work item, then
> continue to allow existing groups to be aggregated, but do not allow new
> groups to be created in the hash table. Tuples representing new groups
> are saved in an output partition file referenced in the work item that
> is currently being executed.
> 
> When the work item is done, emit any groups in the hash table, clear the
> hash table, and turn each output partition file into a new work item.
> 
> Each time through at least some groups are able to stay in the hash
> table, so eventually none will need to be saved in output partitions, no
> new work items will be created, and the algorithm will terminate. This
> is true even if the number of output partitions is always one.
> 
> Open items:
>   * costing
>   * EXPLAIN details for disk usage
>   * choose number of partitions intelligently
>   * performance testing
> 
> Initial tests indicate that it can be competitive with sort+groupagg
> when the disk is involved, but more testing is required.
> 
> Feedback welcome.
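
For readers skimming the archives, here is my paraphrase of that control
flow in C (a sketch only - every identifier below is a hypothetical
stand-in, not the patch's actual code):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct HashWork HashWork;   /* input file + partition files */

    /* hypothetical helpers standing in for the patch's internals */
    extern bool      have_work(void);
    extern HashWork *pop_work(void);
    extern void     *read_next(HashWork *item);  /* outer plan or file */
    extern bool      group_in_table(void *tup);
    extern bool      memory_exhausted(void);
    extern void      advance_aggregates(void *tup);
    extern void      spill_tuple(HashWork *item, void *tup);
    extern void      emit_groups_and_reset(void);
    extern void      enqueue_partitions(HashWork *item);

    static void
    hashagg_drain(void)
    {
        while (have_work())
        {
            HashWork *item = pop_work();
            void     *tup;

            while ((tup = read_next(item)) != NULL)
            {
                /* existing groups keep aggregating; new groups are
                 * only created while memory remains */
                if (group_in_table(tup) || !memory_exhausted())
                    advance_aggregates(tup);
                else
                    spill_tuple(item, tup);  /* defer to a later item */
            }

            emit_groups_and_reset();
            enqueue_partitions(item);   /* one new item per partition */
        }
    }

Since at least the groups already in the hash table make progress on
every pass, the queue eventually drains even if there is only ever one
output partition.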

I've been working on this for a few hours - getting familiar with the
code, testing queries etc. Two comments.

1) Apparently there's something broken, because with this:

   create table table_b (fk_id int, val_a int, val_b int);
   insert into table_b
     select i, mod(i,1000), mod(i,1000)
       from generate_series(1,1000) s(i);
   analyze table_b;

   I get this:

   set work_mem = '8MB';
   explain analyze select fk_id, count(*)
   from table_b where val_a < 50 and val_b < 50 group by 1;
   > The connection to the server was lost. Attempting reset: Failed.

   Stacktrace attached, but apparently there's a segfault in
   advance_transition_function when accessing pergroupstate.

   This happened for all queries that I tried, once they needed to do
   the batching.

2) Using the same hash value both for dynahash and batching seems
   really fishy to me. I'm not familiar with dynahash, but I'd bet
   the way it's done now will lead to a bad distribution in the hash
   table (some buckets will always be empty in some batches, etc.).

   This is why hashjoin tries so hard to use non-overlapping parts
   of the hash for batchno/bucketno (see the sketch below).

   The hashjoin code implements its own hash table, which makes it clear
   how the bucket is derived from the hash value. I'm not sure how
   dynahash does that, but I'm pretty sure we can't just reuse the hash
   value like this.

   I see two options - compute our own hash value, or somehow derive
   a new one (e.g. by doing "hashvalue XOR random_seed"). I'm not sure
   the latter would work, though.
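
For comparison, this is roughly how hashjoin splits the hash value
(my recollection of nodeHash.c, simplified - a sketch, not the exact
code):

    #include <stdint.h>

    /* bucketno comes from the low bits, batchno from the bits above
     * them, so growing nbatch never disturbs the bits that pick the
     * bucket; nbuckets and nbatch are both powers of two */
    static void
    get_bucket_and_batch(uint32_t hashvalue, int log2_nbuckets,
                         int nbuckets, int nbatch,
                         int *bucketno, int *batchno)
    {
        *bucketno = hashvalue & (nbuckets - 1);
        *batchno = (nbatch > 1)
            ? (int) ((hashvalue >> log2_nbuckets) & (nbatch - 1))
            : 0;
    }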

regards
Tomas
Program received signal SIGSEGV, Segmentation fault.
0x0064e6fc in advance_transition_function (aggstate=0x11405c0, peraggstate=0x1143540, pergroupstate=0x8) at nodeAgg.c:465
465 if (pergroupstate->noTransValue)
(gdb) bt
#0  0x0064e6fc in advance_transition_function (aggstate=0x11405c0, peraggstate=0x1143540, pergroupstate=0x8) at nodeAgg.c:465
#1  0x0064eb0e in advance_aggregates (aggstate=0x11405c0, pergroup=0x8) at nodeAgg.c:621
#2  0x006502a7 in agg_fill_hash_table (aggstate=0x11405c0) at nodeAgg.c:1584
#3  0x0064fc3f in ExecAgg (node=0x11405c0) at nodeAgg.c:1289
#4  0x0063c754 in ExecProcNode (node=0x11405c0) at execProcnode.c:476
#5  0x0063a483

Re: [HACKERS] Index-only scans for GIST

2014-08-17 Thread Anastasia Lubennikova
2014-08-07 0:30 GMT+04:00 Heikki Linnakangas :

> * I'm getting two regression failures with this (opr_sanity and join).

The opr_sanity failure is corrected.
But a question remains about the join test.
I checked the latest version of my github repo and there is no failure in
the join regression test:
All 145 tests passed.
To tell the truth, I don't understand which changes could have led to this
failure.
Could you show me the regression diffs?
I want to understand what's wrong with the patch.

> * The regression test queries that use LIMIT are not guaranteed to always
> return the same rows, hence they're not very good regression test cases.
> I'd suggest using more restrictive WHERE clauses, so that each query only
> returns a handful of rows.

Thank you for the comment; I rewrote the problematic queries. But could
you explain why LIMIT queries may return different results? Does it
happen because of different query plans?
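
(For the archives, a made-up illustration rather than a query from the
test suite: without a fully-deterministic ORDER BY, a query such as

    select p from gist_tbl
     where p <@ box(point(0,0), point(100,100))
     limit 5;

may return any 5 of the qualifying rows, so its output depends on which
plan the optimizer picks; a restrictive predicate like

    select p from gist_tbl
     where p <@ box(point(0,0), point(0.2,0.2));

pins down the exact row set regardless of the plan.)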

> * I think it's leaking memory, in GIST scan context. I tested this with a
> variant of the regression tests:
>
> insert into gist_tbl select box(point(0.05*i, 0.05*i), point(0.05*i,
> 0.05*i)),
>  point(0.05*i, 0.05*i) FROM generate_series(0,
> 1000) as i;
> CREATE INDEX gist_tbl_point_index ON gist_tbl USING gist (p);
>
> set enable_seqscan=off;
> set enable_bitmapscan=off;
>
> explain analyze  select p from gist_tbl where p <@ box(point(0,0),
> point(999,999)) and length(p::text) < 10;
>
> while the final query runs, 'top' shows constantly increasing memory usage.


I don't think it's a memory leak. After some initial increase, the memory
usage remains stable. It behaves similarly without using an index-only
scan.



-- 
Best regards,
Lubennikova Anastasia


Re: [HACKERS] ALTER TABLESPACE MOVE command tag tweak

2014-08-17 Thread Stephen Frost
Alvaro, all,

* Alvaro Herrera (alvhe...@2ndquadrant.com) wrote:
> Can you be more specific on the exact grammar you're considering?  The
> proposal above,
> ALTER TABLE ON ALL TABLES IN TABLESPACE xyz
> doesn't seem very good to me.  I would think it'd be more like
> ALTER ALL TABLES IN TABLESPACE xyz
> but then if you return ALTER TABLE as a command tag that might be a bit
> strange.  Maybe
> ALTER TABLE ALL IN TABLESPACE xyz
> which AFAICS should work since ALL is already a reserved keyword.

I've implemented the 'ALTER TABLE ALL IN TABLESPACE ...' approach, as
discussed.  Patch attached.
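
For reference, if I'm reading the grammar right, the syntax looks like
this (the tablespace and role names are only examples):

    -- Move everything the current user is allowed to move from
    -- tablespace old_ts to new_ts; with NOWAIT, error out instead of
    -- waiting for locks.
    ALTER TABLE ALL IN TABLESPACE old_ts SET TABLESPACE new_ts NOWAIT;

    -- Limit the move to objects owned by specific roles.
    ALTER TABLE ALL IN TABLESPACE old_ts OWNED BY alice, bob
        SET TABLESPACE new_ts;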

> Also, how would we document this?  Would we have it in the same page as
> all the ALTER TABLE variants, or would we create a separate page for
> ALTER TABLE ALL?  Keeping in mind that in the future we might want to
> allow things such as ALTER TABLE ALL IN SCHEMA xyz it might be better to
> have the selection logic documented neatly in its own little page
> instead of together with the ALTER TABLE mess which is already rather
> large.

As mentioned, I'll add this to the ALTER TABLE documentation and remove
it from the TABLESPACE docs.  That's not done yet, but I should have time
in the next few days to finish that as well, and will then commit it all
to master and back-patch to 9.4, barring objections.

Thanks,

Stephen
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index c86b999..feceed7 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -50,6 +50,7 @@
 #include "commands/tablespace.h"
 #include "commands/trigger.h"
 #include "commands/typecmds.h"
+#include "commands/user.h"
 #include "executor/executor.h"
 #include "foreign/foreign.h"
 #include "miscadmin.h"
@@ -9204,6 +9205,176 @@ ATExecSetTableSpace(Oid tableOid, Oid newTableSpace, LOCKMODE lockmode)
 }
 
 /*
+ * Alter Table ALL ... SET TABLESPACE
+ *
+ * Allows a user to move all objects of some type in a given tablespace in the
+ * current database to another tablespace.  Objects can be chosen based on the
+ * owner of the object also, to allow users to move only their objects.
+ * The user must have CREATE rights on the new tablespace, as usual.   The main
+ * permissions handling is done by the lower-level table move function.
+ *
+ * All to-be-moved objects are locked first. If NOWAIT is specified and the
+ * lock can't be acquired then we ereport(ERROR).
+ */
+Oid
+AlterTableMoveAll(AlterTableMoveAllStmt *stmt)
+{
+	List	   *relations = NIL;
+	ListCell   *l;
+	ScanKeyData key[1];
+	Relation	rel;
+	HeapScanDesc scan;
+	HeapTuple	tuple;
+	Oid			orig_tablespaceoid;
+	Oid			new_tablespaceoid;
+	List	   *role_oids = roleNamesToIds(stmt->roles);
+
+	/* Ensure we were not asked to move something we can't */
+	if (stmt->objtype != OBJECT_TABLE && stmt->objtype != OBJECT_INDEX &&
+		stmt->objtype != OBJECT_MATVIEW)
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("only tables, indexes, and materialized views exist in tablespaces")));
+
+	/* Get the orig and new tablespace OIDs */
+	orig_tablespaceoid = get_tablespace_oid(stmt->orig_tablespacename, false);
+	new_tablespaceoid = get_tablespace_oid(stmt->new_tablespacename, false);
+
+	/* Can't move shared relations in to or out of pg_global */
+	/* This is also checked by ATExecSetTableSpace, but nice to stop earlier */
+	if (orig_tablespaceoid == GLOBALTABLESPACE_OID ||
+		new_tablespaceoid == GLOBALTABLESPACE_OID)
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot move relations in to or out of pg_global tablespace")));
+
+	/*
+	 * Must have CREATE rights on the new tablespace, unless it is the
+	 * database default tablespace (which all users implicitly have CREATE
+	 * rights on).
+	 */
+	if (OidIsValid(new_tablespaceoid) && new_tablespaceoid != MyDatabaseTableSpace)
+	{
+		AclResult	aclresult;
+
+		aclresult = pg_tablespace_aclcheck(new_tablespaceoid, GetUserId(),
+		   ACL_CREATE);
+		if (aclresult != ACLCHECK_OK)
+			aclcheck_error(aclresult, ACL_KIND_TABLESPACE,
+		   get_tablespace_name(new_tablespaceoid));
+	}
+
+	/*
+	 * Now that the checks are done, check if we should set either to
+	 * InvalidOid because it is our database's default tablespace.
+	 */
+	if (orig_tablespaceoid == MyDatabaseTableSpace)
+		orig_tablespaceoid = InvalidOid;
+
+	if (new_tablespaceoid == MyDatabaseTableSpace)
+		new_tablespaceoid = InvalidOid;
+
+	/* no-op */
+	if (orig_tablespaceoid == new_tablespaceoid)
+		return new_tablespaceoid;
+
+	/*
+	 * Walk the list of objects in the tablespace and move them. This will
+	 * only find objects in our database, of course.
+	 */
+	ScanKeyInit(&key[0],
+Anum_pg_class_reltablespace,
+BTEqualStrategyNumber, F_OIDEQ,
+ObjectIdGetDatum(orig_tablespaceoid));
+
+	rel = heap_open(RelationRelationId, AccessShareLock);
+	scan = heap_beginscan_catalog(rel, 1, key);
+	while ((tuple = heap_getnext(scan, ForwardScanDirectio