Re: [HACKERS] Performance problem in PLPgSQL

2013-07-24 Thread Amit Kapila
On Wednesday, July 24, 2013 4:23 AM Tom Lane wrote:
 I wrote:
  Marc Cousin <cousinm...@gmail.com> writes:
  The example below is of course extremely simplified, and obviously
  not what we are really doing in the database, but it exhibits the
  slowdown between 9.1.9 and 9.2.4.
 
  Hm.  Some experimentation shows that the plan cache is failing to
  switch to a generic plan, but I'm not sure why the cast would have
  anything to do with that ...
 
 Hah, I see why:
 
 (gdb) s
 1009		if (plansource->generic_cost < avg_custom_cost * 1.1)
 (gdb) p plansource->generic_cost
 $18 = 0.012501
 (gdb) p avg_custom_cost
 $19 = 0.01
 (gdb) p avg_custom_cost * 1.1
 $20 = 0.011001
 
 That is, the estimated cost of the custom plan is just the evaluation
 time for a simple constant, while the estimated cost of the generic
 plan includes a charge for evaluation of int4_numeric().  That makes
 the latter more than ten percent more expensive, and since this logic
 isn't considering anything else at all (particularly not the cost of
 planning), it thinks that makes the custom plan worth picking.
 
 We've been around on this before, but not yet thought of a reasonable
 way to estimate planning cost, let alone compare it to estimated
 execution costs.  Up to now I hadn't thought that was a particularly
 urgent issue, but this example shows it's worth worrying about.

 One thing that was suggested in the previous discussion is to base the
 planning cost estimate on the length of the rangetable.  We could
 do something trivial like add 10 * (length(plan->rangetable) + 1)
 in this comparison.
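
A minimal sketch of that idea against 9.2's choose_custom_plan(), with the
10-per-rangetable-entry charge taken from the suggestion above; this is an
illustration only, not committed code, and it assumes the generic plan's
rangetable is at hand:

	/*
	 * Sketch only: fold a crude planning-cost estimate, based on the
	 * length of the rangetable, into the generic-vs-custom comparison.
	 */
	double		planning_cost = 10.0 * (list_length(plan->rangetable) + 1);

	if (plansource->generic_cost < avg_custom_cost * 1.1 + planning_cost)
		return false;			/* keep using the generic plan */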
 
 Another thing that would fix this particular issue, though perhaps not
 related ones, is to start charging something nonzero for ModifyTable
 nodes, say along the lines of one seq_page_cost per inserted/modified
 row.  That would knock the total estimated cost for an INSERT up enough
 that the ten percent threshold wouldn't be exceeded.
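
And a one-line sketch of that second idea, charging for ModifyTable output
rows somewhere in the planner's costing of that node; the placement and
variable names are guesses, not an actual patch:

	/* Sketch only: one seq_page_cost per inserted/modified row. */
	plan->total_cost += seq_page_cost * plan->plan_rows;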

Shouldn't it consider the new value of the bound parameter to decide whether
a new custom plan is needed, as that can be one of the main reasons why a
different plan would be required?

The current behavior is that it will either choose the generic plan or build
a new custom plan with the new parameters, based on choose_custom_plan().

Shouldn't the behavior instead be as below:
a. choose the generic plan
b. choose one of the existing custom plans
c. create a new custom plan

The choice can be made based on the new value of the bound parameter, as
sketched below.
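
A rough sketch of the three-way outcome this proposal implies; the type and
constant names are invented for illustration and nothing like this exists in
plancache.c:

	typedef enum PlanChoice
	{
		PLAN_CHOICE_GENERIC,		/* (a) reuse the cached generic plan */
		PLAN_CHOICE_CACHED_CUSTOM,	/* (b) reuse an existing custom plan
									 * built for matching parameter values */
		PLAN_CHOICE_NEW_CUSTOM		/* (c) plan afresh for these values */
	} PlanChoice;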

With Regards,
Amit Kapila.





[HACKERS] Backup throttling

2013-07-24 Thread Antonin Houska

Hello,
the purpose of this patch is to limit the impact of pg_basebackup on a 
running server. Feedback is appreciated.


// Antonin Houska (Tony)
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index eb0c1d6..3b7ecfd 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -189,6 +189,22 @@ PostgreSQL documentation
  </varlistentry>
 
  <varlistentry>
+  <term><option>-r</option></term>
+  <term><option>--max-rate</option></term>
+  <listitem>
+   <para>
+	The maximum amount of data transferred from server per second.
+	The purpose is to limit impact of
+	<application>pg_basebackup</application> on a running master server.
+   </para>
+   <para>
+	Suffixes <literal>k</literal> (kilobytes) and <literal>m</literal>
+	(megabytes) are accepted. For example: <literal>10m</literal>
+   </para>
+  </listitem>
+ </varlistentry>
+
+ <varlistentry>
   <term><option>-R</option></term>
   <term><option>--write-recovery-conf</option></term>
   <listitem>
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index a1e12a8..7fb2d78 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -45,11 +45,23 @@ bool		streamwal = false;
 bool		fastcheckpoint = false;
 bool		writerecoveryconf = false;
 int			standby_message_timeout = 10 * 1000;		/* 10 sec = default */
+double		max_rate = 0;		/* Maximum bytes per second. 0 implies
+								 * unlimited rate. */
+
+#define MAX_RATE_LOWER	0x8000		/* 32 kB */
+#define MAX_RATE_UPPER	0x40000000	/* 1 GB */
+/*
+ * We shouldn't measure time whenever a small piece of data (e.g. TAR file
+ * header) has arrived. That would introduce high CPU overhead.
+ */
+#define RATE_MIN_SAMPLE 32768
 
 /* Progress counters */
 static uint64 totalsize;
 static uint64 totaldone;
+static uint64 last_measured;
 static int	tablespacecount;
+static int64 last_measured_time;
 
 /* Pipe to communicate with background wal receiver process */
 #ifndef WIN32
@@ -72,9 +84,14 @@ static volatile LONG has_xlogendptr = 0;
 static PQExpBuffer recoveryconfcontents = NULL;
 
 /* Function headers */
+
+
+
 static void usage(void);
 static void verify_dir_is_empty_or_create(char *dirname);
 static void progress_report(int tablespacenum, const char *filename);
+static double parse_max_rate(char *src);
+static void enforce_max_rate();
 
 static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
 static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
@@ -110,6 +127,7 @@ usage(void)
 	printf(_("\nOptions controlling the output:\n"));
 	printf(_("  -D, --pgdata=DIRECTORY receive base backup into directory\n"));
 	printf(_("  -F, --format=p|t   output format (plain (default), tar)\n"));
+	printf(_("  -r, --max-rate maximum transfer rate between server and client\n"));
 	printf(_("  -R, --write-recovery-conf\n"
 			 "  write recovery.conf after backup\n"));
 	printf(_("  -x, --xlog include required WAL files in backup (fetch mode)\n"));
@@ -473,6 +491,113 @@ progress_report(int tablespacenum, const char *filename)
 	fprintf(stderr, "\r");
 }
 
+static double
+parse_max_rate(char *src)
+{
+	int			factor;
+	char	   *after_num;
+	double		result;
+
+	result = strtod(src, &after_num);
+	if (src == after_num)
+	{
+		fprintf(stderr, _("%s: invalid transfer rate \"%s\"\n"), progname, src);
+		exit(1);
+	}
+
+	/*
+	 * Evaluate (optional) suffix.
+	 *
+	 * after_num should now be right behind the numeric value.
+	 */
+	factor = 1;
+	switch (tolower(*after_num))
+	{
+			/*
+			 * Only the following suffixes are allowed. It's not too useful to
+			 * restrict the rate to gigabytes: such a rate will probably bring
+			 * significant impact on the master anyway, so the throttling
+			 * won't help much.
+			 */
+		case 'm':
+			factor <<= 10;
+			/* fall through */
+		case 'k':
+			factor <<= 10;
+			after_num++;
+			break;
+
+		default:
+
+			/*
+			 * If there's no suffix, byte is the unit. Possible invalid letter
+			 * will make conversion to integer fail.
+			 */
+			break;
+	}
+
+	/* The rest can only consist of white space. */
+	while (*after_num != '\0')
+	{
+		if (!isspace(*after_num))
+		{
+			fprintf(stderr, _("%s: invalid transfer rate \"%s\"\n"), progname, src);
+			exit(1);
+		}
+		after_num++;
+	}
+
+	result *= factor;
+	if (result < MAX_RATE_LOWER || result > MAX_RATE_UPPER)
+	{
+		fprintf(stderr, _("%s: transfer rate out of range \"%s\"\n"), progname, src);
+		exit(1);
+	}
+	return result;
+}
+
+/*
+ * If the progress is more than what max_rate allows, sleep.
+ *
+ * Do not call if max_rate == 0.
+ */
+static void
+enforce_max_rate()
+{
+	int64		min_elapsed,
+				now,
+				elapsed,
+				to_sleep;
+
+	int64		last_chunk;
+
+	last_chunk = totaldone - last_measured;
+
+	/* The measurements shouldn't be more frequent than necessary. */
+	if (last_chunk < RATE_MIN_SAMPLE)
+		return;
+
+	now = localGetCurrentTimestamp();
+	elapsed = now - last_measured_time;
+
+	/*
+	 * 

Re: [HACKERS] Performance problem in PLPgSQL

2013-07-24 Thread Amit Kapila
On Wednesday, July 24, 2013 11:38 AM Amit Kapila wrote:
 On Wednesday, July 24, 2013 4:23 AM Tom Lane wrote:
  I wrote:
   Marc Cousin <cousinm...@gmail.com> writes:
   The example below is of course extremely simplified, and obviously
   not what we are really doing in the database, but it exhibits the
   slowdown between 9.1.9 and 9.2.4.
 
   Hm.  Some experimentation shows that the plan cache is failing to
   switch to a generic plan, but I'm not sure why the cast would have
   anything to do with that ...
 
  Hah, I see why:
 
  (gdb) s
  1009		if (plansource->generic_cost < avg_custom_cost * 1.1)
  (gdb) p plansource->generic_cost
  $18 = 0.012501
  (gdb) p avg_custom_cost
  $19 = 0.01
  (gdb) p avg_custom_cost * 1.1
  $20 = 0.011001
 
  That is, the estimated cost of the custom plan is just the evaluation
  time for a simple constant, while the estimated cost of the generic
  plan includes a charge for evaluation of int4_numeric().  That makes
  the latter more than ten percent more expensive, and since this logic
  isn't considering anything else at all (particularly not the cost of
  planning), it thinks that makes the custom plan worth picking.
 
  We've been around on this before, but not yet thought of a reasonable
  way to estimate planning cost, let alone compare it to estimated
  execution costs.  Up to now I hadn't thought that was a particularly
  urgent issue, but this example shows it's worth worrying about.
 
  One thing that was suggested in the previous discussion is to base
  the planning cost estimate on the length of the rangetable.  We could
  do something trivial like add 10 * (length(plan->rangetable) + 1) in
  this comparison.
 
  Another thing that would fix this particular issue, though perhaps
  not related ones, is to start charging something nonzero for
  ModifyTable nodes, say along the lines of one seq_page_cost per
  inserted/modified row.  That would knock the total estimated cost for
  an INSERT up enough that the ten percent threshold wouldn't be
  exceeded.
 
 Shouldn't it consider the new value of the bound parameter to decide
 whether a new custom plan is needed, as that can be one of the main
 reasons why a different plan would be required?
 
 The current behavior is that it will either choose the generic plan or
 build a new custom plan with the new parameters, based on
 choose_custom_plan().
 
 Shouldn't the behavior instead be as below:
 a. choose the generic plan
 b. choose one of the existing custom plans
 c. create a new custom plan
 
 The choice can be made based on the new value of the bound parameter.

For the case of an INSERT where no scan is involved (as in this case), do we
ever need a different plan?
Can we always use the generic plan for such cases?

With Regards,
Amit Kapila.






[HACKERS] [bug fix] PITR corrupts the database cluster

2013-07-24 Thread MauMau

Hello,

I've encountered a bug in PITR that corrupts the database.  I'm willing to 
submit the patch to fix it, but I'm wondering what approach is appropriate. 
Could you give me your opinions?


[Problem]
I cannot connect to the database after performing the following steps:

1. CREATE DATABASE mydb;
2. Take a base backup with pg_basebackup.
3. DROP DATABASE mydb;
4. Shutdown the database server with pg_ctl stop.
5. Recover the database cluster to the point where the base backup 
completed, i.e., before dropping mydb.  The contents of recovery.conf are:

restore_command = 'cp /arc/dir/%f %p'
recovery_target_timeline = 'latest'
recovery_target_time = 'STOP TIME recorded in the backup history file which 
was created during base backup'


I expected to be able to connect to mydb because I recovered to the point 
before dropping mydb.  However, I cannot connect to mydb because the 
directory for mydb does not exist.  The entry for mydb exists in 
pg_database.



[Cause]
DROP DATABASE emits the below WAL records:

1. System catalog changes including deletion of a tuple for mydb in 
pg_database

2. Deletion of directories for the database
3. Transaction commit

During recovery, postgres replays 1 and 2.  It ends the recovery when it 
notices that the time recorded in the commit record (3 above) is later than 
the recovery target time.  The commit record is not replayed, thus the 
system catalog changes are virtually undone.


The problem is that 2 is replayed.  This deletes the directory for the 
database although the transaction is not committed.



[How to fix]
There are two approaches.  Which do you think is the way to go?

Approach 1
During recovery, when the WAL record for directory deletion is found, just 
record that fact for later replay (in a hash table keyed by xid).  When the 
corresponding transaction commit record is found, replay the directory 
deletion record.
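
A rough sketch of the bookkeeping approach 1 implies, using the backend's
dynahash facility; every name below is invented for illustration:

	/* Remember replayed directory-deletion records, keyed by xid, until
	 * the matching commit record arrives.  Hypothetical sketch only. */
	typedef struct PendingDirDeletion
	{
		TransactionId xid;			/* hash key */
		Oid			db_id;			/* database whose directory to remove */
	} PendingDirDeletion;

	static HTAB *pending_dir_deletions;

	static void
	init_pending_dir_deletions(void)
	{
		HASHCTL		ctl;

		memset(&ctl, 0, sizeof(ctl));
		ctl.keysize = sizeof(TransactionId);
		ctl.entrysize = sizeof(PendingDirDeletion);
		ctl.hash = tag_hash;
		pending_dir_deletions = hash_create("pending dir deletions", 64,
											&ctl, HASH_ELEM | HASH_FUNCTION);
	}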


Approach 2
Like the DROP TABLE/INDEX case, piggyback the directory deletion record on 
the transaction commit record, and eliminate the directory deletion record 
altogether.


I think we need to take approach 1 even when we also do 2, because 1 is 
necessary anyway when the backup and archived WAL were already taken with 
the current PostgreSQL.



Regards
MauMau





Re: [HACKERS] [bug fix] PITR corrupts the database cluster

2013-07-24 Thread Andres Freund
On 2013-07-24 19:30:09 +0900, MauMau wrote:
 I've encountered a bug in PITR that corrupts the database.  I'm willing to
 submit the patch to fix it, but I'm wondering what approach is appropriate.
 Could you give me your opinions?
 
 [Problem]
 I cannot connect to the database after performing the following steps:
 
 1. CREATE DATABASE mydb;
 2. Take a base backup with pg_basebackup.
 3. DROP DATABASE mydb;
 4. Shutdown the database server with pg_ctl stop.
 5. Recover the database cluster to the point where the base backup
 completed, i.e., before dropping mydb.  The contents of recovery.conf are:
 restore_command = 'cp /arc/dir/%f %p'
 recovery_target_timeline = 'latest'
 recovery_target_time = 'STOP TIME recorded in the backup history file which
 was created during base backup'

For a second I wanted to say that it's a user error because you should
just set recovery_target_lsn based on the END WAL location in the backup
history file. Unfortunately recovery_target_lsn doesn't exist. It
should.

 I expected to be able to connect to mydb because I recovered to the point
 before dropping mydb.  However, I cannot connect to mydb because the
 directory for mydb does not exist.  The entry for mydb exists in
 pg_database.

 [Cause]
 DROP DATABASE emits the below WAL records:
 
 1. System catalog changes including deletion of a tuple for mydb in
 pg_database
 2. Deletion of directories for the database
 3. Transaction commit

 Approach 1
 During recovery, when the WAL record for directory deletion is found, just
 record that fact for later replay (in a hash table keyed by xid).  When the
 corresponding transaction commit record is found, replay the directory
 deletion record.

I think that's too much of a special case implementation.

 Approach 2
 Like the DROP TABLE/INDEX case, piggyback the directory deletion record on
 the transaction commit record, and eliminate the directory deletion record
 altogether.

I don't think burdening commit records with that makes sense. It's just
not a common enough case.

What we imo could do would be to drop the tablespaces in a *separate*
transaction *after* the transaction that removed the pg_tablespace
entry. Then an incomplete actions logic similar to btree and gin could
be used to remove the database directory if we crashed between the two
transactions.

SO:
TXN1 does:
* remove catalog entries
* drop buffers
* XLogInsert(XLOG_DBASE_DROP_BEGIN)

TXN2:
* remove_dbtablespaces
* XLogInsert(XLOG_DBASE_DROP_FINISH)

The RM_DBASE_ID resource manager would then grow a rm_cleanup callback
(which would perform TXN2 if we failed inbetween) and a
rm_safe_restartpoint which would prevent restartpoints from occurring on
standby between both.

The same should probably be done for CREATE DATABASE because that currently
can result in partially copied databases lying around.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [bug fix] PITR corrupts the database cluster

2013-07-24 Thread Andres Freund
On 2013-07-24 12:59:43 +0200, Andres Freund wrote:
  Approach 2
  Like the DROP TABLE/INDEX case, piggyback the directory deletion record on
  the transaction commit record, and eliminate the directory deletion record
  altogether.
 
 I don't think burdening commit records with that makes sense. It's just
 not a common enough case.
 
 What we imo could do would be to drop the tablespaces in a *separate*
 transaction *after* the transaction that removed the pg_tablespace
 entry. Then an incomplete actions logic similar to btree and gin could
 be used to remove the database directory if we crashed between the two
 transactions.
 
 SO:
 TXN1 does:
 * remove catalog entries
 * drop buffers
 * XLogInsert(XLOG_DBASE_DROP_BEGIN)
 
 TXN2:
 * remove_dbtablespaces
 * XLogInsert(XLOG_DBASE_DROP_FINISH)
 
 The RM_DBASE_ID resource manager would then grow a rm_cleanup callback
 (which would perform TXN2 if we failed inbetween) and a
 rm_safe_restartpoint which would prevent restartpoints from occurring on
 standby between both.
 
 The same should probably be done for CREATE DATABASE because that currently
 can result in partially copied databases lying around.

And CREATE/DROP TABLESPACE.

Not really related, but CREATE DATABASE's implementation makes me itch
every time I read parts of it...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Backup throttling

2013-07-24 Thread Fujii Masao
On Wed, Jul 24, 2013 at 4:20 PM, Antonin Houska
<antonin.hou...@gmail.com> wrote:
 Hello,
 the purpose of this patch is to limit the impact of pg_basebackup on a running server.
 Feedback is appreciated.

Interesting. Please add this patch to the next commitfest.
https://commitfest.postgresql.org/action/commitfest_view?id=19

Regards,

-- 
Fujii Masao




[HACKERS] DATE type output does not follow datestyle parameter

2013-07-24 Thread MauMau

Hello,

The description of the datestyle parameter does not seem to match the actual 
behavior.  Is this a bug to be fixed?  Which do you think should be 
corrected, the program or the manual?



The manual says:

DateStyle (string)
Sets the display format for date and time values, as well as the rules for 
interpreting ambiguous date input values. For historical reasons, this 
variable contains two independent components: the output format 
specification (ISO, Postgres, SQL, or German) and the input/output 
specification for year/month/day ordering (DMY, MDY, or YMD). ...



And says:

http://www.postgresql.org/docs/current/static/datatype-datetime.html

8.5.2. Date/Time Output
The output of the date and time types is of course only the date or time 
part in accordance with the given examples.



After doing SET datestyle = 'Postgres, MDY' on the psql prompt, I did the 
following things on the same psql session:



1. SELECT current_timestamp;

               now
----------------------------------
 Wed Jul 24 10:51:00.217 2013 GMT
(1 row)

This is exactly as I expected.


2. SELECT current_date;
I expected the output "Wed Jul 24 2013" or "Jul 24 2013", but I got:

    date
------------
 07-24-2013
(1 row)

This does not follow the above statement in 8.5.2.  This output is created 
by EncodeDateOnly() in src/backend/utils/adt/datetime.c.



Regards
MauMau





Re: [HACKERS] [bug fix] PITR corrupts the database cluster

2013-07-24 Thread Heikki Linnakangas
Andres Freund <and...@2ndquadrant.com> wrote:
On 2013-07-24 12:59:43 +0200, Andres Freund wrote:
  Approach 2
  Like the DROP TABLE/INDEX case, piggyback the directory deletion
  record on the transaction commit record, and eliminate the directory
  deletion record altogether.
 
 I don't think burdening commit records with that makes sense. It's
 just not a common enough case.
 
 What we imo could do would be to drop the tablespaces in a *separate*
 transaction *after* the transaction that removed the pg_tablespace
 entry. Then an incomplete actions logic similar to btree and gin could
 be used to remove the database directory if we crashed between the two
 transactions.
 
 SO:
 TXN1 does:
 * remove catalog entries
 * drop buffers
 * XLogInsert(XLOG_DBASE_DROP_BEGIN)
 
 TXN2:
 * remove_dbtablespaces
 * XLogInsert(XLOG_DBASE_DROP_FINISH)
 
 The RM_DBASE_ID resource manager would then grow a rm_cleanup callback
 (which would perform TXN2 if we failed inbetween) and a
 rm_safe_restartpoint which would prevent restartpoints from occurring
 on standby between both.
 
 The same should probably be done for CREATE DATABASE because that
 currently can result in partially copied databases lying around.

And CREATE/DROP TABLESPACE.

Not really related, but CREATE DATABASE's implementation makes me itch
every time I read parts of it...

I've been hoping that we could get rid of the rm_cleanup mechanism entirely. I 
eliminated it for gist a while back, and I've been thinking of doing the same 
for gin and btree. The way it works currently is buggy - while we have 
rm_safe_restartpoint to avoid creating a restartpoint at a bad moment, there is 
nothing to stop you from running a checkpoint while incomplete actions are 
pending. It's possible that there are page locks or something that prevent it 
in practice, but it feels shaky.

So I'd prefer a solution that doesn't rely on rm_cleanup. Piggybacking on 
commit record seems ok to me, though if we're going to have a lot of different 
things to attach there, maybe we need to generalize it somehow. Like, allow 
resource managers to attach arbitrary payload to the commit record, and provide 
a new rm_redo_commit function to replay them.

 
- Heikki




Re: [HACKERS] [bug fix] PITR corrupts the database cluster

2013-07-24 Thread Andres Freund
On 2013-07-24 15:45:52 +0300, Heikki Linnakangas wrote:
 Andres Freund <and...@2ndquadrant.com> wrote:
 On 2013-07-24 12:59:43 +0200, Andres Freund wrote:
   Approach 2
  What we imo could do would be to drop the tablespaces in a *separate*
  transaction *after* the transaction that removed the pg_tablespace
  entry. Then an incomplete actions logic similar to btree and gin could
  be used to remove the database directory if we crashed between the two
  transactions.
  
  SO:
  TXN1 does:
  * remove catalog entries
  * drop buffers
  * XLogInsert(XLOG_DBASE_DROP_BEGIN)
  
  TXN2:
  * remove_dbtablespaces
  * XLogInsert(XLOG_DBASE_DROP_FINISH)
  
  The RM_DBASE_ID resource manager would then grow a rm_cleanup callback
  (which would perform TXN2 if we failed inbetween) and a
  rm_safe_restartpoint which would prevent restartpoints from occurring
  on standby between both.
  
  The same should probably be done for CREATE DATABASE because that
  currently can result in partially copied databases lying around.
 
 And CREATE/DROP TABLESPACE.
 
 Not really related, but CREATE DATABASE's implementation makes me itch
 every time I read parts of it...
 
 I've been hoping that we could get rid of the rm_cleanup mechanism
 entirely. I eliminated it for gist a while back, and I've been thinking
 of doing the same for gin and btree. The way it works currently is
 buggy - while we have rm_safe_restartpoint to avoid creating a
 restartpoint at a bad moment, there is nothing to stop you from running
 a checkpoint while incomplete actions are pending. It's possible that
 there are page locks or something that prevent it in practice, but it
 feels shaky.
 
 So I'd prefer a solution that doesn't rely on rm_cleanup. Piggybacking
 on commit record seems ok to me, though if we're going to have a lot of
 different things to attach there, maybe we need to generalize it
 somehow. Like, allow resource managers to attach arbitrary payload to
 the commit record, and provide a new rm_redo_commit function to replay
 them.

The problem is that piggybacking on the commit record doesn't really fix
the problem that we end up with a bad state if we crash in a bad
moment.

For CREATE DATABASE you will have to copy the template database *before*
you commit the pg_database insert. Which means if we abort before that
we have old data in the datadir.

For DROP DATABASE, without something like incomplete actions,
piggybacking on the commit record doesn't solve the issue of CHECKPOINTS
either, because the commit record you piggybacked on could have
committed before a checkpoint, while you still were busy deleting all
the files.

Similar things go for CREATE/DROP TABLESPACE.

I think getting rid of something like incomplete actions entirely isn't
really realistic. There are several places where we probably should use
them but don't. I think we probably will have to change the logic around
it so there's something that determines whether a normal checkpoint is
safe atm.
A simple version of this basically exists via
GetVirtualXIDsDelayingChkpt(), which is now also used for checksums, but
we probably want to extend that at some point.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [bug fix] PITR corrupts the database cluster

2013-07-24 Thread Heikki Linnakangas
Andres Freund <and...@2ndquadrant.com> wrote:
On 2013-07-24 15:45:52 +0300, Heikki Linnakangas wrote:
 Andres Freund <and...@2ndquadrant.com> wrote:
 On 2013-07-24 12:59:43 +0200, Andres Freund wrote:
   Approach 2
  What we imo could do would be to drop the tablespaces in a *separate*
  transaction *after* the transaction that removed the pg_tablespace
  entry. Then an incomplete actions logic similar to btree and gin could
  be used to remove the database directory if we crashed between the two
  transactions.
  
  SO:
  TXN1 does:
  * remove catalog entries
  * drop buffers
  * XLogInsert(XLOG_DBASE_DROP_BEGIN)
  
  TXN2:
  * remove_dbtablespaces
  * XLogInsert(XLOG_DBASE_DROP_FINISH)
  
  The RM_DBASE_ID resource manager would then grow a rm_cleanup callback
  (which would perform TXN2 if we failed inbetween) and a
  rm_safe_restartpoint which would prevent restartpoints from occurring
  on standby between both.
  
  The same should probably be done for CREATE DATABASE because that
  currently can result in partially copied databases lying around.
 
 And CREATE/DROP TABLESPACE.
 
 Not really related, but CREATE DATABASE's implementation makes me itch
 every time I read parts of it...
 
 I've been hoping that we could get rid of the rm_cleanup mechanism
 entirely. I eliminated it for gist a while back, and I've been thinking
 of doing the same for gin and btree. The way it works currently is
 buggy - while we have rm_safe_restartpoint to avoid creating a
 restartpoint at a bad moment, there is nothing to stop you from running
 a checkpoint while incomplete actions are pending. It's possible that
 there are page locks or something that prevent it in practice, but it
 feels shaky.
 
 So I'd prefer a solution that doesn't rely on rm_cleanup. Piggybacking
 on commit record seems ok to me, though if we're going to have a lot of
 different things to attach there, maybe we need to generalize it
 somehow. Like, allow resource managers to attach arbitrary payload to
 the commit record, and provide a new rm_redo_commit function to replay
 them.

The problem is that piggybacking on the commit record doesn't really
fix the problem that we end up with a bad state if we crash in a bad
moment.

For CREATE DATABASE you will have to copy the template database
*before* you commit the pg_database insert. Which means if we abort
before that we have old data in the datadir.

For DROP DATABASE, without something like incomplete actions,
piggybacking on the commit record doesn't solve the issue of
CHECKPOINTS either, because the commit record you piggybacked on could
have committed before a checkpoint, while you still were busy deleting
all the files.

That's no different from CREATE TABLE / INDEX and DROP TABLE / INDEX. E.g. If 
you crash after CREATE TABLE but before COMMIT, the file is leaked. But it's 
just a waste of space, everything still works.

It would be nice to fix that leak, for tables and indexes too...


- Heikki




Re: [HACKERS] Suggestion for concurrent index creation using a single full scan operation

2013-07-24 Thread Tim Kane
Wow.. thanks guys, really appreciate the detailed analysis.

Tim


On Wed, Jul 24, 2013 at 4:08 AM, Noah Misch <n...@leadboat.com> wrote:

 On Tue, Jul 23, 2013 at 01:06:26PM +0100, Tim Kane wrote:
  I haven't given this a lot of thought, but it struck me that when
  rebuilding tables (be it for a restore process, or some other operational
  activity) - there is more often than not a need to build an index or two,
  sometimes many indexes, against the same relation.
 
  It strikes me that in order to build just one index, we probably need to
  perform a full table scan (in a lot of cases).   If we are building
  multiple indexes sequentially against that same table, then we're
  probably performing multiple sequential scans in succession, once for
  each index.

 Check.

  Could we architect a mechanism that allowed multiple index creation
  statements to execute concurrently, with all of their inputs fed directly
  from a single sequential scan against the full relation?
 
  From a language construct point of view, this may not be trivial to
  implement for raw/interactive SQL - but possibly this is a candidate for
  the custom format restore?

 As Greg Stark mentioned, pg_restore can already issue index build commands
 in parallel.  Where applicable, that's probably superior to having one
 backend build multiple indexes during a single heap scan.  Index builds
 are CPU-intensive, and the pg_restore approach takes advantage of
 additional CPU cores in addition to possibly saving I/O.

 However, the pg_restore method is not applicable if you want CREATE INDEX
 CONCURRENTLY, and it's not applicable for implicit index building such as
 happens for ALTER TABLE rewrites and for VACUUM FULL.  Backend-managed
 concurrent index builds could shine there.

  I presume this would substantially increase the memory overhead required
  to build those indexes, though the performance gains may be advantageous.

 The multi-index-build should respect maintenance_work_mem overall.
 Avoiding cases where that makes concurrent builds slower than sequential
 builds is a key challenge for such a project:
 
 - If the index builds each fit in maintenance_work_mem when run
   sequentially and some spill to disk when run concurrently, expect
   concurrency to lose.
 - If the heap is small enough to stay in cache from one index build to the
   next, performing the builds concurrently is probably a wash or a loss.
 - Concurrency should help when a wide-row table large enough to exhaust OS
   cache has narrow indexes that all fit in maintenance_work_mem.  I don't
   know whether concurrency would help for a huge-table scenario where the
   indexes do overspill maintenance_work_mem.  You would have N indexes
   worth of external merge files competing for disk bandwidth; that could
   cancel out heap I/O savings.

 Overall, it's easy to end up with a loss.  We could punt by having an
 index_build_concurrency GUC, much like pg_restore relies on the user to
 discover a good -j value.  But if finding cases where concurrency helps is
 too hard, leaving the GUC at one would become the standard advice.
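
For illustration, such a GUC might be declared in guc.c's ConfigureNamesInt
roughly like this; the name and limits are hypothetical, taken only from the
paragraph above, and no such GUC exists:

	/* Purely illustrative entry; all values are assumptions. */
	{
		{"index_build_concurrency", PGC_USERSET, RESOURCES,
			gettext_noop("Sets the number of indexes built per heap scan."),
			NULL
		},
		&index_build_concurrency,
		1, 1, 64,
		NULL, NULL, NULL
	},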

  Apologies in advance if this is not the correct forum for suggestions..

 It's the right forum.

 Thanks,
 nm

 --
 Noah Misch
 EnterpriseDB http://www.enterprisedb.com



Re: [HACKERS] [bug fix] PITR corrupts the database cluster

2013-07-24 Thread Tom Lane
Heikki Linnakangas <hlinnakan...@vmware.com> writes:
 That's no different from CREATE TABLE / INDEX and DROP TABLE / INDEX. E.g. If 
 you crash after CREATE TABLE but before COMMIT, the file is leaked. But it's 
 just a waste of space, everything still works.

Well, it is different, because if you crash partway through dropping a
tablespace or database, you have inconsistent state.

 It would be nice to fix that leak, for tables and indexes too...

I'm inclined to think that this wouldn't be a good use of resources,
at least not at the individual table/index level.  We'd surely be adding
some significant amount of overhead to normal operating paths, in order
to cover a case that really shouldn't happen in practice.

The only thing here that really bothers me is that a crash during DROP
DATABASE/TABLESPACE could leave us with a partially populated db/ts
that's still accessible through the system catalogs.  We could however
do something to ensure that the db/ts is atomically removed from use
before we start dropping individual files.  Then, if you get a crash,
there'd still be system catalog entries but they'd be pointing at
nothing, so the behavior would be clean and understandable whereas
right now it's not.

In the case of DROP TABLESPACE this seems relatively easy: drop or
rename the symlink before we start flushing individual files.
I'm not quite sure how to do it for DROP DATABASE though --- I thought
of renaming the database directory, say from 12345 to 12345.dead,
but if there are tablespaces in use then we might have a database
subdirectory in each one, so we couldn't rename them all atomically.
I guess one thing we could do is create a flag file, say
dead.dont.use, in the database's default-tablespace directory, and
make new backends check for that before being willing to start up in
that database; then make sure that removal of that file is the last
step in DROP DATABASE.

regards, tom lane




Re: [HACKERS] [bug fix] PITR corrupts the database cluster

2013-07-24 Thread Andres Freund
On 2013-07-24 16:15:59 +0300, Heikki Linnakangas wrote:
 For DROP DATABASE, without something like incomplete actions,
 piggybacking on the commit record doesn't solve the issue of
 CHECKPOINTS
 either, because the commit record you piggybacked on could have
 committed before a checkpoint, while you still were busy deleting all
 the files.
 
 That's no different from CREATE TABLE / INDEX and DROP TABLE /
 INDEX. E.g. If you crash after CREATE TABLE but before COMMIT, the
 file is leaked. But it's just a waste of space, everything still
 works.

Imo it's not really the same. For one, spurious files left over by
CREATE TABLE are handled when assigning future relfilenodes (see
GetNewRelFileNode()), we don't do the same for databases and
tablespaces afaik.
Furthermore, I don't think there's such a big issue with DROP TABLE
unless maybe you've created the relation in the same transaction that
you're dropping it. Sure, there's the issue with a checkpoint in the
wrong place, but other than that we're going to remove the file when
replaying the commit record.

We also should never end up with an inconsistent state between catalog
and filesystem.

 It would be nice to fix that leak, for tables and indexes too...

Agreed.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] [GENERAL] Insert result does not match record count

2013-07-24 Thread Vik Fearing
On 07/22/2013 06:20 PM, Jeff Janes wrote:
 On Fri, Jul 19, 2013 at 3:20 PM, Natalie Wenz <nataliew...@ebureau.com> wrote:
 Hi all,

 I am moving some data from one table to another in 9.2.4, and keep seeing 
 this strange scenario:

  insert into newtable select data from oldtable where proc_date >= x and 
  proc_date < y;

 INSERT 0 78551642

  select count(*) from newtable where proc_date >= x and proc_date < y;
     count
  ------------
   4373518938
 It looks to me like the status report is 32 bits and overflowed.

 4,373,518,938 - 2^32 = 78,551,642
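
A minimal standalone C demonstration of that truncation, using the row
count from the report above:

	#include <stdio.h>
	#include <stdint.h>
	#include <inttypes.h>

	int
	main(void)
	{
		uint64_t	actual = UINT64_C(4373518938);	/* rows really inserted */
		uint32_t	reported = (uint32_t) actual;	/* truncated to 32 bits */

		/* prints 78551642, matching the INSERT status report */
		printf("%" PRIu32 "\n", reported);
		return 0;
	}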

Attached is a small patch that should fix the problem.

Vik
*** a/src/backend/commands/createas.c
--- b/src/backend/commands/createas.c
***
*** 172,178  ExecCreateTableAs(CreateTableAsStmt *stmt, const char *queryString,
  	/* save the rowcount if we're given a completionTag to fill */
  	if (completionTag)
  		snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
! 				 "SELECT %u", queryDesc->estate->es_processed);
  
  	/* and clean up */
  	ExecutorFinish(queryDesc);
--- 172,178 
  	/* save the rowcount if we're given a completionTag to fill */
  	if (completionTag)
  		snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
! 				 "SELECT " UINT64_FORMAT, queryDesc->estate->es_processed);
  
  	/* and clean up */
  	ExecutorFinish(queryDesc);
*** a/src/backend/tcop/pquery.c
--- b/src/backend/tcop/pquery.c
***
*** 195,201  ProcessQuery(PlannedStmt *plan,
  		{
  			case CMD_SELECT:
  				snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
! 						 "SELECT %u", queryDesc->estate->es_processed);
  				break;
  			case CMD_INSERT:
  				if (queryDesc->estate->es_processed == 1)
--- 195,201 
  		{
  			case CMD_SELECT:
  				snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
! 						 "SELECT " UINT64_FORMAT, queryDesc->estate->es_processed);
  				break;
  			case CMD_INSERT:
  				if (queryDesc->estate->es_processed == 1)
***
*** 203,217  ProcessQuery(PlannedStmt *plan,
  				else
  					lastOid = InvalidOid;
  				snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
! 						 "INSERT %u %u", lastOid, queryDesc->estate->es_processed);
  				break;
  			case CMD_UPDATE:
  				snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
! 						 "UPDATE %u", queryDesc->estate->es_processed);
  				break;
  			case CMD_DELETE:
  				snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
! 						 "DELETE %u", queryDesc->estate->es_processed);
  				break;
  			default:
  				strcpy(completionTag, "???");
--- 203,217 
  				else
  					lastOid = InvalidOid;
  				snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
! 						 "INSERT %u " UINT64_FORMAT, lastOid, queryDesc->estate->es_processed);
  				break;
  			case CMD_UPDATE:
  				snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
! 						 "UPDATE " UINT64_FORMAT, queryDesc->estate->es_processed);
  				break;
  			case CMD_DELETE:
  				snprintf(completionTag, COMPLETION_TAG_BUFSIZE,
! 						 "DELETE " UINT64_FORMAT, queryDesc->estate->es_processed);
  				break;
  			default:
  				strcpy(completionTag, "???");
*** a/src/include/nodes/execnodes.h
--- b/src/include/nodes/execnodes.h
***
*** 375,381  typedef struct EState
  
  	List	   *es_rowMarks;	/* List of ExecRowMarks */
  
! 	uint32		es_processed;	/* # of tuples processed */
  	Oid			es_lastoid;		/* last oid processed (by INSERT) */
  
  	int			es_top_eflags;	/* eflags passed to ExecutorStart */
--- 375,381 
  
  	List	   *es_rowMarks;	/* List of ExecRowMarks */
  
! 	uint64		es_processed;	/* # of tuples processed */
  	Oid			es_lastoid;		/* last oid processed (by INSERT) */
  
  	int			es_top_eflags;	/* eflags passed to ExecutorStart */



Re: [HACKERS] [bug fix] PITR corrupts the database cluster

2013-07-24 Thread Tom Lane
I wrote:
 The only thing here that really bothers me is that a crash during DROP
 DATABASE/TABLESPACE could leave us with a partially populated db/ts
 that's still accessible through the system catalogs.  ...
 I guess one thing we could do is create a flag file, say
 dead.dont.use, in the database's default-tablespace directory, and
 make new backends check for that before being willing to start up in
 that database; then make sure that removal of that file is the last
 step in DROP DATABASE.

After a bit of experimentation, it seems there's a pre-existing way that
we could do this: just remove PG_VERSION from the database's default
directory as the first filesystem action in DROP DATABASE.  If we
crash before committing, subsequent attempts to connect to that database
would fail like this:

$ psql bogus
psql: FATAL:  "base/176774" is not a valid data directory
DETAIL:  File "base/176774/PG_VERSION" is missing.

which is probably already good enough, though maybe we could add a HINT
suggesting that the DB was incompletely dropped.

To ensure this is replayed properly on slave servers, I'd be inclined to
mechanize it by (1) changing remove_dbtablespaces to ensure that the
DB's default tablespace is the first one dropped, and (2) incorporating
remove-PG_VERSION-first into rmtree().
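
A sketch of point (2), assuming a helper along these lines inside the
directory-removal path; the function name is invented and this is not the
actual rmtree() change:

	/*
	 * Sketch only: unlink PG_VERSION before removing anything else, so a
	 * crash mid-drop leaves the directory recognizably invalid to the
	 * PG_VERSION check performed at connection time.
	 */
	static void
	remove_pg_version_first(const char *dbpath)
	{
		char		versionfile[MAXPGPATH];

		snprintf(versionfile, sizeof(versionfile), "%s/PG_VERSION", dbpath);
		if (unlink(versionfile) < 0 && errno != ENOENT)
			elog(LOG, "could not unlink \"%s\": %m", versionfile);
	}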

regards, tom lane




Re: [HACKERS] [bug fix] PITR corrupts the database cluster

2013-07-24 Thread Andres Freund
On 2013-07-24 10:57:14 -0400, Tom Lane wrote:
 I wrote:
  The only thing here that really bothers me is that a crash during DROP
  DATABASE/TABLESPACE could leave us with a partially populated db/ts
  that's still accessible through the system catalogs.  ...
  I guess one thing we could do is create a flag file, say
  dead.dont.use, in the database's default-tablespace directory, and
  make new backends check for that before being willing to start up in
  that database; then make sure that removal of that file is the last
  step in DROP DATABASE.

 After a bit of experimentation, it seems there's a pre-existing way that
 we could do this: just remove PG_VERSION from the database's default
 directory as the first filesystem action in DROP DATABASE.  If we
 crash before committing, subsequent attempts to connect to that database
 would fail like this:

Neat idea, especially as it would work on the back branches without
problems.

It doesn't address MauMau's original problem of not being able to set a
correct stop point because the datadir is being removed before the
commit record (which is the only place we currently can set a
recovery_target on in a basebackup scenario) though.

Wouldn't it make more sense to simply commit the pg_database entry
removal separately from the actual removal of the datadir? DROP DATABASE
already can't run in a transaction, so that should be easy.


 $ psql bogus
 psql: FATAL:  "base/176774" is not a valid data directory
 DETAIL:  File "base/176774/PG_VERSION" is missing.

 which is probably already good enough, though maybe we could add a HINT
 suggesting that the DB was incompletely dropped.

We could rename/fsync() the file to PG_BEING_DROPPED atomically. Then we
could give a useful error instead of an error message with a "maybe" hint.

That would require to remove the file last though, so I am not sure if
it's worth the bother.

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] ilist.h is not useful as-is

2013-07-24 Thread Tom Lane
So I went off to implement the SPITupleTable tracking discussed in
6553.1374424...@sss.pgh.pa.us, and thought it would be cool to
use the slist infrastructure defined in lib/ilist.h rather than
creating a separate List node for each SPITupleTable struct.
However, I soon ran into a problem: there's no real support for
"remove the current element of an slist while we're scanning it",
which is really the only list manipulation I need.  The only way
to remove an element is slist_delete(), which will iterate over
the list *again* and thus create an O(N^2) penalty.  Or I could
use a dlist, but two pointers per struct seem pretty silly.

So I'm going to end up hand-implementing the same kind of manipulation
we frequently use with traditional Lists, namely keep a second variable
that's the preceding list element (not the next one) so I can unlink and
delete the target element when I find it.  ilist.h is not offering me
any useful support at all for this scenario.  Seems like we're missing
a bet here.

regards, tom lane




Re: [HACKERS] ilist.h is not useful as-is

2013-07-24 Thread Andres Freund
On 2013-07-24 11:26:00 -0400, Tom Lane wrote:
 So I went off to implement the SPITupleTable tracking discussed in
 6553.1374424...@sss.pgh.pa.us, and thought it would be cool to
 use the slist infrastructure defined in lib/ilist.h rather than
 creating a separate List node for each SPITupleTable struct.
 However, I soon ran into a problem: there's no real support for
  "remove the current element of an slist while we're scanning it",
 which is really the only list manipulation I need.  The only way
 to remove an element is slist_delete(), which will iterate over
 the list *again* and thus create an O(N^2) penalty.  Or I could
 use a dlist, but two pointers per struct seem pretty silly.

 So I'm going to end up hand-implementing the same kind of manipulation
 we frequently use with traditional Lists, namely keep a second variable
 that's the preceding list element (not the next one) so I can unlink and
 delete the target element when I find it.  ilist.h is not offering me
 any useful support at all for this scenario.  Seems like we're missing
 a bet here.

Hm. Yes. This should be added. I didn't need it so far, but I definitely
can see use cases.

slist_delete_current(slist_mutable_iter *)?

I am willing to cough up a patch if you want.

This will require another member variable in slist_mutable_iter which
obviously will need to be maintained, but that seems fine to me since it
will reduce the cost of actually deleting noticeably.
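
For concreteness, a sketch of what that could look like, assuming
slist_mutable_iter grows a prev field that slist_foreach_modify keeps
pointing at the node before cur (neither exists in ilist.h today):

	/* Sketch: O(1) deletion of the element the iterator is currently on. */
	static inline void
	slist_delete_current(slist_mutable_iter *iter)
	{
		/* Make the remembered predecessor skip over iter->cur. */
		iter->prev->next = iter->next;
	}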

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [GENERAL] Insert result does not match record count

2013-07-24 Thread Vik Fearing
On 07/24/2013 04:04 PM, Vik Fearing wrote:
 On 07/22/2013 06:20 PM, Jeff Janes wrote:
 On Fri, Jul 19, 2013 at 3:20 PM, Natalie Wenz <nataliew...@ebureau.com> 
 wrote:
 Hi all,

 I am moving some data from one table to another in 9.2.4, and keep seeing 
 this strange scenario:

  insert into newtable select data from oldtable where proc_date >= x and 
  proc_date < y;

 INSERT 0 78551642

  select count(*) from newtable where proc_date >= x and proc_date < y;
     count
  ------------
   4373518938
 It looks to me like the status report is 32 bits and overflowed.

 4,373,518,938 - 2^32 = 78,551,642
 Attached is a small patch that should fix the problem.

It would seem this was not as thorough as it should have been and
there's a lot more to do.  I'm working on a better patch.

Also worth mentioning is bug #7766.
http://www.postgresql.org/message-id/e1tlli5-0007tr...@wrigleys.postgresql.org

Vik




Re: [HACKERS] ilist.h is not useful as-is

2013-07-24 Thread Tom Lane
Andres Freund <and...@2ndquadrant.com> writes:
 On 2013-07-24 11:26:00 -0400, Tom Lane wrote:
 So I'm going to end up hand-implementing the same kind of manipulation
 we frequently use with traditional Lists, namely keep a second variable
 that's the preceding list element (not the next one) so I can unlink and
 delete the target element when I find it.  ilist.h is not offering me
 any useful support at all for this scenario.  Seems like we're missing
 a bet here.

 Hm. Yes. This should be added. I didn't need it so far, but I definitely
 can see use cases.
 slist_delete_current(slist_mutable_iter *)?
 I am willing to cough up a patch if you want.

Please.

 This will require another member variable in slist_mutable_iter which
 obviously will need to be maintained, but that seems fine to me since it
 will reduce the cost of actually deleting noticeably.

I think that's all right.  Conceivably we could introduce two forms of
iterator depending on whether you want to delete or not, but that seems
like overkill to me.

In fact, now that I think about it, the distinction between slist_iter
and slist_mutable_iter is really misstated in the comments, isn't it?
The *only* action on the list that's unsafe with an active slist_iter
is to delete the current element (and then continue iterating).
If the value of slist_mutable_iter is to support deletion of the current
element, then it seems obvious that it should support doing so
efficiently, ie it should carry both prev and next.  Also, if it's
carrying both of those, then use of slist_mutable_iter really doesn't
allow any action other than deleting the current element --- adding
new nodes ahead of the current element isn't safe.

regards, tom lane




Re: [HACKERS] ilist.h is not useful as-is

2013-07-24 Thread Alvaro Herrera
Andres Freund wrote:
 On 2013-07-24 11:26:00 -0400, Tom Lane wrote:

  So I'm going to end up hand-implementing the same kind of manipulation
  we frequently use with traditional Lists, namely keep a second variable
  that's the preceding list element (not the next one) so I can unlink and
  delete the target element when I find it.  ilist.h is not offering me
  any useful support at all for this scenario.  Seems like we're missing
  a bet here.
 
 Hm. Yes. This should be added. I didn't need it so far, but I definitely
  can see use cases.
 
 slist_delete_current(slist_mutable_iter *)?
 
 I am willing to cough up a patch if you want.
 
 This will require another member variable in slist_mutable_iter which
 obviously will need to be maintained, but that seems fine to me since it
 will reduce the cost of actually deleting noticeably.

Amusingly, this will mean ForgetBackgroundWorker() will need to have a
slist_mutable_iter * argument if it wants to use the new function ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] ilist.h is not useful as-is

2013-07-24 Thread Andres Freund
On 2013-07-24 11:53:39 -0400, Alvaro Herrera wrote:
 Andres Freund wrote:
  On 2013-07-24 11:26:00 -0400, Tom Lane wrote:
 
   So I'm going to end up hand-implementing the same kind of manipulation
   we frequently use with traditional Lists, namely keep a second variable
   that's the preceding list element (not the next one) so I can unlink and
   delete the target element when I find it.  ilist.h is not offering me
   any useful support at all for this scenario.  Seems like we're missing
   a bet here.
  
  Hm. Yes. This should be added. I didn't need it so far, but I definitely
  can see usecases.
  
  slist_delete_current(slist_mutable_iter *)?
  
  I am willing to cough up a patch if you want.
  
  This will require another member variable in slist_mutable_iter which
  obviously will need to be maintained, but that seems fine to me since it
  will reduce the cost of actually deleting noticeably.
 
 Amusingly, this will mean ForgetBackgroundWorker() will need to have a
 slist_mutable_iter * argument if it wants to use the new function ...

The only alternative I see would be to pass both the prev and the cur
elements, which seems to be less expressive from my POV.
I am open to other suggestions.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] ilist.h is not useful as-is

2013-07-24 Thread Alvaro Herrera
Andres Freund wrote:
 On 2013-07-24 11:53:39 -0400, Alvaro Herrera wrote:
  Andres Freund wrote:

  Amusingly, this will mean ForgetBackgroundWorker() will need to have a
  slist_mutable_iter * argument if it wants to use the new function ...
 
 The only alternative I see would be to pass both the prev, and the cur
 elements which seems to be less expressive from my POV.
 I am open to other suggestions?

In the grand scheme of things, I think it's fine.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] dynamic background workers, round two

2013-07-24 Thread Robert Haas
The dynamic background workers patch that I submitted for CF1 was
generally well-received, but several people commented on a significant
limitation: there's currently no way for a backend that requests a new
background worker to know whether that background worker was
successfully started.  If you're using the background worker mechanism
to run daemon processes of some sort, this is probably not a huge
problem.  You most likely don't start and stop those processes very
frequently, and when you do manually start them, you can always
examine the server log to see whether everything worked as planned.
Maybe not ideal, but not unbearably awful, either.

However, the goal I'm working towards here is parallel query, and for
that application, and some others, things don't look so good.  In that
case, you are probably launching a background worker flagged as
BGW_NEVER_RESTART, and so the postmaster is going to try to start it
just once, and if it doesn't work, you really need some way of knowing
that.  Of course, if the worker is launched successfully, you can have
it notify the process that started it via any mechanism you choose:
creating a sentinel file, inserting data into a table, setting the
process latch.  Sky's the limit.  However, if the worker isn't
launched successfully (fork fails, or the process crashes before it
reaches your code) you have no way of knowing.  If you don't receive
the agreed-upon notification from the child, it means that EITHER the
process was never started in the first place OR the postmaster just
hasn't gotten around to starting it yet.  Of course, you could wait
for a long time (5s ?) and then give up, but there's no good way to
set the wait time.  If you make it long, then it takes a long time for
you to report errors to the client even when those errors happen
quickly.  If you make it short, you may time out spuriously on a busy
system.

The attached patch attempts to remedy this problem.  When you register
a background worker, you can obtain a handle that can subsequently
be used to query for the worker's PID.  If you additionally initialize
bgw_notify_pid = MyProcPid, then the postmaster will send you SIGUSR1
when worker startup has been attempted (successfully or not).  You can
wait for this signal by passing your handle to
WaitForBackgroundWorkerStartup(), which will return only when either
(1) an attempt has been made to start the worker or (2) the postmaster
is determined to have died.  This interface seems adequate for
something like worker_spi, where it's useful to know whether the child
was started or not (and return the PID if so) but that's about all
that's really needed.
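
For illustration, requesting a worker and waiting for startup might look
roughly like this; the sketch assumes the interface described above, and the
exact signatures and names (BGWH_STARTED, my_worker_main) are guesses at the
patch's API rather than confirmed details:

	BackgroundWorker worker;
	BackgroundWorkerHandle *handle;
	pid_t		pid;

	memset(&worker, 0, sizeof(worker));
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
	worker.bgw_start_time = BgWorkerStart_ConsistentState;
	worker.bgw_restart_time = BGW_NEVER_RESTART;
	worker.bgw_main = my_worker_main;	/* hypothetical entry point */
	worker.bgw_notify_pid = MyProcPid;	/* ask postmaster to SIGUSR1 us */

	if (!RegisterDynamicBackgroundWorker(&worker, &handle))
		ereport(ERROR,
				(errmsg("could not register background worker")));

	/* Returns once startup was attempted, or the postmaster died. */
	if (WaitForBackgroundWorkerStartup(handle, &pid) == BGWH_STARTED)
		elog(LOG, "background worker started as PID %d", (int) pid);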

More complex notification interfaces can also be coded using the
primitives introduced by this patch.  Setting bgw_notify_pid will
cause the postmaster to send SIGUSR1 every time it starts the worker
and every time the worker dies.  Every time you receive that signal,
you can call GetBackgroundWorkerPid() for each background worker to
find out which ones have started or terminated.  The only slight
difficulty is in waiting for receipt of that signal, which I
implemented by adding a new global called set_latch_on_sigusr1.
WaitForBackgroundWorkerStartup() uses that flag internally, but
there's nothing to prevent a caller from using it as part of their own
event loop.  I find the set_latch_on_sigusr1 flag to be slightly
ugly, but it seems better and far safer than having the postmaster try
to actually set the latch itself - for example, it'd obviously be
unacceptable to pass a Latch * to the postmaster and have the
postmaster assume that pointer is valid.
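
A caller-side event loop might then look roughly like this (a sketch;
the 0 "not started yet" return value and the exact WaitLatch usage are
assumptions, not quotes from the patch):

    #include "postgres.h"
    #include "postmaster/bgworker.h"
    #include "storage/latch.h"
    #include "storage/proc.h"

    extern bool set_latch_on_sigusr1;

    static int
    wait_for_worker_event(BackgroundWorkerHandle *handle)
    {
        int         pid;

        set_latch_on_sigusr1 = true;

        for (;;)
        {
            pid = GetBackgroundWorkerPid(handle);
            if (pid != 0)       /* started, failed, or postmaster death */
                break;

            /* SIGUSR1 sets our process latch because of the flag above. */
            WaitLatch(&MyProc->procLatch,
                      WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
            ResetLatch(&MyProc->procLatch);
        }

        set_latch_on_sigusr1 = false;
        return pid;
    }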

I'm hopeful that this is about as much fiddling with the background
worker mechanism per se as will be needed for parallel query.  Once we
have this, I think the next hurdle will be designing suitable IPC
mechanisms, so that there's a relatively easy way for the parent
process to pass information to its background workers and vice versa.
 I expect the main tool for that to be a dynamic shared memory
facility; but I'm hoping that won't be closely tied to background
workers, though they may be heavy users of it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


bgworker-feedback-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Design proposal: fsync absorb linear slider

2013-07-24 Thread Robert Haas
On Tue, Jul 23, 2013 at 12:13 PM, Greg Smith g...@2ndquadrant.com wrote:
 On 7/23/13 10:56 AM, Robert Haas wrote:
 On Mon, Jul 22, 2013 at 11:48 PM, Greg Smith g...@2ndquadrant.com wrote:

 We know that a 1GB relation segment can take a really long time to write
 out.  That could include up to 128 changed 8K pages, and we allow all of
 them to get dirty before any are forced to disk with fsync.

 By my count, it can include up to 131,072 changed 8K pages.

 Even better!  I can pinpoint exactly what time last night I got tired enough
 to start making trivial mistakes.  Everywhere I said 128 it's actually
 131,072, which just changes the range of the GUC I proposed.

 Getting the number right really highlights just how bad the current
 situation is.  Would you expect the database to dump up to 128K writes into
 a file and then have low latency when it's flushed to disk with fsync?  Of
 course not.  But that's the job the checkpointer process is trying to do
 right now.  And it's doing it blind--it has no idea how many dirty pages
 might have accumulated before it started.

 I'm not exactly sure how best to use the information collected.  fsync every
 N writes is one approach.  Another is to use accumulated writes to predict
 how long fsync on that relation should take.  Whenever I tried to spread
 fsync calls out before, the scale of the piled up writes from backends was
 the input I really wanted available.  The segment write count gives an
 alternate way to sort the blocks too, you might start with the heaviest hit
 ones.

 In all these cases, the fundamental thing I keep coming back to is wanting to cue
 off past write statistics.  If you want to predict relative I/O delay times
 with any hope of accuracy, you have to start the checkpoint knowing
 something about the backend and background writer activity since the last
 one.

So, I don't think this is a bad idea; in fact, I think it'd be a good
thing to explore.  The hard part is likely to be convincing ourselves
of anything about how well or poorly it works on arbitrary hardware
under arbitrary workloads, but we've got to keep trying things until
we find something that works well, so why not this?

One general observation is that there are two bad things that happen
when we checkpoint.  One is that we force all of the data in RAM out
to disk, and the other is that we start doing lots of FPIs.  Both of
these things harm throughput.  Your proposal allows the user to make
the first of those behaviors more frequent without making the second
one more frequent.  That idea seems promising, and it also seems to
admit of many variations.  For example, instead of issuing an fsync
after every N OS writes to a particular file, we could fsync the file
with the most writes every K seconds.  That way, if the system has
busy and idle periods, we'll effectively catch up on our fsyncs when
the system isn't that busy, and we won't bunch them up too much if
there's a sudden surge of activity.
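
To sketch that variant (every name here is invented for illustration;
this is not from any patch):

    #include <stdint.h>
    #include <time.h>
    #include <unistd.h>

    typedef struct SegWriteStats
    {
        int         fd;                 /* open fd of the 1GB segment */
        uint64_t    writes_since_sync;  /* OS writes absorbed since last fsync */
    } SegWriteStats;

    /* Every sync_interval seconds, fsync the segment with the most
     * accumulated writes, so fsyncs catch up during idle periods without
     * bunching up during bursts. */
    static void
    maybe_sync_hottest_segment(SegWriteStats *segs, int nsegs,
                               time_t *last_sync, int sync_interval)
    {
        time_t      now = time(NULL);
        int         hottest = -1;
        uint64_t    most = 0;
        int         i;

        if (now - *last_sync < sync_interval)
            return;

        for (i = 0; i < nsegs; i++)
        {
            if (segs[i].writes_since_sync > most)
            {
                most = segs[i].writes_since_sync;
                hottest = i;
            }
        }

        if (hottest >= 0)
        {
            (void) fsync(segs[hottest].fd);
            segs[hottest].writes_since_sync = 0;
        }
        *last_sync = now;
    }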

Now that's just a shot in the dark and there might be reasons why it's
terrible, but I just generally offer it as food for thought that the
triggering event for the extra fsyncs could be chosen via a multitude
of different algorithms, and as you hack through this it might be
worth trying a few different possibilities.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY

2013-07-24 Thread Robert Haas
On Tue, Jul 23, 2013 at 9:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 If it weren't that we've been speculating for years about deprecating
 SRFs-in-tlists once we had LATERAL, I would personally consider this
 patch DOA in this form.  If we do think we'll probably deprecate that
 feature, then not extending WITH ORDINALITY to such cases is at least
 defensible.  On the other hand, considering that we've yet to ship a
 release supporting LATERAL, it's probably premature to commit to such
 deprecation --- we don't really know whether people will find LATERAL
 to be a convenient and performant substitute.

I guess I'd sort of assumed that the plan was to continue accepting
SRFs in tlists but rewrite them as lateral joins, rather than getting
rid of them altogether.  IIUC that would simplify some things inside
the executor.   I'd be a bit more reluctant to just ban SRFs in target
lists outright.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY

2013-07-24 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jul 23, 2013 at 9:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 If it weren't that we've been speculating for years about deprecating
 SRFs-in-tlists once we had LATERAL, I would personally consider this
 patch DOA in this form.

 I guess I'd sort of assumed that the plan was to continue accepting
 SRFs in tlists but rewrite them as lateral joins, rather than getting
 rid of them altogether.

That seems to me to be unlikely to happen, because it would be
impossible to preserve the current (admittedly bad) semantics.
If we're going to change the behavior at all we might as well just
drop the feature, IMO.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY

2013-07-24 Thread Robert Haas
On Fri, Jul 19, 2013 at 1:50 PM, Greg Stark st...@mit.edu wrote:
 My only reservation with this patch is whether the WITH_ORDINALITY
 parser hack is the way we want to go. The precedent was already set
 with WITH TIME ZONE though and I think this was the best option.

I share this reservation.  Lexer hacks are reasonable ways of getting
LALR(2)-ish behavior in very simple cases, but it doesn't take much to
get into trouble.  I think the "with ordinality as (select 1) select *
from ordinality" example you posted is precisely on point.  Currently,
we will have four classes of keywords: unreserved, column-name,
type-function, and reserved.  There are rules for when each of those
types of keywords needs to be quoted, and those rules are relatively
well-understood.

This patch will introduce, without documentation, a fifth class of
keyword.  ORDINALITY will need to be quoted when, and only when, it
immediately follows WITH.  Without some change to our deparsing code,
this is a dump/restore hazard; and with some such change it's still
probably not a good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY

2013-07-24 Thread Robert Haas
On Wed, Jul 24, 2013 at 1:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jul 23, 2013 at 9:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 If it weren't that we've been speculating for years about deprecating
 SRFs-in-tlists once we had LATERAL, I would personally consider this
 patch DOA in this form.

 I guess I'd sort of assumed that the plan was to continue accepting
 SRFs in tlists but rewrite them as lateral joins, rather than getting
 rid of them altogether.

 That seems to me to be unlikely to happen, because it would be
 impossible to preserve the current (admittedly bad) semantics.
 If we're going to change the behavior at all we might as well just
 drop the feature, IMO.

Maybe.  I'd be kind of sad to lose some of the simple cases that work
now, like SELECT srf(), in favor of having to write SELECT * FROM
srf().  I'd probably get over it, but I'm sure a lot of people would
be mildly annoyed at having to change their working application code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY

2013-07-24 Thread Greg Stark
On Wed, Jul 24, 2013 at 6:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 That seems to me to be unlikely to happen, because it would be
 impossible to preserve the current (admittedly bad) semantics.
 If we're going to change the behavior at all we might as well just
 drop the feature, IMO.

It would be nice to support a single SRF in the target list. That
would side-step the bad semantics and also make it easier to
implement. But I'm not sure how easy it would be in practice because
I've learned not to underestimate the difficulty of making seemingly
small changes to the planner.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Insert result does not match record count

2013-07-24 Thread Tom Lane
Vik Fearing vik.fear...@dalibo.com writes:
 Also worth mentioning is bug #7766.
 http://www.postgresql.org/message-id/e1tlli5-0007tr...@wrigleys.postgresql.org

Yeah, did you read that whole thread?  The real issue here is going to
be whether client-side code falls over on wider-than-32-bit counts.
We can fix the backend and be pretty sure that we've found all the
relevant places inside it, but we'll just be exporting the issue.

I did look at libpq and noted that it doesn't seem to have any internal
problem, because it returns the count to callers as a string (!).
But what do you think are the odds that callers are using code that
won't overflow?  I'd bet on finding atoi() or suchlike in a lot of
callers.  Even if they thought to use strtoul(), unsigned long is
not necessarily 64 bits wide.
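
Callers that control their own code can at least do the conversion
safely, something like this (a sketch, minimal error handling):

    #include <errno.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    /* Return the affected-row count of a command, or -1 on parse failure. */
    static int64_t
    cmd_tuples64(PGresult *res)
    {
        const char *s = PQcmdTuples(res);   /* empty string if inapplicable */
        char       *end;
        long long   v;

        if (s == NULL || *s == '\0')
            return 0;

        errno = 0;
        v = strtoll(s, &end, 10);   /* long long is at least 64 bits */
        if (errno != 0 || end == s || *end != '\0')
            return -1;
        return (int64_t) v;
    }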

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch to add support of IF NOT EXISTS to others CREATE statements

2013-07-24 Thread Karol Trzcionka
Hello,
patch works fine, but is there any reason for comparing each ifNotExists
in a different way?
i.e.
ProcedureCreate
if (!ifNotExists)
...
else
{
...
return
}

TypeCreate
if (ifNotExists)
{
...
return
}
...
---
Shouldn't it be more consistent?
Regards,
Karol


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY

2013-07-24 Thread Greg Stark
On Wed, Jul 24, 2013 at 6:39 PM, Robert Haas robertmh...@gmail.com wrote:
 This patch will introduce, without documentation, a fifth class of
 keyword.  ORDINALITY will need to be quoted when, and only when, it
 immediately follows WITH.  Without some change to our deparsing code,
 this is a dump/restore hazard; and with some such change it's still
 probably not a good idea.

Strictly speaking this patch doesn't introduce this fifth class of
keyword. We already had TIME in that category (and also FIRST and LAST
in a similar category following NULLS). If we have a solution for WITH
keyword then presumably we would implement it for WITH TIME and WITH
ORDINALITY at the same time.

In the interim I suppose we could teach pg_dump to quote any keyword
that follows WITH or NULLS pretty easily. Or just quote those four
words unconditionally.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Adding Zigzag Merge Join to Index Nested Loops Join

2013-07-24 Thread Robert Haas
On Tue, Jul 23, 2013 at 3:40 PM, tubadzin tubad...@o2.pl wrote:
 I want to add Zigzag Merge join to the Index Nested Loops Join algorithm.
 http://www.cs.berkeley.edu/~fox/summaries/database/query_eval_5-8.html
 Which files are responsible for Index nested loops join? (not the simple
 nested loops join which has a double foreach in nodeNestloop.c) nodeIndexscan.c?
 nodeIndexonlyscan.c?
 Best Regards

nodeNestloop.c handles all execution of all nested loops, regardless
of whether there's an index scan involved.  Of course, if there is an
index scan involved, then nodeIndexscan.c will also be involved; if
there is an index-only scan, then nodeIndexonlyscan.c, and so on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ilist.h is not useful as-is

2013-07-24 Thread Andres Freund
Hi,

On 2013-07-24 11:49:20 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  This will require another member variable in slist_mutable_iter which
  obviously will need to be maintained, but that seems fine to me since it
  will reduce the cost of actually deleting noticeably.
 
 I think that's all right.  Conceivably we could introduce two forms of
 iterator depending on whether you want to delete or not, but that seems
 like overkill to me.

Agreed. Especially as you point out there's no real point to
mutable_iter as committed.

 In fact, now that I think about it, the distinction between slist_iter
 and slist_mutable_iter is really misstated in the comments, isn't it?
 The *only* action on the list that's unsafe with an active slist_iter
 is to delete the current element (and then continue iterating).
 If the value of slist_mutable_iter is to support deletion of the current
 element, then it seems obvious that it should support doing so
 efficiently, ie it should carry both prev and next.  Also, if it's
 carrying both of those, then use of slist_mutable_iter really doesn't
 allow any action other than deleting the current element --- adding
 new nodes ahead of the current element isn't safe.

True. I think I mentally copy-pasted too much logic from the doubly
linked list case.

The first attached patch adds slist_delete_current(), updates the
comments addressing your points and converts the bgworker code to pass
the iterator around (it's more efficient which might actually matter
with a few hundred bgworkers).
I found the added newlines in slist_foreach_modify useful, but maybe they
should be removed again.

I think this should be included in 9.3 once reviewed.

The second patch adds a regression test for background workers via
worker_spi which I used to test the slist_delete_current() addition.
It's not 100% as is, but I thought it worthwhile to post it anyway:
a) it only tests dynamically registered workers; it should start its own
   regression test starting some bgworkers statically
b) it disables installcheck harshly, causing a warning from make:
/home/andres/src/postgresql/src/makefiles/pgxs.mk:297: warning: ignoring old
commands for target `installcheck'
   Manually defining a pg_regress invocation in 'check:' should probably fix
   that, since that part of pgxs.mk is dependent on REGRESS being set.
c) it only tests BGW_NEVER_RESTART

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From 0dfa542f457ebd71640830a993e3a428720b4179 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Wed, 24 Jul 2013 19:34:10 +0200
Subject: [PATCH 1/2] Add slist_delete_current()

---
 src/backend/lib/ilist.c                     |  2 +-
 src/backend/postmaster/bgworker.c           |  5 ++++-
 src/backend/postmaster/postmaster.c         |  4 ++--
 src/include/lib/ilist.h                     | 32 ++++++++++++++++++++++++----
 src/include/postmaster/bgworker_internals.h |  2 +-
 5 files changed, 36 insertions(+), 9 deletions(-)

diff --git a/src/backend/lib/ilist.c b/src/backend/lib/ilist.c
index 957a118..dddb6eb 100644
--- a/src/backend/lib/ilist.c
+++ b/src/backend/lib/ilist.c
@@ -28,7 +28,7 @@
  *
  * It is not allowed to delete a 'node' which is not in the list 'head'
  *
- * Caution: this is O(n)
+ * Caution: this is O(n), use slist_delete_current() while iterating.
  */
 void
 slist_delete(slist_head *head, slist_node *node)
 slist_delete(slist_head *head, slist_node *node)
diff --git a/src/backend/postmaster/bgworker.c b/src/backend/postmaster/bgworker.c
index ada24e9..9967b69 100644
--- a/src/backend/postmaster/bgworker.c
+++ b/src/backend/postmaster/bgworker.c
@@ -272,10 +272,13 @@ BackgroundWorkerStateChange(void)
  * the postmaster.
  */
 void
-ForgetBackgroundWorker(RegisteredBgWorker *rw)
+ForgetBackgroundWorker(slist_mutable_iter *cur)
 {
+	RegisteredBgWorker *rw;
 	BackgroundWorkerSlot *slot;
 
+	rw = slist_container(RegisteredBgWorker, rw_lnode, cur->cur);
+
 	Assert(rw->rw_shmem_slot < max_worker_processes);
 	slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
 	slot->in_use = false;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index b6b8111..0342be7 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1452,7 +1452,7 @@ DetermineSleepTime(struct timeval * timeout)
 
 			if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
 			{
-				ForgetBackgroundWorker(rw);
+				ForgetBackgroundWorker(&siter);
 				continue;
 			}
 
@@ -5643,7 +5643,7 @@ StartOneBackgroundWorker(void)
 		{
 			if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART)
 			{
-				ForgetBackgroundWorker(rw);
+				ForgetBackgroundWorker(&iter);
 				continue;
 			}
 
diff --git a/src/include/lib/ilist.h b/src/include/lib/ilist.h
index 73f4115..f8f0458 100644
--- a/src/include/lib/ilist.h
+++ b/src/include/lib/ilist.h
@@ -211,7 +211,9 @@ typedef struct slist_head
  * Used as state in 

Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY

2013-07-24 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 On Wed, Jul 24, 2013 at 6:39 PM, Robert Haas robertmh...@gmail.com wrote:
 This patch will introduce, without documentation, a fifth class of
 keyword.  ORDINALITY will need to be quoted when, and only when, it
 immediately follows WITH.  Without some change to our deparsing code,
 this is a dump/restore hazard; and with some such change it's still
 probably not a good idea.

 Strictly speaking this patc doesn't introduce this fifth class of
 keyword. We already had TIME in that category (and also FIRST and LAST
 in a similar category following NULLS). If we have a solution for WITH
 keyword then presumably we would implement it for WITH TIME and WITH
 ORDINALITY at the same time.

It strikes me that we could hack the grammar for CTEs so that it allows
WITH_ORDINALITY and WITH_TIME as the first token, with appropriate
insertion of the proper identifier value.  I'm not sure if there are any
other places where WITH can be followed by a random identifier.

I don't see any workable fix that doesn't involve the funny token, though.
Consider

CREATE VIEW v AS SELECT ... FROM UNNEST(...) WITH ORDINALITY;
CREATE VIEW v AS SELECT ... FROM UNNEST(...) WITH NO DATA;

WITH ORDINALITY really needs to be parsed as part of the FROM clause.
WITH NO DATA really needs to *not* be parsed as part of the FROM clause.
Without looking ahead more than one token, there is absolutely no way
for the grammar to decide if it's got the whole FROM clause or not
at the point where WITH is the next token.  So our choices are to have
two-token lookahead at the lexer level, or to give up on bison and find
something that can implement a parsing algorithm better than LALR(1).
I know which one seems more likely to get done in the foreseeable future.
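
For concreteness, the two-token lookahead amounts to a filter function
between the scanner and bison, along these lines (token and helper names
are illustrative only, and a real filter would also have to save and
restore the pushed-back token's semantic value and location, which this
sketch glosses over):

    static int
    filtered_yylex(YYSTYPE *lvalp, yyscan_t scanner)
    {
        static int  pushback = -1;  /* one buffered token; not reentrant */
        int         tok;

        if (pushback >= 0)
        {
            tok = pushback;
            pushback = -1;
        }
        else
            tok = raw_yylex(lvalp, scanner);

        if (tok == WITH)
        {
            int     next = raw_yylex(lvalp, scanner);

            if (next == ORDINALITY)
                return WITH_ORDINALITY;     /* consumed both tokens */
            if (next == TIME)
                return WITH_TIME;
            pushback = next;                /* not ours; hand it back */
        }
        return tok;
    }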

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY

2013-07-24 Thread Andres Freund
On 2013-07-24 13:36:39 -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Tue, Jul 23, 2013 at 9:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  If it weren't that we've been speculating for years about deprecating
  SRFs-in-tlists once we had LATERAL, I would personally consider this
  patch DOA in this form.
 
  I guess I'd sort of assumed that the plan was to continue accepting
  SRFs in tlists but rewrite them as lateral joins, rather than getting
  rid of them altogether.
 
 That seems to me to be unlikely to happen, because it would be
 impossible to preserve the current (admittedly bad) semantics.
 If we're going to change the behavior at all we might as well just
 drop the feature, IMO.

I think removing the feature will be a rather painful procedure for
users and thus will need a rather long deprecation period. The amount of
code using SRFs in targetlists is quite huge if my experience is
anything to go by.
And not much of that can trivially/centrally be rewritten to LATERAL, not
to speak of the cross-version compatibility problem.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Insert result does not match record count

2013-07-24 Thread Andres Freund
On 2013-07-24 13:48:23 -0400, Tom Lane wrote:
 Vik Fearing vik.fear...@dalibo.com writes:
  Also worth mentioning is bug #7766.
  http://www.postgresql.org/message-id/e1tlli5-0007tr...@wrigleys.postgresql.org
 
 Yeah, did you read that whole thread?  The real issue here is going to
 be whether client-side code falls over on wider-than-32-bit counts.
 We can fix the backend and be pretty sure that we've found all the
 relevant places inside it, but we'll just be exporting the issue.

 I did look at libpq and noted that it doesn't seem to have any internal
 problem, because it returns the count to callers as a string (!).
 But what do you think are the odds that callers are using code that
 won't overflow?  I'd bet on finding atoi() or suchlike in a lot of
 callers.  Even if they thought to use strtoul(), unsigned long is
 not necessarily 64 bits wide.

Application code that relies on the values already has problems though
since the returned values are pretty bogus now. Including the fact that
it can return 0 as the number of modified rows, which is checked for more
frequently than the actual number, IME...
So I think client code that uses simplistic stuff like atoi isn't worse
off afterwards since the values will be about as bogus. I am more
worried about code that does range checks, like Java's string conversion
routines...

I think fixing this for 9.4 is fine, but due to the compat issues I
think it's too late for 9.3.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ilist.h is not useful as-is

2013-07-24 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 The first attached patch adds slist_delete_current(), updates the
 comments addressing your points and converts the bgworker code to pass
 the iterator around (it's more efficient which might actually matter
 with a few hundred bgworkers).
 I found the added newlines in slist_foreach_modify useful, but maybe they
 should be removed again.

 I think this should be included in 9.3 once reviewed.

Agreed; since we have not shipped ilist.h in a release yet, the benefits
of having it behave the same in all branches should outweigh any pain
from changing this post-beta.

 The second patch adds a regression test for background workers via
 worker_spi which I used to test the slist_delete_current() addition. It's
 not 100% as is, but I thought it worthwhile to post it anyway

Hm.  I'll review and commit the ilist changes, but Alvaro or somebody
should decide if such a test is sensible.  I'd be a bit worried about
it in a make installcheck context ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GENERAL] Insert result does not match record count

2013-07-24 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 I think fixing this for 9.4 is fine, but due to the compat issues I
 think it's too late for 9.3.

Agreed -- this is effectively a protocol change, albeit a pretty minor
one, so I can't see back-patching it.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ilist.h is not useful as-is

2013-07-24 Thread Andres Freund
 Andres Freund and...@2ndquadrant.com writes:
  The first attached patch adds slist_delete_current(), updates the
  comments addressing your points and converts the bgworker code to pass
  the iterator around (it's more efficient which might actually matter
  with a few hundred bgworkers).
  I found the added newlines in slist_foreach_modify useful, but maybe they
  should be removed again.
 
  I think this should be included in 9.3 once reviewed.
 
 Agreed; since we have not shipped ilist.h in a release yet, the benefits
 of having it behave the same in all branches should outweigh any pain
 from changing this post-beta.

Especially as it shouldn't break any existing working code since the old
API is still there. Obviously it breaks the ABI, but ...

On 2013-07-24 14:29:42 -0400, Tom Lane wrote:
  The second patch adds a regression test for background workers via
  worker_spi which I used to test the slist_delete_current() addition. It's
  not 100% as is, but I thought it worthwhile to post it anyway
 
 Hm.  I'll review and commit the ilist changes, but Alvaro or somebody
 should decide if such a test is sensible.  I'd be a bit worried about
 it in a make installcheck context ...

I've disabled installcheck for that reason. Is there any way to override
a makefile target in gnu make without a warning?

If we want coverage for statically registered workers - which seems like
a good idea - we need our own postgresql.conf stanza and thus a manual
pg_regress invocation anyway, which should fix that error.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic background workers, round two

2013-07-24 Thread Andres Freund
Hi,

On 2013-07-24 12:46:21 -0400, Robert Haas wrote:
 The attached patch attempts to remedy this problem.  When you register
 a background worker, you can obtain a handle that can subsequently
 be used to query for the worker's PID.  If you additionally initialize
 bgw_notify_pid = MyProcPid, then the postmaster will send you SIGUSR1
 when worker startup has been attempted (successfully or not).  You can
 wait for this signal by passing your handle to
 WaitForBackgroundWorkerStartup(), which will return only when either
 (1) an attempt has been made to start the worker or (2) the postmaster
 is determined to have died.  This interface seems adequate for
 something like worker_spi, where it's useful to know whether the child
 was started or not (and return the PID if so) but that's about all
 that's really needed.

This seems like a sensible idea to me. But, in the context of dynamic
query, don't we also need the reverse infrastructure of notifying a
bgworker that the client that requested it to be started has died?
Ending up with a dozen bgworkers running because the normal backend
FATALed doesn't seem nice.

 More complex notification interfaces can also be coded using the
 primitives introduced by this patch.  Setting bgw_notify_pid will
 cause the postmaster to send SIGUSR1 every time it starts the worker
 and every time the worker dies.  Every time you receive that signal,
 you can call GetBackgroundWorkerPid() for each background worker to
 find out which ones have started or terminated.  The only slight
 difficulty is in waiting for receipt of that signal, which I
 implemented by adding a new global called set_latch_on_sigusr1.
 WaitForBackgroundWorkerStartup() uses that flag internally, but
 there's nothing to prevent a caller from using it as part of their own
 event loop.  I find the set_latch_on_sigusr1 flag  to be slightly
 ugly, but it seems better and far safer than having the postmaster try
 to actually set the latch itself - for example, it'd obviously be
 unacceptable to pass a Latch * to the postmaster and have the
 postmaster assume that pointer is valid.

I am with you with finding it somewhat ugly.


 --- a/contrib/worker_spi/worker_spi.c
 +++ b/contrib/worker_spi/worker_spi.c

Btw, I've posted a minimal regression test for bworkers/worker_spi in
20130724175742.gd10...@alap2.anarazel.de - I'd like to see some coverage
of this...

 +int bgw_notify_pid;

It really should be pid_t even though we're not consistently doing that
in the code.



 diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
 index f7ebd1a..6414291 100644
 --- a/src/backend/commands/async.c
 +++ b/src/backend/commands/async.c
 @@ -207,8 +207,6 @@ typedef struct QueueBackendStatus
  	QueuePosition pos;			/* backend has read queue up to here */
  } QueueBackendStatus;
  

 -#define InvalidPid   (-1)
 -

 diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h
 index edced29..0aa540a 100644
 --- a/src/include/miscadmin.h
 +++ b/src/include/miscadmin.h
 @@ -28,6 +28,8 @@
  
  #define PG_BACKEND_VERSIONSTR postgres (PostgreSQL)  PG_VERSION \n
  
 +#define InvalidPid   (-1)
 +

That value strikes me as a rather bad idea, as a pid of -1 represents
init's process group; i.e. kill(InvalidPid) will try to kill init... I'd
rather not spread that outside async.c.
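
To illustrate the hazard (a hypothetical guard, nothing from the patch):

    #include <signal.h>

    #define InvalidPid	(-1)

    /* Refuse to signal anything that isn't a concrete pid: kill(-1, sig)
     * signals every process we're allowed to signal, and kill(0, sig)
     * signals our own process group. */
    static int
    signal_pid_safely(int pid, int signo)
    {
        if (pid <= 0)
            return -1;
        return kill(pid, signo);
    }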

 +/*
 + * Report the PID of a newly-launched background worker in shared memory.
 + *
 + * This function should only be called from the postmaster.
 + */
 +void
 +ReportBackgroundWorkerPID(RegisteredBgWorker *rw)
 +{
 + BackgroundWorkerSlot *slot;
 +
 +	Assert(rw->rw_shmem_slot < max_worker_processes);
 +	slot = &BackgroundWorkerData->slot[rw->rw_shmem_slot];
 +	slot->pid = rw->rw_pid;
 +
 +	if (rw->rw_worker.bgw_notify_pid != 0)
 +		kill(rw->rw_worker.bgw_notify_pid, SIGUSR1);
 +}

Any reason not to use SendProcSignal() properly here? Just that you
don't know the BackendId? It seems unclean to reuse SIGUSR1 without using
the infrastructure built for that.

I first thought you didn't want to do it because it touches shmem,
but given that ReportBackgroundWorkerPID() does so itself...

 + * If handle != NULL, we'll set *handle to a pointer that can subsequently
 + * be used as an argument to GetBackgroundWorkerPid().  The caller can
 + * free this pointer using pfree(), if desired.
   */
  bool
 -RegisterDynamicBackgroundWorker(BackgroundWorker *worker)
 +RegisterDynamicBackgroundWorker(BackgroundWorker *worker,
 +								BackgroundWorkerHandle **handle)

I'd just let the caller provide the memory, but whatever ;)

 +/*
 + * Wait for a background worker to start up.
 + *
 + * The return value is the same as for GetBackgroundWorkerPID, except that
 + * we only return InvalidPid if the postmaster has died (and therefore we
 + * know that the worker will never be started).
 + */
 +int
 +WaitForBackgroundWorkerStartup(BackgroundWorkerHandle *handle)
 

Re: [HACKERS] Review: UNNEST (and other functions) WITH ORDINALITY

2013-07-24 Thread Andrew Gierth
Tom Lane said:
 If we did it with a WithOrdinality expression node, the result would
 always be of type RECORD, and we'd have to use blessed tuple
 descriptors to keep track of exactly which record type a particular
 instance emits.  This is certainly do-able (see RowExpr for
 precedent).

Maybe RowExpr is a precedent for something, but it has this
long-standing problem that makes it very hard to use usefully:

postgres=# select (r).* from (select row(a,b) as r from (values (1,2)) v(a,b)) s;
ERROR:  record type has not been registered

 It seems way too short on comments. [...]

This can certainly be addressed.

 but it sure looks like it flat out removed several existing
 regression-test cases

Here's why, in rangefuncs.sql:

--invokes ExecReScanFunctionScan
SELECT * FROM foorescan f WHERE f.fooid IN (SELECT fooid FROM foorescan(5002,5004)) ORDER BY 1,2;

I don't think that has invoked ExecReScanFunctionScan since 7.4 or so.
It certainly does not do so now (confirmed by gdb as well as by the
query plan). By all means keep the old tests if you want a
never-remove-tests-for-any-reason policy, but having added tests that
actually _do_ invoke ExecReScanFunctionScan, I figured the old ones
were redundant.  (Also, these kinds of tests can be done a bit better
now with values and lateral rather than creating and dropping tables
just for the one test.)

 and a few existing comments as well.

I've double-checked, and I don't see any existing comments removed.

 FWIW, I concur with the gripe I remember seeing upthread that the
 default name of the added column ought not be ?column?.

This seems to be a common complaint, but gives rise to two questions:

1) what should the name be?

2) should users be depending on it?

I've yet to find another db that actually documents a specific column
name for the ordinality column; it's always taken for granted that the
user should always be supplying an alias. (Admittedly there are not
many dbs that support it at all; DB2 does, and I believe Teradata.)

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ilist.h is not useful as-is

2013-07-24 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 The first attached patch adds slist_delete_current(), updates the
 comments addressing your points and converts the bgworker code to pass
 the iterator around (it's more efficient which might actually matter
 with a few hundred bgworkers).

Applied with fixes.  The slist_delete_current() logic wouldn't actually
work for deleting multiple adjacent list elements, because the foreach
macro would advance prev to point at the just-deleted element.  Compare
the places where we use list_delete_cell() in a loop --- we're careful
to advance prev only when we *don't* delete the current element.
I dealt with that by making slist_delete_current() back up the
iterator's cur pointer to equal prev, so that the loop macro's next
copying of cur to prev is a no-op.  This means callers can't rely on
cur anymore after the deletion, but in typical usage they'd have
computed a pointer to their struct already so this shouldn't be a
problem.  Another fine point is you can't use both slist_delete and
slist_delete_current within the same loop, since the former wouldn't
apply this correction.

Also, I fixed the slist_foreach_modify macro to not doubly evaluate
its lhead argument, since I think we agreed that was a bad idea,
and did some more work on the comments.
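
So the supported pattern now looks like this (a sketch; MyElem, its
slist_node member "node", and should_remove() are hypothetical):

    #include "postgres.h"
    #include "lib/ilist.h"

    typedef struct MyElem
    {
        slist_node  node;
        /* ... payload ... */
    } MyElem;

    static bool should_remove(MyElem *elem);    /* hypothetical predicate */

    static void
    prune_list(slist_head *list)
    {
        slist_mutable_iter iter;

        slist_foreach_modify(iter, list)
        {
            MyElem *elem = slist_container(MyElem, node, iter.cur);

            if (should_remove(elem))
            {
                /* O(1), and backs iter.cur up to prev so the loop stays valid */
                slist_delete_current(&iter);
                pfree(elem);
            }
            /* don't also use slist_delete() on this list inside the loop */
        }
    }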

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] dynamic background workers, round two

2013-07-24 Thread Michael Paquier
On Thu, Jul 25, 2013 at 6:34 AM, Andres Freund and...@2ndquadrant.com wrote:

 Hi,

 On 2013-07-24 12:46:21 -0400, Robert Haas wrote:
  The attached patch attempts to remedy this problem.  When you register
  a background worker, you can obtain a handle that can subsequently
  be used to query for the worker's PID.  If you additionally initialize
  bgw_notify_pid = MyProcPid, then the postmaster will send you SIGUSR1
  when worker startup has been attempted (successfully or not).  You can
  wait for this signal by passing your handle to
  WaitForBackgroundWorkerStartup(), which will return only when either
  (1) an attempt has been made to start the worker or (2) the postmaster
  is determined to have died.  This interface seems adequate for
  something like worker_spi, where it's useful to know whether the child
  was started or not (and return the PID if so) but that's about all
  that's really needed.

 This seems like a sensible idea to me. But, in the context of dynamic
 query, don't we also need the reverse infrastructure of notifying a
 bgworker that the client, that requested it to be started, has died?
 Ending up with a dozen bgworkers running because the normal backend
 FATALed doesn't seem nice.

Yes, this would be something necessary to have, but it should definitely be
a separate patch. In order to satisfy that, you could register for each
worker the PID of the parent process that started it, if it was started by
another bgworker, and have the postmaster scan the list of bgworkers that
need to be stopped after the death of their parent if it was a bgworker. By
the way, the PID registered in this case should not be bgw_notify_pid but
something saved in BackgroundWorkerSlot. My 2c on that though, feel free to
comment.

 More complex notification interfaces can also be coded using the
  primitives introduced by this patch.  Setting bgw_notify_pid will
  cause the postmaster to send SIGUSR1 every time it starts the worker
  and every time the worker dies.  Every time you receive that signal,
  you can call GetBackgroundWorkerPid() for each background worker to
  find out which ones have started or terminated.  The only slight
  difficulty is in waiting for receipt of that signal, which I
  implemented by adding a new global called set_latch_on_sigusr1.
  WaitForBackgroundWorkerStartup() uses that flag internally, but
  there's nothing to prevent a caller from using it as part of their own
  event loop.  I find the set_latch_on_sigusr1 flag  to be slightly
  ugly, but it seems better and far safer than having the postmaster try
  to actually set the latch itself - for example, it'd obviously be
  unacceptable to pass a Latch * to the postmaster and have the
  postmaster assume that pointer is valid.

 I am with you with finding it somewhat ugly.

After a quick look at the code, yeah, using a boolean here looks like
overkill.
-- 
Michael


Re: [HACKERS] dynamic background workers, round two

2013-07-24 Thread Michael Paquier
On Thu, Jul 25, 2013 at 6:34 AM, Andres Freund and...@2ndquadrant.com wrote:

  --- a/contrib/worker_spi/worker_spi.c
  +++ b/contrib/worker_spi/worker_spi.c

 Btw, I've posted a minimal regression test for bworkers/worker_spi in
 20130724175742.gd10...@alap2.anarazel.de - I'd like to see some coverage
 of this...

I could not find this email ID in the archives. A link perhaps?
-- 
Michael


Re: [HACKERS] dynamic background workers, round two

2013-07-24 Thread Andres Freund
On 2013-07-25 08:03:17 +0900, Michael Paquier wrote:
 On Thu, Jul 25, 2013 at 6:34 AM, Andres Freund and...@2ndquadrant.com wrote:
 
   --- a/contrib/worker_spi/worker_spi.c
   +++ b/contrib/worker_spi/worker_spi.c
 
  Btw, I've posted a minimal regression test for bworkers/worker_spi in
  20130724175742.gd10...@alap2.anarazel.de - I'd like to see some coverage
  of this...
 
 I could not find this email ID in the archives. A link perhaps?

Hm. Works for me:
http://www.postgresql.org/message-id/20130724175742.gd10...@alap2.anarazel.de

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: proposal 9.4 plpgsql: allows access to call stack from GET DIAGNOSTICS statement

2013-07-24 Thread Stephen Frost
All,

* Rushabh Lathia (rushabh.lat...@gmail.com) wrote:
 Latest patch looks good to me.

I've committed this, with some pretty serious clean-up.  In the future,
please don't simply copy/paste code w/o any updates to the comments
included, particularly when the comments no longer make sense in the new
place.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Preventing tuple-table leakage in plpgsql

2013-07-24 Thread Tom Lane
I wrote:
 Hmm ... good point.  The other plan I'd been considering was to add
 explicit tracking inside spi.c of all tuple tables created within the
 current procedure, and then have AtEOSubXact_SPI flush any that were
 created inside a failed subxact.  The tables would still be children of
 the procCxt and thus could not be leaked past SPI_finish.  When you
 suggested attaching to subtransaction contexts I thought that would let
 us get away without this additional bookkeeping logic, but maybe we
 should bite the bullet and add the extra logic.  A change that's meant
 to remove leak risks really shouldn't be introducing other, new leak
 risks.  (An additional advantage is we could detect attempts to free
 the same tuptable more than once, which would be a good thing ...)

Here's a draft cut at that.  I've checked that this fixes the reported
memory leak.  This uses ilist.h, so it couldn't easily be backported
to before 9.3, but we weren't going to do that anyway.

Worth noting is that SPI_freetuptable() is now much more picky about what
it's being passed: it won't free anything that is not a tuptable of the
currently connected SPI procedure.  This doesn't appear to be a problem
for anything in core or contrib, but I wonder whether anyone thinks we
need to relax that?  If so, we could allow it to search down the SPI
context stack, but I'm not sure I see a use-case for allowing an inner
SPI procedure to free a tuple table made by an outer one.  In general,
I think this version of SPI_freetuptable() is a lot safer than what
we had before, and quite possibly will find bugs in calling code.
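
For concreteness, the ordinary calling pattern that remains supported
looks like this (a sketch of a typical SPI caller; error handling
abbreviated):

    #include "postgres.h"
    #include "executor/spi.h"

    static uint64
    run_query_example(void)
    {
        uint64      n = 0;

        if (SPI_connect() != SPI_OK_CONNECT)
            elog(ERROR, "SPI_connect failed");

        if (SPI_execute("SELECT 1", true, 0) == SPI_OK_SELECT)
        {
            n = SPI_processed;
            /* freeing our own procedure's tuptable is still allowed */
            SPI_freetuptable(SPI_tuptable);
        }

        SPI_finish();           /* releases anything we forgot to free */
        return n;
    }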

Another point worth making is that this version of the patch deletes the
tuple tables during AtEOSubXact_SPI(), earlier in cleanup than would
happen with the prior version.  That increases the risk that external
code might try to delete an already-deleted tuple table, if it tries
to call SPI_freetuptable() during subxact cleanup.  The new code won't
crash, although, come to think of it, it will probably throw an error
because you're not connected anymore.  (Maybe this is a reason to not
insist on being connected, but just silently search whatever the top
stack context is?)

This still lacks doc changes, too.

regards, tom lane

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index f30b2cd93330d72e0a412d6517aec08c96cc753e..f179922dd4bd4eb36212ebd91413ba28de44b2fc 100644
*** a/src/backend/executor/spi.c
--- b/src/backend/executor/spi.c
*** SPI_connect(void)
*** 126,131 ****
--- 126,132 ----
  	_SPI_current->processed = 0;
  	_SPI_current->lastoid = InvalidOid;
  	_SPI_current->tuptable = NULL;
+ 	slist_init(&_SPI_current->tuptables);
  	_SPI_current->procCxt = NULL;		/* in case we fail to create 'em */
  	_SPI_current->execCxt = NULL;
  	_SPI_current->connectSubid = GetCurrentSubTransactionId();
*** SPI_finish(void)
*** 166,172 ****
  	/* Restore memory context as it was before procedure call */
  	MemoryContextSwitchTo(_SPI_current->savedcxt);
  
! 	/* Release memory used in procedure call */
  	MemoryContextDelete(_SPI_current->execCxt);
  	_SPI_current->execCxt = NULL;
  	MemoryContextDelete(_SPI_current->procCxt);
--- 167,173 ----
  	/* Restore memory context as it was before procedure call */
  	MemoryContextSwitchTo(_SPI_current->savedcxt);
  
! 	/* Release memory used in procedure call (including tuptables) */
  	MemoryContextDelete(_SPI_current->execCxt);
  	_SPI_current->execCxt = NULL;
  	MemoryContextDelete(_SPI_current->procCxt);
*** AtEOSubXact_SPI(bool isCommit, SubTransactionId mySubid)
*** 282,292 ****
  	 */
  	if (_SPI_current && !isCommit)
  	{
  		/* free Executor memory the same as _SPI_end_call would do */
  		MemoryContextResetAndDeleteChildren(_SPI_current->execCxt);
! 		/* throw away any partially created tuple-table */
! 		SPI_freetuptable(_SPI_current->tuptable);
! 		_SPI_current->tuptable = NULL;
  	}
  }
  
--- 283,316 ----
  	 */
  	if (_SPI_current && !isCommit)
  	{
+ 		slist_mutable_iter siter;
+ 
  		/* free Executor memory the same as _SPI_end_call would do */
  		MemoryContextResetAndDeleteChildren(_SPI_current->execCxt);
! 
! 		/* throw away any tuple tables created within current subxact */
! 		slist_foreach_modify(siter, &_SPI_current->tuptables)
! 		{
! 			SPITupleTable *tuptable;
! 
! 			tuptable = slist_container(SPITupleTable, next, siter.cur);
! 			if (tuptable->subid >= mySubid)
! 			{
! 				/*
! 				 * If we used SPI_freetuptable() here, its internal search of
! 				 * the tuptables list would make this operation O(N^2).
! 				 * Instead, just free the tuptable manually.
! 				 */
! 				slist_delete_current(&siter);
! 				if (tuptable == _SPI_current->tuptable)
! 					_SPI_current->tuptable = NULL;
! 				if (tuptable == SPI_tuptable)
! 					SPI_tuptable = NULL;
! 				MemoryContextDelete(tuptable->tuptabcxt);
! 			}
! 		}
! 		/* in particular we should have gotten rid of any in-progress table */
! 		Assert(_SPI_current->tuptable == NULL);
  	}
  }
  

Re: [HACKERS] dynamic background workers, round two

2013-07-24 Thread Michael Paquier
On Thu, Jul 25, 2013 at 8:05 AM, Andres Freund and...@2ndquadrant.com wrote:

 On 2013-07-25 08:03:17 +0900, Michael Paquier wrote:
  On Thu, Jul 25, 2013 at 6:34 AM, Andres Freund and...@2ndquadrant.com
 wrote:
 
--- a/contrib/worker_spi/worker_spi.c
+++ b/contrib/worker_spi/worker_spi.c
  
   Btw, I've posted a minimal regression test for bworkers/worker_spi in
   20130724175742.gd10...@alap2.anarazel.de - I'd like to see some
 coverage
   of this...
  
  I could not find this email ID in the archives. A link perhaps?

 Hm. Works for me:

 http://www.postgresql.org/message-id/20130724175742.gd10...@alap2.anarazel.de

Oops. Thanks; the search field does not like it when spaces are added to
the ID.
-- 
Michael


[HACKERS] Re: proposal 9.4 plpgsql: allows access to call stack from GET DIAGNOSTICS statement

2013-07-24 Thread Pavel Stehule
Hello

2013/7/25 Stephen Frost sfr...@snowman.net:
 All,

 * Rushabh Lathia (rushabh.lat...@gmail.com) wrote:
 Latest patch looks good to me.

 I've committed this, with some pretty serious clean-up.  In the future,
 please don't simply copy/paste code w/o any updates to the comments
 included, particularly when the comments no longer make sense in the new
 place.

Thank you very much

Regards

Pavel



 Thanks,

 Stephen


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers