Re: [HACKERS] RFC: Making TRUNCATE more MVCC-safe

2012-03-03 Thread Simon Riggs
On Sat, Feb 11, 2012 at 4:46 AM, Noah Misch n...@leadboat.com wrote:

 But I have to admit I'm intrigued by the idea of extending this to
 other cases, if there's a reasonable way to do that.  For example, if
 we could fix things up so that we don't see a table at all if it was
 created after we took our snapshot, that would remove one of the
 obstacles to marking pages bulk-loaded into that table with FrozenXID
 and PD_ALL_VISIBLE from the get-go.  I think a lot of people would be
 mighty happy about that.

 Exactly.

 But the necessary semantics seem somewhat different.  For TRUNCATE,
 you want to throw a serialization error; but is that also what you
 want for CREATE TABLE?  Or do you just want it to appear empty?

 I think an error helps just as much there.  If you create a table and populate
 it with data in the same transaction, letting some snapshot see an empty table
 is an atomicity failure.

 Your comment illustrates a helpful point: this is just another kind of
 ordinary serialization failure, one that can happen at any isolation level.
 No serial transaction sequence can yield one of these errors.
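
 For a concrete example: a transaction that takes its snapshot, then has
 another session TRUNCATE the table and commit, and then scans the table
 would get this error -- and no serial ordering of those two
 transactions could have produced it.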

Thanks Noah for drawing attention to this thread. I hadn't been
watching. As you say, this work would allow me to freeze rows at load
time and avoid the overhead of hint bit setting, which in turn avoids
the performance issues hint bits cause for the checksum patch.

I've reviewed this patch and it seems OK to me. Good work Marti.

v2 patch attached, updated to latest HEAD. The patch adds
* a GUC called strict_mvcc to isolate the new behaviour if required
* relvalidxmin is reset by VACUUM/ANALYZE to avoid wraparound failure

At present this lacks docs for strict_mvcc and doesn't attempt to
handle the CREATE TABLE case yet, but it should be straightforward to do so.

Hint bit setting is in a separate patch on another thread.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c910863..4387f49 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -73,7 +73,7 @@
 
 /* GUC variable */
 bool		synchronize_seqscans = true;
-
+bool		StrictMVCC = true;
 
 static HeapScanDesc heap_beginscan_internal(Relation relation,
 		Snapshot snapshot,
@@ -1175,6 +1175,24 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
 	HeapScanDesc scan;
 
 	/*
+	 * If the snapshot is older than relvalidxmin, then either the table has
+	 * only recently been created or a TRUNCATE has removed data that we
+	 * should still be able to see. Either way, we aren't allowed to view
+	 * the rows in StrictMVCC mode.
+	 */
+	if (StrictMVCC &&
+		TransactionIdIsValid(relation->rd_rel->relvalidxmin) &&
+		TransactionIdIsValid(snapshot->xmax) &&
+		NormalTransactionIdPrecedes(snapshot->xmax, relation->rd_rel->relvalidxmin))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+				 errmsg("canceling statement due to conflict with TRUNCATE TABLE on %s",
+						NameStr(relation->rd_rel->relname)),
+				 errdetail("User query attempting to see row versions that have been removed.")));
+	}
+	}
+
+	/*
 	 * increment relation ref count while scanning relation
 	 *
 	 * This is just to make really sure the relcache entry won't go away while
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index aef410a..3f9bd5d 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -787,6 +787,7 @@ InsertPgClassTuple(Relation pg_class_desc,
 	values[Anum_pg_class_relhastriggers - 1] = BoolGetDatum(rd_rel->relhastriggers);
 	values[Anum_pg_class_relhassubclass - 1] = BoolGetDatum(rd_rel->relhassubclass);
 	values[Anum_pg_class_relfrozenxid - 1] = TransactionIdGetDatum(rd_rel->relfrozenxid);
+	values[Anum_pg_class_relvalidxmin - 1] = TransactionIdGetDatum(rd_rel->relvalidxmin);
 	if (relacl != (Datum) 0)
 		values[Anum_pg_class_relacl - 1] = relacl;
 	else
@@ -884,6 +885,7 @@ AddNewRelationTuple(Relation pg_class_desc,
 		new_rel_reltup->relfrozenxid = InvalidTransactionId;
 	}
 
+	new_rel_reltup->relvalidxmin = InvalidTransactionId;
 	new_rel_reltup->relowner = relowner;
 	new_rel_reltup->reltype = new_type_oid;
 	new_rel_reltup->reloftype = reloftype;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index bfbe642..0d96a6a 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2854,7 +2854,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
 		}
 
 		/* We'll build a new physical relation for the index */
-		RelationSetNewRelfilenode(iRel, InvalidTransactionId);
+		RelationSetNewRelfilenode(iRel, InvalidTransactionId, InvalidTransactionId);
 
 		/* Initialize the index and rebuild */
 		/* Note: we do not need to re-establish pkey setting */
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index b40e57b..0578241 100644
--- a/src/backend/commands/analyze.c
+++ 

Re: [HACKERS] COPY with hints, rebirth

2012-03-03 Thread Simon Riggs
On Fri, Mar 2, 2012 at 8:58 PM, Noah Misch n...@leadboat.com wrote:
 On Fri, Mar 02, 2012 at 08:46:45AM +, Simon Riggs wrote:
 On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas 
 heikki.linnakan...@enterprisedb.com wrote:
  It's still broken:

 [BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO]

 So this approach isn't the one...

 The COPY FREEZE patch provides a way for the user to say explicitly
 that they don't really care about these MVCC corner cases and as a
 result allows us to avoid touching XidInMVCCSnapshot() at all. So
 there is still a patch on the table.

 You can salvage the optimization by tightening its prerequisite: use it when
 the current subtransaction or a child thereof created or truncated the table.
 A parent subtransaction having done so is acceptable for the WAL avoidance
 optimization but not for this.

I misread your earlier comment. Yes, that will make this work correctly.
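
In relcache terms the tightened test is roughly this (a sketch only,
using the rd_createSubid / rd_newRelfilenodeSubid fields; on subcommit
those are reassigned to the parent, so a committed child counts as the
current subtransaction):

	SubTransactionId currxid = GetCurrentSubTransactionId();

	/* relfilenode made by this subxact (or a committed child of it)? */
	if (rel->rd_createSubid == currxid ||
		rel->rd_newRelfilenodeSubid == currxid)
		hi_options |= HEAP_INSERT_HINT_XMIN;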

 Incidentally, I contend that we should write frozen tuples to new/truncated
 tables unconditionally.  The current behavior of making old snapshots see the
 table as empty violates atomicity at least as badly as letting those snapshots
 see the future-snapshot contents.  But Marti has a sound proposal that would
 interact with your efforts here to avoid violating atomicity at all:
 http://archives.postgresql.org/message-id/cabrt9rbrmdsoz8kxgehfb4lg-ev9u67-6dlqvoiibpkkhtl...@mail.gmail.com

Thank you for bringing this to my attention.

This will make this optimisation work correctly without adding
anything to the main code path of XidInMVCCSnapshot() and without the
annoying FREEZE syntax. So this puts the patch squarely back on the
table.

I'll do another version of this later today designed to work with the
StrictMVCC patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: CHECK FUNCTION statement

2012-03-03 Thread Petr Jelínek

On 03/03/2012 02:24 AM, Alvaro Herrera wrote:

question: how attached you are to the current return format for CHECK
FUNCTION?

check function f1();
            CHECK FUNCTION             
---------------------------------------
 In function: 'f1()'
 error:42804:5:assignment:subscripted object is not an array
(2 rows)

It seems to me that it'd be trivial to make it look like this instead:

check function f1();
 function | lineno | statement  | sqlstate |              message               | detail | hint | level | position | query 
----------+--------+------------+----------+------------------------------------+--------+------+-------+----------+-------
 f1()     |      5 | assignment | 42804    | subscripted object is not an array |        |      | error |          | 
(1 row)

This looks much nicer to me.

One thing we lose is the caret marking the position of the error -- but
I'm wondering if that really works well.  I didn't test it but from the
code it looks to me like it'd misbehave if you had a multiline statement.

Opinions?


Well, if you want a nicely formatted table you can always call the checker
function directly. I think the statement returning something that is
more human- and less machine-oriented is a more consistent approach with
the rest of the utility commands. In other words, I don't really see the point of it.


Petr

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers, patch v11

2012-03-03 Thread Dimitri Fontaine
Robert Haas robertmh...@gmail.com writes:
       CREATE COMMAND TRIGGER name ... properties ...;
       DROP COMMAND TRIGGER name;

 full stop.  If you want to run the same trigger function on some
 more commands, add another trigger name.

 +1

 +1.  I suggested the same thing a while back.

Yeah, I know, I just wanted to hear from more people before ditching
part of the work I did, and Thom was leaning in the opposite
direction.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers, patch v11

2012-03-03 Thread Thom Brown
On 3 March 2012 13:45, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Robert Haas robertmh...@gmail.com writes:
       CREATE COMMAND TRIGGER name ... properties ...;
       DROP COMMAND TRIGGER name;

 full stop.  If you want to run the same trigger function on some
 more commands, add another trigger name.

 +1

 +1.  I suggested the same thing a while back.

 Yeah, I know, I just wanted to hear from more people before ditching
 part of the work I did, and Thom was leaning in the opposite
 direction.

I was?  I agreed with Tom's comment, but I did query your
interpretation of it with regard to the CREATE COMMAND TRIGGER
statement.  It seems you removed the ability to create a command
trigger against multiple commands, but I don't think that was the
problem.  It was the DROP COMMAND TRIGGER statement that garnered
comment, as it makes more sense to drop the entire trigger than
individual commands for that trigger.  Initially I had proposed a way
to drop all commands on a trigger at once as an additional option, but
just dropping it completely or not at all is preferable.

But if there is agreement to have multiple commands on a command
trigger, I'm wondering whether we should have an OR separator rather
than a comma?  The reason is that regular triggers define a list of
statements this way.  Personally I prefer the comma syntax, but my
concern (not a strong one) is the lack of consistency.
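
To spell out the two candidate spellings (illustrative only):

    CREATE COMMAND TRIGGER name AFTER create table, create view ...
    CREATE COMMAND TRIGGER name AFTER create table OR create view ...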

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label

2012-03-03 Thread Yeb Havinga

On 2012-02-24 17:25, Yeb Havinga wrote:

On 2012-02-23 12:17, Kohei KaiGai wrote:

2012/2/20 Yeb Havinga yebhavi...@gmail.com:

On 2012-02-05 10:09, Kohei KaiGai wrote:
The attached part-1 patch moves related routines from hooks.c to label.c
because of references to static variables. The part-2 patch implements the
above mechanism.


I took a short look at this patch but am stuck getting the regression test
to run properly.

First, patch 2 misses the file sepgsql.sql.in, and therefore the creation
command for the sepgsql_setcon function is missing.


Thanks for your comments.

I added the definition of the sepgsql_setcon function to the sepgsql.sql.in
file, in addition to rebasing the patch.


Very brief comments, as I must leave the keyboard soon:

I read the source code and played a bit with setcon and the debugger, 
no strange things found.


Code comments / questions:


I took the liberty of changing a few things, mostly comments, in the
attached patch:


maybe client_label_committed is a better name for client_label_setcon?


this change was made.


Is the double negation in the sentence below intended?


Several comments were changed / moved. There is now one place where the
behaviour of the different client_label variables is explained.




sepgsql_set_client_label(), maybe add a comment to !new_label that it 
is reset to the peer label.


done.



Is the assert client_label_peer != NULL in sepgsql_get_client_label 
necessary?
new_label == NULL / pending_label->label == NULL means use the peer
label. Why not use the peer label instead?


It turned out that pending_label->label is invariantly non-null. Changed
the code to assume that and added some Asserts.




set_label: if new_label == current label according to getcon, is it 
necessary to add to the pending list?


This question still remains. Maybe there would be reasons to hit SELinux
with the question: can I change from A->A?


sepgsql_subxact_callback(), could this be made easier to read by just
taking llast(client_label_pending), asserting that plabel->subid ==
mySubId, and then doing list_delete on the pointer of that listcell?


No, this was a naive suggestion; it fails in any case of a
subtransaction with not exactly one call to sepgsql_setcon().


Some comments contain typos; I can spend some time on this, though I'm
not a native English speaker so it won't be perfect.


SGML documentation must still be added. If time permits I can spend some
time on that tomorrow.


regards,
Yeb Havinga


--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data

diff --git a/contrib/sepgsql/expected/label.out b/contrib/sepgsql/expected/label.out
index bac169f..e967c7c 100644
--- a/contrib/sepgsql/expected/label.out
+++ b/contrib/sepgsql/expected/label.out
@@ -26,6 +26,11 @@ CREATE FUNCTION f4 () RETURNS text
 AS 'SELECT sepgsql_getcon()'
 LANGUAGE sql;
 SECURITY LABEL ON FUNCTION f4()
+IS 'system_u:object_r:sepgsql_nosuch_trusted_proc_exec_t:s0';
+CREATE FUNCTION f5 (text) RETURNS bool
+	AS 'SELECT sepgsql_setcon($1)'
+LANGUAGE sql;
+SECURITY LABEL ON FUNCTION f5(text)
 IS 'system_u:object_r:sepgsql_regtest_trusted_proc_exec_t:s0';
 --
 -- Tests for default labeling behavior
@@ -100,6 +105,223 @@ SELECT sepgsql_getcon();	-- client's label must be restored
 (1 row)
 
 --
+-- Test for Dynamic Domain Transition
+--
+-- validation of transaction aware dynamic-transition
+SELECT sepgsql_getcon();	-- confirm client privilege
+  sepgsql_getcon  
+--
+ unconfined_u:unconfined_r:unconfined_t:s0:c0.c25
+(1 row)
+
+SELECT sepgsql_setcon('unconfined_u:unconfined_r:unconfined_t:s0:c0.c15');
+ sepgsql_setcon 
+
+ t
+(1 row)
+
+SELECT sepgsql_getcon();
+  sepgsql_getcon  
+--
+ unconfined_u:unconfined_r:unconfined_t:s0:c0.c15
+(1 row)
+
+SELECT sepgsql_setcon(NULL);	-- failed to reset
+ERROR:  SELinux: security policy violation
+SELECT sepgsql_getcon();
+  sepgsql_getcon  
+--
+ unconfined_u:unconfined_r:unconfined_t:s0:c0.c15
+(1 row)
+
+BEGIN;
+SELECT sepgsql_setcon('unconfined_u:unconfined_r:unconfined_t:s0:c0.c12');
+ sepgsql_setcon 
+
+ t
+(1 row)
+
+SELECT sepgsql_getcon();
+  sepgsql_getcon  
+--
+ unconfined_u:unconfined_r:unconfined_t:s0:c0.c12
+(1 row)
+
+SAVEPOINT svpt_1;
+SELECT sepgsql_setcon('unconfined_u:unconfined_r:unconfined_t:s0:c0.c9');
+ sepgsql_setcon 
+
+ t
+(1 row)
+
+SELECT sepgsql_getcon();
+ sepgsql_getcon  
+-
+ unconfined_u:unconfined_r:unconfined_t:s0:c0.c9
+(1 row)
+
+SAVEPOINT svpt_2;
+SELECT sepgsql_setcon('unconfined_u:unconfined_r:unconfined_t:s0:c0.c6');
+ sepgsql_setcon 
+
+ t
+(1 

Re: [HACKERS] Command Triggers, patch v11

2012-03-03 Thread Dimitri Fontaine
Thom Brown t...@linux.com writes:
 And having tried building it, it appears to fail.

Sorry about that, my compiler here was happy building the source (and I
had been doing make clean install along the way) and make installcheck
passed, here.

Now fixed on my github branch, including docs.

I'll send an updated patch revision later, hopefully including pg_dump
support fixtures (well, adaptation to the new way of doing things IIUC)
and maybe with some trigger arguments rework done.

I understand that you're not blocked until that new version is out,
right? Meanwhile you can use the incremental patch attached for
convenience.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

commit 301d8e58b6bfe89f35d82ad7a5216891f56c5d48
Author: Dimitri Fontaine d...@tapoueh.org
Date:   Sat Mar 3 15:18:59 2012 +0100

Fix RenameCmdTrigger(), per review.

diff --git a/doc/src/sgml/ref/alter_command_trigger.sgml b/doc/src/sgml/ref/alter_command_trigger.sgml
index dd903e7..48b536c 100644
--- a/doc/src/sgml/ref/alter_command_trigger.sgml
+++ b/doc/src/sgml/ref/alter_command_trigger.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
 <refsynopsisdiv>
 <synopsis>
 ALTER COMMAND TRIGGER <replaceable class="PARAMETER">name</replaceable> SET <replaceable class="parameter">enabled</replaceable>
+ALTER COMMAND TRIGGER <replaceable class="PARAMETER">name</replaceable> RENAME TO <replaceable class="PARAMETER">newname</replaceable>
 
 <phrase>where <replaceable class="parameter">enabled</replaceable> can be one of:</phrase>
 
@@ -62,10 +63,10 @@ ALTER COMMAND TRIGGER <replaceable class="PARAMETER">name</replaceable> SET <rep
    </varlistentry>
 
    <varlistentry>
-    <term><replaceable class="PARAMETER">command</replaceable></term>
+    <term><replaceable class="PARAMETER">newname</replaceable></term>
     <listitem>
      <para>
-      The command tag on which this trigger acts.
+      The new name of the command trigger.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index 4d07642..b447306 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -66,7 +66,7 @@ ExecRenameStmt(RenameStmt *stmt)
 			break;
 
 		case OBJECT_CMDTRIGGER:
-			RenameCmdTrigger(stmt->object, stmt->subname, stmt->newname);
+			RenameCmdTrigger(stmt->object, stmt->newname);
 			break;
 
 		case OBJECT_DATABASE:
diff --git a/src/backend/commands/cmdtrigger.c b/src/backend/commands/cmdtrigger.c
index e8d294d..ef9794f 100644
--- a/src/backend/commands/cmdtrigger.c
+++ b/src/backend/commands/cmdtrigger.c
@@ -294,13 +294,17 @@ AlterCmdTrigger(AlterCmdTrigStmt *stmt)
  * Rename command trigger
  */
 void
-RenameCmdTrigger(const char *trigname, const char *newname)
+RenameCmdTrigger(List *name, const char *newname)
 {
 	SysScanDesc tgscan;
 	ScanKeyData skey[1];
 	HeapTuple	tup;
 	Relation	rel;
 	Form_pg_cmdtrigger cmdForm;
+	char *trigname;
+
+	Assert(list_length(name) == 1);
+	trigname = strVal((Value *)linitial(name));
 
 	CheckCmdTriggerPrivileges();
 
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index defcdd1..6a20a6c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3463,7 +3463,6 @@ _copyCreateCmdTrigStmt(const CreateCmdTrigStmt *from)
 {
 	CreateCmdTrigStmt *newnode = makeNode(CreateCmdTrigStmt);
 
-	COPY_NODE_FIELD(command);
 	COPY_STRING_FIELD(trigname);
 	COPY_SCALAR_FIELD(timing);
 	COPY_NODE_FIELD(funcname);
@@ -3476,7 +3475,6 @@ _copyAlterCmdTrigStmt(const AlterCmdTrigStmt *from)
 {
 	AlterCmdTrigStmt *newnode = makeNode(AlterCmdTrigStmt);
 
-	COPY_STRING_FIELD(command);
 	COPY_STRING_FIELD(trigname);
 	COPY_STRING_FIELD(tgenabled);
 
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 1ad31f0..137075a 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1780,7 +1780,6 @@ _equalCreateTrigStmt(const CreateTrigStmt *a, const CreateTrigStmt *b)
 static bool
 _equalCreateCmdTrigStmt(const CreateCmdTrigStmt *a, const CreateCmdTrigStmt *b)
 {
-	COMPARE_NODE_FIELD(command);
 	COMPARE_STRING_FIELD(trigname);
 	COMPARE_SCALAR_FIELD(timing);
 	COMPARE_NODE_FIELD(funcname);
@@ -1791,7 +1790,6 @@ _equalCreateCmdTrigStmt(const CreateCmdTrigStmt *a, const CreateCmdTrigStmt *b)
 static bool
 _equalAlterCmdTrigStmt(const AlterCmdTrigStmt *a, const AlterCmdTrigStmt *b)
 {
-	COMPARE_STRING_FIELD(command);
 	COMPARE_STRING_FIELD(trigname);
 	COMPARE_STRING_FIELD(tgenabled);
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index e5a0d34..e9872d3 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6931,13 +6931,12 @@ RenameStmt: ALTER AGGREGATE func_name aggr_args RENAME TO name
	n->missing_ok = false;
 	$$ = (Node *)n;
 }
-			| ALTER TRIGGER name ON COMMAND trigger_command RENAME TO name
+			| ALTER COMMAND TRIGGER name RENAME TO name
 {
 	RenameStmt *n = makeNode(RenameStmt);
	n->renameType = OBJECT_CMDTRIGGER;
-	

Re: [HACKERS] Command Triggers, patch v11

2012-03-03 Thread Dimitri Fontaine
Thom Brown t...@linux.com writes:
 problem.  It was the DROP COMMAND TRIGGER statement that garnered
 comment, as it makes more sense to drop the entire trigger than
 individual commands for that trigger.

What you're saying here is that a single trigger could have more than
one command attached to it, and what I understand Tom, Robert and Kevin
are saying is that any given command trigger should only be attached to
a single command.

If we wanted to be more consistent we would need a way to attach the
same trigger to both BEFORE and AFTER the command; as of now you need
two triggers for that.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers, patch v11

2012-03-03 Thread Thom Brown
On 3 March 2012 14:26, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Thom Brown t...@linux.com writes:
 And having tried building it, it appears to fail.

 Sorry about that, my compiler here was happy building the source (and I
 had been doing make clean install along the way) and make installcheck
 passed, here.

 Now fixed on my github's branch, including docs.

 I'll send an updated patch revision later, hopefully including pg_dump
 support fixtures (well, adaptation to the new way of doing things IIUC)
 and maybe with some trigger arguments rework done.

 I understand that you're not blocked until that new version is out,
 right? You could use either the incremental patch attached for
 convenience.

Thanks for the extra patch.  Do you see any functional changes between
now and your next patch?  If so, it doesn't make sense for me to test
anything yet.

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers, patch v11

2012-03-03 Thread Thom Brown
On 3 March 2012 14:34, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Thom Brown t...@linux.com writes:
 problem.  It was the DROP COMMAND TRIGGER statement that garnered
 comment, as it makes more sense to drop the entire trigger than
 individual commands for that trigger.

 What you're saying here is that a single trigger could have more than
 one command attached to it, and what I understand Tom, Robert and Kevin
 are saying is that any given command trigger should only be attached to
 a single command.

I hadn't interpreted it that way, but then that could just be my
misinterpretation.  I'm still of the opinion that a single command
trigger should be able to attach to multiple commands, just not
amendable once the trigger has been created.  If that's not how others
saw it, I'm outnumbered.

So my vote was for:

CREATE COMMAND TRIGGER name { BEFORE | AFTER } command [, ... ]
EXECUTE PROCEDURE function_name ()

ALTER COMMAND TRIGGER name SET enabled

DROP COMMAND TRIGGER [ IF EXISTS ] name [ CASCADE | RESTRICT ]

The only difference shown above compared to the current implementation
in your patch is [, ... ] after command in the CREATE COMMAND TRIGGER
statement.

My argument for this is that you may wish to implement the same
trigger for tables, views and foreign tables rather than create 3
separate triggers.

 If we wanted to be more consistent we would need to have a way to attach
 the same trigger to both BEFORE and AFTER the command, as of now you
 need two triggers here.

I'm not sure I understand.  Attaching a trigger to both BEFORE and
AFTER isn't possible for regular triggers so I don't see why that
would introduce consistency.  Could you explain?

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers, patch v11

2012-03-03 Thread Kevin Grittner
 Thom Brown  wrote:
 Dimitri Fontaine  wrote:
 Thom Brown  writes:
 problem. It was the DROP COMMAND TRIGGER statement that garnered
 comment, as it makes more sense to drop the entire trigger than
 individual commands for that trigger.

 What you're saying here is that a single trigger could have more
 than one command attached to it, and what I understand Tom, Robert
 and Kevin are saying is that any given command trigger should only
 be attached to a single command.
 
 I hadn't interpreted it that way
 
Nor had I.
 
 I'm still of the opinion that a single command trigger should be
 able to attach to multiple commands, just not
 amendable once the trigger has been created.
 
That was my understanding of what we were discussing, too.
 
 So my vote was for:
 
 CREATE COMMAND TRIGGER name { BEFORE | AFTER } command [, ... ]
 EXECUTE PROCEDURE function_name ()
 
 ALTER COMMAND TRIGGER name SET enabled
 
 DROP COMMAND TRIGGER [ IF EXISTS ] name [ CASCADE | RESTRICT ]
 
Same here.
 
 My argument for this is that you may wish to implement the same
 trigger for tables, views and foreign tables rather than create 3
 separate triggers.
 
Right.  What I thought I was agreeing with was the notion that you
should need to specify more than the trigger name to drop the
trigger.  Rather like how you can create a trigger AFTER INSERT OR
UPDATE OR DELETE, but you don't need to specify all those events to
drop the trigger -- just the name will do.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers, patch v11

2012-03-03 Thread Thom Brown
On 3 March 2012 16:12, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Thom Brown  wrote:
 Dimitri Fontaine  wrote:
 Thom Brown  writes:
 problem. It was the DROP COMMAND TRIGGER statement that garnered
 comment, as it makes more sense to drop the entire trigger than
 individual commands for that trigger.

 What you're saying here is that a single trigger could have more
 than one command attached to it, and what I understand Tom, Robert
 and Kevin are saying is that any given command trigger should only
 be attached to a single command.

 I hadn't interpreted it that way

 Nor had I.

 I'm still of the opinion that a single command trigger should be
 able to attach to multiple commands, just not
 amendable once the trigger has been created.

 That was my understanding of what we were discussing, too.

 So my vote was for:

 CREATE COMMAND TRIGGER name { BEFORE | AFTER } command [, ... ]
 EXECUTE PROCEDURE function_name ()

 ALTER COMMAND TRIGGER name SET enabled

 DROP COMMAND TRIGGER [ IF EXISTS ] name [ CASCADE | RESTRICT ]

 Same here.

 My argument for this is that you may wish to implement the same
 trigger for tables, views and foreign tables rather than create 3
 separate triggers.

 Right.  What I thought I was agreeing with was the notion that you
 should need to specify more than the trigger name to drop the
 trigger.  Rather like how you can create a trigger AFTER INSERT OR
 UPDATE OR DELETE, but you don't need to specify all those events to
 drop the trigger -- just the name will do.

Don't you mean shouldn't need to specify more than the trigger name?

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collect frequency statistics for arrays

2012-03-03 Thread Tom Lane
... BTW, could you explain exactly how that "Fill histogram by hashtab"
loop works?  It's way too magic for my taste, and does in fact have bugs
in the currently submitted patch.  I've reworked it to this:

/* Fill histogram by hashtab. */
delta = analyzed_rows - 1;
count_item_index = 0;
frac = sorted_count_items[0]->frequency * (num_hist - 1);
for (i = 0; i < num_hist; i++)
{
    while (frac <= 0)
    {
        count_item_index++;
        Assert(count_item_index < count_items_count);
        frac += sorted_count_items[count_item_index]->frequency *
            (num_hist - 1);
    }
    hist[i] = sorted_count_items[count_item_index]->count;
    frac -= delta;
}
Assert(count_item_index == count_items_count - 1);

The asserts don't fire in any test case I've tried, which seems to
indicate that it *does* work in the sense that the first histogram entry
is always the smallest count and the last histogram entry is always
the largest one.  But it's extremely unclear why it manages to stop
exactly at the last count_items array entry, or for that matter why it's
generating a representative histogram at all.  I'm suspicious that the
"- 1" bits represent off-by-one bugs.

I also don't especially like the fact that frac is capable of
overflowing (since worst case frequency is 300 * 10000 and worst case
num_hist is 10000, with the current limits on statistics_target).
We could work around that by doing the frac arithmetic in int64, but I
wonder whether that couldn't be avoided.  In any case, first I'd like
an explanation why this code works at all.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers, patch v11

2012-03-03 Thread Kevin Grittner
Thom Brown  wrote:
 
 Don't you mean shouldn't need to specify more than the trigger
 name?
 
You are right, that's what I meant to say.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Command Triggers, patch v11

2012-03-03 Thread Dimitri Fontaine
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Right.  What I thought I was agreeing with was the notion that you
 should need to specify more than the trigger name to drop the
 trigger.  Rather like how you can create a trigger AFTER INSERT OR
 UPDATE OR DELETE, but you don't need to specify all those events to
 drop the trigger -- just the name will do.

The parallel between INSERT/UPDATE/DELETE and the trigger's command is
not working well enough, because in the data trigger case we're managing
a single catalog entry with a single command, and in the command trigger
case, in my model at least, we would be managing several catalog entries
per command.

To take an example:

  CREATE COMMAND TRIGGER foo AFTER create table, create view;
  DROP COMMAND TRIGGER foo;

The first command would create two catalog entries, and the second one
would delete the same two entries.  It used to work this way in the
patch, then when merging with the new remove object infrastructure I
lost that ability.  From the beginning Robert has been saying he didn't
want that behavior, and Tom is now saying the same, IIUC.

So we're back to one command, one catalog entry.

Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: CHECK FUNCTION statement

2012-03-03 Thread Alvaro Herrera

Excerpts from Petr Jelínek's message of sáb mar 03 10:26:04 -0300 2012:
 On 03/03/2012 02:24 AM, Alvaro Herrera wrote:
  question: how attached you are to the current return format for CHECK
  FUNCTION?
 
  check function f1();
  CHECK FUNCTION
  ---------------------------------------
In function: 'f1()'
error:42804:5:assignment:subscripted object is not an array
  (2 rows)

 Well, if you want a nicely formatted table you can always call the checker
 function directly. I think the statement returning something that is
 more human- and less machine-oriented is a more consistent approach with
 the rest of the utility commands. In other words, I don't really see the point of it.

I am not against having some more human-readable output than plain
tabular.  In particular the idea that we need to have all the fields is of
course open to discussion.  But is the output as proposed above really
all that human-friendly?  I disagree that it cannot be improved.

BTW one thing that's missing in this feature so far is some
translatability of the returned output.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY with hints, rebirth

2012-03-03 Thread Simon Riggs
On Sat, Mar 3, 2012 at 1:14 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Mar 2, 2012 at 8:58 PM, Noah Misch n...@leadboat.com wrote:
 On Fri, Mar 02, 2012 at 08:46:45AM +, Simon Riggs wrote:
 On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas 
 heikki.linnakan...@enterprisedb.com wrote:
  It's still broken:

 [BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO]

 So this approach isn't the one...

 The COPY FREEZE patch provides a way for the user to say explicitly
 that they don't really care about these MVCC corner cases and as a
 result allows us to avoid touching XidInMVCCSnapshot() at all. So
 there is still a patch on the table.

 You can salvage the optimization by tightening its prerequisite: use it when
 the current subtransaction or a child thereof created or truncated the table.
 A parent subtransaction having done so is acceptable for the WAL avoidance
 optimization but not for this.

 I misread your earlier comment. Yes, that will make this work correctly.

 Incidentally, I contend that we should write frozen tuples to new/truncated
 tables unconditionally.  The current behavior of making old snapshots see the
 table as empty violates atomicity at least as badly as letting those 
 snapshots
 see the future-snapshot contents.  But Marti has a sound proposal that would
 interact with your efforts here to avoid violating atomicity at all:
 http://archives.postgresql.org/message-id/cabrt9rbrmdsoz8kxgehfb4lg-ev9u67-6dlqvoiibpkkhtl...@mail.gmail.com

 Thank you for bringing this to my attention.

 This will make this optimisation work correctly without adding
 anything to the main code path of XidInMVCCSnapshot() and without the
 annoying FREEZE syntax. So this puts the patch squarely back on the
 table.

 I'll do another version of this later today designed to work with the
 StrictMVCC patch.

OK, so here is the latest version of the patch, handling the
subtransaction issues correctly. Thanks to Noah and Heikki for
identifying them and hitting me with the clue stick.

This version of the patch has negligible effect on mainline performance,
though it leaves newly loaded data visible in ways that would break MVCC.
So this patch relies on the safe_truncate.v2.patch posted on a separate
thread.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c910863..b891327 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2050,12 +2050,27 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 	tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
 	tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
 	tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
-	HeapTupleHeaderSetXmin(tup->t_data, xid);
 	HeapTupleHeaderSetCmin(tup->t_data, cid);
 	HeapTupleHeaderSetXmax(tup->t_data, 0);		/* for cleanliness */
 	tup->t_tableOid = RelationGetRelid(relation);
 
 	/*
+	 * If we are inserting into a new relation invisible as yet to other
+	 * backends and our session has no prior snapshots and no ready portals
+	 * then we can also set the hint bit for the rows we are inserting. The
+	 * last two restrictions ensure that HeapTupleSatisfiesMVCC() gives
+	 * the right answer if the current transaction inspects the data after
+	 * we load it.
+	 */
+	if (options & HEAP_INSERT_HINT_XMIN)
+	{
+		tup->t_data->t_infomask |= HEAP_XMIN_COMMITTED;
+		HeapTupleHeaderSetXmin(tup->t_data, FrozenTransactionId);
+	}
+	else
+		HeapTupleHeaderSetXmin(tup->t_data, xid);
+
+	/*
 	 * If the new tuple is too big for storage or contains already toasted
 	 * out-of-line attributes from some other relation, invoke the toaster.
 	 */
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 110480f..a34f072 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -43,6 +43,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/portal.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -1922,6 +1923,21 @@ CopyFrom(CopyState cstate)
 		hi_options |= HEAP_INSERT_SKIP_FSM;
 		if (!XLogIsNeeded())
 			hi_options |= HEAP_INSERT_SKIP_WAL;
+
+		if (ThereAreNoPriorRegisteredSnapshots() &&
+			ThereAreNoReadyPortals())
+		{
+			SubTransactionId	currxid = GetCurrentSubTransactionId();
+
+			/*
+			 * If the relfilenode has been created in this subtransaction
+			 * then we can further optimise the data load by setting hint
+			 * bits and pre-freezing tuples.
+			 */
+			if (cstate->rel->rd_createSubid == currxid ||
+				cstate->rel->rd_newRelfilenodeSubid == currxid)
+				hi_options |= HEAP_INSERT_HINT_XMIN;
+		}
 	}
 
 	/*
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index cfb73c1..24075db 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -1055,3 +1055,22 @@ pg_cursor(PG_FUNCTION_ARGS)
 
 	return (Datum) 0;
 }
+
+bool

Re: [HACKERS] Command Triggers, patch v11

2012-03-03 Thread Thom Brown
On 3 March 2012 19:25, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Right.  What I thought I was agreeing with was the notion that you
 should need to specify more than the trigger name to drop the
 trigger.  Rather like how you can create a trigger AFTER INSERT OR
 UPDATE OR DELETE, but you don't need to specify all those events to
 drop the trigger -- just the name will do.

 The parallel between INSERT/UPDATE/DELETE and the trigger's command is
 not working well enough, because in the data trigger case we're managing
 a single catalog entry with a single command, and in the command trigger
 case, in my model at least, we would be managing several catalog entries
 per command.

 To take an example:

  CREATE COMMAND TRIGGER foo AFTER create table, create view;
  DROP COMMAND TRIGGER foo;

 The first command would create two catalog entries, and the second one
 would delete the same two entries.  It used to work this way in the
 patch, then when merging with the new remove object infrastructure I
 lost that ability.  From the beginning Robert has been saying he didn't
 want that behavior, and Tom is now saying the same, IIUC.

 So we're back to one command, one catalog entry.

That sucks.  I'm surprised there's no provision for overriding it on a
command-by-command basis.

I would suggest an array instead, but that sounds costly from a
look-up perspective.

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Patch: improve selectivity estimation for IN/NOT IN

2012-03-03 Thread Daniele Varrazzo
Hello,

the attached patch improves the array selectivity estimation for = ANY
and <> ALL, hence for the IN/NOT IN operators, to avoid the
shortcoming described in
http://archives.postgresql.org/pgsql-performance/2012-03/msg6.php.
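
As a worked illustration of the two models (numbers are mine, purely
for intuition): suppose a column is 'a' in 40% of rows and 'b' in
another 40%. Then:

    x IN (a, b):      independent  0.4 + 0.4 - 0.4*0.4 = 0.64
                      disjoint     0.4 + 0.4           = 0.80  (exact)
    x NOT IN (a, b):  independent  0.6 * 0.6           = 0.36
                      disjoint     0.6 + 0.6 - 1.0     = 0.20  (exact)

The disjoint figures are exact when 'a' and 'b' are distinct values,
since a row cannot match both at once; the gap between the models grows
with the number of elements and their selectivities.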

Cheers,

-- Daniele
From d10e60dd3dec340b96b372792e4a0650ec10da92 Mon Sep 17 00:00:00 2001
From: Daniele Varrazzo daniele.varra...@gmail.com
Date: Sat, 3 Mar 2012 19:27:16 +
Subject: [PATCH] Improve array selectivity estimation for = ANY and <> ALL

Assume distinct array elements, hence disjoint probabilities instead of
independent. Using the wrong probability model can easily lead to serious
selectivity overestimation for the IN operator and underestimation for NOT IN.
---
 src/backend/utils/adt/selfuncs.c |   37 +
 1 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 0a685aa..b0816ca 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1797,10 +1797,29 @@ scalararraysel(PlannerInfo *root,
   ObjectIdGetDatum(operator),
   PointerGetDatum(args),
   Int32GetDatum(varRelid)));
+			/*
+			 * For generic operators, assume the selection probabilities as
+			 * independent for each array element. But for the equality the
+			 * probabilities are disjoint, so the total probability is just
+			 * the sum of the probabilities of the single elements. This is
+			 * true only if the array doesn't contain dups, but the check is
+			 * expensive: just assume it, to avoid penalizing well-written
+			 * queries in favour of poorly-written ones.
+			 */
 			if (useOr)
-s1 = s1 + s2 - s1 * s2;
+			{
+if (oprselproc.fn_addr == eqsel)
+	s1 = s1 + s2;
+else
+	s1 = s1 + s2 - s1 * s2;
+			}
 			else
-s1 = s1 * s2;
+			{
+if (oprselproc.fn_addr == neqsel)
+	s1 = s1 + s2 - 1.0;
+else
+	s1 = s1 * s2;
+			}
 		}
 	}
 	else if (rightop && IsA(rightop, ArrayExpr) &&
@@ -1840,9 +1859,19 @@ scalararraysel(PlannerInfo *root,
   PointerGetDatum(args),
   Int32GetDatum(varRelid)));
 			if (useOr)
-s1 = s1 + s2 - s1 * s2;
+			{
+if (oprselproc.fn_addr == eqsel)
+	s1 = s1 + s2;
+else
+	s1 = s1 + s2 - s1 * s2;
+			}
 			else
-s1 = s1 * s2;
+			{
+if (oprselproc.fn_addr == neqsel)
+	s1 = s1 + s2 - 1.0;
+else
+	s1 = s1 * s2;
+			}
 		}
 	}
 	else
-- 
1.7.5.4


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: CHECK FUNCTION statement

2012-03-03 Thread Alvaro Herrera

Excerpts from Pavel Stehule's message of sáb mar 03 02:25:52 -0300 2012:
 Hello
 
 
  It wasn't all that difficult -- see below.  While at this, I have a
  question: how attached you are to the current return format for CHECK
  FUNCTION?
 
 TupleDescr is created by language creator. This ensure exactly
 expected format, because there are no possible registry check function
 with other output tuple descriptor.

I'm not sure what you're saying.  What TupDesc are you talking about?
The tupdesc returned by the checker is certainly hardcoded by the core
support; the language creator cannot deviate from it.


  check function f1();
   function | lineno | statement  | sqlstate |              message               | detail | hint | level | position | query
  ----------+--------+------------+----------+------------------------------------+--------+------+-------+----------+-------
   f1()     |      5 | assignment | 42804    | subscripted object is not an array |        |      | error |          |
  (1 row)
 
  This looks much nicer to me.
 
 I strongly disagree.
 
 1. This format is not consistent with other commands,
 2. This format is difficult for copy/paste
 3. THERE IS NO CARET - this is really important
 4. This form is bad for terminals - there are long rows, and with \x
 output, there is a lot of garbage for multicommands
 5. When you want this form, you can directly call the SRF PL check
 functions.

I am not sure that consistency is the most important thing here; I think
what we care about is that it's usable.  So yeah, it might be hard to
cut and paste, and also too wide.  Maybe we can run some of the fields
together, and omit others.

I am not sure about the caret thingy -- mainly because I don't think it
works all that well.  I don't know how psql does it, but I notice that
it shows a single line in a multiline query -- so it's not just a matter
of adding some number of spaces.

Given the negative feedback, I'm going to leave this output format
unchanged; we can tweak it later.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY with hints, rebirth

2012-03-03 Thread Simon Riggs
On Fri, Mar 2, 2012 at 11:15 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Simon Riggs si...@2ndquadrant.com wrote:

 I like Marti's idea. At present, making his idea work could easily
 mean checksums sink, so not sure whether to attempt to make that
 work in detail.

 For my part, improving bulk load performance and TRUNCATE
 transactional semantics would trump checksums.  If we have to defer
 one or the other to a later release, I would prefer that we bump
 checksums.

 While I understand the desire for checksums, and understand others
 have had situations where they would have helped, so far I have yet
 to run into a situation where I feel they would have helped me.  The
 hint bit and freeze issues require ongoing attention and have real
 performance consequences on a regular basis.  And closing the window
 for odd transactional semantics on TRUNCATE versus DELETE of all
 rows in a table would be one less thing to worry about, in my world.

Since you supported this, I've invested the time to make it work. It
doesn't look like it needs to be either-or.

Please review the safe_truncate.v2.patch and
copy_autofrozen.v359.patch, copied here to assist testing and
inspection.

At present those patches handle only the TRUNCATE/COPY optimisation
but we could easily add CTAS, CREATE/COPY, CLUSTER & VACUUM FULL etc.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c910863..b891327 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2050,12 +2050,27 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 	tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
 	tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
 	tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
-	HeapTupleHeaderSetXmin(tup->t_data, xid);
 	HeapTupleHeaderSetCmin(tup->t_data, cid);
 	HeapTupleHeaderSetXmax(tup->t_data, 0);		/* for cleanliness */
 	tup->t_tableOid = RelationGetRelid(relation);
 
 	/*
+	 * If we are inserting into a new relation invisible as yet to other
+	 * backends and our session has no prior snapshots and no ready portals
+	 * then we can also set the hint bit for the rows we are inserting. The
+	 * last two restrictions ensure that HeapTupleSatisfiesMVCC() gives
+	 * the right answer if the current transaction inspects the data after
+	 * we load it.
+	 */
+	if (options & HEAP_INSERT_HINT_XMIN)
+	{
+		tup->t_data->t_infomask |= HEAP_XMIN_COMMITTED;
+		HeapTupleHeaderSetXmin(tup->t_data, FrozenTransactionId);
+	}
+	else
+		HeapTupleHeaderSetXmin(tup->t_data, xid);
+
+	/*
 	 * If the new tuple is too big for storage or contains already toasted
 	 * out-of-line attributes from some other relation, invoke the toaster.
 	 */
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 110480f..a34f072 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -43,6 +43,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/portal.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
 
@@ -1922,6 +1923,21 @@ CopyFrom(CopyState cstate)
 		hi_options |= HEAP_INSERT_SKIP_FSM;
 		if (!XLogIsNeeded())
 			hi_options |= HEAP_INSERT_SKIP_WAL;
+
+		if (ThereAreNoPriorRegisteredSnapshots() &&
+			ThereAreNoReadyPortals())
+		{
+			SubTransactionId	currxid = GetCurrentSubTransactionId();
+
+			/*
+			 * If the relfilenode has been created in this subtransaction
+			 * then we can further optimise the data load by setting hint
+			 * bits and pre-freezing tuples.
+			 */
+			if (cstate->rel->rd_createSubid == currxid ||
+				cstate->rel->rd_newRelfilenodeSubid == currxid)
+				hi_options |= HEAP_INSERT_HINT_XMIN;
+		}
 	}
 
 	/*
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index cfb73c1..24075db 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -1055,3 +1055,22 @@ pg_cursor(PG_FUNCTION_ARGS)
 
 	return (Datum) 0;
 }
+
+bool
+ThereAreNoReadyPortals(void)
+{
+	HASH_SEQ_STATUS status;
+	PortalHashEnt *hentry;
+
+	hash_seq_init(&status, PortalHashTable);
+
+	while ((hentry = (PortalHashEnt *) hash_seq_search(&status)) != NULL)
+	{
+		Portal		portal = hentry->portal;
+
+		if (portal->status == PORTAL_READY)
+			return false;
+	}
+
+	return true;
+}
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 5aebbd1..5d9e3bf 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1183,3 +1183,12 @@ DeleteAllExportedSnapshotFiles(void)
 
 	FreeDir(s_dir);
 }
+
+bool
+ThereAreNoPriorRegisteredSnapshots(void)
+{
+	if (RegisteredSnapshots <= 1)
+		return true;
+
+	return false;
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index fa38803..0381785 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -26,6 

[HACKERS] tuplesort memory usage: grow_memtuples

2012-03-03 Thread Jeff Janes
When sorting small tuples, the memtuples array can use a substantial
fraction of the total per-tuple memory used.  (In the case of
pass-by-value sorting, it is all of it.)

The way it grows leads to sub-optimal memory usage.

In one case, if it can grow by 1.999x, but not by 2x, we just give
up and use half the memory we could have.  This is what actually
happens in pass-by-value sorting when using a power of two for
*work_mem.

While I understand we don't want to thrash memory, it seems like we
could at least grow by less than 2x just one last time.

Also, if there is just barely room to double memtuples (and it is
pass-by-reference sorting), then the number of slots is doubled but
that leaves no room for the tuples themselves to be stored, so most of
those slots can't be used.

A solution to both would be to assess when we are about to do one of
the two above things, and instead use the historical tuple size to
compute how much growth we can do so that the memtuples slots and
the cumulative palloc heap of tuples will be exhausted at the same time.
If the availMem is less than half of allowedMem, then the next
doubling of memtuples would either fail, or would succeed but leave
behind too little allowedMem to hold the full complement of new
tuples.  So either way, extrapolate from current usage.   The patch
assumes that nothing other than memtuples and individual tuple storage
have been deducted from availMem.  Currently that is true, but may not
always be so (we recently discussed pre-deducting the tape buffer
overhead, so it doesn't cause a memory overrun).
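
For concreteness, a minimal sketch of that extrapolation (mine, not the
patch itself; names are illustrative and clamping is omitted):

    /*
     * Extrapolate from current usage: bytes consumed so far per stored
     * tuple (slot plus palloc'd tuple body) tell us how many tuples the
     * whole allowedMem budget can hold, so grow the slot array to that
     * size instead of blindly doubling.
     */
    static int
    extrapolated_memtupsize(int memtupcount, int64 allowedMem,
                            int64 availMem)
    {
        double  usedMem = (double) (allowedMem - availMem);
        double  perTuple = usedMem / memtupcount;

        return (int) ((double) allowedMem / perTuple);
    }

The real code would still need to clamp the result (never shrink, and
stay under MaxAllocSize / sizeof(SortTuple)), which is where the LACKMEM
question below comes in.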

As the comment in the patch indicates, I am not currently able to
prove to myself that there are no conditions under which this change
could cause LACKMEM(state) to become true.

This patch gives about a 2-fold window over which a sort that would
otherwise go to tape sort will instead be done in memory, which is 2-3
times faster.  That might not be very impressive, because after all if
you wanted to use more memory you could have just increased work_mem.
But it still seems worthwhile to use the memory we are told to use.
It may be hard to fine-tune the *work_mem settings, but not using the
amount we are given surely doesn't make it any easier.

But other than the tape vs in memory improvement, this also gives
other improvements.  For example, this gives me about a 35% speed up
when using default work_mem to do a select count(distinct k) from t
where k is a random integer and t has 512 million rows.  Granted, it is
silly to intentionally do such a large sort with such small memory.
But it still seems worth improving, and the wins should be sitting
around at other sizes as well, though probably not as big.

In my test case, the improvement mostly comes from turning random IO
reads into sequential ones.

The prevalence of random IO is due to a cascade of issues:

1) As discussed above, we are using only half the memory we could be.

2) Due to MINORDER we are spreading that reduced memory over twice as
many tapes as MERGE_BUFFER_SIZE would suggest.  Changing this would
obviously have a trade-off.

3) MERGE_BUFFER_SIZE itself describes the in-RAM footprint, not the
on-tape footprint, because we unpack tuples as we read them from tape.
 Perhaps mergeprereadone could stash the preread data still packed and
unpack it only as needed.  But that is a much more invasive change.

4) Pre-reading all tapes whenever just one becomes empty causes blocks
to be freed, and thus re-used, in smaller contiguous chunks than they
otherwise would be.  (Easy to change, but there is a trade-off to
changing it.)

5) Even without 4, logtape.c still makes a jumble of the free list on
multi-level merges.  I'm not entirely sure why yet--I think maybe it
is the way indirect blocks are handled.

Add it all up, and instead of pre-reading 32 consecutive 8K blocks, it
pre-reads only about 1 or 2 consecutive ones on the final merge.  Now
some of those could be salvaged by the kernel keeping track of
multiple interleaved read ahead opportunities, but in my hands vmstat
shows a lot of IO wait and shows reads that seem to be closer to
random IO than large read-ahead.  If it used truly efficient read
ahead, CPU would probably be limiting.

I'll add this to the open commit fest.  When doing performance
testing, I find it far easier to alter a guc.c parameter than to keep
multiple compiled binaries around.  Would people like a testing patch
that does the same thing as this, or does the original behavior, under
control of a parameter setting?

Cheers,

Jeff


sortmem_grow-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COPY with hints, rebirth

2012-03-03 Thread Kevin Grittner
 Simon Riggs  wrote:
 Kevin Grittner  wrote:
 Simon Riggs  wrote:

 I like Marti's idea. At present, making his idea work could
 easily mean checksums sink
 
 For my part, improving bulk load performance and TRUNCATE
 transactional semantics would trump checksums
 
 It doesn't look like it needs to be either-or.
 
Great news!
 
 Please review the safe_truncate.v2.patch and
 copy_autofrozen.v359.patch, copied here to assist testing and
 inspection.
 
I'll look at it today.
 
 At present those patches handle only the TRUNCATE/COPY optimisation
 but we could easily add CTAS, CREATE/COPY, CLUSTER & VACUUM FULL etc.
 
CREATE/COPY would be important so that pg_dump | psql -1 would
benefit.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] poll: CHECK TRIGGER?

2012-03-03 Thread Alvaro Herrera

Hi,

Pavel's patch for CHECK FUNCTION is adding another command besides that
one, which is CHECK TRIGGER.  The idea behind this is that you give it
the relation to which the trigger is attached in addition to the trigger
name, and it checks the function being called by that trigger.

IMHO having a separate command for this is not warranted.  It seems to
me that we could simply have a variant of CHECK FUNCTION for this; I
proposed CHECK FUNCTION trigfunc() AS TRIGGER ON tabname.  Besides,
it's really the function code being checked, not the trigger itself; the
trigger is only providing the tuple descriptor for NEW and OLD.

Pavel didn't like this idea; also, a quick poll on twitter elicited only
two answers: Dimitri Fontaine prefers CHECK TRIGGER, and Guillaume
Lelarge prefers CHECK FUNCTION.

So I still don't know which route to go with this.  Thoughts?

One thing to consider is eventual support for triggers that use
anonymous function blocks, without a previous CREATE FUNCTION, which is
being discussed in another thread.  Another point is that CHECK TRIGGER
requires a separate documentation page.

-- 
Álvaro Herrera alvhe...@alvh.no-ip.org

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] review: CHECK FUNCTION statement

2012-03-03 Thread Alvaro Herrera

Excerpts from Alvaro Herrera's message of sáb mar 03 16:54:19 -0300 2012:
 Excerpts from Pavel Stehule's message of sáb mar 03 02:25:52 -0300 2012:

  3. THERE IS NO CARET - this is really important

 I am not sure about the caret thingy -- mainly because I don't think it
 works all that well.  I don't know how psql does it, but I notice that
 it shows a single line in a multiline query -- so it's not just a matter
 of adding some number of spaces.

I checked how this works in psql.  It is upwards of 200 lines of code --
see reportErrorPosition in libpq/fe-protocol3.c.  I'm not sure this can
be made to work sensibly here.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Memory usage during sorting

2012-03-03 Thread Jeff Janes
On Sun, Feb 26, 2012 at 7:20 PM, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Feb 25, 2012 at 4:31 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 I'm not sure about the conclusion, but given this discussion, I'm
 inclined to mark this Returned with Feedback.

 OK, thanks.  Does anyone have additional feed-back on how tightly we
 wish to manage memory usage?  Is trying to make us use as much memory
 as we are allowed to without going over a worthwhile endeavor at all,
 or is it just academic nitpicking?

 I'm not sure, either.  It strikes me that, in general, it's hard to
 avoid a little bit of work_mem overrun, since we can't know whether
 the next tuple will fit until we've read it, and then if it turns out
 to be big, well, the best thing we can do is free it, but perhaps
 that's closing the barn door after the horse has gotten out already.

That type of overrun doesn't bother me much, because the size of the
next tuple someone else feeds us is mostly outside of this module's
control.  Also, the memory that is overrun should be reusable once
the offending tuple is written out to tape.  The type of overrun I'm
more concerned with is that which is purely under this module's
control, and which is then not re-used productively.

The better solution would be to reduce the overhead in the first
place.  While building the initial runs, there is no reason to have 3
blocks worth of overhead for each tape, when only one tape is ever
being used at a time.  But that change seems much tougher to
implement.

 Having recently spent quite a bit of time looking at tuplesort.c as a
 result of Peter Geoghegan's work and some other concerns, I'm inclined
 to think that it needs more than minor surgery.  That file is peppered
 with numerous references to Knuth which serve the dual function of
 impressing the reader with the idea that the code must be very good
 (since Knuth is a smart guy) and rendering it almost completely
 impenetrable (since the design is justified by reference to a textbook
 most of us probably do not have copies of).

Yes, I agree with that analysis.  And getting the volume I want by
inter-library-loan has been challenging--I keep getting the volume
before or after the one I want.  Maybe Knuth starts counting volumes
at 0 and libraries start counting at 1 :)

Anyway, I think logtape.c could use redoing.  When your tapes are
actually physical tape drives, it is necessary to build up runs one
after the other on physical tapes, because un-mounting a tape from a
tape drive and remounting another is not very feasible at scale.
That means you need to place your runs carefully, because if the
final merge finds that two runs it needs are back-to-back on one tape,
that is bad.  But with disks pretending to be tapes, you can
re-arrange the runs with just some bookkeeping.  Maintaining the
distinction between tapes and runs is pointless, which means the
Fibonacci placement algorithm is pointless as well.

 A quick Google search for external sorting algorithms suggest that the
 typical way of doing an external sort is to read data until you fill
 your in-memory buffer, quicksort it, and dump it out as a run.  Repeat
 until end-of-data; then, merge the runs (either in a single pass, or
 if there are too many, in multiple passes).  I'm not sure whether that
 would be better than what we're doing now, but there seem to be enough
 other people doing it that we might want to try it out.  Our current
 algorithm is to build a heap, bounded in size by work_mem, and dribble
 tuples in and out, but maintaining that heap is pretty expensive;
 there's a reason people use quicksort rather than heapsort for
 in-memory sorting.

But it would mean we have about 1.7x more runs that need to be merged
(for initially random data).  Considering the minimum merge order is
6, that increase in runs is likely not to lead to an additional level
of merging, in which case the extra speed of building the runs would
definitely win.  But if it does cause an additional level of merge, it
could end up being a loss.
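
A back-of-the-envelope model of that boundary (my toy arithmetic, not
anything from a patch):

#include <math.h>
#include <stdio.h>

/* Merge passes needed to reduce nruns initial runs to one, merging
 * 'order' runs at a time.  Ignores Fibonacci placement entirely. */
static int
merge_passes(double nruns, int order)
{
	return (nruns <= 1.0) ? 0 : (int) ceil(log(nruns) / log(order));
}

int
main(void)
{
	/* 100 runs vs. 1.7x as many, at the minimum merge order of 6 */
	printf("%d vs %d\n", merge_passes(100, 6), merge_passes(170, 6));	/* 3 vs 3 */
	/* near a boundary, though, the extra runs do cost a pass */
	printf("%d vs %d\n", merge_passes(200, 6), merge_passes(340, 6));	/* 3 vs 4 */
	return 0;
}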

Is there some broad corpus of sorting benchmarks which changes could
be tested against?  I usually end up testing just simple columns of
integers or small strings, because they are easy to set up.  That is
not ideal.

 As a desirable side effect, I think it would mean
 that we could dispense with retail palloc and pfree altogether.  We
 could just allocate a big chunk of memory, copy tuples into it until
 it's full, using a pointer to keep track of the next unused byte, and
 then, after writing the run, reset the allocation pointer back to the
 beginning of the buffer.  That would not only avoid the cost of going
 through palloc/pfree, but also the memory overhead imposed by
 bookkeeping and power-of-two rounding.

Wouldn't we still need an array of pointers to the start of every
tuple's location in the buffer?  Or else, how would qsort know where
to find them?
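
To make the question concrete, here is the arrangement I picture (a
toy sketch of mine, not the tuplesort code or the proposal itself):

#include <stdbool.h>
#include <string.h>

/* Tuples are copied into one big buffer with a bump pointer; a separate
 * pointer array records where each tuple starts, and that array is what
 * qsort() would actually sort. */
typedef struct
{
	char	   *buf;			/* the single big allocation */
	size_t		used;
	size_t		cap;
	char	  **tuples;			/* qsort() operates on this array */
	int			ntuples;
	int			maxtuples;
} RunBuffer;

static bool
run_add(RunBuffer *rb, const void *tup, size_t len)
{
	if (rb->used + len > rb->cap || rb->ntuples >= rb->maxtuples)
		return false;			/* full: sort, dump the run to tape, reset */
	rb->tuples[rb->ntuples++] = rb->buf + rb->used;
	memcpy(rb->buf + rb->used, tup, len);
	rb->used += len;
	return true;
}

static void
run_reset(RunBuffer *rb)
{
	/* "freeing" the whole run is just resetting the bump pointer */
	rb->used = 0;
	rb->ntuples = 0;
}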

Also, to do this we would need to get around 

Re: [HACKERS] review: CHECK FUNCTION statement

2012-03-03 Thread Alvaro Herrera

Excerpts from Alvaro Herrera's message of sáb mar 03 17:56:23 -0300 2012:
 Excerpts from Alvaro Herrera's message of sáb mar 03 16:54:19 -0300 2012:
  Excerpts from Pavel Stehule's message of sáb mar 03 02:25:52 -0300 2012:
 
   3. THERE IS NO CARET - this is really important
 
  I am not sure about the caret thingy -- mainly because I don't think it
  works all that well.

It doesn't work correctly with your patch; see sample below.  Note the
caret is pointing to an entirely nonsensical position.  I'm not sure
about duplicating the libpq line-counting logic in the backend.

Also note i18n seems to be working well, except for the "In function"
header, the "query" label, and the error level.  That seems easily fixable.

I remain unconvinced that this is the best possible output.

alvherre=# create function f() returns int language plpgsql as $$
begin select
var
from
foo; end; $$;
CREATE FUNCTION
alvherre=# check function f();
 CHECK FUNCTION  
-
 In function: 'f()'
 error:42P01:2:sentencia SQL:no existe la relación «foo»
 query:select   +
 var+
 from   +
 foo
   ^
(4 filas)

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] archive_keepalive_command

2012-03-03 Thread Jeff Janes
On Sun, Jan 15, 2012 at 5:52 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Fri, Dec 16, 2011 at 3:01 PM, Simon Riggs si...@2ndquadrant.com wrote:
 archive_command and restore_command describe how to ship WAL files
 to/from an archive.

 When there is nothing to ship, we delay sending WAL files. When no WAL
 files, the standby has no information at all.

 To provide some form of keepalive on quiet systems the
 archive_keepalive_command provides a generic hook to implement
 keepalives. This is implemented as a separate command to avoid storing
 keepalive messages in the archive, or at least allow overwrites using
 a single filename like keepalive.


 Patch.

Preliminary review:

Applies with several hunks, and with some fuzz in xlog.h

Builds cleanly and passes make check.

Does not provide documentation, which is needed.

Does not include regression tests, but there is no framework for
testing archiving.

Usability testing:

Does this patch have any user-visible effect?  I thought it would make
pg_last_xact_replay_timestamp() advance, but it does not seem to.  I
looked through the source a bit, and as best I can tell this only sets
some internal state which is never used, except under DEBUG2.

The example archive_keepalive_command given in postgresql.conf.sample
is not usable as given.  If the file is named %f, then there is no
easy way for restore_keepalive_command to retrieve the file because it
would not know the name to use.  So the example given in
postgresql.conf.sample should be more like the one given in
recovery.conf.sample, where it uses a hard-coded name rather than %f.
But in that case, it is not clear what %f might be useful for.


Cheers,

Jeff



Re: [HACKERS] Collect frequency statistics for arrays

2012-03-03 Thread Tom Lane
Alexander Korotkov aekorot...@gmail.com writes:
 [ array statistics patch ]

I've committed this after a fair amount of editorialization.  There are
still some loose ends to deal with, but I felt it was ready to go into
the tree for wider testing.

The main thing I changed that wasn't in the nature of cleanup/bugfixing
was that I revised the effort-limiting logic in
mcelem_array_contained_selec.  The submitted code basically just punted
if the estimated work was too much, but as was already noted in
http://archives.postgresql.org/pgsql-hackers/2011-10/msg01349.php
that can result in really bad estimates.  What I did instead is
something closer to Robert's original suggestion: trim the number of
element values taken into consideration from the array constant to a
value that fits within the desired effort limit.  If we consider just
the N most common values from the array constant, we still get a pretty
good estimate (since the trimmed N will still be close to 100 for the
values we're talking about).
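
Schematically (a toy rendering of the trimming idea, with made-up
names -- see array_selfuncs.c for the real thing):

/* Instead of punting when the full computation is too expensive,
 * consider only the N most common element values from the array
 * constant, choosing N to fit a fixed effort budget. */
static int
trim_num_elements(int num_const_elems, int work_per_elem, int effort_budget)
{
	int			n = num_const_elems;

	if (n * work_per_elem > effort_budget)
		n = effort_budget / work_per_elem;	/* stays near 100 in practice */
	return (n > 0) ? n : 1;
}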

I redid the tests in the above-mentioned message and see no cases where
the estimate is off by more than a factor of 2, and very few where it's
off by more than 20%, so this seems to work pretty well now.

The remaining loose ends IMO are:

1. I'm still unhappy about the loop that fills the count histogram,
as I noted earlier today.  It at least needs a decent comment and some
overflow protection, and I'm not entirely convinced that it doesn't have
more bugs than the overflow issue.

2. The tests in the above-mentioned message show that in most cases
where mcelem_array_contained_selec falls through to the rough
estimate, the resulting rowcount estimate is just 1, ie we are coming
out with very small selectivities.  Although that path will now only be
taken when there are no stats, it seems like we'd be better off to
return DEFAULT_CONTAIN_SEL instead of what it's doing.  I think there
must be something wrong with the rough estimate logic.  Could you
recheck that?

3. As I mentioned yesterday, I think it'd be a good idea to make some
provisions to reduce the width of pg_statistic rows for array columns
by not storing the scalar-style and/or array-style stats, if the DBA
knows that they're not going to be useful for a particular column.
I have not done anything about that.

regards, tom lane



Re: [HACKERS] poll: CHECK TRIGGER?

2012-03-03 Thread Tom Lane
Alvaro Herrera alvhe...@alvh.no-ip.org writes:
 Pavel's patch for CHECK FUNCTION is adding another command besides that
 one, which is CHECK TRIGGER.  The idea behind this is that you give it
 the relation to which the trigger is attached in addition to the trigger
 name, and it checks the function being called by that trigger.

 IMHO having a separate command for this is not warranted.  It seems to
 me that we could simply have a variant of CREATE FUNCTION for this; I
 proposed CREATE FUNCTION trigfunc() AS TRIGGER ON tabname.

You mean CHECK FUNCTION ... right?

In principle the CHECK TRIGGER command could apply more checks than
are possible with the proposed CHECK FUNCTION syntax: in particular,
AFAICS AS TRIGGER ON tabname doesn't provide enough info to know
whether the function should expect new and/or old rows to be provided,
nor what it ought to return (which is different for BEFORE/AFTER cases,
STATEMENT cases, etc).  We could add all that info to the CHECK FUNCTION
syntax, but there's definitely some merit to defining the check as
occurring against an existing trigger definition instead.

Now, if there's no intention of ever making the code actually apply
checks of the sort I'm thinking of, maybe CHECK FUNCTION AS TRIGGER
is fine.

 One thing to consider is eventual support for triggers that use
 anonymous function blocks, without a previous CREATE FUNCTION, which is
 being discussed in another thread.

Yeah, that angle seems to be another reason to support CHECK TRIGGER.

regards, tom lane



Re: [HACKERS] poll: CHECK TRIGGER?

2012-03-03 Thread Alvaro Herrera

Excerpts from Tom Lane's message of sáb mar 03 23:00:26 -0300 2012:
 Alvaro Herrera alvhe...@alvh.no-ip.org writes:
  Pavel's patch for CHECK FUNCTION is adding another command besides that
  one, which is CHECK TRIGGER.  The idea behind this is that you give it
  the relation to which the trigger is attached in addition to the trigger
  name, and it checks the function being called by that trigger.
 
  IMHO having a separate command for this is not warranted.  It seems to
  me that we could simply have a variant of CREATE FUNCTION for this; I
  proposed CREATE FUNCTION trigfunc() AS TRIGGER ON tabname.
 
 You mean CHECK FUNCTION ... right?

Yeah, sorry.

 In principle the CHECK TRIGGER command could apply more checks than
 are possible with the proposed CHECK FUNCTION syntax: in particular,
 AFAICS AS TRIGGER ON tabname doesn't provide enough info to know
 whether the function should expect new and/or old rows to be provided,
 nor what it ought to return (which is different for BEFORE/AFTER cases,
 STATEMENT cases, etc).  We could add all that info to the CHECK FUNCTION
 syntax, but there's definitely some merit to defining the check as
 occurring against an existing trigger definition instead.

Uh!  Now that I read this I realize that what you're supposed to give to
CHECK TRIGGER is the trigger name, not the function name!  In that
light, using CHECK FUNCTION for this doesn't make a lot of sense.

Okay, CHECK TRIGGER it is.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] RFC: Making TRUNCATE more MVCC-safe

2012-03-03 Thread Marti Raudsepp
On Sat, Mar 3, 2012 at 14:53, Simon Riggs si...@2ndquadrant.com wrote:
 Thanks Noah for drawing attention to this thread. I hadn't been
 watching. As you say, this work would allow me to freeze rows at load
 time and avoid the overhead of hint bit setting, which avoids
 performance issues from hint bit setting in checksum patch.

 I've reviewed this patch and it seems OK to me. Good work Marti.

Thanks! This approach wasn't universally liked, but if it gets us
tangible benefits (COPY with frozenxid) then I guess it's a reason to
reconsider.

 v2 patch attached, updated to latest HEAD. Patch adds
 * relvalidxid is reset by VACUUM/ANALYZE to avoid wraparound failure

Personally I'd rather keep this out of ANALYZE -- since its purpose is
to collect stats; VACUUM is responsible for correctness (xid
wraparound etc). But I don't feel strongly about this.

A more important consideration is how this interacts with hot standby.
Currently you compare OldestXmin to relvalidxmin to decide when to
reset it. But the standby's OldestXmin may be older than the master's.
(If VACUUM removes rows then this causes a recovery conflict, but
AFAICT that won't happen if only relvalidxmin changes.)

It might be more robust to wait until relfrozenxid exceeds
relvalidxmin -- by then, recovery conflict mechanisms will have taken
care of killing all older snapshots, or am I mistaken?

And a few typos in the code...

+ gettext_noop(When enabled viewing a truncated or newly created table 
+ will throw a serialization error to prevent MVCC 
+ discrepancies that were possible priot to 9.2.)

"prior", not "priot"

+ * Reset relvalidxmin if its old enough

Should be "it's" in this context.

Regards,
Marti



Re: [HACKERS] Patch: improve selectivity estimation for IN/NOT IN

2012-03-03 Thread Tom Lane
Daniele Varrazzo daniele.varra...@gmail.com writes:
 the attached patch improves the array selectivity estimation for = ANY
 and <> ALL, hence for the IN/NOT IN operators, to avoid the
 shortcoming described in
 http://archives.postgresql.org/pgsql-performance/2012-03/msg6.php.

In connection with Alexander Korotkov's array-estimation patch,
I just committed some code into scalararraysel() that checks whether the
operator is equality or inequality of the array element type.  It does
that by consulting the default btree or hash opclass for the element
type.  I did that with the thought that it could be used to attack this
issue too, but I see that you've done it another way, ie check to see
whether the operator uses eqsel() or neqsel() as selectivity estimator.

I'm not sure offhand which way is better.  It could be argued that yours
is more appropriate because if the operator isn't btree equality, but acts
enough like it to use eqsel() as estimator, then it's still appropriate
for scalararraysel() to treat it as equality.  On the other side of the
coin, an operator might be equality but have reason to use some
operator-specific estimator rather than eqsel().  We have real examples
of the former (such as the approximate-equality geometric operators)
but I think the latter case is just hypothetical.  Another thing that
has to be thought about is that there are numerous cross-type operators
that use eqsel, such as date-vs-timestamp, and it's far from clear
whether it's appropriate for scalararraysel() to use the modified stats
calculation when dealing with one of these.  The btree-based logic is
impervious to that since it won't match any cross-type operator.

Thoughts?

(BTW, in any case I don't trust function pointer comparison to be
portable.  It'd be a lot safer to look at the function OID.)
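
For example (an untested sketch; it assumes get_oprrest() from
utils/lsyscache.h and the F_EQSEL constant from utils/fmgroids.h):

#include "postgres.h"
#include "utils/fmgroids.h"
#include "utils/lsyscache.h"

/* Test the operator's restriction estimator by OID rather than by
 * comparing C function pointers. */
static bool
operator_uses_eqsel(Oid opno)
{
	RegProcedure oprrest = get_oprrest(opno);

	return oprrest == F_EQSEL;
}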

regards, tom lane



Re: [HACKERS] Patch: improve selectivity estimation for IN/NOT IN

2012-03-03 Thread Daniele Varrazzo
On Sun, Mar 4, 2012 at 2:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Daniele Varrazzo daniele.varra...@gmail.com writes:
 the attached patch improves the array selectivity estimation for = ANY
 and <> ALL, hence for the IN/NOT IN operators, to avoid the
 shortcoming described in
 http://archives.postgresql.org/pgsql-performance/2012-03/msg6.php.

 In connection with Alexander Korotkov's array-estimation patch,
 I just committed some code into scalararraysel() that checks whether the
 operator is equality or inequality of the array element type.

It looks like you have grand plans for array estimation.  My patch has
a much more modest scope, and I'm hoping it could be applied to the
currently maintained PG versions, as I consider the estimates it
currently produces a bug.

 It does
 that by consulting the default btree or hash opclass for the element
 type.  I did that with the thought that it could be used to attack this
 issue too, but I see that you've done it another way, ie check to see
 whether the operator uses eqsel() or neqsel() as selectivity estimator.

 I'm not sure offhand which way is better.  It could be argued that yours
 is more appropriate because if the operator isn't btree equality, but acts
 enough like it to use eqsel() as estimator, then it's still appropriate
 for scalararraysel() to treat it as equality.  On the other side of the
 coin, an operator might be equality but have reason to use some
 operator-specific estimator rather than eqsel().  We have real examples
 of the former (such as the approximate-equality geometric operators)
 but I think the latter case is just hypothetical.  Another thing that
 has to be thought about is that there are numerous cross-type operators
 that use eqsel, such as date-vs-timestamp, and it's far from clear
 whether it's appropriate for scalararraysel() to use the modified stats
 calculation when dealing with one of these.  The btree-based logic is
 impervious to that since it won't match any cross-type operator.

 Thoughts?

 (BTW, in any case I don't trust function pointer comparison to be
 portable.  It'd be a lot safer to look at the function OID.)

My original idea was to compare the comparison function against the
catalog; I just didn't know how to inspect the catalog to perform the
check.  While studying how to do it, I noticed fn_addr referring to
the well-known eqsel/neqsel functions, and thought of using that as an
indicator of the operator's semantics.

If you are still interested in the patch for the sake of bugfixing,
and somebody can point me at an example in the source of how to
compare the function OIDs, I can try to improve it.

-- Daniele



Re: [HACKERS] Parameterized-path cost comparisons need some work

2012-03-03 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Feb 29, 2012 at 6:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 (... thinks some more ...)  No, that doesn't get us there, because that
 doesn't establish that a more-parameterized path produces fewer rows
 than some path that requires less parameterization, yet not none at
 all.  You really want add_path carrying out those comparisons.  In your
 previous example, it's entirely possible that path D is dominated by B
 or C because of poor choices of join quals.

 I'm not following this part.  Can you explain further?  It seems to me
 at any rate that we could get pretty far if we could just separate
 parameterized paths and unparameterized paths into separate buckets.

To try to get some definitive info about this, I instrumented add_path
to emit a log message any time it compared two paths for which one's
parameterization was a strict subset of the other's, and yet the first
was not estimated to return more rows.  Sure enough, I got a lot of
messages, just by running the regression tests (and even more with some
of the other test cases I'd been using for parameterized-path testing).
All of the hits were for equal numbers of rows, though -- there were
no cases with a rows difference in the opposite direction from
expectations.

After looking at the results, I think that the fallacy in what we've
been discussing is this: a parameterized path may well have some extra
selectivity over a less-parameterized one, but perhaps *not enough to be
meaningful*.  The cases I was getting hits on were where the rowcount
estimate got rounded off to be the same as for the less-parameterized
path.  (In this connection it's worth noting that most of the hits were
for rowcount estimates of only 1 or 2 rows.)  So basically, the scenario
is where you have restriction clauses that are already enough to get
down to a small number of rows retrieved, and then you have some join
clauses that are not very selective and don't reduce the rowcount any
further.  Or maybe you have some nicely selective join clauses, and then
adding more joins to some other relations doesn't help any further.

In situations like this, we want add_path to reject the ineffective
more-parameterized path as not being an improvement over the
less-parameterized path.  Not having it do so might save cycles in
add_path itself, but we'd be being penny-wise and pound-foolish, because
not getting rid of the useless paths will cost us a lot more at the next
join level up.

So I'm back to thinking we need to look explicitly at the rowcount
comparison as well as all the existing conditions in add_path.
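
In sketch form (a toy model of the dominance rule, nothing like the
actual add_path code):

#include <stdbool.h>

/* Toy path: required outer rels as a bitmask, plus the two estimates. */
typedef struct
{
	unsigned	req_outer;		/* bitmask of required outer relids */
	double		rows;
	double		cost;
} ToyPath;

/*
 * True if old_path renders new_path useless: it requires no more
 * parameterization, costs no more, and -- the added condition --
 * does not return more rows.
 */
static bool
dominates(const ToyPath *old_path, const ToyPath *new_path)
{
	bool		subset = (old_path->req_outer & ~new_path->req_outer) == 0;

	return subset &&
		old_path->cost <= new_path->cost &&
		old_path->rows <= new_path->rows;
}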

One annoying thing about that is that it will reduce the usefulness of
add_path_precheck, because that's called before we compute the rowcount
estimates (and indeed not having to make the rowcount estimates is one
of the major savings from the precheck).  I think what we'll have to do
is assume that a difference in parameterization could result in a
difference in rowcount, and hence only a dominant path with exactly the
same parameterization can result in failing the precheck.

regards, tom lane



[HACKERS] Our regex vs. POSIX on longest match

2012-03-03 Thread Brendan Jurd
Hello folks,

I am in the process of accelerating down the rabbit hole of regex
internals.  Something that came up during my reading, is that a POSIX
compliant regex engine ought to always prefer the longest possible
match, when multiple matches are possible beginning from the same
location in the string. [1]

I wasn't sure that that was how our regex engine worked, and indeed,
on checking the manual [2] I found that our regex engine uses a
strange sort of inductive greediness to determine whether the
longest or the shortest possible match ought to be preferred.  The
greediness of individual particles in the regex are taken into
account, and at the top level the entire expression is concluded to be
either greedy, or non-greedy.
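
The manual's example pair makes the effect concrete -- the greediness
of the first quantifier decides the whole RE:

SELECT SUBSTRING('XY1234Z', 'Y*([0-9]{1,3})');   -- greedy RE: returns 123
SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');  -- non-greedy RE: returns 1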

I'll admit that this is a pretty obscure point, but we do appear to be
in direct violation of POSIX here.  I am getting the impression that
our engine works this way to route around some of the performance
issues that can arise in trying out every possible match with an
NFA-style engine.

I find it a bit unfortunate that POSIX is so unambiguous about this,
while our engine's treatment is, well ... quite arcane by comparison.
At minimum, I think we should be more explicit in the manual that this
behaviour flips POSIX the proverbial bird.  Several paragraphs south,
there is a sentence reading "Incompatibilities of note include ... the
longest/shortest-match (rather than first-match) matching semantics",
but in the context it seems as though the author is talking about an
incompatibility with Perl, not with POSIX.

Thoughts?

Cheers,
BJ

[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
[2] 
http://www.postgresql.org/docs/9.0/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP



Re: [HACKERS] Our regex vs. POSIX on longest match

2012-03-03 Thread Tom Lane
Brendan Jurd dire...@gmail.com writes:
 I am in the process of accelerating down the rabbit hole of regex
 internals.  Something that came up during my reading, is that a POSIX
 compliant regex engine ought to always prefer the longest possible
 match, when multiple matches are possible beginning from the same
 location in the string. [1]

 I wasn't sure that that was how our regex engine worked, and indeed,
 on checking the manual [2] I found that our regex engine uses a
 strange sort of inductive greediness to determine whether the
 longest or the shortest possible match ought to be preferred.  The
 greediness of individual particles in the regex are taken into
 account, and at the top level the entire expression is concluded to be
 either greedy, or non-greedy.

 I'll admit that this is a pretty obscure point, but we do appear to be
 in direct violation of POSIX here.

How so?  POSIX doesn't contain any non-greedy constructs.  If you use
only the POSIX-compatible greedy constructs, the behavior is compliant.

The issue that is obscure is, once you define some non-greedy
constructs, how to define how they should act in combination with greedy
ones.  I'm not sure to what extent the engine's behavior is driven by
implementation restrictions and to what extent it's really the sanest
behavior Henry could think of.  I found a comment from him about it:
http://groups.google.com/group/comp.lang.tcl/msg/c493317cc0d10d50
but it's short on details as to what alternatives he considered.

regards, tom lane
