Re: [HACKERS] RFC: Making TRUNCATE more MVCC-safe
On Sat, Feb 11, 2012 at 4:46 AM, Noah Misch n...@leadboat.com wrote: But I have to admit I'm intrigued by the idea of extending this to other cases, if there's a reasonable way to do that. For example, if we could fix things up so that we don't see a table at all if it was created after we took our snapshot, that would remove one of the obstacles to marking pages bulk-loaded into that table with FrozenXID and PD_ALL_VISIBLE from the get-go. I think a lot of people would be mighty happy about that. Exactly. But the necessary semantics seem somewhat different. For TRUNCATE, you want to throw a serialization error; but is that also what you want for CREATE TABLE? Or do you just want it to appear empty? I think an error helps just as much there. If you create a table and populate it with data in the same transaction, letting some snapshot see an empty table is an atomicity failure. Your comment illustrates a helpful point: this is just another kind of ordinary serialization failure, one that can happen at any isolation level. No serial transaction sequence can yield one of these errors. Thanks Noah for drawing attention to this thread. I hadn't been watching. As you say, this work would allow me to freeze rows at load time and avoid the overhead of hint bit setting, which avoids performance issues from hint bit setting in checksum patch. I've reviewed this patch and it seems OK to me. Good work Marti. v2 patch attached, updated to latest HEAD. Patch adds * a GUC called strict_mvcc to isolate the new behaviour if required * relvalidxid is reset by VACUUM/ANALYZE to avoid wraparound failure At present this lacks docs for strict_mvcc and doesn't attempt to handle CREATE TABLE case yet, but should be straightforward to do so. Hint bit setting is in separate patch on other thread. 
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c910863..4387f49 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -73,7 +73,7 @@
 /* GUC variable */
 bool		synchronize_seqscans = true;
-
+bool		StrictMVCC = true;
 
 static HeapScanDesc heap_beginscan_internal(Relation relation,
 						Snapshot snapshot,
@@ -1175,6 +1175,24 @@ heap_beginscan_internal(Relation relation, Snapshot snapshot,
 	HeapScanDesc scan;
 
 	/*
+	 * If the snapshot is older than relvalidxmin then either the table
+	 * has only recently been created or a TRUNCATE has removed data that
+	 * we should still be able to see.  Either way, we aren't allowed to
+	 * view the rows in StrictMVCC mode.
+	 */
+	if (StrictMVCC &&
+		TransactionIdIsValid(relation->rd_rel->relvalidxmin) &&
+		TransactionIdIsValid(snapshot->xmax) &&
+		NormalTransactionIdPrecedes(snapshot->xmax, relation->rd_rel->relvalidxmin))
+	{
+		ereport(ERROR,
+				(errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
+				 errmsg("canceling statement due to conflict with TRUNCATE TABLE on %s",
+						NameStr(relation->rd_rel->relname)),
+				 errdetail("User query attempting to see row versions that have been removed.")));
+	}
+
+	/*
 	 * increment relation ref count while scanning relation
 	 *
 	 * This is just to make really sure the relcache entry won't go away while
diff --git a/src/backend/catalog/heap.c b/src/backend/catalog/heap.c
index aef410a..3f9bd5d 100644
--- a/src/backend/catalog/heap.c
+++ b/src/backend/catalog/heap.c
@@ -787,6 +787,7 @@ InsertPgClassTuple(Relation pg_class_desc,
 	values[Anum_pg_class_relhastriggers - 1] = BoolGetDatum(rd_rel->relhastriggers);
 	values[Anum_pg_class_relhassubclass - 1] = BoolGetDatum(rd_rel->relhassubclass);
 	values[Anum_pg_class_relfrozenxid - 1] = TransactionIdGetDatum(rd_rel->relfrozenxid);
+	values[Anum_pg_class_relvalidxmin - 1] = TransactionIdGetDatum(rd_rel->relvalidxmin);
 	if (relacl != (Datum) 0)
 		values[Anum_pg_class_relacl - 1] = relacl;
 	else
@@ -884,6 +885,7 @@ AddNewRelationTuple(Relation pg_class_desc,
 		new_rel_reltup->relfrozenxid = InvalidTransactionId;
 	}
 
+	new_rel_reltup->relvalidxmin = InvalidTransactionId;
 	new_rel_reltup->relowner = relowner;
 	new_rel_reltup->reltype = new_type_oid;
 	new_rel_reltup->reloftype = reloftype;
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index bfbe642..0d96a6a 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -2854,7 +2854,7 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
 	}
 
 	/* We'll build a new physical relation for the index */
-	RelationSetNewRelfilenode(iRel, InvalidTransactionId);
+	RelationSetNewRelfilenode(iRel, InvalidTransactionId, InvalidTransactionId);
 
 	/* Initialize the index and rebuild */
 	/* Note: we do not need to re-establish pkey setting */
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index b40e57b..0578241 100644
--- a/src/backend/commands/analyze.c
+++
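For readers following the thread, the semantics of the new heap_beginscan_internal() check can be sketched outside the server. This is a hedged illustration, not PostgreSQL code: the function and parameter names are invented, and real transaction ids are 32-bit and wrap around, which NormalTransactionIdPrecedes() handles and this simple `<` comparison does not.

```python
# Illustrative model of the StrictMVCC check: a scan raises a serialization
# failure when its snapshot predates the table's relvalidxmin, i.e. the
# snapshot could otherwise see rows removed by TRUNCATE (or see a table
# that did not yet exist when the snapshot was taken).

INVALID_XID = 0  # stand-in for InvalidTransactionId

class SerializationFailure(Exception):
    """Models ERRCODE_T_R_SERIALIZATION_FAILURE."""

def check_scan_allowed(snapshot_xmax, relvalidxmin, strict_mvcc=True):
    # All three guards must hold before the xid comparison, mirroring the
    # patch: StrictMVCC enabled, relvalidxmin set, snapshot xmax set.
    if (strict_mvcc
            and relvalidxmin != INVALID_XID
            and snapshot_xmax != INVALID_XID
            and snapshot_xmax < relvalidxmin):  # simplified; no wraparound
        raise SerializationFailure(
            "canceling statement due to conflict with TRUNCATE TABLE")
    return True

# A snapshot taken before the TRUNCATE (xmax 100 < relvalidxmin 150) errors:
try:
    check_scan_allowed(snapshot_xmax=100, relvalidxmin=150)
except SerializationFailure:
    pass  # the client is expected to retry the transaction

# A snapshot taken afterwards scans normally:
assert check_scan_allowed(snapshot_xmax=200, relvalidxmin=150)
```

With strict_mvcc off the old behaviour remains: the scan proceeds and old snapshots simply see the truncated table as empty.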
Re: [HACKERS] COPY with hints, rebirth
On Fri, Mar 2, 2012 at 8:58 PM, Noah Misch n...@leadboat.com wrote:

On Fri, Mar 02, 2012 at 08:46:45AM +0000, Simon Riggs wrote:

On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: It's still broken: [BEGIN; TRUNCATE; SAVEPOINT; COPY; ROLLBACK TO]

So this approach isn't the one... The COPY FREEZE patch provides a way for the user to say explicitly that they don't really care about these MVCC corner cases and as a result allows us to avoid touching XidInMVCCSnapshot() at all. So there is still a patch on the table.

You can salvage the optimization by tightening its prerequisite: use it when the current subtransaction or a child thereof created or truncated the table. A parent subtransaction having done so is acceptable for the WAL avoidance optimization but not for this.

I misread your earlier comment. Yes, that will make this work correctly.

Incidentally, I contend that we should write frozen tuples to new/truncated tables unconditionally. The current behavior of making old snapshots see the table as empty violates atomicity at least as badly as letting those snapshots see the future-snapshot contents. But Marti has a sound proposal that would interact with your efforts here to avoid violating atomicity at all: http://archives.postgresql.org/message-id/cabrt9rbrmdsoz8kxgehfb4lg-ev9u67-6dlqvoiibpkkhtl...@mail.gmail.com

Thank you for bringing this to my attention. This will make this optimisation work correctly without adding anything to the main code path of XidInMVCCSnapshot() and without the annoying FREEZE syntax. So this puts the patch squarely back on the table. I'll do another version of this later today designed to work with the StrictMVCC patch.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
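Noah's tightened prerequisite is easy to get backwards, so here is a hedged model of the two rules. The subtransaction names and helper functions are invented for illustration; a subtransaction is represented by its ancestor chain (itself first, top level last).

```python
# Model of the two different prerequisites discussed above.
#
# WAL-skip: safe if the creating/truncating subxact is the current one OR
# any ancestor -- if that ancestor aborts, our inserts vanish with it.
def wal_skip_ok(creating_subxact, current_chain):
    return creating_subxact in current_chain

# Pre-freezing/hint bits: only safe if the table was created/truncated by
# the current subxact or a child thereof. If an *ancestor* did it, a
# ROLLBACK TO an intermediate savepoint (Heikki's BEGIN; TRUNCATE;
# SAVEPOINT; COPY; ROLLBACK TO case) would leave frozen-but-aborted rows
# visible. A creator is current-or-descendant iff its ancestor chain
# contains the current subxact.
def freeze_ok(creator_chain, current_subxact):
    return current_subxact in creator_chain

# TRUNCATE ran at top level; COPY runs under a savepoint:
chain = ["copy_subxact", "savepoint_s", "top"]
assert wal_skip_ok("top", chain)          # WAL avoidance still fine
assert not freeze_ok(["top"], "copy_subxact")  # pre-freezing is not

# TRUNCATE and COPY in the same subtransaction: both optimizations apply.
assert wal_skip_ok("copy_subxact", chain)
assert freeze_ok(chain, "copy_subxact")

# A (completed) child of the current subxact created the table: still OK.
assert freeze_ok(["child", "copy_subxact", "savepoint_s", "top"], "copy_subxact")
```

The asymmetry is the whole point: an abort of a parent discards the loaded rows either way, but a rollback to a savepoint *between* the truncate and the load discards only the load, so the frozen tuples must not have been made visible early.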
Re: [HACKERS] review: CHECK FUNCTION statement
On 03/03/2012 02:24 AM, Alvaro Herrera wrote:

question: how attached are you to the current return format for CHECK FUNCTION?

check function f1();
                         CHECK FUNCTION
-------------------------------------------------------------
 In function: 'f1()'
 error:42804:5:assignment:subscripted object is not an array
(2 rows)

It seems to me that it'd be trivial to make it look like this instead:

check function f1();
 function | lineno | statement  | sqlstate |              message               | detail | hint | level | position | query
----------+--------+------------+----------+------------------------------------+--------+------+-------+----------+-------
 f1()     |      5 | assignment |    42804 | subscripted object is not an array |        |      | error |          |
(1 row)

This looks much nicer to me. One thing we lose is the caret marking the position of the error -- but I'm wondering if that really works well. I didn't test it, but from the code it looks to me like it'd misbehave if you had a multiline statement. Opinions?

Well, if you want a nicely formatted table you can always call the checker function directly. I think the statement returning something that is more human and less machine is a more consistent approach with the rest of the utility commands. In other words I don't really see the point of it.

Petr
Re: [HACKERS] Command Triggers, patch v11
Robert Haas robertmh...@gmail.com writes:

CREATE COMMAND TRIGGER name ... properties ...; DROP COMMAND TRIGGER name; full stop. If you want to run the same trigger function on some more commands, add another trigger name.

+1

+1. I suggested the same thing a while back.

Yeah, I know, I just wanted to hear from more people before ditching out a part of the work I did, and Thom was balancing in the opposite direction.

Regards,
-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Command Triggers, patch v11
On 3 March 2012 13:45, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:

Robert Haas robertmh...@gmail.com writes: CREATE COMMAND TRIGGER name ... properties ...; DROP COMMAND TRIGGER name; full stop. If you want to run the same trigger function on some more commands, add another trigger name. +1 +1. I suggested the same thing a while back.

Yeah, I know, I just wanted to hear from more people before ditching out a part of the work I did, and Thom was balancing in the opposite direction.

I was? I agreed with Tom's comment, but I did query your interpretation of it with regard to the CREATE COMMAND TRIGGER statement. It seems you removed the ability to create a command trigger against multiple commands, but I don't think that was the problem. It was the DROP COMMAND TRIGGER statement that garnered comment, as it makes more sense to drop the entire trigger than individual commands for that trigger. Initially I had proposed a way to drop all commands on a trigger at once as an additional option, but just dropping it completely or not at all is preferable.

But if there is agreement on having multiple commands on a command trigger, I'm wondering whether we should have an OR separator rather than a comma? The reason is that regular triggers define a list of statements this way. Personally I prefer the comma syntax, but my concern (not a strong concern) is the lack of consistency.

-- Thom
Re: [HACKERS] [v9.2] Add GUC sepgsql.client_label
On 2012-02-24 17:25, Yeb Havinga wrote:

On 2012-02-23 12:17, Kohei KaiGai wrote: 2012/2/20 Yeb Havinga yebhavi...@gmail.com: On 2012-02-05 10:09, Kohei KaiGai wrote: The attached part-1 patch moves related routines from hooks.c to label.c because of references to static variables. The part-2 patch implements the above mechanism.

I took a short look at this patch but am stuck getting the regression test to run properly. First, patch 2 misses the file sepgsql.sql.in and therefore the creation function command for sepgsql_setcon is missing.

Thanks for your comments. I added the definition of the sepgsql_setcon function to the sepgsql.sql.in file, in addition to patch rebasing.

Very brief comments, as I must leave the keyboard soon: I read the source code and played a bit with setcon and the debugger; no strange things found.

Code comments / questions. I took the liberty to change a few things, mostly comments, in the attached patch:

maybe client_label_committed is a better name for client_label_setcon? — this change was made.

Is the double negation in the sentence below intended? — several comments were changed / moved. There is now one place where the behaviour of the different client_label variables is explained.

sepgsql_set_client_label(), maybe add a comment to !new_label that it is reset to the peer label. — done.

Is the assert client_label_peer != NULL in sepgsql_get_client_label necessary? new_label == NULL / pending_label->label == NULL means use the peer label. Why not use the peer label instead? — It turned out that pending_label->label is invariantly non-null. Changed code to assume that and added some Asserts.

set_label: if new_label == current label according to getcon, is it necessary to add to the pending list? — this question still remains. Maybe there would be reasons to hit SELinux with the question: can I change from A->A?
sepgsql_subxact_callback(), could this be made easier to read by just taking llast(client_label_pending), asserting that plabel->subid == mySubId, and then calling list_delete on a pointer to that list cell? — no, this was a naive suggestion, which fails in any case of a subtransaction with not exactly one call to sepgsql_setcon().

Some comments contain typos; I can spend some time on this, though I'm not a native English speaker so it won't be perfect. sgml documentation must still be added. If time permits I can spend some time on that tomorrow.

regards,
Yeb Havinga

-- Yeb Havinga http://www.mgrid.net/ Mastering Medical Data

diff --git a/contrib/sepgsql/expected/label.out b/contrib/sepgsql/expected/label.out
index bac169f..e967c7c 100644
--- a/contrib/sepgsql/expected/label.out
+++ b/contrib/sepgsql/expected/label.out
@@ -26,6 +26,11 @@ CREATE FUNCTION f4 () RETURNS text
     AS 'SELECT sepgsql_getcon()' LANGUAGE sql;
 SECURITY LABEL ON FUNCTION f4()
     IS 'system_u:object_r:sepgsql_nosuch_trusted_proc_exec_t:s0';
+CREATE FUNCTION f5 (text) RETURNS bool
+    AS 'SELECT sepgsql_setcon($1)'
+    LANGUAGE sql;
+SECURITY LABEL ON FUNCTION f5(text)
+    IS 'system_u:object_r:sepgsql_regtest_trusted_proc_exec_t:s0';
 --
 -- Tests for default labeling behavior
@@ -100,6 +105,223 @@ SELECT sepgsql_getcon();	-- client's label must be restored
 (1 row)
 
 --
+-- Test for Dynamic Domain Transition
+--
+-- validation of transaction aware dynamic-transition
+SELECT sepgsql_getcon();	-- confirm client privilege
+                  sepgsql_getcon                   
+---------------------------------------------------
+ unconfined_u:unconfined_r:unconfined_t:s0:c0.c25
+(1 row)
+
+SELECT sepgsql_setcon('unconfined_u:unconfined_r:unconfined_t:s0:c0.c15');
+ sepgsql_setcon 
+----------------
+ t
+(1 row)
+
+SELECT sepgsql_getcon();
+                  sepgsql_getcon                   
+---------------------------------------------------
+ unconfined_u:unconfined_r:unconfined_t:s0:c0.c15
+(1 row)
+
+SELECT sepgsql_setcon(NULL);	-- failed to reset
+ERROR:  SELinux: security policy violation
+SELECT sepgsql_getcon();
+                  sepgsql_getcon                   
+---------------------------------------------------
+ unconfined_u:unconfined_r:unconfined_t:s0:c0.c15
+(1 row)
+
+BEGIN;
+SELECT sepgsql_setcon('unconfined_u:unconfined_r:unconfined_t:s0:c0.c12');
+ sepgsql_setcon 
+----------------
+ t
+(1 row)
+
+SELECT sepgsql_getcon();
+                  sepgsql_getcon                   
+---------------------------------------------------
+ unconfined_u:unconfined_r:unconfined_t:s0:c0.c12
+(1 row)
+
+SAVEPOINT svpt_1;
+SELECT sepgsql_setcon('unconfined_u:unconfined_r:unconfined_t:s0:c0.c9');
+ sepgsql_setcon 
+----------------
+ t
+(1 row)
+
+SELECT sepgsql_getcon();
+                 sepgsql_getcon                  
+-------------------------------------------------
+ unconfined_u:unconfined_r:unconfined_t:s0:c0.c9
+(1 row)
+
+SAVEPOINT svpt_2;
+SELECT sepgsql_setcon('unconfined_u:unconfined_r:unconfined_t:s0:c0.c6');
+ sepgsql_setcon 
+----------------
+ t
+(1
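The transaction-aware label stack being reviewed above can be sketched in a few lines. This is an illustrative model only, not the contrib/sepgsql code: the class and method names are invented, and SELinux permission checks (which is why sepgsql_setcon(NULL) fails in the regression test) are deliberately left out.

```python
# Model of the pending client-label machinery discussed in the review:
# client_label_peer, client_label_committed, and client_label_pending,
# with subtransaction-aware rollback of pending setcon() calls.

class ClientLabel:
    def __init__(self, peer_label):
        self.peer = peer_label        # label of the peer connection
        self.committed = None         # label at last transaction commit
        self.pending = []             # list of (subxact_id, label)

    def setcon(self, subid, label):
        # A NULL argument means "reset to the peer label"; the pending
        # entry itself always stores a non-NULL label (the invariant the
        # review notes: pending_label->label is invariantly non-null).
        self.pending.append((subid, label if label is not None else self.peer))

    def getcon(self):
        if self.pending:
            return self.pending[-1][1]
        return self.committed if self.committed is not None else self.peer

    def subxact_abort(self, subid):
        # A subtransaction may have called setcon() zero or many times,
        # so llast()+assert would be wrong: drop *all* of its entries.
        self.pending = [(s, l) for (s, l) in self.pending if s != subid]

    def commit(self):
        if self.pending:
            self.committed = self.pending[-1][1]
        self.pending = []

cl = ClientLabel("peer_t")
cl.setcon(1, "a_t")
cl.setcon(2, "b_t")
cl.setcon(2, "c_t")        # two setcon() calls in the same subxact
cl.subxact_abort(2)        # both must be rolled back together
assert cl.getcon() == "a_t"
cl.commit()
assert cl.getcon() == "a_t"
```

This is exactly the case that sinks the llast()-based simplification: after two setcon() calls in subxact 2, an abort of that subxact must unwind two pending entries, not one.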
Re: [HACKERS] Command Triggers, patch v11
Thom Brown t...@linux.com writes:

And having tried building it, it appears to fail.

Sorry about that, my compiler here was happy building the source (and I had been doing make clean install along the way) and make installcheck passed here. Now fixed on my github branch, including docs.

I'll send an updated patch revision later, hopefully including pg_dump support fixtures (well, adaptation to the new way of doing things IIUC) and maybe with some trigger arguments rework done. I understand that you're not blocked until that new version is out, right? You could use the incremental patch attached for convenience.

Regards,
-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support

commit 301d8e58b6bfe89f35d82ad7a5216891f56c5d48
Author: Dimitri Fontaine d...@tapoueh.org
Date:   Sat Mar 3 15:18:59 2012 +0100

    Fix RenameCmdTrigger(), per review.

diff --git a/doc/src/sgml/ref/alter_command_trigger.sgml b/doc/src/sgml/ref/alter_command_trigger.sgml
index dd903e7..48b536c 100644
--- a/doc/src/sgml/ref/alter_command_trigger.sgml
+++ b/doc/src/sgml/ref/alter_command_trigger.sgml
@@ -22,6 +22,7 @@ PostgreSQL documentation
 <refsynopsisdiv>
 <synopsis>
 ALTER COMMAND TRIGGER <replaceable class="PARAMETER">name</replaceable> SET <replaceable class="parameter">enabled</replaceable>
+ALTER COMMAND TRIGGER <replaceable class="PARAMETER">name</replaceable> RENAME TO <replaceable class="PARAMETER">newname</replaceable>
 
 <phrase>where <replaceable class="parameter">enabled</replaceable> can be one of:</phrase>
@@ -62,10 +63,10 @@
    </varlistentry>
 
    <varlistentry>
-    <term><replaceable class="PARAMETER">command</replaceable></term>
+    <term><replaceable class="PARAMETER">newname</replaceable></term>
     <listitem>
      <para>
-      The command tag on which this trigger acts.
+      The new name of the command trigger.
      </para>
     </listitem>
    </varlistentry>
diff --git a/src/backend/commands/alter.c b/src/backend/commands/alter.c
index 4d07642..b447306 100644
--- a/src/backend/commands/alter.c
+++ b/src/backend/commands/alter.c
@@ -66,7 +66,7 @@ ExecRenameStmt(RenameStmt *stmt)
 			break;
 
 		case OBJECT_CMDTRIGGER:
-			RenameCmdTrigger(stmt->object, stmt->subname, stmt->newname);
+			RenameCmdTrigger(stmt->object, stmt->newname);
 			break;
 
 		case OBJECT_DATABASE:
diff --git a/src/backend/commands/cmdtrigger.c b/src/backend/commands/cmdtrigger.c
index e8d294d..ef9794f 100644
--- a/src/backend/commands/cmdtrigger.c
+++ b/src/backend/commands/cmdtrigger.c
@@ -294,13 +294,17 @@ AlterCmdTrigger(AlterCmdTrigStmt *stmt)
  * Rename command trigger
  */
 void
-RenameCmdTrigger(const char *trigname, const char *newname)
+RenameCmdTrigger(List *name, const char *newname)
 {
 	SysScanDesc tgscan;
 	ScanKeyData skey[1];
 	HeapTuple	tup;
 	Relation	rel;
 	Form_pg_cmdtrigger cmdForm;
+	char	   *trigname;
+
+	Assert(list_length(name) == 1);
+	trigname = strVal((Value *) linitial(name));
 
 	CheckCmdTriggerPrivileges();
 
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index defcdd1..6a20a6c 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -3463,7 +3463,6 @@ _copyCreateCmdTrigStmt(const CreateCmdTrigStmt *from)
 {
 	CreateCmdTrigStmt *newnode = makeNode(CreateCmdTrigStmt);
 
-	COPY_NODE_FIELD(command);
 	COPY_STRING_FIELD(trigname);
 	COPY_SCALAR_FIELD(timing);
 	COPY_NODE_FIELD(funcname);
@@ -3476,7 +3475,6 @@ _copyAlterCmdTrigStmt(const AlterCmdTrigStmt *from)
 {
 	AlterCmdTrigStmt *newnode = makeNode(AlterCmdTrigStmt);
 
-	COPY_STRING_FIELD(command);
 	COPY_STRING_FIELD(trigname);
 	COPY_STRING_FIELD(tgenabled);
 
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index 1ad31f0..137075a 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -1780,7 +1780,6 @@ _equalCreateTrigStmt(const CreateTrigStmt *a, const CreateTrigStmt *b)
 static bool
 _equalCreateCmdTrigStmt(const CreateCmdTrigStmt *a, const CreateCmdTrigStmt *b)
 {
-	COMPARE_NODE_FIELD(command);
 	COMPARE_STRING_FIELD(trigname);
 	COMPARE_SCALAR_FIELD(timing);
 	COMPARE_NODE_FIELD(funcname);
@@ -1791,7 +1790,6 @@ _equalCreateCmdTrigStmt(const CreateCmdTrigStmt *a, const CreateCmdTrigStmt *b)
 static bool
 _equalAlterCmdTrigStmt(const AlterCmdTrigStmt *a, const AlterCmdTrigStmt *b)
 {
-	COMPARE_STRING_FIELD(command);
 	COMPARE_STRING_FIELD(trigname);
 	COMPARE_STRING_FIELD(tgenabled);
 
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index e5a0d34..e9872d3 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -6931,13 +6931,12 @@ RenameStmt: ALTER AGGREGATE func_name aggr_args RENAME TO name
 					n->missing_ok = false;
 					$$ = (Node *) n;
 				}
-			| ALTER TRIGGER name ON COMMAND trigger_command RENAME TO name
+			| ALTER COMMAND TRIGGER name RENAME TO name
 				{
 					RenameStmt *n = makeNode(RenameStmt);
 					n->renameType = OBJECT_CMDTRIGGER;
-
Re: [HACKERS] Command Triggers, patch v11
Thom Brown t...@linux.com writes:

problem. It was the DROP COMMAND TRIGGER statement that garnered comment, as it makes more sense to drop the entire trigger than individual commands for that trigger.

What you're saying here is that a single trigger could have more than one command attached to it, and what I understand Tom, Robert and Kevin to be saying is that any given command trigger should only be attached to a single command. If we wanted to be more consistent we would need a way to attach the same trigger to both BEFORE and AFTER the command; as of now you need two triggers here.

Regards,
-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] Command Triggers, patch v11
On 3 March 2012 14:26, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Thom Brown t...@linux.com writes: And having tried building it, it appears to fail. Sorry about that, my compiler here was happy building the source (and I had been doing make clean install along the way) and make installcheck passed, here. Now fixed on my github's branch, including docs. I'll send an updated patch revision later, hopefully including pg_dump support fixtures (well, adaptation to the new way of doing things IIUC) and maybe with some trigger arguments rework done. I understand that you're not blocked until that new version is out, right? You could use either the incremental patch attached for convenience. Thanks for the extra patch. Do you see any functional changes between now and your next patch? If so, it doesn't make sense for me to test anything yet. -- Thom
Re: [HACKERS] Command Triggers, patch v11
On 3 March 2012 14:34, Dimitri Fontaine dimi...@2ndquadrant.fr wrote:

Thom Brown t...@linux.com writes: problem. It was the DROP COMMAND TRIGGER statement that garnered comment, as it makes more sense to drop the entire trigger than individual commands for that trigger. What you're saying here is that a single command could have more than one command attached to it, and what I understand Tom, Robert and Kevin are saying is that any given command trigger should only be attached to a single command.

I hadn't interpreted it that way, but then that could just be my misinterpretation. I'm still of the opinion that a single command trigger should be able to attach to multiple commands, just not amendable once the trigger has been created. If that's not how others saw it, I'm outnumbered. So my vote was for:

CREATE COMMAND TRIGGER name { BEFORE | AFTER } command [, ... ] EXECUTE PROCEDURE function_name ()
ALTER COMMAND TRIGGER name SET enabled
DROP COMMAND TRIGGER [ IF EXISTS ] name [ CASCADE | RESTRICT ]

The only difference shown above compared to the current implementation in your patch is [, ... ] after command in the CREATE COMMAND TRIGGER statement. My argument for this is that you may wish to implement the same trigger for tables, views and foreign tables rather than create 3 separate triggers.

If we wanted to be more consistent we would need to have a way to attach the same trigger to both BEFORE and AFTER the command, as of now you need two triggers here.

I'm not sure I understand. Attaching a trigger to both BEFORE and AFTER isn't possible for regular triggers, so I don't see why that would introduce consistency. Could you explain?

-- Thom
Re: [HACKERS] Command Triggers, patch v11
Thom Brown wrote:

Dimitri Fontaine wrote: Thom Brown writes: problem. It was the DROP COMMAND TRIGGER statement that garnered comment, as it makes more sense to drop the entire trigger than individual commands for that trigger. What you're saying here is that a single command could have more than one command attached to it, and what I understand Tom, Robert and Kevin are saying is that any given command trigger should only be attached to a single command.

I hadn't interpreted it that way

Nor had I.

I'm still of the opinion that a single command trigger should be able to attach to multiple commands, just not amendable once the trigger has been created.

That was my understanding of what we were discussing, too.

So my vote was for: CREATE COMMAND TRIGGER name { BEFORE | AFTER } command [, ... ] EXECUTE PROCEDURE function_name () ALTER COMMAND TRIGGER name SET enabled DROP COMMAND TRIGGER [ IF EXISTS ] name [ CASCADE | RESTRICT ]

Same here.

My argument for this is that you may wish to implement the same trigger for tables, views and foreign tables rather than create 3 separate triggers.

Right. What I thought I was agreeing with was the notion that you should need to specify more than the trigger name to drop the trigger. Rather like how you can create a trigger AFTER INSERT OR UPDATE OR DELETE, but you don't need to specify all those events to drop the trigger -- just the name will do.

-Kevin
Re: [HACKERS] Command Triggers, patch v11
On 3 March 2012 16:12, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Thom Brown wrote: Dimitri Fontaine wrote: Thom Brown writes: problem. It was the DROP COMMAND TRIGGER statement that garnered comment, as it makes more sense to drop the entire trigger than individual commands for that trigger. What you're saying here is that a single command could have more than one command attached to it, and what I understand Tom, Robert and Kevin are saying is that any given command trigger should only be attached to a single command. I hadn't interpreted it that way Nor had I. I'm still of the opinion that a single command trigger should be able to attach to multiple commands, just not amendable once the trigger has been created. That was my understanding of what we were discussing, too. So my vote was for: CREATE COMMAND TRIGGER name { BEFORE | AFTER } command [, ... ] EXECUTE PROCEDURE function_name () ALTER COMMAND TRIGGER name SET enabled DROP COMMAND TRIGGER [ IF EXISTS ] name [ CASCADE | RESTRICT ] Same here. My argument for this is that you may wish to implement the same trigger for tables, views and foreign tables rather than create 3 separate triggers. Right. What I thought I was agreeing with was the notion that you should need to specify more than the trigger name to drop the trigger. Rather like how you can create a trigger AFTER INSERT OR UPDATE OR DELETE, but you don't need to specify all those events to drop the trigger -- just the name will do. Don't you mean shouldn't need to specify more than the trigger name? -- Thom
Re: [HACKERS] Collect frequency statistics for arrays
... BTW, could you explain exactly how that "Fill histogram by hashtab" loop works? It's way too magic for my taste, and does in fact have bugs in the currently submitted patch. I've reworked it to this:

	/* Fill histogram by hashtab. */
	delta = analyzed_rows - 1;
	count_item_index = 0;
	frac = sorted_count_items[0]->frequency * (num_hist - 1);
	for (i = 0; i < num_hist; i++)
	{
		while (frac <= 0)
		{
			count_item_index++;
			Assert(count_item_index < count_items_count);
			frac += sorted_count_items[count_item_index]->frequency * (num_hist - 1);
		}
		hist[i] = sorted_count_items[count_item_index]->count;
		frac -= delta;
	}
	Assert(count_item_index == count_items_count - 1);

The asserts don't fire in any test case I've tried, which seems to indicate that it *does* work in the sense that the first histogram entry is always the smallest count and the last histogram entry is always the largest one. But it's extremely unclear why it manages to stop exactly at the last count_items array entry, or for that matter why it's generating a representative histogram at all. I'm suspicious that the "- 1" bits represent off-by-one bugs. I also don't especially like the fact that "frac" is capable of overflowing (since worst-case frequency is 300 * 10000 and worst-case num_hist is 10000, with the current limits on statistics_target). We could work around that by doing the "frac" arithmetic in int64, but I wonder whether that couldn't be avoided. In any case, first I'd like an explanation of why this code works at all.

			regards, tom lane
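As a reviewer's aid, the loop transcribes directly into Python, which makes it easy to watch the invariants Tom mentions. The counts and frequencies below are made up for illustration (the only requirements are that the frequencies sum to analyzed_rows and that the items are sorted by count); the function name is invented, not from the patch.

```python
# Python transcription of the "Fill histogram by hashtab" loop.
# Each item is a (count, frequency) pair, sorted by count ascending.

def fill_histogram(sorted_count_items, analyzed_rows, num_hist):
    assert sum(f for _, f in sorted_count_items) == analyzed_rows
    delta = analyzed_rows - 1
    idx = 0
    frac = sorted_count_items[0][1] * (num_hist - 1)
    hist = []
    for _ in range(num_hist):
        # advance through items while the accumulated fraction is spent
        while frac <= 0:
            idx += 1
            assert idx < len(sorted_count_items)   # patch's inner Assert
            frac += sorted_count_items[idx][1] * (num_hist - 1)
        hist.append(sorted_count_items[idx][0])
        frac -= delta
    # patch's trailing Assert: the loop ends exactly on the last item
    assert idx == len(sorted_count_items) - 1
    return hist

# counts 1, 2, 7 with frequencies 3, 2, 5 summing to analyzed_rows = 10:
items = [(1, 3), (2, 2), (7, 5)]
hist = fill_histogram(items, analyzed_rows=10, num_hist=4)
assert hist == [1, 2, 7, 7]   # smallest count first, largest count last
```

Tracing this example shows the mechanism: each item contributes frequency * (num_hist - 1) to frac, and each emitted histogram entry consumes analyzed_rows - 1, so an item's count is repeated roughly in proportion to its frequency; since the totals added and subtracted both scale with analyzed_rows * num_hist, the cursor lands on the final item as the loop finishes, which is what both asserts check.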
Re: [HACKERS] Command Triggers, patch v11
Thom Brown wrote: Don't you mean shouldn't need to specify more than the trigger name? You are right, that's what I meant to say. -Kevin
Re: [HACKERS] Command Triggers, patch v11
Kevin Grittner kevin.gritt...@wicourts.gov writes:

Right. What I thought I was agreeing with was the notion that you should need to specify more than the trigger name to drop the trigger. Rather like how you can create a trigger AFTER INSERT OR UPDATE OR DELETE, but you don't need to specify all those events to drop the trigger -- just the name will do.

The parallel between INSERT/UPDATE/DELETE and the trigger's command is not working well enough, because in the data trigger case we're managing a single catalog entry with a single command, while in the command trigger case, in my model at least, we would be managing several catalog entries per command. To take an example:

CREATE COMMAND TRIGGER foo AFTER create table, create view;
DROP COMMAND TRIGGER foo;

The first command would create two catalog entries, and the second one would delete the same two entries. It used to work this way in the patch; then, when merging with the new remove-object infrastructure, I lost that ability. From the beginning Robert has been saying he didn't want that behavior, and Tom is now saying the same, IIUC. So we're back to one command, one catalog entry.

Regards,
-- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] review: CHECK FUNCTION statement
Excerpts from Petr Jelínek's message of Sat Mar 03 10:26:04 -0300 2012:

On 03/03/2012 02:24 AM, Alvaro Herrera wrote: question: how attached are you to the current return format for CHECK FUNCTION? [the current two-row "In function: 'f1()'" output versus a full tabular result with function, lineno, statement, sqlstate, message, detail, hint, level, position and query columns]

Well, if you want a nicely formatted table you can always call the checker function directly. I think the statement returning something that is more human and less machine is a more consistent approach with the rest of the utility commands. In other words I don't really see the point of it.

I am not against having some more human-readable output than plain tabular. In particular, the idea that we need to have all fields is of course open to discussion. But is the output as proposed above really all that human-friendly? I disagree that it cannot be improved. BTW, one thing that's missing in this feature so far is some translatability of the returned output.

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] COPY with hints, rebirth
On Sat, Mar 3, 2012 at 1:14 PM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, Mar 2, 2012 at 8:58 PM, Noah Misch n...@leadboat.com wrote: On Fri, Mar 02, 2012 at 08:46:45AM +, Simon Riggs wrote: On Thu, Mar 1, 2012 at 8:49 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: It's still broken: [BEGIN;TRUNCATE;SAVEPOINT;COPY;ROLLBACK TO] So this approach isn't the one... The COPY FREEZE patch provides a way for the user to say explicitly that they don't really care about these MVCC corner cases and as a result allows us to avoid touching XidInMVCCSnapshot() at all. So there is still a patch on the table. You can salvage the optimization by tightening its prerequisite: use it when the current subtransaction or a child thereof created or truncated the table. A parent subtransaction having done so is acceptable for the WAL avoidance optimization but not for this. I misread your earlier comment. Yes, that will make this work correctly. Incidentally, I contend that we should write frozen tuples to new/truncated tables unconditionally. The current behavior of making old snapshots see the table as empty violates atomicity at least as badly as letting those snapshots see the future-snapshot contents. But Marti has a sound proposal that would interact with your efforts here to avoid violating atomicity at all: http://archives.postgresql.org/message-id/cabrt9rbrmdsoz8kxgehfb4lg-ev9u67-6dlqvoiibpkkhtl...@mail.gmail.com Thank you for bringing this to my attention. This will make this optimisation work correctly without adding anything to the main code path of XidInMVCCSnapshot() and without the annoying FREEZE syntax. So this puts the patch squarely back on the table. I'll do another version of this later today designed to work with the StrictMVCC patch. OK, so latest version of patch handling the subtransaction issues correctly. Thanks to Noah and Heikki for identifying them and hitting me with the clue stick. 
So this version of the patch has a negligible effect on mainline performance, though on its own it leaves newly loaded data visible in ways that would break MVCC. So this patch relies on the safe_truncate.v2.patch posted on a separate thread.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c910863..b891327 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2050,12 +2050,27 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 	tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
 	tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
 	tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
-	HeapTupleHeaderSetXmin(tup->t_data, xid);
 	HeapTupleHeaderSetCmin(tup->t_data, cid);
 	HeapTupleHeaderSetXmax(tup->t_data, 0);	/* for cleanliness */
 	tup->t_tableOid = RelationGetRelid(relation);
 
 	/*
+	 * If we are inserting into a new relation invisible as yet to other
+	 * backends and our session has no prior snapshots and no ready portals
+	 * then we can also set the hint bit for the rows we are inserting. The
+	 * last two restrictions ensure that HeapTupleSatisfiesMVCC() gives
+	 * the right answer if the current transaction inspects the data after
+	 * we load it.
+	 */
+	if (options & HEAP_INSERT_HINT_XMIN)
+	{
+		tup->t_data->t_infomask |= HEAP_XMIN_COMMITTED;
+		HeapTupleHeaderSetXmin(tup->t_data, FrozenTransactionId);
+	}
+	else
+		HeapTupleHeaderSetXmin(tup->t_data, xid);
+
+	/*
 	 * If the new tuple is too big for storage or contains already toasted
 	 * out-of-line attributes from some other relation, invoke the toaster.
 	 */
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 110480f..a34f072 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -43,6 +43,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/portal.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
@@ -1922,6 +1923,21 @@ CopyFrom(CopyState cstate)
 			hi_options |= HEAP_INSERT_SKIP_FSM;
 		if (!XLogIsNeeded())
 			hi_options |= HEAP_INSERT_SKIP_WAL;
+
+		if (ThereAreNoPriorRegisteredSnapshots() && ThereAreNoReadyPortals())
+		{
+			SubTransactionId currxid = GetCurrentSubTransactionId();
+
+			/*
+			 * If the relfilenode has been created in this subtransaction
+			 * then we can further optimise the data load by setting hint
+			 * bits and pre-freezing tuples.
+			 */
+			if (cstate->rel->rd_createSubid == currxid ||
+				cstate->rel->rd_newRelfilenodeSubid == currxid)
+				hi_options |= HEAP_INSERT_HINT_XMIN;
+		}
 	}
 
 	/*
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index cfb73c1..24075db 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -1055,3 +1055,22 @@ pg_cursor(PG_FUNCTION_ARGS)
 
 	return (Datum) 0;
 }
+
+bool
Re: [HACKERS] Command Triggers, patch v11
On 3 March 2012 19:25, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Right. What I thought I was agreeing with was the notion that you should need to specify more than the trigger name to drop the trigger. Rather like how you can create a trigger AFTER INSERT OR UPDATE OR DELETE, but you don't need to specify all those events to drop the trigger -- just the name will do. The parallel between INSERT/UPDATE/DELETE and the trigger's command is not working well enough, because in the data trigger case we're managing a single catalog entry with a single command, and in the command trigger case, in my model at least, we would be managing several catalog entries per command. To take an example: CREATE COMMAND TRIGGER foo AFTER create table, create view; DROP COMMAND TRIGGER foo; The first command would create two catalog entries, and the second one would delete the same two entries. It used to work this way in the patch, then when merging with the new remove object infrastructure I lost that ability. From the beginning Robert has been saying he didn't want that behavior, and Tom is now saying the same, IIUC. So we're back to one command, one catalog entry. That sucks. I'm surprised there's no provision for overriding it on a command-by-command basis. I would suggest an array instead, but that sounds costly from a look-up perspective. -- Thom
[HACKERS] Patch: improve selectivity estimation for IN/NOT IN
Hello, the attached patch improves the array selectivity estimation for = ANY and <> ALL, hence for the IN/NOT IN operators, to avoid the shortcoming described in http://archives.postgresql.org/pgsql-performance/2012-03/msg6.php.

Cheers,
-- Daniele

From d10e60dd3dec340b96b372792e4a0650ec10da92 Mon Sep 17 00:00:00 2001
From: Daniele Varrazzo daniele.varra...@gmail.com
Date: Sat, 3 Mar 2012 19:27:16 +
Subject: [PATCH] Improve array selectivity estimation for = ANY and <> ALL

Assume distinct array elements, hence disjoint probabilities instead of
independent. Using the wrong probability model can easily lead to serious
selectivity overestimation for the IN operator and underestimation for
NOT IN.
---
 src/backend/utils/adt/selfuncs.c |   37 +++++++++++++++++++++++++++-----
 1 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 0a685aa..b0816ca 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -1797,10 +1797,29 @@ scalararraysel(PlannerInfo *root,
 										  ObjectIdGetDatum(operator),
 										  PointerGetDatum(args),
 										  Int32GetDatum(varRelid)));
+			/*
+			 * For generic operators, assume the selection probabilities as
+			 * independent for each array element. But for the equality the
+			 * probabilities are disjoint, so the total probability is just
+			 * the sum of the probabilities of the single elements. This is
+			 * true only if the array doesn't contain dups, but the check is
+			 * expensive: just assume it, to avoid penalizing well-written
+			 * queries in favour of poorly-written ones.
+			 */
 			if (useOr)
-				s1 = s1 + s2 - s1 * s2;
+			{
+				if (oprselproc.fn_addr == eqsel)
+					s1 = s1 + s2;
+				else
+					s1 = s1 + s2 - s1 * s2;
+			}
 			else
-				s1 = s1 * s2;
+			{
+				if (oprselproc.fn_addr == neqsel)
+					s1 = s1 + s2 - 1.0;
+				else
+					s1 = s1 * s2;
+			}
 		}
 	}
 	else if (rightop && IsA(rightop, ArrayExpr) &&
@@ -1840,9 +1859,19 @@ scalararraysel(PlannerInfo *root,
 										  PointerGetDatum(args),
 										  Int32GetDatum(varRelid)));
 			if (useOr)
-				s1 = s1 + s2 - s1 * s2;
+			{
+				if (oprselproc.fn_addr == eqsel)
+					s1 = s1 + s2;
+				else
+					s1 = s1 + s2 - s1 * s2;
+			}
 			else
-				s1 = s1 * s2;
+			{
+				if (oprselproc.fn_addr == neqsel)
+					s1 = s1 + s2 - 1.0;
+				else
+					s1 = s1 * s2;
+			}
 		}
 	}
 	else
-- 
1.7.5.4
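The model change in the patch above can be illustrated standalone. The following is an illustrative sketch in Python, not the selfuncs.c code: it compares the independent-events combination the old code used with the disjoint sum the patch applies when the operator's estimator is eqsel.

```python
# Illustrative sketch only -- not the selfuncs.c implementation.
# Combine per-element selectivities for "col = ANY (array)" under the
# two probability models discussed in the patch comment.

def or_independent(s, elem_sel):
    # Generic operators: treat element matches as independent events.
    return s + elem_sel - s * elem_sel

def or_disjoint(s, elem_sel):
    # Equality over distinct elements: matches are mutually exclusive,
    # so probabilities simply add.
    return s + elem_sel

s_indep = 0.0
s_disj = 0.0
for _ in range(10):        # 10 distinct values, each matching 1% of rows
    s_indep = or_independent(s_indep, 0.01)
    s_disj = or_disjoint(s_disj, 0.01)

# For distinct elements the disjoint sum is the exact union probability
# (0.10 here); the independent model drifts away as the IN list grows.
print(s_indep, s_disj)
```

The gap between the two models widens with longer IN lists and larger per-element selectivities, which is why the patch special-cases eqsel and neqsel rather than changing the generic combination.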
Re: [HACKERS] review: CHECK FUNCTION statement
Excerpts from Pavel Stehule's message of Sat Mar 03 02:25:52 -0300 2012:

Hello

It wasn't all that difficult -- see below. While at this, I have a question: how attached are you to the current return format for CHECK FUNCTION?

TupleDescr is created by the language creator. This ensures exactly the expected format, because it is not possible to register a check function with a different output tuple descriptor.

I'm not sure what you're saying. What TupDesc are you talking about? The tupdesc returned by the checker is certainly hardcoded by the core support; the language creator cannot deviate from it.

check function f1();
 function | lineno | statement  | sqlstate |              message               | detail | hint | level | position | query 
----------+--------+------------+----------+------------------------------------+--------+------+-------+----------+-------
 f1()     |      5 | assignment | 42804    | subscripted object is not an array |        |      | error |          | 
(1 row)

This looks much nicer to me.

I strongly disagree.
1. This format is not consistent with other commands.
2. This format is difficult for copy/paste.
3. THERE IS NO CARET - this is really important.
4. This form is bad for terminals - the rows are long, and with \x output there is a lot of garbage for multicommands.
5. If you want this form, you can call the SRF PL check functions directly.

I am not sure that consistency is the most important thing here; I think what we care about is that it's usable. So yeah, it might be hard to cut and paste, and also too wide. Maybe we can run some of the fields together, and omit others. I am not sure about the caret thingy -- mainly because I don't think it works all that well. I don't know how psql does it, but I notice that it shows a single line in a multiline query -- so it's not just a matter of adding some number of spaces.

Given the negative feedback, I'm going to leave this output format unchanged; we can tweak it later.

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] COPY with hints, rebirth
On Fri, Mar 2, 2012 at 11:15 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Simon Riggs si...@2ndquadrant.com wrote: I like Marti's idea. At present, making his idea work could easily mean checksums sink, so not sure whether to attempt to make that work in detail. For my part, improving bulk load performance and TRUNCATE transactional semantics would trump checksums. If we have to defer one or the other to a later release, I would prefer that we bump checksums. While I understand the desire for checksums, and understand others have had situations where they would have helped, so far I have yet to run into a situation where I feel they would have helped me. The hint bit and freeze issues require ongoing attention and have real performance consequences on a regular basis. And closing the window for odd transactional semantics on TRUNCATE versus DELETE of all rows in a table would be one less thing to worry about, in my world. Since you supported this, I've invested the time to make it work. It doesn't look like it needs to be either-or. Please review the safe_truncate.v2.patch and copy_autofrozen.v359.patch, copied here to assist testing and inspection. At present those patches handle only the TRUNCATE/COPY optimisation but we could easily add CTAS, CREATE/COPY, CLUSTER, VACUUM FULL, etc.
-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index c910863..b891327 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2050,12 +2050,27 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
 	tup->t_data->t_infomask &= ~(HEAP_XACT_MASK);
 	tup->t_data->t_infomask2 &= ~(HEAP2_XACT_MASK);
 	tup->t_data->t_infomask |= HEAP_XMAX_INVALID;
-	HeapTupleHeaderSetXmin(tup->t_data, xid);
 	HeapTupleHeaderSetCmin(tup->t_data, cid);
 	HeapTupleHeaderSetXmax(tup->t_data, 0);	/* for cleanliness */
 	tup->t_tableOid = RelationGetRelid(relation);
 
 	/*
+	 * If we are inserting into a new relation invisible as yet to other
+	 * backends and our session has no prior snapshots and no ready portals
+	 * then we can also set the hint bit for the rows we are inserting. The
+	 * last two restrictions ensure that HeapTupleSatisfiesMVCC() gives
+	 * the right answer if the current transaction inspects the data after
+	 * we load it.
+	 */
+	if (options & HEAP_INSERT_HINT_XMIN)
+	{
+		tup->t_data->t_infomask |= HEAP_XMIN_COMMITTED;
+		HeapTupleHeaderSetXmin(tup->t_data, FrozenTransactionId);
+	}
+	else
+		HeapTupleHeaderSetXmin(tup->t_data, xid);
+
+	/*
 	 * If the new tuple is too big for storage or contains already toasted
 	 * out-of-line attributes from some other relation, invoke the toaster.
 	 */
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 110480f..a34f072 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -43,6 +43,7 @@
 #include "utils/builtins.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
+#include "utils/portal.h"
 #include "utils/rel.h"
 #include "utils/snapmgr.h"
@@ -1922,6 +1923,21 @@ CopyFrom(CopyState cstate)
 			hi_options |= HEAP_INSERT_SKIP_FSM;
 		if (!XLogIsNeeded())
 			hi_options |= HEAP_INSERT_SKIP_WAL;
+
+		if (ThereAreNoPriorRegisteredSnapshots() && ThereAreNoReadyPortals())
+		{
+			SubTransactionId currxid = GetCurrentSubTransactionId();
+
+			/*
+			 * If the relfilenode has been created in this subtransaction
+			 * then we can further optimise the data load by setting hint
+			 * bits and pre-freezing tuples.
+			 */
+			if (cstate->rel->rd_createSubid == currxid ||
+				cstate->rel->rd_newRelfilenodeSubid == currxid)
+				hi_options |= HEAP_INSERT_HINT_XMIN;
+		}
 	}
 
 	/*
diff --git a/src/backend/utils/mmgr/portalmem.c b/src/backend/utils/mmgr/portalmem.c
index cfb73c1..24075db 100644
--- a/src/backend/utils/mmgr/portalmem.c
+++ b/src/backend/utils/mmgr/portalmem.c
@@ -1055,3 +1055,22 @@ pg_cursor(PG_FUNCTION_ARGS)
 
 	return (Datum) 0;
 }
+
+bool
+ThereAreNoReadyPortals(void)
+{
+	HASH_SEQ_STATUS status;
+	PortalHashEnt *hentry;
+
+	hash_seq_init(&status, PortalHashTable);
+
+	while ((hentry = (PortalHashEnt *) hash_seq_search(&status)) != NULL)
+	{
+		Portal		portal = hentry->portal;
+
+		if (portal->status == PORTAL_READY)
+			return false;
+	}
+
+	return true;
+}
diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 5aebbd1..5d9e3bf 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -1183,3 +1183,12 @@ DeleteAllExportedSnapshotFiles(void)
 
 	FreeDir(s_dir);
 }
+
+bool
+ThereAreNoPriorRegisteredSnapshots(void)
+{
+	if (RegisteredSnapshots <= 1)
+		return true;
+
+	return false;
+}
diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h
index
fa38803..0381785 100644
--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -26,6
[HACKERS] tuplesort memory usage: grow_memtuples
When sorting small tuples, the memtuples array can use a substantial fraction of the total per-tuple memory used. (In the pass-by-value case, it is all of it.) The way it grows leads to sub-optimal memory usage. In one case, if it can grow by 1.999x, but not by 2x, we just give up and use half the memory we could have. This is what actually happens in pass-by-value sorting when using a power of two for *work_mem. While I understand we don't want to thrash memory, it seems like we could at least grow by less than 2x just one last time. Also, if there is just barely room to double memtuples (and it is pass-by-reference sorting) then the number of slots is doubled but no room is left for the tuples themselves to be stored, so most of those slots can't be used. A solution to both would be to detect when we are about to do one of the two things above, and instead use the historical tuple size to compute how much growth we can do so that the memtuples slots and the cumulative palloc heap of tuples will exhaust at the same time. If availMem is less than half of allowedMem, then the next doubling of memtuples would either fail, or would succeed but leave behind too little allowedMem to hold the full complement of new tuples. So either way, extrapolate from current usage. The patch assumes that nothing other than memtuples and individual tuple storage has been deducted from availMem. Currently that is true, but it may not always be so (we recently discussed pre-deducting the tape buffer overhead, so it doesn't cause a memory overrun). That said, I am not currently able to prove to myself that there are no conditions under which this change could cause LACKMEM(state) to become true. This patch gives about a 2-fold window over which a sort that would otherwise go to tape sort will instead be done in memory, which is 2-3 times faster. That might not be very impressive, because after all if you wanted to use more memory you could have just increased work_mem.
But it still seems worthwhile to use the memory we are told to use. It may be hard to fine-tune the *work_mem settings, but not using the amount we are given surely doesn't make it any easier. But other than the tape vs. in-memory improvement, this also gives other improvements. For example, this gives me about a 35% speed up when using default work_mem to do a select count(distinct k) from t, where k is a random integer and t has 512 million rows. Granted, it is silly to intentionally do such a large sort with such small memory. But it still seems worth improving, and the wins should be there at other sizes as well, though probably not as big. In my test case, the improvement mostly comes from turning random IO reads into sequential ones. The prevalence of random IO is due to a cascade of issues:
1) As discussed above, we are using only half the memory we could be.
2) Due to MINORDER we are spreading that reduced memory over twice as many tapes as MERGE_BUFFER_SIZE would suggest. Changing this would obviously have a trade-off.
3) MERGE_BUFFER_SIZE itself describes the in-RAM footprint, not the on-tape footprint, because we unpack tuples as we read them from tape. Perhaps mergeprereadone could stash the preread data unpacked and unpack it only as needed. But that is a much more invasive change.
4) Pre-reading all tapes whenever just one becomes empty causes blocks to be freed, and thus re-used, in smaller contiguous chunks than they otherwise would. (Easy to change, but there is a trade-off to changing it.)
5) Even without 4, logtape.c still makes a jumble of the free list on multi-level merges. I'm not entirely sure why yet--I think maybe it is the way indirect blocks are handled.
Add it all up, and instead of pre-reading 32 consecutive 8K blocks, it pre-reads only about 1 or 2 consecutive ones on the final merge.
Now some of those could be salvaged by the kernel keeping track of multiple interleaved read ahead opportunities, but in my hands vmstat shows a lot of IO wait and shows reads that seem to be closer to random IO than large read-ahead. If it used truly efficient read ahead, CPU would probably be limiting. I'll add this to the open commit fest. When doing performance testing, I find it far easier to alter a guc.c parameter than to keep multiple compiled binaries around. Would people like a testing patch that does the same thing as this, or does the original behavior, under control of a parameter setting? Cheers, Jeff sortmem_grow-v1.patch Description: Binary data
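Jeff's proposed growth rule above — extrapolate from the historical average tuple size so that the slot array and the tuple storage exhaust the memory budget together, rather than blindly doubling — can be sketched as follows. This is a simplified model with assumed numbers, not the tuplesort.c patch; SLOT_SIZE and the function name are hypothetical stand-ins.

```python
# Simplified sketch of the proposed memtuples sizing rule; the constant
# and function names are hypothetical, not from tuplesort.c.

SLOT_SIZE = 24   # assumed per-slot cost of a memtuples entry, in bytes

def new_slot_count(memtupcount, mem_used_by_tuples, allowed_mem):
    # All memtupcount slots are full; use the average tuple size seen so
    # far to pick the slot count at which the slot array and the tuple
    # storage exhaust the memory budget at the same time.
    avg_tuple = mem_used_by_tuples / memtupcount
    per_tuple = SLOT_SIZE + avg_tuple       # slot plus payload cost
    return int(allowed_mem // per_tuple)

# 1 MB budget; 10,000 tuples seen so far occupying 400 kB of payload:
# each tuple effectively costs 24 + 40 = 64 bytes, so 15625 fit.
print(new_slot_count(10_000, 400_000, 1_000_000))
```

A doubling strategy starting from a full 10,000-slot array would either stop there (20,000 slots plus their tuples would overrun the budget) or overshoot; the extrapolation lands in between, using the budget without exceeding it.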
Re: [HACKERS] COPY with hints, rebirth
Simon Riggs wrote: Kevin Grittner wrote: Simon Riggs wrote: I like Marti's idea. At present, making his idea work could easily mean checksums sink For my part, improving bulk load performance and TRUNCATE transactional semantics would trump checksums It doesn't look like it needs to be either-or. Great news! Please review the safe_truncate.v2.patch and copy_autofrozen.v359.patch, copied here to assist testing and inspection. I'll look at it today. At present those patches handle only the TRUNCATE/COPY optimisation but we could easily add CTAS, CREATE/COPY, CLUSTER, VACUUM FULL, etc. CREATE/COPY would be important so that pg_dump | psql -1 would benefit. -Kevin
[HACKERS] poll: CHECK TRIGGER?
Hi, Pavel's patch for CHECK FUNCTION is adding another command besides that one, which is CHECK TRIGGER. The idea behind this is that you give it the relation to which the trigger is attached in addition to the trigger name, and it checks the function being called by that trigger. IMHO having a separate command for this is not warranted. It seems to me that we could simply have a variant of CREATE FUNCTION for this; I proposed CREATE FUNCTION trigfunc() AS TRIGGER ON tabname. Besides, it's really the function code being checked, not the trigger itself; the trigger is only providing the tuple descriptor for NEW and OLD. Pavel didn't like this idea; also, a quick poll on twitter elicited only two answers: Dimitri Fontaine prefers CHECK TRIGGER, and Guillaume Lelarge prefers CHECK FUNCTION. So I still don't know which route to go with this. Thoughts? One thing to consider is eventual support for triggers that use anonymous function blocks, without a previous CREATE FUNCTION, which is being discussed in another thread. Another point is that CHECK TRIGGER requires a separate documentation page. -- Álvaro Herrera alvhe...@alvh.no-ip.org
Re: [HACKERS] review: CHECK FUNCTION statement
Excerpts from Alvaro Herrera's message of Sat Mar 03 16:54:19 -0300 2012: Excerpts from Pavel Stehule's message of Sat Mar 03 02:25:52 -0300 2012: 3. THERE IS NO CARET - this is really important I am not sure about the caret thingy -- mainly because I don't think it works all that well. I don't know how psql does it, but I notice that it shows a single line in a multiline query -- so it's not just a matter of adding some number of spaces. I checked how this works in psql. It is upwards of 200 lines of code -- see reportErrorPosition in libpq/fe-protocol3.c. I'm not sure this can be made to work sensibly here. -- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] Memory usage during sorting
On Sun, Feb 26, 2012 at 7:20 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Feb 25, 2012 at 4:31 PM, Jeff Janes jeff.ja...@gmail.com wrote: I'm not sure about the conclusion, but given this discussion, I'm inclined to mark this Returned with Feedback. OK, thanks. Does anyone have additional feedback on how tightly we wish to manage memory usage? Is trying to make us use as much memory as we are allowed to without going over a worthwhile endeavor at all, or is it just academic nitpicking? I'm not sure, either. It strikes me that, in general, it's hard to avoid a little bit of work_mem overrun, since we can't know whether the next tuple will fit until we've read it, and then if it turns out to be big, well, the best thing we can do is free it, but perhaps that's closing the barn door after the horse has gotten out already. That type of overrun doesn't bother me much, because the size of the next tuple someone else feeds us is mostly outside of this module's control. And also because the memory that is overrun should be reusable once the offending tuple is written out to tape. The type of overrun I'm more concerned with is that which is purely under this module's control, and which is then not re-used productively. The better solution would be to reduce the overhead in the first place. While building the initial runs, there is no reason to have 3 blocks' worth of overhead for each tape, when only one tape is ever being used at a time. But that change seems much tougher to implement. Having recently spent quite a bit of time looking at tuplesort.c as a result of Peter Geoghegan's work and some other concerns, I'm inclined to think that it needs more than minor surgery.
That file is peppered with numerous references to Knuth which serve the dual function of impressing the reader with the idea that the code must be very good (since Knuth is a smart guy) and rendering it almost completely impenetrable (since the design is justified by reference to a textbook most of us probably do not have copies of). Yes, I agree with that analysis. And getting the volume I want by inter-library loan has been challenging--I keep getting the volume before or after the one I want. Maybe Knuth starts counting volumes at 0 and libraries start counting at 1 :) Anyway, I think the logtape could use redoing. When your tapes are actually physical tape drives, it is necessary to build up runs one after the other on physical tapes, because un-mounting a tape from a tape drive and remounting another tape is not very feasible at scale. Which then means you need to place your runs carefully, because if the final merge finds that two runs it needs are back-to-back on one tape, that is bad. But with disks pretending to be tapes, you could re-arrange the runs with just some bookkeeping. Maintaining the distinction between tapes and runs is pointless, which means the Fibonacci placement algorithm is pointless as well. A quick Google search for external sorting algorithms suggests that the typical way of doing an external sort is to read data until you fill your in-memory buffer, quicksort it, and dump it out as a run. Repeat until end-of-data; then merge the runs (either in a single pass, or if there are too many, in multiple passes). I'm not sure whether that would be better than what we're doing now, but there seem to be enough other people doing it that we might want to try it out. Our current algorithm is to build a heap, bounded in size by work_mem, and dribble tuples in and out, but maintaining that heap is pretty expensive; there's a reason people use quicksort rather than heapsort for in-memory sorting.
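The scheme described above — fill the buffer, quicksort it, dump it as a run, then merge the runs — can be sketched in a few lines. This is an in-memory simulation (Python's sorted() and heapq.merge stand in for quicksorting a work_mem-sized buffer and the final merge pass); it is not the tuplesort.c algorithm.

```python
import heapq

def external_sort(data, work_mem_tuples):
    # Phase 1: cut the input into work_mem-sized chunks, sort each chunk
    # ("quicksort the buffer"), and keep it as a run ("write it to tape").
    runs = []
    for start in range(0, len(data), work_mem_tuples):
        runs.append(sorted(data[start:start + work_mem_tuples]))
    # Phase 2: a single k-way merge of all the runs (the final merge pass).
    return list(heapq.merge(*runs))

print(external_sort([5, 3, 8, 1, 9, 2, 7, 4, 6, 0], 3))
```

With a buffer of M tuples and N input tuples this produces about N/M runs, whereas replacement selection's heap emits runs averaging about 2M tuples on random input — the runs-count trade-off weighed elsewhere in the thread.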
But it would mean we have about 1.7x more runs that need to be merged (for initially random data). Considering the minimum merge order is 6, that increase in runs is likely not to lead to an additional level of merging, in which case the extra speed of building the runs would definitely win. But if it does cause an additional level of merge, it could end up being a loss. Is there some broad corpus of sorting benchmarks which changes could be tested against? I usually end up testing just simple columns of integers or small strings, because they are easy to set up. That is not ideal. As a desirable side effect, I think it would mean that we could dispense with retail palloc and pfree altogether. We could just allocate a big chunk of memory, copy tuples into it until it's full, using a pointer to keep track of the next unused byte, and then, after writing the run, reset the allocation pointer back to the beginning of the buffer. That would not only avoid the cost of going through palloc/pfree, but also the memory overhead imposed by bookkeeping and power-of-two rounding. Wouldn't we still need an array of pointers to the start of every tuple's location in the buffer? Or else, how would qsort know where to find them? Also, to do this we would need to get around
Re: [HACKERS] review: CHECK FUNCTION statement
Excerpts from Alvaro Herrera's message of Sat Mar 03 17:56:23 -0300 2012:

Excerpts from Alvaro Herrera's message of Sat Mar 03 16:54:19 -0300 2012: Excerpts from Pavel Stehule's message of Sat Mar 03 02:25:52 -0300 2012: 3. THERE IS NO CARET - this is really important I am not sure about the caret thingy -- mainly because I don't think it works all that well.

It doesn't work correctly with your patch; see sample below. Note the caret is pointing to an entirely nonsensical position. I'm not sure about duplicating the libpq line-counting logic in the backend. Also note i18n seems to be working well, except for the "In function" header, query, and the error level. That seems easily fixable. I remain unconvinced that this is the best possible output.

alvherre=# create function f() returns int language plpgsql as $$
begin
select var
from
foo;
end;
$$;
CREATE FUNCTION
alvherre=# check function f();
                     CHECK FUNCTION
---------------------------------------------------------
 In function: 'f()'
 error:42P01:2:sentencia SQL:no existe la relación «foo»
 query:select +
 var          +
 from         +
 foo
     ^
(4 filas)

-- Álvaro Herrera alvhe...@commandprompt.com The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] archive_keepalive_command
On Sun, Jan 15, 2012 at 5:52 PM, Simon Riggs si...@2ndquadrant.com wrote: On Fri, Dec 16, 2011 at 3:01 PM, Simon Riggs si...@2ndquadrant.com wrote: archive_command and restore_command describe how to ship WAL files to/from an archive. When there is nothing to ship, we delay sending WAL files. When there are no WAL files, the standby has no information at all. To provide some form of keepalive on quiet systems, the archive_keepalive_command provides a generic hook to implement keepalives. This is implemented as a separate command to avoid storing keepalive messages in the archive, or at least allow overwrites using a single filename like keepalive. Patch. Preliminary review: Applies with several hunks, and with some fuzz in xlog.h. Builds cleanly and passes make check. Does not provide documentation, which is needed. Does not include regression tests, but there is no framework for testing archiving. Usability testing: Does this patch have any user-visible effect? I thought it would make pg_last_xact_replay_timestamp() advance, but it does not seem to. I looked through the source a bit, and as best I can tell this only sets some internal state which is never used, except under DEBUG2. The example archive_keepalive_command given in postgresql.conf.sample is not usable as given. If the file is named %f, then there is no easy way for restore_keepalive_command to retrieve the file because it would not know the name to use. So the example given in postgresql.conf.sample should be more like the one given in recovery.conf.sample, where it uses a hard-coded name rather than %f. But in that case, it is not clear what %f might be useful for. Cheers, Jeff
Re: [HACKERS] Collect frequency statistics for arrays
Alexander Korotkov aekorot...@gmail.com writes: [ array statistics patch ]

I've committed this after a fair amount of editorialization. There are still some loose ends to deal with, but I felt it was ready to go into the tree for wider testing.

The main thing I changed that wasn't in the nature of cleanup/bugfixing was that I revised the effort-limiting logic in mcelem_array_contained_selec. The submitted code basically just punted if the estimated work was too much, but as was already noted in http://archives.postgresql.org/pgsql-hackers/2011-10/msg01349.php that can result in really bad estimates. What I did instead is something closer to Robert's original suggestion: trim the number of element values taken into consideration from the array constant to a value that fits within the desired effort limit. If we consider just the N most common values from the array constant, we still get a pretty good estimate (since the trimmed N will still be close to 100 for the values we're talking about). I redid the tests in the above-mentioned message and see no cases where the estimate is off by more than a factor of 2, and very few where it's off by more than 20%, so this seems to work pretty well now.

The remaining loose ends IMO are:

1. I'm still unhappy about the loop that fills the count histogram, as I noted earlier today. It at least needs a decent comment and some overflow protection, and I'm not entirely convinced that it doesn't have more bugs than the overflow issue.

2. The tests in the above-mentioned message show that in most cases where mcelem_array_contained_selec falls through to the rough estimate, the resulting rowcount estimate is just 1, ie we are coming out with very small selectivities. Although that path will now only be taken when there are no stats, it seems like we'd be better off to return DEFAULT_CONTAIN_SEL instead of what it's doing. I think there must be something wrong with the rough estimate logic. Could you recheck that?

3. As I mentioned yesterday, I think it'd be a good idea to make some provisions to reduce the width of pg_statistic rows for array columns by not storing the scalar-style and/or array-style stats, if the DBA knows that they're not going to be useful for a particular column. I have not done anything about that.

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] poll: CHECK TRIGGER?
Alvaro Herrera alvhe...@alvh.no-ip.org writes:
> Pavel's patch for CHECK FUNCTION is adding another command besides that one, which is CHECK TRIGGER. The idea behind this is that you give it the relation to which the trigger is attached in addition to the trigger name, and it checks the function being called by that trigger. IMHO having a separate command for this is not warranted. It seems to me that we could simply have a variant of CREATE FUNCTION for this; I proposed CREATE FUNCTION trigfunc() AS TRIGGER ON tabname.

You mean CHECK FUNCTION ... right?

In principle the CHECK TRIGGER command could apply more checks than are possible with the proposed CHECK FUNCTION syntax: in particular, AFAICS AS TRIGGER ON tabname doesn't provide enough info to know whether the function should expect new and/or old rows to be provided, nor what it ought to return (which is different for BEFORE/AFTER cases, STATEMENT cases, etc). We could add all that info to the CHECK FUNCTION syntax, but there's definitely some merit to defining the check as occurring against an existing trigger definition instead. Now, if there's no intention of ever making the code actually apply checks of the sort I'm thinking of, maybe CHECK FUNCTION AS TRIGGER is fine.

> One thing to consider is eventual support for triggers that use anonymous function blocks, without a previous CREATE FUNCTION, which is being discussed in another thread.

Yeah, that angle seems to be another reason to support CHECK TRIGGER.

			regards, tom lane
Re: [HACKERS] poll: CHECK TRIGGER?
Excerpts from Tom Lane's message of Sat Mar 03 23:00:26 -0300 2012:
> Alvaro Herrera alvhe...@alvh.no-ip.org writes:
>> Pavel's patch for CHECK FUNCTION is adding another command besides that one, which is CHECK TRIGGER. The idea behind this is that you give it the relation to which the trigger is attached in addition to the trigger name, and it checks the function being called by that trigger. IMHO having a separate command for this is not warranted. It seems to me that we could simply have a variant of CREATE FUNCTION for this; I proposed CREATE FUNCTION trigfunc() AS TRIGGER ON tabname.
> You mean CHECK FUNCTION ... right?

Yeah, sorry.

> In principle the CHECK TRIGGER command could apply more checks than are possible with the proposed CHECK FUNCTION syntax: in particular, AFAICS AS TRIGGER ON tabname doesn't provide enough info to know whether the function should expect new and/or old rows to be provided, nor what it ought to return (which is different for BEFORE/AFTER cases, STATEMENT cases, etc). We could add all that info to the CHECK FUNCTION syntax, but there's definitely some merit to defining the check as occurring against an existing trigger definition instead.

Uh! Now that I read this I realize that what you're supposed to give to CHECK TRIGGER is the trigger name, not the function name! In that light, using CHECK FUNCTION for this doesn't make a lot of sense. Okay, CHECK TRIGGER it is.

--
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Re: [HACKERS] RFC: Making TRUNCATE more MVCC-safe
On Sat, Mar 3, 2012 at 14:53, Simon Riggs si...@2ndquadrant.com wrote:
> Thanks Noah for drawing attention to this thread. I hadn't been watching. As you say, this work would allow me to freeze rows at load time and avoid the overhead of hint bit setting, which avoids performance issues from hint bit setting in checksum patch.
> I've reviewed this patch and it seems OK to me. Good work Marti.

Thanks! This approach wasn't universally liked, but if it gets us tangible benefits (COPY with frozenxid) then I guess it's a reason to reconsider.

> v2 patch attached, updated to latest HEAD. Patch adds
> * relvalidxid is reset by VACUUM/ANALYZE to avoid wraparound failure

Personally I'd rather keep this out of ANALYZE, since its purpose is to collect stats; VACUUM is responsible for correctness (xid wraparound etc). But I don't feel strongly about this.

A more important consideration is how this interacts with hot standby. Currently you compare OldestXmin to relvalidxmin to decide when to reset it. But the standby's OldestXmin may be older than the master's. (If VACUUM removes rows then this causes a recovery conflict, but AFAICT that won't happen if only relvalidxmin changes.) It might be more robust to wait until relfrozenxid exceeds relvalidxmin -- by then, recovery conflict mechanisms will have taken care of killing all older snapshots, or am I mistaken?

And a few typos in the code...

+ gettext_noop(When enabled viewing a truncated or newly created table
+ will throw a serialization error to prevent MVCC
+ discrepancies that were possible priot to 9.2.)

prior, not priot.

+ * Reset relvalidxmin if its old enough

Should be it's in this context.

Regards,
Marti
Re: [HACKERS] Patch: improve selectivity estimation for IN/NOT IN
Daniele Varrazzo daniele.varra...@gmail.com writes:
> the attached patch improves the array selectivity estimation for = ANY and ALL, hence for the IN/NOT IN operators, to avoid the shortcoming described in http://archives.postgresql.org/pgsql-performance/2012-03/msg6.php.

In connection with Alexander Korotkov's array-estimation patch, I just committed some code into scalararraysel() that checks whether the operator is equality or inequality of the array element type. It does that by consulting the default btree or hash opclass for the element type. I did that with the thought that it could be used to attack this issue too, but I see that you've done it another way, ie check to see whether the operator uses eqsel() or neqsel() as selectivity estimator.

I'm not sure offhand which way is better. It could be argued that yours is more appropriate because if the operator isn't btree equality, but acts enough like it to use eqsel() as estimator, then it's still appropriate for scalararraysel() to treat it as equality. On the other side of the coin, an operator might be equality but have reason to use some operator-specific estimator rather than eqsel(). We have real examples of the former (such as the approximate-equality geometric operators) but I think the latter case is just hypothetical.

Another thing that has to be thought about is that there are numerous cross-type operators that use eqsel, such as date-vs-timestamp, and it's far from clear whether it's appropriate for scalararraysel() to use the modified stats calculation when dealing with one of these. The btree-based logic is impervious to that since it won't match any cross-type operator. Thoughts?

(BTW, in any case I don't trust function pointer comparison to be portable. It'd be a lot safer to look at the function OID.)

			regards, tom lane
Re: [HACKERS] Patch: improve selectivity estimation for IN/NOT IN
On Sun, Mar 4, 2012 at 2:38 AM, Tom Lane t...@sss.pgh.pa.us wrote:
> Daniele Varrazzo daniele.varra...@gmail.com writes:
>> the attached patch improves the array selectivity estimation for = ANY and ALL, hence for the IN/NOT IN operators, to avoid the shortcoming described in http://archives.postgresql.org/pgsql-performance/2012-03/msg6.php.
> In connection with Alexander Korotkov's array-estimation patch, I just committed some code into scalararraysel() that checks whether the operator is equality or inequality of the array element type.

It looks like you have grand plans for array estimation. My patch has a much more modest scope, and I'm hoping it could be applied to currently maintained PG versions, as I consider the currently produced estimations a bug.

> It does that by consulting the default btree or hash opclass for the element type. I did that with the thought that it could be used to attack this issue too, but I see that you've done it another way, ie check to see whether the operator uses eqsel() or neqsel() as selectivity estimator. I'm not sure offhand which way is better. It could be argued that yours is more appropriate because if the operator isn't btree equality, but acts enough like it to use eqsel() as estimator, then it's still appropriate for scalararraysel() to treat it as equality. On the other side of the coin, an operator might be equality but have reason to use some operator-specific estimator rather than eqsel(). We have real examples of the former (such as the approximate-equality geometric operators) but I think the latter case is just hypothetical. Another thing that has to be thought about is that there are numerous cross-type operators that use eqsel, such as date-vs-timestamp, and it's far from clear whether it's appropriate for scalararraysel() to use the modified stats calculation when dealing with one of these. The btree-based logic is impervious to that since it won't match any cross-type operator. Thoughts?
> (BTW, in any case I don't trust function pointer comparison to be portable. It'd be a lot safer to look at the function OID.)

My original idea was to compare the comparison function with the catalog: I just don't know how to inspect the catalog/perform the check. Studying how to do it, I noticed the fn_addr referring to the well-known eqsel/neqsel functions and thought about using it as an indicator of the right operator semantics.

If you are still interested in the patch, for the sake of bugfixing, and somebody provides an example in the source of how to compare the function OIDs, I can try to improve it.

-- Daniele
Re: [HACKERS] Parameterized-path cost comparisons need some work
Robert Haas robertmh...@gmail.com writes:
> On Wed, Feb 29, 2012 at 6:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:
>> (... thinks some more ...) No, that doesn't get us there, because that doesn't establish that a more-parameterized path produces fewer rows than some path that requires less parameterization, yet not none at all. You really want add_path carrying out those comparisons. In your previous example, it's entirely possible that path D is dominated by B or C because of poor choices of join quals.
> I'm not following this part. Can you explain further? It seems to me at any rate that we could get pretty far if we could just separate parameterized paths and unparameterized paths into separate buckets.

To try to get some definitive info about this, I instrumented add_path to emit a log message any time it compared two paths for which one's parameterization was a strict subset of the other's, and yet the first was not estimated to return more rows. Sure enough, I got a lot of messages, just by running the regression tests (and even more with some of the other test cases I'd been using for parameterized-path testing). All of the hits were for equal numbers of rows, though -- there were no cases with a rows difference in the opposite direction from expectations.

After looking at the results, I think that the fallacy in what we've been discussing is this: a parameterized path may well have some extra selectivity over a less-parameterized one, but perhaps *not enough to be meaningful*. The cases I was getting hits on were where the rowcount estimate got rounded off to be the same as for the less-parameterized path. (In this connection it's worth noting that most of the hits were for rowcount estimates of only 1 or 2 rows.) So basically, the scenario is where you have restriction clauses that are already enough to get down to a small number of rows retrieved, and then you have some join clauses that are not very selective and don't reduce the rowcount any further. Or maybe you have some nicely selective join clauses, and then adding more joins to some other relations doesn't help any further.

In situations like this, we want add_path to reject the ineffective more-parameterized path as not being an improvement over the less-parameterized path. Not having it do so might save cycles in add_path itself, but we'd be being penny-wise and pound-foolish, because not getting rid of the useless paths will cost us a lot more at the next join level up. So I'm back to thinking we need to look explicitly at the rowcount comparison as well as all the existing conditions in add_path.

One annoying thing about that is that it will reduce the usefulness of add_path_precheck, because that's called before we compute the rowcount estimates (and indeed not having to make the rowcount estimates is one of the major savings from the precheck). I think what we'll have to do is assume that a difference in parameterization could result in a difference in rowcount, and hence only a dominant path with exactly the same parameterization can result in failing the precheck.

			regards, tom lane
[HACKERS] Our regex vs. POSIX on longest match
Hello folks,

I am in the process of accelerating down the rabbit hole of regex internals. Something that came up during my reading is that a POSIX-compliant regex engine ought to always prefer the longest possible match, when multiple matches are possible beginning from the same location in the string. [1]

I wasn't sure that that was how our regex engine worked, and indeed, on checking the manual [2] I found that our regex engine uses a strange sort of inductive greediness to determine whether the longest or the shortest possible match ought to be preferred. The greediness of individual particles in the regex is taken into account, and at the top level the entire expression is concluded to be either greedy or non-greedy.

I'll admit that this is a pretty obscure point, but we do appear to be in direct violation of POSIX here. I am getting the impression that our engine works this way to route around some of the performance issues that can arise in trying out every possible match with an NFA-style engine.

I find it a bit unfortunate that POSIX is so unambiguous about this, while our engine's treatment is, well ... quite arcane by comparison. At minimum, I think we should be more explicit in the manual that this behaviour flips POSIX the proverbial bird. Several paragraphs south, there is a sentence reading "Incompatibilities of note include ... the longest/shortest-match (rather than first-match) matching semantics", but in the context it seems as though the author is talking about an incompatibility with Perl, not with POSIX.

Thoughts?

Cheers,
BJ

[1] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html
[2] http://www.postgresql.org/docs/9.0/static/functions-matching.html#FUNCTIONS-POSIX-REGEXP
Re: [HACKERS] Our regex vs. POSIX on longest match
Brendan Jurd dire...@gmail.com writes:
> I am in the process of accelerating down the rabbit hole of regex internals. Something that came up during my reading, is that a POSIX compliant regex engine ought to always prefer the longest possible match, when multiple matches are possible beginning from the same location in the string. [1] I wasn't sure that that was how our regex engine worked, and indeed, on checking the manual [2] I found that our regex engine uses a strange sort of inductive greediness to determine whether the longest or the shortest possible match ought to be preferred. The greediness of individual particles in the regex are taken into account, and at the top level the entire expression is concluded to be either greedy, or non-greedy. I'll admit that this is a pretty obscure point, but we do appear to be in direct violation of POSIX here.

How so? POSIX doesn't contain any non-greedy constructs. If you use only the POSIX-compatible greedy constructs, the behavior is compliant. The issue that is obscure is, once you define some non-greedy constructs, how to define how they should act in combination with greedy ones.

I'm not sure to what extent the engine's behavior is driven by implementation restrictions and to what extent it's really the sanest behavior Henry could think of. I found a comment from him about it:
http://groups.google.com/group/comp.lang.tcl/msg/c493317cc0d10d50
but it's short on details as to what alternatives he considered.

			regards, tom lane