Re: [HACKERS] Information about Access methods
On Wed, Nov 13, 2013 at 2:42 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 11/13/2013 06:36 AM, Rohit Goyal wrote: Hi, Thanks for the document. It was really helpful. Now, as the index folder basically contains the interface for all index types, do I need to change it if I want to modify only the b-tree index algorithm? Do I need to change only the btree index files, or do I have to change index folder files like indexam.c? Without knowing what you are trying to do, it's really hard to answer that usefully. In general, if you want to modify b-tree indexes you'd only have to modify the b-tree index implementation. If you want to add new capabilities or features to the indexing system in general, then use them in b-tree, you'd probably also have to modify the access method interface. If you aren't just doing this for learning/research you should explain in detail what you are trying to do. There is no point spending lots of time creating a patch that would never get accepted into the PostgreSQL core, leaving you to maintain it forever more. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services Hi, Thanks for the update. Actually, I am doing this for learning. I want to modify the B-tree approach so that insert and update operations perform faster. Could you please suggest something about the update operation of the B-tree index? Regards, Rohit Goyal
Re: [HACKERS] [OT] why not keeping the original column name in catalog after a drop?
On Wed, Nov 13, 2013 at 1:22 PM, Luca Ferrari fluca1...@infinito.it wrote: I'm wondering what is the problem in placing the old/original name after the pg.dropped prefix. I know that the tuple in pg_attribute is temporary, but what are the possible conflicts the comment talks about? Maybe when a column with the same name is added and then dropped again? Of course, we could keep both the attribute number and the column name to avoid conflicts. Thanks, Pavan -- Pavan Deolasee http://www.linkedin.com/in/pavandeolasee
Re: [HACKERS] [OT] why not keeping the original column name in catalog after a drop?
Hi, On 2013-11-13 08:52:27 +0100, Luca Ferrari wrote: when you drop a column on a table the pg_attribute is updated and the name of the column is changed with an almost fixed identifier that reports only the original column position: /* * Change the column name to something that isn't likely to conflict */ snprintf(newattname, sizeof(newattname), "pg.dropped.%d", attnum); namestrcpy(&(attStruct->attname), newattname); I'm wondering what is the problem in placing the old/original name after the pg.dropped prefix. I know that the tuple in pg_attribute is temporary, but what are the possible conflicts the comment talks about? The old name might not fit there; attribute names have a relatively low maximum length (64 by default), so we cannot always fit the entire old name there. Also, think about: CREATE TABLE foo(cola int); ALTER TABLE foo DROP COLUMN cola; ALTER TABLE foo ADD COLUMN cola int; ALTER TABLE foo DROP COLUMN cola; -- should not error out I don't really see much need for anything better than the current solution; why is the old name interesting? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
2013/11/12 Claudio Freire klaussfre...@gmail.com: On Tue, Nov 12, 2013 at 6:41 PM, Nicolas Barbier nicolas.barb...@gmail.com wrote: (Note that K B-trees can be merged by simply scanning all of them concurrently, and merging them just like a merge sort merges runs. Also, all B-trees except for the first level (of size S) can be compacted 100% as there is no need to reserve space for further insertions in them.) Unless you can guarantee strong correlation of index-order vs physical-order, scanning multiple indexes in index-order will be quite slow (random I/O). As all the bigger trees are written in one pass (as the result of a merge of multiple smaller trees), I would assume that it is rather easy to guarantee it for them. As for the smallest trees (size S), I think it doesn’t matter much as they “fit easily in memory.” Initially I would say that redefining it so that K of them (rather than 1) must still fit in memory is the easy fix. A future optimization could alleviate the need for the redefinition (and would also improve normal B-tree indexes): Somehow make sure that smaller trees (that fit in memory) are typically written out more-or-less in the right order. For that, one could for example postpone determining the ultimate block-order to write-out time. This is similar to what Reiser4 calls “dancing trees,” but unfortunately requires some rejiggering of the abstraction layers on PostgreSQL (I think). Having deferred insertion (which is probably way easier to implement) could conceivably also improve things. URL:https://en.wikipedia.org/wiki/Dancing_tree Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad?
Re: [HACKERS] Information about Access methods
On 11/13/2013 08:59 AM, Rohit Goyal wrote: Could you please suggest something about the update operation of the B-tree index? access/nbtree/README is probably the next text to read. It points to theoretical background and also explains specifics of the Postgres implementation. // Antonin Houska (Tony)
Re: [HACKERS] Using indices for UNION.
Thank you for pointing that out. I missed the warning. There is a new compiler warning: setrefs.c: In function ‘set_plan_refs’: setrefs.c:782:7: warning: initialization from incompatible pointer type [enabled by default] Added an explicit cast there and rebased to current master. Checked that this patch produces no new warning. make check succeeded at both $(src) and $(src)/src/test. regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index d8aa35d..86abdf6 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -1063,15 +1063,6 @@ grouping_planner(PlannerInfo *root, double tuple_fraction) List *set_sortclauses; /* - * If there's a top-level ORDER BY, assume we have to fetch all the - * tuples. This might be too simplistic given all the hackery below - * to possibly avoid the sort; but the odds of accurate estimates here - * are pretty low anyway. - */ - if (parse->sortClause) - tuple_fraction = 0.0; - - /* * Construct the plan for set operations. The result will not need * any work except perhaps a top-level sort and/or LIMIT. Note that * any special work for recursive unions is the responsibility of diff --git a/src/backend/optimizer/plan/setrefs.c b/src/backend/optimizer/plan/setrefs.c index b78d727..c6abe18 100644 --- a/src/backend/optimizer/plan/setrefs.c +++ b/src/backend/optimizer/plan/setrefs.c @@ -778,9 +778,26 @@ set_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset) Assert(splan->plan.qual == NIL); foreach(l, splan->appendplans) { - lfirst(l) = set_plan_refs(root, - (Plan *) lfirst(l), - rtoffset); + Append *newp = + (Append *) set_plan_refs(root, + (Plan *) lfirst(l), + rtoffset); + /* + * UNION on inherited tables may create directly nested + * Appends in the plan tree. This structure can be flattened by + * pulling grandchildren up into the parent.
+ */ + if (IsA(newp, Append) && + list_length(newp->appendplans) > 0) + { + ListCell *plc = list_head(newp->appendplans); + lfirst(l) = lfirst(plc); + for_each_cell(plc, lnext(plc)) + l = lappend_cell(splan->appendplans, + l, lfirst(plc)); + } + else + lfirst(l) = newp; } } break; diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index e249628..e8a78a7 100644 --- a/src/backend/optimizer/prep/prepunion.c +++ b/src/backend/optimizer/prep/prepunion.c @@ -29,6 +29,7 @@ #include "postgres.h" #include <limits.h> +#include <math.h> #include "access/heapam.h" #include "access/htup_details.h" @@ -60,6 +61,7 @@ typedef struct static Plan *recurse_set_operations(Node *setOp, PlannerInfo *root, double tuple_fraction, List *colTypes, List *colCollations, + List *groupClauses, bool junkOK, int flag, List *refnames_tlist, List **sortClauses, double *pNumGroups); @@ -78,7 +80,8 @@ static Plan *generate_nonunion_plan(SetOperationStmt *op, PlannerInfo *root, static List *recurse_union_children(Node *setOp, PlannerInfo *root, double tuple_fraction, SetOperationStmt *top_union, - List *refnames_tlist); + List *refnames_tlist, + List **child_sortclause); static Plan *make_union_unique(SetOperationStmt *op, Plan *plan, PlannerInfo *root, double tuple_fraction, List **sortClauses); @@ -97,7 +100,8 @@ static List *generate_append_tlist(List *colTypes, List *colCollations, bool flag, List *input_plans, List *refnames_tlist); -static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist); +static List *generate_setop_grouplist(List *groupClauses, List *targetlist, + List *sortClauses); static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti); static void make_inh_translation_list(Relation oldrelation, @@ -152,6 +156,15 @@ plan_set_operations(PlannerInfo *root, double tuple_fraction, Assert(parse->distinctClause == NIL); /* + * If there's a top-level ORDER BY, assume we have to fetch all the tuples + * except for UNION.
This might be too simplistic given all the hackery + * below to possibly avoid the sort; but the odds of accurate estimates + * here are pretty low anyway. + */ + if (parse->sortClause && topop->op != SETOP_UNION) + tuple_fraction = 0.0; + + /* * We'll need to build RelOptInfos for each of the leaf subqueries, which * are RTE_SUBQUERY rangetable entries in this Query. Prepare the index * arrays for that. @@ -186,18 +199,49 @@ plan_set_operations(PlannerInfo *root, double tuple_fraction, */ return recurse_set_operations((Node *) topop, root, tuple_fraction, topop->colTypes, topop->colCollations, + topop->groupClauses, true, -1,
Re: [HACKERS] UNION ALL on partitioned tables won't use indices.
Umm. I might be working in a somewhat unstable place.. Your patch doesn't apply anymore. Please rebase it. Thank you. I rebased all patches to current master. regards, -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index 711b161..734ed47 100644 --- a/src/backend/optimizer/path/equivclass.c +++ b/src/backend/optimizer/path/equivclass.c @@ -1940,6 +1940,7 @@ add_child_rel_equivalences(PlannerInfo *root, Expr *child_expr; Relids new_relids; Relids new_nullable_relids; + bool has_children = false; child_expr = (Expr *) adjust_appendrel_attrs(root, @@ -1969,9 +1970,15 @@ add_child_rel_equivalences(PlannerInfo *root, child_rel->relids); } + /* + * Does this child_rel have children? If and only if so, tell + * add_eq_member to register new_relids to cur_ec. + */ + has_children = + root->simple_rte_array[child_rel->relid]->inh; (void) add_eq_member(cur_ec, child_expr, new_relids, new_nullable_relids, - true, cur_em->em_datatype); + !has_children, cur_em->em_datatype); } } } diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 64b1705..0e3cf4b 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -944,7 +944,8 @@ create_merge_append_path(PlannerInfo *root, MergeAppendPath *pathnode = makeNode(MergeAppendPath); Cost input_startup_cost; Cost input_total_cost; - ListCell *l; + ListCell *l, *lm; + bool collapse_subpaths = false; pathnode->path.pathtype = T_MergeAppend; pathnode->path.parent = rel; @@ -953,6 +954,47 @@ create_merge_append_path(PlannerInfo *root, pathnode->path.pathkeys = pathkeys; pathnode->subpaths = subpaths; + + /* + * If subpaths contains MergeAppendPaths already ordered on the pathkeys + * of the creating node, they can be expanded onto this node.
+ */ + foreach (lm, subpaths) + { + Path *spath = (Path *) lfirst(lm); + + if (IsA(spath, MergeAppendPath) && + pathkeys_contained_in(pathkeys, spath->pathkeys)) + { + collapse_subpaths = true; + break; + } + } + + if (collapse_subpaths) + { + subpaths = NIL; + + foreach (lm, pathnode->subpaths) + { + MergeAppendPath *mpath = (MergeAppendPath *) lfirst(lm); + ListCell *lcm; + + if (IsA(mpath, MergeAppendPath) && + pathkeys_contained_in(pathkeys, mpath->path.pathkeys)) + { + foreach (lcm, mpath->subpaths) + { + Path *smpath = (Path *) lfirst(lcm); + subpaths = lappend(subpaths, smpath); + } + } + else + subpaths = lappend(subpaths, (Path *) mpath); + } + pathnode->subpaths = subpaths; + } + /* * Apply query-wide LIMIT if known and path is for sole base relation. * (Handling this at this low level is a bit klugy.) diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index d8aa35d..8167583 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -404,6 +404,15 @@ subquery_planner(PlannerGlobal *glob, Query *parse, expand_inherited_tables(root); /* + * Collapse multilevel inheritances into fewer levels. + * + * UNION ALL containing subselects on
Re: [HACKERS] FDW: possible resjunk columns in AddForeignUpdateTargets
Tomas Vondra wrote: have you found a way to pass data types other than TID as a resjunk column? I'm trying to solve almost the same thing (pass INT8 instead of TID), but I got stuck. Adding a custom Var with INT8OID instead of TIDOID seems to work fine, but I've found no way to populate this in IterateForeignScan :-( I didn't get to try it yet, but I'll keep you updated. Yours, Laurenz Albe
Re: [HACKERS] Fast insertion indexes: why no developments
On 12 November 2013 19:54, Claudio Freire klaussfre...@gmail.com wrote: On Tue, Nov 12, 2013 at 7:14 PM, Simon Riggs si...@2ndquadrant.com wrote: I still think we need to look at this from a query perspective though. We need to check whether there is a class of real world queries that are not well optimised by minmax indexes, or cannot be made to be in future releases. For example, large DELETEs from a table are almost trivially optimised for min max. Only if you don't have a PK (or other index). Right. Min max indexes are optimised for large DELETEs, btrees are not (yet), which is what we are discussing. I believe it remains to be shown that a btree is actually desirable on a very big table. So far the discussion has just assumed this is the case, without looking at specific SQL. It might be better to look at ways of avoiding a btree altogether than to spend time optimising btrees for this case. Perhaps we can enforce a PK constraint without using a btree, if one is required. This might be guaranteed by using a sequence or other mechanism. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Fast insertion indexes: why no developments
2013/11/12 Simon Riggs si...@2ndquadrant.com: On 12 November 2013 21:41, Nicolas Barbier nicolas.barb...@gmail.com wrote: Look-up speed is as follows: Each look-up must look through all B-trees. That can be optimised by using a min max approach, so we need only look at sub-trees that may contain data. Under the assumption that the data are really *really* inserted in random order, I think that min max indexes would not help much: All subtrees would contain so many different values that the boundaries will almost never be narrow enough to exclude scanning it (unless one is searching for outliers). I think that min max indexes are very useful for data that is inserted in a some less-than-random way: When rather large swathes of inserted data fall in between boundaries that a lot of other data doesn’t fall in between. For min max index scans to be fast, the data doesn’t exactly have to be ordered in the heap (that’s one of the advantages vs. B-trees, where a different order in the index and the heap is very bad for scans of any significant part of the index), but it can also not be entirely arbitrary. I would say that min max indexes should be used in either of the following cases (and probably some more that I don’t think of right now): * When the data is inserted close to ordered (i.e., past or future dates that have a tendency to be correlated with “the current day,” such as invoice dates). For this usecase, a normal B-tree would also work, but would take more space and require more heavy maintenance. * When large batches of “similar” data (fitting in between boundaries that are more narrow than the “global boundaries”) are inserted at a time, even when multiple of such batches arrive in random order. * When insertion is up to entirely random, but the goal is to optimize look-ups for (relatively rare) outliers. 
* (combined with any of the above to actually make the index *useful*) When needing a lot of indexes on the same data or having a very high rate of insertion, so that the maintenance burden matters a lot. I would add that it is possible to optimise large DELETEs from a table if complete sub-trees of the btree can be easily removed. This for me would be the compelling benefit of this approach. Idem WRT the randomness: If the data are really deleted in a completely random fashion (e.g., Leonardo might want to delete all phone calls in a certain year, while the index is on phone number), whole subtrees would almost never become candidates for deletion (the same problem applies to a normal min max index). Of course, everything would change if the data is not really deleted randomly. The same usecases as mentioned above (replace “insertion” with “deletion”) seem to apply for deletion in min max indexes. Note that B-forests as described before don’t work well for a high (or even not-so-high) rate of deletions: During VACUUM the updates to the bigger trees would kill all performance, similar to how a high rate of insertion kills the performance of normal B-trees once they get big. To remedy this, one could add stuff such as “B-trees of deleted entries” (i.e., negative trees) that may then afterwards be merged with other such “negative” trees + “positive” trees. Look-ups would need to take all those trees into account. Nicolas -- A. Because it breaks the logical sequence of discussion. Q. Why is top posting bad?
Re: [HACKERS] Fast insertion indexes: why no developments
Simon Riggs wrote On 5 November 2013 14:28, Leonardo Francalanci <m_lists@> wrote: Either my sql is not correct (likely), or my understanding of the minmax index is not correct (even more likely), or the minmax index is not usable in a random inputs scenario. Please show the real world SQL you intend to run, so we can comment on it. Inventing a use case that breaks effectiveness of any optimisation is always easy, but the question is whether the use case is likely or even desirable. The use case is pretty simple. Think of it as the NSA, as it would be much easier. Collect all calls made/received by any user on a mobile network. (In fact, it's something more than calls, so it is really 2-10 rows per call.) Keep the data for 20 days. That's the insert part. Query: search calls made/received by the user using IMSI (caller id) or IMEI (phone id). Date range is usually days (past 4 days, from 10 days ago to 5 days ago...). The result is just a very small percentage of the rows present in the table: a single user doesn't call that much! Searches are made by a human, so not that many requests per second. It's not a write-mostly scenario, it's a 99% write 1% read scenario. Problem? Having 4 btree indexes on random values (imsi+imei * 2, since we have calling and caller) kills the insertion performance after a while. Solution so far? Partition every 15 minutes, create the indexes in bulk. Simon Riggs wrote If we have a query to show the most recent calls by a particular caller SELECT * FROM cdr WHERE callerid = X ORDER BY call_timestamp DESC LIMIT 100 then this could potentially be optimised using a minmax index, by traversing the data ranges in call_timestamp order. That is not part of the code in this initial release, since the main use case is for WHERE call_timestamp = X, or WHERE primarykey = Y I don't understand how an index on call_timestamp would help in the query above.
Simon Riggs wrote I don't believe there is a credible business case for running that same query but without the ORDER BY and LIMIT, since it could potentially return gazillions of rows Gazillions of rows??? We're talking about calls made/received by one user here. How many calls do you make in 10 days??? Simon Riggs wrote so it isn't surprising at all that it would access a large % of the table. In fact, the query I use returns a fraction of the table, and only a very small number of users get searched. Simon, you keep on talking about these minmax indexes, and I still don't see any reference to some performance tests. And, again, I think that random values insertion is the worst use case for minmax indexes. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5778092.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] TABLE not synonymous with SELECT * FROM?
David et al, How about something like this? Cheers, Colin diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index e603b76..a68014b 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -33,13 +33,14 @@ PostgreSQL documentation refsynopsisdiv synopsis [ WITH [ RECURSIVE ] replaceable class=parameterwith_query/replaceable [, ...] ] -SELECT [ ALL | DISTINCT [ ON ( replaceable class=parameterexpression/replaceable [, ...] ) ] ] +{ SELECT [ ALL | DISTINCT [ ON ( replaceable class=parameterexpression/replaceable [, ...] ) ] ] * | replaceable class=parameterexpression/replaceable [ [ AS ] replaceable class=parameteroutput_name/replaceable ] [, ...] [ FROM replaceable class=parameterfrom_item/replaceable [, ...] ] [ WHERE replaceable class=parametercondition/replaceable ] [ GROUP BY replaceable class=parameterexpression/replaceable [, ...] ] [ HAVING replaceable class=parametercondition/replaceable [, ...] ] [ WINDOW replaceable class=parameterwindow_name/replaceable AS ( replaceable class=parameterwindow_definition/replaceable ) [, ...] ] +| TABLE [ ONLY ] replaceable class=parametertable_name/replaceable [ * ] } [ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] replaceable class=parameterselect/replaceable ] [ ORDER BY replaceable class=parameterexpression/replaceable [ ASC | DESC | USING replaceable class=parameteroperator/replaceable ] [ NULLS { FIRST | LAST } ] [, ...] ] [ LIMIT { replaceable class=parametercount/replaceable | ALL } ] @@ -60,8 +61,6 @@ SELECT [ ALL | DISTINCT [ ON ( replaceable class=parameterexpression/replac phraseand replaceable class=parameterwith_query/replaceable is:/phrase replaceable class=parameterwith_query_name/replaceable [ ( replaceable class=parametercolumn_name/replaceable [, ...] 
) ] AS ( replaceable class=parameterselect/replaceable | replaceable class=parametervalues/replaceable | replaceable class=parameterinsert/replaceable | replaceable class=parameterupdate/replaceable | replaceable class=parameterdelete/replaceable ) - -TABLE [ ONLY ] replaceable class=parametertable_name/replaceable [ * ] /synopsis /refsynopsisdiv @@ -198,6 +197,27 @@ TABLE [ ONLY ] replaceable class=parametertable_name/replaceable [ * ] literalUPDATE/literal privilege as well (for at least one column of each table so selected). /para + + refsect2 id=SQL-TABLE + titleliteralTABLE/literal Command/title + + para +The command +programlisting +TABLE replaceable class=parametername/replaceable +/programlisting +is equivalent to +programlisting +SELECT * FROM replaceable class=parametername/replaceable +/programlisting +It can be used as a top-level command or as a space-saving syntax +variant in parts of complex queries. Only the literalWITH/, literalORDER BY/, literalLIMIT/, +and Locking clauses and set operations can be used with commandTABLE/; the +literalWHERE/ clause and any form of aggregation cannot be used. + +Note that on this page and other places in the documentation, where commandSELECT/ is mentioned, commandTABLE/ is also assumed, subject to the restrictions mentioned here. + /para + /refsect2 /refsect1 refsect1 @@ -211,7 +231,7 @@ TABLE [ ONLY ] replaceable class=parametertable_name/replaceable [ * ] subqueries that can be referenced by name in the primary query. The subqueries effectively act as temporary tables or views for the duration of the primary query. -Each subquery can be a commandSELECT/command, commandVALUES/command, +Each subquery can be a commandSELECT/command, commandTABLE/, commandVALUES/command, commandINSERT/command, commandUPDATE/command or commandDELETE/command statement. 
When writing a data-modifying statement (commandINSERT/command, @@ -1437,23 +1457,6 @@ SELECT * FROM (SELECT * FROM mytable FOR UPDATE) ss ORDER BY column1; /para /caution /refsect2 - - refsect2 id=SQL-TABLE - titleliteralTABLE/literal Command/title - - para -The command -programlisting -TABLE replaceable class=parametername/replaceable -/programlisting -is completely equivalent to -programlisting -SELECT * FROM replaceable class=parametername/replaceable -/programlisting -It can be used as a top-level command or as a space-saving syntax -variant in parts of complex queries. - /para - /refsect2 /refsect1 refsect1 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] Sort contents entries in reference documentation
Hi, While looking at the documentation on SELECT I noticed that the entries in reference.sgml aren't sorted correctly -- psql \h does have them in the correct order. Attached is a trivial patch to fix this. In addition, reference.sgml doesn't have entries for TABLE or WITH, which should link to ref/select.sgml, but I couldn't figure out how to achieve this. psql \h does have them, so the reference page probably should too. Cheers, Colin diff --git a/doc/src/sgml/reference.sgml b/doc/src/sgml/reference.sgml index 14e217a..d967f66 100644 --- a/doc/src/sgml/reference.sgml +++ b/doc/src/sgml/reference.sgml @@ -40,8 +40,8 @@ &alterDatabase; &alterDefaultPrivileges; &alterDomain; - &alterExtension; &alterEventTrigger; + &alterExtension; &alterForeignDataWrapper; &alterForeignTable; &alterFunction; @@ -84,8 +84,8 @@ &createConversion; &createDatabase; &createDomain; - &createExtension; &createEventTrigger; + &createExtension; &createForeignDataWrapper; &createForeignTable; &createFunction; @@ -124,8 +124,8 @@ &dropConversion; &dropDatabase; &dropDomain; - &dropExtension; &dropEventTrigger; + &dropExtension; &dropForeignDataWrapper; &dropForeignTable; &dropFunction;
[HACKERS] init_sequence spill to hash table
Here http://www.postgresql.org/message-id/24278.1352922...@sss.pgh.pa.us there was some talk about init_sequence being a bottleneck when many sequences are used in a single backend. The attached, I think, implements what was talked about in the above link, which for me seems to double the speed of a currval() loop over 3 sequences. It goes from about 7 seconds to 3.5 on my laptop. I thought I would post the patch early to see if this is actually wanted before I do too much more work on it. My implementation keeps using the linear list for sequences up to a defined threshold (currently 32), then it moves everything over to a hash table and frees the list. A more complete solution would contain regression tests to exercise the hash table code. I know there is still a desire to move sequences over to a single table, but I see this as a separate patch, and storing current values in a hash table for each backend should still be a win even if/when the single-table stuff gets implemented. Regards David Rowley CREATE FUNCTION create_seq(n integer) RETURNS integer LANGUAGE plpgsql AS $$ BEGIN WHILE n > 0 LOOP EXECUTE 'CREATE SEQUENCE test' || CAST(n AS TEXT); n := n - 1; END LOOP; RETURN 0; END $$; CREATE FUNCTION currval_seq(n integer) RETURNS integer LANGUAGE plpgsql AS $_$ BEGIN WHILE n > 0 LOOP EXECUTE $$SELECT currval('test$$ || CAST(n AS TEXT) || $$')$$; n := n - 1; END LOOP; RETURN 0; END $_$; CREATE FUNCTION drop_seq(n integer) RETURNS integer LANGUAGE plpgsql AS $$ BEGIN WHILE n > 0 LOOP EXECUTE 'DROP SEQUENCE test' || CAST(n AS TEXT); n := n - 1; END LOOP; RETURN 0; END $$; CREATE OR REPLACE FUNCTION nextval_seq(n integer) RETURNS integer LANGUAGE plpgsql AS $_$ BEGIN WHILE n > 0 LOOP EXECUTE $$SELECT nextval('test$$ || CAST(n AS TEXT) || $$')$$; n := n - 1; END LOOP; RETURN 0; END $_$; SELECT create_seq(1); SELECT nextval_seq(1); SELECT currval_seq(1); hashtab_seq_v0.1.patch Description: Binary data
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
On 01 October 2013 00:56 Amit Kapila wrote: On Mon, Sep 30, 2013 at 9:07 PM, Peter Eisentraut pete...@gmx.net wrote: On 9/28/13 3:05 AM, Amit Kapila wrote: Now that we have an agreement, I have updated the patch for the remaining issues below: Regression tests are failing. Thanks for informing me; I am sorry for not running the regression tests before sending the patch. The reason for the failure was that the source for the GUC in the new function validate_conf_option() was hardcoded to PGC_S_FILE, which was okay for ALTER SYSTEM but not for the SET path. I have added a new parameter, source, to this function, and obtain the value of source when it is called from the SET path. Some initial observations on the patch: 1. The patch does not apply against git head; it needs a rebase. 2. The patch doesn't contain any tests. I started reviewing the patch and will share the details once I finish. Regards, Hari babu.
Re: [HACKERS] Fast insertion indexes: why no developments
On 13 November 2013 06:07, Leonardo Francalanci m_li...@yahoo.it wrote:

The use case is pretty simple. Think of it as the NSA, as it would be much easier. Collect all calls made/received by any user on a mobile network. (In fact it's something more than calls, so it is 2-10 rows per call.) Keep the data for 20 days. That's the insert part. Query: search calls made/received by the user using IMSI (caller id) or IMEI (phone id). Date range is usually days (past 4 days, from 10 days ago to 5 days ago...). The result is just a very small percentage of the rows present in the table: a single user doesn't call that much! Searches are made by a human, so not that many requests per second. It's not a write-mostly scenario, it's a 99% write 1% read scenario. Problem? Having 4 btree indexes on random values (imsi+imei * 2, since we have calling and caller) kills the insertion performance after a while. Solution so far? Partition every 15 minutes, create the indexes in bulk.

So in the use case you describe, the min max index would require a scan of only 25% of the table, not the 80% described earlier for random inserts. In my experience, people wish to keep data for much longer periods, and so the percentage of scan required would drop lower than 25%, possibly to 5% or less for many applications. The plan would use sequential I/O so could still work quickly; given the low read rate, longer query times could be acceptable.

Minmax indexes are simply a way to make this use case happen automatically, without the need for manual partitioning of the table. They are not the answer to every prayer, but with respect they are better than you had claimed they would be (25%, not 80%, in your use case). I saw this was likely to be the case, and this is why I challenged you to describe it in more detail. Thank you.
Simon Riggs wrote: If we have a query to show the most recent calls by a particular caller SELECT * FROM cdr WHERE callerid = X ORDER BY call_timestamp DESC LIMIT 100 then this could potentially be optimised using a minmax index, by traversing the data ranges in call_timestamp order. That is not part of the code in this initial release, since the main use case is for WHERE call_timestamp = X, or WHERE primarykey = Y

I don't understand how an index on call_timestamp would help in the query above.

The min max index would cover call_timestamp and would be used to optimise the query. That is not in the current version; it is a later optimisation that I think is possible after considering your use case and similar ones. This is a similar optimisation to the Merge Append case for partitioned tables.

Simon Riggs wrote: I don't believe there is a credible business case for running that same query but without the ORDER BY and LIMIT, since it could potentially return gazillions of rows

Gazillions of rows??? We're talking about calls made/received by one user here. How many calls do you make in 10 days???

Simon Riggs wrote: so it isn't surprising at all that it would access a large % of the table.

In fact, the query I use returns a fraction of the table, and only a very small number of users get searched.

Simon, you keep on talking about these minmax indexes, and I still don't see any reference to some performance tests.

Performance tests are only possible with a clear use case. It would be helpful if you could benchmark your own use case, and I request others do the same. I have challenged, and will challenge, people that simply assert this new type of index is not useful based on a generic argument, with no use case and no tests. That is nothing personal; it is simply that I do not wish misunderstandings to block the adoption of new features.
Please see that Alvaro and I have gone out of our way to provide a new facility to help you and others, and that it requires changing how we think about the solution. I accept it may not provide for every case but it requires careful analysis before deciding that is so. And, again, I think that random values insertion is the worst use case for minmax indexes. I agree, it is. What we have disagreed on is the extent to which that scenario exists for use cases on very large tables, which are typically append-mostly with most queries accessing a subset of the table, e.g. date range. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] init_sequence spill to hash table
On 13.11.2013 11:55, David Rowley wrote: I thought I would post the patch early to see if this is actually wanted before I do too much more work on it. Seems reasonable. My implementation maintains using the linear list for sequences up to a defined threshold (currently 32) then it moves everything over to a hashtable and free's off the list. Did you check how it would perform if you just always used the hash table? Or if you just have a single entry before you move to hash table, ie. set the threshold to 2? That would be slightly simpler. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [OT] why not keeping the original column name in catalog after a drop?
On Wed, Nov 13, 2013 at 9:00 AM, Andres Freund and...@2ndquadrant.com wrote: The old name might not fit there, attribute names have a relatively low maximum length (64 by default), so we cannot always fit the entire old name there. Thanks, I was guessing this. Also, think about: CREATE TABLE foo(cola int); ALTER TABLE foo DROP COLUMN cola; ALTER TABLE foo ADD COLUMN cola int; ALTER TABLE foo DROP COLUMN cola; -- should not error out Well, I was talking about appending the original column name, and therefore the above should have become pg.dropped.1.cola and pg.dropped.2.cola respectively. Of course the original name is not very interesting; I was just curious about the conflicts. Luca
[HACKERS] The number of character limitation of custom script on pgbench
Hi all, The custom script feature of pgbench allows only BUFSIZ (i.e., 1024 bytes) or less as the length of a SQL statement. I think that when we want to benchmark with long SQL this will be difficult. Worse, pgbench doesn't even return an ERROR in this case: it will try to run the query with the broken (truncated) SQL, and the user cannot tell why the custom script does not work; it just looks like an SQL error. So I'm thinking of the following solutions: (1) increase the buffer size (2) change to a variable buffer size (3) return an ERROR with information Thoughts? Regards, --- Sawada Masahiko
Re: [HACKERS] Fast insertion indexes: why no developments
On 13/11/13 09:07, Leonardo Francalanci wrote: Problem? having 4 btree indexes on random values (imsi+imei * 2, since we have calling and caller) kills the performance in insertion after a while. Surely there's good correlation between IMSI and IMEI, so have a separate table to translate one to (a group of) the others, and halve the indexes on your main table? -- Jeremy
Re: [HACKERS] patch to fix unused variable warning on windows build
On Thu, 2013-11-07 at 19:13 +1300, David Rowley wrote: Attached is a small patch which fixes the unused variable warning in the Visual Studio build. It seems VS does not support __attribute__((unused)), so it looks like, as in all other places, we must assign to the variable. committed
[HACKERS] [PATCH] Report exit code from external recovery commands properly
When an external recovery command such as restore_command or archive_cleanup_command fails, it just reports return code 34567 or something, but we have facilities to decode this properly, so use them.

From 8aa3cf503fe1c1f41a2a833c008f4273c22a86c9 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut pete...@gmx.net
Date: Wed, 13 Nov 2013 06:38:18 -0500
Subject: [PATCH] Report exit code from external recovery commands properly

When an external recovery command such as restore_command or
archive_cleanup_command fails, report the exit code properly,
distinguishing signals and normal exits, using the existing
wait_result_to_str() facility, instead of just reporting the return
value from system().
---
 src/backend/access/transam/xlogarchive.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 342975c..be95684 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -300,8 +300,8 @@ RestoreArchivedFile(char *path, const char *xlogfname,
 	signaled = WIFSIGNALED(rc) || WEXITSTATUS(rc) > 125;
 
 	ereport(signaled ? FATAL : DEBUG2,
-		(errmsg("could not restore file \"%s\" from archive: return code %d",
-				xlogfname, rc)));
+		(errmsg("could not restore file \"%s\" from archive: %s",
+				xlogfname, wait_result_to_str(rc))));
 
 not_available:
@@ -410,9 +410,10 @@ ExecuteRecoveryCommand(char *command, char *commandName, bool failOnSignal)
 		ereport((signaled && failOnSignal) ? FATAL : WARNING,
 		/*--
 		   translator: First %s represents a recovery.conf parameter name like
-		   "recovery_end_command", and the 2nd is the value of that parameter. */
-				(errmsg("%s \"%s\": return code %d", commandName,
-						command, rc)));
+		   "recovery_end_command", the 2nd is the value of that parameter, the
+		   third an already translated error message. */
+				(errmsg("%s \"%s\": %s", commandName,
+						command, wait_result_to_str(rc))));
 	}
 }
-- 
1.8.4.2
Re: [HACKERS] ALTER TABLE lock strength reduction patch is unsafe
On 1 August 2013 01:53, Noah Misch n...@leadboat.com wrote: A remediation strategy that seemed attractive when I last contemplated this problem is to repoint rd_att immediately but arrange to free the obsolete TupleDesc in AtEOXact_RelationCache(). I agree that the best way to resolve this is to retain a copy of the TupleDesc, so that copied pointers to it remain valid. EOXact is actually longer than strictly necessary in some cases, but trying to work out a more minimal approach seems hard and possibly inefficient. Comments in relcache.c indicate that the Relation swapping concept might be replaced by refcounting approach. I can't see how that differs from your suggested route. Which means I can't see any other way of doing this other than the way you suggest. Will implement. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GIN bugs in master branch
On 12.11.2013 21:33, Teodor Sigaev wrote: Suppose, some last changes in GIN are broken, 9.3 works fine. Fixed, thanks for the report. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
Simon Riggs wrote: So in the use case you describe, the min max index would require a scan of only 25% of the table, not the 80% described earlier for random inserts. In my experience, people wish to keep data for much longer periods and so the percentage of scan required would drop lower than 25%, possibly to 5% or less for many applications. The plan would use sequential I/O so could still work quickly; given the low read rate, longer query times could be acceptable.

Quickly??? Seq scanning 25% of a multi-TB table is not quick.

Simon Riggs wrote: Minmax indexes are simply a way to make this use case happen automatically, without the need for manual partitioning of the table.

Your logic assumes that we don't index anything but the call_timestamp. That would lead to huge query response times. Plus: btree doesn't have a big problem keeping up in a sequential insertion scenario (such as the call_timestamp). So I still don't see the gain in using the minmax indexes: again, could you point me to some performance tests of *any* use case?

Simon Riggs wrote: They are not the answer to every prayer, but with respect they are better than you had claimed they would be. (25% not 80%, in your use case). I saw this was likely to be the case and this is why I challenged you to describe in more detail. Thank you.

I claimed they would scan 80% of the table because I assumed I had to use them on the random fields, not on the call_timestamp field. I don't need a better index on the call_timestamp, because it's sequential; I don't have problems with that. But it's useless: I don't want to seq scan 25% of a multi-TB table.

Simon Riggs wrote: Performance tests are only possible with a clear use case.

Well, so I can add my weird_index patch to the PostgreSQL source code, and it would be committed right away??? I assumed you had to prove somehow that a new index is better than what is already available, at least for some cases.
Or, in other words: what are you going to write in the minmax index documentation, try and see if they work better for you? Simon Riggs wrote Please see that Alvaro and I have gone out of our way to provide a new facility to help you and others, and that it requires changing how we think about the solution. I accept it may not provide for every case but it requires careful analysis before deciding that is so. If I came out too rough, I ask your pardon. I always appreciate people taking their time to help someone else for free. Plus, I'm *very* interested in the minmax index, especially for call_timestamp (some queries are date-range only, such as give me all the calls in this particular 2 secs range) or the id column I have. But, at the same time, I don't see any evidence that they work better than btrees (except for the size of the index). I would like to see some numbers. I worked a little in the bitmap index implementation, and I stopped because on the large tables these indexes are supposed to be used, the heap lookup took so much time that the (slightly) faster index access didn't really help, because it was a fraction of the query time... I'm afraid it would be the same with minmax indexes, that's why I wanted to see some numbers... Simon Riggs wrote And, again, I think that random values insertion is the worst use case for minmax indexes. I agree, it is. What we have disagreed on is the extent to which that scenario exists for use cases on very large tables, which are typically append-mostly with most queries accessing a subset of the table, e.g. date range. Mhh... maybe this is this point we don't understand each other? I query the table by userid + date range. The date range is *not* selective enough (it's, as you said, 25% of the multi-TB table). The userid is selective *a lot*. I'm looking for a better index for the userid column(s). 
The new indexes I mentioned in the OP claim they are better in this scenario (but I don't blindly believe them). I don't see how indexing the call_timestamp only could be of any interest, since it would require seq-scanning 25% of a huge table for every query. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Fast-insertion-indexes-why-no-developments-tp5776227p5778124.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
Re: [HACKERS] Fast insertion indexes: why no developments
Jeremy Harris wrote: Surely there's good correlation between IMSI and IMEI, so have a separate table to translate one to (a group of) the others, and halve the indexes on your main table? Yes; unfortunately both are not always available, but it's something we are thinking about (it requires logic in the inserting application that doesn't exist at the moment, but it is something we'll have to add sooner or later). But in the end yes, trying to use fewer indexed fields is a good path.
Re: [HACKERS] logical changeset generation v6.5
On 11/11/2013 02:06 PM, Andres Freund wrote: On 2013-11-10 14:45:17 -0500, Steve Singer wrote: Not really keen - that'd be a noticeable overhead. Note that in the cases where DEFAULT|INDEX is used, you can just use the new tuple to extract what you need for the pkey lookup since they now have the same format and since it's guaranteed that the relevant columns haven't changed if oldtup is null and there's a key. What are you actually doing with those columns? Populating a WHERE clause? Yup building a WHERE clause Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The number of character limitation of custom script on pgbench
On 11/13/13, 6:18 AM, Sawada Masahiko wrote: Hi all, The function of custom script of pgbench allows only BUFSIZ (i.g.,1024byte) or less as length of a SQL. I think that when we want to bench mark with long SQL then it will difficult. At that time even pgbench doesn't return ERROR. It will try to do query with the broken SQL. And user can not know why function of custom script is not work fine. It look as just error of SQL to user. So I'm thinking following solution. (1) to increase buffer size (2) to change to variable buffer size (3) to return ERROR with information I'd go for #2. But at least an error. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The number of character limitation of custom script on pgbench
Peter Eisentraut pete...@gmx.net writes: On 11/13/13, 6:18 AM, Sawada Masahiko wrote: The function of custom script of pgbench allows only BUFSIZ (i.g.,1024byte) or less as length of a SQL. So I'm thinking following solution. (1) to increase buffer size (2) to change to variable buffer size (3) to return ERROR with information I'd go for #2. But at least an error. #2 definitely. I've run into this limitation myself recently, and so have other people. It's time to fix it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
On 13 November 2013 09:31, Leonardo Francalanci m_li...@yahoo.it wrote: Or, in other words: what are you going to write in the minmax index documentation, "try and see if they work better for you"?

That seems like good advice to me. Bacon > Aristotle. There is very little written about the suitability of any index type for Postgres. I'm sure the docs could be improved there.

Plus, I'm *very* interested in the minmax index

Good, thanks.

But, at the same time, I don't see any evidence that they work better than btrees (except for the size of the index).

Min max indexes are not designed to be a better btree; they are designed to be roughly the same as automatic partitioning. They offer considerably improved build time, significantly reduced index size and significantly improved insert performance. Min max will be clearly slower than btrees for small numbers of records, though for large numbers of records we may expect min max to perform the same as btrees, though that requires better testing to get a more accurate picture. There may yet be optimisations of the patch also. Based on what we have discussed here, we've come up with two new optimisations that can be used with MinMax, namely the bulk DELETE case and the Merge Append case. Thank you for highlighting additional cases and requirements.

From our discussions here, IMHO there is a strong case for avoiding btrees completely for larger historical data tables. That isn't something I had even considered as desirable before this conversation, but ISTM now that taking that approach will be more fruitful than attempting to implement LSM trees. Having said that, I am also in favour of declarative partitioning, just that there is no funding available to work on that at present.

Further work on bitmap indexes is expected. They are already designed with good insert performance in mind, and this discussion re-emphasises that requirement.

I would like to see some numbers.

Alvaro has given me some results for his patch.
The figures I have are for a 2GB table.

Index Build Time
  MinMax: 11 s
  Btree:  96 s

Index Size
  MinMax: 2 pages + metapage
  Btree:  approx 200,000 pages + metapage

Load time
  MinMax: no overhead, same as raw COPY
  Btree:  considerably slower

Index SELECT
  Slower for small groups of rows
  Broadly same for large requests (more results required on that assessment)

I expect to publish results against TPC-H cases in next few weeks. Additional tests are welcome for other use cases. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] [COMMITTERS] pgsql: Replace duplicate_oids with Perl implementation
On 11/12/13, 10:48 PM, Andrew Dunstan wrote: You quoted me out of context. Your previous para referred to duplicate_oids. ... which was in turn quoted out of context. ;-) What exactly is your argument here? If we change unused_oids to a Perl implementation, we will add additional inconvenience to users who don't have /usr/bin/perl in that exact location. Versus allowing use by users who don't have /bin/sh. I don't know what the ratio between those two camps is, among potential users of unused_oids. But most of the discussion so far appeared to be under the impression that unused_oids is called from a makefile, which is not correct.
[HACKERS] hail the CFM
The commit fest manager mace has been passed on to me[*]. More to follow. [*] Actually, I found it behind the dumpster in the alley. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] hail the CFM
On Wed, Nov 13, 2013 at 8:45 AM, Peter Eisentraut pete...@gmx.net wrote: The commit fest manager mace has been passed on to me[*]. More to follow. [*] Actually, I found it behind the dumpster in the alley. ROFL. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: ALTER SYSTEM SET command to change postgresql.conf parameters (RE: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review])
On Wed, Nov 13, 2013 at 4:05 PM, Haribabu kommi haribabu.ko...@huawei.com wrote: On 01 October 2013 00:56 Amit Kapila wrote: On Mon, Sep 30, 2013 at 9:07 PM, Peter Eisentraut pete...@gmx.net wrote: On 9/28/13 3:05 AM, Amit Kapila wrote: Now as we have an agreement, I had updated patch for below left issues: Regression tests are failing. Thanks for informing. I am sorry for not running regression before sending patch. Reason for failure was that source for GUC in new function validate_conf_option() was hardcoded to PGC_S_FILE which was okay for Alter System, but not for SET path. I had added new parameter source in this function and get the value of source when this is called from SET path. Some of the initial observation of the patch are, 1. Patch is not applying against git head, needs a rebase. 2. Patch doesn't contain the tests. It was intentional and as per feedback for this patch. As for testing this feature, we need to put sleep after operation, so it was suggested to remove tests. I started reviewing the patch, will share the details once I finish. Thanks. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [COMMITTERS] pgsql: Replace duplicate_oids with Perl implementation
On 11/13/2013 08:36 AM, Peter Eisentraut wrote: What exactly is your argument, here? If we change unused_oids to a Perl implementation, we will add additional inconvenience to users who don't have /usr/bin/perl in that exact location. Versus allowing use by users who don't have /bin/sh. I don't know what the ratio between those two camps is, among potential users of unused_oids. According to my count we currently have 30 instances of scripts with this shebang line. That includes things not called from makefiles, such as pgindent. I don't recall this causing an issue. For any small group of people that might exist who don't have /usr/bin/perl, there is a simple workaround: call "perl unused_oids". Can we please get on with dealing with real problems? I don't believe this is one. cheers andrew
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
On 11/12/13, 1:57 AM, Tatsuo Ishii wrote: Currently there's no such an universal encoding in the universe, I think the only way is, inventing it by ourselves. I think ISO 2022 is something in that direction, but it's not ASCII-safe, AFAICT. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] The number of character limitation of custom script on pgbench
On Wed, Nov 13, 2013 at 10:27 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Eisentraut pete...@gmx.net writes: On 11/13/13, 6:18 AM, Sawada Masahiko wrote: The function of custom script of pgbench allows only BUFSIZ (i.g.,1024byte) or less as length of a SQL. So I'm thinking following solution. (1) to increase buffer size (2) to change to variable buffer size (3) to return ERROR with information I'd go for #2. But at least an error. #2 definitely. I've run into this limitation myself recently, and so have other people. It's time to fix it. Yes, I also think #2 is good. I will implement the patch. Regards, --- Sawada Masahiko -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
Simon Riggs wrote: From our discussions here, IMHO there is a strong case for avoiding btrees completely for larger historical data tables. That isn't something I had even considered as desirable before this conversation but ISTM now that taking that approach will be more fruitful than attempting to implement LSM trees.

Eh? I don't understand this point. How can I avoid btrees, and still search by caller_id? I don't get it...

Simon Riggs wrote: Alvaro has given me some results for his patch. The figures I have are for a 2GB table. Index Build Time: MinMax 11 s, Btree 96 s. Index Size: MinMax 2 pages + metapage, Btree approx 200,000 pages + metapage. Load time: MinMax no overhead, same as raw COPY; BTree considerably slower.

Great!!! This looks very promising. Were the indexed values sequential?
Re: [HACKERS] additional json functionality
On Wed, Nov 13, 2013 at 1:33 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 11/13/2013 02:34 AM, Andrew Dunstan wrote: If there's agreement on taking these, I will prepare patches and submit them by the 15th. With JSON enhancement, my only concern is that there's work ongoing to integrate the v2 development version of hstore with json, providing typed hstore and an efficient binary storage format for json. It might be worth seeing how that work is going and what functionality needs to be added to it, rather than enhancing the existing json support that may soon change dramatically.

I'm not so sure we should require hstore to do things like build arbitrary json objects, even though I agree that hstore will probably displace json for most cases where you want to store nested data (as opposed to (de-)serialize). Andrew's patches just fill out a couple of missing cases that are handled in the existing API.

Putting all the patches together, ISTM there might be a function or two too many. I'm not sure why the json_ prefix was abandoned for build_json_object and build_json_array. Also, json_object is pretty weird to me: I'm not sure I see the advantage of a new serialization format, and I don't agree with the statement "but it is the caller's responsibility to ensure that keys are not repeated". I think the caller should have no such responsibility; keys should be able to be repeated. Also, I'm not sure how the {k,v,k,v,k,v}... convention serialized into a string is very useful in general practice. I greatly prefer the aggregation and the variadic methods in json_build.

Putting it all together, I'd consider:
*) dropping json_object (although maybe there is a case I'm not thinking about)
*) changing the json_build function names to get the json_ prefix
*) adding a json object constructor that takes two parallel arrays as arguments.
merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
On Wed, Nov 13, 2013 at 7:33 AM, Simon Riggs si...@2ndquadrant.com wrote: On 13 November 2013 09:31, Leonardo Francalanci m_li...@yahoo.it wrote: I would like to see some numbers. Alvaro has given me some results for his patch. The figures I have are for a 2GB table. Index Build Time MinMax 11 s Btree 96s Index Size MinMax 2 pages + metapage Btree approx 200,000 pages + metapage Load time MinMax no overhead, same as raw COPY BTree - considerably slower Index SELECT Slower for small groups of rows Broadly same for large requests (more results required on that assessment) I expect to publish results against TPC-H cases in next few weeks. Additional tests are welcome for other use cases. Those are pretty exciting numbers. These days for analytics work I'm using mostly covering index type approaches. I bet the tiny index would more than offset the extra heap accesses. Can you CLUSTER against a minmax index? merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fast insertion indexes: why no developments
On 13 November 2013 11:54, Merlin Moncure mmonc...@gmail.com wrote: On Wed, Nov 13, 2013 at 7:33 AM, Simon Riggs si...@2ndquadrant.com wrote: On 13 November 2013 09:31, Leonardo Francalanci m_li...@yahoo.it wrote: I would like to see some numbers. Alvaro has given me some results for his patch. The figures I have are for a 2GB table.

Index Build Time
  MinMax: 11 s
  Btree:  96 s

Index Size
  MinMax: 2 pages + metapage
  Btree:  approx. 200,000 pages + metapage

Load time
  MinMax: no overhead, same as raw COPY
  Btree:  considerably slower

Index SELECT
  Slower for small groups of rows; broadly the same for large requests
  (more results required on that assessment)

I expect to publish results against TPC-H cases in next few weeks. Additional tests are welcome for other use cases.

Those are pretty exciting numbers. These days for analytics work I'm using mostly covering index type approaches.

If you're using index-only scans then this will work for you as well, hopefully better. Same principle wrt all-visible page ranges.

I bet the tiny index would more than offset the extra heap accesses.

That's the trade-off, yes. I was hoping that would lead to cases where the minmax is better than a btree, but not there yet.

Can you CLUSTER against a minmax index?

Not in this release, at least in my understanding. It's not yet possible to do an ordered fetch, so the cluster scan probably won't work. I was hoping to include some special Freespace Map modes that would help there. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Fast insertion indexes: why no developments
On 13 November 2013 11:54, Merlin Moncure mmonc...@gmail.com wrote: Load time MinMax no overhead, same as raw COPY BTree - considerably slower And just as a general comment, the minmax index does not slow down COPY as the table gets larger, whereas the btree gets slower as the table gets larger. Which is the reason Leonardo requires partitioned tables. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
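The trade-offs described in this subthread fall out of the structure directly: a block-range summary is tiny and append-friendly (so bulk load cost stays flat), but a lookup must still visit every row inside each candidate range. A minimal Python sketch of the idea -- not the actual minmax patch, just the principle:

```python
def build_minmax_summary(values, range_size):
    """One (min, max) pair per block range -- the whole 'index' is a
    handful of pairs, which is why it stays a couple of pages in size
    and why appending new blocks doesn't get slower as the table grows."""
    summary = []
    for i in range(0, len(values), range_size):
        block = values[i:i + range_size]
        summary.append((min(block), max(block)))
    return summary

def candidate_ranges(summary, lo, hi):
    """Ranges whose [min, max] overlaps [lo, hi] cannot be skipped;
    every row inside them must still be checked, which is why lookups
    for small groups of rows are slower than with a btree."""
    return [i for i, (mn, mx) in enumerate(summary)
            if mx >= lo and mn <= hi]
```

On naturally ordered data (timestamps, sequence ids) most ranges are pruned; on randomly distributed data almost none are, which matches the "slower for small groups of rows" result quoted above.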
[HACKERS] Idea for debug/recovery snapshots
On Tue, Nov 12, 2013 at 08:39:35PM -0500, Bruce Momjian wrote: My specific wording is incoherent mostly because it really belongs to a larger corpus that currently exists only in my head. Oh, OK, it sounds fine. The user really doesn't choose what timeline to see --- rather, it is the current xid at the time they take their snapshot and other running xids that control that. You can control your transaction isolation level, but that only controls how often you take snapshots. [ moved to hackers ] Actually, it would be possible to allow snapshots that are equal to the earliest active snapshot for the current database. I don't think it would be useful in production, but perhaps for debugging or some disaster recovery. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] Add \i option to bring in the specified file as a quoted literal
On Wed, Nov 13, 2013 at 08:58:07AM +0530, Amit Kapila wrote: On Tue, Nov 12, 2013 at 9:37 PM, Bruce Momjian br...@momjian.us wrote: On Wed, Oct 23, 2013 at 10:31:39AM +0530, Amit Kapila wrote: On Tue, Oct 22, 2013 at 3:04 AM, Piotr Marcinczyk pmarc...@gmail.com wrote: Hi, I would like to implement item from TODO marked as easy: Add \i option to bring in the specified file as a quoted literal. I understand the intent of this item: to be able to have parts of a query written in separate files (now it is impossible, because \i tries to execute the content of the file as a separate command by function process_file). For the use case discussed in the mail chain of that TODO item, Robert Haas has provided an alternative to achieve it; please check the link below: http://www.postgresql.org/message-id/AANLkTi=7c8xfyf7uqw0y+si8ondkoy2nx8jc4bu0g...@mail.gmail.com If you think that alternative is not sufficient for the use case, then adding a new option/syntax is worth it; otherwise it might be a shortcut or other form of some existing way which can be useful depending on how frequently users use this syntax. So, can we remove this TODO item? The TODO item was created before Robert Haas provided an alternative way to achieve the same thing. In some cases there are multiple ways to achieve the same thing (example: shortcut options in psql) if it is used quite frequently and people want some easy way of doing it. In this case I don't think this is used frequently, so I don't see the need of doing it. We should remove this TODO item. OK, removed. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
[HACKERS] pg_upgrade misreports full disk
When pg_upgrade encounters a full disk while copying relation files, it reports this as:

    error while copying relation xyz (...): Success

because it doesn't set errno in some error cases. In other places we treat short writes as ENOSPC, so here is a patch to do that for pg_upgrade as well.

diff --git a/contrib/pg_upgrade/file.c b/contrib/pg_upgrade/file.c
index dfeb79f..b35034b 100644
--- a/contrib/pg_upgrade/file.c
+++ b/contrib/pg_upgrade/file.c
@@ -136,16 +136,22 @@ copy_file(const char *srcfile, const char *dstfile, bool force)
 	int			save_errno = 0;
 
 	if ((srcfile == NULL) || (dstfile == NULL))
+	{
+		errno = EINVAL;
 		return -1;
+	}
 
 	if ((src_fd = open(srcfile, O_RDONLY, 0)) < 0)
 		return -1;
 
 	if ((dest_fd = open(dstfile, O_RDWR | O_CREAT | (force ? 0 : O_EXCL),
 						S_IRUSR | S_IWUSR)) < 0)
 	{
+		save_errno = errno;
+
 		if (src_fd != 0)
 			close(src_fd);
+
+		errno = save_errno;
 		return -1;
 	}
@@ -170,6 +176,8 @@ copy_file(const char *srcfile, const char *dstfile, bool force)
 
 		if (write(dest_fd, buffer, nbytes) != nbytes)
 		{
+			if (errno == 0)
+				errno = ENOSPC;
 			save_errno = errno;
 			ret = -1;
 			break;
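The convention the patch applies -- a short write that leaves errno unset is reported as out-of-disk-space -- can be stated as a tiny decision function. This is a Python sketch of the convention, not pg_upgrade's code:

```python
import errno

def report_errno(nbytes_written, nbytes_requested, saved_errno):
    """Decide what errno to report after a write() call.

    A short write with errno still 0 would otherwise surface as the
    misleading '...: Success'; following the convention above, it is
    reported as ENOSPC instead.  A real errno is passed through."""
    if nbytes_written != nbytes_requested and saved_errno == 0:
        return errno.ENOSPC
    return saved_errno
```

This mirrors the `if (errno == 0) errno = ENOSPC;` hunk in the patch: only the ambiguous short-write case is rewritten, so genuine I/O errors keep their original code.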
Re: [HACKERS] Fast insertion indexes: why no developments
Simon Riggs wrote: Can you CLUSTER against a minmax index? Not in this release, at least in my understanding. It's not yet possible to do an ordered fetch, so the cluster scan probably won't work. As per the patch I helped write, CLUSTER should use the sequential heap scan+sort when it makes sense. So I think that if the index is not able to do an ordered fetch, CLUSTER should fall back to scan+sort automatically (which is what you want in a large table anyway). Obviously, that should be tested.
Re: [HACKERS] additional json functionality
On 11/13/2013 09:45 AM, Merlin Moncure wrote: On Wed, Nov 13, 2013 at 1:33 AM, Craig Ringer cr...@2ndquadrant.com wrote: On 11/13/2013 02:34 AM, Andrew Dunstan wrote: If there's agreement on taking these, I will prepare patches and submit them by the 15th. With JSON enhancement, my only concern is that there's work ongoing to integrate the v2 development version of hstore with json, providing typed hstore and an efficient binary storage format for json. It might be worth seeing how that work is going and what functionality needs to be added to it, rather than enhancing the existing json support that may soon change dramatically. I'm going to be fairly upset if I'm told I have to wait for the new format work, and then I'm later told it's too late to bring this into 9.4. I think these are really orthogonal issues. Adding a new serialization format (which I have been discussing with Oleg and Teodor, and which I hope to help in bringing to JSON) will make some things lots faster than they now are, and might make some things easier or possible where now they are hard or impossible, but it won't remove any functionality requirement. In particular, json_build, which lets you build up arbitrarily complex and irregular json in a way that's just not possible without using a PL right now, is quite an important requirement. I've given talks about it and JSON users have been quite excited by the possibilities it opens up. The patch for json_to_record is quite small (two functions), and it has the advantage that unlike the json_populate_record functions you don't need to have or create a named type to use it. I think that makes it worth having in itself. I'm not so sure we should require hstore to do things like build arbitrary json objects even though I agree that hstore will probably displace json for most cases where you want to store nested data (as opposed to (de-)serialize). I have no idea what this means.
The plan with the work that Oleg and Teodor are doing is to provide a set of common code that can be used by either a binary json representation (which will be able to be distinguished from a text representation, so there would be no pg_upgrade problems) or nested hstore. In effect, nested hstore and json would have pretty much identical capabilities, so using one over another should be largely a matter of preference rather than a forced choice. Frankly, I believe the audience for JSON is vastly larger, and I expect it to be the treeish data format of choice for almost all users. Andrew's patches just fill out a couple of missing cases that are handled in the existing API. Putting all the patches together, ISTM there might be a function or two too many. I'm not sure why the json_ prefix was abandoned for build_json_object and build_json_array. I'm quite happy to change it. Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format, What? There is no new serialization format. This is a way to generate a json object in the existing format from a one- or two-dimensional array of text, cf. the existing function hstore(text[]) = hstore. and I don't agree with the statement but it is the caller's responsibility to ensure that keys are not repeated. I think the caller should have no such responsibility. Keys should be able to be repeated. They can be repeated, as they can in the current json text format. However, the function makes no attempt to deal with repeated keys. If a key is repeated in the input it will be repeated in the output. In this respect it differs from the hstore function. Note too, that one effect of moving to a non-text representation of json will be that duplicated keys will be resolved (last value will win). But that's a much wider issue than this function. Also, I'm not sure how the {k,v,k,v,k,v} convention serialized into a string is very useful in general practice.
I greatly prefer the aggregation and the variadic methods in json_build. The extension was built before json_build. But it met a requirement that existed at the time. It probably wouldn't be a tragedy to leave it out, but there is probably a place for it just as there is for the hstore function. Putting it all together, I'd consider: *) dropping json_object (although maybe there is a case I'm not thinking about) *) changing json_build function names to get the json prefix *) adding a json object constructor that takes two parallel arrays as arguments. The third point seems to conflict with the first. I'd only consider that if we *do* add the one-array version of json_object. cheers andrew
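The duplicate-key point above is easy to demonstrate: a text json value can carry a repeated key verbatim, while anything that parses it into a keyed structure resolves the repeat with the last value winning. Python's json module happens to show exactly this behavior:

```python
import json

# A text json value can carry a repeated key verbatim...
text_json = '{"a": 1, "a": 2}'

def resolve_duplicates(doc):
    """...but parsing it into a keyed structure resolves the repeat:
    the last value wins, as it would in a non-text (tree-shaped)
    json representation of the kind discussed above."""
    return json.loads(doc)
```

So a function that merely emits text json can faithfully pass repeated keys through, whereas a binary representation forces a resolution policy.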
Re: [HACKERS] Clang 3.3 Analyzer Results
Tom Lane t...@sss.pgh.pa.us wrote: No, this isn't about test code vs production, it's about not bothering to free memory explicitly when a program is about to terminate. Alvaro is suggesting that the proposed addition to pg_regress.c is just a waste of cycles. IMO it's not that big a deal either way in this case, since it's just one line of code that isn't going to take too long. Right. IMV, it's easier in this case to silence the warnings for all future static code analysis runs, by this tool or any other, by fixing it rather than having this particular triviality resurface to annoy anyone in the future. Fix pushed to the master branch. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] alter_table regression test problem
On Mon, Nov 11, 2013 at 4:34 PM, Andres Freund and...@2ndquadrant.com wrote: I'm pretty sure that the current coding, which blows away the whole relation, is used in other places, and I really don't see why it should be fundamentally flawed, or why we should change it to clear the cache entries out one by one instead of en masse. RelidByRelfilenode definitely needs to use HASH_FIND rather than HASH_ENTER, so that part I agree with. It surely is possible to go that route, but imagine what happens if the heap_open() blows away the entire hash. We'd either need to recheck if the hash exists before entering or recreate it after dropping. It seemed simpler to follow attoptcache's example. I'm not sure if this is the best way forward, but I don't feel like arguing about it, either, so committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
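The HASH_FIND-vs-HASH_ENTER point can be illustrated with a dict-backed cache: an enter-style lookup creates the slot before the entry is valid, so a failure while filling it strands a bogus entry. This is a Python sketch of the hazard, not the relfilenode-cache code:

```python
def lookup_find(cache, key, load):
    """HASH_FIND-style lookup: insert only after the entry is fully
    built, so a failure inside load() leaves the cache untouched."""
    entry = cache.get(key)
    if entry is None:
        entry = load(key)      # may raise; no slot exists yet
        cache[key] = entry
    return entry

def lookup_enter(cache, key, load):
    """HASH_ENTER-style lookup: the slot is created first, so a failure
    inside load() strands a half-initialized (None) entry that later
    lookups will trip over."""
    if key not in cache:
        cache[key] = None      # slot exists before it is valid
        cache[key] = load(key)
    return cache[key]
```

With the find-style lookup, a cache miss that errors out (or that triggers a rebuild of the whole hash, as in the heap_open() scenario above) never leaves a partially filled entry behind.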
Re: [HACKERS] ERROR during end-of-xact/FATAL
On Tue, Nov 12, 2013 at 09:55:34AM -0500, Robert Haas wrote: On Fri, Nov 8, 2013 at 4:13 PM, Noah Misch n...@leadboat.com wrote: A PANIC will reinitialize everything relevant, largely resolving the problems around ERROR during FATAL. It's a heavy-handed solution, but it may well be the best solution. Efforts to harden CommitTransaction() and AbortTransaction() seem well-spent, but the additional effort to make FATAL exit cope where AbortTransaction() or another exit action could not cope seems to be slicing ever-smaller portions of additional robustness. I pondered a variant of that conclusion that distinguished critical cleanup needs from the rest. Each shared resource (heavyweight locks, buffer pins, LWLocks) would have an on_shmem_exit() callback that cleans up the resource under a critical section. (AtProcExit_Buffers() used to fill such a role, but resowner.c's work during AbortTransaction() has mostly supplanted it.) The ShutdownPostgres callback would not use a critical section, so lesser failures in AbortTransaction() would not upgrade to a PANIC. But I'm leaning against such a complication on the grounds that it would add seldom-tested code paths posing as much a chance of eroding robustness as bolstering it. The current situation is just plain weird: in the ERROR-then-ERROR case, we emit a WARNING and bounce right back into AbortTransaction(), and if it doesn't work any better the second time than the first time, we recurse again, and eventually if it fails enough times in a row, we just give up and PANIC. But in the ERROR-then-FATAL case, we *don't* retry AbortTransaction(); instead, we just continue running the rest of the on_shmem_exit callbacks and then exit. So, in short, ERROR + ERROR*10 = PANIC, but FATAL + ERROR*10 = FATAL. That's bizarre. Quite so. 
Given that that's where we are, promoting an ERROR during FATAL processing to PANIC doesn't seem like it's losing much; we're essentially already doing that in the (probably more likely) case of a persistent ERROR during ERROR processing. But since PANIC sucks, I'd rather go the other direction: let's make an ERROR during ERROR processing promote to FATAL. And then let's do what you write above: make sure that there's a separate on-shmem-exit callback for each critical shared memory resource and that we call all of those during FATAL processing. Many of the factors that can cause AbortTransaction() to fail can also cause CommitTransaction() to fail, and those would still PANIC if the transaction had an xid. How practical might it be to also escape from an error during CommitTransaction() with a FATAL instead of PANIC? There's more to fix up in that case (sinval, NOTIFY), but it may be within reach. If such a technique can only reasonably fix abort, though, I have doubts it buys us enough. It seems to me that that's how things were originally designed to work, but that we've drifted away from it basically because the low-level callbacks to release heavyweight locks and buffer pins turned out to be kinda, uh, slow, and we thought those code paths couldn't be taken anyway (turns out they can). I think we could either make those routines very fast, or arrange only to run that code at all in the case where AbortTransaction() didn't complete successfully. Agreed; the performance hazards look tractable. It's true that such code will be rarely run, but the logic is simple enough that I think we can verify it by hand, and it's sure nice to avoid PANICs. True. -- Noah Misch EnterpriseDB http://www.enterprisedb.com
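The ERROR-then-ERROR behavior summarized above ("ERROR + ERROR*10 = PANIC") can be sketched as a retry-with-escalation loop. This is illustrative Python with hypothetical names, not backend code:

```python
MAX_ABORT_RETRIES = 10

def run_abort_with_escalation(abort_fn):
    """Sketch of the ERROR-during-ERROR path: each failure of the abort
    routine emits a WARNING and bounces back into it; once the retry
    budget is exhausted, give up and escalate to PANIC."""
    for _attempt in range(MAX_ABORT_RETRIES):
        try:
            abort_fn()
            return "ABORTED"
        except Exception:
            continue  # stands in for: emit WARNING, retry AbortTransaction()
    return "PANIC"
```

The asymmetry discussed in the thread is that the FATAL path never re-enters this loop at all, so a persistent failure ends as FATAL rather than PANIC.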
Re: [HACKERS] logical changeset generation v6.6
On 2013-11-12 19:24:39 +0100, Andres Freund wrote: On 2013-11-12 13:18:19 -0500, Robert Haas wrote: On Tue, Nov 12, 2013 at 12:50 PM, Andres Freund and...@2ndquadrant.com wrote: Completely agreed. As evidenced by the fact that the current change doesn't update all relevant comments and code. I wonder if we shouldn't leave the function the current way and just add a new function for the new behaviour. The hard thing with that would be coming up with a new name. IsSystemRelationId() having a different behaviour than IsSystemRelation() seems strange to me, so just keeping that and adapting the callers seems wrong to me. IsInternalRelation()? IsCatalogRelation()? Well, I went through and looked at the places that were affected by this and I tend to think that most places will be happier with the new definition. I agree that many if not most want the new definition. If there are call sites that want the existing test, maybe we should have IsRelationInSystemNamespace() for that, and reserve IsSystemRelation() for the test as to whether it's a bona fide system catalog. The big reason that I think we do not want the new behaviour for all is:

 * NB: TOAST relations are considered system relations by this test
 * for compatibility with the old IsSystemRelationName function.
 * This is appropriate in many places but not all. Where it's not,
 * also check IsToastRelation.

The current state of things would allow modifying toast relations in some places :/ So, I think I found a useful definition of IsSystemRelation() that fixes many of the issues with moving relations to pg_catalog: Continue to treat all pg_toast.* relations as system tables, but only consider initdb-created relations in pg_class. I've then added IsCatalogRelation() which has a narrower definition of system relations, namely, it only counts toast tables if they are a catalog's toast table. This allows far more actions on user defined relations moved to pg_catalog.
Now they aren't stuck there anymore and can be renamed, dropped et al. With one curious exception: We still cannot move a relation out of pg_catalog. I've included a hunk to allow creation of indexes on relations in pg_catalog in heap_create(), indexes on catalog relations are prevented way above, but maybe that should rather be a separate commit. What do you think? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services

From 91a22bd7fb998b609e73a69b941999596ce4569f Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Wed, 13 Nov 2013 16:15:16 +0100
Subject: [PATCH] Don't regard user defined relations in pg_catalog as system tables.

This allows for a more consistent behaviour of user defined relations in pg_catalog. As before it's not possible to create relations in pg_catalog, but now they can, after being moved there, be manipulated normally. Before it was impossible to move or drop user defined relations in pg_catalog, now that's allowed. To that end, modify IsSystemRelation/Class() to not regard user defined tables in pg_catalog as system relations and add IsCatalogRelation/Class() which don't regard toast relations as catalog relations unless they are a catalog relation's toast table. This is also preparation for logical decoding, which needs IsCatalogRelation().
---
 src/backend/access/heap/heapam.c          |  2 +-
 src/backend/catalog/aclchk.c              |  2 +-
 src/backend/catalog/catalog.c             | 69 ++-
 src/backend/catalog/heap.c                | 11 -
 src/backend/commands/cluster.c            |  2 +-
 src/backend/commands/indexcmds.c          |  5 ++-
 src/backend/commands/tablecmds.c          |  8 ++--
 src/backend/commands/trigger.c            |  2 +-
 src/backend/optimizer/util/plancat.c      |  2 +-
 src/backend/rewrite/rewriteDefine.c       |  4 +-
 src/backend/tcop/utility.c                |  2 +-
 src/include/catalog/catalog.h             |  4 +-
 src/test/regress/expected/alter_table.out | 34 +++
 src/test/regress/sql/alter_table.sql      | 28 +
 14 files changed, 148 insertions(+), 27 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index a21f31b..bbb71c7 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2465,7 +2465,7 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 	 * because the heaptuples data structure is all in local memory, not in
 	 * the shared buffer.
 	 */
-	if (IsSystemRelation(relation))
+	if (IsCatalogRelation(relation))
 	{
 		for (i = 0; i < ntuples; i++)
 			CacheInvalidateHeapTuple(relation, heaptuples[i], NULL);
diff --git a/src/backend/catalog/aclchk.c b/src/backend/catalog/aclchk.c
index 06aa766..5a46fd9 100644
--- a/src/backend/catalog/aclchk.c
+++
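The two classifications the patch describes can be restated as a truth-table sketch. This is schematic Python restating the definitions above, not the C implementation:

```python
def is_system_relation(namespace, created_by_initdb, is_toast):
    """Sketch of the revised IsSystemRelation(): every pg_toast.*
    relation still counts as a system relation, but a relation in
    pg_catalog counts only if initdb created it."""
    if is_toast:
        return True
    return namespace == "pg_catalog" and created_by_initdb

def is_catalog_relation(namespace, created_by_initdb, is_toast,
                        toast_of_catalog=False):
    """Sketch of the narrower IsCatalogRelation(): a toast table
    counts only when it is a genuine catalog's toast table."""
    if is_toast:
        return toast_of_catalog
    return namespace == "pg_catalog" and created_by_initdb
```

Under these definitions a user table moved into pg_catalog is no longer a system relation, which is what allows it to be renamed or dropped again.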
Re: [HACKERS] Re: Exempting superuser from row-security isn't enough. Run predicates as DEFINER?
On Mon, Nov 11, 2013 at 11:10 PM, Craig Ringer cr...@2ndquadrant.com wrote: On 11/11/2013 06:37 PM, Kohei KaiGai wrote: I'd like to vote the last options. It is a separate problem (or, might be specification), I think. I tend to agree, but I'm nervous about entirely hand-waving around this, as doing so would *expand* the existing problem. Suppose we define a new GUC, allow_row_level_security, which defaults to true. When set to false, any attempt to access a table protected with RLS will either (1) bypass RLS, if you have sufficient privileges to do that (presumably table owner and superuser, at least, would be sufficient) or (2) fail with an error if RLS cannot be bypassed. But, when allow_row_level_security is false, *under no circumstances* will we evaluate RLS quals - it's bypass-or-error. Then, we can teach pg_dump to set allow_row_level_security = false on server versions >= 9.4, with an option --allow-row-level-security that bypasses this behavior. With these changes, pg_dump is safe by default, not only against hijacking attacks but against accidentally failing to dump all the data because you fail to realize that you're subject to an RLS qual. You'll either get a clean, restorable dump, or you'll fail with an easy-to-understand error. In the latter case, if you want to try to back up that portion of the table you can access, there's an option for that behavior, which can be documented to imply trust in the table owner. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
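The proposed GUC semantics reduce to a small decision table. A Python sketch of the behavior described above (illustrative only, not actual code):

```python
def rls_behavior(table_has_rls, can_bypass, allow_row_level_security=True):
    """Sketch of the proposed allow_row_level_security semantics:
    when the GUC is off, RLS quals are never evaluated -- access is
    either a full bypass (owner/superuser) or an outright error."""
    if not table_has_rls:
        return "normal"
    if allow_row_level_security:
        return "apply-quals"
    return "bypass" if can_bypass else "error"
```

pg_dump running with the GUC off therefore produces either a complete dump or a clear failure, never a silently filtered one.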
Re: [HACKERS] additional json functionality
On Wed, Nov 13, 2013 at 9:32 AM, Andrew Dunstan and...@dunslane.net wrote: I'm not so sure we should require hstore to do things like build arbitrary json objects even though I agree that hstore will probably displace json for most cases where you want to store nested data (as opposed to (de-)serialize). I have no idea what this means. What I'm saying there is I agree: what's going on with hstore is not relevant here. json features currently provide standalone serialization and deserialization minus a couple of edge cases that you are fixing up here. Hstore will emerge as a json manipulation engine. Aside: I thought we blew it (as you know) by not unifying the hstore and json APIs in the 9.2 cycle and now with the emerging json stuff I really think so... it's way too late to do anything about it now even if there was consensus on that point. Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format, What? there is no new serialization format. This is a way to generate a json object in the existing format from a one or two dimensional array of text, cf. the existing function hstore(text[]) = hstore. Right -- I thought it took text, not text[] -- withdrawn. I consider a variant taking (text[], text[]) to be generally more practical than the one-argument version (either 1d or 2d variant). Dealing with 2d arrays is a headache unless you really know what you're doing. and I don't agree with the statement but it is the caller's responsibility to ensure that keys are not repeated. I think the caller should have no such responsibility. Keys should be able to be repeated. They can be repeated, as they can in the current json text format. However, the function makes no attempt to deal with repeated keys. If a key is repeated in the input it will be repeated in the output. In this respect it differs from the hstore function. Yes.
and I think this is one of the major advantages of the json API vs hstore: you can serialize objects that hstore cannot -- at least not without extra scaffolding (at least, AIUI, I haven't fully grappled with the coming hstore stuff yet). In other words, just because key order and cardinality is unimportant in an associative array, it does not in any way follow it is similarly unimportant for object serialization. Note too, that one effect of moving to a non-text representation of json will be that duplicated keys will be resolved (last value will win). But that's a much wider issue than this function. Right, exactly. And I think this is a pretty serious problem with 'non text json' unless there is a relatively robust and fast process to recompose the json properly for serialization purposes (but that's mostly off topic for your proposed patch). Also, I'm not sure how the {k,v,k,v,k,v} convention serialized into a string is very useful in general practice. I greatly prefer the aggregation and the variadic methods in json_build. The extension was built before json_build. But it met a requirement that existed at the time. It probably wouldn't be a tragedy to leave it out, but there is probably a place for it just as there is for the hstore function. Putting it all together, I'd consider: *) dropping json_object (although maybe there is a case I'm not thinking about) *) changing json_build function names to get the json prefix *) adding a json object constructor that takes two parallel arrays as arguments. The third point seems to conflict with the first. I'd only consider that if we *do* add the one-array version of json_object. ok, agreed. so now I'm just saying to unify function names under the json_ prefix and maybe add a (text[], text[]) variant for the object builder, or maybe just drop json_object completely. merlin
Re: [HACKERS] Errors on missing pg_subtrans/ files with 9.3
Looks like we got another set of errors overnight. Here's the log file from the errors. (Log file scrubbed slightly to remove private data, but still representative of the problem I believe.)

Nov 13 05:34:34 dev postgres[6084]: [4-1] user=dev,db=dev ERROR: could not access status of transaction 6337381
Nov 13 05:34:34 dev postgres[6084]: [4-2] user=dev,db=dev DETAIL: Could not open file pg_subtrans/0060: No such file or directory.
Nov 13 05:34:34 dev postgres[6084]: [4-3] user=dev,db=dev CONTEXT: SQL statement SELECT 1 FROM ONLY typhon.collection_batches x WHERE id OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
Nov 13 05:34:34 dev postgres[6084]: [4-4] user=dev,db=dev STATEMENT: update listings set deactivated_at=$1 where id=$2 and lock_version=$3
Nov 13 05:34:34 dev postgres[6076]: [4-1] user=dev,db=dev ERROR: could not access status of transaction 6337381
Nov 13 05:34:34 dev postgres[6076]: [4-2] user=dev,db=dev DETAIL: Could not open file pg_subtrans/0060: No such file or directory.
Nov 13 05:34:34 dev postgres[6076]: [4-3] user=dev,db=dev CONTEXT: SQL statement SELECT 1 FROM ONLY typhon.collection_batches x WHERE id OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
Nov 13 05:34:34 dev postgres[6076]: [4-4] user=dev,db=dev STATEMENT: update listings set deactivated_at=$1 where id=$2 and lock_version=$3
Nov 13 05:34:34 dev postgres[6087]: [4-1] user=dev,db=dev ERROR: could not access status of transaction 6337381
Nov 13 05:34:34 dev postgres[6087]: [4-2] user=dev,db=dev DETAIL: Could not open file pg_subtrans/0060: No such file or directory.
Nov 13 05:34:34 dev postgres[6087]: [4-3] user=dev,db=dev CONTEXT: SQL statement SELECT 1 FROM ONLY typhon.collection_batches x WHERE id OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
Nov 13 05:34:34 dev postgres[6087]: [4-4] user=dev,db=dev STATEMENT: update listings set deactivated_at=$1 where id=$2 and lock_version=$3
Nov 13 05:34:34 dev postgres[6086]: [4-1] user=dev,db=dev ERROR: could not access status of transaction 6337381
Nov 13 05:34:34 dev postgres[6086]: [4-2] user=dev,db=dev DETAIL: Could not open file pg_subtrans/0060: No such file or directory.
Nov 13 05:34:34 dev postgres[6086]: [4-3] user=dev,db=dev CONTEXT: SQL statement SELECT 1 FROM ONLY typhon.collection_batches x WHERE id OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
Nov 13 05:34:34 dev postgres[6086]: [4-4] user=dev,db=dev STATEMENT: update listings set deactivated_at=$1 where id=$2 and lock_version=$3
Nov 13 05:34:34 dev postgres[6088]: [4-1] user=dev,db=dev ERROR: could not access status of transaction 6337381
Nov 13 05:34:34 dev postgres[6088]: [4-2] user=dev,db=dev DETAIL: Could not open file pg_subtrans/0060: No such file or directory.
Nov 13 05:34:34 dev postgres[6088]: [4-3] user=dev,db=dev CONTEXT: SQL statement SELECT 1 FROM ONLY typhon.collection_batches x WHERE id OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
Nov 13 05:34:34 dev postgres[6088]: [4-4] user=dev,db=dev STATEMENT: update listings set deactivated_at=$1 where id=$2 and lock_version=$3
Nov 13 05:34:34 dev postgres[6085]: [4-1] user=dev,db=dev ERROR: could not access status of transaction 6337381
Nov 13 05:34:34 dev postgres[6085]: [4-2] user=dev,db=dev DETAIL: Could not open file pg_subtrans/0060: No such file or directory.
Nov 13 05:34:34 dev postgres[6085]: [4-3] user=dev,db=dev CONTEXT: SQL statement SELECT 1 FROM ONLY typhon.collection_batches x WHERE id OPERATOR(pg_catalog.=) $1 FOR KEY SHARE OF x
Nov 13 05:34:34 dev postgres[6085]: [4-4] user=dev,db=dev STATEMENT: update listings set deactivated_at=$1 where id=$2 and lock_version=$3

Several processes all seemed to hit the problem at the same moment, and all of them refer to the same transaction ID. Again, the file pg_subtrans/0060 doesn't exist, and the only file that does exist is pg_subtrans/005A which appears to be a zeroed-out file 245760 bytes in length. Still don't have a clue as to how I can reproduce the problem. It seems that in all cases the error occurred during either an UPDATE to a table_X or an INSERT to table_Y. In all cases, the error occurred in a manner identical to those shown in the log above, the only difference being either an UPDATE on table_X or an INSERT on table_Y. Not sure what direction I should head in now. Perhaps some aggressive logging would help, so we can see the queries surrounding the problems? I could reconfigure things to capture all statements and set up monit or something to send an alert when the problem resurfaces, for instance. Cheers all.
Re: [HACKERS] Clang 3.3 Analyzer Results
Kevin Grittner kgri...@ymail.com writes: If nobody objects, I'll fix that small memory leak in the regression test driver. Hopefully someone more familiar with pg_basebackup will fix the double-free (and related problems mentioned by Tom) in streamutil.c. Here's a less convoluted (IMHO) approach to the password management logic in streamutil.c. One thing I really didn't care for about the existing coding is that the loop-for-password included all the rest of the function, even though there's no intention to retry for any purpose except collecting a password. So I moved up the bottom of the loop. For ease of review, I've not reindented the code below the new loop bottom, but would do so before committing. Any objections to this version? regards, tom lane diff --git a/src/bin/pg_basebackup/streamutil.c b/src/bin/pg_basebackup/streamutil.c index 1dfb80f..c89fca9 100644 *** a/src/bin/pg_basebackup/streamutil.c --- b/src/bin/pg_basebackup/streamutil.c *** GetConnection(void) *** 40,47 int i; const char **keywords; const char **values; - char *password = NULL; const char *tmpparam; PQconninfoOption *conn_opts = NULL; PQconninfoOption *conn_opt; char *err_msg = NULL; --- 40,47 int i; const char **keywords; const char **values; const char *tmpparam; + bool need_password; PQconninfoOption *conn_opts = NULL; PQconninfoOption *conn_opt; char *err_msg = NULL; *** GetConnection(void) *** 114,140 i++; } while (true) { ! if (password) ! free(password); if (dbpassword) { - /* - * We've saved a password when a previous connection succeeded, - * meaning this is the call for a second session to the same - * database, so just forcibly reuse that password. - */ keywords[i] = password; values[i] = dbpassword; - dbgetpassword = -1; /* Don't try again if this fails */ } ! else if (dbgetpassword == 1) { ! password = simple_prompt(_(Password: ), 100, false); ! keywords[i] = password; ! 
values[i] = password; } tmpconn = PQconnectdbParams(keywords, values, true); --- 114,143 i++; } + /* If -W was given, force prompt for password, but only the first time */ + need_password = (dbgetpassword == 1 dbpassword == NULL); + while (true) { ! /* Get a new password if appropriate */ ! if (need_password) ! { ! if (dbpassword) ! free(dbpassword); ! dbpassword = simple_prompt(_(Password: ), 100, false); ! need_password = false; ! } + /* Use (or reuse, on a subsequent connection) password if we have it */ if (dbpassword) { keywords[i] = password; values[i] = dbpassword; } ! else { ! keywords[i] = NULL; ! values[i] = NULL; } tmpconn = PQconnectdbParams(keywords, values, true); *** GetConnection(void) *** 150,163 exit(1); } if (PQstatus(tmpconn) == CONNECTION_BAD PQconnectionNeedsPassword(tmpconn) dbgetpassword != -1) { - dbgetpassword = 1; /* ask for password next time */ PQfinish(tmpconn); ! continue; } if (PQstatus(tmpconn) != CONNECTION_OK) { --- 153,169 exit(1); } + /* If we need a password and -w wasn't given, loop back and get one */ if (PQstatus(tmpconn) == CONNECTION_BAD PQconnectionNeedsPassword(tmpconn) dbgetpassword != -1) { PQfinish(tmpconn); ! need_password = true; } + else + break; + } if (PQstatus(tmpconn) != CONNECTION_OK) { *** GetConnection(void) *** 204,212 exit(1); } - /* Store the password for next run */ - if (password) - dbpassword = password; return tmpconn; - } } --- 210,214 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] pg_basebackup: progress report max once per second
Prevent excessive progress reporting that can grow to gigabytes of output with large databases.
---
 src/bin/pg_basebackup/pg_basebackup.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index a1e12a8..90c4683 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -45,6 +45,7 @@
 bool	streamwal = false;
 bool	fastcheckpoint = false;
 bool	writerecoveryconf = false;
 int		standby_message_timeout = 10 * 1000;	/* 10 sec = default */
+int		last_progress_report = 0;
 
 /* Progress counters */
 static uint64 totalsize;
@@ -74,7 +75,7 @@
 static PQExpBuffer recoveryconfcontents = NULL;
 
 /* Function headers */
 static void usage(void);
 static void verify_dir_is_empty_or_create(char *dirname);
-static void progress_report(int tablespacenum, const char *filename);
+static void progress_report(int tablespacenum, const char *filename, int force);
 
 static void ReceiveTarFile(PGconn *conn, PGresult *res, int rownum);
 static void ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum);
@@ -399,12 +400,15 @@ verify_dir_is_empty_or_create(char *dirname)
  * is enabled, also print the current file name.
  */
 static void
-progress_report(int tablespacenum, const char *filename)
+progress_report(int tablespacenum, const char *filename, int force)
 {
 	int			percent = (int) ((totaldone / 1024) * 100 / totalsize);
 	char		totaldone_str[32];
 	char		totalsize_str[32];
 
+	if (!showprogress || (time(NULL) == last_progress_report && !force))
+		return;					/* Max once per second */
+	last_progress_report = time(NULL);
+
 	/*
 	 * Avoid overflowing past 100% or the full size. This may make the total
 	 * size number change as we approach the end of the backup (the estimate
@@ -850,9 +854,9 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 			}
 		}
 		totaldone += r;
-		if (showprogress)
-			progress_report(rownum, filename);
+		progress_report(rownum, filename, 0);
 	}							/* while (1) */
+	progress_report(rownum, filename, 1);
 
 	if (copybuf != NULL)
 		PQfreemem(copybuf);
@@ -1073,8 +1077,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
 				disconnect_and_exit(1);
 			}
 			totaldone += r;
-			if (showprogress)
-				progress_report(rownum, filename);
+			progress_report(rownum, filename, 0);
 
 			current_len_left -= r;
 			if (current_len_left == 0 && current_padding == 0)
@@ -1090,6 +1093,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
 			}
 		}						/* continuing data in existing file */
 	}							/* loop over all data blocks */
+	progress_report(rownum, filename, 1);
 
 	if (file != NULL)
 	{
@@ -1450,8 +1454,7 @@ BaseBackup(void)
 	tablespacecount = PQntuples(res);
 	for (i = 0; i < PQntuples(res); i++)
 	{
-		if (showprogress)
-			totalsize += atol(PQgetvalue(res, i, 2));
+		totalsize += atol(PQgetvalue(res, i, 2));
 
 		/*
 		 * Verify tablespace directories are empty. Don't bother with the
@@ -1498,7 +1501,7 @@ BaseBackup(void)
 
 	if (showprogress)
 	{
-		progress_report(PQntuples(res), NULL);
+		progress_report(PQntuples(res), NULL, 1);
 		fprintf(stderr, "\n");	/* Need to move to next line */
 	}
 	PQclear(res);
-- 
1.8.4.2
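The gating rule the patch introduces is small enough to model as a standalone function. The sketch below (Python, with an injected clock value instead of time(NULL); names mirror but are not taken verbatim from the patch) shows the intended behavior: within one wall-clock second a burst of data chunks produces at most one report, while a forced call at the end of a tablespace or backup always reports.

```python
# Model of the patch's throttling rule: report only when the wall-clock
# second has changed since the last report, unless forced. Variable and
# function names are illustrative, not from pg_basebackup itself.

last_progress_report = 0


def should_report(now: int, force: bool, showprogress: bool = True) -> bool:
    """Return True when a progress line should be emitted at time `now`."""
    global last_progress_report
    if not showprogress or (now == last_progress_report and not force):
        return False
    last_progress_report = now
    return True


# A burst of chunk arrivals within the same second yields a single report...
print([should_report(100, False) for _ in range(3)])  # -> [True, False, False]
# ...but a forced call (end of tablespace / end of backup) always reports,
# so the final totals are never silently dropped.
print(should_report(100, True))                       # -> True
```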
Re: [HACKERS] additional json functionality
On 11/13/2013 11:37 AM, Merlin Moncure wrote: Yes. and I think this is one of the major advantages of the json API vs hstore: you can serialize objects that hstore cannot -- at least not without extra scaffolding (at least, AIUI, I haven't fully grappled with the coming hstore stuff yet). In other words, just because key order and cardinality is unimportant in an associative array, it does not in any way follow it is similarly unimportant for object serialization. I think you're probably going to lose any argument that says we should necessarily preserve key order (and possibly key duplication) in objects. The standard doesn't support such a contention, either: An object is an unordered collection of zero or more name/value pairs ... The names within an object SHOULD be unique. Forcing us to preserve order and key duplication would be a pretty effective barrier to any performance improvements. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] nested hstore patch
On 11/12/13, 1:35 PM, Teodor Sigaev wrote: Attached patch adds nesting feature, types (string, bool and numeric values), arrays and scalar to hstore type. Could you check your email client for next time? It's sending Content-Type: application/x-tar for a *.patch.gz file.
Re: [HACKERS] Transaction-lifespan memory leak with plpgsql DO blocks
Robert Haas robertmh...@gmail.com writes: On Tue, Nov 12, 2013 at 11:18 AM, Tom Lane t...@sss.pgh.pa.us wrote: Or we could say what the heck are you doing executing tens of thousands of DO blocks? Make it into a real live function; you'll save a lot of cycles on parsing costs. I'm not sure that this is a usage pattern we ought to be optimizing for. I'm not volunteering to spend time fixing this, but I disagree with the premise that it isn't worth fixing, because I think it's a POLA violation. Yeah, I'm not terribly comfortable with letting it go either. Attached is a proposed patch. I couldn't see any nice way to do it without adding a field to PLpgSQL_execstate, so this isn't a feasible solution for back-patching (it'd break the plpgsql debugger). However, given the infrequency of complaints, I think fixing it in 9.4 and up is good enough. I checked that this eliminates the memory leak using this test case: do $outer$ begin for i in 1..100 loop execute $e$ do $$ declare x int = 0; begin x := x + 1; end; $$; $e$; end loop; end; $outer$; which eats a couple GB in HEAD and nothing with the patch. The run time seems to be the same or a bit less, too. Any objections to applying this to HEAD? regards, tom lane diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index bc31fe9..76da842 100644 *** a/src/pl/plpgsql/src/pl_exec.c --- b/src/pl/plpgsql/src/pl_exec.c *** typedef struct *** 50,81 bool *freevals; /* which arguments are pfree-able */ } PreparedParamsData; ! /* ! * All plpgsql function executions within a single transaction share the same ! * executor EState for evaluating simple expressions. Each function call ! * creates its own eval_econtext ExprContext within this estate for ! * per-evaluation workspace. eval_econtext is freed at normal function exit, ! * and the EState is freed at transaction end (in case of error, we assume ! * that the abort mechanisms clean it all up). Furthermore, any exception ! 
* block within a function has to have its own eval_econtext separate from ! * the containing function's, so that we can clean up ExprContext callbacks ! * properly at subtransaction exit. We maintain a stack that tracks the ! * individual econtexts so that we can clean up correctly at subxact exit. ! * ! * This arrangement is a bit tedious to maintain, but it's worth the trouble ! * so that we don't have to re-prepare simple expressions on each trip through ! * a function. (We assume the case to optimize is many repetitions of a ! * function within a transaction.) ! */ ! typedef struct SimpleEcontextStackEntry ! { ! ExprContext *stack_econtext; /* a stacked econtext */ ! SubTransactionId xact_subxid; /* ID for current subxact */ ! struct SimpleEcontextStackEntry *next; /* next stack entry up */ ! } SimpleEcontextStackEntry; ! ! static EState *simple_eval_estate = NULL; ! static SimpleEcontextStackEntry *simple_econtext_stack = NULL; / * Local function forward declarations --- 50,57 bool *freevals; /* which arguments are pfree-able */ } PreparedParamsData; ! /* Shared simple-expression eval context for regular plpgsql functions */ ! static SimpleEvalContext simple_eval_context = {NULL, NULL}; / * Local function forward declarations *** static int exec_stmt_dynfors(PLpgSQL_exe *** 136,142 static void plpgsql_estate_setup(PLpgSQL_execstate *estate, PLpgSQL_function *func, ! ReturnSetInfo *rsi); static void exec_eval_cleanup(PLpgSQL_execstate *estate); static void exec_prepare_plan(PLpgSQL_execstate *estate, --- 112,119 static void plpgsql_estate_setup(PLpgSQL_execstate *estate, PLpgSQL_function *func, ! ReturnSetInfo *rsi, ! SimpleEvalContext *simple_context); static void exec_eval_cleanup(PLpgSQL_execstate *estate); static void exec_prepare_plan(PLpgSQL_execstate *estate, *** static char *format_preparedparamsdata(P *** 230,239 /* -- * plpgsql_exec_function Called by the call handler for *function execution. * -- */ Datum ! 
plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo fcinfo) { PLpgSQL_execstate estate; ErrorContextCallback plerrcontext; --- 207,222 /* -- * plpgsql_exec_function Called by the call handler for *function execution. + * + * This is also used to execute inline code blocks (DO blocks). The only + * difference that this code is aware of is that for a DO block, we want + * to use a private SimpleEvalContext, whose address must be passed as + * simple_context. For regular functions, pass NULL. * -- */ Datum ! plpgsql_exec_function(PLpgSQL_function *func, FunctionCallInfo
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
On Tue, Nov 12, 2013 at 03:57:52PM +0900, Tatsuo Ishii wrote: I have been thinking about this for years, and I think the key idea here is implementing a universal encoding. The universal encoding should have the following characteristics to implement N2 encoding in a database: 1) no loss in round-trip encoding conversion; 2) no mapping table is necessary to convert from/to existing encodings. Once we implement the universal encoding, other problems such as the pg_database-with-multiple-encodings problem can be solved easily. Isn't this essentially what the MULE internal encoding is? Currently there's no such universal encoding in the universe; I think the only way is inventing it by ourselves. This sounds like a terrible idea. In the future people are only going to want more advanced text functions, regular expressions, and indexing, and making encodings that don't exist anywhere else seems like a way to make a lot of work for little benefit. A better idea, it seems to me, is to (if postgres is configured properly) embed the non-round-trippable characters in the custom character part of the Unicode character set. In other words, adjust the mapping tables on demand and voila. Have a nice day, -- Martijn van Oosterhout klep...@svana.org http://svana.org/kleptog/ He who writes carelessly confesses thereby at the very outset that he does not attach much importance to his own thoughts. -- Arthur Schopenhauer
[HACKERS] pg_upgrade rebuild_tsvector_tables.sql includes child table columns
When pg_upgrade generates a rebuild_tsvector_tables.sql script to rewrite tsvector columns, it includes ALTER TABLE commands for child tables that cannot be altered independently from their parents: psql:rebuild_tsvector_tables.sql:673: ERROR: cannot alter inherited column xxx This isn't a problem unless you run the script in single-transaction mode, because the commands will just fail and the work has already been done in the parent table. It's probably not worth fixing this, given that this only applies to upgrades from 8.3, but I wanted to get it into the archives. If we ever need to do something like this again, we should make it smarter. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
Martijn van Oosterhout klep...@svana.org writes: On Tue, Nov 12, 2013 at 03:57:52PM +0900, Tatsuo Ishii wrote: Once we implement the universal encoding, other problems such as the pg_database-with-multiple-encodings problem can be solved easily. Isn't this essentially what the MULE internal encoding is? MULE is completely evil. It has N different encodings for the same character, not to mention no support code available. Currently there's no such universal encoding in the universe; I think the only way is inventing it by ourselves. This sounds like a terrible idea. In the future people are only going to want more advanced text functions, regular expressions, and indexing, and making encodings that don't exist anywhere else seems like a way to make a lot of work for little benefit. Agreed. A better idea, it seems to me, is to (if postgres is configured properly) embed the non-round-trippable characters in the custom character part of the Unicode character set. In other words, adjust the mapping tables on demand and voila. From the standpoint of what will happen with existing library code (like strcoll), I'm not sure it's all that easy. regards, tom lane
Re: [HACKERS] hail the CFM
On 14/11/13 02:45, Peter Eisentraut wrote: The commit fest manager mace has been passed on to me[*]. More to follow. [*] Actually, I found it behind the dumpster in the alley. Did you take care not to touch it with bare skin, and did you properly sterilize it? As a used mace, it may have picked up blood from people with diseases you'd rather not know about... Ensure that you know how to use it effectively, as PostgreSQL developers are a surly lot and respond well to a good bit of applied violence (just ask Tom Lane about his extensive list of fractured bones and other old war injuries). If you are not sure how to use it, there are plenty of sites on the web to help you, such as: http://www.lordsandladies.org/maces.htm Also ensure you apply for the correct licence from your local Police Station, as they will be very interested in how you intend to use such a weapon, so it is very important to consult them before you first use it! Cheers, Gavin (Who is now glad that he is not actually a PostgreSQL developer)
[HACKERS] Getting the clog bits for a particular xid
As part of doing some database corruption investigation, I'm trying to get the pg_clog bit pair for a particular transaction. Let's say we check on a particular tuple, and get:

SELECT xmin, id FROM mytable WHERE pk=4727366;
    xmin    |   id
------------+---------
 107898222  | 4727366

Each pg_clog file (at least, as of 9.3.1) is 256KB, so there are:

256KB * 8 bits/byte / 2 bits/transaction = 1M transactions per file

So:

107898222 / 1048576 = 102, or 0x0066
107898222 % 1048576 = 943470

So, we're looking at file 0x0066. It's the 943470th transaction in that file, or the 943470*2 = 1886940th bit. So (counting from the MSB being 0), it's the 4th and 5th bit of byte offset 235867 in that file. Is that correct? -- -- Christophe Pettus x...@thebuild.com
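The arithmetic in the message above can be checked mechanically. The sketch below (Python, illustrative only) assumes 9.3's clog layout of 2 status bits per transaction and 256 kB segment files, and reproduces the segment, byte offset, and bit shift for xid 107898222:

```python
# Sketch of the clog arithmetic above, assuming 2 status bits per
# transaction and 256 kB (262144-byte) pg_clog segment files.
# Illustrative only, not taken from clog.c.

CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_BYTE = 4
CLOG_XACTS_PER_SEGMENT = 256 * 1024 * CLOG_XACTS_PER_BYTE   # 1048576


def clog_location(xid: int):
    """Return (segment file name, byte offset, bit shift) for an xid."""
    segment = xid // CLOG_XACTS_PER_SEGMENT           # which pg_clog file
    offset_in_seg = xid % CLOG_XACTS_PER_SEGMENT      # xact index in file
    byteno = offset_in_seg // CLOG_XACTS_PER_BYTE     # byte within the file
    bshift = (offset_in_seg % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT
    return "%04X" % segment, byteno, bshift


print(clog_location(107898222))   # -> ('0066', 235867, 4)
```

One caveat on the MSB-counting in the question: if memory serves, clog.c extracts the status as (byte >> bshift) & 3, i.e. the bit pair is counted up from the least-significant end of the byte, so for this xid the pair sits 4 bits above the LSB of byte 235867. Worth double-checking against the source before drawing conclusions from a hex dump.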
Re: [HACKERS] additional json functionality
On Wed, Nov 13, 2013 at 1:25 PM, Andrew Dunstan and...@dunslane.net wrote: On 11/13/2013 11:37 AM, Merlin Moncure wrote: Yes. and I think this is one of the major advantages of the json API vs hstore: you can serialize objects that hstore cannot -- at least not without extra scaffolding (at least, AIUI, I haven't fully grappled with the coming hstore stuff yet). In other words, just because key order and cardinality is unimportant in an associative array, it does not in any way follow that it is similarly unimportant for object serialization. "An object is an unordered collection of zero or more name/value pairs ... The names within an object SHOULD be unique." Forcing us to preserve order and key duplication would be a pretty effective barrier to any performance improvements. SHOULD != MUST. Here is the definition of object per RFC 4627: "An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. The names within an object SHOULD be unique." And SHOULD means: "3. SHOULD. This word, or the adjective RECOMMENDED, mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course." As far as I'm concerned, that settles things right there. Beyond that (although they do say 'unordered' above), as a consequence of your argument the json strings {"a": 1, "b": 2} and {"b": 1, "a": 2} should be considered equivalent. Another consequence is that creating particular legal constructions should be discouraged. I disagree with this. This is simply not the case with many json consuming clients. It's a nice idea, but not how things work universally, and that's exactly why the rules were hedged in the RFC. I have a couple of cases right now where I'm producing key-order-sensitive json for some (admittedly not very well designed) json consuming clients that are out of my control. merlin
Re: [HACKERS] logical changeset generation v6.5
On 11/9/13, 5:56 AM, Andres Freund wrote: ISTM ecpg's regression tests should be built (not run!) during $(recurse), not just during make check. Actually, I made just the opposite change some years ago. The rationale is: the build builds that which you want to install.
Re: [HACKERS] Add min and max execute statement time in pg_stat_statement
On Wed, Oct 23, 2013 at 8:52 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote: Hmm, now if we had portable atomic addition, so that we could spare the spinlock ... That certainly seems like an interesting possibility. I think that pg_stat_statements should be made to do this kind of thing by a third party tool that aggregates snapshots of deltas. Time-series data, including (approximate) *local* minima and maxima should be built from that. I think tools like KONDO-san's pg_statsinfo tool have an important role to play here. I would like to see it or a similar tool become a kind of defacto standard for consuming pg_stat_statements' output. At this point we are in general very much chasing diminishing returns by adding new things to the counters struct, particularly given that it's currently protected by a spinlock. And adding a histogram or min/max for something like execution time isn't an approach that can be made to work for every existing cost tracked by pg_stat_statements. So, taking all that into consideration, I'm afraid this patch gets a -1 from me. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On 11/13/2013 06:45 AM, Merlin Moncure wrote: I'm not so sure we should require hstore to do things like build Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format, and I don't agree with the statement but it is the caller's reponsibility to ensure that keys are not repeated.. This is pretty standard in the programming languages I know of which use JSON. I think the caller should have no such responsibility. Keys should be able to repeated. Apparently your experience with using JSON in practice has been fairly different from mine; the projects I work on, the JSON is being constantly converted back and forth to hashes and dictionaries, which means that ordering is not preserved and keys have to be unique (or become unique within one conversion cycle). I think, based on the language of the RFC and common practice, that it's completely valid for us to require unique keys within JSON-manipulation routines. Certainly the upcoming binary storage is going to require unique keys. For that matter, both MongoDB and CouchDB store unique, unordered keys. And ever supporting CRUD functions (i.e. update this key) is going to require uniqueness. Putting it all together, I'd consider: *) dropping json_object (although maybe there is a case I'm not thinking about) *) changing json_build function names to get the json prefix *) adding a json object constructor that takes two parallel arrays as arguments. I was with you until the third idea. Huh? The scripting languages I use (Perl, Python) have functions which convert a list/array to a hash/dictionary. In each case, the standard input is a single list/array in the form [ k, v, k, v, k, v ]. Now, while there are standard language functions which support munging two parallel arrays into one hash (such as Python's zip()), these are less frequently used. 
Supporting the zip() option without supporting the [ k, v ] array option would be a bizarre and puzzling approach to most programmers I know. I can see three approaches which make sense: 1. we don't include json_object at all. 2. we include the existing json_object 3. we include json_object, plus a second json_object function which takes two arrays Keep in mind that all of Andrew's functions came out of real-life use cases of writing applications which return JSON to the caller, so they are based on real needs to fill holes in our JSON-building function library. In the case of json_object, the need was to supply column labels where, usually due to calculated columns, none exist in the input. Take the example where I want to return a bunch of aggregates from a table as a series of json objects with user-friendly labels: SELECT build_json_object( dept, department, total_costs, sum(costs), running_total, running_sum() ) FROM Where it becomes even more useful is when you want the json label to be the result of a calculated expression: SELECT build_json_object ( department, sum() ) Yes, you could do this with a two-array version as well; it's just not more intuitive, and in cases where you have dozens of columns, puts you in column-counting hell. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
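The two construction styles being debated above — a flat [k, v, k, v] list versus two zipped parallel arrays — can be sketched side by side in Python (the column names here are illustrative, not from any real schema):

```python
# The flat [k, v, k, v] form Josh describes: pairing an iterator with
# itself consumes the list two items at a time.
flat = ["department", "IT", "total_costs", 12500]
it = iter(flat)
print(dict(zip(it, it)))   # -> {'department': 'IT', 'total_costs': 12500}

# The two-parallel-arrays form (Python's zip() analogue):
keys = ["department", "total_costs"]
vals = ["IT", 12500]
print(dict(zip(keys, vals)))   # -> {'department': 'IT', 'total_costs': 12500}
```

Both reach the same dictionary; the argument in the thread is only about which input shape is more natural for SQL callers building JSON objects from expressions.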
Re: [HACKERS] additional json functionality
On 11/13/2013 04:58 PM, Merlin Moncure wrote: [...] SHOULD != MUST. [...] As far as I'm concerned, that settles things right there. Beyond that (although they do say 'unordered' above), as a consequence of your argument the json strings {"a": 1, "b": 2} and {"b": 1, "a": 2} should be considered equivalent. [...] I have a couple of cases right now where I'm producing key-order-sensitive json for some (admittedly not very well designed) json consuming clients that are out of my control. I understand the difference between "should" and "must". But there is nothing that REQUIRES us to preserve key order or duplicate keys. If you really need textual preservation, you should probably store the data as text and convert it to json to do json-ish things to it. If not, we're going to face huge demands to implement another type, which almost everyone but you will move to in rapid order because it performs so much better. The strong consensus I have seen in discussions at conferences and elsewhere is to go the way we're going, instead. cheers andrew
Re: [HACKERS] additional json functionality
On 14/11/13 11:33, Andrew Dunstan wrote: [...] I understand the difference between "should" and "must". But there is nothing that REQUIRES us to preserve key order or duplicate keys. If you really need textual preservation, you should probably store the data as text and convert it to json to do json-ish things to it. If not, we're going to face huge demands to implement another type, which almost everyone but you will move to in rapid order because it performs so much better. The strong consensus I have seen in discussions at conferences and elsewhere is to go the way we're going, instead. [...] I can see that both points of view are valid, in different contexts. Would it be possible to have a boolean, such as 'strict', so that unique/ordered handling was only imposed when strict was TRUE? Alternatively, separate functions to allow the same choice? Cheers, Gavin
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
Isn't this essentially what the MULE internal encoding is? No. MULE is not powerful enough and is overly complicated for dealing with different encodings (character sets). Currently there's no such universal encoding in the universe; I think the only way is to invent it ourselves. This sounds like a terrible idea. In the future people are only going to want more advanced text functions, regular expressions, indexing, and making encodings that don't exist anywhere else seems like a way to make a lot of work for little benefit. That is probably a misunderstanding. We don't need to modify existing text handling modules such as text functions, regular expressions, indexing etc. We just convert from the universal encoding X to the original encoding before calling them. The process is pretty easy and fast because it just requires skipping the encoding identifier and encoding length parts. Basically the encoding X should be used for lower layer modules of PostgreSQL, and higher layer modules such as those living in src/backend/utils/adt should not be aware of it. A better idea, it seems to me, is to (if postgres is configured properly) embed the non-round-trippable characters in the custom character part of the unicode character set. In other words, adjust the mapping tables on demand and voila. Using Unicode requires overhead for encoding conversion because it needs to look up mapping tables. That would be a huge handicap for large data, which is what I want to avoid in the first place. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus j...@agliodbs.com wrote: On 11/13/2013 06:45 AM, Merlin Moncure wrote: I'm not so sure we should require hstore to do things like build Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format, and I don't agree with the statement but it is the caller's responsibility to ensure that keys are not repeated.. This is pretty standard in the programming languages I know of which use JSON. I think the caller should have no such responsibility. Keys should be able to be repeated. Apparently your experience with using JSON in practice has been fairly different from mine; in the projects I work on, the JSON is being constantly converted back and forth to hashes and dictionaries, which means that ordering is not preserved and keys have to be unique (or become unique within one conversion cycle). I think, based on the language of the RFC and common practice, that it's completely valid for us to require unique keys within JSON-manipulation routines. Common practice? The internet is littered with complaints about documents being spontaneously re-ordered and/or de-duplicated in various stacks. Other stacks provide mechanisms for explicit key order handling (see here: http://docs.python.org/2/library/json.html). Why do you think they did that? I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument. The current json serialization API handles that just fine and the hstore stuff coming down the pike will not. I guess that's a done deal based on 'performance'. I'm clearly not the only one to have complained about this though. Merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
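[Editor's note: the Python mechanism Merlin alludes to is `object_pairs_hook`, which hands the parser's raw key/value pairs to the caller, so duplicate keys and document order survive the parse. A sketch:]

```python
import json

doc = '{"a": 1, "a": true, "a": "b"}'

# Default dict-based parse: later duplicates silently overwrite
# earlier ones, so only the last value survives.
print(json.loads(doc))                                  # {'a': 'b'}

# object_pairs_hook receives every pair in document order, letting the
# caller keep duplicates (or raise an error on them).
pairs = json.loads(doc, object_pairs_hook=lambda p: p)
print(pairs)                          # [('a', 1), ('a', True), ('a', 'b')]
```

This is exactly the kind of "explicit key order handling" escape hatch that a hash-based binary representation cannot offer after the fact.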
Re: [HACKERS] additional json functionality
On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus j...@agliodbs.com wrote: Putting it all together, I'd consider: *) dropping json_object (although maybe there is a case I'm not thinking about) *) changing json_build function names to get the json prefix *) adding a json object constructor that takes two parallel arrays as arguments. I was with you until the third idea. Huh? I actually had a use case for this today, though with hstore, importing a fixed-length record with something along the lines of: hstore( ARRAY['field 1', 'field 2', 'field 3'], regexp_matches(fixed_field,'(.{4})(.{10})(.{5})') ) Mike Blackwell | Technical Analyst, Distribution Services/Rollout Management | RR Donnelley 1750 Wallace Ave | St Charles, IL 60174-3401 Office: 630.313.7818 mike.blackw...@rrd.com http://www.rrdonnelley.com
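[Editor's note: for readers unfamiliar with the two-parallel-arrays constructor, the same fixed-width-record idea sketched in Python; the field names, widths, and sample record are made up for illustration:]

```python
import re

fixed_field = "ABCD0123456789XYZAB"   # hypothetical 4+10+5 byte record
keys = ["field 1", "field 2", "field 3"]

# Split the record into fixed-width captures, then zip the parallel
# key and value sequences into one mapping -- the analogue of
# hstore(keys_array, values_array).
m = re.match(r"(.{4})(.{10})(.{5})", fixed_field)
record = dict(zip(keys, m.groups()))
print(record)
# {'field 1': 'ABCD', 'field 2': '0123456789', 'field 3': 'XYZAB'}
```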
Re: [HACKERS] additional json functionality
Merlin, I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument. The current json serialization API handles that just fine and the hstore stuff coming down the pike will not. I guess that's a done deal based on 'performance'. I'm clearly not the only one to have complained about this though. It's not just a matter of performance. It's the basic conflict of JSON as document format vs. JSON as data storage. For the latter, unique, unordered keys are required, or certain functionality isn't remotely possible: indexing, in-place key update, transformations, etc. XML went through the same thing, which is part of how we got a bunch of incompatible dialects of XML. Now, your use case does show us that there's a case to be made for still having text JSON even after we have binary JSON. There's a strong simplicity argument against that, though ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] -d option for pg_isready is broken
handyrep@john:~/handyrep$ pg_isready --version pg_isready (PostgreSQL) 9.3.1 handyrep@john:~/handyrep$ pg_isready -h john -p 5432 -U postgres -d postgres -q pg_isready: missing = after postgres in connection info string handyrep@john:~/handyrep$ pg_isready --host=john --port=5432 --user=postgres --dbname=postgres pg_isready: missing = after postgres in connection info string handyrep@john:~/handyrep$ pg_isready -h john -p 5432 -U postgres john:5432 - accepting connections so apparently the -d option: a) doesn't work, and b) doesn't do anything I suggest simply removing it from the utility. I'll note that the -U option doesn't appear to do anything relevant either, but at least it doesn't error unnecessarily: handyrep@john:~/handyrep$ pg_isready -h john -p 5432 -U no_such_user john:5432 - accepting connections -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] First patch -- somewhat trivial feature
This is my first attempt at writing a patch, so it's pretty simple. Commit log sums it up: Adds a convenience feature to the explain command which prints out GUC cost parameters in explain text output. For example: explain (params) select * from table; will include text output like the following: Cost Params: seq_page: 1.00, rnd_page: 4.00, cpu_tup: 0.01, cpu_ind: 0.005000, cpu_op: 0.002500 amenabled: 111 The bit vector reflects the enable_* variables in the order listed in cost.h, and mainly provides a high-level view of whether any strategies are disabled. -- I recognize that this is kind of a frivolous feature which may not be worth any potential maintenance burden, so submitted for what it's worth as an initial effort. Best Regards, Robert From 960dc64864306b554a2e150cb8e28e7a63f218d1 Mon Sep 17 00:00:00 2001 From: robert berry berrydigi...@gmail.com Date: Wed, 13 Nov 2013 15:29:23 -0800 Subject: [PATCH 1/1] Adds a convenience feature to the explain command which prints out GUC cost parameters in explain text output. For example: will include text output like the following: Cost Params: seq_page: 1.00, rnd_page: 4.00, cpu_tup: 0.01, cpu_ind: 0.005000, cpu_op: 0.002500 amenabled: 111 The bit vector reflects the enable_* variables in the order listed in cost.h, and mainly provides a high-level view of whether any strategies are disabled. 
---
 src/backend/commands/explain.c | 18 ++++++++++++++++++
 src/include/commands/explain.h |  1 +
 2 files changed, 19 insertions(+)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 4e93df2..8feeca8 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -31,6 +31,7 @@
 #include "utils/snapmgr.h"
 #include "utils/tuplesort.h"
 #include "utils/xml.h"
+#include "optimizer/cost.h"

 /* Hook for plugins to get control in ExplainOneQuery() */
@@ -140,6 +141,8 @@ ExplainQuery(ExplainStmt *stmt, const char *queryString,
 			es.costs = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "buffers") == 0)
 			es.buffers = defGetBoolean(opt);
+		else if (strcmp(opt->defname, "params") == 0)
+			es.params = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "timing") == 0)
 		{
 			timing_set = true;
@@ -516,6 +519,21 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, ExplainState *es,
 	PopActiveSnapshot();

+	/* include one-line display of cost parameters */
+	if (es->params)
+	{
+		if (es->format == EXPLAIN_FORMAT_TEXT)
+		{
+			appendStringInfo(es->str, "\nCost Params:\n");
+			appendStringInfo(es->str, "\tseq_page: %f, rnd_page: %f, cpu_tup: %f, cpu_ind: %f, cpu_op: %f",
+							 seq_page_cost, random_page_cost, cpu_tuple_cost,
+							 cpu_index_tuple_cost, cpu_operator_cost);
+			appendStringInfo(es->str, "\n\tamenabled: %d%d%d%d%d%d%d%d%d%d%d\n\n",
+							 enable_seqscan, enable_indexscan, enable_indexonlyscan, enable_bitmapscan,
+							 enable_tidscan, enable_sort, enable_hashagg, enable_nestloop, enable_material,
+							 enable_mergejoin, enable_hashjoin);
+		}
+	}
+
 	/* We need a CCI just in case query expanded to multiple plans */
 	if (es->analyze)
 		CommandCounterIncrement();
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index ca213d7..d76476c 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -33,6 +33,7 @@ typedef struct ExplainState
 	bool		costs;			/* print costs */
 	bool		buffers;		/* print buffer usage */
 	bool		timing;			/* print timing */
+	bool		params;			/* print optimizer cost params */
 	ExplainFormat format;		/* output format */

 	/* other states */
 	PlannedStmt *pstmt;			/* top of plan */
--
1.8.3.2
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] nested hstore patch
On 11/13/2013 01:37 AM, Andrew Dunstan wrote: On 11/12/2013 01:35 PM, Teodor Sigaev wrote: Hi! Attached patch adds nesting feature, types (string, bool and numeric values), arrays and scalars to the hstore type. All new features are described in PGConf.EU talk http://www.sai.msu.su/~megera/postgres/talks/hstore-dublin-2013.pdf (since PGCon some features were added). Patch includes: 1 implementation of SRF_RETURN_NEXT_NULL() 2 contrib/hstore changes 3 docs of new hstore module (many thanks to David E. Wheeler david.whee...@pgexperts.com) In its current state the patch is in WIP status; shortly I plan to move support of the binary nested structure to core to share the binary representation between hstore and json types. Thanks, Teodor. As soon as we have that shared binary representation available, I will be working on adapting it to JSON. As I remember from earlier discussions, current json has some artefacts that some people want to preserve and which are incompatible with the hstore approach where you have an actual object behind the serialisation. I remember strong voices in support of *not* normalising json, so that things like {a:1,a:true, a:b, a:none} would go through the system unaltered, for claimed standard usage of json as processing instructions. That is, as source code which can possibly be converted to a JavaScript Object, and not something that would come out of serialising any existing JavaScript Object. I suggest we add another type, maybe jsobj, which has input and output as standard JSON but which is defined from the start to be the equivalent of an existing object and not preservable source code for such an object. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On 11/14/2013 12:09 AM, Merlin Moncure wrote: On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus j...@agliodbs.com wrote: On 11/13/2013 06:45 AM, Merlin Moncure wrote: I'm not so sure we should require hstore to do things like build Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format, and I don't agree with the statement but it is the caller's reponsibility to ensure that keys are not repeated.. This is pretty standard in the programming languages I know of which use JSON. I think the caller should have no such responsibility. Keys should be able to repeated. Apparently your experience with using JSON in practice has been fairly different from mine; the projects I work on, the JSON is being constantly converted back and forth to hashes and dictionaries, which means that ordering is not preserved and keys have to be unique (or become unique within one conversion cycle). I think, based on the language of the RFC and common practice, that it's completely valid for us to require unique keys within JSON-manipulation routines. Common practice? The internet is littered with complaints about documents being spontaneously re-ordered and or de-duplicated in various stacks. Other stacks provide mechanisms for explicit key order handling (see here: http://docs.python.org/2/library/json.html). Why do you think they did that? I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument. The current json serialization API handles that just fine and the hstore stuff coming down the pike will not. I guess we should not replace current JSON type with hstore based one, but add something json-like based on nested hstore instead. Maybe call it jsdoc or jdoc or jsobj or somesuch. 
For some time I was also pretty perplexed by some PostgreSQL JSON type discussions where JSON was not really defined as the type constructed from its string representation, and not even as a string which results from serialising an existing javascript object, but rather as source code which can be parsed into a structured type. So the PostgreSQL json type is *not* a structured type like hstore is, but really a string type with a few syntax checks. Some of the json_* functions are then defined on top of this json-source type and treat this source as if it were an actual structured type. It is kind of like defining an int-notation type which acts like an integer when added to another integer, but is required to also keep its original representation: select '1+1'::int-notation + 2; == 4 select '1+1'::int-notation == 1+1 I guess that's a done deal based on 'performance'. I'm clearly not the only one to have complained about this though. I am pretty sure we can not move to an internal object representation and preserve the current 'json source' behaviour. This is why I recommend not replacing json, but rather adding another built-in for a real structured type. Then you can keep using current json for the earlier-quoted uses of processing instructions and do real data manipulation on the jsdoc/jsobj type. Also most of the current json functions should be moved to work on jsobj instead, with an explicit cast from json to jsobj. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
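[Editor's note: the lossy round trip Hannu describes — source text in, structured object out, and no way back to the original text — can be seen in any structured-object stack. A Python sketch:]

```python
import json

src = '{"a":1, "a":true, "a":"b"}'

# Kept as text, the source survives verbatim. Parsed into a real
# structured object, the duplicate keys collapse (last one wins)...
obj = json.loads(src)
print(obj)                    # {'a': 'b'}

# ...so serializing the object back cannot reproduce the source.
print(json.dumps(obj))        # {"a": "b"}
```

A "json source" type must store the text itself; a structured type stores only what the parse retains. That is the incompatibility the thread keeps circling.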
Re: [HACKERS] First patch -- somewhat trivial feature
* Robert Berry (berrydigi...@gmail.com) wrote: This is my first attempt at writing a patch, so it's pretty simple. Neat! seq_page: 1.00, rnd_page: 4.00, cpu_tup: 0.01, cpu_ind: 0.005000, cpu_op: 0.002500 amenabled: 111 The bit vector is enable variables in the order listed in cost.h, though mainly provides a high level view on whether or not any strategies are disabled. While I like the general idea, I have to admit that I don't particularly like the format and I'm not sure why it makes sense to have this as part of 'explain'? Why not do a 'show all;' ahead of the explain? I recognize that this is kind of a frivolous feature which may not be worth any potential maintenance burden, so submitted for what it's worth as an initial effort. A bit more understanding about why you find it particularly useful (your specific use-case for it) would be helpful. Also, may I suggest that you look at http://commitfest.postgresql.org as you'll be asked to post any patches you wish to submit there for review and consideration. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] First patch -- somewhat trivial feature
Stephen Frost wrote * Robert Berry ( berrydigital@ ) wrote: This is my first attempt at writing a patch, so it's pretty simple. Neat! seq_page: 1.00, rnd_page: 4.00, cpu_tup: 0.01, cpu_ind: 0.005000, cpu_op: 0.002500 amenabled: 111 The bit vector is enable variables in the order listed in cost.h, though mainly provides a high level view on whether or not any strategies are disabled. While I like the general idea, I have to admit that I don't particularly like the format and I'm not sure why it makes sense to have this as part of 'explain'? Why not do a 'show all;' ahead of the explain? I kinda get the theory behind this but, WRT formatting, explain can output multiple formats and any patch affecting said output should provide for changing all of them. Having each of the sample outputs in the post would allow for comments from those who would not generally apply such patches. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/First-patch-somewhat-trivial-feature-tp5778245p5778250.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
MULE is completely evil. It has N different encodings for the same character, What's wrong with that? That's what it aims at in the first place. not to mention no support code available. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
Tatsuo Ishii is...@postgresql.org writes: MULE is completely evil. It has N different encodings for the same character, What's wrong with that? It aims that in the first place. It greatly complicates comparisons --- at least, if you'd like to preserve the principle that strings that appear the same are equal. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] First patch -- somewhat trivial feature
Snowman -- Thanks for feedback. I imagined this feature could be useful in a couple of contexts, though mainly in the sense of documenting query optimization efforts. In one case you may be tweaking cost parameters and having a built in record of the parameters in the explain output can make that more reliable. In a support context, it can help communicate all the details behind the selection of a query plan, perhaps on IRC support. David J. -- I originally conceived the notion for the feature while off in the deep end thinking about automatic cost parameter selection. It seems plausible that the ideal theoretical costs would be influenced by server activity (io/cpu utilization) and various buffer states at a point in time. This kind of optimization work is a little beyond my understanding and capability so this was a first step in thinking about this topic. It turned into a trivial user text output feature, but implementing output formats that are more machine friendly still makes a lot of sense. Best Regards, -Robert On Wed, Nov 13, 2013 at 4:16 PM, David Johnston pol...@yahoo.com wrote: Stephen Frost wrote * Robert Berry ( berrydigital@ ) wrote: This is my first attempt at writing a patch, so it's pretty simple. Neat! seq_page: 1.00, rnd_page: 4.00, cpu_tup: 0.01, cpu_ind: 0.005000, cpu_op: 0.002500 amenabled: 111 The bit vector is enable variables in the order listed in cost.h, though mainly provides a high level view on whether or not any strategies are disabled. While I like the general idea, I have to admit that I don't particularly like the format and I'm not sure why it makes sense to have this as part of 'explain'? Why not do a 'show all;' ahead of the explain? I kinda get the theory behind this but, WRT formatting, explain can output multiple formats and any patch affecting said output should provide for changing all of them. Having each of the sample outputs in the post would allow for comments from those who would not generally apply such patches. 
David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/First-patch-somewhat-trivial-feature-tp5778245p5778250.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] nested hstore patch
On Nov 13, 2013, at 3:59 PM, Hannu Krosing ha...@2ndquadrant.com wrote: I remember strong voices in support of *not* normalising json, so that things like {a:1,a:true, a:b, a:none} would go through the system unaltered, for claimed standard usage of json as processing instructions. That is as source code which can possibly converted to JavaScript Object and not something that would come out of serialising of any existing JavaScript Object. My recollection from PGCon was that there was consensus to normalize on the way in -- or at least, if we switched to a binary representation as proposed by Oleg & Teodor, it was not worth the hassle to try to keep it. I suggest we add another type, maybe jsobj, which has input and output as standard JSON but which is defined from the start to be equivalent of existing object and not preservable source code to such object. -1 Let's try to keep this simple. See also VARCHAR and VARCHAR2 on Oracle. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On Nov 13, 2013, at 2:41 PM, Gavin Flower gavinflo...@archidevsys.co.nz wrote: Would be possible to have a boolean, such as 'strict' - so that unique ordered was only imposed when strict was TRUE? Alternately, separate functions to allow the same choice? It should be a pretty-printing function option, perhaps, but not core to the type itself, IMO. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On 11/13/2013 07:01 PM, Hannu Krosing wrote: I guess we should not replace current JSON type with hstore based one, but add something json-like based on nested hstore instead. Well, that's two voices for that course of action. Interesting that I don't think I heard a single voice for this either at pgCon or pgOpen, although I spent large amounts of time at both talking to people about Json, so I'd be interested to hear more voices. It would actually simplify things in a way if we do that - we've been working on a way of doing this that wouldn't upset pg_upgrade. This would render that effort unnecessary. However it will complicate things for users who will have to choose between the data types, and function authors who will possibly have to write versions of functions to work with both types. Also most of the current json functions should also be moved to work on jsobj instead with explicit cast from json to jsobj Sure, we can overload them - that's probably the least of our worries. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
Tatsuo Ishii is...@postgresql.org writes: MULE is completely evil. It has N different encodings for the same character, What's wrong with that? That's what it aims at in the first place. It greatly complicates comparisons --- at least, if you'd like to preserve the principle that strings that appear the same are equal. You don't need to consider it because there's no place in PostgreSQL where MULE-encoded text consists of multiple encodings as far as I know. BTW, the same character being assigned different code points is pretty common in many character sets (Unicode, for example). -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On 11/13/2013 07:39 PM, David E. Wheeler wrote: On Nov 13, 2013, at 2:41 PM, Gavin Flower gavinflo...@archidevsys.co.nz wrote: Would be possible to have a boolean, such as 'strict' - so that unique ordered was only imposed when strict was TRUE? Alternately, separate functions to allow the same choice? It should be a pretty-printing function option, perhaps, but not core to the type itself, IMO. I don't in the least understand how it could be a pretty printing option. If we move to a binary rep using the hstore stuff order will be destroyed and not stored anywhere, and duplicate keys will be lost. Once that's done, how would a pretty print option restore the lost info? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] additional json functionality
On Nov 13, 2013, at 4:45 PM, Andrew Dunstan and...@dunslane.net wrote: It should be a pretty-printing function option, perhaps, but not core to the type itself, IMO. I don't in the least understand how it could be a pretty printing option. If we move to a binary rep using the hstore stuff order will be destroyed and not stored anywhere, and duplicate keys will be lost. Once that's done, how would a pretty print option restore the lost info? I meant ordering the keys, usually in lexicographic order. I agree that preserving order is untenable. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
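[Editor's note: for comparison, this is how e.g. Python's serializer treats key order — as an output-time presentation option, not a property of the stored object. A sketch:]

```python
import json

doc = {"b": 2, "a": {"d": 4, "c": 3}}

# Lexicographic key ordering is applied when serializing, recursively,
# regardless of how the object was built in memory.
print(json.dumps(doc, sort_keys=True))
# {"a": {"c": 3, "d": 4}, "b": 2}
```

That is consistent with David's point: once duplicates are gone, deterministic (sorted) ordering can always be recreated at print time, but the original input order cannot.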
Re: [HACKERS] [PATCH] pg_basebackup: progress report max once per second
On Thu, Nov 14, 2013 at 3:51 AM, Mika Eloranta m...@ohmu.fi wrote: Prevent excessive progress reporting that can grow to gigabytes of output with large databases. It might be interesting to add this patch to the next commit fest where you could get a formal review: https://commitfest.postgresql.org/action/commitfest_view?id=20 Then just be sure to attach a patch file properly to your email so that people can grab and test the patch easily. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] commit fest 2013-11 starts this Friday
Reminder: Commit fest 2013-11, the third commit fest (out of four) in the PostgreSQL development cycle, will start this Friday, November 15. If you have a patch that you would like to see considered for inclusion into PostgreSQL 9.4, please register it in the commit fest application before Friday: https://commitfest.postgresql.org/action/commitfest_view?id=20 While you're there, submitting your patch, please also put your name down to review at least one other patch. If you don't, we might frown upon your patch. Note that the reverse is not true: You can put your name down to review a patch if you have not submitted one. We will not frown upon your review in that case. For first-time and forgetful reviewers, here are some instructions: https://wiki.postgresql.org/wiki/Reviewing_a_Patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UTF8 national character data type support WIP patch and list of open issues.
Tatsuo Ishii is...@postgresql.org writes: BTW, the same character being assigned different code points is pretty common in many character sets (Unicode, for example). This is widely considered a security bug; read section 10 in RFC 3629 (the definition of UTF8), and search the CVE database a bit if you still doubt it's a threat. I'm going to push back very hard on any suggestion that Postgres should build itself around a text representation with that kind of weakness designed in. regards, tom lane [1] http://tools.ietf.org/html/rfc3629#section-10 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
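[Editor's note: both hazards — visually identical strings with different code point sequences, and alternate (overlong) encodings of one character — are easy to demonstrate. A Python sketch, not PostgreSQL code:]

```python
import unicodedata

# Two renderings of "café" with different code point sequences:
s1 = "caf\u00e9"      # precomposed U+00E9
s2 = "cafe\u0301"     # 'e' followed by combining acute accent
print(s1 == s2)       # False: code-point comparison differs
print(unicodedata.normalize("NFC", s1) ==
      unicodedata.normalize("NFC", s2))   # True after normalization

# Overlong encodings -- the RFC 3629 section 10 concern -- must be
# rejected by a conforming decoder rather than silently accepted:
try:
    b"\xc0\xaf".decode("utf-8")   # overlong two-byte encoding of '/'
except UnicodeDecodeError:
    print("rejected")
```

The security issue arises precisely when some layer accepts the alternate form while a filter elsewhere checked only the canonical one.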
Re: [HACKERS] [PATCH] Report exit code from external recovery commands properly
On Wed, Nov 13, 2013 at 3:42 AM, Peter Eisentraut pete...@gmx.net wrote: When an external recovery command such as restore_command or archive_cleanup_command fails, it just reports return code 34567 or something, but we have facilities to decode this properly, so use them. I think this is a very good idea, but you should go a bit further: document the special relationship restore_command has to special return codes. Currently, the documentation says: "It is important that the archive command return zero exit status if and only if it succeeded. Upon getting a zero result, PostgreSQL will assume that the WAL segment file has been successfully archived, and will remove or recycle it. However, a nonzero status tells PostgreSQL that the file was not archived; it will try again periodically until it succeeds." Yes, this concerns archive_command (where non-zero return code values are never distinguished), but nothing much is said about the return code of restore_command specifically anywhere else, so it's implied that it's exactly inverse to archive_command. In reality, some special return codes have a significance to restore_command: they make recovery abort, because they're taken as proxies for failures after which it isn't sensible to continue recovery. We're talking about the difference between recovery aborting, and recovery having conceptually reached the end of the WAL stream, so it's very surprising that this isn't documented currently. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Get more from indices.
On Tue, 2013-11-12 at 17:48 +0900, Kyotaro HORIGUCHI wrote: Hello, this is the revised patch. Since you're using git, please check your patch for trailing whitespace with git diff --check.
Re: [HACKERS] Using indices for UNION.
On Wed, 2013-11-13 at 17:25 +0900, Kyotaro HORIGUCHI wrote: Added explicit cast there and rebased to current master. Checked no new warning by this patch. make check succeeded at both $(src) and $(src)/src/test. This patch also has whitespace errors detected by git diff --check.
Re: [HACKERS] stats for network traffic WIP
On Fri, 2013-11-08 at 10:01 -0500, Nigel Heron wrote: here's v4 of the patch. I added documentation and a new global view called pg_stat_socket (includes bytes_sent, bytes_received and stats_reset time) Your patch needs to be rebased: CONFLICT (content): Merge conflict in src/test/regress/expected/rules.out
Re: [HACKERS] Assertions in PL/PgSQL
On Wed, 2013-10-09 at 18:57 +0200, Pavel Stehule wrote: here is a patch for RAISE WHEN clause Your patch needs to be rebased.
[HACKERS] inherit support for foreign tables
Hi hackers, I'd like to propose adding inheritance support for foreign tables. David Fetter mentioned this feature last July, but it seems stalled. http://www.postgresql.org/message-id/20130719005601.ga5...@fetter.org Supporting inheritance for foreign tables allows us to distribute queries to remote servers by using foreign tables as partitions of a (perhaps ordinary) parent table. For this purpose, I think that constraint exclusion is necessary. As a result of extending David's patch into a PoC, AFAICS we need these changes: 1) Add INHERITS(rel, ...) clause to CREATE/ALTER FOREIGN TABLE Apparently we need to add new syntax to define parent table(s) of a foreign table. We have options about the position of the INHERITS clause, but I'd prefer placing it before the SERVER clause, because having options specific to foreign tables at the tail would be most extensible. a) CREATE FOREIGN TABLE child (...) INHERITS(p1, p2) SERVER server; b) CREATE FOREIGN TABLE child (...) SERVER server INHERITS(p1, p2); 2) Allow foreign tables to have CHECK constraints Like NOT NULL, I think we don't need to enforce the check during INSERT/UPDATE against the foreign table. 3) Allow foreign tables as child nodes of Append Currently prepunion.c assumes that children of Append always have RELKIND_RELATION as their relkind, so we need to set the relkind of the child RTE explicitly. Please see the attached PoC patch. I'll enhance the implementation, tests and documentation and submit the patch for the next CF.
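To make the partitioning use case concrete, the proposal could be used like this (a sketch assuming syntax option (a); the server, table, and column names are hypothetical, and the OPTIONS clause assumes a postgres_fdw-style wrapper):

```sql
-- Local parent table defining the partitioned structure
-- (all names here are made up for illustration).
CREATE TABLE orders (
    order_id  int,
    placed_on date
);

-- Remote partition: a foreign table inheriting from the local parent,
-- with INHERITS placed before the SERVER clause (option (a)).
CREATE FOREIGN TABLE orders_2013 (
    CHECK (placed_on >= '2013-01-01' AND placed_on < '2014-01-01')
) INHERITS (orders)
  SERVER remote_server
  OPTIONS (table_name 'orders_2013');

-- With constraint exclusion, a query against the parent could skip
-- any remote partition whose CHECK constraint contradicts the predicate:
-- SELECT * FROM orders WHERE placed_on = '2013-06-01';
```

This is where change 2) (CHECK constraints on foreign tables) and change 3) (foreign tables under Append) come together: the planner would scan only the remote partitions that can contain matching rows.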
Regards,
--
Shigeru HANADA

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0b31f55..2f2dc88 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -465,10 +465,25 @@ DefineRelation(CreateStmt *stmt, char relkind, Oid ownerId)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_TABLE_DEFINITION),
 				 errmsg("ON COMMIT can only be used on temporary tables")));
-	if (stmt->constraints != NIL && relkind == RELKIND_FOREIGN_TABLE)
-		ereport(ERROR,
-				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("constraints are not supported on foreign tables")));
+	/*
+	 * Shouldn't this have been checked in parser?
+	 */
+	if (relkind == RELKIND_FOREIGN_TABLE)
+	{
+		ListCell   *lc;
+
+		foreach(lc, stmt->constraints)
+		{
+			NewConstraint *nc = lfirst(lc);
+
+			if (nc->contype != CONSTR_CHECK &&
+				nc->contype != CONSTR_DEFAULT &&
+				nc->contype != CONSTR_NULL &&
+				nc->contype != CONSTR_NOTNULL)
+				ereport(ERROR,
+						(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+						 errmsg("only check constraints are supported on foreign tables")));
+		}
+	}

 	/*
 	 * Look up the namespace in which we are supposed to create the relation,
@@ -1463,10 +1478,11 @@ MergeAttributes(List *schema, List *supers, char relpersistence,
 	 */
 	relation = heap_openrv(parent, ShareUpdateExclusiveLock);

-	if (relation->rd_rel->relkind != RELKIND_RELATION)
+	if (relation->rd_rel->relkind != RELKIND_RELATION &&
+		relation->rd_rel->relkind != RELKIND_FOREIGN_TABLE)
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
 				 errmsg("inherited relation \"%s\" is not a table or foreign table",
 						parent->relname)));

 	/* Permanent rels cannot inherit from temporary ones */
 	if (relpersistence != RELPERSISTENCE_TEMP
@@ -3043,7 +3059,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 			pass = AT_PASS_ADD_INDEX;
 			break;
 		case AT_AddConstraint:	/* ADD CONSTRAINT */
-			ATSimplePermissions(rel, ATT_TABLE);
+			ATSimplePermissions(rel, ATT_TABLE | ATT_FOREIGN_TABLE);
 			/* Recursion occurs during execution phase */
 			/* No command-specific prep needed except saving recurse flag */
 			if (recurse)
@@ -3057,7 +3073,7 @@ ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
 			pass = AT_PASS_ADD_CONSTR;
 			break;
 		case AT_DropConstraint:	/* DROP CONSTRAINT */
-			ATSimplePermissions(rel, ATT_TABLE);
+			ATSimplePermissions(rel, ATT_TABLE | ATT_FOREIGN_TABLE);
 			/* Recursion
Re: [HACKERS] inherit support for foreign tables
Shigeru Hanada shigeru.han...@gmail.com writes: I'd like to propose adding inheritance support for foreign tables. David Fetter mentioned this feature last July, but it seems stalled. http://www.postgresql.org/message-id/20130719005601.ga5...@fetter.org The discussion there pointed out that not enough consideration had been given to interactions with other commands. I'm not really satisfied with your analysis here. In particular: 2) Allow foreign tables to have CHECK constraints Like NOT NULL, I think we don't need to enforce the check during INSERT/UPDATE against the foreign table. Really? It's one thing to say that somebody who adds a CHECK constraint to a foreign table is responsible to make sure that the foreign data will satisfy the constraint. It feels like a different thing to say that ALTER TABLE ADD CONSTRAINT applied to a parent table will silently assume that some child table that happens to be foreign doesn't need any enforcement. Perhaps more to the point, inheritance trees are the main place where the planner depends on the assumption that CHECK constraints represent reality. Are we really prepared to say that it's the user's fault if the planner generates an incorrect plan on the strength of a CHECK constraint that's not actually satisfied by the foreign data? If so, that had better be documented by this patch. But for a project that refuses to let people create a local CHECK or FOREIGN KEY constraint without mechanically checking it, it seems pretty darn weird to be so laissez-faire about constraints on foreign data. regards, tom lane
[HACKERS] tcp_keepalives_idle
Hi, I am struggling to understand what the following phrase says: tcp_keepalives_idle (integer) Specifies the number of seconds before sending a keepalive packet on an otherwise idle connection. What I am not sure about is the meaning of "otherwise". Can I safely replace it with: Specifies the number of seconds before sending a keepalive packet on an other side of idle connection. What I am trying to say here is that "other side" is the frontend side of the connection. Or am I completely misunderstanding? -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese: http://www.sraoss.co.jp