On Sun, Jul 18, 2010 at 02:40, Peter Eisentraut <[email protected]> wrote:
> On lör, 2010-07-17 at 11:13 -0600, Alex Hunsaker wrote:
>> So I would expect the more indexes you
>> had or group by items to slow it down. Not so much the number of
>> columns. Right?
>
> At the outer level (which is not visible in this patch) it loops over
> all columns in the select list, and then it looks up the indexes each
> time. So the concern was that if the select list was * with a wide
> table, looking up the indexes each time would be slow.
Thanks for the explanation.
>> Anyhow it sounds like I should try it on top of the other patch and
>> see if it works. I assume it might still need some fixups to work
>> with that other patch? Or do you expect it to just work?
>
> There is some work necessary to integrate the two.
I just read that patch is getting pushed till at least the next commit
fest: http://archives.postgresql.org/pgsql-hackers/2010-07/msg01219.php
Should we push this patch back to? Alternatively we could make it
work with just primary keys until the other patch gets in. I think
that makes sense, find that attached. Thoughts?
Note I axed the index not null attribute checking, I have not thought
to deeply about it other than if its a primary key it cant have non
null attributes.... Right? I may have missed something subtle hence
the heads up.
*** a/doc/src/sgml/queries.sgml
--- b/doc/src/sgml/queries.sgml
***************
*** 886,895 **** SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
In this example, the columns <literal>product_id</literal>,
<literal>p.name</literal>, and <literal>p.price</literal> must be
in the <literal>GROUP BY</> clause since they are referenced in
! the query select list. (Depending on how the products
! table is set up, name and price might be fully dependent on the
! product ID, so the additional groupings could theoretically be
! unnecessary, though this is not implemented.) The column
<literal>s.units</> does not have to be in the <literal>GROUP
BY</> list since it is only used in an aggregate expression
(<literal>sum(...)</literal>), which represents the sales
--- 886,892 ----
In this example, the columns <literal>product_id</literal>,
<literal>p.name</literal>, and <literal>p.price</literal> must be
in the <literal>GROUP BY</> clause since they are referenced in
! the query select list (but see below). The column
<literal>s.units</> does not have to be in the <literal>GROUP
BY</> list since it is only used in an aggregate expression
(<literal>sum(...)</literal>), which represents the sales
***************
*** 898,903 **** SELECT product_id, p.name, (sum(s.units) * p.price) AS sales
--- 895,912 ----
</para>
<para>
+ If the products table is set up so that,
+ say, <literal>product_id</literal> is the primary key or a
+ not-null unique constraint, then it would be enough to group
+ by <literal>product_id</literal> in the above example, since name
+ and price would be <firstterm>functionally
+ dependent</firstterm><indexterm><primary>functional
+ dependency</primary></indexterm> on the product ID, and so there
+ would be no ambiguity about which name and price value to return
+ for each product ID group.
+ </para>
+
+ <para>
In strict SQL, <literal>GROUP BY</> can only group by columns of
the source table but <productname>PostgreSQL</productname> extends
this to also allow <literal>GROUP BY</> to group by columns in the
*** a/doc/src/sgml/ref/select.sgml
--- b/doc/src/sgml/ref/select.sgml
***************
*** 520,527 **** GROUP BY <replaceable class="parameter">expression</replaceable> [, ...]
produces a single value computed across all the selected rows).
When <literal>GROUP BY</literal> is present, it is not valid for
the <command>SELECT</command> list expressions to refer to
! ungrouped columns except within aggregate functions, since there
! would be more than one possible value to return for an ungrouped
column.
</para>
</refsect2>
--- 520,531 ----
produces a single value computed across all the selected rows).
When <literal>GROUP BY</literal> is present, it is not valid for
the <command>SELECT</command> list expressions to refer to
! ungrouped columns except within aggregate functions or if the
! ungrouped column is functionally dependent on the grouped columns,
! since there would otherwise be more than one possible value to
! return for an ungrouped column. A functional dependency exists if
! the grouped columns (or a subset thereof) are the primary key or a
! not-null unique constraint of the table containing the ungrouped
column.
</para>
</refsect2>
*** a/src/backend/commands/view.c
--- b/src/backend/commands/view.c
***************
*** 16,21 ****
--- 16,22 ----
#include "access/heapam.h"
#include "access/xact.h"
+ #include "catalog/dependency.h"
#include "catalog/namespace.h"
#include "commands/defrem.h"
#include "commands/tablecmds.h"
***************
*** 474,479 **** DefineView(ViewStmt *stmt, const char *queryString)
--- 475,501 ----
*/
CommandCounterIncrement();
+ if (list_length(viewParse->dependencies) > 0)
+ {
+ ObjectAddress myself, referenced;
+ ListCell *lc;
+
+ myself.classId = RelationRelationId;
+ myself.objectId = viewOid;
+ myself.objectSubId = 0;
+
+ foreach(lc, viewParse->dependencies)
+ {
+ Oid index_relid = lfirst_oid(lc);
+
+ referenced.classId = RelationRelationId;
+ referenced.objectId = index_relid;
+ referenced.objectSubId = 0;
+
+ recordDependencyOn(&myself, &referenced, DEPENDENCY_NORMAL);
+ }
+ }
+
/*
* The range table of 'viewParse' does not contain entries for the "OLD"
* and "NEW" relations. So... add them!
*** a/src/backend/nodes/copyfuncs.c
--- b/src/backend/nodes/copyfuncs.c
***************
*** 2272,2277 **** _copyQuery(Query *from)
--- 2272,2278 ----
COPY_NODE_FIELD(limitCount);
COPY_NODE_FIELD(rowMarks);
COPY_NODE_FIELD(setOperations);
+ COPY_NODE_FIELD(dependencies);
return newnode;
}
*** a/src/backend/nodes/equalfuncs.c
--- b/src/backend/nodes/equalfuncs.c
***************
*** 877,882 **** _equalQuery(Query *a, Query *b)
--- 877,883 ----
COMPARE_NODE_FIELD(limitCount);
COMPARE_NODE_FIELD(rowMarks);
COMPARE_NODE_FIELD(setOperations);
+ COMPARE_NODE_FIELD(dependencies);
return true;
}
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
***************
*** 2019,2024 **** _outQuery(StringInfo str, Query *node)
--- 2019,2025 ----
WRITE_NODE_FIELD(limitCount);
WRITE_NODE_FIELD(rowMarks);
WRITE_NODE_FIELD(setOperations);
+ WRITE_NODE_FIELD(dependencies);
}
static void
*** a/src/backend/nodes/readfuncs.c
--- b/src/backend/nodes/readfuncs.c
***************
*** 218,223 **** _readQuery(void)
--- 218,224 ----
READ_NODE_FIELD(limitCount);
READ_NODE_FIELD(rowMarks);
READ_NODE_FIELD(setOperations);
+ READ_NODE_FIELD(dependencies);
READ_DONE();
}
*** a/src/backend/parser/parse_agg.c
--- b/src/backend/parser/parse_agg.c
***************
*** 14,19 ****
--- 14,21 ----
*/
#include "postgres.h"
+ #include "access/heapam.h"
+ #include "catalog/pg_index.h"
#include "nodes/makefuncs.h"
#include "nodes/nodeFuncs.h"
#include "optimizer/tlist.h"
***************
*** 24,43 ****
#include "rewrite/rewriteManip.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
typedef struct
{
ParseState *pstate;
List *groupClauses;
bool have_non_var_grouping;
int sublevels_up;
} check_ungrouped_columns_context;
! static void check_ungrouped_columns(Node *node, ParseState *pstate,
List *groupClauses, bool have_non_var_grouping);
static bool check_ungrouped_columns_walker(Node *node,
check_ungrouped_columns_context *context);
/*
--- 26,48 ----
#include "rewrite/rewriteManip.h"
#include "utils/builtins.h"
#include "utils/lsyscache.h"
+ #include "utils/syscache.h"
typedef struct
{
ParseState *pstate;
+ Query *qry;
List *groupClauses;
bool have_non_var_grouping;
int sublevels_up;
} check_ungrouped_columns_context;
! static void check_ungrouped_columns(Node *node, ParseState *pstate, Query *qry,
List *groupClauses, bool have_non_var_grouping);
static bool check_ungrouped_columns_walker(Node *node,
check_ungrouped_columns_context *context);
+ static bool funcdeps_check_pk(List *group_clauses, Oid relid, Index rteno, Oid *index_relid);
/*
***************
*** 408,420 **** parseCheckAggregates(ParseState *pstate, Query *qry)
clause = (Node *) qry->targetList;
if (hasJoinRTEs)
clause = flatten_join_alias_vars(root, clause);
! check_ungrouped_columns(clause, pstate,
groupClauses, have_non_var_grouping);
clause = (Node *) qry->havingQual;
if (hasJoinRTEs)
clause = flatten_join_alias_vars(root, clause);
! check_ungrouped_columns(clause, pstate,
groupClauses, have_non_var_grouping);
/*
--- 413,425 ----
clause = (Node *) qry->targetList;
if (hasJoinRTEs)
clause = flatten_join_alias_vars(root, clause);
! check_ungrouped_columns(clause, pstate, qry,
groupClauses, have_non_var_grouping);
clause = (Node *) qry->havingQual;
if (hasJoinRTEs)
clause = flatten_join_alias_vars(root, clause);
! check_ungrouped_columns(clause, pstate, qry,
groupClauses, have_non_var_grouping);
/*
***************
*** 535,546 **** parseCheckWindowFuncs(ParseState *pstate, Query *qry)
* way more pain than the feature seems worth.
*/
static void
! check_ungrouped_columns(Node *node, ParseState *pstate,
List *groupClauses, bool have_non_var_grouping)
{
check_ungrouped_columns_context context;
context.pstate = pstate;
context.groupClauses = groupClauses;
context.have_non_var_grouping = have_non_var_grouping;
context.sublevels_up = 0;
--- 540,552 ----
* way more pain than the feature seems worth.
*/
static void
! check_ungrouped_columns(Node *node, ParseState *pstate, Query *qry,
List *groupClauses, bool have_non_var_grouping)
{
check_ungrouped_columns_context context;
context.pstate = pstate;
+ context.qry = qry;
context.groupClauses = groupClauses;
context.have_non_var_grouping = have_non_var_grouping;
context.sublevels_up = 0;
***************
*** 617,622 **** check_ungrouped_columns_walker(Node *node,
--- 623,641 ----
gvar->varlevelsup == 0)
return false; /* acceptable, we're okay */
}
+
+ /* Check whether primary key of var's table is subset of group clauses. */
+ rte = rt_fetch(var->varno, context->pstate->p_rtable);
+ if (rte->rtekind == RTE_RELATION)
+ {
+ Oid index_relid;
+
+ if (funcdeps_check_pk(context->groupClauses, rte->relid, var->varno, &index_relid))
+ {
+ context->qry->dependencies = lappend_oid(context->qry->dependencies, index_relid);
+ return false;
+ }
+ }
}
/* Found an ungrouped local variable; generate error message */
***************
*** 656,661 **** check_ungrouped_columns_walker(Node *node,
--- 675,755 ----
}
/*
+ * Check whether the attributes of the primary key relid with range table index
+ * rteno appear as a subset of the group_clauses. (If so, a functional
+ * dependency exists between the group clauses and any attribute of the
+ * relation, and so attributes of the relation can appear ungrouped.)
+ */
+ static bool
+ funcdeps_check_pk(List *group_clauses, Oid relid, Index rteno, Oid *index_relid)
+ {
+ Relation rel;
+ ListCell *indexoidcell;
+
+ rel = heap_open(relid, AccessShareLock);
+
+ foreach(indexoidcell, RelationGetIndexList(rel))
+ {
+ Oid indexoid = lfirst_oid(indexoidcell);
+ HeapTuple indexTuple;
+ Form_pg_index indexStruct;
+ int i;
+ bool found_col;
+ bool found_all_cols;
+
+ indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexoid));
+ if (!HeapTupleIsValid(indexTuple))
+ elog(ERROR, "cache lookup failed for index %u", indexoid);
+ indexStruct = (Form_pg_index) GETSTRUCT(indexTuple);
+
+ if (!indexStruct->indisprimary || !indexStruct->indimmediate)
+ continue;
+
+ /*
+ * Check that the group columns are a superset of the
+ * primary key columns.
+ */
+ for (i = 0; i < indexStruct->indnatts; i++)
+ {
+ int2 attnum;
+ ListCell *gl;
+
+ attnum = indexStruct->indkey.values[i];
+ found_col = false;
+
+ foreach(gl, group_clauses)
+ {
+ Var *gvar = (Var *) lfirst(gl);
+
+ if (IsA(gvar, Var) &&
+ gvar->varno == rteno &&
+ gvar->varattno == attnum &&
+ gvar->varlevelsup == 0)
+ {
+ found_col = true;
+ break;
+ }
+ }
+ if (!found_col)
+ break;
+ }
+ found_all_cols = (i == indexStruct->indnatts && found_col);
+
+ ReleaseSysCache(indexTuple);
+ if (found_all_cols)
+ {
+ heap_close(rel, NoLock);
+ *index_relid = indexoid;
+ return true;
+ }
+ }
+
+ heap_close(rel, NoLock);
+
+ return false;
+ }
+
+ /*
* Create expression trees for the transition and final functions
* of an aggregate. These are needed so that polymorphic functions
* can be used within an aggregate --- without the expression trees,
*** a/src/include/nodes/parsenodes.h
--- b/src/include/nodes/parsenodes.h
***************
*** 146,151 **** typedef struct Query
--- 146,154 ----
Node *setOperations; /* set-operation tree if this is top level of
* a UNION/INTERSECT/EXCEPT query */
+ List *dependencies; /* list of index OIDs that are
+ * required for the query to be
+ * valid */
} Query;
*** a/src/test/regress/parallel_schedule
--- b/src/test/regress/parallel_schedule
***************
*** 84,90 **** test: rules
# ----------
# Another group of parallel tests
# ----------
! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap
# ----------
# Another group of parallel tests
--- 84,90 ----
# ----------
# Another group of parallel tests
# ----------
! test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combocid tsearch tsdicts foreign_data window xmlmap functional_deps
# ----------
# Another group of parallel tests
*** a/src/test/regress/serial_schedule
--- b/src/test/regress/serial_schedule
***************
*** 76,81 **** test: union
--- 76,82 ----
test: case
test: join
test: aggregates
+ test: functional_deps
test: transactions
ignore: random
test: random
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers