======== Example: ======== Find different people using the same website at the same time:
create table session(sessionid text, username text, during tstzrange); SELECT s1.username, s2.username, s1.during * s2.during FROM session s1, session s2 WHERE s1.during && s2.during AND s1.username < s2.username ===================================== Brief summary of previous discussion: ===================================== http://www.postgresql.org/message-id/1334554850.10878.52.camel@jdavis - can indexes solve it already (parameterized paths, etc.)? - planner complexity (track path keys, mergejoinable equality, etc.) - spatial join algorithm choice - compare range merge join against existing indexes to see if any indexing approach is viable ========== Externals: ========== No new syntax or other externals. Just improved execution strategy for joining ranges using the overlaps operator (&&). New syntax is possible, but I don't see a strong reason to support special range join syntax at this time. ============= Optimizer Design ============= I took the path of least resistance, so if I am doing something wrong please let me know. I hardwired it to look for the range overlaps operator when checking if it could do a merge join, and then add range_ops to restrictinfo->mergeopfamilies. Then, I have to suppress adding it as an equivalence class, because overlaps is not equality. It still adds single-member ECs to restrictinfo->right_ec and left_ec, but (I think) that's OK. Also, I have to prevent generating paths for right/full join, because the range join algorithm can't (currently) handle those. ============= Costing ============= Needs more consideration. Seems to do reasonable things in the few examples I tried. ============= Executor Design ============= See detailed comments in nodeMergejoin.c ============= Performance ============= Seems much better than index nest loop join when the join is not selective. I will post detailed numbers soon. If no index is available, range merge join is the only reasonable way to execute a range join. For instance, if the inner is not a leaf in the plan. ============= Alternatives: ============= It was suggested that I approach it as a general spatial-join problem. That has merit, but I rejected it for now because the algorithm that I think has the most promise is range-oriented anyway. By that I mean we would need to extract all of the dimensions into ranges first, and then perform the join. With that in mind, it makes sense to implement range joins first; and then later provide the APIs to get at the spatial dimensions so that we can implement other spatial joins as range joins. Another suggestion was to base it on hash join, but I never understood that proposal and it didn't seem to map very well to a spatial join. Yet another suggestion was to use some kind of temporary index. Some brief experiments I did indicated that it would be fairly slow (though more investigation might be useful here). Also, it doesn't provide any alternative to the nestloop-with-inner-index we already offer at the leaf level today. Regards, Jeff Davis
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c index a18ab43..1110c1e 100644 *** a/src/backend/commands/explain.c --- b/src/backend/commands/explain.c *************** *** 909,915 **** ExplainNode(PlanState *planstate, List *ancestors, pname = sname = "Nested Loop"; break; case T_MergeJoin: ! pname = "Merge"; /* "Join" gets added by jointype switch */ sname = "Merge Join"; break; case T_HashJoin: --- 909,919 ---- pname = sname = "Nested Loop"; break; case T_MergeJoin: ! if (((MergeJoin*)plan)->mergeRangeJoin) ! pname = "Range Merge"; /* "Join" gets added by jointype switch */ ! else ! pname = "Merge"; /* "Join" gets added by jointype switch */ ! sname = "Merge Join"; break; case T_HashJoin: diff --git a/src/backend/executor/nodindex 62784af..f3375c3 100644 *** a/src/backend/executor/nodeMergejoin.c --- b/src/backend/executor/nodeMergejoin.c *************** *** 89,102 **** --- 89,151 ---- * proceed to another state. This state is stored in the node's * execution state information and is preserved across calls to * ExecMergeJoin. -cim 10/31/89 + * + * RANGE MERGE JOIN + * + * Range Merge Join is a generalization of merge join. When a join + * predicate involves the range overlaps operator (&&), a merge join + * becomes a range join. + * + * The input ranges must be ordered by (lower-bound, upper-bound), which + * is the ordinary range_ops operator class. MJCompare must not simply + * use the operator class's comparison semantics though; instead it + * follows these rules: + * + * - return 0 if the outer range overlaps the inner range + * - return <0 if the outer range is strictly left-of the inner range + * - return >0 if the outer range is strictly right-of the inner range + * + * Another way to understand these rules is that 0 indicates a match, >0 + * indicates that a later inner tuple may match, and <0 means that no + * later inner tuple could possibly match. Importantly, >0 does *not* + * imply that no earlier inner tuple matches. + * + * Example: + * + * OUTER INNER + * ... [1,9] matches current outer + * [4,5] [2,3] no match + * ... [3,5] matches current outer + * ... [7,9] no match and no later matches for current outer + * + * Outer tuple [4,5] does not match [2,3], but it matches (overlaps with) + * the earlier tuple [1,9] and the later tuple [3,5]. But once we + * encounter [7,9], we know that no later inner tuple can possibly match + * the outer. This logic extends to multiple join conditions without + * additional complexity. + * + * When doing a range join, we lose two merge join optimizations: + * + * 1. Ordinary merge join only restores to the mark if it's equal to the + * new outer. For range join, we must always restore to the mark + * because there may be matches after the mark and before the current + * inner tuple. + * 2. After restoring to the mark, ordinary merge join immediately moves + * to JOINTUPLES. Range join must move to SKIP_TEST first. + * + * Additionally, range merge join is unable to implement RIGHT/FULL joins. */ #include "postgres.h" #include "access/nbtree.h" + #include "catalog/pg_operator.h" #include "executor/execdebug.h" #include "executor/nodeMergejoin.h" + #include "nodes/nodeFuncs.h" #include "utils/lsyscache.h" #include "utils/memutils.h" + #include "utils/rangetypes.h" + #include "utils/typcache.h" /* *************** *** 137,142 **** typedef struct MergeJoinClauseData --- 186,195 ---- * stored here. */ SortSupportData ssup; + + /* needed for Range Join */ + bool isrange; + TypeCacheEntry *range_typcache; } MergeJoinClauseData; /* Result type for MJEvalOuterValues and MJEvalInnerValues */ *************** *** 177,182 **** MJExamineQuals(List *mergeclauses, --- 230,236 ---- Oid *mergecollations, int *mergestrategies, bool *mergenullsfirst, + bool *israngejoin, PlanState *parent) { MergeJoinClause clauses; *************** *** 184,189 **** MJExamineQuals(List *mergeclauses, --- 238,245 ---- int iClause; ListCell *cl; + *israngejoin = false; + clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData)); iClause = 0; *************** *** 195,200 **** MJExamineQuals(List *mergeclauses, --- 251,257 ---- Oid collation = mergecollations[iClause]; StrategyNumber opstrategy = mergestrategies[iClause]; bool nulls_first = mergenullsfirst[iClause]; + Oid eq_opno; int op_strategy; Oid op_lefttype; Oid op_righttype; *************** *** 220,227 **** MJExamineQuals(List *mergeclauses, elog(ERROR, "unsupported mergejoin strategy %d", opstrategy); clause->ssup.ssup_nulls_first = nulls_first; /* Extract the operator's declared left/right datatypes */ ! get_op_opfamily_properties(qual->opno, opfamily, false, &op_strategy, &op_lefttype, &op_righttype); --- 277,308 ---- elog(ERROR, "unsupported mergejoin strategy %d", opstrategy); clause->ssup.ssup_nulls_first = nulls_first; + /* + * If a range join, the original operator must be "&&" rather than + * "=". Set eq_opno to the range equality operator for the purposes of + * setting up the merge clause, but mark it as a range. + */ + if (qual->opno == OID_RANGE_OVERLAP_OP) + { + Oid rngtypid = exprType((Node*)clause->lexpr->expr); + Assert(exprType((Node*)clause->lexpr->expr) == + exprType((Node*)clause->rexpr->expr)); + eq_opno = OID_RANGE_EQ_OP; + clause->isrange = true; + clause->range_typcache = lookup_type_cache(rngtypid, + TYPECACHE_RANGE_INFO); + *israngejoin = true; + } + else + { + eq_opno = qual->opno; + clause->isrange = false; + clause->range_typcache = NULL; + } + + /* Extract the operator's declared left/right datatypes */ ! get_op_opfamily_properties(eq_opno, opfamily, false, &op_strategy, &op_lefttype, &op_righttype); *************** *** 378,383 **** MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot) --- 459,491 ---- } /* + * Return 0 if ranges overlap, <0 if the first range is left-of the second, or + * >0 if the first range is right-of the second. See comment at the top of the + * file for explanation. + */ + static int + ApplyRangeMatch(Datum ldatum, bool lisnull, Datum rdatum, bool risnull, + SortSupport ssup, TypeCacheEntry *typcache) + { + Assert(!lisnull || !risnull); + + + if (lisnull) + return ssup->ssup_nulls_first ? -1 : 1; + if (risnull) + return ssup->ssup_nulls_first ? 1 : -1; + + if (range_before_internal(typcache, DatumGetRangeType(ldatum), + DatumGetRangeType(rdatum))) + return ssup->ssup_reverse ? 1 : -1; + else if (range_overlaps_internal(typcache, DatumGetRangeType(ldatum), + DatumGetRangeType(rdatum))) + return 0; + else + return ssup->ssup_reverse ? -1 : 1; + } + + /* * MJCompare * * Compare the mergejoinable values of the current two input tuples *************** *** 417,425 **** MJCompare(MergeJoinState *mergestate) continue; } ! result = ApplySortComparator(clause->ldatum, clause->lisnull, clause->rdatum, clause->risnull, ! &clause->ssup); if (result != 0) break; --- 525,538 ---- continue; } ! if (clause->isrange) ! result = ApplyRangeMatch(clause->ldatum, clause->lisnull, clause->rdatum, clause->risnull, ! &clause->ssup, clause->range_typcache); ! else ! result = ApplySortComparator(clause->ldatum, clause->lisnull, ! clause->rdatum, clause->risnull, ! &clause->ssup); if (result != 0) break; *************** *** 532,538 **** check_constant_qual(List *qual, bool *is_const_false) return true; } - /* ---------------------------------------------------------------- * ExecMergeTupleDump * --- 645,650 ---- diff --git a/src/backend/optimizer/path/joiindex de7044d..34b0d45 100644 *** a/src/backend/optimizer/path/joinpath.c --- b/src/backend/optimizer/path/joinpath.c *************** *** 440,445 **** try_mergejoin_path(PlannerInfo *root, --- 440,458 ---- Relids required_outer; JoinCostWorkspace workspace; + /* RIGHT/FULL joins don't support range join */ + if (jointype == JOIN_RIGHT || jointype == JOIN_FULL) + { + ListCell *lc; + + foreach(lc, mergeclauses) + { + RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc); + if (restrictinfo->rangejoin) + return; + } + } + if (is_partial) { try_partial_mergejoin_path(root, diff --git a/src/backend/optimizer/path/pathindex 2c26906..617e383 100644 *** a/src/backend/optimizer/path/pathkeys.c --- b/src/backend/optimizer/path/pathkeys.c *************** *** 922,928 **** initialize_mergeclause_eclasses(PlannerInfo *root, RestrictInfo *restrictinfo) Oid lefttype, righttype; - /* Should be a mergeclause ... */ Assert(restrictinfo->mergeopfamilies != NIL); /* ... with links not yet set */ Assert(restrictinfo->left_ec == NULL); --- 922,927 ---- diff --git a/src/backend/optimizer/plan/initindex ebd442a..86021c0 100644 *** a/src/backend/optimizer/plan/initsplan.c --- b/src/backend/optimizer/plan/initsplan.c *************** *** 14,19 **** --- 14,20 ---- */ #include "postgres.h" + #include "catalog/pg_operator.h" #include "catalog/pg_type.h" #include "nodes/nodeFuncs.h" #include "optimizer/clauses.h" *************** *** 1940,1946 **** distribute_qual_to_rels(PlannerInfo *root, Node *clause, */ if (restrictinfo->mergeopfamilies) { ! if (maybe_equivalence) { if (check_equivalence_delay(root, restrictinfo) && process_equivalence(root, restrictinfo, below_outer_join)) --- 1941,1947 ---- */ if (restrictinfo->mergeopfamilies) { ! if (maybe_equivalence && !restrictinfo->rangejoin) { if (check_equivalence_delay(root, restrictinfo) && process_equivalence(root, restrictinfo, below_outer_join)) *************** *** 2594,2599 **** check_mergejoinable(RestrictInfo *restrictinfo) --- 2595,2606 ---- opno = ((OpExpr *) clause)->opno; leftarg = linitial(((OpExpr *) clause)->args); + if (opno == OID_RANGE_OVERLAP_OP) + { + restrictinfo->rangejoin = true; + opno = OID_RANGE_EQ_OP; + } + if (op_mergejoinable(opno, exprType(leftarg)) && !contain_volatile_functions((Node *) clause)) restrictinfo->mergeopfamilies = get_mergejoin_opfamilies(opno); diff --git a/src/backend/optimizer/util/restrindex 6f79f96..623e284 100644 *** a/src/backend/optimizer/util/restrictinfo.c --- b/src/backend/optimizer/util/restrictinfo.c *************** *** 186,191 **** make_restrictinfo_internal(Expr *clause, --- 186,192 ---- restrictinfo->outer_selec = -1; restrictinfo->mergeopfamilies = NIL; + restrictinfo->rangejoin = false; restrictinfo->left_ec = NULL; restrictinfo->right_ec = NULL; diff --git a/src/backend/utils/adt/selfuncs.c b/index f5cffcb..1fda117 100644 *** a/src/backend/utils/adt/selfuncs.c --- b/src/backend/utils/adt/selfuncs.c *************** *** 2863,2869 **** mergejoinscansel(PlannerInfo *root, Node *clause, *right; VariableStatData leftvar, rightvar; - int op_strategy; Oid op_lefttype; Oid op_righttype; Oid opno, --- 2863,2868 ---- diff --git a/src/include/catalog/pg_opeindex fe8795a..1f1ffb4 100644 *** a/src/include/catalog/pg_operator.h --- b/src/include/catalog/pg_operator.h *************** *** 1748,1753 **** DESCR("greater than or equal"); --- 1748,1754 ---- /* generic range type operators */ DATA(insert OID = 3882 ( "=" PGNSP PGUID b t t 3831 3831 16 3882 3883 range_eq eqsel eqjoinsel )); DESCR("equal"); + #define OID_RANGE_EQ_OP 3882 DATA(insert OID = 3883 ( "<>" PGNSP PGUID b f f 3831 3831 16 3883 3882 range_ne neqsel neqjoinsel )); DESCR("not equal"); DATA(insert OID = 3884 ( "<" PGNSP PGUID b f f 3831 3831 16 3887 3886 range_lt rangesel scalarltjoinsel )); diff --git a/src/include/nodes/plannodesindex a2dd26f..4d768f1 100644 *** a/src/include/nodes/plannodes.h --- b/src/include/nodes/plannodes.h *************** *** 695,700 **** typedef struct MergeJoin --- 695,701 ---- Oid *mergeCollations; /* per-clause OIDs of collations */ int *mergeStrategies; /* per-clause ordering (ASC or DESC) */ bool *mergeNullsFirst; /* per-clause nulls ordering */ + bool mergeRangeJoin; /* is this a range merge join? */ } MergeJoin; /* ---------------- diff --git a/src/include/nodes/relatindex fc53eb1..e7b93a6 100644 *** a/src/include/nodes/relation.h --- b/src/include/nodes/relation.h *************** *** 1764,1769 **** typedef struct RestrictInfo --- 1764,1772 ---- /* valid if clause is mergejoinable, else NIL */ List *mergeopfamilies; /* opfamilies containing clause operator */ + /* is a rangejoin clause? */ + bool rangejoin; + /* cache space for mergeclause processing; NULL if not yet set */ EquivalenceClass *left_ec; /* EquivalenceClass containing lefthand */ EquivalenceClass *right_ec; /* EquivalenceClass containing righthand */ diff --git a/src/test/regress/expecnew file mode 100644 index 0000000..d406177 *** /dev/null --- b/src/test/regress/expected/rangejoin.out *************** *** 0 **** --- 1,94 ---- + create table rangejoin1_left(i1 int, ir1 int4range); + create table rangejoin1_right(i2 int, ir2 int4range); + insert into rangejoin1_left values + (1001, int4range(10, 80)), + (1002, int4range(20, 30)), + (1003, int4range(21, 25)), + (1004, int4range(22, 35)), + (1005, int4range(40, 60)), + (1006, int4range(50, 60)); + insert into rangejoin1_right values + (2001, int4range( 2, 4)), + (2002, int4range(10, 12)), + (2003, int4range(11, 14)), + (2004, int4range(20, 45)), + (2005, int4range(24, 28)), + (2006, int4range(85, 90)); + explain select i1, ir1, i2, ir2 + from rangejoin1_left, rangejoin1_right + where ir1 && ir2; + QUERY PLAN + --------------------------------------------------------------------------------- + Range Merge Join (cost=176.34..303.67 rows=8064 width=72) + Merge Cond: (rangejoin1_left.ir1 && rangejoin1_right.ir2) + -> Sort (cost=88.17..91.35 rows=1270 width=36) + Sort Key: rangejoin1_left.ir1 + -> Seq Scan on rangejoin1_left (cost=0.00..22.70 rows=1270 width=36) + -> Sort (cost=88.17..91.35 rows=1270 width=36) + Sort Key: rangejoin1_right.ir2 + -> Seq Scan on rangejoin1_right (cost=0.00..22.70 rows=1270 width=36) + (8 rows) + + select i1, ir1, i2, ir2 + from rangejoin1_left, rangejoin1_right + where ir1 && ir2; + i1 | ir1 | i2 | ir2 + ------+---------+------+--------- + 1001 | [10,80) | 2002 | [10,12) + 1001 | [10,80) | 2003 | [11,14) + 1001 | [10,80) | 2004 | [20,45) + 1001 | [10,80) | 2005 | [24,28) + 1002 | [20,30) | 2004 | [20,45) + 1002 | [20,30) | 2005 | [24,28) + 1003 | [21,25) | 2004 | [20,45) + 1003 | [21,25) | 2005 | [24,28) + 1004 | [22,35) | 2004 | [20,45) + 1004 | [22,35) | 2005 | [24,28) + 1005 | [40,60) | 2004 | [20,45) + (11 rows) + + explain (costs off) select i1, ir1, i2, ir2 + from rangejoin1_left left join rangejoin1_right + on (ir1 && ir2); + QUERY PLAN + ------------------------------------------------------------- + Range Merge Left Join + Merge Cond: (rangejoin1_left.ir1 && rangejoin1_right.ir2) + -> Sort + Sort Key: rangejoin1_left.ir1 + -> Seq Scan on rangejoin1_left + -> Sort + Sort Key: rangejoin1_right.ir2 + -> Seq Scan on rangejoin1_right + (8 rows) + + select i1, ir1, i2, ir2 + from rangejoin1_left left join rangejoin1_right + on (ir1 && ir2); + i1 | ir1 | i2 | ir2 + ------+---------+------+--------- + 1001 | [10,80) | 2002 | [10,12) + 1001 | [10,80) | 2003 | [11,14) + 1001 | [10,80) | 2004 | [20,45) + 1001 | [10,80) | 2005 | [24,28) + 1002 | [20,30) | 2004 | [20,45) + 1002 | [20,30) | 2005 | [24,28) + 1003 | [21,25) | 2004 | [20,45) + 1003 | [21,25) | 2005 | [24,28) + 1004 | [22,35) | 2004 | [20,45) + 1004 | [22,35) | 2005 | [24,28) + 1005 | [40,60) | 2004 | [20,45) + 1006 | [50,60) | | + (12 rows) + + -- FULL JOIN doesn't support range join + explain (costs off) select i1, ir1, i2, ir2 + from rangejoin1_left full join rangejoin1_right + on (ir1 && ir2); + ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions + select i1, ir1, i2, ir2 + from rangejoin1_left full join rangejoin1_right + on (ir1 && ir2); + ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions + drop table rangejoin1_left; + drop table rangejoin1_right; diff --git a/src/test/regress/parallel_schedulindex 9f95b01..8bb9c80 100644 *** a/src/test/regress/parallel_schedule --- b/src/test/regress/parallel_schedule *************** *** 109,115 **** test: select_views portals_p2 foreign_key cluster dependency guc bitmapops combo # NB: temp.sql does a reconnect which transiently uses 2 connections, # so keep this parallel group to at most 19 tests # ---------- ! test: plancache limit plpgsql copy2 temp domain rangefuncs prepare without_oid conversion truncate alter_table sequence polymorphism rowtypes returning largeobject with xml # event triggers cannot run concurrently with any test that runs DDL test: event_trigger --- 109,115 ---- # NB: temp.sql does a reconnect which transiently uses 2 connections, # so keep this parallel group to at most 19 tests # ---------- ! test: plancache limit plpgsql copy2 temp domain rangefuncs rangejoin prepare without_oid conversion truncate alter_table sequence polymorphism rowtypes returning largeobject with xml # event triggers cannot run concurrently with any test that runs DDL test: event_trigger diff --git a/src/test/regress/serial_scheindex e026b7c..3b80f75 100644 *** a/src/test/regress/serial_schedule --- b/src/test/regress/serial_schedule *************** *** 164,169 **** test: copy2 --- 164,170 ---- test: temp test: domain test: rangefuncs + test: rangejoin test: prepare test: without_oid test: conversion
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers