On 10/06/2014 12:36 PM, Emre Hasegeli wrote:
Thanks. The main question now is design of this patch. Currently, it does
all the work inside access method. We already have some discussion of pro
and cons of this method. I would like to clarify alternatives now. I can
see following way:
1. Implement new executor node which performs sorting by priority queue.
Let's call it "Priority queue". I think it should be separate node from
"Sort" node. Despite "Priority queue" and "Sort" are essentially similar
from user view, they would be completely different in implementation.
2. Implement some interface to transfer distance values from access
method to "Priority queue" node.
If we assume that all of them need recheck, maybe it can be done
without passing distance values.
No, the executor needs the lower-bound distance value, as calculated by
the indexam, so that it knows which tuples it can return from the queue
already. For example, imagine the following items coming from the index:
tuple # lower bound actual distance
1 1 1
2 2 10
3 30 30
4 40 40
After the executor has fetched tuple 2, and re-checked the distance, it
pushes the tuple to the queue. It then fetches tuple 3, with lower bound
30, and it can now immediately return tuple # 2 from the queue. Because
10 < 30, so there cannot be any more tuples coming from the index that
would need to go before tuple # 2.
The executor needs the lower bound as calculated by the index, as well
as the actual distance it calculates itself, to make those decisions.
3. Somehow tell the planner that it could use "Priority queue" in
corresponding cases. I see two ways of doing this:
- Add flag to operator in opclass indicating that index can only
order by lower bound of "col op value", not by "col op value" itself.
- Define new relation between operators. Value of one operator could
be lower bound for value of another operator. So, planner can
put "Priority
queue" node when lower bound ordering is possible from index. Also "ALTER
OPERATOR" command would be reasonable, so extensions could upgrade.
I think, it would be better to make it a property of the operator
class. We can add a column to pg_amop or define another value for
amoppurpose on pg_amop. Syntax can be something like this:
CREATE OPERATOR CLASS circle_ops DEFAULT
FOR TYPE circle USING gist AS
OPERATOR 15 <->(circle, point) FOR ORDER BY pg_catalog.float_ops LOWER
BOUND;
While looking at it, I realize that current version of the patch does
not use the sort operator family defined with the operator class. It
assumes that the distance function will return values compatible with
the operator. Operator class definition makes me think that there is
not such an assumption.
Yeah. I also noticed that the type of the argument passed to the
consistent function varies, and doesn't necessarily match that declared
in pg_proc. Looking at gist_point_consistent, the argument type can be a
point, a polygon, or a circle, depending on the "strategy group". But
it's declared as a point in pg_proc.
Besides overhead, this way makes significant infrastructural changes. So,
it may be over-engineering. However, it's probably more clean and beautiful
solution.
I would like to get some feedback from people familiar with KNN-GiST like
Heikki or Tom. What do you think about this? Any other ideas?
I would be happy to test and review the changes. I think it is nice
to solve the problem in a generalized way improving the access method
infrastructure. Definitely, we should have a consensus on the design
before working on the infrastructure changes.
I took a stab on this. I added the reorder queue directly to the Index
Scan node, rather than adding a whole new node type for it. It seems
reasonable, as Index Scan is responsible for rechecking the quals, too,
even though re-ordering the tuples is more complicated than rechecking
quals.
To recap, the idea is that the index can define an ordering op, even if
it cannot return the tuples in exactly the right order. It is enough
that for each tuple, it returns a lower bound of the expression that is
used for sorting. For example, for "ORDER BY key <-> column", it is
enough that it returns a lower bound of "key <-> column" for each tuple.
The index must return the tuples ordered by the lower bounds. The
executor re-checks the expressions, and re-orders the tuples to the
correct order.
Patch attached. It should be applied on top of my pairing heap patch at
http://www.postgresql.org/message-id/548ffa2c.7060...@vmware.com. Some
caveats:
* The signature of the distance function is unchanged, it doesn't get a
recheck argument. It is just assumed that if the consistent function
sets the recheck flag, then the distance needs to be rechecked as well.
We might want to add the recheck argument, like you Alexander did in
your patch, but it's not important right now.
* I used the "distance" term in the executor, although the ORDER BY expr
machinery is more general than that. The value returned by the ORDER BY
expression doesn't have to be a distance, although that's the only thing
supported by GiST and the built-in opclasses.
* I short-circuited the planner to assume that the ORDER BY expression
always returns a float. That's true today for knn-GiST, but is obviously
a bogus assumption in general.
This needs some work to get into a committable state, but from a
modularity point of view, this is much better than having the indexam to
peek into the heap.
- Heikki
diff --git a/doc/src/sgml/gist.sgml b/doc/src/sgml/gist.sgml
index 5de282b..0430738 100644
--- a/doc/src/sgml/gist.sgml
+++ b/doc/src/sgml/gist.sgml
@@ -105,6 +105,7 @@
<literal>~=</>
</entry>
<entry>
+ <literal><-></>
</entry>
</row>
<row>
@@ -163,6 +164,7 @@
<literal>~=</>
</entry>
<entry>
+ <literal><-></>
</entry>
</row>
<row>
@@ -207,6 +209,12 @@
</table>
<para>
+ Currently, ordering by the distance operator <literal><-></>
+ is supported only with <literal>point</> by the operator classes
+ of the geometric types.
+ </para>
+
+ <para>
For historical reasons, the <literal>inet_ops</> operator class is
not the default class for types <type>inet</> and <type>cidr</>.
To use it, mention the class name in <command>CREATE INDEX</>,
@@ -798,13 +806,22 @@ my_distance(PG_FUNCTION_ARGS)
The arguments to the <function>distance</> function are identical to
the arguments of the <function>consistent</> function, except that no
- recheck flag is used. The distance to a leaf index entry must always
- be determined exactly, since there is no way to re-order the tuples
- once they are returned. Some approximation is allowed when determining
- the distance to an internal tree node, so long as the result is never
- greater than any child's actual distance. Thus, for example, distance
- to a bounding box is usually sufficient in geometric applications. The
- result value can be any finite <type>float8</> value. (Infinity and
+ recheck flag is used.
+ </para>
+
+ <para>
+ Some approximation is allowed when determining the distance to an
+ internal tree node, so long as the result is never greater than any
+ child's actual distance. Thus, for example, distance
+ to a bounding box is usually sufficient in geometric applications. For
+ leaf nodes, the returned distance must be accurate, if the consistent
+ function returns *recheck == false for the tuple. Otherwise the same
+ approximation is allowed, and the executor will re-order ambiguous cases
+ after recalculating the actual distance.
+ </para>
+
+ <para>
+ The result value can be any finite <type>float8</> value. (Infinity and
minus infinity are used internally to handle cases such as nulls, so it
is not recommended that <function>distance</> functions return these
values.)
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index e5eb6f6..8569928 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -192,9 +192,8 @@ gistindex_keytest(IndexScanDesc scan,
* always be zero, but might as well pass it for possible future
* use.)
*
- * Note that Distance functions don't get a recheck argument. We
- * can't tolerate lossy distance calculations on leaf tuples;
- * there is no opportunity to re-sort the tuples afterwards.
+ * Note that Distance functions don't get a recheck argument.
+ * Distance is rechecked whenever the quals are.
*/
dist = FunctionCall4Coll(&key->sk_func,
key->sk_collation,
@@ -411,6 +410,7 @@ getNextNearest(IndexScanDesc scan)
{
GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
bool res = false;
+ int i;
do
{
@@ -423,7 +423,11 @@ getNextNearest(IndexScanDesc scan)
{
/* found a heap item at currently minimal distance */
scan->xs_ctup.t_self = item->data.heap.heapPtr;
- scan->xs_recheck = item->data.heap.recheck;
+ for (i = 0; i < scan->numberOfOrderBys; i++)
+ {
+ scan->xs_distances[i] = Float8GetDatum(item->distances[i]);
+ scan->xs_distance_nulls[i] = false;
+ }
res = true;
}
else
diff --git a/src/backend/access/gist/gistproc.c b/src/backend/access/gist/gistproc.c
index db0bec6..22ab256 100644
--- a/src/backend/access/gist/gistproc.c
+++ b/src/backend/access/gist/gistproc.c
@@ -1459,3 +1459,36 @@ gist_point_distance(PG_FUNCTION_ARGS)
PG_RETURN_FLOAT8(distance);
}
+
+/*
+ * The inexact GiST distance method for geometric types that store bounding
+ * boxes.
+ *
+ * Compute lossy distance from point to index entries. The result is inexact
+ * because index entries are bounding boxes, not the exact shapes of the
+ * indexed geometric types. We use distance from point to MBR of index entry.
+ * This is correct lower bound estimate of distance from point to indexed
+ * geometric type.
+ */
+Datum
+gist_bbox_distance(PG_FUNCTION_ARGS)
+{
+ GISTENTRY *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+ StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
+ double distance;
+ StrategyNumber strategyGroup = strategy / GeoStrategyNumberOffset;
+
+ switch (strategyGroup)
+ {
+ case PointStrategyNumberGroup:
+ distance = computeDistance(false,
+ DatumGetBoxP(entry->key),
+ PG_GETARG_POINT_P(1));
+ break;
+ default:
+ elog(ERROR, "unknown strategy number: %d", strategy);
+ distance = 0.0; /* keep compiler quiet */
+ }
+
+ PG_RETURN_FLOAT8(distance);
+}
diff --git a/src/backend/access/gist/gistscan.c b/src/backend/access/gist/gistscan.c
index eff02c4..1837b78 100644
--- a/src/backend/access/gist/gistscan.c
+++ b/src/backend/access/gist/gistscan.c
@@ -85,6 +85,11 @@ gistbeginscan(PG_FUNCTION_ARGS)
/* workspaces with size dependent on numberOfOrderBys: */
so->distances = palloc(sizeof(double) * scan->numberOfOrderBys);
so->qual_ok = true; /* in case there are zero keys */
+ if (scan->numberOfOrderBys > 0)
+ {
+ scan->xs_distances = palloc(sizeof(Datum) * scan->numberOfOrderBys);
+ scan->xs_distance_nulls = palloc(sizeof(bool) * scan->numberOfOrderBys);
+ }
scan->opaque = so;
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 2b89dc6..9279454 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -28,14 +28,86 @@
#include "access/relscan.h"
#include "executor/execdebug.h"
#include "executor/nodeIndexscan.h"
+#include "lib/pairingheap.h"
#include "optimizer/clauses.h"
#include "utils/array.h"
+#include "utils/datum.h"
#include "utils/lsyscache.h"
#include "utils/memutils.h"
#include "utils/rel.h"
+/*
+ * When an ordering operator is used, tuples fetched from the index that
+ * need to be reordered are queued in a pairing heap, as ReorderTuples.
+ */
+typedef struct
+{
+ pairingheap_node ph_node;
+ HeapTuple htup;
+ Datum *distances;
+ bool *distance_nulls;
+} ReorderTuple;
+
+static int
+cmp_distances(const Datum *adist, const bool *anulls,
+ const Datum *bdist, const bool *bnulls,
+ IndexScanState *node)
+{
+ int i;
+ int result;
+
+ for (i = 0; i < node->iss_NumOrderByKeys; i++)
+ {
+ SortSupport ssup = &node->iss_SortSupport[i];
+
+ /* Handle nulls. We only support NULLS LAST */
+ if (anulls[i] && !bnulls[i])
+ return 1;
+ else if (!anulls[i] && bnulls[i])
+ return -1;
+ else if (anulls[i] && bnulls[i])
+ return 0;
+
+ result = ssup->comparator(adist[i], bdist[i], ssup);
+ if (result != 0)
+ return result;
+ }
+
+ return 0;
+}
+
+static int
+reorderbuffer_cmp(const pairingheap_node *a, const pairingheap_node *b, void *arg)
+{
+ ReorderTuple *rta = (ReorderTuple *) a;
+ ReorderTuple *rtb = (ReorderTuple *) b;
+ IndexScanState *node = (IndexScanState *) arg;
+
+ return -cmp_distances(rta->distances, rta->distance_nulls,
+ rtb->distances, rtb->distance_nulls,
+ node);
+}
+
+static void
+copyDistances(IndexScanState *node, const Datum *src_datums, const bool *src_nulls,
+ Datum *dst_datums, bool *dst_nulls)
+{
+ int i;
+
+ for (i = 0; i < node->iss_NumOrderByKeys; i++)
+ {
+ if (!src_nulls[i])
+ dst_datums[i] = datumCopy(src_datums[i],
+ node->iss_DistanceTypByVals[i],
+ node->iss_DistanceTypLens[i]);
+ else
+ dst_datums[i] = (Datum) 0;
+ dst_nulls[i] = src_nulls[i];
+ }
+}
static TupleTableSlot *IndexNext(IndexScanState *node);
+static void RecheckOrderBys(IndexScanState *node, TupleTableSlot *slot);
/* ----------------------------------------------------------------
@@ -54,6 +126,8 @@ IndexNext(IndexScanState *node)
IndexScanDesc scandesc;
HeapTuple tuple;
TupleTableSlot *slot;
+ MemoryContext oldContext;
+ ReorderTuple *reordertuple;
/*
* extract necessary information from index scan node
@@ -72,11 +146,60 @@ IndexNext(IndexScanState *node)
econtext = node->ss.ps.ps_ExprContext;
slot = node->ss.ss_ScanTupleSlot;
- /*
- * ok, now that we have what we need, fetch the next tuple.
- */
- while ((tuple = index_getnext(scandesc, direction)) != NULL)
+ for (;;)
{
+ /* Check the reorder queue first */
+ if (node->iss_ReorderQueue)
+ {
+ if (pairingheap_is_empty(node->iss_ReorderQueue))
+ {
+ if (node->iss_ReachedEnd)
+ break;
+ }
+ else
+ {
+ reordertuple = (ReorderTuple *) pairingheap_first(node->iss_ReorderQueue);
+
+ /* Check if we can return this tuple */
+ if (node->iss_ReachedEnd ||
+ cmp_distances(reordertuple->distances,
+ reordertuple->distance_nulls,
+ scandesc->xs_distances,
+ scandesc->xs_distance_nulls,
+ node) < 0)
+ {
+ (void) pairingheap_remove_first(node->iss_ReorderQueue);
+
+ tuple = reordertuple->htup;
+ pfree(reordertuple);
+
+ /*
+ * Store the buffered tuple in the scan tuple slot of the
+ * scan state.
+ */
+ ExecStoreTuple(tuple, slot, InvalidBuffer, true);
+ return slot;
+ }
+ }
+ }
+
+ /* Fetch next tuple from the index */
+ tuple = index_getnext(scandesc, direction);
+
+ if (!tuple)
+ {
+ /*
+ * No more tuples from the index. If we have a reorder queue,
+ * we still need to drain all the remaining tuples in the queue
+ * before we're done.
+ */
+ node->iss_ReachedEnd = true;
+ if (node->iss_ReorderQueue)
+ continue;
+ else
+ break;
+ }
+
/*
* Store the scanned tuple in the scan tuple slot of the scan state.
* Note: we pass 'false' because tuples returned by amgetnext are
@@ -103,6 +226,71 @@ IndexNext(IndexScanState *node)
}
}
+ /*
+ * Re-check the ordering.
+ */
+ if (node->iss_ReorderQueue)
+ {
+ /*
+ * The index returned the distance, as calculated by the indexam,
+ * in scandesc->xs_distances. If the index was lossy, we have to
+ * recheck the ordering expression too. Otherwise we take the
+ * indexam's values as is.
+ */
+ if (scandesc->xs_recheck)
+ RecheckOrderBys(node, slot);
+ else
+ copyDistances(node,
+ scandesc->xs_distances,
+ scandesc->xs_distance_nulls,
+ node->iss_Distances,
+ node->iss_DistanceNulls);
+
+ /*
+ * Can we return this tuple immediately, or does it need to be
+ * pushed to the reorder queue? If this tuple's distance was
+ * inaccurate, we can't return it yet, because the next tuple
+ * from the index might need to come before this one. Also,
+ * we can't return it yet if there are any smaller tuples in the
+ * queue already.
+ */
+ if (!pairingheap_is_empty(node->iss_ReorderQueue))
+ reordertuple = (ReorderTuple *) pairingheap_first(node->iss_ReorderQueue);
+ else
+ reordertuple = NULL;
+
+ if ((cmp_distances(node->iss_Distances,
+ node->iss_DistanceNulls,
+ scandesc->xs_distances,
+ scandesc->xs_distance_nulls,
+ node) > 0) ||
+ (reordertuple && cmp_distances(node->iss_Distances,
+ node->iss_DistanceNulls,
+ reordertuple->distances,
+ reordertuple->distance_nulls,
+ node) > 0))
+ {
+ /* Need to put this to the queue */
+ oldContext = MemoryContextSwitchTo(estate->es_query_cxt);
+ reordertuple = (ReorderTuple *) palloc(sizeof(ReorderTuple));
+ reordertuple->htup = heap_copytuple(tuple);
+ reordertuple->distances = (Datum *) palloc(sizeof(Datum) * scandesc->numberOfOrderBys);
+ reordertuple->distance_nulls = (bool *) palloc(sizeof(bool) * scandesc->numberOfOrderBys);
+ copyDistances(node,
+ node->iss_Distances,
+ node->iss_DistanceNulls,
+ reordertuple->distances,
+ reordertuple->distance_nulls);
+
+ pairingheap_add(node->iss_ReorderQueue, &reordertuple->ph_node);
+
+ MemoryContextSwitchTo(oldContext);
+
+ continue;
+ }
+ }
+
+ /* Ok, got a tuple to return */
return slot;
}
@@ -114,6 +302,41 @@ IndexNext(IndexScanState *node)
}
/*
+ * Calculate the expressions in the ORDER BY clause, based on the heap tuple.
+ */
+static void
+RecheckOrderBys(IndexScanState *node, TupleTableSlot *slot)
+{
+ IndexScanDesc scandesc;
+ ExprContext *econtext;
+ int i;
+ ListCell *l;
+ MemoryContext oldContext;
+
+ scandesc = node->iss_ScanDesc;
+ econtext = node->ss.ps.ps_ExprContext;
+ econtext->ecxt_scantuple = slot;
+ ResetExprContext(econtext);
+
+ oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
+ i = 0;
+ foreach(l, node->indexorderbyorig)
+ {
+ ExprState *orderby = (ExprState *) lfirst(l);
+
+ Assert(i < scandesc->numberOfOrderBys);
+
+ node->iss_Distances[i] = ExecEvalExpr(orderby,
+ econtext,
+ &node->iss_DistanceNulls[i],
+ NULL);
+ }
+
+ MemoryContextSwitchTo(oldContext);
+}
+
+/*
* IndexRecheck -- access method routine to recheck a tuple in EvalPlanQual
*/
static bool
@@ -465,6 +688,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
IndexScanState *indexstate;
Relation currentRelation;
bool relistarget;
+ int i;
/*
* create state structure
@@ -501,6 +725,9 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
indexstate->indexqualorig = (List *)
ExecInitExpr((Expr *) node->indexqualorig,
(PlanState *) indexstate);
+ indexstate->indexorderbyorig = (List *)
+ ExecInitExpr((Expr *) node->indexorderbyorig,
+ (PlanState *) indexstate);
/*
* tuple table initialization
@@ -581,6 +808,52 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
NULL, /* no ArrayKeys */
NULL);
+ /* Initialize sort support, if we need to re-check ORDER BY exprs */
+ if (indexstate->iss_NumOrderByKeys > 0)
+ {
+ int numOrderByKeys = indexstate->iss_NumOrderByKeys;
+
+ /*
+ * Prepare sort support, and look up the distance type for each
+ * ORDER BY expression.
+ */
+ indexstate->iss_SortSupport =
+ palloc0(numOrderByKeys * sizeof(SortSupportData));
+ indexstate->iss_DistanceTypByVals =
+ palloc(numOrderByKeys * sizeof(bool));
+ indexstate->iss_DistanceTypLens =
+ palloc(numOrderByKeys * sizeof(int16));
+ for (i = 0; i < indexstate->iss_NumOrderByKeys; i++)
+ {
+ Oid distanceType;
+ Oid opfamily;
+ int16 strategy;
+
+ PrepareSortSupportFromOrderingOp(node->indexsortops[i],
+ &indexstate->iss_SortSupport[i]);
+
+ if (!get_ordering_op_properties(node->indexsortops[i],
+ &opfamily, &distanceType, &strategy))
+ {
+ elog(LOG, "operator %u is not a valid ordering operator",
+ node->indexsortops[i]);
+ }
+ get_typlenbyval(distanceType,
+ &indexstate->iss_DistanceTypLens[i],
+ &indexstate->iss_DistanceTypByVals[i]);
+ }
+
+ /* allocate arrays to hold the re-calculated distances */
+ indexstate->iss_Distances =
+ palloc(indexstate->iss_NumOrderByKeys * sizeof(Datum));
+ indexstate->iss_DistanceNulls =
+ palloc(indexstate->iss_NumOrderByKeys * sizeof(bool));
+
+ /* and initialize the reorder queue */
+ indexstate->iss_ReorderQueue = pairingheap_allocate(reorderbuffer_cmp,
+ indexstate);
+ }
+
/*
* If we have runtime keys, we need an ExprContext to evaluate them. The
* node's standard context won't do because we want to reset that context
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8f9ae4f..645c6d8 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -22,6 +22,7 @@
#include "access/skey.h"
#include "access/sysattr.h"
#include "catalog/pg_class.h"
+#include "catalog/pg_operator.h"
#include "foreign/fdwapi.h"
#include "miscadmin.h"
#include "nodes/makefuncs.h"
@@ -102,7 +103,7 @@ static void copy_plan_costsize(Plan *dest, Plan *src);
static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
Oid indexid, List *indexqual, List *indexqualorig,
- List *indexorderby, List *indexorderbyorig,
+ List *indexorderby, List *indexorderbyorig, Oid *sortOperators,
ScanDirection indexscandir);
static IndexOnlyScan *make_indexonlyscan(List *qptlist, List *qpqual,
Index scanrelid, Oid indexid,
@@ -1158,6 +1159,7 @@ create_indexscan_plan(PlannerInfo *root,
List *stripped_indexquals;
List *fixed_indexquals;
List *fixed_indexorderbys;
+ Oid *sortOperators = NULL;
ListCell *l;
/* it should be a base rel... */
@@ -1266,6 +1268,25 @@ create_indexscan_plan(PlannerInfo *root,
replace_nestloop_params(root, (Node *) indexorderbys);
}
+ if (best_path->path.pathkeys && indexorderbys)
+ {
+ int numOrderBys = list_length(indexorderbys);
+ int i;
+
+ sortOperators = (Oid *) palloc(numOrderBys * sizeof(Oid));
+
+ for (i = 0; i < numOrderBys; i++)
+ {
+ /*
+ * FIXME: Currently, amcanorderbyop is only supported by GiST, and
+ * this is only used for float8 distances. The correct way would
+ * be to dig this from the path key, like make_sort_from_pathkeys
+ * does.
+ */
+ sortOperators[i] = Float8LessOperator;
+ }
+ }
+
/* Finally ready to build the plan node */
if (indexonly)
scan_plan = (Scan *) make_indexonlyscan(tlist,
@@ -1285,6 +1306,7 @@ create_indexscan_plan(PlannerInfo *root,
stripped_indexquals,
fixed_indexorderbys,
indexorderbys,
+ sortOperators,
best_path->indexscandir);
copy_path_costsize(&scan_plan->plan, &best_path->path);
@@ -3327,6 +3349,7 @@ make_indexscan(List *qptlist,
List *indexqualorig,
List *indexorderby,
List *indexorderbyorig,
+ Oid *sortOperators,
ScanDirection indexscandir)
{
IndexScan *node = makeNode(IndexScan);
@@ -3344,6 +3367,7 @@ make_indexscan(List *qptlist,
node->indexorderby = indexorderby;
node->indexorderbyorig = indexorderbyorig;
node->indexorderdir = indexscandir;
+ node->indexsortops = sortOperators;
return node;
}
diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index bc56b0a..85abfcf 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -2676,6 +2676,18 @@ dist_ppoly(PG_FUNCTION_ARGS)
PG_RETURN_FLOAT8(result);
}
+Datum
+dist_polyp(PG_FUNCTION_ARGS)
+{
+ POLYGON *poly = PG_GETARG_POLYGON_P(0);
+ Point *point = PG_GETARG_POINT_P(1);
+ float8 result;
+
+ result = dist_ppoly_internal(point, poly);
+
+ PG_RETURN_FLOAT8(result);
+}
+
static double
dist_ppoly_internal(Point *pt, POLYGON *poly)
{
@@ -5092,6 +5104,21 @@ dist_pc(PG_FUNCTION_ARGS)
PG_RETURN_FLOAT8(result);
}
+/*
+ * Distance from a circle to a point
+ */
+Datum
+dist_cpoint(PG_FUNCTION_ARGS)
+{
+ CIRCLE *circle = PG_GETARG_CIRCLE_P(0);
+ Point *point = PG_GETARG_POINT_P(1);
+ float8 result;
+
+ result = point_dt(point, &circle->center) - circle->radius;
+ if (result < 0)
+ result = 0;
+ PG_RETURN_FLOAT8(result);
+}
/* circle_center - returns the center point of the circle.
*/
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index d99158f..170069e 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -147,7 +147,10 @@ extern void index_restrpos(IndexScanDesc scan);
extern ItemPointer index_getnext_tid(IndexScanDesc scan,
ScanDirection direction);
extern HeapTuple index_fetch_heap(IndexScanDesc scan);
+extern bool index_get_heap_values(IndexScanDesc scan, ItemPointer heapPtr,
+ Datum values[INDEX_MAX_KEYS], bool isnull[INDEX_MAX_KEYS]);
extern HeapTuple index_getnext(IndexScanDesc scan, ScanDirection direction);
+
extern int64 index_getbitmap(IndexScanDesc scan, TIDBitmap *bitmap);
extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index f2c7ca1..8cfd0c4 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -91,6 +91,15 @@ typedef struct IndexScanDescData
/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
bool xs_recheck; /* T means scan keys must be rechecked */
+ /*
+ * If fetching with an ordering operator, the "distance" of the last
+ * returned heap tuple according to the index. If xs_recheck is true,
+ * this needs to be rechecked just like the scan keys, and the value
+ * returned here is a lower-bound on the actual distance.
+ */
+ Datum *xs_distances;
+ bool *xs_distance_nulls;
+
/* state data for traversing HOT chains in index_getnext */
bool xs_continue_hot; /* T if must keep walking HOT chain */
} IndexScanDescData;
diff --git a/src/include/catalog/pg_amop.h b/src/include/catalog/pg_amop.h
index 7165f54..a28e67d 100644
--- a/src/include/catalog/pg_amop.h
+++ b/src/include/catalog/pg_amop.h
@@ -650,6 +650,7 @@ DATA(insert ( 2594 604 604 11 s 2577 783 0 ));
DATA(insert ( 2594 604 604 12 s 2576 783 0 ));
DATA(insert ( 2594 604 604 13 s 2861 783 0 ));
DATA(insert ( 2594 604 604 14 s 2860 783 0 ));
+DATA(insert ( 2594 604 600 15 o 3588 783 1970 ));
/*
* gist circle_ops
@@ -669,6 +670,7 @@ DATA(insert ( 2595 718 718 11 s 1514 783 0 ));
DATA(insert ( 2595 718 718 12 s 2590 783 0 ));
DATA(insert ( 2595 718 718 13 s 2865 783 0 ));
DATA(insert ( 2595 718 718 14 s 2864 783 0 ));
+DATA(insert ( 2595 718 600 15 o 3586 783 1970 ));
/*
* gin array_ops (these anyarray operators are used with all the opclasses
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index 309aee6..41a8011 100644
--- a/src/include/catalog/pg_amproc.h
+++ b/src/include/catalog/pg_amproc.h
@@ -205,6 +205,7 @@ DATA(insert ( 2594 604 604 4 2580 ));
DATA(insert ( 2594 604 604 5 2581 ));
DATA(insert ( 2594 604 604 6 2582 ));
DATA(insert ( 2594 604 604 7 2584 ));
+DATA(insert ( 2594 604 604 8 3589 ));
DATA(insert ( 2595 718 718 1 2591 ));
DATA(insert ( 2595 718 718 2 2583 ));
DATA(insert ( 2595 718 718 3 2592 ));
@@ -212,6 +213,7 @@ DATA(insert ( 2595 718 718 4 2580 ));
DATA(insert ( 2595 718 718 5 2581 ));
DATA(insert ( 2595 718 718 6 2582 ));
DATA(insert ( 2595 718 718 7 2584 ));
+DATA(insert ( 2595 718 718 8 3589 ));
DATA(insert ( 3655 3614 3614 1 3654 ));
DATA(insert ( 3655 3614 3614 2 3651 ));
DATA(insert ( 3655 3614 3614 3 3648 ));
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index 88c737b..63ae366 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -1014,9 +1014,13 @@ DATA(insert OID = 1520 ( "<->" PGNSP PGUID b f f 718 718 701 1520 0 ci
DESCR("distance between");
DATA(insert OID = 1521 ( "#" PGNSP PGUID l f f 0 604 23 0 0 poly_npoints - - ));
DESCR("number of points");
-DATA(insert OID = 1522 ( "<->" PGNSP PGUID b f f 600 718 701 0 0 dist_pc - - ));
+DATA(insert OID = 1522 ( "<->" PGNSP PGUID b f f 600 718 701 3586 0 dist_pc - - ));
DESCR("distance between");
-DATA(insert OID = 3276 ( "<->" PGNSP PGUID b f f 600 604 701 0 0 dist_ppoly - - ));
+DATA(insert OID = 3586 ( "<->" PGNSP PGUID b f f 718 600 701 1522 0 dist_cpoint - - ));
+DESCR("distance between");
+DATA(insert OID = 3276 ( "<->" PGNSP PGUID b f f 600 604 701 3588 0 dist_ppoly - - ));
+DESCR("distance between");
+DATA(insert OID = 3588 ( "<->" PGNSP PGUID b f f 604 600 701 3276 0 dist_polyp - - ));
DESCR("distance between");
DATA(insert OID = 1523 ( "<->" PGNSP PGUID b f f 718 604 701 0 0 dist_cpoly - - ));
DESCR("distance between");
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index eace352..11ac6fa 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -845,6 +845,8 @@ DATA(insert OID = 727 ( dist_sl PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 70
DATA(insert OID = 728 ( dist_cpoly PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 701 "718 604" _null_ _null_ _null_ _null_ dist_cpoly _null_ _null_ _null_ ));
DATA(insert OID = 729 ( poly_distance PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 701 "604 604" _null_ _null_ _null_ _null_ poly_distance _null_ _null_ _null_ ));
DATA(insert OID = 3275 ( dist_ppoly PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 701 "600 604" _null_ _null_ _null_ _null_ dist_ppoly _null_ _null_ _null_ ));
+DATA(insert OID = 3587 ( dist_polyp PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 701 "604 600" _null_ _null_ _null_ _null_ dist_polyp _null_ _null_ _null_ ));
+DATA(insert OID = 3585 ( dist_cpoint PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 701 "718 600" _null_ _null_ _null_ _null_ dist_cpoint _null_ _null_ _null_ ));
DATA(insert OID = 740 ( text_lt PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "25 25" _null_ _null_ _null_ _null_ text_lt _null_ _null_ _null_ ));
DATA(insert OID = 741 ( text_le PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "25 25" _null_ _null_ _null_ _null_ text_le _null_ _null_ _null_ ));
@@ -4078,6 +4080,8 @@ DATA(insert OID = 2179 ( gist_point_consistent PGNSP PGUID 12 1 0 0 0 f f f f t
DESCR("GiST support");
DATA(insert OID = 3064 ( gist_point_distance PGNSP PGUID 12 1 0 0 0 f f f f t f i 4 0 701 "2281 600 23 26" _null_ _null_ _null_ _null_ gist_point_distance _null_ _null_ _null_ ));
DESCR("GiST support");
+DATA(insert OID = 3589 ( gist_bbox_distance PGNSP PGUID 12 1 0 0 0 f f f f t f i 4 0 701 "2281 600 23 26" _null_ _null_ _null_ _null_ gist_bbox_distance _null_ _null_ _null_ ));
+DESCR("GiST support");
/* GIN */
DATA(insert OID = 2731 ( gingetbitmap PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 20 "2281 2281" _null_ _null_ _null_ _null_ gingetbitmap _null_ _null_ _null_ ));
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 41b13b2..23e278c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
#include "access/genam.h"
#include "access/heapam.h"
#include "executor/instrument.h"
+#include "lib/pairingheap.h"
#include "nodes/params.h"
#include "nodes/plannodes.h"
#include "utils/reltrigger.h"
@@ -1237,6 +1238,7 @@ typedef struct
* IndexScanState information
*
* indexqualorig execution state for indexqualorig expressions
+ * indexorderbyorig execution state for indexorderbyorig expressions
* ScanKeys Skey structures for index quals
* NumScanKeys number of ScanKeys
* OrderByKeys Skey structures for index ordering operators
@@ -1247,12 +1249,20 @@ typedef struct
* RuntimeContext expr context for evaling runtime Skeys
* RelationDesc index relation descriptor
* ScanDesc index scan descriptor
+ *
+ * ReorderQueue queue of re-check tuples that need reordering
+ * Distances re-checked distances of last fetched tuple
+ * SortSupport for re-ordering ORDER BY exprs
+ * ReachedEnd have we fetched all tuples from index already?
+ * DistanceTypByVals is the datatype of order by expression pass-by-value?
+ * DistanceTypLens typlens of the datatypes of order by expressions
* ----------------
*/
typedef struct IndexScanState
{
ScanState ss; /* its first field is NodeTag */
List *indexqualorig;
+ List *indexorderbyorig;
ScanKey iss_ScanKeys;
int iss_NumScanKeys;
ScanKey iss_OrderByKeys;
@@ -1263,6 +1273,15 @@ typedef struct IndexScanState
ExprContext *iss_RuntimeContext;
Relation iss_RelationDesc;
IndexScanDesc iss_ScanDesc;
+
+ /* These are needed for re-checking ORDER BY expr ordering */
+ pairingheap *iss_ReorderQueue;
+ Datum *iss_Distances;
+ bool *iss_DistanceNulls;
+ SortSupport iss_SortSupport;
+ bool *iss_DistanceTypByVals;
+ int16 *iss_DistanceTypLens;
+ bool iss_ReachedEnd;
} IndexScanState;
/* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 48203a0..2219ed6 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -302,7 +302,11 @@ typedef Scan SeqScan;
* index column order. Only the expressions are provided, not the auxiliary
* sort-order information from the ORDER BY SortGroupClauses; it's assumed
* that the sort ordering is fully determinable from the top-level operators.
- * indexorderbyorig is unused at run time, but is needed for EXPLAIN.
+ * indexorderbyorig is used at run time to recheck the ordering, if the index
+ * does not calculate an accurate ordering. It is also needed for EXPLAIN.
+ *
+ * indexsortops is an array of operators used to sort the ORDER BY expressions,
+ * used together with indexorderbyorig to recheck ordering at run time.
* (Note these fields are used for amcanorderbyop cases, not amcanorder cases.)
*
* indexorderdir specifies the scan ordering, for indexscans on amcanorder
@@ -316,7 +320,8 @@ typedef struct IndexScan
List *indexqual; /* list of index quals (usually OpExprs) */
List *indexqualorig; /* the same in original form */
List *indexorderby; /* list of index ORDER BY exprs */
- List *indexorderbyorig; /* the same in original form */
+ List *indexorderbyorig; /* the same in original form */
+ Oid *indexsortops; /* OIDs of operators to sort ORDER BY exprs */
ScanDirection indexorderdir; /* forward or backward or don't care */
} IndexScan;
diff --git a/src/include/utils/geo_decls.h b/src/include/utils/geo_decls.h
index 91610d8..a0a8abe 100644
--- a/src/include/utils/geo_decls.h
+++ b/src/include/utils/geo_decls.h
@@ -394,8 +394,10 @@ extern Datum circle_diameter(PG_FUNCTION_ARGS);
extern Datum circle_radius(PG_FUNCTION_ARGS);
extern Datum circle_distance(PG_FUNCTION_ARGS);
extern Datum dist_pc(PG_FUNCTION_ARGS);
+extern Datum dist_cpoint(PG_FUNCTION_ARGS);
extern Datum dist_cpoly(PG_FUNCTION_ARGS);
extern Datum dist_ppoly(PG_FUNCTION_ARGS);
+extern Datum dist_polyp(PG_FUNCTION_ARGS);
extern Datum circle_center(PG_FUNCTION_ARGS);
extern Datum cr_circle(PG_FUNCTION_ARGS);
extern Datum box_circle(PG_FUNCTION_ARGS);
@@ -419,6 +421,7 @@ extern Datum gist_circle_consistent(PG_FUNCTION_ARGS);
extern Datum gist_point_compress(PG_FUNCTION_ARGS);
extern Datum gist_point_consistent(PG_FUNCTION_ARGS);
extern Datum gist_point_distance(PG_FUNCTION_ARGS);
+extern Datum gist_bbox_distance(PG_FUNCTION_ARGS);
/* geo_selfuncs.c */
extern Datum areasel(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 5603817..cb18986 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -372,6 +372,36 @@ SELECT count(*) FROM radix_text_tbl WHERE t ~>~ 'Worth
48
(1 row)
+SELECT * FROM gpolygon_tbl ORDER BY f1 <-> '(0,0)'::point LIMIT 10;
+ f1
+-------------------------------------------------
+ ((240,359),(240,455),(337,455),(337,359))
+ ((662,163),(662,187),(759,187),(759,163))
+ ((1000,0),(0,1000))
+ ((0,1000),(1000,1000))
+ ((1346,344),(1346,403),(1444,403),(1444,344))
+ ((278,1409),(278,1457),(369,1457),(369,1409))
+ ((907,1156),(907,1201),(948,1201),(948,1156))
+ ((1517,971),(1517,1043),(1594,1043),(1594,971))
+ ((175,1820),(175,1850),(259,1850),(259,1820))
+ ((2424,81),(2424,160),(2424,160),(2424,81))
+(10 rows)
+
+SELECT * FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
+ f1
+-----------------------------------
+ <(288.5,407),68.2367203197809>
+ <(710.5,175),49.9624859269432>
+ <(323.5,1433),51.4417145903983>
+ <(927.5,1178.5),30.4384625104489>
+ <(1395,373.5),57.1948424248201>
+ <(1555.5,1007),52.7091073724456>
+ <(217,1835),44.5982062419555>
+ <(489,2421.5),22.3886131772381>
+ <(2424,120.5),39.5>
+ <(751.5,2655),20.4022057631032>
+(10 rows)
+
-- Now check the results from plain indexscan
SET enable_seqscan = OFF;
SET enable_indexscan = ON;
@@ -1152,6 +1182,54 @@ SELECT count(*) FROM radix_text_tbl WHERE t ~>~ 'Worth
48
(1 row)
+EXPLAIN (COSTS OFF)
+SELECT * FROM gpolygon_tbl ORDER BY f1 <-> '(0,0)'::point LIMIT 10;
+ QUERY PLAN
+-----------------------------------------------------
+ Limit
+ -> Index Scan using ggpolygonind on gpolygon_tbl
+ Order By: (f1 <-> '(0,0)'::point)
+(3 rows)
+
+SELECT * FROM gpolygon_tbl ORDER BY f1 <-> '(0,0)'::point LIMIT 10;
+ f1
+-------------------------------------------------
+ ((240,359),(240,455),(337,455),(337,359))
+ ((662,163),(662,187),(759,187),(759,163))
+ ((1000,0),(0,1000))
+ ((0,1000),(1000,1000))
+ ((1346,344),(1346,403),(1444,403),(1444,344))
+ ((278,1409),(278,1457),(369,1457),(369,1409))
+ ((907,1156),(907,1201),(948,1201),(948,1156))
+ ((1517,971),(1517,1043),(1594,1043),(1594,971))
+ ((175,1820),(175,1850),(259,1850),(259,1820))
+ ((2424,81),(2424,160),(2424,160),(2424,81))
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
+ QUERY PLAN
+---------------------------------------------------
+ Limit
+ -> Index Scan using ggcircleind on gcircle_tbl
+ Order By: (f1 <-> '(200,300)'::point)
+(3 rows)
+
+SELECT * FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
+ f1
+-----------------------------------
+ <(288.5,407),68.2367203197809>
+ <(710.5,175),49.9624859269432>
+ <(323.5,1433),51.4417145903983>
+ <(927.5,1178.5),30.4384625104489>
+ <(1395,373.5),57.1948424248201>
+ <(1555.5,1007),52.7091073724456>
+ <(217,1835),44.5982062419555>
+ <(489,2421.5),22.3886131772381>
+ <(2424,120.5),39.5>
+ <(751.5,2655),20.4022057631032>
+(10 rows)
+
-- Now check the results from bitmap indexscan
SET enable_seqscan = OFF;
SET enable_indexscan = OFF;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index f779fa0..5df9008 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -224,6 +224,10 @@ SELECT count(*) FROM radix_text_tbl WHERE t > 'Worth
SELECT count(*) FROM radix_text_tbl WHERE t ~>~ 'Worth St ';
+SELECT * FROM gpolygon_tbl ORDER BY f1 <-> '(0,0)'::point LIMIT 10;
+
+SELECT * FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
+
-- Now check the results from plain indexscan
SET enable_seqscan = OFF;
SET enable_indexscan = ON;
@@ -437,6 +441,14 @@ EXPLAIN (COSTS OFF)
SELECT count(*) FROM radix_text_tbl WHERE t ~>~ 'Worth St ';
SELECT count(*) FROM radix_text_tbl WHERE t ~>~ 'Worth St ';
+EXPLAIN (COSTS OFF)
+SELECT * FROM gpolygon_tbl ORDER BY f1 <-> '(0,0)'::point LIMIT 10;
+SELECT * FROM gpolygon_tbl ORDER BY f1 <-> '(0,0)'::point LIMIT 10;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
+SELECT * FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
+
-- Now check the results from bitmap indexscan
SET enable_seqscan = OFF;
SET enable_indexscan = OFF;
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers