On 10/06/2014 12:36 PM, Emre Hasegeli wrote:
Thanks. The main question now is design of this patch. Currently, it does
all the work inside access method. We already have some discussion of pro
and cons of this method. I would like to clarify alternatives now. I can
see following way:

    1. Implement new executor node which performs sorting by priority queue.
    Let's call it "Priority queue". I think it should be separate node from
    "Sort" node. Despite "Priority queue" and "Sort" are essentially similar
    from user view, they would be completely different in implementation.
    2. Implement some interface to transfer distance values from access
    method to "Priority queue" node.

If we assume that all of them need recheck, maybe it can be done
without passing distance values.

No, the executor needs the lower-bound distance value, as calculated by the indexam, so that it knows which tuples it can return from the queue already. For example, imagine the following items coming from the index:

tuple # lower bound     actual distance
1       1               1
2       2               10
3       30              30
4       40              40

After the executor has fetched tuple 2, and re-checked the distance, it pushes the tuple to the queue. It then fetches tuple 3, with lower bound 30, and it can now immediately return tuple # 2 from the queue. Because 10 < 30, so there cannot be any more tuples coming from the index that would need to go before tuple # 2.

The executor needs the lower bound as calculated by the index, as well as the actual distance it calculates itself, to make those decisions.

    3. Somehow tell the planner that it could use "Priority queue" in
    corresponding cases. I see two ways of doing this:
       - Add flag to operator in opclass indicating that index can only
       order by lower bound of "col op value", not by "col op value" itself.
       - Define new relation between operators. Value of one operator could
       be lower bound for value of another operator. So, planner can
put "Priority
       queue" node when lower bound ordering is possible from index. Also "ALTER
       OPERATOR" command would be reasonable, so extensions could upgrade.

I think, it would be better to make it a property of the operator
class.  We can add a column to pg_amop or define another value for
amoppurpose on pg_amop.  Syntax can be something like this:

CREATE OPERATOR CLASS circle_ops DEFAULT
    FOR TYPE circle USING gist AS
    OPERATOR 15  <->(circle, point) FOR ORDER BY pg_catalog.float_ops LOWER 
BOUND;

While looking at it, I realize that current version of the patch does
not use the sort operator family defined with the operator class.  It
assumes that the distance function will return values compatible with
the operator.  Operator class definition makes me think that there is
not such an assumption.

Yeah. I also noticed that the type of the argument passed to the consistent function varies, and doesn't necessarily match that declared in pg_proc. Looking at gist_point_consistent, the argument type can be a point, a polygon, or a circle, depending on the "strategy group". But it's declared as a point in pg_proc.

Besides overhead, this way makes significant infrastructural changes. So,
it may be over-engineering. However, it's probably more clean and beautiful
solution.
I would like to get some feedback from people familiar with KNN-GiST like
Heikki or Tom. What do you think about this? Any other ideas?

I would be happy to test and review the changes.  I think it is nice
to solve the problem in a generalized way improving the access method
infrastructure.  Definitely, we should have a consensus on the design
before working on the infrastructure changes.

I took a stab on this. I added the reorder queue directly to the Index Scan node, rather than adding a whole new node type for it. It seems reasonable, as Index Scan is responsible for rechecking the quals, too, even though re-ordering the tuples is more complicated than rechecking quals.

To recap, the idea is that the index can define an ordering op, even if it cannot return the tuples in exactly the right order. It is enough that for each tuple, it returns a lower bound of the expression that is used for sorting. For example, for "ORDER BY key <-> column", it is enough that it returns a lower bound of "key <-> column" for each tuple. The index must return the tuples ordered by the lower bounds. The executor re-checks the expressions, and re-orders the tuples to the correct order.

Patch attached. It should be applied on top of my pairing heap patch at http://www.postgresql.org/message-id/548ffa2c.7060...@vmware.com. Some caveats:

* The signature of the distance function is unchanged, it doesn't get a recheck argument. It is just assumed that if the consistent function sets the recheck flag, then the distance needs to be rechecked as well. We might want to add the recheck argument, like you Alexander did in your patch, but it's not important right now.

* I used the "distance" term in the executor, although the ORDER BY expr machinery is more general than that. The value returned by the ORDER BY expression doesn't have to be a distance, although that's the only thing supported by GiST and the built-in opclasses.

* I short-circuited the planner to assume that the ORDER BY expression always returns a float. That's true today for knn-GiST, but is obviously a bogus assumption in general.

This needs some work to get into a committable state, but from a modularity point of view, this is much better than having the indexam to peek into the heap.

- Heikki

diff --git a/doc/src/sgml/gist.sgml b/doc/src/sgml/gist.sgml
index 5de282b..0430738 100644
--- a/doc/src/sgml/gist.sgml
+++ b/doc/src/sgml/gist.sgml
@@ -105,6 +105,7 @@
        <literal>~=</>
       </entry>
       <entry>
+       <literal>&lt;-&gt;</>
       </entry>
      </row>
      <row>
@@ -163,6 +164,7 @@
        <literal>~=</>
       </entry>
       <entry>
+       <literal>&lt;-&gt;</>
       </entry>
      </row>
      <row>
@@ -207,6 +209,12 @@
   </table>
 
  <para>
+  Currently, ordering by the distance operator <literal>&lt;-&gt;</>
+  is supported only with <literal>point</> by the operator classes
+  of the geometric types.
+ </para>
+
+ <para>
   For historical reasons, the <literal>inet_ops</> operator class is
   not the default class for types <type>inet</> and <type>cidr</>.
   To use it, mention the class name in <command>CREATE INDEX</>,
@@ -798,13 +806,22 @@ my_distance(PG_FUNCTION_ARGS)
 
        The arguments to the <function>distance</> function are identical to
        the arguments of the <function>consistent</> function, except that no
-       recheck flag is used.  The distance to a leaf index entry must always
-       be determined exactly, since there is no way to re-order the tuples
-       once they are returned.  Some approximation is allowed when determining
-       the distance to an internal tree node, so long as the result is never
-       greater than any child's actual distance.  Thus, for example, distance
-       to a bounding box is usually sufficient in geometric applications.  The
-       result value can be any finite <type>float8</> value.  (Infinity and
+       recheck flag is used.
+      </para>
+
+      <para>
+       Some approximation is allowed when determining the distance to an
+       internal tree node, so long as the result is never greater than any
+       child's actual distance.  Thus, for example, distance
+       to a bounding box is usually sufficient in geometric applications. For
+       leaf nodes, the returned distance must be accurate, if the consistent
+       function returns *recheck == false for the tuple. Otherwise the same
+       approximation is allowed, and the executor will re-order ambiguous cases
+       after recalculating the actual distance.
+      </para>
+
+      <para>
+       The result value can be any finite <type>float8</> value.  (Infinity and
        minus infinity are used internally to handle cases such as nulls, so it
        is not recommended that <function>distance</> functions return these
        values.)
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index e5eb6f6..8569928 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -192,9 +192,8 @@ gistindex_keytest(IndexScanDesc scan,
 			 * always be zero, but might as well pass it for possible future
 			 * use.)
 			 *
-			 * Note that Distance functions don't get a recheck argument. We
-			 * can't tolerate lossy distance calculations on leaf tuples;
-			 * there is no opportunity to re-sort the tuples afterwards.
+			 * Note that Distance functions don't get a recheck argument.
+			 * Distance is rechecked whenever the quals are.
 			 */
 			dist = FunctionCall4Coll(&key->sk_func,
 									 key->sk_collation,
@@ -411,6 +410,7 @@ getNextNearest(IndexScanDesc scan)
 {
 	GISTScanOpaque so = (GISTScanOpaque) scan->opaque;
 	bool		res = false;
+	int			i;
 
 	do
 	{
@@ -423,7 +423,11 @@ getNextNearest(IndexScanDesc scan)
 		{
 			/* found a heap item at currently minimal distance */
 			scan->xs_ctup.t_self = item->data.heap.heapPtr;
-			scan->xs_recheck = item->data.heap.recheck;
+			for (i = 0; i < scan->numberOfOrderBys; i++)
+			{
+				scan->xs_distances[i] = Float8GetDatum(item->distances[i]);
+				scan->xs_distance_nulls[i] = false;
+			}
 			res = true;
 		}
 		else
diff --git a/src/backend/access/gist/gistproc.c b/src/backend/access/gist/gistproc.c
index db0bec6..22ab256 100644
--- a/src/backend/access/gist/gistproc.c
+++ b/src/backend/access/gist/gistproc.c
@@ -1459,3 +1459,36 @@ gist_point_distance(PG_FUNCTION_ARGS)
 
 	PG_RETURN_FLOAT8(distance);
 }
+
+/*
+ * The inexact GiST distance method for geometric types that store bounding
+ * boxes.
+ *
+ * Compute lossy distance from point to index entries.  The result is inexact
+ * because index entries are bounding boxes, not the exact shapes of the
+ * indexed geometric types.  We use distance from point to MBR of index entry.
+ * This is correct lower bound estimate of distance from point to indexed
+ * geometric type.
+ */
+Datum
+gist_bbox_distance(PG_FUNCTION_ARGS)
+{
+	GISTENTRY  *entry = (GISTENTRY *) PG_GETARG_POINTER(0);
+	StrategyNumber strategy = (StrategyNumber) PG_GETARG_UINT16(2);
+	double		distance;
+	StrategyNumber strategyGroup = strategy / GeoStrategyNumberOffset;
+
+	switch (strategyGroup)
+	{
+		case PointStrategyNumberGroup:
+			distance = computeDistance(false,
+									   DatumGetBoxP(entry->key),
+									   PG_GETARG_POINT_P(1));
+			break;
+		default:
+			elog(ERROR, "unknown strategy number: %d", strategy);
+			distance = 0.0;		/* keep compiler quiet */
+	}
+
+	PG_RETURN_FLOAT8(distance);
+}
diff --git a/src/backend/access/gist/gistscan.c b/src/backend/access/gist/gistscan.c
index eff02c4..1837b78 100644
--- a/src/backend/access/gist/gistscan.c
+++ b/src/backend/access/gist/gistscan.c
@@ -85,6 +85,11 @@ gistbeginscan(PG_FUNCTION_ARGS)
 	/* workspaces with size dependent on numberOfOrderBys: */
 	so->distances = palloc(sizeof(double) * scan->numberOfOrderBys);
 	so->qual_ok = true;			/* in case there are zero keys */
+	if (scan->numberOfOrderBys > 0)
+	{
+		scan->xs_distances = palloc(sizeof(Datum) * scan->numberOfOrderBys);
+		scan->xs_distance_nulls = palloc(sizeof(bool) * scan->numberOfOrderBys);
+	}
 
 	scan->opaque = so;
 
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 2b89dc6..9279454 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -28,14 +28,86 @@
 #include "access/relscan.h"
 #include "executor/execdebug.h"
 #include "executor/nodeIndexscan.h"
+#include "lib/pairingheap.h"
 #include "optimizer/clauses.h"
 #include "utils/array.h"
+#include "utils/datum.h"
 #include "utils/lsyscache.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
+/*
+ * When an ordering operator is used, tuples fetched from the index that
+ * need to be reordered are queued in a pairing heap, as ReorderTuples.
+ */
+typedef struct
+{
+	pairingheap_node ph_node;
+	HeapTuple	htup;
+	Datum	   *distances;
+	bool	   *distance_nulls;
+} ReorderTuple;
+
+static int
+cmp_distances(const Datum *adist, const bool *anulls,
+			  const Datum *bdist, const bool *bnulls,
+			  IndexScanState *node)
+{
+	int			i;
+	int			result;
+
+	for (i = 0; i < node->iss_NumOrderByKeys; i++)
+	{
+		SortSupport ssup = &node->iss_SortSupport[i];
+
+		/* Handle nulls. We only support NULLS LAST */
+		if (anulls[i] && !bnulls[i])
+			return 1;
+		else if (!anulls[i] && bnulls[i])
+			return -1;
+		else if (anulls[i] && bnulls[i])
+			return 0;
+
+		result = ssup->comparator(adist[i], bdist[i], ssup);
+		if (result != 0)
+			return result;
+	}
+
+	return 0;
+}
+
+static int
+reorderbuffer_cmp(const pairingheap_node *a, const pairingheap_node *b, void *arg)
+{
+	ReorderTuple *rta = (ReorderTuple *) a;
+	ReorderTuple *rtb = (ReorderTuple *) b;
+	IndexScanState *node = (IndexScanState *) arg;
+
+	return -cmp_distances(rta->distances, rta->distance_nulls,
+						  rtb->distances, rtb->distance_nulls,
+						  node);
+}
+
+static void
+copyDistances(IndexScanState *node, const Datum *src_datums, const bool *src_nulls,
+			  Datum *dst_datums, bool *dst_nulls)
+{
+	int			i;
+
+	for (i = 0; i < node->iss_NumOrderByKeys; i++)
+	{
+		if (!src_nulls[i])
+			dst_datums[i] =	datumCopy(src_datums[i],
+									  node->iss_DistanceTypByVals[i],
+									  node->iss_DistanceTypLens[i]);
+		else
+			dst_datums[i] = (Datum) 0;
+		dst_nulls[i] = src_nulls[i];
+	}
+}
 
 static TupleTableSlot *IndexNext(IndexScanState *node);
+static void RecheckOrderBys(IndexScanState *node, TupleTableSlot *slot);
 
 
 /* ----------------------------------------------------------------
@@ -54,6 +126,8 @@ IndexNext(IndexScanState *node)
 	IndexScanDesc scandesc;
 	HeapTuple	tuple;
 	TupleTableSlot *slot;
+	MemoryContext oldContext;
+	ReorderTuple *reordertuple;
 
 	/*
 	 * extract necessary information from index scan node
@@ -72,11 +146,60 @@ IndexNext(IndexScanState *node)
 	econtext = node->ss.ps.ps_ExprContext;
 	slot = node->ss.ss_ScanTupleSlot;
 
-	/*
-	 * ok, now that we have what we need, fetch the next tuple.
-	 */
-	while ((tuple = index_getnext(scandesc, direction)) != NULL)
+	for (;;)
 	{
+		/* Check the reorder queue first */
+		if (node->iss_ReorderQueue)
+		{
+			if (pairingheap_is_empty(node->iss_ReorderQueue))
+			{
+				if (node->iss_ReachedEnd)
+					break;
+			}
+			else
+			{
+				reordertuple = (ReorderTuple *) pairingheap_first(node->iss_ReorderQueue);
+
+				/* Check if we can return this tuple */
+				if (node->iss_ReachedEnd ||
+					cmp_distances(reordertuple->distances,
+								  reordertuple->distance_nulls,
+								  scandesc->xs_distances,
+								  scandesc->xs_distance_nulls,
+								  node) < 0)
+				{
+					(void) pairingheap_remove_first(node->iss_ReorderQueue);
+
+					tuple = reordertuple->htup;
+					pfree(reordertuple);
+
+					/*
+					 * Store the buffered tuple in the scan tuple slot of the
+					 * scan state.
+					 */
+					ExecStoreTuple(tuple, slot, InvalidBuffer, true);
+					return slot;
+				}
+			}
+		}
+
+		/* Fetch next tuple from the index */
+		tuple = index_getnext(scandesc, direction);
+
+		if (!tuple)
+		{
+			/*
+			 * No more tuples from the index. If we have a reorder queue,
+			 * we still need to drain all the remaining tuples in the queue
+			 * before we're done.
+			 */
+			node->iss_ReachedEnd = true;
+			if (node->iss_ReorderQueue)
+				continue;
+			else
+				break;
+		}
+
 		/*
 		 * Store the scanned tuple in the scan tuple slot of the scan state.
 		 * Note: we pass 'false' because tuples returned by amgetnext are
@@ -103,6 +226,71 @@ IndexNext(IndexScanState *node)
 			}
 		}
 
+		/*
+		 * Re-check the ordering.
+		 */
+		if (node->iss_ReorderQueue)
+		{
+			/*
+			 * The index returned the distance, as calculated by the indexam,
+			 * in scandesc->xs_distances.  If the index was lossy, we have to
+			 * recheck the ordering expression too. Otherwise we take the
+			 * indexam's values as is.
+			 */
+			if (scandesc->xs_recheck)
+				RecheckOrderBys(node, slot);
+			else
+				copyDistances(node,
+							  scandesc->xs_distances,
+							  scandesc->xs_distance_nulls,
+							  node->iss_Distances,
+							  node->iss_DistanceNulls);
+
+			/*
+			 * Can we return this tuple immediately, or does it need to be
+			 * pushed to the reorder queue? If this tuple's distance was
+			 * inaccurate, we can't return it yet, because the next tuple
+			 * from the index might need to come before this one. Also,
+			 * we can't return it yet if there are any smaller tuples in the
+			 * queue already.
+			 */
+			if (!pairingheap_is_empty(node->iss_ReorderQueue))
+				reordertuple = (ReorderTuple *) pairingheap_first(node->iss_ReorderQueue);
+			else
+				reordertuple = NULL;
+
+			if ((cmp_distances(node->iss_Distances,
+							   node->iss_DistanceNulls,
+							   scandesc->xs_distances,
+							   scandesc->xs_distance_nulls,
+							   node) > 0) ||
+				(reordertuple && cmp_distances(node->iss_Distances,
+											   node->iss_DistanceNulls,
+											   reordertuple->distances,
+											   reordertuple->distance_nulls,
+											   node) > 0))
+			{
+				/* Need to put this to the queue */
+				oldContext = MemoryContextSwitchTo(estate->es_query_cxt);
+				reordertuple = (ReorderTuple *) palloc(sizeof(ReorderTuple));
+				reordertuple->htup = heap_copytuple(tuple);
+				reordertuple->distances = (Datum *) palloc(sizeof(Datum) * scandesc->numberOfOrderBys);
+				reordertuple->distance_nulls = (bool *) palloc(sizeof(bool) * scandesc->numberOfOrderBys);
+				copyDistances(node,
+							  node->iss_Distances,
+							  node->iss_DistanceNulls,
+							  reordertuple->distances,
+							  reordertuple->distance_nulls);
+
+				pairingheap_add(node->iss_ReorderQueue, &reordertuple->ph_node);
+
+				MemoryContextSwitchTo(oldContext);
+
+				continue;
+			}
+		}
+
+		/* Ok, got a tuple to return */
 		return slot;
 	}
 
@@ -114,6 +302,41 @@ IndexNext(IndexScanState *node)
 }
 
 /*
+ * Calculate the expressions in the ORDER BY clause, based on the heap tuple.
+ */
+static void
+RecheckOrderBys(IndexScanState *node, TupleTableSlot *slot)
+{
+	IndexScanDesc scandesc;
+	ExprContext *econtext;
+	int			i;
+	ListCell   *l;
+	MemoryContext oldContext;
+
+	scandesc = node->iss_ScanDesc;
+	econtext = node->ss.ps.ps_ExprContext;
+	econtext->ecxt_scantuple = slot;
+	ResetExprContext(econtext);
+
+	oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
+
+	i = 0;
+	foreach(l, node->indexorderbyorig)
+	{
+		ExprState *orderby = (ExprState *) lfirst(l);
+
+		Assert(i < scandesc->numberOfOrderBys);
+
+		node->iss_Distances[i] = ExecEvalExpr(orderby,
+											  econtext,
+											  &node->iss_DistanceNulls[i],
+											  NULL);
+	}
+
+	MemoryContextSwitchTo(oldContext);
+}
+
+/*
  * IndexRecheck -- access method routine to recheck a tuple in EvalPlanQual
  */
 static bool
@@ -465,6 +688,7 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	IndexScanState *indexstate;
 	Relation	currentRelation;
 	bool		relistarget;
+	int			i;
 
 	/*
 	 * create state structure
@@ -501,6 +725,9 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 	indexstate->indexqualorig = (List *)
 		ExecInitExpr((Expr *) node->indexqualorig,
 					 (PlanState *) indexstate);
+	indexstate->indexorderbyorig = (List *)
+		ExecInitExpr((Expr *) node->indexorderbyorig,
+					 (PlanState *) indexstate);
 
 	/*
 	 * tuple table initialization
@@ -581,6 +808,52 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
 						   NULL,	/* no ArrayKeys */
 						   NULL);
 
+	/* Initialize sort support, if we need to re-check ORDER BY exprs */
+	if (indexstate->iss_NumOrderByKeys > 0)
+	{
+		int			numOrderByKeys = indexstate->iss_NumOrderByKeys;
+
+		/*
+		 * Prepare sort support, and look up the distance type for each
+		 * ORDER BY expression.
+		 */
+		indexstate->iss_SortSupport =
+			palloc0(numOrderByKeys * sizeof(SortSupportData));
+		indexstate->iss_DistanceTypByVals =
+			palloc(numOrderByKeys * sizeof(bool));
+		indexstate->iss_DistanceTypLens =
+			palloc(numOrderByKeys * sizeof(int16));
+		for (i = 0; i < indexstate->iss_NumOrderByKeys; i++)
+		{
+			Oid		distanceType;
+			Oid		opfamily;
+			int16	strategy;
+
+			PrepareSortSupportFromOrderingOp(node->indexsortops[i],
+											 &indexstate->iss_SortSupport[i]);
+
+			if (!get_ordering_op_properties(node->indexsortops[i],
+										 &opfamily, &distanceType, &strategy))
+			{
+				elog(LOG, "operator %u is not a valid ordering operator",
+					 node->indexsortops[i]);
+			}
+			get_typlenbyval(distanceType,
+							&indexstate->iss_DistanceTypLens[i],
+							&indexstate->iss_DistanceTypByVals[i]);
+		}
+
+		/* allocate arrays to hold the re-calculated distances */
+		indexstate->iss_Distances =
+			palloc(indexstate->iss_NumOrderByKeys * sizeof(Datum));
+		indexstate->iss_DistanceNulls =
+			palloc(indexstate->iss_NumOrderByKeys * sizeof(bool));
+
+		/* and initialize the reorder queue */
+		indexstate->iss_ReorderQueue = pairingheap_allocate(reorderbuffer_cmp,
+															indexstate);
+	}
+
 	/*
 	 * If we have runtime keys, we need an ExprContext to evaluate them. The
 	 * node's standard context won't do because we want to reset that context
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index 8f9ae4f..645c6d8 100644
--- a/src/backend/optimizer/plan/createplan.c
+++ b/src/backend/optimizer/plan/createplan.c
@@ -22,6 +22,7 @@
 #include "access/skey.h"
 #include "access/sysattr.h"
 #include "catalog/pg_class.h"
+#include "catalog/pg_operator.h"
 #include "foreign/fdwapi.h"
 #include "miscadmin.h"
 #include "nodes/makefuncs.h"
@@ -102,7 +103,7 @@ static void copy_plan_costsize(Plan *dest, Plan *src);
 static SeqScan *make_seqscan(List *qptlist, List *qpqual, Index scanrelid);
 static IndexScan *make_indexscan(List *qptlist, List *qpqual, Index scanrelid,
 			   Oid indexid, List *indexqual, List *indexqualorig,
-			   List *indexorderby, List *indexorderbyorig,
+			   List *indexorderby, List *indexorderbyorig, Oid *sortOperators,
 			   ScanDirection indexscandir);
 static IndexOnlyScan *make_indexonlyscan(List *qptlist, List *qpqual,
 				   Index scanrelid, Oid indexid,
@@ -1158,6 +1159,7 @@ create_indexscan_plan(PlannerInfo *root,
 	List	   *stripped_indexquals;
 	List	   *fixed_indexquals;
 	List	   *fixed_indexorderbys;
+	Oid		   *sortOperators = NULL;
 	ListCell   *l;
 
 	/* it should be a base rel... */
@@ -1266,6 +1268,25 @@ create_indexscan_plan(PlannerInfo *root,
 			replace_nestloop_params(root, (Node *) indexorderbys);
 	}
 
+	if (best_path->path.pathkeys && indexorderbys)
+	{
+		int		numOrderBys = list_length(indexorderbys);
+		int		i;
+
+		sortOperators = (Oid *) palloc(numOrderBys * sizeof(Oid));
+
+		for (i = 0; i < numOrderBys; i++)
+		{
+			/*
+			 * FIXME: Currently, amcanorderbyop is only supported by GiST, and
+			 * this is only used for float8 distances. The correct way would
+			 * be to dig this from the path key, like make_sort_from_pathkeys
+			 * does.
+			 */
+			sortOperators[i] = Float8LessOperator;
+		}
+	}
+
 	/* Finally ready to build the plan node */
 	if (indexonly)
 		scan_plan = (Scan *) make_indexonlyscan(tlist,
@@ -1285,6 +1306,7 @@ create_indexscan_plan(PlannerInfo *root,
 											stripped_indexquals,
 											fixed_indexorderbys,
 											indexorderbys,
+											sortOperators,
 											best_path->indexscandir);
 
 	copy_path_costsize(&scan_plan->plan, &best_path->path);
@@ -3327,6 +3349,7 @@ make_indexscan(List *qptlist,
 			   List *indexqualorig,
 			   List *indexorderby,
 			   List *indexorderbyorig,
+			   Oid *sortOperators,
 			   ScanDirection indexscandir)
 {
 	IndexScan  *node = makeNode(IndexScan);
@@ -3344,6 +3367,7 @@ make_indexscan(List *qptlist,
 	node->indexorderby = indexorderby;
 	node->indexorderbyorig = indexorderbyorig;
 	node->indexorderdir = indexscandir;
+	node->indexsortops = sortOperators;
 
 	return node;
 }
diff --git a/src/backend/utils/adt/geo_ops.c b/src/backend/utils/adt/geo_ops.c
index bc56b0a..85abfcf 100644
--- a/src/backend/utils/adt/geo_ops.c
+++ b/src/backend/utils/adt/geo_ops.c
@@ -2676,6 +2676,18 @@ dist_ppoly(PG_FUNCTION_ARGS)
 	PG_RETURN_FLOAT8(result);
 }
 
+Datum
+dist_polyp(PG_FUNCTION_ARGS)
+{
+	POLYGON    *poly = PG_GETARG_POLYGON_P(0);
+	Point	   *point = PG_GETARG_POINT_P(1);
+	float8		result;
+
+	result = dist_ppoly_internal(point, poly);
+
+	PG_RETURN_FLOAT8(result);
+}
+
 static double
 dist_ppoly_internal(Point *pt, POLYGON *poly)
 {
@@ -5092,6 +5104,21 @@ dist_pc(PG_FUNCTION_ARGS)
 	PG_RETURN_FLOAT8(result);
 }
 
+/*
+ * Distance from a circle to a point
+ */
+Datum
+dist_cpoint(PG_FUNCTION_ARGS)
+{
+	CIRCLE	   *circle = PG_GETARG_CIRCLE_P(0);
+	Point	   *point = PG_GETARG_POINT_P(1);
+	float8		result;
+
+	result = point_dt(point, &circle->center) - circle->radius;
+	if (result < 0)
+		result = 0;
+	PG_RETURN_FLOAT8(result);
+}
 
 /*		circle_center	-		returns the center point of the circle.
  */
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index d99158f..170069e 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -147,7 +147,10 @@ extern void index_restrpos(IndexScanDesc scan);
 extern ItemPointer index_getnext_tid(IndexScanDesc scan,
 				  ScanDirection direction);
 extern HeapTuple index_fetch_heap(IndexScanDesc scan);
+extern bool index_get_heap_values(IndexScanDesc scan, ItemPointer heapPtr,
+					Datum values[INDEX_MAX_KEYS], bool isnull[INDEX_MAX_KEYS]);
 extern HeapTuple index_getnext(IndexScanDesc scan, ScanDirection direction);
+
 extern int64 index_getbitmap(IndexScanDesc scan, TIDBitmap *bitmap);
 
 extern IndexBulkDeleteResult *index_bulk_delete(IndexVacuumInfo *info,
diff --git a/src/include/access/relscan.h b/src/include/access/relscan.h
index f2c7ca1..8cfd0c4 100644
--- a/src/include/access/relscan.h
+++ b/src/include/access/relscan.h
@@ -91,6 +91,15 @@ typedef struct IndexScanDescData
 	/* NB: if xs_cbuf is not InvalidBuffer, we hold a pin on that buffer */
 	bool		xs_recheck;		/* T means scan keys must be rechecked */
 
+	/*
+	 * If fetching with an ordering operator, the "distance" of the last
+	 * returned heap tuple according to the index. If xs_recheck is true,
+	 * this needs to be rechecked just like the scan keys, and the value
+	 * returned here is a lower-bound on the actual distance.
+	 */
+	Datum	   *xs_distances;
+	bool	   *xs_distance_nulls;
+
 	/* state data for traversing HOT chains in index_getnext */
 	bool		xs_continue_hot;	/* T if must keep walking HOT chain */
 }	IndexScanDescData;
diff --git a/src/include/catalog/pg_amop.h b/src/include/catalog/pg_amop.h
index 7165f54..a28e67d 100644
--- a/src/include/catalog/pg_amop.h
+++ b/src/include/catalog/pg_amop.h
@@ -650,6 +650,7 @@ DATA(insert (	2594   604 604 11 s 2577 783 0 ));
 DATA(insert (	2594   604 604 12 s 2576 783 0 ));
 DATA(insert (	2594   604 604 13 s 2861 783 0 ));
 DATA(insert (	2594   604 604 14 s 2860 783 0 ));
+DATA(insert (	2594   604 600 15 o 3588 783 1970 ));
 
 /*
  *	gist circle_ops
@@ -669,6 +670,7 @@ DATA(insert (	2595   718 718 11 s 1514 783 0 ));
 DATA(insert (	2595   718 718 12 s 2590 783 0 ));
 DATA(insert (	2595   718 718 13 s 2865 783 0 ));
 DATA(insert (	2595   718 718 14 s 2864 783 0 ));
+DATA(insert (	2595   718 600 15 o 3586 783 1970 ));
 
 /*
  * gin array_ops (these anyarray operators are used with all the opclasses
diff --git a/src/include/catalog/pg_amproc.h b/src/include/catalog/pg_amproc.h
index 309aee6..41a8011 100644
--- a/src/include/catalog/pg_amproc.h
+++ b/src/include/catalog/pg_amproc.h
@@ -205,6 +205,7 @@ DATA(insert (	2594   604 604 4 2580 ));
 DATA(insert (	2594   604 604 5 2581 ));
 DATA(insert (	2594   604 604 6 2582 ));
 DATA(insert (	2594   604 604 7 2584 ));
+DATA(insert (	2594   604 604 8 3589 ));
 DATA(insert (	2595   718 718 1 2591 ));
 DATA(insert (	2595   718 718 2 2583 ));
 DATA(insert (	2595   718 718 3 2592 ));
@@ -212,6 +213,7 @@ DATA(insert (	2595   718 718 4 2580 ));
 DATA(insert (	2595   718 718 5 2581 ));
 DATA(insert (	2595   718 718 6 2582 ));
 DATA(insert (	2595   718 718 7 2584 ));
+DATA(insert (	2595   718 718 8 3589 ));
 DATA(insert (	3655   3614 3614 1 3654 ));
 DATA(insert (	3655   3614 3614 2 3651 ));
 DATA(insert (	3655   3614 3614 3 3648 ));
diff --git a/src/include/catalog/pg_operator.h b/src/include/catalog/pg_operator.h
index 88c737b..63ae366 100644
--- a/src/include/catalog/pg_operator.h
+++ b/src/include/catalog/pg_operator.h
@@ -1014,9 +1014,13 @@ DATA(insert OID = 1520 (  "<->"   PGNSP PGUID b f f  718	718  701   1520    0 ci
 DESCR("distance between");
 DATA(insert OID = 1521 (  "#"	  PGNSP PGUID l f f  0		604   23	  0    0 poly_npoints - - ));
 DESCR("number of points");
-DATA(insert OID = 1522 (  "<->"   PGNSP PGUID b f f  600	718  701	  0    0 dist_pc - - ));
+DATA(insert OID = 1522 (  "<->"   PGNSP PGUID b f f  600	718  701   3586    0 dist_pc - - ));
 DESCR("distance between");
-DATA(insert OID = 3276 (  "<->"	  PGNSP PGUID b f f	 600	604	 701	  0	   0 dist_ppoly - - ));
+DATA(insert OID = 3586 (  "<->"   PGNSP PGUID b f f  718	600  701   1522    0 dist_cpoint - - ));
+DESCR("distance between");
+DATA(insert OID = 3276 (  "<->"	  PGNSP PGUID b f f	 600	604	 701   3588	   0 dist_ppoly - - ));
+DESCR("distance between");
+DATA(insert OID = 3588 (  "<->"	  PGNSP PGUID b f f  604 	600  701   3276	   0 dist_polyp - - ));
 DESCR("distance between");
 DATA(insert OID = 1523 (  "<->"   PGNSP PGUID b f f  718	604  701	  0    0 dist_cpoly - - ));
 DESCR("distance between");
diff --git a/src/include/catalog/pg_proc.h b/src/include/catalog/pg_proc.h
index eace352..11ac6fa 100644
--- a/src/include/catalog/pg_proc.h
+++ b/src/include/catalog/pg_proc.h
@@ -845,6 +845,8 @@ DATA(insert OID = 727 (  dist_sl		   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 70
 DATA(insert OID = 728 (  dist_cpoly		   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 701 "718 604" _null_ _null_ _null_ _null_	dist_cpoly _null_ _null_ _null_ ));
 DATA(insert OID = 729 (  poly_distance	   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 701 "604 604" _null_ _null_ _null_ _null_	poly_distance _null_ _null_ _null_ ));
 DATA(insert OID = 3275 (  dist_ppoly	   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 701 "600 604" _null_ _null_ _null_ _null_	dist_ppoly _null_ _null_ _null_ ));
+DATA(insert OID = 3587 (  dist_polyp	   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 701 "604 600" _null_ _null_ _null_ _null_	dist_polyp _null_ _null_ _null_ ));
+DATA(insert OID = 3585 (  dist_cpoint	   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 701 "718 600" _null_ _null_ _null_ _null_ 	dist_cpoint _null_ _null_ _null_ ));
 
 DATA(insert OID = 740 (  text_lt		   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "25 25" _null_ _null_ _null_ _null_ text_lt _null_ _null_ _null_ ));
 DATA(insert OID = 741 (  text_le		   PGNSP PGUID 12 1 0 0 0 f f f f t f i 2 0 16 "25 25" _null_ _null_ _null_ _null_ text_le _null_ _null_ _null_ ));
@@ -4078,6 +4080,8 @@ DATA(insert OID = 2179 (  gist_point_consistent PGNSP PGUID 12 1 0 0 0 f f f f t
 DESCR("GiST support");
 DATA(insert OID = 3064 (  gist_point_distance	PGNSP PGUID 12 1 0 0 0 f f f f t f i 4 0 701 "2281 600 23 26" _null_ _null_ _null_ _null_	gist_point_distance _null_ _null_ _null_ ));
 DESCR("GiST support");
+DATA(insert OID = 3589 (  gist_bbox_distance	PGNSP PGUID 12 1 0 0 0 f f f f t f i 4 0 701 "2281 600 23 26" _null_ _null_ _null_ _null_	gist_bbox_distance _null_ _null_ _null_ ));
+DESCR("GiST support");
 
 /* GIN */
 DATA(insert OID = 2731 (  gingetbitmap	   PGNSP PGUID 12 1 0 0 0 f f f f t f v 2 0 20 "2281 2281" _null_ _null_ _null_ _null_	gingetbitmap _null_ _null_ _null_ ));
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 41b13b2..23e278c 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -17,6 +17,7 @@
 #include "access/genam.h"
 #include "access/heapam.h"
 #include "executor/instrument.h"
+#include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
 #include "utils/reltrigger.h"
@@ -1237,6 +1238,7 @@ typedef struct
  *	 IndexScanState information
  *
  *		indexqualorig	   execution state for indexqualorig expressions
+ *		indexorderbyorig   execution state for indexorderbyorig expressions
  *		ScanKeys		   Skey structures for index quals
  *		NumScanKeys		   number of ScanKeys
  *		OrderByKeys		   Skey structures for index ordering operators
@@ -1247,12 +1249,20 @@ typedef struct
  *		RuntimeContext	   expr context for evaling runtime Skeys
  *		RelationDesc	   index relation descriptor
  *		ScanDesc		   index scan descriptor
+ *
+ *		ReorderQueue	   queue of re-check tuples that need reordering
+ *		Distances		   re-checked distances of last fetched tuple
+ *		SortSupport		   for re-ordering ORDER BY exprs
+ *		ReachedEnd		   have we fetched all tuples from index already?
+ *		DistanceTypByVals  is the datatype of order by expression pass-by-value?
+ *		DistanceTypLens	   typlens of the datatypes of order by expressions
  * ----------------
  */
 typedef struct IndexScanState
 {
 	ScanState	ss;				/* its first field is NodeTag */
 	List	   *indexqualorig;
+	List	   *indexorderbyorig;
 	ScanKey		iss_ScanKeys;
 	int			iss_NumScanKeys;
 	ScanKey		iss_OrderByKeys;
@@ -1263,6 +1273,15 @@ typedef struct IndexScanState
 	ExprContext *iss_RuntimeContext;
 	Relation	iss_RelationDesc;
 	IndexScanDesc iss_ScanDesc;
+
+	/* These are needed for re-checking ORDER BY expr ordering */
+	pairingheap *iss_ReorderQueue;
+	Datum	   *iss_Distances;
+	bool	   *iss_DistanceNulls;
+	SortSupport iss_SortSupport;
+	bool	   *iss_DistanceTypByVals;
+	int16	   *iss_DistanceTypLens;
+	bool		iss_ReachedEnd;
 } IndexScanState;
 
 /* ----------------
diff --git a/src/include/nodes/plannodes.h b/src/include/nodes/plannodes.h
index 48203a0..2219ed6 100644
--- a/src/include/nodes/plannodes.h
+++ b/src/include/nodes/plannodes.h
@@ -302,7 +302,11 @@ typedef Scan SeqScan;
  * index column order.  Only the expressions are provided, not the auxiliary
  * sort-order information from the ORDER BY SortGroupClauses; it's assumed
  * that the sort ordering is fully determinable from the top-level operators.
- * indexorderbyorig is unused at run time, but is needed for EXPLAIN.
+ * indexorderbyorig is used at run time to recheck the ordering, if the index
+ * does not calculate an accurate ordering. It is also needed for EXPLAIN.
+ *
+ * indexsortops is an array of operators used to sort the ORDER BY expressions,
+ * used together with indexorderbyorig to recheck ordering at run time.
  * (Note these fields are used for amcanorderbyop cases, not amcanorder cases.)
  *
  * indexorderdir specifies the scan ordering, for indexscans on amcanorder
@@ -316,7 +320,8 @@ typedef struct IndexScan
 	List	   *indexqual;		/* list of index quals (usually OpExprs) */
 	List	   *indexqualorig;	/* the same in original form */
 	List	   *indexorderby;	/* list of index ORDER BY exprs */
-	List	   *indexorderbyorig;		/* the same in original form */
+	List	   *indexorderbyorig;	/* the same in original form */
+	Oid		   *indexsortops;	/* OIDs of operators to sort ORDER BY exprs */
 	ScanDirection indexorderdir;	/* forward or backward or don't care */
 } IndexScan;
 
diff --git a/src/include/utils/geo_decls.h b/src/include/utils/geo_decls.h
index 91610d8..a0a8abe 100644
--- a/src/include/utils/geo_decls.h
+++ b/src/include/utils/geo_decls.h
@@ -394,8 +394,10 @@ extern Datum circle_diameter(PG_FUNCTION_ARGS);
 extern Datum circle_radius(PG_FUNCTION_ARGS);
 extern Datum circle_distance(PG_FUNCTION_ARGS);
 extern Datum dist_pc(PG_FUNCTION_ARGS);
+extern Datum dist_cpoint(PG_FUNCTION_ARGS);
 extern Datum dist_cpoly(PG_FUNCTION_ARGS);
 extern Datum dist_ppoly(PG_FUNCTION_ARGS);
+extern Datum dist_polyp(PG_FUNCTION_ARGS);
 extern Datum circle_center(PG_FUNCTION_ARGS);
 extern Datum cr_circle(PG_FUNCTION_ARGS);
 extern Datum box_circle(PG_FUNCTION_ARGS);
@@ -419,6 +421,7 @@ extern Datum gist_circle_consistent(PG_FUNCTION_ARGS);
 extern Datum gist_point_compress(PG_FUNCTION_ARGS);
 extern Datum gist_point_consistent(PG_FUNCTION_ARGS);
 extern Datum gist_point_distance(PG_FUNCTION_ARGS);
+extern Datum gist_bbox_distance(PG_FUNCTION_ARGS);
 
 /* geo_selfuncs.c */
 extern Datum areasel(PG_FUNCTION_ARGS);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 5603817..cb18986 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -372,6 +372,36 @@ SELECT count(*) FROM radix_text_tbl WHERE t ~>~  'Worth
     48
 (1 row)
 
+SELECT * FROM gpolygon_tbl ORDER BY f1 <-> '(0,0)'::point LIMIT 10;
+                       f1                        
+-------------------------------------------------
+ ((240,359),(240,455),(337,455),(337,359))
+ ((662,163),(662,187),(759,187),(759,163))
+ ((1000,0),(0,1000))
+ ((0,1000),(1000,1000))
+ ((1346,344),(1346,403),(1444,403),(1444,344))
+ ((278,1409),(278,1457),(369,1457),(369,1409))
+ ((907,1156),(907,1201),(948,1201),(948,1156))
+ ((1517,971),(1517,1043),(1594,1043),(1594,971))
+ ((175,1820),(175,1850),(259,1850),(259,1820))
+ ((2424,81),(2424,160),(2424,160),(2424,81))
+(10 rows)
+
+SELECT * FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
+                f1                 
+-----------------------------------
+ <(288.5,407),68.2367203197809>
+ <(710.5,175),49.9624859269432>
+ <(323.5,1433),51.4417145903983>
+ <(927.5,1178.5),30.4384625104489>
+ <(1395,373.5),57.1948424248201>
+ <(1555.5,1007),52.7091073724456>
+ <(217,1835),44.5982062419555>
+ <(489,2421.5),22.3886131772381>
+ <(2424,120.5),39.5>
+ <(751.5,2655),20.4022057631032>
+(10 rows)
+
 -- Now check the results from plain indexscan
 SET enable_seqscan = OFF;
 SET enable_indexscan = ON;
@@ -1152,6 +1182,54 @@ SELECT count(*) FROM radix_text_tbl WHERE t ~>~  'Worth
     48
 (1 row)
 
+EXPLAIN (COSTS OFF)
+SELECT * FROM gpolygon_tbl ORDER BY f1 <-> '(0,0)'::point LIMIT 10;
+                     QUERY PLAN                      
+-----------------------------------------------------
+ Limit
+   ->  Index Scan using ggpolygonind on gpolygon_tbl
+         Order By: (f1 <-> '(0,0)'::point)
+(3 rows)
+
+SELECT * FROM gpolygon_tbl ORDER BY f1 <-> '(0,0)'::point LIMIT 10;
+                       f1                        
+-------------------------------------------------
+ ((240,359),(240,455),(337,455),(337,359))
+ ((662,163),(662,187),(759,187),(759,163))
+ ((1000,0),(0,1000))
+ ((0,1000),(1000,1000))
+ ((1346,344),(1346,403),(1444,403),(1444,344))
+ ((278,1409),(278,1457),(369,1457),(369,1409))
+ ((907,1156),(907,1201),(948,1201),(948,1156))
+ ((1517,971),(1517,1043),(1594,1043),(1594,971))
+ ((175,1820),(175,1850),(259,1850),(259,1820))
+ ((2424,81),(2424,160),(2424,160),(2424,81))
+(10 rows)
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
+                    QUERY PLAN                     
+---------------------------------------------------
+ Limit
+   ->  Index Scan using ggcircleind on gcircle_tbl
+         Order By: (f1 <-> '(200,300)'::point)
+(3 rows)
+
+SELECT * FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
+                f1                 
+-----------------------------------
+ <(288.5,407),68.2367203197809>
+ <(710.5,175),49.9624859269432>
+ <(323.5,1433),51.4417145903983>
+ <(927.5,1178.5),30.4384625104489>
+ <(1395,373.5),57.1948424248201>
+ <(1555.5,1007),52.7091073724456>
+ <(217,1835),44.5982062419555>
+ <(489,2421.5),22.3886131772381>
+ <(2424,120.5),39.5>
+ <(751.5,2655),20.4022057631032>
+(10 rows)
+
 -- Now check the results from bitmap indexscan
 SET enable_seqscan = OFF;
 SET enable_indexscan = OFF;
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index f779fa0..5df9008 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -224,6 +224,10 @@ SELECT count(*) FROM radix_text_tbl WHERE t >    'Worth
 
 SELECT count(*) FROM radix_text_tbl WHERE t ~>~  'Worth                         St  ';
 
+SELECT * FROM gpolygon_tbl ORDER BY f1 <-> '(0,0)'::point LIMIT 10;
+
+SELECT * FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
+
 -- Now check the results from plain indexscan
 SET enable_seqscan = OFF;
 SET enable_indexscan = ON;
@@ -437,6 +441,14 @@ EXPLAIN (COSTS OFF)
 SELECT count(*) FROM radix_text_tbl WHERE t ~>~  'Worth                         St  ';
 SELECT count(*) FROM radix_text_tbl WHERE t ~>~  'Worth                         St  ';
 
+EXPLAIN (COSTS OFF)
+SELECT * FROM gpolygon_tbl ORDER BY f1 <-> '(0,0)'::point LIMIT 10;
+SELECT * FROM gpolygon_tbl ORDER BY f1 <-> '(0,0)'::point LIMIT 10;
+
+EXPLAIN (COSTS OFF)
+SELECT * FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
+SELECT * FROM gcircle_tbl ORDER BY f1 <-> '(200,300)'::point LIMIT 10;
+
 -- Now check the results from bitmap indexscan
 SET enable_seqscan = OFF;
 SET enable_indexscan = OFF;
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to