I am trying to understand the rangetypes spgist code and its interaction
with empty ranges. (Slightly embarrassing, because I reviewed the code.)
I see an old email here:

http://www.postgresql.org/message-id/50145a9c.7080...@enterprisedb.com

But still don't have a clear picture.

What I don't understand is:
1. Under what conditions might a tuple have no prefix (centroid), yet
also *not* be allTheSame?
2. Why would any tuple have 2 nodes? If there are some non-empty ranges,
why not make a centroid and have 4 or 5 nodes?

I added a bunch of assertions that seem reasonable to me based on my
understanding of the structure, and I attached them as a patch. They all
pass regression, which has a non-trivial set of input ranges, so there
is a reasonable chance the assertions are generally true. But if they
are true, some refactoring is in order.

It seems like the structure should be something like:

* Empty and non-empty ranges are only mixed in the root (level == 0).
* If a tuple has any non-empty ranges, there's a prefix (centroid).
AllTheSame may or may not be set (ordinarily not).
* If a tuple has only empty ranges, there's no prefix, and allTheSame is
set.
* The root tuple may have 5 nodes if there are any non-empty ranges,
otherwise 1 node (and allTheSame is set).
* Inner tuples may have either:
  - one node if it contains only empty ranges, in which case it is
allTheSame, and must be in the 5th node (quadrant) of the root node, and
be at level 1
  - four nodes if there are any non-empty ranges, in which case it has
*no* empty ranges

Note: I am using "allTheSame" in the logical sense; perhaps the flag is
an internal optimization that we can't rely on.

Regards,
        Jeff Davis

*** a/src/backend/utils/adt/rangetypes_spgist.c
--- b/src/backend/utils/adt/rangetypes_spgist.c
***************
*** 104,109 **** getQuadrant(TypeCacheEntry *typcache, RangeType *centroid, RangeType *tst)
--- 104,112 ----
  
  	range_deserialize(typcache, centroid, &centroidLower, &centroidUpper,
  					  &centroidEmpty);
+ 
+ 	Assert(!centroidEmpty);
+ 
  	range_deserialize(typcache, tst, &lower, &upper, &empty);
  
  	if (empty)
***************
*** 138,143 **** spg_range_quad_choose(PG_FUNCTION_ARGS)
--- 141,158 ----
  	int16		quadrant;
  	TypeCacheEntry *typcache;
  
+ 	/* if there is no prefix, they must be all the same */
+ 	Assert(in->hasPrefix || in->allTheSame);
+ 	/*
+ 	 * If there are two nodes, it must either be the root or at level 1, and
+ 	 * there must be no prefix.
+ 	 */
+ 	Assert(in->nNodes != 2 ||
+ 		   (!in->hasPrefix && (in->level == 0 || in->level == 1)));
+ 	/* if there are 4 or 5 nodes, there must be a prefix */
+ 	Assert(in->nNodes != 4 || in->hasPrefix);
+ 	Assert(in->nNodes != 5 || (in->hasPrefix && in->level == 0));
+ 
  	if (in->allTheSame)
  	{
  		out->resultType = spgMatchNode;
***************
*** 156,161 **** spg_range_quad_choose(PG_FUNCTION_ARGS)
--- 171,178 ----
  	 */
  	if (!in->hasPrefix)
  	{
+ 		Assert(false);
+ 
  		out->resultType = spgMatchNode;
  		if (RangeIsEmpty(inRange))
  			out->result.matchNode.nodeN = 0;
***************
*** 232,237 **** spg_range_quad_picksplit(PG_FUNCTION_ARGS)
--- 249,261 ----
  	nonEmptyCount = j;
  
  	/*
+ 	 * Only the root mixes empty and non-empty tuples.
+ 	 */
+ 	Assert(nonEmptyCount == 0 ||
+ 		   nonEmptyCount == in->nTuples ||
+ 		   in->level == 0);
+ 
+ 	/*
  	 * All the ranges are empty. The best we can do is to construct an inner
  	 * node with no centroid, and put all ranges into node 0. If non-empty
  	 * ranges are added later, they will be routed to node 1.
***************
*** 315,320 **** spg_range_quad_inner_consistent(PG_FUNCTION_ARGS)
--- 339,356 ----
  	 */
  	bool		needPrevious = false;
  
+ 	/* if there is no prefix, they must be all the same */
+ 	Assert(in->hasPrefix || in->allTheSame);
+ 	/*
+ 	 * If there are two nodes, it must either be the root or at level 1, and
+ 	 * there must be no prefix.
+ 	 */
+ 	Assert(in->nNodes != 2 ||
+ 		   (!in->hasPrefix && (in->level == 0 || in->level == 1)));
+ 	/* if there are 4 or 5 nodes, there must be a prefix */
+ 	Assert(in->nNodes != 4 || in->hasPrefix);
+ 	Assert(in->nNodes != 5 || (in->hasPrefix && in->level == 0));
+ 
  	if (in->allTheSame)
  	{
  		/* Report that all nodes should be visited */
***************
*** 327,332 **** spg_range_quad_inner_consistent(PG_FUNCTION_ARGS)
--- 363,370 ----
  
  	if (!in->hasPrefix)
  	{
+ 		Assert(false);
+ 
  		/*
  		 * No centroid on this inner node. Such a node has two child nodes,
  		 * the first for empty ranges, and the second for non-empty ones.
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to