Re: Index Skip Scan

Tomas Vondra Sun, 10 Nov 2019 13:19:26 -0800

Hi,

I've looked at the patch again - in general it seems in pretty good
shape, all the issues I found are mostly minor.


Firstly, I'd like to point out that not all of the things I complained
about in my 2019/06/23 review got addressed. Those were mostly related
to formatting and code style, and the attached patch fixes some (but
maybe not all) of them.

The patch also tweaks wording of some bits in the docs and comments that
I found unclear. Would be nice if a native speaker could take a look.

A couple more comments:


1) pathkey_is_unique

The one additional issue I found is pathkey_is_unique - it's not really
explained what "unique" means and hot it's different from "redundant"
(which has quite a long explanation before pathkey_is_redundant).

My understanding is that pathkey is "unique" when it's EC does not match
an EC of another pathkey in the list. But if that's the case, then the
function name is wrong - it does exactly the opposite (i.e. it returns
'true' when the pathkey is *not* unique).


2) explain

I wonder if we should print the "Skip scan" info always, or similarly to
"Inner Unique" which does this:

/* try not to be too chatty about this in text mode */
    if (es->format != EXPLAIN_FORMAT_TEXT ||
        (es->verbose && ((Join *) plan)->inner_unique))
        ExplainPropertyBool("Inner Unique",
                            ((Join *) plan)->inner_unique,
                            es);
    break;

I'd do the same thing for skip scan - print it only in verbose mode, or
when using non-text output format.


3) There's an annoying limitation that for this to kick in, the order of
expressions in the DISTINCT clause has to match the index, i.e. with
index on (a,b,c) the skip scan will only kick in for queries using

   DISTINCT a
   DISTINCT a,b
   DISTINCT a,b,c

but not e.g. DISTINCT a,c,b. I don't think there's anything forcing us
to sort result of DISTINCT in any particular case, except that we don't
consider the other orderings "interesting" so we don't really consider
using the index (so no chance of using the skip scan).

That leads to pretty annoying speedups/slowdowns due to seemingly
irrelevant changes:

-- everything great, a,b,c matches an index
test=# explain (analyze, verbose) select distinct a,b,c from t;

QUERY PLAN-------------------------------------------------------------------------------------------------------------------------------------

 Index Only Scan using t_a_b_c_idx on public.t  (cost=0.42..565.25 rows=1330 
width=12) (actual time=0.016..10.387 rows=1331 loops=1)
   Output: a, b, c
   Skip scan: true
   Heap Fetches: 1331
 Planning Time: 0.106 ms
 Execution Time: 10.843 ms
(6 rows)

-- slow, mismatch with index
test=# explain (analyze, verbose) select distinct a,c,b from t;

QUERY PLAN---------------------------------------------------------------------------------------------------------------------------

 HashAggregate  (cost=22906.00..22919.30 rows=1330 width=12) (actual 
time=802.067..802.612 rows=1331 loops=1)
   Output: a, c, b
   Group Key: t.a, t.c, t.b
   ->  Seq Scan on public.t  (cost=0.00..15406.00 rows=1000000 width=12) 
(actual time=0.010..355.361 rows=1000000 loops=1)
         Output: a, b, c
 Planning Time: 0.076 ms
 Execution Time: 803.078 ms
(7 rows)

-- fast again, the extra ordering allows using the index again
test=# explain (analyze, verbose) select distinct a,c,b from t order by a,b,c;

QUERY PLAN-------------------------------------------------------------------------------------------------------------------------------------

 Index Only Scan using t_a_b_c_idx on public.t  (cost=0.42..565.25 rows=1330 
width=12) (actual time=0.035..12.120 rows=1331 loops=1)
   Output: a, c, b
   Skip scan: true
   Heap Fetches: 1331
 Planning Time: 0.053 ms
 Execution Time: 12.632 ms
(6 rows)


This is a more generic issue, not specific to this patch, of course. I
think we saw it with the incremental sort patch, IIRC. I wonder how
difficult would it be to fix this here (not necessarily in v1).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 73b1b4fcf7..94e09835b4 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -144,7 +144,7 @@ typedef struct IndexAmRoutine
     amendscan_function amendscan;
     ammarkpos_function ammarkpos;       /* can be NULL */
     amrestrpos_function amrestrpos;     /* can be NULL */
-    amskip_function amskip;     /* can be NULL */
+    amskip_function amskip;             /* can be NULL */
 
     /* interface functions to support parallel index scans */
     amestimateparallelscan_function amestimateparallelscan;    /* can be NULL 
*/
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 567141046f..efc5e41389 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -1248,15 +1248,14 @@ SELECT target FROM tests WHERE subject = 'some-subject' 
AND success;
     </indexterm>
 
     <para>
-     In case if index scan is used to retrieve the distinct values of a column
-     efficiently, it can be not very efficient, since it requires to scan all
-     the equal values of a key. In such cases planner will consider apply index
-     skip scan approach, which is based on the idea of
-     <firstterm>Loose index scan</firstterm>. Rather than scanning all equal
-     values of a key, as soon as a new value is found, it will search for a
-     larger value on the same index page, and if not found, restart the search
-     by looking for a larger value. This is much faster when the index has many
-     equal keys.
+     When the rows retrieved from an index scan are then deduplicated by
+     eliminating rows matching on a prefix of index keys (e.g. when using
+     <literal>SELECT DISTINCT</literal>), the planner will consider
+     skipping groups of rows with a matching key prefix. When a row with
+     a particular prefix is found, remaining rows with the same key prefix
+     are skipped.  The larger the number of rows with the same key prefix
+     rows (i.e. the lower the number of distinct key prefixes in the index),
+     the more efficient this is.
     </para>
   </sect2>
  </sect1>
diff --git a/src/backend/access/nbtree/nbtsearch.c 
b/src/backend/access/nbtree/nbtsearch.c
index 72dcd4d734..74786bf4a2 100644
--- a/src/backend/access/nbtree/nbtsearch.c
+++ b/src/backend/access/nbtree/nbtsearch.c
@@ -1403,6 +1403,8 @@ _bt_next(IndexScanDesc scan, ScanDirection dir)
  *
  *             * Advancing backward and reading backward
  *                     simple scan with order by desc
+ *
+ * TODO Explain why we search the current page first, then do lookup from root?
  */
 bool
 _bt_skip(IndexScanDesc scan, ScanDirection dir,
@@ -1556,7 +1558,7 @@ _bt_skip(IndexScanDesc scan, ScanDirection dir,
        }
 
        /*
-        * Andvance forward but read backward. At this moment we are at the next
+        * Advance forward but read backward. At this moment we are at the next
         * distinct key at the beginning of the series. In case if scan just
         * started, we can go one step back and read forward without doing
         * anything else. Otherwise find the next distinct key and the beginning
@@ -1564,7 +1566,7 @@ _bt_skip(IndexScanDesc scan, ScanDirection dir,
         *
         * An interesting situation can happen if one of distinct keys do not 
pass
         * a corresponding index condition at all. In this case reading backward
-        * can lead to a previous distinc key being found, creating a loop. To
+        * can lead to a previous distinct key being found, creating a loop. To
         * avoid that check the value to be returned, and jump one more time if
         * it's the same as at the beginning.
         */
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index ad500de12b..b66296d6c9 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -130,7 +130,7 @@ static void ExplainDummyGroup(const char *objtype, const 
char *labelname,
 static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
-static void ExplainIndexSkipScanKeys(ExplainState *es, int skipPrefixSize);
+static void ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
 
 
@@ -1049,7 +1049,7 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset 
**rels_used)
  * Can be used to print the skip prefix size.
  */
 static void
-ExplainIndexSkipScanKeys(ExplainState *es, int skipPrefixSize)
+ExplainIndexSkipScanKeys(int skipPrefixSize, ExplainState *es)
 {
        if (skipPrefixSize > 0)
        {
@@ -1380,7 +1380,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
                        {
                                IndexScan  *indexscan = (IndexScan *) plan;
 
-                               ExplainIndexSkipScanKeys(es, 
indexscan->indexskipprefixsize);
+                               
ExplainIndexSkipScanKeys(indexscan->indexskipprefixsize, es);
 
                                ExplainIndexScanDetails(indexscan->indexid,
                                                                                
indexscan->indexorderdir,
@@ -1392,7 +1392,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
                        {
                                IndexOnlyScan *indexonlyscan = (IndexOnlyScan 
*) plan;
 
-                               ExplainIndexSkipScanKeys(es, 
indexonlyscan->indexskipprefixsize);
+                               
ExplainIndexSkipScanKeys(indexonlyscan->indexskipprefixsize, es);
 
                                ExplainIndexScanDetails(indexonlyscan->indexid,
                                                                                
indexonlyscan->indexorderdir,
@@ -1604,9 +1604,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
        {
                case T_IndexScan:
                        if (((IndexScan *) plan)->indexskipprefixsize > 0)
-                       {
-                               ExplainPropertyBool("Skip scan mode", true, es);
-                       }
+                               ExplainPropertyBool("Skip scan", true, es);
                        show_scan_qual(((IndexScan *) plan)->indexqualorig,
                                                   "Index Cond", planstate, 
ancestors, es);
                        if (((IndexScan *) plan)->indexqualorig)
@@ -1621,9 +1619,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
                        break;
                case T_IndexOnlyScan:
                        if (((IndexOnlyScan *) plan)->indexskipprefixsize > 0)
-                       {
-                               ExplainPropertyBool("Skip scan mode", true, es);
-                       }
+                               ExplainPropertyBool("Skip scan", true, es);
                        show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
                                                   "Index Cond", planstate, 
ancestors, es);
                        if (((IndexOnlyScan *) plan)->indexqual)
diff --git a/src/backend/optimizer/path/pathkeys.c 
b/src/backend/optimizer/path/pathkeys.c
index 70c1df47a4..5683edcd2f 100644
--- a/src/backend/optimizer/path/pathkeys.c
+++ b/src/backend/optimizer/path/pathkeys.c
@@ -99,9 +99,12 @@ make_canonical_pathkey(PlannerInfo *root,
 
 /*
  * pathkey_is_unique
- *        Part of pathkey_is_redundant, that is reponsible for the case, when 
the
- *        new pathkey's equivalence class is the same as that of any existing
- *        member of the pathkey list.
+ *        Checks if the new pathkey's equivalence class is the same as that of
+ *     any existing member of the pathkey list.
+ *
+ * FIXME This seems to be misnamed, i.e. it returns true/false in a way
+ * that contradicts the name. If there already is a matching EC. it says
+ * true, but that means "not unique" no?
  */
 static bool
 pathkey_is_unique(PathKey *new_pathkey, List *pathkeys)
@@ -165,6 +168,10 @@ pathkey_is_redundant(PathKey *new_pathkey, List *pathkeys)
        if (EC_MUST_BE_REDUNDANT(new_ec))
                return true;
 
+       /*
+     * FIXME misnamed - this returns true when there is a matching EC, which
+     * however should ne "not unique" I think.
+     */
        return pathkey_is_unique(new_pathkey, pathkeys);
 }
 
diff --git a/src/backend/optimizer/plan/planner.c 
b/src/backend/optimizer/plan/planner.c
index 644b8ad356..3f625938e8 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -3634,8 +3634,8 @@ standard_qp_callback(PlannerInfo *root, void *extra)
        }
        else
        {
-               root->distinct_pathkeys = NIL;
                root->uniq_distinct_pathkeys = NIL;
+               root->distinct_pathkeys = NIL;
        }
 
        root->sort_pathkeys =

Re: Index Skip Scan

Reply via email to