Re: [HACKERS] WIP: Covering + unique indexes.

Andres Freund Thu, 30 Mar 2017 12:12:32 -0700

On 2017-03-30 18:26:05 +0300, Teodor Sigaev wrote:
> Any objection from reviewers to push both patches?



> diff --git a/contrib/bloom/blutils.c b/contrib/bloom/blutils.c
> index f2eda67..59029b9 100644
> --- a/contrib/bloom/blutils.c
> +++ b/contrib/bloom/blutils.c
> @@ -120,6 +120,7 @@ blhandler(PG_FUNCTION_ARGS)
>       amroutine->amclusterable = false;
>       amroutine->ampredlocks = false;
>       amroutine->amcanparallel = false;
> +     amroutine->amcaninclude = false;

That name doesn't strike me as very descriptive.


> +      <term><literal>INCLUDE</literal></term>
> +      <listitem>
> +       <para>
> +        An optional <literal>INCLUDE</> clause allows a list of columns to be
> +        specified which will be included in the non-key portion of the index.
> +        Columns which are part of this clause cannot also exist in the
> +        key columns portion of the index, and vice versa. The
> +        <literal>INCLUDE</> columns exist solely to allow more queries to 
> benefit
> +        from <firstterm>index-only scans</> by including certain columns in 
> the
> +        index, the value of which would otherwise have to be obtained by 
> reading
> +        the table's heap. Having these columns in the <literal>INCLUDE</> 
> clause
> +        in some cases allows <productname>PostgreSQL</> to skip the heap read
> +        completely. This also allows <literal>UNIQUE</> indexes to be 
> defined on
> +        one set of columns, which can include another set of columns in the
> +       <literal>INCLUDE</> clause, on which the uniqueness is not enforced.
> +        It's the same with other constraints (PRIMARY KEY and EXCLUDE). This 
> can
> +        also can be used for non-unique indexes as any columns which are not 
> required
> +        for the searching or ordering of records can be used in the
> +        <literal>INCLUDE</> clause, which can slightly reduce the size of 
> the index.
> +        Currently, only the B-tree access method supports this feature.
> +        Expressions as included columns are not supported since they cannot 
> be used
> +        in index-only scans.
> +       </para>
> +      </listitem>
> +     </varlistentry>

This could use some polishing.


> +/*
> + * Reform index tuple. Truncate nonkey (INCLUDE) attributes.
> + */
> +IndexTuple
> +index_truncate_tuple(Relation idxrel, IndexTuple olditup)
> +{
> +     TupleDesc   itupdesc = RelationGetDescr(idxrel);
> +     Datum       values[INDEX_MAX_KEYS];
> +     bool        isnull[INDEX_MAX_KEYS];
> +     IndexTuple      newitup;
> +     int indnatts = IndexRelationGetNumberOfAttributes(idxrel);
> +     int indnkeyatts = IndexRelationGetNumberOfKeyAttributes(idxrel);
> +
> +     Assert(indnatts <= INDEX_MAX_KEYS);
> +     Assert(indnkeyatts > 0);
> +     Assert(indnkeyatts < indnatts);
> +
> +     index_deform_tuple(olditup, itupdesc, values, isnull);
> +
> +     /* form new tuple that will contain only key attributes */
> +     itupdesc->natts = indnkeyatts;
> +     newitup = index_form_tuple(itupdesc, values, isnull);
> +     newitup->t_tid = olditup->t_tid;
> +
> +     itupdesc->natts = indnatts;

Uh, isn't this a *seriously* bad idea?  If index_form_tuple errors out,
this'll corrupt the tuple descriptor.


Maybe also rename the function to index_build_key_tuple()?

>   * Construct a string describing the contents of an index entry, in the
>   * form "(key_name, ...)=(key_value, ...)".  This is currently used
> - * for building unique-constraint and exclusion-constraint error messages.
> + * for building unique-constraint and exclusion-constraint error messages,
> + * so only key columns of index are checked and printed.

s/index/the index/


> @@ -368,7 +370,7 @@ systable_beginscan(Relation heapRelation,
>               {
>                       int                     j;
>  
> -                     for (j = 0; j < irel->rd_index->indnatts; j++)
> +                     for (j = 0; j < 
> IndexRelationGetNumberOfAttributes(irel); j++)

>                       {
>                               if (key[i].sk_attno == 
> irel->rd_index->indkey.values[j])
>                               {
> @@ -376,7 +378,7 @@ systable_beginscan(Relation heapRelation,
>                                       break;
>                               }
>                       }
> -                     if (j == irel->rd_index->indnatts)
> +                     if (j == IndexRelationGetNumberOfAttributes(irel))
>                               elog(ERROR, "column is not in index");
>               }

Not that it matters overly much, but why are we doing this for all
attributes, rather than just key attributes?


> --- a/src/backend/bootstrap/bootstrap.c
> +++ b/src/backend/bootstrap/bootstrap.c
> @@ -600,7 +600,7 @@ boot_openrel(char *relname)
>                relname, (int) ATTRIBUTE_FIXED_PART_SIZE);
>  
>       boot_reldesc = heap_openrv(makeRangeVar(NULL, relname, -1), NoLock);
> -     numattr = boot_reldesc->rd_rel->relnatts;
> +     numattr = RelationGetNumberOfAttributes(boot_reldesc);
>       for (i = 0; i < numattr; i++)
>       {
>               if (attrtypes[i] == NULL)

That seems a bit unrelated.


> @@ -2086,7 +2086,8 @@ StoreRelCheck(Relation rel, char *ccname, Node *expr,
>                                                         is_validated,
>                                                         
> RelationGetRelid(rel),        /* relation */
>                                                         attNos,       /* 
> attrs in the constraint */
> -                                                       keycount, /* # attrs 
> in the constraint */
> +                                                       keycount, /* # key 
> attrs in the constraint */
> +                                                       keycount, /* # total 
> attrs in the constraint */
>                                                         InvalidOid,           
> /* not a domain constraint */
>                                                         InvalidOid,           
> /* no associated index */
>                                                         InvalidOid,           
> /* Foreign key fields */

It doesn't quite seem right to me to store this both in pg_index and
pg_constraint.



> @@ -340,14 +341,27 @@ DefineIndex(Oid relationId,
>       numberOfAttributes = list_length(stmt->indexParams);
> -     if (numberOfAttributes <= 0)
> -             ereport(ERROR,
> -                             (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
> -                              errmsg("must specify at least one column")));
> +

Huh, why's that check gone?

>  
> +opt_c_include:       INCLUDE '(' columnList ')'                      { $$ = 
> $3; }
> +                      |              /* EMPTY */                             
>                 { $$ = NIL; }
> +             ;

> +opt_include:         INCLUDE '(' index_including_params ')'                  
> { $$ = $3; }
> +                      |              /* EMPTY */                             
>                 { $$ = NIL; }
> +             ;
> +
> +index_including_params:      index_elem                                      
>         { $$ = list_make1($1); }
> +                     | index_including_params ',' index_elem         { $$ = 
> lappend($1, $3); }
> +             ;
> +

Why do we have multiple different definitions of this?


> @@ -1979,6 +2017,48 @@ transformIndexConstraint(Constraint *constraint, 
> CreateStmtContext *cxt)
>               index->indexParams = lappend(index->indexParams, iparam);
>       }
>  
> +     /* Here is some code duplication. But we do need it. */

Aha?


> +     foreach(lc, constraint->including)
> +     {
> +             char       *key = strVal(lfirst(lc));
> +             bool            found = false;
> +             ColumnDef  *column = NULL;
> +             ListCell   *columns;
> +             IndexElem  *iparam;
> +
> +             foreach(columns, cxt->columns)
> +             {
> +                     column = (ColumnDef *) lfirst(columns);
> +                     Assert(IsA(column, ColumnDef));
> +                     if (strcmp(column->colname, key) == 0)
> +                     {
> +                             found = true;
> +                             break;
> +                     }
> +             }
> +
> +             /*
> +              * In the ALTER TABLE case, don't complain about index keys not
> +              * created in the command; they may well exist already. 
> DefineIndex
> +              * will complain about them if not, and will also take care of 
> marking
> +              * them NOT NULL.
> +              */

Uh. Why should they be marked as NOT NULL? ISTM the comment has been
copied here without adjustments.



> @@ -1275,6 +1275,21 @@ pg_get_indexdef_worker(Oid indexrelid, int colno,
>               Oid                     keycoltype;
>               Oid                     keycolcollation;
>  
> +             /*
> +              * attrsOnly flag is used for building unique-constraint and
> +              * exclusion-constraint error messages. Included attrs are
> +              * meaningless there, so do not include them in the message.
> +              */
> +             if (attrsOnly && keyno >= idxrec->indnkeyatts)
> +                     break;

Sounds like the parameter should be renamed then.



> +Included attributes in B-tree indexes
> +-------------------------------------
> +
> +Since 10.0 there is an optional INCLUDE clause, that allows to add

10.0 isn't right, since that's the "patch" version now.


> +a portion of non-key attributes to index. They exist to allow more queries
> +to benefit from index-only scans. We never use included attributes in
> +ScanKeys, neither for search nor for inserts. That allows us to include
> +into B-tree any datatypes, even those which don't have suitable opclass.
> +Included columns only stored in regular items on leaf pages. All inner
> +keys and high keys are truncated and contain only key attributes.
> +That helps to reduce the size of index.

s/index/the index/



> @@ -537,6 +542,28 @@ _bt_buildadd(BTWriteState *wstate, BTPageState *state, 
> IndexTuple itup)
>               ItemIdSetUnused(ii);    /* redundant */
>               ((PageHeader) opage)->pd_lower -= sizeof(ItemIdData);
>  
> +             if (indnkeyatts != indnatts && P_ISLEAF(opageop))
> +             {
> +                     /*
> +                      * It's essential to truncate High key here.
> +                      * The purpose is not just to save more space on this 
> particular page,
> +                      * but to keep whole b-tree structure consistent. 
> Subsequent insertions
> +                      * assume that hikey is already truncated, and so they 
> should not
> +                      * worry about it, when copying the high key into the 
> parent page
> +                      * as a downlink.

s/should/need/

> +                      * NOTE It is not crutial for reliability in present,

s/crutial/crucial/

> +                      * but maybe it will be that in the future.
> +                      */

"it's essential" ... "it is not crutial" -- that's contradictory.

> +                     keytup = index_truncate_tuple(wstate->index, oitup);

The code in _bt_split previously claimed that it's the only place doing
truncation...


> +                     /*  delete "wrong" high key, insert keytup as P_HIKEY. 
> */
> +                     PageIndexTupleDelete(opage, P_HIKEY);

> +                     if (!_bt_pgaddtup(opage, IndexTupleSize(keytup), 
> keytup, P_HIKEY))
> +                             elog(ERROR, "failed to rewrite compressed item 
> in index \"%s\"",
> +                                     RelationGetRelationName(wstate->index));

Hm...


- Andres


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP: Covering + unique indexes.

Reply via email to