Re: ALTER tbl rewrite loses CLUSTER ON index

Michael Paquier Mon, 16 Mar 2020 22:34:28 -0700

On Mon, Mar 16, 2020 at 11:25:23AM -0300, Alvaro Herrera wrote:
> On 2020-Mar-16, Justin Pryzby wrote:
> 
> > Also, should we call it "is_index_clustered", since otherwise it sounds alot
> > like "+get_index_clustered" (without "is"), which sounds like it takes a 
> > table
> > and returns which index is clustered.  That might be just as useful for 
> > some of
> > these callers.
> 
> Amit's proposed name seems to match lsyscache.c usual conventions better.


There were no get_index_isvalid() (introduced by me) or
get_index_isreplident() (introduced by Peter) until last week, and
those names have been chosen to be consistent with the existing
get_index_column_opclass(), so using get_index_isclustered is in my
opinion the most consistent choice.

> Yeah, in cluster(), mark_index_clustered().

Patch 0002 from Justin does that, I would keep this refactoring as
HEAD-only material though, and I don't spot any other code paths in
need of patching.

The commit message of patch 0001 is not what you wanted I guess.

     if (OidIsValid(indexOid))
     {
-        indexTuple = SearchSysCache1(INDEXRELID, ObjectIdGetDatum(indexOid));
-        if (!HeapTupleIsValid(indexTuple))
-            elog(ERROR, "cache lookup failed for index %u", indexOid);
-        indexForm = (Form_pg_index) GETSTRUCT(indexTuple);
-
-        if (indexForm->indisclustered)
-        {
-            ReleaseSysCache(indexTuple);
+        if (get_index_isclustered(indexOid))
             return;
-        }
-
-        ReleaseSysCache(indexTuple);
     }
No need for two layers of if(s) here.

+create index alttype_cluster_a on alttype_cluster (a);
+alter table alttype_cluster cluster on alttype_cluster_a;
+select indisclustered from pg_index where indrelid = 
'alttype_cluster'::regclass;

Would it make sense to add a second index not used for clustering to
check after the case where isclustered is false?  A second thing would
be to check if relfilenode values match before and after each DDL to
make sure that a rewrite happened or not, see check_ddl_rewrite() for
example in alter_table.sql.

Keeping both RememberClusterOnForRebuilding and
RememberReplicaIdentityForRebuilding as separate looks fine to me. 
The code could be a bit more organized though.  We have
RememberIndexForRebuilding which may go through
RememberConstraintForRebuilding if the relation OID changed is a
constraint, and both register the replindent or isclustered
information if present.  Not really something for this patch set to
care about, just a thought while reading this code.

While looking at this bug, I have spotted a behavior which is perhaps
not welcome.  Take this test case:
create table aa (a int);
insert into aa values (1), (1);
create unique index concurrently aai on aa(a); -- fails
alter table aa alter column a type bigint;

This generates the following error:
ERROR:  23505: could not create unique index "aai"
DETAIL:  Key (a)=(1) is duplicated.
SCHEMA NAME:  public
TABLE NAME:  aa
CONSTRAINT NAME:  aai
LOCATION:  comparetup_index_btree, tuplesort.c:4049

After a REINDEX CONCURRENTLY, we may leave behind an invalid index
on the relation's toast table or even normal indexes.  CREATE INDEX
CONCURRENTLY may also leave behind invalid indexes.  If triggering an
ALTER TABLE that causes a rewrite of the relation, we have the
following behavior:
- An invalid toast index is correctly discarded, keeping one valid
toast index.  No problem here.
- Invalid non-toast indexes are all rebuilt.  If the index relies on a
constraint then ALTER TABLE would fail, like the above.

I am wondering if there is an argument for not including invalid
indexes in the rebuilt version, even if the existing behavior may be
useful for some users.
--
Michael

signature.asc
Description: PGP signature

Re: ALTER tbl rewrite loses CLUSTER ON index

Reply via email to