> On May 13, 2026, at 00:48, Antonin Houska <[email protected]> wrote: > > Álvaro Herrera <[email protected]> wrote: > >> On 2026-May-11, Chao Li wrote: >> >>>> On May 10, 2026, at 06:38, Álvaro Herrera <[email protected]> wrote: >> >>>> I think it would be a good idea to make identity_key_equal() not deform >>>> all attributes, but instead only up to the last one it needs for the key >>>> comparisons. >>> >>> That’s true. Please see v3. >> >> Thanks. I did one further small change, namely to determine these last >> attnums just once per run rather than once per tuple. Pushed now. > > I appreciate that REPACK can handle more cases now! However, I found a problem > (or at least a question) when rebasing the improvements for the next > release(s). (It's related to splitting the table scan into multiple block > ranges and use one snapshot per range, details are not too important here, ) > Assertion failure in the new code made me think if other than B-tree indexes > should be allowed in the USING INDEX clause of REPACK. > > AFAICS, only B-tree indexes (and some special ones that don't appear in the > core) provide ordering information - see get_relation_info(): > > /* > * Fetch the ordering information for the index, if any. > */ > if (info->relam == BTREE_AM_OID) > { > ... > info->sortopfamily = info->opfamily; > ... > } > else if (amroutine->amcanorder) > { > /* > * Otherwise, identify the corresponding btree opfamilies > * by trying to map this index's "<" operators into btree. > * Since "<" uniquely defines the behavior of a sort > * order, this is a sufficient test. > ... > } > else > { > ... > info->sortopfamily = NULL; > ... > } > > > Therefore, index scan shouldn't be possible for GIST index - see > build_index_paths(): > > index_is_ordered = (index->sortopfamily != NULL); > > > So I'm not sure if clustering makes sense here. What makes me confused is that > GIST has IndexAmRoutine.amclusterable=true. As it has amcanorder=false at the > same time, I suspect it might be just a thinko. However, if we simply set > amclusterable to false, it can break upgrade to PG 19 for users who already > "clustered" some table by a GIST index (for mysterious reasons). (BTW, do we > need the amclusterable field at all?) > > REPACK currently rejects explicit sort if non-B-tree index is specified (due > to lack of ordering information), but it still scans the index rather than > the heap - see copy_table_data() and heapam_relation_copy_for_cluster(). > > Does this seem worth fixing now? Or maybe at least worth some comments (unless > I'm completely wrong)?
After some investigation, I think I see the mismatch: * get_relation_info(): non-ordered GiST cannot provide sort order. That is expected. * copy_table_data() only uses plan_cluster_use_sort() for btree. For any other clusterable index, it sets use_sort = false and does a raw index scan. * The docs say REPACK can re-sort using index scan “if the index is a b-tree” or seqscan+sort, which does not describe what the code actually does for GiST. I am not sure whether we should change the behavior in PG19. Alvaro may have a better idea about that. But I agree that we can at least clarify the code comment and documentation. The attached patch attempts to do that. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
repack_comment.diff
Description: Binary data
