On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada <yamada.tats...@lab.ntt.co.jp> wrote: > > On 2019/03/06 15:38, Tatsuro Yamada wrote: > > On 2019/03/05 17:56, Tatsuro Yamada wrote: > >> On 2019/03/05 11:35, Robert Haas wrote: > >>> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada > >>> <yamada.tats...@lab.ntt.co.jp> wrote: > >>>> === Current design === > >>>> > >>>> CLUSTER command uses Index Scan or Seq Scan when scanning the heap. > >>>> Depending on which one is chosen, the command will proceed in the > >>>> following sequence of phases: > >>>> > >>>> * Scan method: Seq Scan > >>>> 0. initializing (*2) > >>>> 1. seq scanning heap (*1) > >>>> 3. sorting tuples (*2) > >>>> 4. writing new heap (*1) > >>>> 5. swapping relation files (*2) > >>>> 6. rebuilding index (*2) > >>>> 7. performing final cleanup (*2) > >>>> > >>>> * Scan method: Index Scan > >>>> 0. initializing (*2) > >>>> 2. index scanning heap (*1) > >>>> 5. swapping relation files (*2) > >>>> 6. rebuilding index (*2) > >>>> 7. performing final cleanup (*2) > >>>> > >>>> VACUUM FULL command will proceed in the following sequence of phases: > >>>> > >>>> 1. seq scanning heap (*1) > >>>> 5. swapping relation files (*2) > >>>> 6. rebuilding index (*2) > >>>> 7. performing final cleanup (*2) > >>>> > >>>> (*1): increasing the value in heap_tuples_scanned column > >>>> (*2): only shows the phase in the phase column > >>> > >>> All of that sounds good. > >>> > >>>> The view provides the information of CLUSTER command progress details as > >>>> follows > >>>> # \d pg_stat_progress_cluster > >>>> View "pg_catalog.pg_stat_progress_cluster" > >>>> Column | Type | Collation | Nullable | Default > >>>> ---------------------------+---------+-----------+----------+--------- > >>>> pid | integer | | | > >>>> datid | oid | | | > >>>> datname | name | | | > >>>> relid | oid | | | > >>>> command | text | | | > >>>> phase | text | | | > >>>> cluster_index_relid | bigint | | | > >>>> heap_tuples_scanned | bigint | | | > >>>> heap_tuples_vacuumed | bigint | | | > >>> > >>> Still not sure if we need heap_tuples_vacuumed. We could try to > >>> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if > >>> we're using a Seq Scan. > >> > >> I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that > >> in > >> next patch. > >> > >> Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able > >> to > >> get those from initscan(). I'll investigate it more. > >> > >> cluster.c > >> copy_heap_data() > >> heap_beginscan() > >> heap_beginscan_internal() > >> initscan() > >> > >> > >> > >>>> === Discussion points === > >>>> > >>>> - Progress counter for "3. sorting tuples" phase > >>>> - Should we add pgstat_progress_update_param() in tuplesort.c like > >>>> a > >>>> "trace_sort"? > >>>> Thanks to Peter Geoghegan for the useful advice! > >>> > >>> How would we avoid an abstraction violation? > >> > >> Hmm... What do you mean an abstraction violation? > >> If it is difficult to solve, I'd not like to add the progress counter for > >> the sorting tuples. > >> > >> > >>>> - Progress counter for "6. rebuilding index" phase > >>>> - Should we add "index_vacuum_count" in the view like a vacuum > >>>> progress monitor? > >>>> If yes, I'll add pgstat_progress_update_param() to > >>>> reindex_relation() of index.c. > >>>> However, I'm not sure whether it is okay or not. > >>> > >>> Doesn't seem unreasonable to me. > >> > >> I see, I'll add it later. > > > > > > Attached file is revised and WIP patch including: > > > > - Remove heap_tuples_vacuumed > > - Add heap_blks_scanned and heap_blks_total > > - Add index_vacuum_count > > > > I tried to "add heap_blks_scanned and heap_blks_total" columns and I > > realized that > > "heap_tuples_scanned" column is suitable as a counter when a scan method is > > both index-scan and seq-scan because CLUSTER is on a tuple basis. > > > Attached file is rebased patch on current HEAD. > I changed a status. :) > > Looks like the patch needs a rebase. I was on the commit fb5806533f9fe0433290d84c9b019399cd69e9c2
PFA reject file in case you want to have a look. > Regards, > Tatsuro Yamada > > > -- Regards, Rafia Sabih
--- src/backend/commands/cluster.c +++ src/backend/commands/cluster.c @@ -942,14 +960,33 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose, */ if (OldIndex != NULL && !use_sort) { + const int ci_index[] = { + PROGRESS_CLUSTER_PHASE, + PROGRESS_CLUSTER_INDEX_RELID + }; + int64 ci_val[2]; + + /* Set phase and OIDOldIndex to columns */ + ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP; + ci_val[1] = OIDOldIndex; + pgstat_progress_update_multi_param(2, ci_index, ci_val); + heapScan = NULL; indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, 0, 0); index_rescan(indexScan, NULL, 0, NULL, 0); } else { + /* Set phase */ + pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE, + PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP); + heapScan = heap_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL); indexScan = NULL; + + /* Set total heap blocks */ + pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS, + heapScan->rs_nblocks); } /* Log what we're doing */