Re: [HACKERS] CLUSTER command progress monitor

Rafia Sabih Mon, 18 Mar 2019 04:43:21 -0700

On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada
<[email protected]> wrote:
>
> On 2019/03/06 15:38, Tatsuro Yamada wrote:
> > On 2019/03/05 17:56, Tatsuro Yamada wrote:
> >> On 2019/03/05 11:35, Robert Haas wrote:
> >>> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
> >>> <[email protected]> wrote:
> >>>> === Current design ===
> >>>>
> >>>> CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
> >>>> Depending on which one is chosen, the command will proceed in the
> >>>> following sequence of phases:
> >>>>
> >>>>     * Scan method: Seq Scan
> >>>>       0. initializing                 (*2)
> >>>>       1. seq scanning heap            (*1)
> >>>>       3. sorting tuples               (*2)
> >>>>       4. writing new heap             (*1)
> >>>>       5. swapping relation files      (*2)
> >>>>       6. rebuilding index             (*2)
> >>>>       7. performing final cleanup     (*2)
> >>>>
> >>>>     * Scan method: Index Scan
> >>>>       0. initializing                 (*2)
> >>>>       2. index scanning heap          (*1)
> >>>>       5. swapping relation files      (*2)
> >>>>       6. rebuilding index             (*2)
> >>>>       7. performing final cleanup     (*2)
> >>>>
> >>>> VACUUM FULL command will proceed in the following sequence of phases:
> >>>>
> >>>>       1. seq scanning heap            (*1)
> >>>>       5. swapping relation files      (*2)
> >>>>       6. rebuilding index             (*2)
> >>>>       7. performing final cleanup     (*2)
> >>>>
> >>>> (*1): increasing the value in heap_tuples_scanned column
> >>>> (*2): only shows the phase in the phase column
> >>>
> >>> All of that sounds good.
> >>>
> >>>> The view provides CLUSTER command progress details as follows:
> >>>> # \d pg_stat_progress_cluster
> >>>>                 View "pg_catalog.pg_stat_progress_cluster"
> >>>>             Column           |  Type   | Collation | Nullable | Default
> >>>> ---------------------------+---------+-----------+----------+---------
> >>>>    pid                       | integer |           |          |
> >>>>    datid                     | oid     |           |          |
> >>>>    datname                   | name    |           |          |
> >>>>    relid                     | oid     |           |          |
> >>>>    command                   | text    |           |          |
> >>>>    phase                     | text    |           |          |
> >>>>    cluster_index_relid       | bigint  |           |          |
> >>>>    heap_tuples_scanned       | bigint  |           |          |
> >>>>    heap_tuples_vacuumed      | bigint  |           |          |
> >>>
> >>> Still not sure if we need heap_tuples_vacuumed.  We could try to
> >>> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
> >>> we're using a Seq Scan.
> >>
> >> I have no strong opinion about adding heap_tuples_vacuumed, so I'll
> >> remove it in the next patch.
> >>
> >> Regarding heap_blks_scanned and heap_blks_total, I suppose we can get
> >> those from initscan(). I'll investigate further.
> >>
> >> cluster.c
> >>    copy_heap_data()
> >>      heap_beginscan()
> >>        heap_beginscan_internal()
> >>          initscan()
> >>
> >>
> >>
> >>>> === Discussion points ===
> >>>>
> >>>>    - Progress counter for "3. sorting tuples" phase
> >>>>       - Should we add pgstat_progress_update_param() in tuplesort.c,
> >>>>         like "trace_sort"?
> >>>>         Thanks to Peter Geoghegan for the useful advice!
> >>>
> >>> How would we avoid an abstraction violation?
> >>
> >> Hmm... What do you mean by an abstraction violation?
> >> If it is difficult to solve, I'd rather not add a progress counter for
> >> the sorting tuples phase.
> >>
> >>
> >>>>    - Progress counter for "6. rebuilding index" phase
> >>>>       - Should we add "index_vacuum_count" to the view, as in the vacuum
> >>>>         progress monitor?
> >>>>         If yes, I'll add pgstat_progress_update_param() to
> >>>>         reindex_relation() in index.c.
> >>>>         However, I'm not sure whether that is okay.
> >>>
> >>> Doesn't seem unreasonable to me.
> >>
> >> I see, I'll add it later.
> >
> >
> > Attached is a revised WIP patch that includes:
> >
> >    - Remove heap_tuples_vacuumed
> >    - Add heap_blks_scanned and heap_blks_total
> >    - Add index_vacuum_count
> >
> > I tried adding the heap_blks_scanned and heap_blks_total columns and
> > realized that the heap_tuples_scanned column is suitable as a counter for
> > both scan methods (index scan and seq scan), because CLUSTER works on a
> > tuple basis.
>
>
> Attached is the patch rebased on current HEAD.
> I changed the status. :)
>
>
It looks like the patch needs a rebase.
I was on commit fb5806533f9fe0433290d84c9b019399cd69e9c2.

Please find attached the reject file, in case you want to have a look.
> Regards,
> Tatsuro Yamada
>
>
>


-- 
Regards,
Rafia Sabih
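
For readers following the thread, the phase sequences quoted above can be
sketched as a small standalone C program. This is only an illustration: the
enum names, the helper functions, and the array layout are hypothetical and
are not taken from the patch. Only the phase numbers and labels come from
the quoted design, and the (*1)/(*2) distinction is modeled by
phase_counts_tuples().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names; the numbers follow the quoted "Current design". */
enum cluster_phase
{
    PHASE_INITIALIZING = 0,
    PHASE_SEQ_SCAN_HEAP = 1,
    PHASE_INDEX_SCAN_HEAP = 2,
    PHASE_SORT_TUPLES = 3,
    PHASE_WRITE_NEW_HEAP = 4,
    PHASE_SWAP_REL_FILES = 5,
    PHASE_REBUILD_INDEX = 6,
    PHASE_FINAL_CLEANUP = 7
};

/* Labels as they would appear in the "phase" column. */
static const char *const phase_names[] = {
    "initializing",
    "seq scanning heap",
    "index scanning heap",
    "sorting tuples",
    "writing new heap",
    "swapping relation files",
    "rebuilding index",
    "performing final cleanup"
};

/* Return the label for a phase number. */
const char *phase_name(int phase)
{
    assert(phase >= 0 && phase <= PHASE_FINAL_CLEANUP);
    return phase_names[phase];
}

/* True for (*1) phases, which advance heap_tuples_scanned. */
bool phase_counts_tuples(int phase)
{
    return phase == PHASE_SEQ_SCAN_HEAP ||
           phase == PHASE_INDEX_SCAN_HEAP ||
           phase == PHASE_WRITE_NEW_HEAP;
}

/* Fill out[] with the CLUSTER phase sequence; return the number of phases. */
size_t cluster_phase_sequence(bool use_index_scan, int out[8])
{
    size_t n = 0;

    out[n++] = PHASE_INITIALIZING;
    if (use_index_scan)
        out[n++] = PHASE_INDEX_SCAN_HEAP;
    else
    {
        out[n++] = PHASE_SEQ_SCAN_HEAP;
        out[n++] = PHASE_SORT_TUPLES;
        out[n++] = PHASE_WRITE_NEW_HEAP;
    }
    out[n++] = PHASE_SWAP_REL_FILES;
    out[n++] = PHASE_REBUILD_INDEX;
    out[n++] = PHASE_FINAL_CLEANUP;
    return n;
}
```

With this sketch, the Seq Scan path yields seven phases and the Index Scan
path yields five, matching the two lists in the quoted design.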

--- src/backend/commands/cluster.c
+++ src/backend/commands/cluster.c
@@ -942,14 +960,33 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOldHeap, Oid OIDOldIndex, bool verbose,
 	 */
 	if (OldIndex != NULL && !use_sort)
 	{
+		const int   ci_index[] = {
+			PROGRESS_CLUSTER_PHASE,
+			PROGRESS_CLUSTER_INDEX_RELID
+		};
+		int64       ci_val[2];
+
+		/* Set phase and OIDOldIndex to columns */
+		ci_val[0] = PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP;
+		ci_val[1] = OIDOldIndex;
+		pgstat_progress_update_multi_param(2, ci_index, ci_val);
+
 		heapScan = NULL;
 		indexScan = index_beginscan(OldHeap, OldIndex, SnapshotAny, 0, 0);
 		index_rescan(indexScan, NULL, 0, NULL, 0);
 	}
 	else
 	{
+		/* Set phase */
+		pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
+									 PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP);
+
 		heapScan = heap_beginscan(OldHeap, SnapshotAny, 0, (ScanKey) NULL);
 		indexScan = NULL;
+
+		/* Set total heap blocks */
+		pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_BLKS,
+									 heapScan->rs_nblocks);
 	}
 
 	/* Log what we're doing */
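
To make the rejected hunk easier to review outside the tree, here is a
minimal standalone mock of what it does. Everything prefixed with mock_ or
SLOT_ is hypothetical: the real patch calls pgstat_progress_update_param()
and pgstat_progress_update_multi_param() with the PROGRESS_CLUSTER_*
constants, whose actual values are not shown in this mail. The mock assumes
a progress entry is a small array of 64-bit slots, and the phase values 1
and 2 follow the numbering in the quoted design.

```c
#include <assert.h>
#include <stdint.h>

#define MOCK_NUM_PARAMS 10

/* Hypothetical slot numbers standing in for PROGRESS_CLUSTER_* indexes. */
enum
{
    SLOT_PHASE = 0,
    SLOT_INDEX_RELID = 1,
    SLOT_TOTAL_HEAP_BLKS = 2
};

/* Assumed layout: one array of 64-bit counters per backend. */
static int64_t mock_params[MOCK_NUM_PARAMS];

/* Set a single progress slot. */
void mock_update_param(int index, int64_t val)
{
    assert(index >= 0 && index < MOCK_NUM_PARAMS);
    mock_params[index] = val;
}

/* Set several progress slots in one call, pairing index[i] with val[i]. */
void mock_update_multi_param(int nparam, const int *index, const int64_t *val)
{
    for (int i = 0; i < nparam; i++)
        mock_update_param(index[i], val[i]);
}

/* Mirror the two branches of the hunk: index scan versus seq scan. */
void mock_begin_scan(int use_index_scan, uint32_t index_oid, uint32_t nblocks)
{
    if (use_index_scan)
    {
        const int ci_index[] = {SLOT_PHASE, SLOT_INDEX_RELID};
        int64_t   ci_val[2];

        ci_val[0] = 2;          /* "index scanning heap" */
        ci_val[1] = index_oid;  /* OID of the clustering index */
        mock_update_multi_param(2, ci_index, ci_val);
    }
    else
    {
        mock_update_param(SLOT_PHASE, 1);   /* "seq scanning heap" */
        mock_update_param(SLOT_TOTAL_HEAP_BLKS, nblocks);
    }
}
```

The index scan branch sets the phase and the index OID together through the
multi-parameter call, while the seq scan branch reports the total block
count only after the scan has been started, which is when the hunk reads
heapScan->rs_nblocks.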
