> On 12 Sep 2017, at 14:57, Tatsuro Yamada <yamada.tats...@lab.ntt.co.jp> wrote: > > On 2017/09/12 21:20, Tatsuro Yamada wrote: >> On 2017/09/11 23:38, Robert Haas wrote: >>> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada >>> <yamada.tats...@lab.ntt.co.jp> wrote: >>>> Thanks for the comment. >>>> >>>> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method >>>> by >>>> cost estimation. In the case of SEQ SCAN, these two phases not overlap. >>>> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan >>>> heap and write new heap" when INDEX SCAN was selected. >>>> >>>> I agree that progress reporting for sort is difficult. So it only reports >>>> the phase ("sorting tuples") in the current design of progress monitor of >>>> cluster. >>>> It doesn't report counter of sort. >>> >>> Doesn't that make it almost useless? I would guess that scanning the >>> heap and writing the new heap would ordinarily account for most of the >>> runtime, or at least enough that you're going to want something more >>> than just knowing that's the phase you're in. >> >> Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort()) >> I know that external merge sort takes a time than quick sort. >> I'll try investigating how to get a counter from external merge sort >> processing. >> Is this the right way? >> >> >>>>> The patch is getting the value reported as heap_tuples_total from >>>>> OldHeap->rd_rel->reltuples. I think this is pointless: the user can >>>>> see that value anyway if they wish. The point of the progress >>>>> counters is to expose things the user couldn't otherwise see. It's >>>>> also not necessarily accurate: it's only an estimate in the best case, >>>>> and may be way off if the relation has recently be extended by a large >>>>> amount. I think it's pretty important that we try hard to only report >>>>> values that are known to be accurate, because users hate (and mock) >>>>> inaccurate progress reports. >>>> >>>> Do you mean to use the number of rows by using below calculation instead >>>> OldHeap->rd_rel->reltuples? >>>> >>>> estimate rows = physical table size / average row length >>> >>> No, I mean don't report it at all. The caller can do that calculation >>> if they wish, without any help from the progress reporting machinery. >> >> I see. I'll remove that column on next patch. > > > I will summarize the current design and future corrections before sending > the next patch. > > > === Current design === > > CLUSTER command may use Index Scan or Seq Scan when scanning the heap. > Depending on which one is chosen, the command will proceed in the > following sequence of phases: > > * Scan method: Seq Scan > 1. scanning heap (*1) > 2. sorting tuples (*2) > 3. writing new heap (*1) > 5. swapping relation files (*2) > 6. rebuilding index (*2) > 7. performing final cleanup (*2) > > * Scan method: Index Scan > 4. scan heap and write new heap (*1) > 5. swapping relation files (*2) > 6. rebuilding index (*2) > 7. performing final cleanup (*2) > > VACUUM FULL command will proceed in the following sequence of phases: > > 1. scanning heap (*1) > 5. swapping relation files (*2) > 6. rebuilding index (*2) > 7. performing final cleanup (*2) > > (*1): increasing the value in heap_tuples_scanned column > (*2): only shows the phase in the phase column > > The view provides the information of CLUSTER command progress details as > follows > # \d pg_stat_progress_cluster > View "pg_catalog.pg_stat_progress_cluster" > Column | Type | Collation | Nullable | Default > ---------------------------+---------+-----------+----------+--------- > pid | integer | | | > datid | oid | | | > datname | name | | | > relid | oid | | | > command | text | | | > phase | text | | | > scan_method | text | | | > scan_index_relid | bigint | | | > heap_tuples_total | bigint | | | > heap_tuples_scanned | bigint | | | > heap_tuples_vacuumed | bigint | | | > heap_tuples_recently_dead | bigint | | | > > > === It will be changed on next patch === > > - Rename to pg_stat_progress_reolg from pg_stat_progress_cluster > - Remove heap_tuples_total column from the view > - Add a progress counter in the phase of "sorting tuples" (difficult?!) > > > === My test case as a bonus === > > I share my test case of progress monitor. > If someone wants to watch the current progress monitor, you can use > this test case as a example. > > [Terminal1] > Run this query on psql: > > select * from pg_stat_progress_cluster; \watch 0.05 > > [Terminal2] > Run these queries on psql: > > drop table t1; > > create table t1 as select a, random() * 1000 as b from generate_series(0, > 99999999) a; > create index idx_t1 on t1(a); > create index idx_t1_b on t1(b); > analyze t1; > > -- index scan > set enable_seqscan to off; > cluster verbose t1 using idx_t1; > > -- seq scan > set enable_seqscan to on; > set enable_indexscan to off; > cluster verbose t1 using idx_t1; > > -- only given table name to cluster command > cluster verbose t1; > > -- only cluster command > cluster verbose; > > -- vacuum full > vacuum full t1; > > -- vacuum full > vacuum full;
Based on this thread, this patch has been marked Returned with Feedback. Please re-submit a new version to a future commitfest. cheers ./daniel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers