Re: [HACKERS] CLUSTER command progress monitor

Daniel Gustafsson Sun, 01 Oct 2017 17:30:01 -0700

> On 12 Sep 2017, at 14:57, Tatsuro Yamada <[email protected]> wrote:
> 
> On 2017/09/12 21:20, Tatsuro Yamada wrote:
>> On 2017/09/11 23:38, Robert Haas wrote:
>>> On Sun, Sep 10, 2017 at 10:36 PM, Tatsuro Yamada
>>> <[email protected]> wrote:
>>>> Thanks for the comment.
>>>> 
>>>> As you know, CLUSTER command uses SEQ SCAN or INDEX SCAN as a scan method 
>>>> by
>>>> cost estimation. In the case of SEQ SCAN, these two phases not overlap.
>>>> However, in INDEX SCAN, it overlaps. Therefore I created the phase of "scan
>>>> heap and write new heap" when INDEX SCAN was selected.
>>>> 
>>>> I agree that progress reporting for sort is difficult. So it only reports
>>>> the phase ("sorting tuples") in the current design of progress monitor of
>>>> cluster.
>>>> It doesn't report counter of sort.
>>> 
>>> Doesn't that make it almost useless?  I would guess that scanning the
>>> heap and writing the new heap would ordinarily account for most of the
>>> runtime, or at least enough that you're going to want something more
>>> than just knowing that's the phase you're in.
>> 
>> Hmmm, Should I add a counter in tuplesort.c? (tuplesort_performsort())
>> I know that external merge sort takes a time than quick sort.
>> I'll try investigating how to get a counter from external merge sort 
>> processing.
>> Is this the right way?
>> 
>> 
>>>>> The patch is getting the value reported as heap_tuples_total from
>>>>> OldHeap->rd_rel->reltuples.  I think this is pointless: the user can
>>>>> see that value anyway if they wish.  The point of the progress
>>>>> counters is to expose things the user couldn't otherwise see.  It's
>>>>> also not necessarily accurate: it's only an estimate in the best case,
>>>>> and may be way off if the relation has recently be extended by a large
>>>>> amount.  I think it's pretty important that we try hard to only report
>>>>> values that are known to be accurate, because users hate (and mock)
>>>>> inaccurate progress reports.
>>>> 
>>>> Do you mean to use the number of rows by using below calculation instead
>>>> OldHeap->rd_rel->reltuples?
>>>> 
>>>>  estimate rows = physical table size / average row length
>>> 
>>> No, I mean don't report it at all.  The caller can do that calculation
>>> if they wish, without any help from the progress reporting machinery.
>> 
>> I see. I'll remove that column on next patch.
> 
> 
> I will summarize the current design and future corrections before sending
> the next patch.
> 
> 
> === Current design ===
> 
> CLUSTER command may use Index Scan or Seq Scan when scanning the heap.
> Depending on which one is chosen, the command will proceed in the
> following sequence of phases:
> 
>  * Scan method: Seq Scan
>    1. scanning heap                (*1)
>    2. sorting tuples               (*2)
>    3. writing new heap             (*1)
>    5. swapping relation files      (*2)
>    6. rebuilding index             (*2)
>    7. performing final cleanup     (*2)
> 
>  * Scan method: Index Scan
>    4. scan heap and write new heap (*1)
>    5. swapping relation files      (*2)
>    6. rebuilding index             (*2)
>    7. performing final cleanup     (*2)
> 
> VACUUM FULL command will proceed in the following sequence of phases:
> 
>    1. scanning heap                (*1)
>    5. swapping relation files      (*2)
>    6. rebuilding index             (*2)
>    7. performing final cleanup     (*2)
> 
> (*1): increasing the value in heap_tuples_scanned column
> (*2): only shows the phase in the phase column
> 
> The view provides the information of CLUSTER command progress details as 
> follows
> # \d pg_stat_progress_cluster
>              View "pg_catalog.pg_stat_progress_cluster"
>          Column           |  Type   | Collation | Nullable | Default
> ---------------------------+---------+-----------+----------+---------
> pid                       | integer |           |          |
> datid                     | oid     |           |          |
> datname                   | name    |           |          |
> relid                     | oid     |           |          |
> command                   | text    |           |          |
> phase                     | text    |           |          |
> scan_method               | text    |           |          |
> scan_index_relid          | bigint  |           |          |
> heap_tuples_total         | bigint  |           |          |
> heap_tuples_scanned       | bigint  |           |          |
> heap_tuples_vacuumed      | bigint  |           |          |
> heap_tuples_recently_dead | bigint  |           |          |
> 
> 
> === It will be changed on next patch ===
> 
> - Rename to pg_stat_progress_reolg from pg_stat_progress_cluster
> - Remove heap_tuples_total column from the view
> - Add a progress counter in the phase of "sorting tuples" (difficult?!)
> 
> 
> === My test case as a bonus ===
> 
> I share my test case of progress monitor.
> If someone wants to watch the current progress monitor, you can use
> this test case as a example.
> 
> [Terminal1]
> Run this query on psql:
> 
>   select * from pg_stat_progress_cluster; \watch 0.05
> 
> [Terminal2]
> Run these queries on psql:
> 
> drop table t1;
> 
> create table t1 as select a, random() * 1000 as b from generate_series(0, 
> 99999999) a;
> create index idx_t1 on t1(a);
> create index idx_t1_b on t1(b);
> analyze t1;
> 
> -- index scan
> set enable_seqscan to off;
> cluster verbose t1 using idx_t1;
> 
> -- seq scan
> set enable_seqscan to on;
> set enable_indexscan to off;
> cluster verbose t1 using idx_t1;
> 
> -- only given table name to cluster command
> cluster verbose t1;
> 
> -- only cluster command
> cluster verbose;
> 
> -- vacuum full
> vacuum full t1;
> 
> -- vacuum full
> vacuum full;


Based on this thread, this patch has been marked Returned with Feedback.
Please re-submit a new version to a future commitfest.

cheers ./daniel

-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] CLUSTER command progress monitor

Reply via email to