subject:"Re\: \[HACKERS\] CLUSTER command progress monitor"

Re: [HACKERS] CLUSTER command progress monitor

2019-09-20 Thread Alvaro Herrera

On 2019-Sep-18, Michael Paquier wrote:

> On Tue, Sep 17, 2019 at 10:50:22PM -0300, Alvaro Herrera wrote:
> > On 2019-Sep-18, Michael Paquier wrote:
> >> So, with the clock ticking and the release getting close by, what do
> >> we do for this set of issues?  REINDEX, CREATE INDEX and CLUSTER all
> >> try to build indexes and the current infrastructure is not really
> >> adapted to hold all that.  Robert, Alvaro and Peter E, do you have any
> >> comments to offer?
> > 
> > Which part of it is not already fixed?
> 
> I can still see at least two problems.  There is one issue with
> pgstat_progress_update_param() which gets called in reindex_table() 
> for a progress phase of CLUSTER, and this even if
> REINDEXOPT_REPORT_PROGRESS is not set in the options.

(You mean reindex_relation.)

... but that param update is there for CLUSTER, not for REINDEX, so if
we made it dependent on the option to turn on CREATE INDEX progress
updates, it would break CLUSTER progress reporting.  Also, the parameter
being updated is not used by CREATE INDEX, so there's no spurious
change.  I think this ain't broke, and thus it don't need fixin'.

If anything, I would like the CLUSTER report to show index creation
progress, which would go the opposite way.  But that seems a patch for
pg13.

> Also it seems
> to me that the calls to pgstat_progress_start_command() and
> pgstat_progress_end_command() are at incorrect locations for
> reindex_index() and that those should be one level higher on the stack
> to avoid any kind of interactions with another command whose progress
> has already started.

That doesn't work, because the caller doesn't have the OID of the table,
which pgstat_progress_start_command() needs.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

2019-09-17 Thread Michael Paquier

On Tue, Sep 17, 2019 at 10:50:22PM -0300, Alvaro Herrera wrote:
> On 2019-Sep-18, Michael Paquier wrote:
>> So, with the clock ticking and the release getting close by, what do
>> we do for this set of issues?  REINDEX, CREATE INDEX and CLUSTER all
>> try to build indexes and the current infrastructure is not really
>> adapted to hold all that.  Robert, Alvaro and Peter E, do you have any
>> comments to offer?
> 
> Which part of it is not already fixed?

I can still see at least two problems.  There is one issue with
pgstat_progress_update_param() which gets called in reindex_table() 
for a progress phase of CLUSTER, and this even if
REINDEXOPT_REPORT_PROGRESS is not set in the options.  Also it seems
to me that the calls to pgstat_progress_start_command() and
pgstat_progress_end_command() are at incorrect locations for
reindex_index() and that those should be one level higher on the stack
to avoid any kind of interactions with another command whose progress
has already started.
--
Michael


signature.asc
Description: PGP signature

Re: [HACKERS] CLUSTER command progress monitor

2019-09-17 Thread Alvaro Herrera

On 2019-Sep-18, Michael Paquier wrote:

> So, with the clock ticking and the release getting close by, what do
> we do for this set of issues?  REINDEX, CREATE INDEX and CLUSTER all
> try to build indexes and the current infrastructure is not really
> adapted to hold all that.  Robert, Alvaro and Peter E, do you have any
> comments to offer?

Which part of it is not already fixed?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

2019-09-17 Thread Michael Paquier

On Sat, Sep 14, 2019 at 11:45:47AM +0900, Michael Paquier wrote:
> I have provided a short summary of the two issues on the open item
> page (https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items) as
> the open item was too much evasive.  Here is a copy-paste for the
> archives of what I wrote:
> 1) A progress may be started while another one is already in progress.
> Hence, if progress gets stopped the previously-started state is
> removed, causing all follow-up updates to not happen.
> 2) Progress updates happening in a code path shared between those
> three commands may clobber a previous state present.
> 
> Regarding 1) and based on what I found in the code, you can blame
> REINDEX reporting which has added progress_start calls in code paths
> which are also taken by CREATE INDEX and CLUSTER, causing their
> progress reporting to go to the void.  In order to fix this one we
> could do what I summarized in [1].
> 
> [1]: https://www.postgresql.org/message-id/20190905010316.gb14...@paquier.xyz

So, with the clock ticking and the release getting close by, what do
we do for this set of issues?  REINDEX, CREATE INDEX and CLUSTER all
try to build indexes and the current infrastructure is not really
adapted to hold all that.  Robert, Alvaro and Peter E, do you have any
comments to offer?
--
Michael


signature.asc
Description: PGP signature

Re: [HACKERS] CLUSTER command progress monitor

2019-09-16 Thread Michael Paquier

On Mon, Sep 16, 2019 at 03:26:10PM +0900, Tattsu Yama wrote:
> I should have explained the API changes more. I added cmdtype as a given
> parameter for all functions (See below).
> Therefore, I suppose that my patch is similar to the following fix as you
> mentioned on -hackers.

Yes, that's an option I mentioned here, but it has drawbacks:
https://www.postgresql.org/message-id/20190914024547.gb15...@paquier.xyz

I have just looked at it again after a small rebase and there are
issues with the design of your patch:
- When aborting a transaction, we need to enforce a reset of the
command ID used in st_progress_command to be PROGRESS_COMMAND_INVALID.
Unfortunately, your patch does not consider the case where an error
happens while a command ID is set, causing a command to still be
tracked with the next transactions of the session.  Even worse, it
prevents pgstat_progress_start_command() to be called again in this
case for another command.
- CLUSTER can rebuild indexes, and we'd likely want to be able to
track some of the information from CREATE INDEX for CLUSTER.

The second issue is perhaps fine as it is not really straight-forward
to share the same progress phases across multiple commands, and we
could live without it for now, or require a follow-up patch to make
the information of CREATE INDEX available to CLUSTER.

Now, the first issue is of another caliber and a no-go :(

On HEAD, pgstat_progress_end_command() has the limitation to not be
able to stack multiple commands, so calling it in cascade has the
disadvantage to perhaps erase the progress state of a command (and it
is not designed for that anyway), which is what happens with CLUSTER
when reindex_index() starts a new progress report, but the simplicity
of the current infrastructure is very safe when it comes to failure
handling, to make sure that an reset happens as long as the command ID
is not invalid.  Your patch makes that part unpredictable.
--
Michael


signature.asc
Description: PGP signature

Re: [HACKERS] CLUSTER command progress monitor

2019-09-16 Thread Tatsuro Yamada


Hi Alvaro!


Is this fix strictly necessary for pg12, or is this something that we
can leave for pg13?


Not only me but many DBA needs this progress report feature on PG12,
therefore I'm trying to fix the problem. If you send other patch to
fix the problem, and it is more elegant than mine, I can withdraw my patch.
Anyway, I want to avoid this feature being reverted.
Do you have any ideas to fix the problem?


I committed a fix for the originally reported problem as da47e43dc32e in
branch REL_12_STABLE.  Is that insufficient, and if so why?



Ooops, I misunderstood. I now realized you committed your patch to
fix the problem. Thanks! I'll test it later. :)

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=da47e43dc32e3c5916396f0cbcfa974b371e4875



I tested your patch (da47e43d) and it works fine. Thanks! :)
So, my patch improving progress reporting API can leave for PG13.


#Test scenario
===
[Session #1]
select * from pg_stat_progress_cluster ; \watch 0.0001

[Session #2]
create table hoge as select a from generate_series(1, 10) a;
create index ind_hoge1 on hoge(a);
create index ind_hoge2 on hoge((a%2));
create index ind_hoge3 on hoge((a%3));
create index ind_hoge4 on hoge((a%4));
create index ind_hoge5 on hoge((a%5));
cluster hoge using ind_hoge1;
===

#Test result
===
22283|13593|postgres|16384|CLUSTER|initializing|0|0|0|0|0|0
...
22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|10|10|0|0|0 
<= Increasing from 0 to 5
22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|10|10|0|0|1
22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|10|10|0|0|2
22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|10|10|0|0|3
22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|10|10|0|0|4
22283|13593|postgres|16384|CLUSTER|performing final 
cleanup|16387|10|10|0|0|5
===


Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

2019-09-16 Thread Tatsuro Yamada


Hi Alvaro!


Is this fix strictly necessary for pg12, or is this something that we
can leave for pg13?


Not only me but many DBA needs this progress report feature on PG12,
therefore I'm trying to fix the problem. If you send other patch to
fix the problem, and it is more elegant than mine, I can withdraw my patch.
Anyway, I want to avoid this feature being reverted.
Do you have any ideas to fix the problem?


I committed a fix for the originally reported problem as da47e43dc32e in
branch REL_12_STABLE.  Is that insufficient, and if so why?



Ooops, I misunderstood. I now realized you committed your patch to
fix the problem. Thanks! I'll test it later. :)

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=da47e43dc32e3c5916396f0cbcfa974b371e4875


Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

2019-09-16 Thread Alvaro Herrera

On 2019-Sep-17, Tatsuro Yamada wrote:

> On 2019/09/16 23:12, Alvaro Herrera wrote:

> > Is this fix strictly necessary for pg12, or is this something that we
> > can leave for pg13?
> 
> Not only me but many DBA needs this progress report feature on PG12,
> therefore I'm trying to fix the problem. If you send other patch to
> fix the problem, and it is more elegant than mine, I can withdraw my patch.
> Anyway, I want to avoid this feature being reverted.
> Do you have any ideas to fix the problem?

I committed a fix for the originally reported problem as da47e43dc32e in
branch REL_12_STABLE.  Is that insufficient, and if so why?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

2019-09-16 Thread Tatsuro Yamada


Hi Alvaro,


On 2019/09/16 23:12, Alvaro Herrera wrote:

On 2019-Sep-16, Tattsu Yama wrote:


I should have explained the API changes more. I added cmdtype as a given
parameter for all functions (See below).
Therefore, I suppose that my patch is similar to the following fix as you
mentioned on -hackers.


Is this fix strictly necessary for pg12, or is this something that we
can leave for pg13?



Not only me but many DBA needs this progress report feature on PG12,
therefore I'm trying to fix the problem. If you send other patch to
fix the problem, and it is more elegant than mine, I can withdraw my patch.
Anyway, I want to avoid this feature being reverted.
Do you have any ideas to fix the problem?


Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

2019-09-16 Thread Alvaro Herrera

On 2019-Sep-16, Tattsu Yama wrote:

> I should have explained the API changes more. I added cmdtype as a given
> parameter for all functions (See below).
> Therefore, I suppose that my patch is similar to the following fix as you
> mentioned on -hackers.

Is this fix strictly necessary for pg12, or is this something that we
can leave for pg13?


-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

2019-09-15 Thread Tattsu Yama

Hi Michael,


>> > Attached file is WIP patch.In my patch, I added "command id" to all
>> APIs of
>> > progress reporting to isolate commands. Therefore, it doesn't allow to
>> > cascade updating system views. And my patch is on WIP so it needs
>> clean-up
>> > and test.
>> > I share it anyway. :)
>>
>> +   if (cmdtype == PROGRESS_COMMAND_INVALID ||
>> beentry->st_progress_command == cmdtype)
>> +   {
>> +   PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
>> +   beentry->st_progress_param[index] = val;
>> +   PGSTAT_END_WRITE_ACTIVITY(beentry);
>> +   }
>> You basically don't need the progress reports if the command ID is
>> invalid, no?
>>
>
>
> Ah, right.
> I'll check and fix that today. :)
>
>

I fixed the patch based on your comment.
Please find attached file.  :)

I should have explained the API changes more. I added cmdtype as a given
parameter for all functions (See below).
Therefore, I suppose that my patch is similar to the following fix as you
mentioned on -hackers.

- Allow only reporting for a given command ID, which would basically
> require to pass down the command ID to progress update APIs and bypass an
> update
> if the command ID provided by caller does not match the existing one
> started (?).


#pgstat.c
pgstat_progress_start_command(ProgressCommandType cmdtype,...)
   - Progress reporter starts when beentry->st_progress_command is
PROGRESS_COMMAND_INVALID

pgstat_progress_end_command(ProgressCommandType cmdtype,...)
   - Progress reporter ends when beentry->st_progress_command equals cmdtype

pgstat_progress_update_param(ProgressCommandType cmdtype,...)  and
pgstat_progress_update_multi_param(ProgressCommandType cmdtype,...)
   - Progress reporter updates parameters if beentry->st_progress_command
equals cmdtype

Note:
cmdtype means the ProgressCommandType below:

# pgstat.h
typedef enum ProgressCommandType
{
PROGRESS_COMMAND_INVALID,
PROGRESS_COMMAND_VACUUM,
PROGRESS_COMMAND_CLUSTER,
PROGRESS_COMMAND_CREATE_INDEX
} ProgressCommandType;

Thanks,
Tatsuro Yamada


v2_fix_progress_report_for_cluster.patch
Description: Binary data

Re: [HACKERS] CLUSTER command progress monitor

2019-09-14 Thread Tattsu Yama

Hi Michael!

> Attached file is WIP patch.In my patch, I added "command id" to all APIs
> of
> > progress reporting to isolate commands. Therefore, it doesn't allow to
> > cascade updating system views. And my patch is on WIP so it needs
> clean-up
> > and test.
> > I share it anyway. :)
>
> +   if (cmdtype == PROGRESS_COMMAND_INVALID ||
> beentry->st_progress_command == cmdtype)
> +   {
> +   PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
> +   beentry->st_progress_param[index] = val;
> +   PGSTAT_END_WRITE_ACTIVITY(beentry);
> +   }
> You basically don't need the progress reports if the command ID is
> invalid, no?
>


Ah, right.
I'll check and fix that today. :)



>
> Another note is that you don't actually fix the problems related to
> the calls of pgstat_progress_end_command() which have been added for
> REINDEX reporting, so a progress report started for CLUSTER can get
> ended earlier than expected, preventing the follow-up progress updates
> to show up.
>
>

Hmm... I fixed the problem. Please confirm the test result repeated below.
CLUSTER is able to get the last phase: performing final clean up by using
the patch.

# Test result

postgres=# select * from pg_stat_progress_cluster ; \watch 0.001;
11636|13591|postgres|16384|CLUSTER|initializing|0|0|0|0|0|0
11636|13591|postgres|16384|CLUSTER|index scanning heap|16389|251|251|0|0|0
11636|13591|postgres|16384|CLUSTER|index scanning
heap|16389|1|1|0|0|0
11636|13591|postgres|16384|CLUSTER|rebuilding
index|16389|1|1|0|0|0  <== The last column rebuild_index_count is
increasing!
11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|1
11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|2
11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|3
11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|4
11636|13591|postgres|16384|CLUSTER|performing final
cleanup|16389|1|1|0|0|5  <== The last phase of CLUSTER!


Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

2019-09-13 Thread Michael Paquier

On Sat, Sep 14, 2019 at 01:06:32PM +0900, Tattsu Yama wrote:
> Thanks! I can review your patch for fix it.
> However, I was starting fixing the problem from the last day of PGConf.Asia
> (11 Sep).
> Attached file is WIP patch.In my patch, I added "command id" to all APIs of
> progress reporting to isolate commands. Therefore, it doesn't allow to
> cascade updating system views. And my patch is on WIP so it needs clean-up
> and test.
> I share it anyway. :)

+   if (cmdtype == PROGRESS_COMMAND_INVALID || beentry->st_progress_command 
== cmdtype)
+   {
+   PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
+   beentry->st_progress_param[index] = val;
+   PGSTAT_END_WRITE_ACTIVITY(beentry);
+   }
You basically don't need the progress reports if the command ID is
invalid, no?

Another note is that you don't actually fix the problems related to
the calls of pgstat_progress_end_command() which have been added for
REINDEX reporting, so a progress report started for CLUSTER can get
ended earlier than expected, preventing the follow-up progress updates
to show up.
--
Michael


signature.asc
Description: PGP signature

Re: [HACKERS] CLUSTER command progress monitor

2019-09-13 Thread Tattsu Yama

Hi Alvaro!



> Hello Tatsuro,
> On 2019-Aug-13, Tatsuro Yamada wrote:
> > On 2019/08/02 3:43, Alvaro Herrera wrote:
> > > Hmm, I'm trying this out now and I don't see the index_rebuild_count
> > > ever go up.  I think it's because the indexes are built using parallel
> > > index build ... or maybe it was the table AM changes that moved things
> > > around, not sure.  There's a period at the end when the CLUSTER command
> > > keeps working, but it's gone from pg_stat_progress_cluster.
> >
> > Thanks for your report.
> > I'll investigate it. :)
>


I have fixed it.  Can you please verify?
>


Thanks! I can review your patch for fix it.
However, I was starting fixing the problem from the last day of PGConf.Asia
(11 Sep).
Attached file is WIP patch.In my patch, I added "command id" to all APIs of
progress reporting to isolate commands. Therefore, it doesn't allow to
cascade updating system views. And my patch is on WIP so it needs clean-up
and test.
I share it anyway. :)

Here is a test result of my patch.
The last column index_rebuild count is increased.

postgres=# select * from pg_stat_progress_cluster ; \watch 0.001;
11636|13591|postgres|16384|CLUSTER|initializing|0|0|0|0|0|0
11636|13591|postgres|16384|CLUSTER|index scanning heap|16389|251|251|0|0|0
...
11636|13591|postgres|16384|CLUSTER|index scanning
heap|16389|1|1|0|0|0
11636|13591|postgres|16384|CLUSTER|rebuilding
index|16389|1|1|0|0|0...
11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|1
...
11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|2
...
11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|3
...
11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|4
...
11636|13591|postgres|16384|CLUSTER|performing final
cleanup|16389|1|1|0|0|5


Thanks,
Tatsuro Yamada


v1_fix_progress_report_for_cluster.patch
Description: Binary data

Re: [HACKERS] CLUSTER command progress monitor

2019-09-13 Thread Michael Paquier

On Fri, Sep 13, 2019 at 12:48:40PM -0400, Robert Haas wrote:
> On Fri, Sep 13, 2019 at 12:03 PM Alvaro Herrera
>  wrote:
>> Ummm ... I've been operating --in this thread-- under the assumption
>> that it is REINDEX to blame for this problem, not CREATE INDEX, because
>> my recollection is that I tested CREATE INDEX together with CLUSTER and
>> it worked fine.  Has anybody done any actual research that the problem
>> is to blame on CREATE INDEX and not REINDEX?
> 
> I am not sure. I think, though, that the point is that all three
> commands rebuild indexes. So unless they all expect the same things in
> terms of which counters get set during that process, things will not
> work correctly.

I have provided a short summary of the two issues on the open item
page (https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items) as
the open item was too much evasive.  Here is a copy-paste for the
archives of what I wrote:
1) A progress may be started while another one is already in progress.
Hence, if progress gets stopped the previously-started state is
removed, causing all follow-up updates to not happen.
2) Progress updates happening in a code path shared between those
three commands may clobber a previous state present.

Regarding 1) and based on what I found in the code, you can blame
REINDEX reporting which has added progress_start calls in code paths
which are also taken by CREATE INDEX and CLUSTER, causing their
progress reporting to go to the void.  In order to fix this one we
could do what I summarized in [1].

As mentioned by Robert, the problem summarized in 2) is much more
complex using the current infrastructure, and one could blame all the
commands per the way they do not share the same set of progress
phases.  There are a couple of potential solutions which have been
discussed on the thread:
- Allow commands to share the same set of phases, which requires some
kind of mapping between the phases (?).
- Allow progress reports to become a stack.  This would also take care
of any kind of issues in 1) for the future, and this can cause the
incorrect command to be reported to pg_stat_activity if not being
careful.
- Allow only reporting for a given command ID, which would basically
require to pass down the command ID to progress update APIs and bypass
an update if the command ID provided by caller does not match the
existing one started (?).

1) is pretty easy to fix based on the current state of the code, 2)
requires much more consideration, and that's no material for v12.  It
could be perfectly possible as well that we have another solution not
discussed yet on this thread.

[1]: https://www.postgresql.org/message-id/20190905010316.gb14...@paquier.xyz
--
Michael

signature.asc
Description: PGP signature

Re: [HACKERS] CLUSTER command progress monitor

2019-09-13 Thread Alvaro Herrera

On 2019-Sep-13, Robert Haas wrote:

> On Fri, Sep 13, 2019 at 2:49 AM Michael Paquier  wrote:
> > On Fri, Sep 06, 2019 at 08:10:58AM -0400, Robert Haas wrote:
> > > It's fine if things are updated as well -- it's just you need to make
> > > sure that those places know whether or not they are supposed to be
> > > doing those updates. Again, why can't we just pass down a value
> > > telling them "do reindex-style progress updates" or "do cluster-style
> > > progress updates" or "do no progress updates"?
> >
> > That's invasive.  CREATE INDEX reporting goes pretty deep into the
> > tree, with steps dedicated to the builds and scans of btree (nbtsort.c
> > for example) and hash index AMs.

> Well, if CREATE INDEX progress reporting can't be reasonably done
> within the current infrastructure, then it should be reverted for v12
> and not committed again until somebody upgrades the infrastructure.

Ummm ... I've been operating --in this thread-- under the assumption
that it is REINDEX to blame for this problem, not CREATE INDEX, because
my recollection is that I tested CREATE INDEX together with CLUSTER and
it worked fine.  Has anybody done any actual research that the problem
is to blame on CREATE INDEX and not REINDEX?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

2019-09-13 Thread Alvaro Herrera

Hello Tatsuro,

On 2019-Aug-13, Tatsuro Yamada wrote:

> On 2019/08/02 3:43, Alvaro Herrera wrote:
> > Hmm, I'm trying this out now and I don't see the index_rebuild_count
> > ever go up.  I think it's because the indexes are built using parallel
> > index build ... or maybe it was the table AM changes that moved things
> > around, not sure.  There's a period at the end when the CLUSTER command
> > keeps working, but it's gone from pg_stat_progress_cluster.
> 
> Thanks for your report.
> I'll investigate it. :)

I have fixed it.  Can you please verify?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

2019-09-13 Thread Robert Haas

On Fri, Sep 13, 2019 at 12:03 PM Alvaro Herrera
 wrote:
> Ummm ... I've been operating --in this thread-- under the assumption
> that it is REINDEX to blame for this problem, not CREATE INDEX, because
> my recollection is that I tested CREATE INDEX together with CLUSTER and
> it worked fine.  Has anybody done any actual research that the problem
> is to blame on CREATE INDEX and not REINDEX?

I am not sure. I think, though, that the point is that all three
commands rebuild indexes. So unless they all expect the same things in
terms of which counters get set during that process, things will not
work correctly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

2019-09-13 Thread Robert Haas

On Fri, Sep 13, 2019 at 2:49 AM Michael Paquier  wrote:
> On Fri, Sep 06, 2019 at 08:10:58AM -0400, Robert Haas wrote:
> > It's fine if things are updated as well -- it's just you need to make
> > sure that those places know whether or not they are supposed to be
> > doing those updates. Again, why can't we just pass down a value
> > telling them "do reindex-style progress updates" or "do cluster-style
> > progress updates" or "do no progress updates"?
>
> That's invasive.  CREATE INDEX reporting goes pretty deep into the
> tree, with steps dedicated to the builds and scans of btree (nbtsort.c
> for example) and hash index AMs.  In this case we have something that
> does somewhat what you are looking for with report_progress which gets
> set to true only for VACUUM.  If we were to do something like that, we
> would also need to keep some sort of mapping regarding which command
> ID (as defined by ProgressCommandType) is able to use which set of
> parameter flags (as defined by the arguments of
> pgstat_progress_update_param() and its multi_* cousin).  Then comes
> the issue that some parameters may be used by multiple command types,
> while other don't (REINDEX and CREATE INDEX have some shared
> mapping).

Well, if CREATE INDEX progress reporting can't be reasonably done
within the current infrastructure, then it should be reverted for v12
and not committed again until somebody upgrades the infrastructure.

I admit that I was a bit suspicious about that commit, but I figured
Alvaro knew what he was doing and didn't study it in any depth.  And
perhaps he did know what he was doing and will disagree with your
assessment. But so far I haven't heard an idea for evolving the
infrastructure that sounds more than half-baked.

It's generally clear, though, that the existing infrastructure is not
well-suited to progress reporting for code that bounces all over the
tree.  It's not clear that *any* infrastructure is *entirely*
well-suited to do that; such problems are inherently complex. But this
is just a very simple system which was designed to be simple and very
low cost to use, and it may be that it's been stretched outside of its
comfort zone.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

2019-09-12 Thread Michael Paquier

On Fri, Sep 06, 2019 at 08:10:58AM -0400, Robert Haas wrote:
> It's fine if things are updated as well -- it's just you need to make
> sure that those places know whether or not they are supposed to be
> doing those updates. Again, why can't we just pass down a value
> telling them "do reindex-style progress updates" or "do cluster-style
> progress updates" or "do no progress updates"?

That's invasive.  CREATE INDEX reporting goes pretty deep into the
tree, with steps dedicated to the builds and scans of btree (nbtsort.c
for example) and hash index AMs.  In this case we have something that
does somewhat what you are looking for with report_progress which gets
set to true only for VACUUM.  If we were to do something like that, we
would also need to keep some sort of mapping regarding which command
ID (as defined by ProgressCommandType) is able to use which set of
parameter flags (as defined by the arguments of
pgstat_progress_update_param() and its multi_* cousin).  Then comes
the issue that some parameters may be used by multiple command types,
while other don't (REINDEX and CREATE INDEX have some shared
mapping).
--
Michael


signature.asc
Description: PGP signature

Re: [HACKERS] CLUSTER command progress monitor

2019-09-09 Thread Peter Geoghegan

On Fri, Sep 6, 2019 at 5:11 AM Robert Haas  wrote:
> On Fri, Sep 6, 2019 at 1:44 AM Michael Paquier  wrote:
> > I don't see exactly why we could not switch to a fixed number of
> > slots, say 8, with one code path to start a progress which adds an
> > extra report on the stack, one to remove one entry from the stack, and
> > a new one to reset the whole thing for a backend.  This would not need
> > much restructuration of course.
>
> You could do that, but I don't think it's probably that great of an
> idea.  Now you've built something which is significantly more complex
> than the original design of this feature, but still not good enough to
> report on the progress of a query tree.  I tend to think we should
> confine ourselves to the progress reporting that can reasonably be
> done within the current infrastructure until somebody invents a really
> general mechanism that can handle, essentially, an EXPLAIN-on-the-fly
> of a current query tree.

+1. Let's not complicate the progress reporting infrastructure for an
uncertain benefit.

CLUSTER/VACUUM FULL is fundamentally an awkward utility command to
target with progress reporting infrastructure. I think that it's okay
to redefine how progress reporting works with CLUSTER now, in order to
fix the REINDEX/CLUSTER state clobbering bug.

-- 
Peter Geoghegan

Re: [HACKERS] CLUSTER command progress monitor

2019-09-09 Thread Michael Paquier

On Fri, Sep 06, 2019 at 10:27:02AM -0400, Alvaro Herrera from 2ndQuadrant wrote:
> That said, I did spend some time on this type of issue when doing CREATE
> INDEX support; you can tell because I defined the columns for block
> numbers in a scan separately from CREATE INDEX specific fields,
> precisely to avoid multiple commands running concurrently from
> clobbering unrelated columns:
> 
> /* Block numbers in a generic relation scan */
> #define PROGRESS_SCAN_BLOCKS_TOTAL15
> #define PROGRESS_SCAN_BLOCKS_DONE 16

Hm.  It is not really clear what is the intention by looking at the
contents progress.h.

> I would say that it's fairly useful to have CLUSTER report progress on
> indexes being created underneath, but I understand that it might be too
> late to be designing the CLUSTER report to take advantage of the CREATE
> INDEX metrics.

The same can be said about the reporting done in reindex_relation for
PROGRESS_CLUSTER_INDEX_REBUILD_COUNT.  I think that it should be
removed for now.

> I think a workable, not terribly invasive approach is to have REINDEX
> process its commands conditionally: have the caller indicate whether
> progress is to be reported, and skip the calls if not.  That would
> (should) prevent it from clobbering the state set up by CLUSTER.

So, you would basically add an extra flag in the options of
reindex_index() to decide if a progress report should be started or
not?  I am not a fan of that because it does not take care of the root
issue which is that the start of the progress reports is too much
internal.  I think that it would be actually less error prone to move
the start of the progress reporting for REINDEX out of reindex_index()
and start it at a higher level.  Looking again at the code, I would
recommend that we should start the progress in ReindexIndex() before
calling reindex_index(), ReindexMultipleTables() before calling
reindex_relation() and ReindexRelationConcurrently(), and
ReindexTable() before calling reindex_relation().  That will avoid
each command to clobber each other's in-progress reports.

It would be also very good to document clearly how the overlaps for
the progress parameter values are not happening.  For example for
CREATE INDEX, we don't know why 1, 2 and 7 are not used.

Note that there is an ID collision with PROGRESS_CREATEIDX_INDEX_OID
updated in reindex_index() and the CLUSTER part
PROGRESS_CLUSTER_HEAP_BLKS_SCANNED..  There could be an argument to
use a completely different range of IDs for each command phase to
avoid any overlap, but it is scary to consider that we may not have
found all the issues with one command cloberring another one's
state..
--
Michael

signature.asc
Description: PGP signature

Re: [HACKERS] CLUSTER command progress monitor

2019-09-06 Thread Alvaro Herrera from 2ndQuadrant

On 2019-Sep-06, Michael Paquier wrote:

> Finally comes the question of what do we do for v12?  I am adding in
> CC Peter, Alvaro being already present, who have been involved in the
> commits with CREATE INDEX and REINDEX.  It would be sad to revert a
> this feature, but well I'd rather do that now than regret later
> releasing the feature as it is currently shaped..  Let's see what the
> others think.

As far as I understand, CREATE INDEX is not affected -- only REINDEX is.
Of course, it would be sad to revert even the latter, but it's not as
bleak as reverting the whole thing.

That said, I did spend some time on this type of issue when doing CREATE
INDEX support; you can tell because I defined the columns for block
numbers in a scan separately from CREATE INDEX specific fields,
precisely to avoid multiple commands running concurrently from
clobbering unrelated columns:

/* Block numbers in a generic relation scan */
#define PROGRESS_SCAN_BLOCKS_TOTAL  15
#define PROGRESS_SCAN_BLOCKS_DONE   16

I would say that it's fairly useful to have CLUSTER report progress on
indexes being created underneath, but I understand that it might be too
late to be designing the CLUSTER report to take advantage of the CREATE
INDEX metrics.

I think a workable, not terribly invasive approach is to have REINDEX
process its commands conditionally: have the caller indicate whether
progress is to be reported, and skip the calls if not.  That would
(should) prevent it from clobbering the state set up by CLUSTER.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

2019-09-06 Thread Robert Haas

On Fri, Sep 6, 2019 at 1:44 AM Michael Paquier  wrote:
> One idea I got was to pass the command ID as an extra argument of the
> update routine.  I am not completely sure either if we need this level
> of complication.

I still don't think that's the right approach.

> > Those are just weaknesses of the infrastructure.  Perhaps there is a
> > better solution, but there's no intrinsic reason that we can't avoid
> > them by careful coding.
>
> Perhaps.  The current infra allows the addition of a progress report
> in code paths which are isolated from other things.  For CLUSTER, most
> things are fine as long as the progress is updated in cluster_rel(),
> the rest is too internal.

It's fine if things are updated as well -- it's just you need to make
sure that those places know whether or not they are supposed to be
doing those updates. Again, why can't we just pass down a value
telling them "do reindex-style progress updates" or "do cluster-style
progress updates" or "do no progress updates"?

> > Well, it might be OK to do that if we're clear that this is the index
> > progress-reporting view and the command is CLUSTER but it happens to
> > be building an index now so we're showing it here.  But I don't see
> > how it would work anyway: you can't reported cascading progress
> > reports in shared memory because you've only got a fixed amount of
> > space.
>
> I don't see exactly why we could not switch to a fixed number of
> slots, say 8, with one code path to start a progress which adds an
> extra report on the stack, one to remove one entry from the stack, and
> a new one to reset the whole thing for a backend.  This would not need
> much restructuration of course.

You could do that, but I don't think it's probably that great of an
idea.  Now you've built something which is significantly more complex
than the original design of this feature, but still not good enough to
report on the progress of a query tree.  I tend to think we should
confine ourselves to the progress reporting that can reasonably be
done within the current infrastructure until somebody invents a really
general mechanism that can handle, essentially, an EXPLAIN-on-the-fly
of a current query tree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

2019-09-05 Thread Michael Paquier

On Fri, Sep 06, 2019 at 02:44:18PM +0900, Michael Paquier wrote:
> I don't see exactly why we could not switch to a fixed number of
> slots, say 8, with one code path to start a progress which adds an
> extra report on the stack, one to remove one entry from the stack, and
> a new one to reset the whole thing for a backend.  This would not need
> much restructuration of course.

Wake up, Neo.  Your last sentence is confusing.  I meant that this
would need more design efforts, so that's not in scope for v12. 
--
Michael


signature.asc
Description: PGP signature

Re: [HACKERS] CLUSTER command progress monitor

2019-09-05 Thread Michael Paquier

On Thu, Sep 05, 2019 at 03:17:51PM -0400, Robert Haas wrote:
> Oops.  Yeah, that's bogus (as are some of the other things you
> mention).  I think we're going to have to fix this by passing down
> some flags to these functions to tell them what kind of progress
> updates to do (or to do none).  Or else pass down a callback function
> and a context object, but that seems like it might be overkill.

One idea I got was to pass the command ID as an extra argument of the
update routine.  I am not completely sure either if we need this level
of complication.

> Those are just weaknesses of the infrastructure.  Perhaps there is a
> better solution, but there's no intrinsic reason that we can't avoid
> them by careful coding.

Perhaps.  The current infra allows the addition of a progress report
in code paths which are isolated from other things.  For CLUSTER, most
things are fine as long as the progress is updated in cluster_rel(),
the rest is too internal.

> Well, it might be OK to do that if we're clear that this is the index
> progress-reporting view and the command is CLUSTER but it happens to
> be building an index now so we're showing it here.  But I don't see
> how it would work anyway: you can't reported cascading progress
> reports in shared memory because you've only got a fixed amount of
> space.

I don't see exactly why we could not switch to a fixed number of
slots, say 8, with one code path to start a progress which adds an
extra report on the stack, one to remove one entry from the stack, and
a new one to reset the whole thing for a backend.  This would not need
much restructuration of course.

Finally comes the question of what do we do for v12?  I am adding in
CC Peter, Alvaro being already present, who have been involved in the
commits with CREATE INDEX and REINDEX.  It would be sad to revert a
this feature, but well I'd rather do that now than regret later
releasing the feature as it is currently shaped..  Let's see what the
others think.
--
Michael

signature.asc
Description: PGP signature

Re: [HACKERS] CLUSTER command progress monitor

2019-09-05 Thread Robert Haas

On Wed, Sep 4, 2019 at 9:03 PM Michael Paquier  wrote:
> For CLUSTER, the progress starts and ends in cluster_rel().  CLUSTER
> uses its code paths at the beginning, but then things get more
> complicated, particularly with finish_heap_swap() which calls directly
> reindex_table().  6f97457 includes one progress update at point which
> can be a problem per its shared nature in reindex_relation() with
> PROGRESS_CLUSTER_INDEX_REBUILD_COUNT.  This last part is wrong IMO,
> why should cluster reporting take priority in this code path,
> enforcing anything else?

Oops.  Yeah, that's bogus (as are some of the other things you
mention).  I think we're going to have to fix this by passing down
some flags to these functions to tell them what kind of progress
updates to do (or to do none).  Or else pass down a callback function
and a context object, but that seems like it might be overkill.

> On top of those issues, I see some problems with the current state of
> affairs, and I am repeating myself:
> - It is possible that pgstat_progress_update_param() is called for a
> given command for a code path taken by a completely different
> command, and that we have a real risk of showing a status completely
> buggy as the progress phases share the same numbers.
> - We don't consider wisely end and start progress handling for
> cascading calls, leading to a risk where we start command A, but
> for shared code paths where we assume that only command B can run then
> the processing abruptly ends for command A.

Those are just weaknesses of the infrastructure.  Perhaps there is a
better solution, but there's no intrinsic reason that we can't avoid
them by careful coding.

> - Is it actually fine to report information about a command completely
> different than the one provided by the client?  It is for example
> possible to call CLUSTER, but show up to the user progress report
> about PROGRESS_COMMAND_CREATE_INDEX (see reindex_index).  This could
> actually make sense if we are able to handle cascading progress
> reports.

Well, it might be OK to do that if we're clear that this is the index
progress-reporting view and the command is CLUSTER but it happens to
be building an index now so we're showing it here.  But I don't see
how it would work anyway: you can't reported cascading progress
reports in shared memory because you've only got a fixed amount of
space.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

2019-09-04 Thread Michael Paquier

On Wed, Sep 04, 2019 at 09:18:39AM -0400, Robert Haas wrote:
> I think this is all going in the wrong direction.  It's the
> responsibility of the people who are calling the pgstat_progress_*
> functions to only do so when it's appropriate.  Having the
> pgstat_progress_* functions try to untangle whether or not they ought
> to ignore calls made to them is backwards.

Check.

> It seems to me that the problem here can be summarized in this way:
> there's a bunch of code reuse in the relevant paths here, and somebody
> wasn't careful enough when adding progress reporting for one of the
> new commands, and so now things are broken, because e.g. CLUSTER
> progress reporting gets ended by a pgstat_progress_end_command() call
> that was intended for some other utility command but is reached in the
> CLUSTER case anyway.  The solution to that problem in my book is to
> figure out which commit broke it, and then the person who made that
> commit either needs to fix it or revert it.

I am not sure that it is right as well to say that the first committed
patch is right and that the follow-up ones are wrong.  CLUSTER
progress was committed first (6f97457), followed a couple of days
after by CREATE INDEX (ab0dfc9) and then REINDEX (03f9e5c).  So let's
have a look at them..

For CLUSTER, the progress starts and ends in cluster_rel().  CLUSTER
uses its code paths at the beginning, but then things get more
complicated, particularly with finish_heap_swap() which calls directly
reindex_table().  6f97457 includes one progress update at point which
can be a problem per its shared nature in reindex_relation() with
PROGRESS_CLUSTER_INDEX_REBUILD_COUNT.  This last part is wrong IMO,
why should cluster reporting take priority in this code path,
enforcing anything else?

For CREATE INDEX, the progress reporting starts and ends once in
DefineIndex().  However, we have updates of progress within each index
AM build routine, which could be taken by many code paths.  Is it
actually fine to give priority to CREATE INDEX in those cases?  Those
paths can as well be taken by REINDEX or CLUSTER (right?), so having a
counter for CREATE INDEX looks logically wrong to me.  The part where
we wait for snapshots looks actually good from the perspective of
REINDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY.

For REINDEX, we have a problematic start progress call in
reindex_index() which is for example called by reindex_relation for
each relation's index for a non-concurrent case (also in
ReindexRelationConcurrently()).  I think that these are incorrect
locations, and I would have placed them in ReindexIndex(),
ReindexTable() and ReindexMultipleTables() so as we avoid anything
low-level.  This has added two calls to pgstat_progress_update_param()
in reindex_index(), which is shared between all.  Why would it be fine
to give the priority to a CREATE INDEX marker here if CLUSTER can also
cross this way?

On top of those issues, I see some problems with the current state of
affairs, and I am repeating myself:
- It is possible that pgstat_progress_update_param() is called for a
given command for a code path taken by a completely different
command, and that we have a real risk of showing a status completely
buggy as the progress phases share the same numbers.
- We don't consider wisely end and start progress handling for
cascading calls, leading to a risk where we start command A, but 
for shared code paths where we assume that only command B can run then
the processing abruptly ends for command A.
- Is it actually fine to report information about a command completely
different than the one provided by the client?  It is for example
possible to call CLUSTER, but show up to the user progress report
about PROGRESS_COMMAND_CREATE_INDEX (see reindex_index).  This could
actually make sense if we are able to handle cascading progress
reports.

These are, at least it seems to me, fundamental problems we need to
ponder more about if we begin to include more progress reporting, and
we don't have that now, and that worries me.

> It's quite possible here that we need a bigger redesign to make adding
> progress reporting for new command easier and less prone to bugs, but
> I don't think it's at all clear what such a redesign should look like
> or even that we definitely need one, and September is not the right
> time to be redesigning features for the pending release.

Definitely.
--
Michael

signature.asc
Description: PGP signature

Re: [HACKERS] CLUSTER command progress monitor

2019-09-04 Thread Robert Haas

On Tue, Sep 3, 2019 at 1:02 AM Masahiko Sawada  wrote:
> After more thought, even if we don't start a new command progress when
> there is another one already started the progress update functions
> could be called and these functions don't specify the command type.
> Therefore, the progress information might be changed with wrong value
> by different command. Probably we can change the caller of progress
> updating function so that it doesn't call all of them if the command
> could not start a new progress report, but it might be a big change.
>
> As an alternative idea, we can make pgstat_progress_end_command() have
> one argument that is command the caller wants to end. That is, we
> don't end the command progress when the specified command type doesn't
> match to the command type of current running command progress. We
> unconditionally clear the progress information at CommitTransaction()
> and AbortTransaction() but we can do that by passing
> PROGRESS_COMMAND_INVALID to pgstat_progress_end_command().

I think this is all going in the wrong direction.  It's the
responsibility of the people who are calling the pgstat_progress_*
functions to only do so when it's appropriate.  Having the
pgstat_progress_* functions try to untangle whether or not they ought
to ignore calls made to them is backwards.

It seems to me that the problem here can be summarized in this way:
there's a bunch of code reuse in the relevant paths here, and somebody
wasn't careful enough when adding progress reporting for one of the
new commands, and so now things are broken, because e.g. CLUSTER
progress reporting gets ended by a pgstat_progress_end_command() call
that was intended for some other utility command but is reached in the
CLUSTER case anyway.  The solution to that problem in my book is to
figure out which commit broke it, and then the person who made that
commit either needs to fix it or revert it.

It's quite possible here that we need a bigger redesign to make adding
progress reporting for new command easier and less prone to bugs, but
I don't think it's at all clear what such a redesign should look like
or even that we definitely need one, and September is not the right
time to be redesigning features for the pending release.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

2019-09-04 Thread Masahiko Sawada

On Wed, Sep 4, 2019 at 3:48 PM Michael Paquier  wrote:
>
> On Tue, Sep 03, 2019 at 02:52:28PM +0900, Michael Paquier wrote:
> > Indeed, good catch.  This is wrong since b6fb647 which has introduced
> > the progress reports.  I'll fix that one and back-patch if there are
> > no objections.
>
> OK, applied this part down to 9.6.

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] CLUSTER command progress monitor

2019-09-03 Thread Michael Paquier

On Tue, Sep 03, 2019 at 02:52:28PM +0900, Michael Paquier wrote:
> Indeed, good catch.  This is wrong since b6fb647 which has introduced
> the progress reports.  I'll fix that one and back-patch if there are
> no objections.

OK, applied this part down to 9.6.
--
Michael


signature.asc
Description: PGP signature

Re: [HACKERS] CLUSTER command progress monitor

2019-09-02 Thread Michael Paquier

On Tue, Sep 03, 2019 at 01:59:00PM +0900, Masahiko Sawada wrote:
> After more thought, even if we don't start a new command progress when
> there is another one already started the progress update functions
> could be called and these functions don't specify the command type.
> Therefore, the progress information might be changed with wrong value
> by different command. Probably we can change the caller of progress
> updating function so that it doesn't call all of them if the command
> could not start a new progress report, but it might be a big change.

That's one issue.

> As an alternative idea, we can make pgstat_progress_end_command() have
> one argument that is command the caller wants to end. That is, we
> don't end the command progress when the specified command type doesn't
> match to the command type of current running command progress. We
> unconditionally clear the progress information at CommitTransaction()
> and AbortTransaction() but we can do that by passing
> PROGRESS_COMMAND_INVALID to pgstat_progress_end_command().

Possibly.  I don't dislike the idea of piling up the progress
information for cascading calls and I would use that with a depth
counter and a fixed-size array.  

> BTW the following condition in pgstat_progress_end_command() seems to
> be wrong. We should return from the function when either
> pgstat_track_activities is disabled or the current progress command is
> invalid.
> 
>if (!pgstat_track_activities
>   && beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
>   return;
> 
> I've attached the patch fixes the issue I newly found.

Indeed, good catch.  This is wrong since b6fb647 which has introduced
the progress reports.  I'll fix that one and back-patch if there are
no objections.

With my RMT hat on for v12, I don't think that it is really the moment
to discuss how we want to change this API post beta3, and we have room
for that in future development cycles.  There are quite some questions
which need answers I am unsure of:
- Do we really want to support nested calls of progress reports for
multiple command?
- Perhaps for some commands it makes sense to have an overlap of the
fields used, but we need a clear definition of what can be done or
not.  I am not really comfortable with the idea of having in
reindex_relation() a progress report related only to CLUSTER, which is
also a REINDEX code path.  The semantics shared between both commands
need to be thought a bit more.  For example
PROGRESS_CLUSTER_INDEX_REBUILD_COUNT could cause the system catalog to
report PROGRESS_CREATEIDX_PHASE_WAIT_3 because of an incorrect command
type, which would be just wrong for a CLUSTER command.
- Which command should be reported to the user, only the upper-level
one?
- Perhaps we can live only with the approach of not registering a new
command if one already exists, and actually be more flexible with the
phase fields used, in short we use unique numbers for each phase?

The most conservative bet from a release point of view, and actually
my bet because that's safe, would be to basically revert 6f97457
(CLUSTER), 03f9e5c (REINDEX) and ab0dfc96 (CREATE INDEX which has
overlaps with REINDEX in the build and validation paths).  What I am
really scared of is that we have just barely scratched the surface of
the issues caused by the inter-dependencies between all the code
paths of those commands, and that we have much more waiting behind
this open item.
--
Michael


signature.asc
Description: PGP signature

Re: [HACKERS] CLUSTER command progress monitor

2019-09-02 Thread Masahiko Sawada

On Mon, Sep 2, 2019 at 6:32 PM Masahiko Sawada  wrote:
>
> On Mon, Sep 2, 2019 at 4:59 PM Michael Paquier  wrote:
> >
> > On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote:
> > > > I tried 1. and it shown index_rebuild_count, but it also shown
> > > > "initializing" phase again and other columns were empty. So, it is
> > > > not suitable to fix the problem. :(
> > > > I'm going to try 2. and 3., but, unfortunately, I can't get enough
> > > > time to do that after PGConf.Asia 2019.
> > > > If we selected 3., it affects following these progress reporting
> > > > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay,
> > > > I suppose. Any comments welcome! :)
> > >
> > > I looked at this open item. I prefer #3 but I think it would be enough
> > > to have a small stack using a fixed length array to track nested
> > > progress information and the current executing command (maybe 4 or 8
> > > would be enough for maximum nested level for now?). That way, we don't
> > > need any change the interfaces. AFAICS there is no case where we call
> > > only either pgstat_progress_start_command or
> > > pgstat_progress_end_command without each other (although
> > > pgstat_progress_update_param is called without start). So I think that
> > > having a small stack for tracking multiple reports would work.
> > >
> > > Attached the draft patch that fixes this issue. Please review it.
> >
> > Do we actually want to show to the user information about CREATE INDEX
> > which is different than CLUSTER?  It could be confusing for the user
> > to see a progress report from a command different than the one
> > actually launched.
>
> I personally think it would be helpful for users. We provide the
> progress reporting for each commands but it might not be detailed
> enough. But we can provide more details of progress information of
> each commands by combining them. Only users who want to confirm the
> details need to see different progress reports.
>
> > There could be a method 4 here: do not start a new
> > command progress when there is another one already started, and do not
> > try to end it in the code path where it could not be started as it did
> > not stack.  So while I see the advantages of stacking the progress
> > records as you do when doing cascading calls of the progress
> > reporting, I am not sure that:
> > 1) We should bloat more PgBackendStatus for that.
> > 2) We want to add more complication in this area close to the
> > release of 12.
>
> I agreed, especially to 2. We can live with #4 method in PG12 and the
> patch I proposed could be discussed as a new feature.

After more thought, even if we don't start a new command progress when
there is another one already started the progress update functions
could be called and these functions don't specify the command type.
Therefore, the progress information might be changed with wrong value
by different command. Probably we can change the caller of progress
updating function so that it doesn't call all of them if the command
could not start a new progress report, but it might be a big change.

As an alternative idea, we can make pgstat_progress_end_command() have
one argument that is command the caller wants to end. That is, we
don't end the command progress when the specified command type doesn't
match to the command type of current running command progress. We
unconditionally clear the progress information at CommitTransaction()
and AbortTransaction() but we can do that by passing
PROGRESS_COMMAND_INVALID to pgstat_progress_end_command().

BTW the following condition in pgstat_progress_end_command() seems to
be wrong. We should return from the function when either
pgstat_track_activities is disabled or the current progress command is
invalid.

   if (!pgstat_track_activities
  && beentry->st_progress_command == PROGRESS_COMMAND_INVALID)
  return;

I've attached the patch fixes the issue I newly found.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


fix_progress_end_command.patch
Description: Binary data

Re: [HACKERS] CLUSTER command progress monitor

2019-09-02 Thread Masahiko Sawada

On Mon, Sep 2, 2019 at 4:59 PM Michael Paquier  wrote:
>
> On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote:
> > > I tried 1. and it shown index_rebuild_count, but it also shown
> > > "initializing" phase again and other columns were empty. So, it is
> > > not suitable to fix the problem. :(
> > > I'm going to try 2. and 3., but, unfortunately, I can't get enough
> > > time to do that after PGConf.Asia 2019.
> > > If we selected 3., it affects following these progress reporting
> > > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay,
> > > I suppose. Any comments welcome! :)
> >
> > I looked at this open item. I prefer #3 but I think it would be enough
> > to have a small stack using a fixed length array to track nested
> > progress information and the current executing command (maybe 4 or 8
> > would be enough for maximum nested level for now?). That way, we don't
> > need any change the interfaces. AFAICS there is no case where we call
> > only either pgstat_progress_start_command or
> > pgstat_progress_end_command without each other (although
> > pgstat_progress_update_param is called without start). So I think that
> > having a small stack for tracking multiple reports would work.
> >
> > Attached the draft patch that fixes this issue. Please review it.
>
> Do we actually want to show to the user information about CREATE INDEX
> which is different than CLUSTER?  It could be confusing for the user
> to see a progress report from a command different than the one
> actually launched.

I personally think it would be helpful for users. We provide the
progress reporting for each commands but it might not be detailed
enough. But we can provide more details of progress information of
each commands by combining them. Only users who want to confirm the
details need to see different progress reports.

> There could be a method 4 here: do not start a new
> command progress when there is another one already started, and do not
> try to end it in the code path where it could not be started as it did
> not stack.  So while I see the advantages of stacking the progress
> records as you do when doing cascading calls of the progress
> reporting, I am not sure that:
> 1) We should bloat more PgBackendStatus for that.
> 2) We want to add more complication in this area close to the
> release of 12.

I agreed, especially to 2. We can live with #4 method in PG12 and the
patch I proposed could be discussed as a new feature.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

Re: [HACKERS] CLUSTER command progress monitor

2019-09-02 Thread Michael Paquier

On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote:
> > I tried 1. and it shown index_rebuild_count, but it also shown
> > "initializing" phase again and other columns were empty. So, it is
> > not suitable to fix the problem. :(
> > I'm going to try 2. and 3., but, unfortunately, I can't get enough
> > time to do that after PGConf.Asia 2019.
> > If we selected 3., it affects following these progress reporting
> > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay,
> > I suppose. Any comments welcome! :)
> 
> I looked at this open item. I prefer #3 but I think it would be enough
> to have a small stack using a fixed length array to track nested
> progress information and the current executing command (maybe 4 or 8
> would be enough for maximum nested level for now?). That way, we don't
> need any change the interfaces. AFAICS there is no case where we call
> only either pgstat_progress_start_command or
> pgstat_progress_end_command without each other (although
> pgstat_progress_update_param is called without start). So I think that
> having a small stack for tracking multiple reports would work.
> 
> Attached the draft patch that fixes this issue. Please review it.

Do we actually want to show to the user information about CREATE INDEX
which is different than CLUSTER?  It could be confusing for the user
to see a progress report from a command different than the one
actually launched.  There could be a method 4 here: do not start a new
command progress when there is another one already started, and do not
try to end it in the code path where it could not be started as it did
not stack.  So while I see the advantages of stacking the progress
records as you do when doing cascading calls of the progress
reporting, I am not sure that:
1) We should bloat more PgBackendStatus for that.
2) We want to add more complication in this area close to the
release of 12.

Another solution as mentioned by Yamada-san is just to start again the
progress for the cluster command in reindex_relation() however you got
an issue here as reindex_relation() is also called by REINDEX TABLE. I
find actually very weird that we have a cluster-related field added
for REINDEX, so it seems to me that all the interactions between the
code paths of CLUSTER and REINDEX have not been completely thought
through.  This part has been added by 6f97457 :(
--
Michael


signature.asc
Description: PGP signature

Re: [HACKERS] CLUSTER command progress monitor

2019-08-30 Thread Masahiko Sawada

On Thu, Aug 15, 2019 at 12:48 PM Tatsuro Yamada
 wrote:
>
> Hi Michael, Alvaro and Robert!
>
> On 2019/08/14 11:52, Michael Paquier wrote:
> > On Wed, Aug 14, 2019 at 11:38:01AM +0900, Tatsuro Yamada wrote:
> >> On 2019/08/13 14:40, Tatsuro Yamada wrote:
> >>> On 2019/08/02 3:43, Alvaro Herrera wrote:
>  Hmm, I'm trying this out now and I don't see the index_rebuild_count
>  ever go up.  I think it's because the indexes are built using parallel
>  index build ... or maybe it was the table AM changes that moved things
>  around, not sure.  There's a period at the end when the CLUSTER command
>  keeps working, but it's gone from pg_stat_progress_cluster.
> >>>
> >>> Thanks for your report.
> >>> I'll investigate it. :)
> >>
> >> I did "git bisect" and found the commit:
> >>
> >>   03f9e5cba0ee1633af4abe734504df50af46fbd8
> >>   Report progress of REINDEX operations
> >
> > I am adding an open item for this one.
> > --
> > Michael
>
>
> Okay, I checked it on the wiki.
>
>https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items
>- index_rebuild_count in CLUSTER reporting never increments
>
> To be clear, 03f9e5cb broke CLUSTER progress reporting, but
> I investigated little more and share my ideas to fix the problem.
>
> * Call stack
> 
> cluster_rel
>pgstat_progress_start_command(CLUSTER) *A1
>rebuild_relation
>  finish_heap_swap
>reindex_relation
>  reindex_index
>pgstat_progress_start_command(CREATE_INDEX) *B1
>pgstat_progress_end_command() *B2
>  pgstat_progress_update_param(INDEX_REBUILD_COUNT, i) <- failed :(
>pgstat_progress_end_command() *A2
>
>Note
>  These are sets:
>A1 and A2,
>B1 and B2
> 
>
>
> * Ideas to fix
>There are Three options, I guess.
> 
>1. Call pgstat_progress_start_command(CLUSTER) again
>   before pgstat_progress_update_param(INDEX_REBUILD_COUNT, i).
>
>2. Add "save and restore" functions for the following two
>   variables of MyBeentry in pgstat.c.
>  - st_progress_command
>  - st_progress_command_target
>
>3. Use Hash or List to store multiple values for the two
>   variables in pgstat.c.
> 
>
>
> I tried 1. and it shown index_rebuild_count, but it also shown
> "initializing" phase again and other columns were empty. So, it is
> not suitable to fix the problem. :(
> I'm going to try 2. and 3., but, unfortunately, I can't get enough
> time to do that after PGConf.Asia 2019.
> If we selected 3., it affects following these progress reporting
> features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay,
> I suppose. Any comments welcome! :)

I looked at this open item. I prefer #3 but I think it would be enough
to have a small stack using a fixed length array to track nested
progress information and the current executing command (maybe 4 or 8
would be enough for maximum nested level for now?). That way, we don't
need any change the interfaces. AFAICS there is no case where we call
only either pgstat_progress_start_command or
pgstat_progress_end_command without each other (although
pgstat_progress_update_param is called without start). So I think that
having a small stack for tracking multiple reports would work.

Attached the draft patch that fixes this issue. Please review it.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index d362e7f7d7..99e4844e6c 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3016,8 +3016,10 @@ pgstat_bestart(void)
 #endif
 
 	lbeentry.st_state = STATE_UNDEFINED;
-	lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
-	lbeentry.st_progress_command_target = InvalidOid;
+	MemSet(&(lbeentry.st_progress_cmds), 0,
+		   sizeof(PgBackendProgressInfo) * PGSTAT_MAX_PROGRESS_INFO);
+	/* Set invalid command index */
+	lbeentry.st_current_cmd = -1;
 
 	/*
 	 * we don't zero st_progress_param here to save cycles; nobody should
@@ -3203,10 +3205,22 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid)
 	if (!beentry || !pgstat_track_activities)
 		return;
 
+	Assert(beentry->st_current_cmd >= -1);
+
+	/* The given command is already started */
+	if (beentry->st_progress_cmds[beentry->st_current_cmd].command == cmdtype)
+		return;
+
+	/* The progress information queue is full */
+	if (beentry->st_current_cmd >= PGSTAT_MAX_PROGRESS_INFO - 1)
+		elog(ERROR, "progress information per backends is full");
+
 	PGSTAT_BEGIN_WRITE_ACTIVITY(beentry);
-	beentry->st_progress_command = cmdtype;
-	beentry->st_progress_command_target = relid;
-	MemSet(&beentry->st_progress_param, 0, sizeof(beentry->st_progress_param));
+	beentry->st_current_cmd++;
+	MemSet(&(beentry->st_pr

Re: [HACKERS] CLUSTER command progress monitor

2019-08-14 Thread Tatsuro Yamada


Hi Michael, Alvaro and Robert!

On 2019/08/14 11:52, Michael Paquier wrote:

On Wed, Aug 14, 2019 at 11:38:01AM +0900, Tatsuro Yamada wrote:

On 2019/08/13 14:40, Tatsuro Yamada wrote:

On 2019/08/02 3:43, Alvaro Herrera wrote:

Hmm, I'm trying this out now and I don't see the index_rebuild_count
ever go up.  I think it's because the indexes are built using parallel
index build ... or maybe it was the table AM changes that moved things
around, not sure.  There's a period at the end when the CLUSTER command
keeps working, but it's gone from pg_stat_progress_cluster.


Thanks for your report.
I'll investigate it. :)


I did "git bisect" and found the commit:

  03f9e5cba0ee1633af4abe734504df50af46fbd8
  Report progress of REINDEX operations


I am adding an open item for this one.
--
Michael



Okay, I checked it on the wiki.

  https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items
  - index_rebuild_count in CLUSTER reporting never increments

To be clear, 03f9e5cb broke CLUSTER progress reporting, but
I investigated little more and share my ideas to fix the problem.

* Call stack

cluster_rel
  pgstat_progress_start_command(CLUSTER) *A1
  rebuild_relation
finish_heap_swap
  reindex_relation
reindex_index
  pgstat_progress_start_command(CREATE_INDEX) *B1
  pgstat_progress_end_command() *B2
pgstat_progress_update_param(INDEX_REBUILD_COUNT, i) <- failed :(
  pgstat_progress_end_command() *A2

  Note
These are sets:
  A1 and A2,
  B1 and B2



* Ideas to fix
  There are Three options, I guess.

  1. Call pgstat_progress_start_command(CLUSTER) again
 before pgstat_progress_update_param(INDEX_REBUILD_COUNT, i).

  2. Add "save and restore" functions for the following two
 variables of MyBeentry in pgstat.c.
- st_progress_command
- st_progress_command_target

  3. Use Hash or List to store multiple values for the two
 variables in pgstat.c.



I tried 1. and it shown index_rebuild_count, but it also shown
"initializing" phase again and other columns were empty. So, it is
not suitable to fix the problem. :(
I'm going to try 2. and 3., but, unfortunately, I can't get enough
time to do that after PGConf.Asia 2019.
If we selected 3., it affects following these progress reporting
features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay,
I suppose. Any comments welcome! :)


Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

2019-08-13 Thread Michael Paquier

On Wed, Aug 14, 2019 at 11:38:01AM +0900, Tatsuro Yamada wrote:
> On 2019/08/13 14:40, Tatsuro Yamada wrote:
>> On 2019/08/02 3:43, Alvaro Herrera wrote:
>>> Hmm, I'm trying this out now and I don't see the index_rebuild_count
>>> ever go up.  I think it's because the indexes are built using parallel
>>> index build ... or maybe it was the table AM changes that moved things
>>> around, not sure.  There's a period at the end when the CLUSTER command
>>> keeps working, but it's gone from pg_stat_progress_cluster.
>> 
>> Thanks for your report.
>> I'll investigate it. :)
> 
> I did "git bisect" and found the commit:
> 
>  03f9e5cba0ee1633af4abe734504df50af46fbd8
>  Report progress of REINDEX operations

I am adding an open item for this one.
--
Michael


signature.asc
Description: PGP signature

Re: [HACKERS] CLUSTER command progress monitor

2019-08-13 Thread Tatsuro Yamada


Hi Alvaro and All,

On 2019/08/13 14:40, Tatsuro Yamada wrote:

Hi Alvaro!

On 2019/08/02 3:43, Alvaro Herrera wrote:

Hmm, I'm trying this out now and I don't see the index_rebuild_count
ever go up.  I think it's because the indexes are built using parallel
index build ... or maybe it was the table AM changes that moved things
around, not sure.  There's a period at the end when the CLUSTER command
keeps working, but it's gone from pg_stat_progress_cluster.



Thanks for your report.
I'll investigate it. :)



I did "git bisect" and found the commit:

 03f9e5cba0ee1633af4abe734504df50af46fbd8
 Report progress of REINDEX operations

In src/backend/catalog/index.c,
CLUSTER progress reporting increases index_rebuild_count in reindex_relation()
by pgstat_progress_update_param().
However, reindex_relation() calls reindex_index(), and REINDEX progress 
reporting is existing on the latter function, and it starts 
pgstat_progress_start_command() pgstat_progress_end_command() for REINDEX 
progress reporting.
Therefore, CLUSTER progress reporting failed to update index_rebuild_count 
because
it made a mistake to update the REINDEX's view, I think.

My Idea to fix that is following:

 - Add a target view name parameter to Progress monitor's API

  For example:
  
pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT, i).

  
pgstat_progress_update_param(*PROGRESS_CLUSTER_VIEW*, 
PROGRESS_CLUSTER_INDEX_REBUILD_COUNT, i).

However, I'm not sure whether it is able or not because I haven't read
the code of the API yet.
What do you think about that? :)

Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

2019-08-12 Thread Tatsuro Yamada


Hi Alvaro!

On 2019/08/02 3:43, Alvaro Herrera wrote:

Hmm, I'm trying this out now and I don't see the index_rebuild_count
ever go up.  I think it's because the indexes are built using parallel
index build ... or maybe it was the table AM changes that moved things
around, not sure.  There's a period at the end when the CLUSTER command
keeps working, but it's gone from pg_stat_progress_cluster.



Thanks for your report.
I'll investigate it. :)


Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

2019-08-01 Thread Alvaro Herrera

Hmm, I'm trying this out now and I don't see the index_rebuild_count
ever go up.  I think it's because the indexes are built using parallel
index build ... or maybe it was the table AM changes that moved things
around, not sure.  There's a period at the end when the CLUSTER command
keeps working, but it's gone from pg_stat_progress_cluster.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

2019-03-25 Thread Michael Paquier

On Tue, Mar 26, 2019 at 10:04:48AM +0900, Tatsuro Yamada wrote:
> Hope this feature will help DBA and user. :)

Congrats, Yamada-san.
--
Michael


signature.asc
Description: PGP signature

Re: [HACKERS] CLUSTER command progress monitor

2019-03-25 Thread Tatsuro Yamada


Hi Robert and Reviewers!

On 2019/03/26 0:02, Robert Haas wrote:

On Sun, Mar 24, 2019 at 9:02 PM Tatsuro Yamada
 wrote:

Please find attached files. :)


Committed.  Thanks!


Thank you!
Hope this feature will help DBA and user. :)

Regards,
Tatsuro Yamada
NTT Open Source Software Center

Re: [HACKERS] CLUSTER command progress monitor

2019-03-25 Thread Robert Haas

On Sun, Mar 24, 2019 at 9:02 PM Tatsuro Yamada
 wrote:
> Please find attached files. :)

Committed.  Thanks!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

2019-03-24 Thread Tatsuro Yamada


Hi Robert!

On 2019/03/23 3:31, Robert Haas wrote:

On Tue, Mar 19, 2019 at 2:47 PM Robert Haas  wrote:

how close you were getting to rewriting the entire heap.  This is the
one thing I found but did not fix; any chance you could make this
change and update the documentation to match?


Hi, is anybody working on this?


I sent this email using my personal email address: yamatattsu@gmail-.
I re-send it with the patch and my test result.

Thank you so much for reviewing the patch and sorry for the late reply.
Today, I realized that you sent the email for the patch because I took a
sick leave from work for a while. So, I created new patch based on your 
comments asap.
I hope it is acceptable to you. :)

Changes
  - Add new column *heap_tuples_written* in the view
  This column is updated when the phases are "seq scanning heap",
  "index scanning heap" or "writing new heap".
  - Fix document
  - Revised the patch on the current head: 
940311e4bb32a5fe99155052e41179c88b5d48af.

Please find attached files. :)


Regards,
Tatsuro Yamada



diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index ac2721c..26a6899 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -340,7 +340,15 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   pg_stat_progress_vacuumpg_stat_progress_vacuum
   One row for each backend (including autovacuum worker processes) running
VACUUM, showing current progress.
-   See .
+   See .
+  
+ 
+
+ 
+  pg_stat_progress_clusterpg_stat_progress_cluster
+  One row for each backend running
+   CLUSTER or VACUUM FULL, showing current progress.
+   See .
   
  
 
@@ -3394,9 +3402,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
   
PostgreSQL has the ability to report the progress of
-   certain commands during command execution.  Currently, the only command
-   which supports progress reporting is VACUUM.  This may be
-   expanded in the future.
+   certain commands during command execution.  Currently, the only commands
+   which support progress reporting are VACUUM and
+   CLUSTER. This may be expanded in the future.
   
 
  
@@ -3408,9 +3416,11 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
one row for each backend (including autovacuum worker processes) that is
currently vacuuming.  The tables below describe the information
that will be reported and provide information about how to interpret it.
-   Progress reporting is not currently supported for VACUUM FULL
-   and backends running VACUUM FULL will not be listed in this
-   view.
+   Progress for VACUUM FULL commands is reported via
+   pg_stat_progress_cluster
+   because both VACUUM FULL and CLUSTER 
+   rewrite the table, while regular VACUUM only modifies it 
+   in place. See .
   
 
   
@@ -3588,6 +3598,183 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
 
  
+
+ 
+  CLUSTER Progress Reporting
+
+  
+   Whenever CLUSTER or VACUUM FULL is
+   running, the pg_stat_progress_cluster view will
+   contain a row for each backend that is currently running either command. 
+   The tables below describe the information that will be reported and
+   provide information about how to interpret it.
+  
+
+  
+   pg_stat_progress_cluster View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend.
+
+
+ datid
+ oid
+ OID of the database to which this backend is connected.
+
+
+ datname
+ name
+ Name of the database to which this backend is connected.
+
+
+ relid
+ oid
+ OID of the table being clustered.
+
+
+ command
+ text
+ 
+   The command that is running. Either CLUSTER or VACUUM FULL.
+ 
+
+
+ phase
+ text
+ 
+   Current processing phase. See .
+ 
+
+
+ cluster_index_relid
+ bigint
+ 
+   If the table is being scanned using an index, this is the OID of the
+   index being used; otherwise, it is zero.
+ 
+
+
+ heap_tuples_scanned
+ bigint
+ 
+   Number of heap tuples scanned.
+   This counter only advances when the phase is
+   seq scanning heap,
+   index scanning heap
+   or writing new heap.
+ 
+
+
+ heap_tuples_written
+ bigint
+ 
+   Number of heap tuples written.
+   This counter only advances when the phase is
+   seq scanning heap,
+   index scanning heap
+   or writing new heap.
+ 
+
+
+ heap_blks_total
+ bigint
+ 
+   Total number of heap blocks in the table.  This number is reported
+   as of the beginning of seq scanning heap.
+ 
+
+
+ heap_blks_scanned
+ bigint
+ 
+   Number of heap blocks scanned.  This counter only advances when the
+   phase is seq scanning heap.
+

Re: [HACKERS] CLUSTER command progress monitor

2019-03-24 Thread Tattsu Yama

Hi Robert! >On Tue, Mar 19, 2019 at 2:47 PM Robert Haas
 wrote: >> how close you were getting to
rewriting the entire heap. This is the >> one thing I found but did not
fix; any chance you could make this >> change and update the documentation
to match? > > >Hi, is anybody working on this? Thank you so much for
reviewing the patch and sorry for the late reply. Today, I realized that
you sent the email for the patch because I took a sick leave from work for
a while. So, I created new patch based on your comments asap. I hope it is
acceptable to you. :) Please find attached file. Changes - Add new column
*heap_tuples_written* in the view This column is updated when the phases
are "seq scanning heap", "index scanning heap" or "writing new heap". - Fix
document - Revised the patch on 280a408b48d5ee42969f981bceb9e9426c3a344c









Regards,



Tatsuro Yamada


progress_monitor_for_cluster_command_v12.patch
Description: Binary data

Re: [HACKERS] CLUSTER command progress monitor

2019-03-22 Thread Robert Haas

On Tue, Mar 19, 2019 at 2:47 PM Robert Haas  wrote:
> how close you were getting to rewriting the entire heap.  This is the
> one thing I found but did not fix; any chance you could make this
> change and update the documentation to match?

Hi, is anybody working on this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

2019-03-19 Thread Robert Haas

On Mon, Mar 18, 2019 at 10:03 PM Tatsuro Yamada
 wrote:
> Attached patch is a rebased document patch. :)

Attached is an updated patch.  I went through this patch carefully
today, in the hopes of committing it, and I think the attached version
is pretty closet to being committable, but there's at least one open
issue remaining, as described below.

- The regression tests did not pass because expected/rules.out was not
properly updated.  I fixed that.

- The documentation did not build because some tags were not properly
terminated e.g.  rather than .  I also fixed that.

- The documentation had two nearly-identical lists of phases.  I
merged them into one.  There might be room for some further
fine-tuning here.

- cluster_rel() had multiple places where it could return without
calling pgstat_progress_end_command().  I fixed that.

- cluster_rel() inexplicably delayed updating PROGRESS_CLUSTER_COMMAND
for longer than seems necessary.  I fixed that.

- copy_heap_data() zeroed out the heap-tuples-scanned,
heap-blocks-scanned, and total-heap-blocks counters when it began
PROGRESS_CLUSTER_PHASE_SORT_TUPLES and
PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES.  This seems like throwing away
useful information for no good reason.  I changed it not to do that in
all cases except the one mentioned in the next paragraph.

- It *is* currently to reset PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED
because that counter gets reused to indicate the number of heap tuples
*written back out*, but I think that is bad design for two reasons.
First, the documentation does not explain that sometimes the number of
heap tuples scanned is really reporting the number of heap tuples
written.  Second, it's bad for columns to have misleading names.
Third, it would actually be really useful to store these values in
separate columns, because then you could expect that the number tuples
written would eventually equal the number scanned, and you'd still
have the number that were scanned around so that you could clearly see
how close you were getting to rewriting the entire heap.  This is the
one thing I found but did not fix; any chance you could make this
change and update the documentation to match?

- The comment about reporting that we are now reindexing relations was
jammed in between an existing comment and the associated code.  I
moved it to a more logical place.

- The new if-statement in pg_stat_get_progress_info was missing a
space required by project style.  I added the space.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


progress_monitor_for_cluster_command_v11.patch
Description: Binary data

Re: [HACKERS] CLUSTER command progress monitor

2019-03-19 Thread Kyotaro HORIGUCHI

At Tue, 19 Mar 2019 11:02:57 +0900, Tatsuro Yamada 
 wrote in 

> On 2019/03/19 10:43, Tatsuro Yamada wrote:
> > Hi Rafia!
> > On 2019/03/18 20:42, Rafia Sabih wrote:
> >> On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada
> >>  wrote:
> >>> Attached file is rebased patch on current HEAD.
> >>> I changed a status. :)
> >>>
> >>>
> >> Looks like the patch needs a rebase.
> >> I was on the commit fb5806533f9fe0433290d84c9b019399cd69e9c2
> >>
> >> PFA reject file in case you want to have a look.
> > Thanks for testing it. :)
> > I rebased the patch on the current head:
> > f2004f19ed9c9228d3ea2b12379ccb4b9212641f.
> > Please find attached file.
> > Also, I share my test case of progress monitor below.
> 
> 
> Attached patch is a rebased document patch. :)

The monitor view has four columns:

  heap_tuples_scanned
: used while the "seq scan" and "index scan" phases.

  heap_blks_total, heap_blks_scanned
: used only while the "seq scan" phase.

  index_rebuild_count:
: used only while the "rebuilding index" phase.

Couldn't we change the view like the following?

.. |   phase| heap tuples scanned | progress indicator  | value
..-++-+-+
.. | seq scan ..|  2134214| heap blocks pct | 23.5%
.. | index sc ..|  5453395| (null)  |  (null)
.. | sorting  ..| | (null)  |  (null)
.. | swapping ..| | (null)  |  (null)
.. | rebuildi ..| | index count |  2
.. | performi ..| | (null)  |  (null)

Only seq scan phase has two kind of progress indicator so if we
choose "heap blks pct" as the only indicator for the phase, it
could be simplified as:

.. |   phase | progress indicator  | value
..-+-+-+
.. | seq scan .. | heap blocks pct | 23.5%
.. | index sc .. | heap tuples scanned | 2398457
.. | sorting  .. | (null)  | (null)
.. | swapping .. | (null)  | (null)
.. | rebuildi .. | index count | 2
.. | performi .. | (null)  | (null)

A downside of the view is that it looks quite differently from
pg_stat_progress_vacuum. Since I'm not sure it's a good design,
feel free to oppose/reject this.

finish_heap_swap is also called in matview code path but the
function doesn't seem to detect that situation. Is it right
behavior? (I didn't confirm what happens when it is called from
matview refresh path, though.)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [HACKERS] CLUSTER command progress monitor

2019-03-18 Thread Tatsuro Yamada


On 2019/03/19 10:43, Tatsuro Yamada wrote:

Hi Rafia!

On 2019/03/18 20:42, Rafia Sabih wrote:

On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada
 wrote:

Attached file is rebased patch on current HEAD.
I changed a status. :)



Looks like the patch needs a rebase.
I was on the commit fb5806533f9fe0433290d84c9b019399cd69e9c2

PFA reject file in case you want to have a look.


Thanks for testing it. :)
I rebased the patch on the current head: 
f2004f19ed9c9228d3ea2b12379ccb4b9212641f.

Please find attached file.

Also, I share my test case of progress monitor below.



Attached patch is a rebased document patch. :)


Thanks,
Tatsuro Yamada

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index ac2721c8ad..79d98bb601 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -344,6 +344,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  pg_stat_progress_clusterpg_stat_progress_cluster
+  One row for each backend running
+   CLUSTER or VACUUM FULL, showing current progress.
+   See .
+  
+ 
+
 

   
@@ -3394,9 +3402,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
   
PostgreSQL has the ability to report the progress of
-   certain commands during command execution.  Currently, the only command
-   which supports progress reporting is VACUUM.  This may be
-   expanded in the future.
+   certain commands during command execution.  Currently, the only commands which 
+   support progress reporting are VACUUM and
+   CLUSTER. This may be expanded in the future.
   
 
  
@@ -3408,9 +3416,10 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
one row for each backend (including autovacuum worker processes) that is
currently vacuuming.  The tables below describe the information
that will be reported and provide information about how to interpret it.
-   Progress reporting is not currently supported for VACUUM FULL
-   and backends running VACUUM FULL will not be listed in this
-   view.
+   Running VACUUM FULL is listed in pg_stat_progress_cluster
+   because both VACUUM FULL and CLUSTER 
+   rewrite the table, while regular VACUUM only modifies it 
+   in place. See .
   
 
   
@@ -3587,6 +3596,218 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,

   
 
+ 
+
+ 
+  CLUSTER Progress Reporting
+
+  
+   Whenever CLUSTER is running, the
+   pg_stat_progress_cluster view will contain
+   a row for each backend that is currently running CLUSTER or VACUUM FULL. 
+   The tables below describe the information that will be reported and
+   provide information about how to interpret it.
+  
+
+  
+   pg_stat_progress_cluster View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend.
+
+
+ datid
+ oid
+ OID of the database to which this backend is connected.
+
+
+ datname
+ name
+ Name of the database to which this backend is connected.
+
+
+ relid
+ oid
+ OID of the table being clustered.
+
+
+ command
+ text
+ 
+   The command that is running. Either CLUSTER or VACUUM FULL.
+ 
+
+
+ phase
+ text
+ 
+   Current processing phase. See  or .
+ 
+
+
+ cluster_index_relid
+ bigint
+ 
+   If the table is being scanned using an index, this is the OID of the
+   index being used; otherwise, it is zero.
+ 
+
+
+ heap_tuples_scanned
+ bigint
+ 
+   Number of heap tuples scanned.
+   This counter only advances when the phase is seq scanning heap, 
+   index scanning heap and writing new heap.
+ 
+
+
+ heap_blks_total
+ bigint
+ 
+   Total number of heap blocks in the table.  This number is reported
+   as of the beginning of seq scanning heap.
+ 
+
+
+ heap_blks_scanned
+ bigint
+ 
+   Number of heap blocks scanned. 
+   This counter only advances when the phase is seq scanning heap.
+ 
+
+
+ index_rebuild_count
+ bigint
+ 
+   Number of rebuilded indexes.
+   This counter only advances when the phase is rebuilding index.
+ 
+
+   
+   
+  
+
+  
+   CLUSTER phases
+   
+
+
+  Phase
+  Description
+ 
+
+
+   
+
+ initializing
+ 
+   CLUSTER is preparing to begin scanning the heap.  This
+   phase is expected to be very brief.
+ 
+
+
+ seq scanning heap
+ 
+   CLUSTER is currently scanning heap from the table by
+   seq scan.
+ 
+
+
+ index scanning heap
+ 
+   CLUSTER is currently scanning heap from the table by
+   index scan.
+ 
+
+
+ sorting tuples
+ 
+   CLUSTER is currently sorting tuples. 
+ 
+
+
+ swapping relation files
+ 
+   CLUSTER is currently

Re: [HACKERS] CLUSTER command progress monitor

2019-03-18 Thread Tatsuro Yamada


Hi Rafia!

On 2019/03/18 20:42, Rafia Sabih wrote:

On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada
 wrote:

Attached file is rebased patch on current HEAD.
I changed a status. :)



Looks like the patch needs a rebase.
I was on the commit fb5806533f9fe0433290d84c9b019399cd69e9c2

PFA reject file in case you want to have a look.


Thanks for testing it. :)
I rebased the patch on the current head: 
f2004f19ed9c9228d3ea2b12379ccb4b9212641f.

Please find attached file.

Also, I share my test case of progress monitor below.

=== My test case ===

[Terminal1]
Run this query on psql:

   \a \t
   select * from pg_stat_progress_cluster; \watch 0.05

[Terminal2]
Run these queries on psql:

drop table t1;

create table t1 as select a, random() * 1000 as b from generate_series(0, 
99) a;
create index idx_t1 on t1(a);
create index idx_t1_b on t1(b);
analyze t1;

-- index scan
set enable_seqscan to off;
cluster verbose t1 using idx_t1;

-- seq scan
set enable_seqscan to on;
set enable_indexscan to off;
cluster verbose t1 using idx_t1;

-- only given table name to cluster command
cluster verbose t1;

-- only cluster command
cluster verbose;

-- vacuum full
vacuum full t1;

-- vacuum full
vacuum full;




Regards,
Tatsuro Yamada









diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index c339a2bb77..8a634dd57e 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -52,6 +52,7 @@
 #include "catalog/storage.h"
 #include "commands/tablecmds.h"
 #include "commands/event_trigger.h"
+#include "commands/progress.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -59,6 +60,7 @@
 #include "nodes/nodeFuncs.h"
 #include "optimizer/optimizer.h"
 #include "parser/parser.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
@@ -3851,6 +3853,7 @@ reindex_relation(Oid relid, int flags, int options)
 	List	   *indexIds;
 	bool		is_pg_class;
 	bool		result;
+	int			i;
 
 	/*
 	 * Open and lock the relation.  ShareLock is sufficient since we only need
@@ -3938,6 +3941,7 @@ reindex_relation(Oid relid, int flags, int options)
 
 		/* Reindex all the indexes. */
 		doneIndexes = NIL;
+		i = 1;
 		foreach(indexId, indexIds)
 		{
 			Oid			indexOid = lfirst_oid(indexId);
@@ -3955,6 +3959,11 @@ reindex_relation(Oid relid, int flags, int options)
 
 			if (is_pg_class)
 doneIndexes = lappend_oid(doneIndexes, indexOid);
+
+			/* Set index rebuild count */
+			pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT,
+		 i);
+			i++;
 		}
 	}
 	PG_CATCH();
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index d962648bc5..87c0092787 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -907,6 +907,32 @@ CREATE VIEW pg_stat_progress_vacuum AS
 FROM pg_stat_get_progress_info('VACUUM') AS S
 		LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_cluster AS
+SELECT
+S.pid AS pid,
+S.datid AS datid,
+D.datname AS datname,
+S.relid AS relid,
+CASE S.param1 WHEN 1 THEN 'CLUSTER'
+  WHEN 2 THEN 'VACUUM FULL'
+  END AS command,
+CASE S.param2 WHEN 0 THEN 'initializing'
+  WHEN 1 THEN 'seq scanning heap'
+  WHEN 2 THEN 'index scanning heap'
+  WHEN 3 THEN 'sorting tuples'
+  WHEN 4 THEN 'writing new heap'
+  WHEN 5 THEN 'swapping relation files'
+  WHEN 6 THEN 'rebuilding index'
+  WHEN 7 THEN 'performing final cleanup'
+  END AS phase,
+S.param3 AS cluster_index_relid,
+S.param4 AS heap_tuples_scanned,
+S.param5 AS heap_blks_total,
+S.param6 AS heap_blks_scanned,
+S.param7 AS index_rebuild_count
+FROM pg_stat_get_progress_info('CLUSTER') AS S
+LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_user_mappings AS
 SELECT
 U.oid   AS umid,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index 3e2a807640..478894c869 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -36,10 +36,12 @@
 #include "catalog/objectaccess.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
+#include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
@@ -276,6 +278,8 @@ cluster_rel(Oid tableOid, Oid indexOid, int options)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
+	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
+
 	/*
 	 * We grab exclusive access t

Re: [HACKERS] CLUSTER command progress monitor

2019-03-18 Thread Rafia Sabih

On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada
 wrote:
>
> On 2019/03/06 15:38, Tatsuro Yamada wrote:
> > On 2019/03/05 17:56, Tatsuro Yamada wrote:
> >> On 2019/03/05 11:35, Robert Haas wrote:
> >>> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
> >>>  wrote:
>  === Current design ===
> 
>  CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
>  Depending on which one is chosen, the command will proceed in the
>  following sequence of phases:
> 
>  * Scan method: Seq Scan
>    0. initializing (*2)
>    1. seq scanning heap(*1)
>    3. sorting tuples   (*2)
>    4. writing new heap (*1)
>    5. swapping relation files  (*2)
>    6. rebuilding index (*2)
>    7. performing final cleanup (*2)
> 
>  * Scan method: Index Scan
>    0. initializing (*2)
>    2. index scanning heap  (*1)
>    5. swapping relation files  (*2)
>    6. rebuilding index (*2)
>    7. performing final cleanup (*2)
> 
>  VACUUM FULL command will proceed in the following sequence of phases:
> 
>    1. seq scanning heap(*1)
>    5. swapping relation files  (*2)
>    6. rebuilding index (*2)
>    7. performing final cleanup (*2)
> 
>  (*1): increasing the value in heap_tuples_scanned column
>  (*2): only shows the phase in the phase column
> >>>
> >>> All of that sounds good.
> >>>
>  The view provides the information of CLUSTER command progress details as 
>  follows
>  # \d pg_stat_progress_cluster
>  View "pg_catalog.pg_stat_progress_cluster"
>  Column   |  Type   | Collation | Nullable | Default
>  ---+-+---+--+-
> pid   | integer |   |  |
> datid | oid |   |  |
> datname   | name|   |  |
> relid | oid |   |  |
> command   | text|   |  |
> phase | text|   |  |
> cluster_index_relid   | bigint  |   |  |
> heap_tuples_scanned   | bigint  |   |  |
> heap_tuples_vacuumed  | bigint  |   |  |
> >>>
> >>> Still not sure if we need heap_tuples_vacuumed.  We could try to
> >>> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
> >>> we're using a Seq Scan.
> >>
> >> I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that 
> >> in
> >> next patch.
> >>
> >> Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able 
> >> to
> >> get those from initscan(). I'll investigate it more.
> >>
> >> cluster.c
> >>copy_heap_data()
> >>  heap_beginscan()
> >>heap_beginscan_internal()
> >>  initscan()
> >>
> >>
> >>
>  === Discussion points ===
> 
> - Progress counter for "3. sorting tuples" phase
>    - Should we add pgstat_progress_update_param() in tuplesort.c like 
>  a
>  "trace_sort"?
>  Thanks to Peter Geoghegan for the useful advice!
> >>>
> >>> How would we avoid an abstraction violation?
> >>
> >> Hmm... What do you mean an abstraction violation?
> >> If it is difficult to solve, I'd not like to add the progress counter for 
> >> the sorting tuples.
> >>
> >>
> - Progress counter for "6. rebuilding index" phase
>    - Should we add "index_vacuum_count" in the view like a vacuum 
>  progress monitor?
>  If yes, I'll add pgstat_progress_update_param() to 
>  reindex_relation() of index.c.
>  However, I'm not sure whether it is okay or not.
> >>>
> >>> Doesn't seem unreasonable to me.
> >>
> >> I see, I'll add it later.
> >
> >
> > Attached file is revised and WIP patch including:
> >
> >- Remove heap_tuples_vacuumed
> >- Add heap_blks_scanned and heap_blks_total
> >- Add index_vacuum_count
> >
> > I tried to "add heap_blks_scanned and heap_blks_total" columns and I 
> > realized that
> > "heap_tuples_scanned" column is suitable as a counter when a scan method is
> > both index-scan and seq-scan because CLUSTER is on a tuple basis.
>
>
> Attached file is rebased patch on current HEAD.
> I changed a status. :)
>
>
Looks like the patch needs a rebase.
I was on the commit fb5806533f9fe0433290d84c9b019399cd69e9c2

PFA reject file in case you want to have a look.
> Regards,
> Tatsuro Yamada
>
>
>


-- 
Regards,
Rafia Sabih
--- src/backend/commands/cluster.c
+++ src/backend/commands/cluster.c
@@ -942,14 +960,33 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOld

Re: [HACKERS] CLUSTER command progress monitor

2019-03-08 Thread Tatsuro Yamada


On 2019/03/06 15:38, Tatsuro Yamada wrote:

On 2019/03/05 17:56, Tatsuro Yamada wrote:

On 2019/03/05 11:35, Robert Haas wrote:

On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
 wrote:

=== Current design ===

CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
Depending on which one is chosen, the command will proceed in the
following sequence of phases:

    * Scan method: Seq Scan
  0. initializing (*2)
  1. seq scanning heap    (*1)
  3. sorting tuples   (*2)
  4. writing new heap (*1)
  5. swapping relation files  (*2)
  6. rebuilding index (*2)
  7. performing final cleanup (*2)

    * Scan method: Index Scan
  0. initializing (*2)
  2. index scanning heap  (*1)
  5. swapping relation files  (*2)
  6. rebuilding index (*2)
  7. performing final cleanup (*2)

VACUUM FULL command will proceed in the following sequence of phases:

  1. seq scanning heap    (*1)
  5. swapping relation files  (*2)
  6. rebuilding index (*2)
  7. performing final cleanup (*2)

(*1): increasing the value in heap_tuples_scanned column
(*2): only shows the phase in the phase column


All of that sounds good.


The view provides the information of CLUSTER command progress details as follows
# \d pg_stat_progress_cluster
    View "pg_catalog.pg_stat_progress_cluster"
    Column   |  Type   | Collation | Nullable | Default
---+-+---+--+-
   pid   | integer |   |  |
   datid | oid |   |  |
   datname   | name    |   |  |
   relid | oid |   |  |
   command   | text    |   |  |
   phase | text    |   |  |
   cluster_index_relid   | bigint  |   |  |
   heap_tuples_scanned   | bigint  |   |  |
   heap_tuples_vacuumed  | bigint  |   |  |


Still not sure if we need heap_tuples_vacuumed.  We could try to
report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
we're using a Seq Scan.


I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that in
next patch.

Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able to
get those from initscan(). I'll investigate it more.

cluster.c
   copy_heap_data()
 heap_beginscan()
   heap_beginscan_internal()
 initscan()




=== Discussion points ===

   - Progress counter for "3. sorting tuples" phase
  - Should we add pgstat_progress_update_param() in tuplesort.c like a
    "trace_sort"?
    Thanks to Peter Geoghegan for the useful advice!


How would we avoid an abstraction violation?


Hmm... What do you mean an abstraction violation?
If it is difficult to solve, I'd not like to add the progress counter for the 
sorting tuples.



   - Progress counter for "6. rebuilding index" phase
  - Should we add "index_vacuum_count" in the view like a vacuum progress 
monitor?
    If yes, I'll add pgstat_progress_update_param() to reindex_relation() 
of index.c.
    However, I'm not sure whether it is okay or not.


Doesn't seem unreasonable to me.


I see, I'll add it later.



Attached file is revised and WIP patch including:

   - Remove heap_tuples_vacuumed
   - Add heap_blks_scanned and heap_blks_total
   - Add index_vacuum_count

I tried to "add heap_blks_scanned and heap_blks_total" columns and I realized 
that
"heap_tuples_scanned" column is suitable as a counter when a scan method is
both index-scan and seq-scan because CLUSTER is on a tuple basis.



Attached file is rebased patch on current HEAD.
I changed a status. :)


Regards,
Tatsuro Yamada



diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 1ee1ed2894..acda12bf52 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -51,6 +51,7 @@
 #include "catalog/storage.h"
 #include "commands/tablecmds.h"
 #include "commands/event_trigger.h"
+#include "commands/progress.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -58,6 +59,7 @@
 #include "nodes/nodeFuncs.h"
 #include "optimizer/optimizer.h"
 #include "parser/parser.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
@@ -3851,6 +3853,7 @@ reindex_relation(Oid relid, int flags, int options)
 	List	   *indexIds;
 	bool		is_pg_class;
 	bool		result;
+	int			i;
 
 	/*
 	 * Open and lock the relation.  ShareLock is sufficient since we only need
@@ -3938,6 +3941,7 @@ reindex_relation(Oid relid, int flags, int options)
 
 		/* Reindex all the indexes. */
 		doneIndexes = NIL;
+		i = 1;
 		foreach

Re: [HACKERS] CLUSTER command progress monitor

2019-03-06 Thread Alvaro Herrera

On 2019-Mar-06, Robert Haas wrote:

> On Tue, Mar 5, 2019 at 8:03 PM Alvaro Herrera  
> wrote:
> > One, err, small issue with that idea is that we need the param numbers
> > not to conflict for any "progress update providers" that are to be used
> > simultaneously by any command.
> 
> Is that really an issue?  I think progress reporting -- at least with
> the current infrastructure -- is only ever going to be possible for
> utility commands, not queries.  And those really shouldn't have very
> many sorts going on at once.

Well, I don't think it is, but I thought it was worth pointing out
explicitly.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

2019-03-06 Thread Robert Haas

On Tue, Mar 5, 2019 at 8:03 PM Alvaro Herrera  wrote:
> One, err, small issue with that idea is that we need the param numbers
> not to conflict for any "progress update providers" that are to be used
> simultaneously by any command.

Is that really an issue?  I think progress reporting -- at least with
the current infrastructure -- is only ever going to be possible for
utility commands, not queries.  And those really shouldn't have very
many sorts going on at once.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

2019-03-05 Thread Tatsuro Yamada


On 2019/03/05 17:56, Tatsuro Yamada wrote:

Hi Robert!

On 2019/03/05 11:35, Robert Haas wrote:

On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
 wrote:

=== Current design ===

CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
Depending on which one is chosen, the command will proceed in the
following sequence of phases:

    * Scan method: Seq Scan
  0. initializing (*2)
  1. seq scanning heap    (*1)
  3. sorting tuples   (*2)
  4. writing new heap (*1)
  5. swapping relation files  (*2)
  6. rebuilding index (*2)
  7. performing final cleanup (*2)

    * Scan method: Index Scan
  0. initializing (*2)
  2. index scanning heap  (*1)
  5. swapping relation files  (*2)
  6. rebuilding index (*2)
  7. performing final cleanup (*2)

VACUUM FULL command will proceed in the following sequence of phases:

  1. seq scanning heap    (*1)
  5. swapping relation files  (*2)
  6. rebuilding index (*2)
  7. performing final cleanup (*2)

(*1): increasing the value in heap_tuples_scanned column
(*2): only shows the phase in the phase column


All of that sounds good.


The view provides the information of CLUSTER command progress details as follows
# \d pg_stat_progress_cluster
    View "pg_catalog.pg_stat_progress_cluster"
    Column   |  Type   | Collation | Nullable | Default
---+-+---+--+-
   pid   | integer |   |  |
   datid | oid |   |  |
   datname   | name    |   |  |
   relid | oid |   |  |
   command   | text    |   |  |
   phase | text    |   |  |
   cluster_index_relid   | bigint  |   |  |
   heap_tuples_scanned   | bigint  |   |  |
   heap_tuples_vacuumed  | bigint  |   |  |


Still not sure if we need heap_tuples_vacuumed.  We could try to
report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
we're using a Seq Scan.


I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that in
next patch.

Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able to
get those from initscan(). I'll investigate it more.

cluster.c
   copy_heap_data()
     heap_beginscan()
   heap_beginscan_internal()
     initscan()




=== Discussion points ===

   - Progress counter for "3. sorting tuples" phase
  - Should we add pgstat_progress_update_param() in tuplesort.c like a
    "trace_sort"?
    Thanks to Peter Geoghegan for the useful advice!


How would we avoid an abstraction violation?


Hmm... What do you mean an abstraction violation?
If it is difficult to solve, I'd not like to add the progress counter for the 
sorting tuples.



   - Progress counter for "6. rebuilding index" phase
  - Should we add "index_vacuum_count" in the view like a vacuum progress 
monitor?
    If yes, I'll add pgstat_progress_update_param() to reindex_relation() 
of index.c.
    However, I'm not sure whether it is okay or not.


Doesn't seem unreasonable to me.


I see, I'll add it later.



Attached file is revised and WIP patch including:

  - Remove heap_tuples_vacuumed
  - Add heap_blks_scanned and heap_blks_total
  - Add index_vacuum_count

I tried to "add heap_blks_scanned and heap_blks_total" columns and I realized 
that
"heap_tuples_scanned" column is suitable as a counter when a scan method is
both index-scan and seq-scan because CLUSTER is on a tuple basis.



Regards,
Tatsuro Yamada


diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index d16c3d0ea5..a88e8d2492 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -51,6 +51,7 @@
 #include "catalog/storage.h"
 #include "commands/tablecmds.h"
 #include "commands/event_trigger.h"
+#include "commands/progress.h"
 #include "commands/trigger.h"
 #include "executor/executor.h"
 #include "miscadmin.h"
@@ -58,6 +59,7 @@
 #include "nodes/nodeFuncs.h"
 #include "optimizer/optimizer.h"
 #include "parser/parser.h"
+#include "pgstat.h"
 #include "rewrite/rewriteManip.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
@@ -3850,6 +3852,7 @@ reindex_relation(Oid relid, int flags, int options)
 	List	   *indexIds;
 	bool		is_pg_class;
 	bool		result;
+	int			i;
 
 	/*
 	 * Open and lock the relation.  ShareLock is sufficient since we only need
@@ -3937,6 +3940,7 @@ reindex_relation(Oid relid, int flags, int options)
 
 		/* Reindex all the indexes. */
 		doneIndexes = NIL;
+		i = 1;
 		foreach(indexId, indexIds)
 		{
 			Oid			indexOid = lfirst_oid(indexId);
@@ -3954,6 +3958,11 @@ reindex_relation(Oi

Re: [HACKERS] CLUSTER command progress monitor

2019-03-05 Thread Tatsuro Yamada


On 2019/03/06 1:13, Robert Haas wrote:

On Tue, Mar 5, 2019 at 3:56 AM Tatsuro Yamada
 wrote:

=== Discussion points ===

- Progress counter for "3. sorting tuples" phase
   - Should we add pgstat_progress_update_param() in tuplesort.c like a
 "trace_sort"?
 Thanks to Peter Geoghegan for the useful advice!


How would we avoid an abstraction violation?


Hmm... What do you mean an abstraction violation?
If it is difficult to solve, I'd not like to add the progress counter for the 
sorting tuples.


What I mean is... I think it would be useful to have this counter, but
I'm not sure how the tuplesort code would know to update the counter
in this case and not in other cases.  The tuplesort code is used for
lots of things; we can't update a counter for CLUSTER if the tuplesort
is being used for CREATE INDEX or a Sort node in a query or whatever.
So my question is how we would indicate to the tuplesort that it needs
to do the counter update, and whether that would end up making for
ugly code.


Thanks for your explanation!
I understood that now. I guess it means an API to get a progress for sort 
processing,
so let us just put it aside now. I'd like to leave that as is for an 
appropriate person.

Regards,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

2019-03-05 Thread Alvaro Herrera

On 2019-Mar-04, Robert Haas wrote:

> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
>  wrote:

> > === Discussion points ===
> >
> >   - Progress counter for "3. sorting tuples" phase
> >  - Should we add pgstat_progress_update_param() in tuplesort.c like a
> >"trace_sort"?
> >Thanks to Peter Geoghegan for the useful advice!
> 
> How would we avoid an abstraction violation?

The theory embodied in my patch at 
https://postgr.es/m/20190304204607.GA15946@alvherre.pgsql
is that we don't; tuplesort.c functions (index.c's IndexBuildHeapScan in
my case) would get a boolean parameter to indicate whether to update
some params or not -- the param number(s) to update are supposed to be
generic in the sense that it's not part of any individual command's
implementation (PROGRESS_SCAN_BLOCKS_DONE for what you call "blks
scanned", PROGRESS_SCAN_BLOCKS_TOTAL for "blks total"), but rather
defined by the "progress update provider" (index.c or tuplesort.c).

One, err, small issue with that idea is that we need the param numbers
not to conflict for any "progress update providers" that are to be used
simultaneously by any command.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

2019-03-05 Thread Robert Haas

On Tue, Mar 5, 2019 at 3:56 AM Tatsuro Yamada
 wrote:
> >> === Discussion points ===
> >>
> >>- Progress counter for "3. sorting tuples" phase
> >>   - Should we add pgstat_progress_update_param() in tuplesort.c like a
> >> "trace_sort"?
> >> Thanks to Peter Geoghegan for the useful advice!
> >
> > How would we avoid an abstraction violation?
>
> Hmm... What do you mean an abstraction violation?
> If it is difficult to solve, I'd not like to add the progress counter for the 
> sorting tuples.

What I mean is... I think it would be useful to have this counter, but
I'm not sure how the tuplesort code would know to update the counter
in this case and not in other cases.  The tuplesort code is used for
lots of things; we can't update a counter for CLUSTER if the tuplesort
is being used for CREATE INDEX or a Sort node in a query or whatever.
So my question is how we would indicate to the tuplesort that it needs
to do the counter update, and whether that would end up making for
ugly code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

2019-03-05 Thread Tatsuro Yamada


Hi Robert!

On 2019/03/05 11:35, Robert Haas wrote:

On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
 wrote:

=== Current design ===

CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
Depending on which one is chosen, the command will proceed in the
following sequence of phases:

* Scan method: Seq Scan
  0. initializing (*2)
  1. seq scanning heap(*1)
  3. sorting tuples   (*2)
  4. writing new heap (*1)
  5. swapping relation files  (*2)
  6. rebuilding index (*2)
  7. performing final cleanup (*2)

* Scan method: Index Scan
  0. initializing (*2)
  2. index scanning heap  (*1)
  5. swapping relation files  (*2)
  6. rebuilding index (*2)
  7. performing final cleanup (*2)

VACUUM FULL command will proceed in the following sequence of phases:

  1. seq scanning heap(*1)
  5. swapping relation files  (*2)
  6. rebuilding index (*2)
  7. performing final cleanup (*2)

(*1): increasing the value in heap_tuples_scanned column
(*2): only shows the phase in the phase column


All of that sounds good.


The view provides the information of CLUSTER command progress details as follows
# \d pg_stat_progress_cluster
View "pg_catalog.pg_stat_progress_cluster"
Column   |  Type   | Collation | Nullable | Default
---+-+---+--+-
   pid   | integer |   |  |
   datid | oid |   |  |
   datname   | name|   |  |
   relid | oid |   |  |
   command   | text|   |  |
   phase | text|   |  |
   cluster_index_relid   | bigint  |   |  |
   heap_tuples_scanned   | bigint  |   |  |
   heap_tuples_vacuumed  | bigint  |   |  |


Still not sure if we need heap_tuples_vacuumed.  We could try to
report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
we're using a Seq Scan.


I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that in
next patch.

Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able to
get those from initscan(). I'll investigate it more.

cluster.c
  copy_heap_data()
heap_beginscan()
  heap_beginscan_internal()
initscan()




=== Discussion points ===

   - Progress counter for "3. sorting tuples" phase
  - Should we add pgstat_progress_update_param() in tuplesort.c like a
"trace_sort"?
Thanks to Peter Geoghegan for the useful advice!


How would we avoid an abstraction violation?


Hmm... What do you mean an abstraction violation?
If it is difficult to solve, I'd not like to add the progress counter for the 
sorting tuples.



   - Progress counter for "6. rebuilding index" phase
  - Should we add "index_vacuum_count" in the view like a vacuum progress 
monitor?
If yes, I'll add pgstat_progress_update_param() to reindex_relation() 
of index.c.
However, I'm not sure whether it is okay or not.


Doesn't seem unreasonable to me.


I see, I'll add it later.


Regards,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

2019-03-05 Thread Tatsuro Yamada


Hi David,

On 2019/03/05 17:29, David Steele wrote:

On 3/1/19 7:48 AM, Etsuro Fujita wrote:

(2019/03/01 14:17), Tatsuro Yamada wrote:

Attached patch is wip patch.


Is it possible to remove the following patch?
Because I registered the patch twice on CF Mar.

https://commitfest.postgresql.org/22/2049/


Please remove the above and keep this:

https://commitfest.postgresql.org/22/1779/

which I moved from the January CF to the March one on behalf of him.


I have closed the duplicate entry (#2049) and retained the entry which contains 
the CF history (#1779).



Thank you! :)


Regards,
Tatsuro Yamada

Re: Re: [HACKERS] CLUSTER command progress monitor

2019-03-05 Thread David Steele


On 3/1/19 7:48 AM, Etsuro Fujita wrote:

(2019/03/01 14:17), Tatsuro Yamada wrote:

Attached patch is wip patch.


Is it possible to remove the following patch?
Because I registered the patch twice on CF Mar.

https://commitfest.postgresql.org/22/2049/


Please remove the above and keep this:

https://commitfest.postgresql.org/22/1779/

which I moved from the January CF to the March one on behalf of him.


I have closed the duplicate entry (#2049) and retained the entry which 
contains the CF history (#1779).


Regards,
--
-David
da...@pgmasters.net

Re: [HACKERS] CLUSTER command progress monitor

2019-03-04 Thread Robert Haas

On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada
 wrote:
> === Current design ===
>
> CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
> Depending on which one is chosen, the command will proceed in the
> following sequence of phases:
>
>* Scan method: Seq Scan
>  0. initializing (*2)
>  1. seq scanning heap(*1)
>  3. sorting tuples   (*2)
>  4. writing new heap (*1)
>  5. swapping relation files  (*2)
>  6. rebuilding index (*2)
>  7. performing final cleanup (*2)
>
>* Scan method: Index Scan
>  0. initializing (*2)
>  2. index scanning heap  (*1)
>  5. swapping relation files  (*2)
>  6. rebuilding index (*2)
>  7. performing final cleanup (*2)
>
> VACUUM FULL command will proceed in the following sequence of phases:
>
>  1. seq scanning heap(*1)
>  5. swapping relation files  (*2)
>  6. rebuilding index (*2)
>  7. performing final cleanup (*2)
>
> (*1): increasing the value in heap_tuples_scanned column
> (*2): only shows the phase in the phase column

All of that sounds good.

> The view provides the information of CLUSTER command progress details as 
> follows
> # \d pg_stat_progress_cluster
>View "pg_catalog.pg_stat_progress_cluster"
>Column   |  Type   | Collation | Nullable | Default
> ---+-+---+--+-
>   pid   | integer |   |  |
>   datid | oid |   |  |
>   datname   | name|   |  |
>   relid | oid |   |  |
>   command   | text|   |  |
>   phase | text|   |  |
>   cluster_index_relid   | bigint  |   |  |
>   heap_tuples_scanned   | bigint  |   |  |
>   heap_tuples_vacuumed  | bigint  |   |  |

Still not sure if we need heap_tuples_vacuumed.  We could try to
report heap_blks_scanned and heap_blks_total like we do for VACUUM, if
we're using a Seq Scan.

> === Discussion points ===
>
>   - Progress counter for "3. sorting tuples" phase
>  - Should we add pgstat_progress_update_param() in tuplesort.c like a
>"trace_sort"?
>Thanks to Peter Geoghegan for the useful advice!

How would we avoid an abstraction violation?

>   - Progress counter for "6. rebuilding index" phase
>  - Should we add "index_vacuum_count" in the view like a vacuum progress 
> monitor?
>If yes, I'll add pgstat_progress_update_param() to reindex_relation() 
> of index.c.
>However, I'm not sure whether it is okay or not.

Doesn't seem unreasonable to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

2019-03-04 Thread Tatsuro Yamada


On 2019/03/02 4:15, Robert Haas wrote:

On Thu, Feb 28, 2019 at 11:54 PM Tatsuro Yamada
 wrote:

Attached patch is wip patch.


I rewrote the current design of the progress monitor and also
wrote discussion points in the middle of this email. I'd like to
get any feedback from -hackers.


=== Current design ===

CLUSTER command uses Index Scan or Seq Scan when scanning the heap.
Depending on which one is chosen, the command will proceed in the
following sequence of phases:

  * Scan method: Seq Scan
0. initializing (*2)
1. seq scanning heap(*1)
3. sorting tuples   (*2)
4. writing new heap (*1)
5. swapping relation files  (*2)
6. rebuilding index (*2)
7. performing final cleanup (*2)

  * Scan method: Index Scan
0. initializing (*2)
2. index scanning heap  (*1)
5. swapping relation files  (*2)
6. rebuilding index (*2)
7. performing final cleanup (*2)

VACUUM FULL command will proceed in the following sequence of phases:

1. seq scanning heap(*1)
5. swapping relation files  (*2)
6. rebuilding index (*2)
7. performing final cleanup (*2)

(*1): increasing the value in heap_tuples_scanned column
(*2): only shows the phase in the phase column

The view provides the information of CLUSTER command progress details as follows
# \d pg_stat_progress_cluster
  View "pg_catalog.pg_stat_progress_cluster"
  Column   |  Type   | Collation | Nullable | Default
---+-+---+--+-
 pid   | integer |   |  |
 datid | oid |   |  |
 datname   | name|   |  |
 relid | oid |   |  |
 command   | text|   |  |
 phase | text|   |  |
 cluster_index_relid   | bigint  |   |  |
 heap_tuples_scanned   | bigint  |   |  |
 heap_tuples_vacuumed  | bigint  |   |  |


=== Discussion points ===

 - Progress counter for "3. sorting tuples" phase
- Should we add pgstat_progress_update_param() in tuplesort.c like a
  "trace_sort"?
  Thanks to Peter Geoghegan for the useful advice!

 - Progress counter for "6. rebuilding index" phase
- Should we add "index_vacuum_count" in the view like a vacuum progress 
monitor?
  If yes, I'll add pgstat_progress_update_param() to reindex_relation() of 
index.c.
  However, I'm not sure whether it is okay or not.

 - pg_stat_progress_rewrite
- TBA


=== My test case ===

I share my test case of progress monitor.
If someone wants to watch the current progress monitor, you can use
this test case as a example.

[Terminal1]
Run this query on psql:

   select * from pg_stat_progress_cluster; \watch 0.05

[Terminal2]
Run these queries on psql:

drop table t1;

create table t1 as select a, random() * 1000 as b from generate_series(0, 
99) a;
create index idx_t1 on t1(a);
create index idx_t1_b on t1(b);
analyze t1;

-- index scan
set enable_seqscan to off;
cluster verbose t1 using idx_t1;

-- seq scan
set enable_seqscan to on;
set enable_indexscan to off;
cluster verbose t1 using idx_t1;

-- only given table name to cluster command
cluster verbose t1;

-- only cluster command
cluster verbose;

-- vacuum full
vacuum full t1;

-- vacuum full
vacuum full;




Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

2019-03-04 Thread Tatsuro Yamada


On 2019/03/02 4:15, Robert Haas wrote:

On Thu, Feb 28, 2019 at 11:54 PM Tatsuro Yamada
 wrote:

Attached patch is wip patch.



Thanks for your comments! :)
I revised the code and the document.



+   CLUSTER and VACUUM FULL,
showing current progress.

and -> or


Fixed.



+   certain commands during command execution.  Currently, the suppoted
+   progress reporting commands are VACUUM and
CLUSTER.

suppoted -> supported

But I'd just say: Currently, the only commands which support progress
reporting are VACUUM and
CLUSTER.


I choose the latter. Fixed.



+   Running VACUUM FULL is listed in
pg_stat_progress_cluster
+   view because it uses CLUSTER command
internally.  See .

How about: Running VACUUM FULL is listed in
pg_stat_progress_cluster because both
VACUUM FULL and CLUSTER rewrite
the table, while regular VACUUM only modifies it in
place.


Fixed.



+   Current processing command: CLUSTER/VACUUM FULL.

The command that is running.  Either CLUSTER or VACUUM FULL.


Fixed.



+   Current processing phase of cluster/vacuum full.  See  or .

Current processing phase of CLUSTER or VACUUM FULL.

Or maybe better, just abbreviate to: Current processing phase.


Fixed as you suggested.



+   Scan method of table: index scan/seq scan.

Eh, shouldn't this be gone now?  And likewise for the view definition?


Fixed. Sorry, It was an oversight.



+   OID of the index.

If the table is being scanned using an index, this is the OID of the
index being used; otherwise, it is zero.


Fixed.



+ heap_tuples_total

Leftovers.  Skipping over the rest of your documentation changes since
it looks like a bunch of things there still need to be updated.


I agree. Thanks a lot!
I'll divide the patch into two patch such as code and document.



+ pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);

This now appears inside cluster_rel(), but also vacuum_rel() is still
doing the same thing.  That's wrong.


It was an oversight too. I fixed.



+ if(OidIsValid(indexOid))

Missing space.  Please pgindent.


Fixed.
I Will do pgindent later.


Please find attached files. :)

Thanks,
Tatsuro Yamada

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 3e229c693c..0d0f8f0e31 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -906,6 +906,30 @@ CREATE VIEW pg_stat_progress_vacuum AS
 FROM pg_stat_get_progress_info('VACUUM') AS S
 		LEFT JOIN pg_database D ON S.datid = D.oid;
 
+CREATE VIEW pg_stat_progress_cluster AS
+SELECT
+S.pid AS pid,
+S.datid AS datid,
+D.datname AS datname,
+S.relid AS relid,
+CASE S.param1 WHEN 1 THEN 'CLUSTER'
+  WHEN 2 THEN 'VACUUM FULL'
+  END AS command,
+CASE S.param2 WHEN 0 THEN 'initializing'
+  WHEN 1 THEN 'seq scanning heap'
+  WHEN 2 THEN 'index scanning heap'
+  WHEN 3 THEN 'sorting tuples'
+  WHEN 4 THEN 'writing new heap'
+  WHEN 5 THEN 'swapping relation files'
+  WHEN 6 THEN 'rebuilding index'
+  WHEN 7 THEN 'performing final cleanup'
+  END AS phase,
+S.param3 AS cluster_index_relid,
+S.param4 AS heap_tuples_scanned,
+S.param5 AS heap_tuples_vacuumed
+FROM pg_stat_get_progress_info('CLUSTER') AS S
+LEFT JOIN pg_database D ON S.datid = D.oid;
+
 CREATE VIEW pg_user_mappings AS
 SELECT
 U.oid   AS umid,
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index a74af4c171..f22ff590f0 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -35,10 +35,12 @@
 #include "catalog/objectaccess.h"
 #include "catalog/toasting.h"
 #include "commands/cluster.h"
+#include "commands/progress.h"
 #include "commands/tablecmds.h"
 #include "commands/vacuum.h"
 #include "miscadmin.h"
 #include "optimizer/optimizer.h"
+#include "pgstat.h"
 #include "storage/bufmgr.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
@@ -275,6 +277,8 @@ cluster_rel(Oid tableOid, Oid indexOid, int options)
 	/* Check for user-requested abort. */
 	CHECK_FOR_INTERRUPTS();
 
+	pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
+
 	/*
 	 * We grab exclusive access to the target rel and index for the duration
 	 * of the transaction.  (This is redundant for the single-transaction
@@ -385,6 +389,18 @@ cluster_rel(Oid tableOid, Oid indexOid, int options)
 	 */
 	CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM");
 
+	/* Set command to column */
+	if (OidIsValid(indexOid))
+	{
+		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+	 PROGRESS_CLUSTER_COMMAND_CLUSTER);
+	}
+	else
+	{
+		pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND,
+	 PROGRESS_CLUSTER_COMMAND_VACUUM_FULL);
+	}
+
 	/*

Re: [HACKERS] CLUSTER command progress monitor

2019-03-01 Thread Robert Haas

On Thu, Feb 28, 2019 at 11:54 PM Tatsuro Yamada
 wrote:
> Attached patch is wip patch.

+   CLUSTER and VACUUM FULL,
showing current progress.

and -> or

+   certain commands during command execution.  Currently, the suppoted
+   progress reporting commands are VACUUM and
CLUSTER.

suppoted -> supported

But I'd just say: Currently, the only commands which support progress
reporting are VACUUM and
CLUSTER.

+   Running VACUUM FULL is listed in
pg_stat_progress_cluster
+   view because it uses CLUSTER command
internally.  See .

How about: Running VACUUM FULL is listed in
pg_stat_progress_cluster because both
VACUUM FULL and CLUSTER rewrite
the table, while regular VACUUM only modifies it in
place.

+   Current processing command: CLUSTER/VACUUM FULL.

The command that is running.  Either CLUSTER or VACUUM FULL.

+   Current processing phase of cluster/vacuum full.  See  or .

Current processing phase of CLUSTER or VACUUM FULL.

Or maybe better, just abbreviate to: Current processing phase.

+   Scan method of table: index scan/seq scan.

Eh, shouldn't this be gone now?  And likewise for the view definition?

+   OID of the index.

If the table is being scanned using an index, this is the OID of the
index being used; otherwise, it is zero.

+ heap_tuples_total

Leftovers.  Skipping over the rest of your documentation changes since
it looks like a bunch of things there still need to be updated.

+ pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);

This now appears inside cluster_rel(), but also vacuum_rel() is still
doing the same thing.  That's wrong.

+ if(OidIsValid(indexOid))

Missing space.  Please pgindent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

2019-02-28 Thread Etsuro Fujita


(2019/03/01 14:17), Tatsuro Yamada wrote:

Attached patch is wip patch.


Is it possible to remove the following patch?
Because I registered the patch twice on CF Mar.

https://commitfest.postgresql.org/22/2049/


Please remove the above and keep this:

https://commitfest.postgresql.org/22/1779/

which I moved from the January CF to the March one on behalf of him.

Best regards,
Etsuro Fujita

Re: [HACKERS] CLUSTER command progress monitor

2019-02-28 Thread Tatsuro Yamada


Attached patch is wip patch.


Is it possible to remove the following patch?
Because I registered the patch twice on CF Mar.

   https://commitfest.postgresql.org/22/2049/

Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

2019-02-28 Thread Tatsuro Yamada


On 2019/02/23 6:02, Robert Haas wrote:

On Fri, Dec 28, 2018 at 3:20 AM Tatsuro Yamada
 wrote:

This patch is rebased on HEAD.
I'll tackle revising the patch based on feedbacks next month.


+   Running VACUUM FULL is listed in
pg_stat_progress_cluster
+   view because it uses CLUSTER command
internally.  See .

It's not really true to say that VACUUM FULL uses the CLUSTER command
internally.  It's not really true.  It uses a good chunk of the same
infrastructure, but it certainly doesn't use the actual command, and
it's not really the exact same thing either, because internally it's
doing a sequential scan but no sort, which never happens with CLUSTER.
I'm not sure exactly how to rephrase this, but I think we need to make
it more precise.

One idea is that maybe we should try to think of a design that could
also handle the rewriting variants of ALTER TABLE, and call it
pg_stat_progress_rewrite.  Maybe that's moving the goalposts too far,
but I'm not saying we'd necessarily have to do all the work now, just
have a view that we think could also handle that.  Then again, maybe
the needs are too different.



Hmm..., I see.
If possible, I'd like to stop thinking of VACUUM FULL to avoid complication of
the implementation.
For now, I haven't enough time to design pg_stat_progress_rewrite. I suppose 
that
it's tough work.



+   Whenever CLUSTER is running, the
+   pg_stat_progress_cluster view will contain
+   one row for each backend that is currently clustering or vacuuming
(VACUUM FULL).

That sentence contradicts itself.  Just say that it contains a row for
each backend that is currently running CLUSTER or VACUUM FULL.



Fixed.



@@ -105,6 +107,7 @@ static void reform_and_rewrite_tuple(HeapTuple tuple,
  void
  cluster(ClusterStmt *stmt, bool isTopLevel)
  {
+
   if (stmt->relation != NULL)
   {
   /* This is the single-relation case. */

Useless hunk.



Fixed.



@@ -186,7 +189,9 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
   heap_close(rel, NoLock);

   /* Do the job. */
+ pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
   cluster_rel(tableOid, indexOid, stmt->options);
+ pgstat_progress_end_command();
   }
   else
   {

It seems like that stuff should be inside cluster_rel().



Fixed.



+ /* Set reltuples to total_tuples */
+ pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_TUPLES,
OldHeap->rd_rel->reltuples);

I object.  If the user wants that, they can get it from pg_class
themselves via an SQL query.  It's also an estimate, not something we
know to be accurate; I want us to only report facts here, not theories


I understand that progress monitor should only report facts, so I
removed that code.




+ pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP);
+ pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD,
PROGRESS_CLUSTER_METHOD_INDEX_SCAN);

I think you should use pgstat_progress_update_multi_param() if
updating multiple parameters at the same time.

Also, some lines in this patch, such as this one, are very long.
Consider techniques to reduce the line length to 80 characters or
less, such as inserting a line break between the two arguments.


Fixed.



+ /* Set scan_method to NULL */
+ pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, -1);

NULL and -1 are not the same thing.


Oops, fixed.



I think that we shouldn't have both
PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP and
PROGRESS_CLUSTER_PHASE_SCAN_HEAP.  They're the same thing.  Let's just
use PROGRESS_CLUSTER_PHASE_SCAN_HEAP for both.  Actually, better yet,
let's get rid of PROGRESS_CLUSTER_SCAN_METHOD and have
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP and
PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP.  That seems noticeably
simpler.


Fixed.



I agree that it's acceptable to report
PROGRESS_CLUSTER_HEAP_TUPLES_VACUUMED and
PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD, but I'm not sure I
understand why it's valuable to do so in the context of a progress
indicator.


Actually, I'm not sure why I added it since so much time has passed. :(
So, I'll remove PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD at least.


Attached patch is wip patch.


Thanks!
Tatsuro Yamada

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0e73cdcdda..8cf829e72c 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -344,6 +344,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  pg_stat_progress_clusterpg_stat_progress_cluster
+  One row for each backend running
+   CLUSTER and VACUUM FULL, showing current progress.
+   See .
+  
+ 
+
 

   
@@ -3376,9 +3384,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
   
PostgreSQL has the ability to report the progress of
-   certain commands during command execution.  Currently, the only command
-   which supports progress reporting is VACUUM.  This may be
-   expanded in

Re: [HACKERS] CLUSTER command progress monitor

2019-02-22 Thread Robert Haas

On Fri, Dec 28, 2018 at 3:20 AM Tatsuro Yamada
 wrote:
> This patch is rebased on HEAD.
> I'll tackle revising the patch based on feedbacks next month.

+   Running VACUUM FULL is listed in
pg_stat_progress_cluster
+   view because it uses CLUSTER command
internally.  See .

It's not really true to say that VACUUM FULL uses the CLUSTER command
internally.  It's not really true.  It uses a good chunk of the same
infrastructure, but it certainly doesn't use the actual command, and
it's not really the exact same thing either, because internally it's
doing a sequential scan but no sort, which never happens with CLUSTER.
I'm not sure exactly how to rephrase this, but I think we need to make
it more precise.

One idea is that maybe we should try to think of a design that could
also handle the rewriting variants of ALTER TABLE, and call it
pg_stat_progress_rewrite.  Maybe that's moving the goalposts too far,
but I'm not saying we'd necessarily have to do all the work now, just
have a view that we think could also handle that.  Then again, maybe
the needs are too different.

+   Whenever CLUSTER is running, the
+   pg_stat_progress_cluster view will contain
+   one row for each backend that is currently clustering or vacuuming
(VACUUM FULL).

That sentence contradicts itself.  Just say that it contains a row for
each backend that is currently running CLUSTER or VACUUM FULL.

@@ -105,6 +107,7 @@ static void reform_and_rewrite_tuple(HeapTuple tuple,
 void
 cluster(ClusterStmt *stmt, bool isTopLevel)
 {
+
  if (stmt->relation != NULL)
  {
  /* This is the single-relation case. */

Useless hunk.

@@ -186,7 +189,9 @@ cluster(ClusterStmt *stmt, bool isTopLevel)
  heap_close(rel, NoLock);

  /* Do the job. */
+ pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid);
  cluster_rel(tableOid, indexOid, stmt->options);
+ pgstat_progress_end_command();
  }
  else
  {

It seems like that stuff should be inside cluster_rel().

+ /* Set reltuples to total_tuples */
+ pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_TUPLES,
OldHeap->rd_rel->reltuples);

I object.  If the user wants that, they can get it from pg_class
themselves via an SQL query.  It's also an estimate, not something we
know to be accurate; I want us to only report facts here, not theories

+ pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE,
PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP);
+ pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD,
PROGRESS_CLUSTER_METHOD_INDEX_SCAN);

I think you should use pgstat_progress_update_multi_param() if
updating multiple parameters at the same time.

Also, some lines in this patch, such as this one, are very long.
Consider techniques to reduce the line length to 80 characters or
less, such as inserting a line break between the two arguments.

+ /* Set scan_method to NULL */
+ pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, -1);

NULL and -1 are not the same thing.

I think that we shouldn't have both
PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP and
PROGRESS_CLUSTER_PHASE_SCAN_HEAP.  They're the same thing.  Let's just
use PROGRESS_CLUSTER_PHASE_SCAN_HEAP for both.  Actually, better yet,
let's get rid of PROGRESS_CLUSTER_SCAN_METHOD and have
PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP and
PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP.  That seems noticeably
simpler.

I agree that it's acceptable to report
PROGRESS_CLUSTER_HEAP_TUPLES_VACUUMED and
PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD, but I'm not sure I
understand why it's valuable to do so in the context of a progress
indicator.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

2018-12-28 Thread Tatsuro Yamada


Hi,


Do you have a new version of this patch?  If not, do you think you'll
have something in time for the upcoming commitfest?


Not yet, I'll be able to send only a rebased patch by the end of this month.
I mean it has no design change because I can't catch up on how to get a progress
from sort and index scan. However I'm going to register the patch on next CF.
I'm happy if you have interested in the patch.



This patch is rebased on HEAD.
I'll tackle revising the patch based on feedbacks next month.

Happy holidays!
Tatsuro Yamada



diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 96bcc3a63b..83421e5105 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -332,6 +332,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  pg_stat_progress_clusterpg_stat_progress_cluster
+  One row for each backend running
+   CLUSTER and VACUUM FULL, showing current progress.
+   See .
+  
+ 
+
 

   
@@ -3351,9 +3359,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
   
PostgreSQL has the ability to report the progress of
-   certain commands during command execution.  Currently, the only command
-   which supports progress reporting is VACUUM.  This may be
-   expanded in the future.
+   certain commands during command execution.  Currently, the suppoted 
+   progress reporting commands are VACUUM and CLUSTER.
+   This may be expanded in the future.
   
 
  
@@ -3365,9 +3373,8 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
one row for each backend (including autovacuum worker processes) that is
currently vacuuming.  The tables below describe the information
that will be reported and provide information about how to interpret it.
-   Progress reporting is not currently supported for VACUUM FULL
-   and backends running VACUUM FULL will not be listed in this
-   view.
+   Running VACUUM FULL is listed in pg_stat_progress_cluster
+   view because it uses CLUSTER command internally.  See .
   
 
   
@@ -3544,6 +3551,228 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,

   
 
+ 
+
+ 
+  CLUSTER Progress Reporting
+
+  
+   Whenever CLUSTER is running, the
+   pg_stat_progress_cluster view will contain
+   one row for each backend that is currently clustering or vacuuming (VACUUM FULL). 
+   The tables below describe the information that will be reported and
+   provide information about how to interpret it.
+  
+
+  
+   pg_stat_progress_cluster View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend.
+
+
+ datid
+ oid
+ OID of the database to which this backend is connected.
+
+
+ datname
+ name
+ Name of the database to which this backend is connected.
+
+
+ relid
+ oid
+ OID of the table being clustered.
+
+
+ command
+ text
+ 
+   Current processing command: CLUSTER/VACUUM FULL.
+ 
+
+
+ phase
+ text
+ 
+   Current processing phase of cluster/vacuum full.  See  or .
+ 
+
+
+ scan_method
+ text
+ 
+   Scan method of table: index scan/seq scan.
+ 
+
+
+ cluster_index_relid
+ bigint
+ 
+   OID of the index.
+ 
+
+
+ heap_tuples_total
+ bigint
+ 
+   Total number of heap tuples in the table.  This number is reported
+   as of the beginning of the scan; tuples added later will not be (and
+   need not be) visited by this CLUSTER and VACUUM FULL.
+ 
+
+
+ heap_tuples_scanned
+ bigint
+ 
+   Number of heap tuples scanned.
+   This counter only advances when the phase is scanning heap, 
+   writing new heap and scan heap and write new heap.
+ 
+
+
+ heap_tuples_vacuumed
+ bigint
+ 
+   Number of heap tuples vacuumed. This counter only advances when the
+   command is VACUUM FULL and the phase is scanning heap.
+ 
+
+
+ heap_tuples_recently_dead
+ bigint
+ 
+   Number of heap tuples not vacuumed since these tuples marked recently dead.
+   This counter only advances when the command is VACUUM FULL and 
+   the phase is scanning heap.
+ 
+
+   
+   
+  
+
+  
+   CLUSTER phases
+   
+
+
+  Phase
+  Description
+ 
+
+
+   
+
+ initializing
+ 
+   CLUSTER is preparing to begin scanning the heap.  This
+   phase is expected to be very brief.
+ 
+
+
+ scanning heap
+ 
+   CLUSTER is currently scanning heap from the table by
+   seq scan. This phase is shown when the scan_method is seq scan.
+ 
+
+
+ sorting tuples
+ 
+   CLUSTER is currently sorting tuples. 
+   This phase is shown when the scan_method is seq scan.
+ 
+
+
+ scan hea

Re: [HACKERS] CLUSTER command progress monitor

2018-12-18 Thread Tatsuro Yamada


Hi Alvaro,

On 2018/12/19 2:23, Alvaro Herrera wrote:

Hello Yamada-san,

On 2018-Dec-03, Tatsuro Yamada wrote:


Thank you for managing the CF and Sorry for the late reply.
I'll rebase it for the next CF and also I'll clear my head because the patch
needs design change to address the feedbacks, I guess. Therefore, the status is
reconsidering the design of the patch. :)


Do you have a new version of this patch?  If not, do you think you'll
have something in time for the upcoming commitfest?


Not yet, I'll be able to send only a rebased patch by the end of this month.
I mean it has no design change because I can't catch up on how to get a progress
from sort and index scan. However I'm going to register the patch on next CF.
I'm happy if you have interested in the patch. :)

Thanks,
Tatsuro Yamada

Re: [HACKERS] CLUSTER command progress monitor

2018-12-18 Thread Peter Geoghegan

On Tue, Dec 18, 2018 at 2:47 PM Alvaro Herrera  wrote:
> Well, if you think about individual blocks in terms of storage space,
> maybe that's true, but I meant in an Heraclitus way of men never
> stepping into the same river -- the second time you write the block,
> it's not the same block you wrote before, so you count it twice.  It's
> not the actual disk space utilization that matters, but how much I/O
> have you done (even if it is just to kernel cache, I suppose).

Right.

> I suppose mapping such numbers to actual progress is a bit of an art (or
> intuition as you say), but it seems to be the best we can do, if we do
> anything at all.

I think that it's fairly useful. I suspect that you don't have to have
my theoretical grounding in sorting to be able to do almost as well.
All you need is a little bit of experience.

> How good are those predictions?  The feeling I get from this thread is
> that if the estimation of the number of passes is unreliable, it's
> better not to report it at all; just return how many we've done thus
> far.  It's undesirable to report that we're about 150% done (or take
> hours to get to 40% done, then suddenly be over).

Maybe it isn't that reliable. But on second thought I think that it
might not matter, and maybe we should just not do that.

"How slow can I make this sort go by subtracting work_mem?" is a game
that I like to play sometimes. This blogpost plays that game, and
reaches some pretty amusing conclusions:

https://www.cybertec-postgresql.com/en/postgresql-improving-sort-performance/

It says that sorting numeric is 60% slower when you do an external
sort. But it's an apples to asteroids comparison, because the
comparison is made between 4MB of work_mem, and 1GB. I think that it's
pretty damn impressive that it's only 60% slower! Besides, even that
difference is probably on the high side of average, because numeric
abbreviated keys work particularly well, and you won't get the same
benefit with a unique numeric values when you happen to be doing a lot
of merging. If you tried the same experiment with integers, or even
text + abbreviated keys, I bet the difference would be a lot smaller.
Despite the huge gap in the amount of memory used.

On modern hardware, where doing some amount of random I/O is not that
noticeable, you'll have a very hard time finding a case where even a
paltry amount of memory with many passes does all that much worse than
an internal sort (OTOH, it's not hard to find cases where an external
sort is *faster*). Even if you make a generic estimate, it's still
probably going to be pretty good, because there just isn't that much
variation in how long the sort will take as you vary the amount of
memory it can use. Some people will be surprised at this, but it's a
pretty robust effect. (This is why I think that a hash_mem GUC might
be a good medium term solution that improves upon work_mem -- the
situation is dramatically different when it comes to hashing.)

My point is that you could offer users the kind of insight they'd find
very useful with only a very crude estimate of the amount of merging.
Even if it was 60% slower than initially projected, that's still not
an awful estimate to most users. That just leaves initial run
generation, but it's relatively easy to accurately estimate the amount
of initial runs. I rarely see a case where merging takes more than 40%
of the total, barring parallel CREATE INDEX.

> I wonder if internal sorts are really all that interesting from the PoV
> of progress reporting.  Also, I have the impression that quicksort isn't
> very amenable to letting you know how much work is left.

It is hard to predict the duration of one massive quicksort, but it's
seems fairly easy to recognize a kind of cadence across multiple
quicksorts/runs that each take seconds to a couple of minutes. That's
going to be the vast, vast majority of cases we care about.

-- 
Peter Geoghegan

Re: [HACKERS] CLUSTER command progress monitor

2018-12-18 Thread Alvaro Herrera

On 2018-Dec-18, Peter Geoghegan wrote:

> On Tue, Dec 18, 2018 at 1:02 PM Alvaro Herrera  
> wrote:
> > If we see this in terms of tapes and merges, we can report the total
> > number of each of those that we have completed.  As far as I understand,
> > we write one tape to completion, and only then start another one, right?
> > Since there's no way to know how many tapes/merges are needed in total,
> > it's not possible to compute a percentage of completion.  That's seems
> > okay -- we're just telling the user that progress is being made, and we
> > only report facts not theory.  Perhaps we can (also?) indicate disk I/O
> > utilization, in terms of the number of blocks written by tuplesort.
> 
> The number of blocks tuplesort uses is constant from the end of
> initial run generation, since logtape.c will recycle blocks.

Well, if you think about individual blocks in terms of storage space,
maybe that's true, but I meant in an Heraclitus way of men never
stepping into the same river -- the second time you write the block,
it's not the same block you wrote before, so you count it twice.  It's
not the actual disk space utilization that matters, but how much I/O
have you done (even if it is just to kernel cache, I suppose).

> > I suppose that in order to have tuplesort.c report progress, we would
> > have to have some kind of API that tuplesort would invoke internally to
> > indicate events such as "tape started/completed", "merge started/completed".
> > One idea is to use a callback system; each tuplesort caller could
> > optionally pass a callback to the "begin" function, for progress
> > reporting purposes.  Initially only cluster.c would use it, but I
> > suppose eventually every tuplesort caller would want that.
> 
> I think that you could have a callback that did something with the
> information currently reported by trace_sort. That's not a bad way of
> scoping the problem. That's how I myself monitor the progress of a
> sort, and it works pretty well (whether or not that means other people
> can do it is not exactly clear to me).

Thanks, that looks useful.

I suppose mapping such numbers to actual progress is a bit of an art (or
intuition as you say), but it seems to be the best we can do, if we do
anything at all.

> We predict the number of merge passes within cost_sort() already. That
> doesn't seem all that hard to generalize, so that you report the
> expected number of passes against the current pass. Some passes are
> much quicker than others, but you generally don't have that many with
> realistic cases. I don't expect that it will work very well with an
> internal sort, but in the case of CLUSTER that almost seems
> irrelevant. And maybe even in all cases.

How good are those predictions?  The feeling I get from this thread is
that if the estimation of the number of passes is unreliable, it's
better not to report it at all; just return how many we've done thus
far.  It's undesirable to report that we're about 150% done (or take
hours to get to 40% done, then suddenly be over).

I wonder if internal sorts are really all that interesting from the PoV
of progress reporting.  Also, I have the impression that quicksort isn't
very amenable to letting you know how much work is left.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

2018-12-18 Thread Peter Geoghegan

On Tue, Dec 18, 2018 at 1:02 PM Alvaro Herrera  wrote:
> If we see this in terms of tapes and merges, we can report the total
> number of each of those that we have completed.  As far as I understand,
> we write one tape to completion, and only then start another one, right?
> Since there's no way to know how many tapes/merges are needed in total,
> it's not possible to compute a percentage of completion.  That's seems
> okay -- we're just telling the user that progress is being made, and we
> only report facts not theory.  Perhaps we can (also?) indicate disk I/O
> utilization, in terms of the number of blocks written by tuplesort.

The number of blocks tuplesort uses is constant from the end of
initial run generation, since logtape.c will recycle blocks.

> I suppose that in order to have tuplesort.c report progress, we would
> have to have some kind of API that tuplesort would invoke internally to
> indicate events such as "tape started/completed", "merge started/completed".
> One idea is to use a callback system; each tuplesort caller could
> optionally pass a callback to the "begin" function, for progress
> reporting purposes.  Initially only cluster.c would use it, but I
> suppose eventually every tuplesort caller would want that.

I think that you could have a callback that did something with the
information currently reported by trace_sort. That's not a bad way of
scoping the problem. That's how I myself monitor the progress of a
sort, and it works pretty well (whether or not that means other people
can do it is not exactly clear to me).

We predict the number of merge passes within cost_sort() already. That
doesn't seem all that hard to generalize, so that you report the
expected number of passes against the current pass. Some passes are
much quicker than others, but you generally don't have that many with
realistic cases. I don't expect that it will work very well with an
internal sort, but in the case of CLUSTER that almost seems
irrelevant. And maybe even in all cases.

I think that the user is going to have to be willing to develop some
intuition about the progress for it to be all that useful. They're
really looking for something that gives a clue if they'll have to wait
an hour, a day, or a week, which it seems like trace_sort-like
information gives you some idea of. (BTW, dtrace probes can already
give the user much the same information -- I think that more people
should use those, since tracing technology on Linux has improved
drastically in the last few years.)

-- 
Peter Geoghegan

Re: [HACKERS] CLUSTER command progress monitor

2018-12-18 Thread Alvaro Herrera

On 2017-Nov-21, Peter Geoghegan wrote:

> On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas  wrote:
> > Progress reporting on sorts seems like a tricky problem to me, as I
> > said before.  In most cases, a sort is going to involve an initial
> > stage where it reads all the input tuples and writes out quicksorted
> > runs, and then a merge phase where it merges all the output tapes into
> > a sorted result.  There are some complexities; for example, if the
> > number of tapes is really large, then we might need multiple merge
> > phases, only the last of which will produce tuples.
> 
> This would ordinarily be the point at which I'd say "but you're very
> unlikely to require multiple passes for an external sort these days".
> But I won't say that on this thread, because CLUSTER generally has
> unusually wide tuples, and so is much more likely to be I/O bound, to
> require multiple passes, etc. (I bet the v10 enhancements
> disproportionately improved CLUSTER performance.)

When the seqscan-and-sort strategy is used, we feed tuplesort with every
tuple from the scan.  Once that's completed, we call `performsort`, then
retrieve tuples.

If we see this in terms of tapes and merges, we can report the total
number of each of those that we have completed.  As far as I understand,
we write one tape to completion, and only then start another one, right?
Since there's no way to know how many tapes/merges are needed in total,
it's not possible to compute a percentage of completion.  That's seems
okay -- we're just telling the user that progress is being made, and we
only report facts not theory.  Perhaps we can (also?) indicate disk I/O
utilization, in terms of the number of blocks written by tuplesort.

I suppose that in order to have tuplesort.c report progress, we would
have to have some kind of API that tuplesort would invoke internally to
indicate events such as "tape started/completed", "merge started/completed".
One idea is to use a callback system; each tuplesort caller could
optionally pass a callback to the "begin" function, for progress
reporting purposes.  Initially only cluster.c would use it, but I
suppose eventually every tuplesort caller would want that.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

2018-12-18 Thread Alvaro Herrera

Hello Yamada-san,

On 2018-Dec-03, Tatsuro Yamada wrote:

> Thank you for managing the CF and Sorry for the late reply.
> I'll rebase it for the next CF and also I'll clear my head because the patch
> needs design change to address the feedbacks, I guess. Therefore, the status 
> is
> reconsidering the design of the patch. :)

Do you have a new version of this patch?  If not, do you think you'll
have something in time for the upcoming commitfest?

Thanks

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

2018-12-03 Thread Michael Paquier

On Mon, Dec 03, 2018 at 02:17:25PM -0300, Alvaro Herrera wrote:
> I think we should mark it as Returned with Feedback then.

+1.
--
Michael


signature.asc
Description: PGP signature

Re: [HACKERS] CLUSTER command progress monitor

2018-12-03 Thread Alvaro Herrera

On 2018-Dec-03, Tatsuro Yamada wrote:

> > In the meantime I'm moving it to the next CF.
> 
> Thank you for managing the CF and Sorry for the late reply.
> I'll rebase it for the next CF and also I'll clear my head because the patch
> needs design change to address the feedbacks, I guess. Therefore, the status 
> is
> reconsidering the design of the patch. :)

I think we should mark it as Returned with Feedback then.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] CLUSTER command progress monitor

2018-12-03 Thread Tatsuro Yamada


On 2018/11/29 21:20, Dmitry Dolgov wrote:

On Fri, Aug 24, 2018 at 7:06 AM Tatsuro Yamada  
wrote:

On 2017/11/22 6:07, Peter Geoghegan wrote:

On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas  wrote:

Progress reporting on sorts seems like a tricky problem to me, as I
said before.  In most cases, a sort is going to involve an initial
stage where it reads all the input tuples and writes out quicksorted
runs, and then a merge phase where it merges all the output tapes into
a sorted result.  There are some complexities; for example, if the
number of tapes is really large, then we might need multiple merge
phases, only the last of which will produce tuples.


This would ordinarily be the point at which I'd say "but you're very
unlikely to require multiple passes for an external sort these days".
But I won't say that on this thread, because CLUSTER generally has
unusually wide tuples, and so is much more likely to be I/O bound, to
require multiple passes, etc. (I bet the v10 enhancements
disproportionately improved CLUSTER performance.)


Hi,

I came back to develop the feature for community.
V4 patch is corrected these following points:

- Rebase on master (143290efd)
- Fix document
- Replace the column name scan_index_relid with cluster_index_relid.
Thanks to Jeff Janes!

I'm now working on improving the patch based on Robert's comment related to
"Seqscan and Sort case" and also considering how to handle the "Index scan 
case".


Thank you,

Unfortunately, this patch has some conflicts now, could you rebase it? Also
what's is the status of your work on improving it based on the
provided feedback?

In the meantime I'm moving it to the next CF.


Thank you for managing the CF and Sorry for the late reply.
I'll rebase it for the next CF and also I'll clear my head because the patch
needs design change to address the feedbacks, I guess. Therefore, the status is
reconsidering the design of the patch. :)

Regards,
Tatsuro Yamada
NTT Open Source Software Center

Re: [HACKERS] CLUSTER command progress monitor

2018-11-29 Thread Dmitry Dolgov

> On Fri, Aug 24, 2018 at 7:06 AM Tatsuro Yamada  
> wrote:
>
> On 2017/11/22 6:07, Peter Geoghegan wrote:
> > On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas  wrote:
> >> Progress reporting on sorts seems like a tricky problem to me, as I
> >> said before.  In most cases, a sort is going to involve an initial
> >> stage where it reads all the input tuples and writes out quicksorted
> >> runs, and then a merge phase where it merges all the output tapes into
> >> a sorted result.  There are some complexities; for example, if the
> >> number of tapes is really large, then we might need multiple merge
> >> phases, only the last of which will produce tuples.
> >
> > This would ordinarily be the point at which I'd say "but you're very
> > unlikely to require multiple passes for an external sort these days".
> > But I won't say that on this thread, because CLUSTER generally has
> > unusually wide tuples, and so is much more likely to be I/O bound, to
> > require multiple passes, etc. (I bet the v10 enhancements
> > disproportionately improved CLUSTER performance.)
>
> Hi,
>
> I came back to develop the feature for community.
> V4 patch is corrected these following points:
>
>- Rebase on master (143290efd)
>- Fix document
>- Replace the column name scan_index_relid with cluster_index_relid.
>Thanks to Jeff Janes!
>
> I'm now working on improving the patch based on Robert's comment related to
> "Seqscan and Sort case" and also considering how to handle the "Index scan 
> case".

Thank you,

Unfortunately, this patch has some conflicts now, could you rebase it? Also
what's is the status of your work on improving it based on the
provided feedback?

In the meantime I'm moving it to the next CF.

Re: [HACKERS] CLUSTER command progress monitor

2018-08-23 Thread Tatsuro Yamada


On 2017/11/22 6:07, Peter Geoghegan wrote:

On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas  wrote:

Progress reporting on sorts seems like a tricky problem to me, as I
said before.  In most cases, a sort is going to involve an initial
stage where it reads all the input tuples and writes out quicksorted
runs, and then a merge phase where it merges all the output tapes into
a sorted result.  There are some complexities; for example, if the
number of tapes is really large, then we might need multiple merge
phases, only the last of which will produce tuples.


This would ordinarily be the point at which I'd say "but you're very
unlikely to require multiple passes for an external sort these days".
But I won't say that on this thread, because CLUSTER generally has
unusually wide tuples, and so is much more likely to be I/O bound, to
require multiple passes, etc. (I bet the v10 enhancements
disproportionately improved CLUSTER performance.)


Hi,

I came back to develop the feature for community.
V4 patch is corrected these following points:

  - Rebase on master (143290efd)
  - Fix document
  - Replace the column name scan_index_relid with cluster_index_relid.
  Thanks to Jeff Janes!

I'm now working on improving the patch based on Robert's comment related to
"Seqscan and Sort case" and also considering how to handle the "Index scan 
case".

Please find attached file.

Regards,
Tatsuro Yamada

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 0484cfa77a..5a4bd203ea 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -332,6 +332,14 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   
  
 
+ 
+  pg_stat_progress_clusterpg_stat_progress_cluster
+  One row for each backend running
+   CLUSTER and VACUUM FULL, showing current progress.
+   See .
+  
+ 
+
 

   
@@ -3338,9 +3346,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
   
PostgreSQL has the ability to report the progress of
-   certain commands during command execution.  Currently, the only command
-   which supports progress reporting is VACUUM.  This may be
-   expanded in the future.
+   certain commands during command execution.  Currently, the suppoted 
+   progress reporting commands are VACUUM and CLUSTER.
+   This may be expanded in the future.
   
 
  
@@ -3352,9 +3360,8 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
one row for each backend (including autovacuum worker processes) that is
currently vacuuming.  The tables below describe the information
that will be reported and provide information about how to interpret it.
-   Progress reporting is not currently supported for VACUUM FULL
-   and backends running VACUUM FULL will not be listed in this
-   view.
+   Running VACUUM FULL is listed in pg_stat_progress_cluster
+   view because it uses CLUSTER command internally.  See .
   
 
   
@@ -3531,6 +3538,228 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,

   
 
+ 
+
+ 
+  CLUSTER Progress Reporting
+
+  
+   Whenever CLUSTER is running, the
+   pg_stat_progress_cluster view will contain
+   one row for each backend that is currently clustering or vacuuming (VACUUM FULL). 
+   The tables below describe the information that will be reported and
+   provide information about how to interpret it.
+  
+
+  
+   pg_stat_progress_cluster View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend.
+
+
+ datid
+ oid
+ OID of the database to which this backend is connected.
+
+
+ datname
+ name
+ Name of the database to which this backend is connected.
+
+
+ relid
+ oid
+ OID of the table being clustered.
+
+
+ command
+ text
+ 
+   Current processing command: CLUSTER/VACUUM FULL.
+ 
+
+
+ phase
+ text
+ 
+   Current processing phase of cluster/vacuum full.  See  or .
+ 
+
+
+ scan_method
+ text
+ 
+   Scan method of table: index scan/seq scan.
+ 
+
+
+ cluster_index_relid
+ bigint
+ 
+   OID of the index.
+ 
+
+
+ heap_tuples_total
+ bigint
+ 
+   Total number of heap tuples in the table.  This number is reported
+   as of the beginning of the scan; tuples added later will not be (and
+   need not be) visited by this CLUSTER and VACUUM FULL.
+ 
+
+
+ heap_tuples_scanned
+ bigint
+ 
+   Number of heap tuples scanned.
+   This counter only advances when the phase is scanning heap, 
+   writing new heap and scan heap and write new heap.
+ 
+
+
+ heap_tuples_vacuumed
+ bigint
+ 
+   Number of heap tuples vacuumed. This counter only advances when the
+   command is VACUUM FULL and the phase is scanning heap.
+ 
+
+
+

Re: [HACKERS] CLUSTER command progress monitor

2017-11-21 Thread Thomas Munro

On Wed, Nov 22, 2017 at 1:53 PM, Michael Paquier
 wrote:
> On Wed, Nov 22, 2017 at 5:55 AM, Robert Haas  wrote:
>> I have been of the opinion all along that progress monitoring needs to
>> report facts, not theories.  The number of tuples read thus far is a
>> fact, and is fine to report for whatever value it may have to someone.
>> The number of tuples that will be read in the future is a theory, and
>> as you say, progress monitoring is most likely to be used in cases
>> where theory and practice ended up being very different.
>
> +1. We should never as well enter in things like trying to estimate
> the amount of time remaining to finish a task [1].
>
> [1]: https://www.xkcd.com/612/

+1

That is one reason I made pg_stat_replication.XXX_lag report the lag
of WAL that has been processed, not (say) the time until we catch up.
In some information-poor scenarios it interpolates which isn't perfect
but the general idea is that is shows you measurements of the past
(facts), not predictions about the future (theories).

-- 
Thomas Munro
http://www.enterprisedb.com

Re: [HACKERS] CLUSTER command progress monitor

2017-11-21 Thread Michael Paquier

On Wed, Nov 22, 2017 at 5:55 AM, Robert Haas  wrote:
> I have been of the opinion all along that progress monitoring needs to
> report facts, not theories.  The number of tuples read thus far is a
> fact, and is fine to report for whatever value it may have to someone.
> The number of tuples that will be read in the future is a theory, and
> as you say, progress monitoring is most likely to be used in cases
> where theory and practice ended up being very different.

+1. We should never as well enter in things like trying to estimate
the amount of time remaining to finish a task [1].

[1]: https://www.xkcd.com/612/
-- 
Michael

Re: [HACKERS] CLUSTER command progress monitor

2017-11-21 Thread Peter Geoghegan

On Tue, Nov 21, 2017 at 12:55 PM, Robert Haas  wrote:
> I agree.
>
> I have been of the opinion all along that progress monitoring needs to
> report facts, not theories.  The number of tuples read thus far is a
> fact, and is fine to report for whatever value it may have to someone.

That makes a lot of sense to me. I sometimes think that we're too
hesitant to expose internal information due to concerns about it being
hard to interpret. I see wait events as bucking this trend, which I
welcome. We see similar trends in the Linux kernel, with tools like
perf and BCC/eBPF now being regularly used to debug production issues.

> The number of tuples that will be read in the future is a theory, and
> as you say, progress monitoring is most likely to be used in cases
> where theory and practice ended up being very different.

You hit the nail on the head here.

It's not that these things are not difficult to interpret - the
concern itself is justified. It just needs to be weighed against the
benefit of having some instrumentation to start with. People are much
more likely to complain about obscure debug information, which makes
them feel dumb, than they are to complain about the absence of any
instrumentation, but I still think that the latter is the bigger
problem.

Besides, you don't necessarily have to understand something to act on
it. The internals of Oracle are trade secrets, but they were the first
to have wait events, I think. At least having something that you can
Google can make all the difference.

-- 
Peter Geoghegan

Re: [HACKERS] CLUSTER command progress monitor

2017-11-21 Thread Peter Geoghegan

On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas  wrote:
> Progress reporting on sorts seems like a tricky problem to me, as I
> said before.  In most cases, a sort is going to involve an initial
> stage where it reads all the input tuples and writes out quicksorted
> runs, and then a merge phase where it merges all the output tapes into
> a sorted result.  There are some complexities; for example, if the
> number of tapes is really large, then we might need multiple merge
> phases, only the last of which will produce tuples.

This would ordinarily be the point at which I'd say "but you're very
unlikely to require multiple passes for an external sort these days".
But I won't say that on this thread, because CLUSTER generally has
unusually wide tuples, and so is much more likely to be I/O bound, to
require multiple passes, etc. (I bet the v10 enhancements
disproportionately improved CLUSTER performance.)

-- 
Peter Geoghegan

Re: [HACKERS] CLUSTER command progress monitor

2017-11-21 Thread Robert Haas

On Mon, Nov 20, 2017 at 12:05 PM, Antonin Houska  wrote:
> Robert Haas  wrote:
>> On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada
>>  wrote:
>> >   1. scanning heap
>> >   2. sort tuples
>>
>> These two phases overlap, though. I believe progress reporting for
>> sorts is really hard.  In the simple case where the data fits in
>> work_mem, none of the work of the sort gets done until all the data is
>> read.  Once you switch to an external sort, you're writing batch
>> files, so a lot of the work is now being done during data loading.
>> But as the number of batch files grows, the final merge at the end
>> becomes an increasingly noticeable part of the cost, and eventually
>> you end up needing multiple merge passes.  I think we need some smart
>> way to report on sorts so that we can tell how much of the work has
>> really been done, but I don't know how to do it.
>
> Whatever complexity is hidden in the sort, cost_sort() should have taken it
> into consideration when called via plan_cluster_use_sort(). Thus I think that
> once we have both startup and total cost, the current progress of the sort
> stage can be estimated from the current number of input and output
> rows. Please remind me if my proposal appears to be too simplistic.

I think it is far too simplistic.  If the sort is being fed by a
sequential scan, reporting the number of blocks scanned so far as
compared to the total number that will be scanned would be a fine way
of reporting on the progress of the sequential scan -- and it's better
to use blocks, which we know for sure about, than rows, at which we
can only guess.  But that's the *scan* progress, not the *sort*
progress.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

2017-11-21 Thread Robert Haas

On Mon, Nov 20, 2017 at 12:25 PM, Tom Lane  wrote:
> Antonin Houska  writes:
>> Robert Haas  wrote:
>>> These two phases overlap, though. I believe progress reporting for
>>> sorts is really hard.
>
>> Whatever complexity is hidden in the sort, cost_sort() should have taken it
>> into consideration when called via plan_cluster_use_sort(). Thus I think that
>> once we have both startup and total cost, the current progress of the sort
>> stage can be estimated from the current number of input and output
>> rows. Please remind me if my proposal appears to be too simplistic.
>
> Well, even if you assume that the planner's cost model omits nothing
> (which I wouldn't bet on), its result is only going to be as good as the
> planner's estimate of the number of rows to be sorted.  And, in cases
> where people actually care about progress monitoring, it's likely that
> the planner got that wrong, maybe horribly so.  I think it's a bad idea
> for progress monitoring to depend on the planner's estimates in any way
> whatsoever.

I agree.

I have been of the opinion all along that progress monitoring needs to
report facts, not theories.  The number of tuples read thus far is a
fact, and is fine to report for whatever value it may have to someone.
The number of tuples that will be read in the future is a theory, and
as you say, progress monitoring is most likely to be used in cases
where theory and practice ended up being very different.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] CLUSTER command progress monitor

2017-11-21 Thread Antonin Houska

Tom Lane  wrote:

> Antonin Houska  writes:
> > Robert Haas  wrote:
> >> These two phases overlap, though. I believe progress reporting for
> >> sorts is really hard.
> 
> > Whatever complexity is hidden in the sort, cost_sort() should have taken it
> > into consideration when called via plan_cluster_use_sort(). Thus I think 
> > that
> > once we have both startup and total cost, the current progress of the sort
> > stage can be estimated from the current number of input and output
> > rows. Please remind me if my proposal appears to be too simplistic.
> 
> Well, even if you assume that the planner's cost model omits nothing
> (which I wouldn't bet on), its result is only going to be as good as the
> planner's estimate of the number of rows to be sorted.  And, in cases
> where people actually care about progress monitoring, it's likely that
> the planner got that wrong, maybe horribly so.  I think it's a bad idea
> for progress monitoring to depend on the planner's estimates in any way
> whatsoever.

The general idea was that some sort of prediction of the total cost is needed
anyway if we should tell during execution what fraction of work has already
been done. And also that the cost computation that we perform during execution
shouldn't (ideally) differ from cost_sort(). So I thought that it's easier to
refine cost_sort() than to implement the same computation from scratch
elsewhere.

Besides that I see 2 circumstances that make the estimate of the number of
input tuples simpler in the CLUSTER case:

* There's only 1 input relation w/o any kind of clause.

* CLUSTER uses SnapshotAny, so pg_class(reltuples) is closer to the actual
  number of input rows than it would be in general case. (Of course, pg_class
  would only be useful for the initial estimate.)

Unlike planner, the executor could recalculate the cost estimate at some
point(s) as it recognizes that the actual number of tuples per page appears to
differ from the density derived from pg_class initially. Still wrong?

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

Re: [HACKERS] CLUSTER command progress monitor

2017-11-20 Thread Tom Lane

Antonin Houska  writes:
> Robert Haas  wrote:
>> These two phases overlap, though. I believe progress reporting for
>> sorts is really hard.

> Whatever complexity is hidden in the sort, cost_sort() should have taken it
> into consideration when called via plan_cluster_use_sort(). Thus I think that
> once we have both startup and total cost, the current progress of the sort
> stage can be estimated from the current number of input and output
> rows. Please remind me if my proposal appears to be too simplistic.

Well, even if you assume that the planner's cost model omits nothing
(which I wouldn't bet on), its result is only going to be as good as the
planner's estimate of the number of rows to be sorted.  And, in cases
where people actually care about progress monitoring, it's likely that
the planner got that wrong, maybe horribly so.  I think it's a bad idea
for progress monitoring to depend on the planner's estimates in any way
whatsoever.

regards, tom lane

Re: [HACKERS] CLUSTER command progress monitor

2017-11-20 Thread Antonin Houska

Robert Haas  wrote:

> On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada
>  wrote:
> >   1. scanning heap
> >   2. sort tuples
> 
> These two phases overlap, though. I believe progress reporting for
> sorts is really hard.  In the simple case where the data fits in
> work_mem, none of the work of the sort gets done until all the data is
> read.  Once you switch to an external sort, you're writing batch
> files, so a lot of the work is now being done during data loading.
> But as the number of batch files grows, the final merge at the end
> becomes an increasingly noticeable part of the cost, and eventually
> you end up needing multiple merge passes.  I think we need some smart
> way to report on sorts so that we can tell how much of the work has
> really been done, but I don't know how to do it.

Whatever complexity is hidden in the sort, cost_sort() should have taken it
into consideration when called via plan_cluster_use_sort(). Thus I think that
once we have both startup and total cost, the current progress of the sort
stage can be estimated from the current number of input and output
rows. Please remind me if my proposal appears to be too simplistic.

-- 
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

91 matches

Mail list logo