Re: progress reporting for partitioned REINDEX

2021-02-19 Thread Justin Pryzby
On Sat, Feb 20, 2021 at 10:37:08AM +0900, Michael Paquier wrote:
> > Also, I noticed that vacuum recurses into partition heirarchies since v10, 
> > but
> > pg_stat_progress_vacuum also doesn't show anything about the parent table or
> > the progress of recursing through the hierarchy.
> 
> Yeah, that's an area where it would be possible to improve the
> monitoring, for both autovacuums and manual VACUUMs.

I was thinking that instead of reporting partitions_done/partitions_total in
the individual progress views, maybe the progress across partitions should be
reported in a separate pg_stat_progress_partitioned.  This would apply to my
CLUSTER patch as well as VACUUM.  I haven't though about the implementation,
though.

If the partitions_done/total were *removed* from the create_index view, that
would resolve the odd behavior that a single row simultanously shows 1) the
overall progress of the operation across partitions; and, 2) the detailed
information about the status of the operation on the current leaf partition.

However I guess it's not general enough to support progress reports of
execution of planned (not utility) statements.

-- 
Justin




Re: progress reporting for partitioned REINDEX

2021-02-19 Thread Michael Paquier
On Fri, Feb 19, 2021 at 12:12:54AM -0600, Justin Pryzby wrote:
> Looks fine.

Thanks, applied then to clarify things.

> Also, I noticed that vacuum recurses into partition heirarchies since v10, but
> pg_stat_progress_vacuum also doesn't show anything about the parent table or
> the progress of recursing through the hierarchy.

Yeah, that's an area where it would be possible to improve the
monitoring, for both autovacuums and manual VACUUMs.
--
Michael


signature.asc
Description: PGP signature


Re: progress reporting for partitioned REINDEX

2021-02-18 Thread Justin Pryzby
On Fri, Feb 19, 2021 at 03:06:04PM +0900, Michael Paquier wrote:
> On Thu, Feb 18, 2021 at 02:17:00PM +0900, Michael Paquier wrote:
> > I have no issues with documenting more precisely on which commands
> > partitions_total and partitions_done apply currently, by citing the
> > commands where these are effective.  We do that for index_relid for
> > instance.
> 
> Please find attached a patch to do that.  Justin, what do you think?

Looks fine.

I removed this from opened items.

Also, I noticed that vacuum recurses into partition heirarchies since v10, but
pg_stat_progress_vacuum also doesn't show anything about the parent table or
the progress of recursing through the hierarchy.

-- 
Justin




Re: progress reporting for partitioned REINDEX

2021-02-18 Thread Michael Paquier
On Thu, Feb 18, 2021 at 02:17:00PM +0900, Michael Paquier wrote:
> I have no issues with documenting more precisely on which commands
> partitions_total and partitions_done apply currently, by citing the
> commands where these are effective.  We do that for index_relid for
> instance.

Please find attached a patch to do that.  Justin, what do you think?
--
Michael
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index c602ee4427..3513e127b7 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5716,6 +5716,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
When creating an index on a partitioned table, this column is set to
the total number of partitions on which the index is to be created.
+   This field is 0 during a REINDEX.
   
  
 
@@ -5726,6 +5727,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
When creating an index on a partitioned table, this column is set to
the number of partitions on which the index has been completed.
+   This field is 0 during a REINDEX.
   
  
 


signature.asc
Description: PGP signature


Re: progress reporting for partitioned REINDEX

2021-02-17 Thread Michael Paquier
On Wed, Feb 17, 2021 at 10:24:37AM -0600, Justin Pryzby wrote:
> When we implemented REINDEX of partitioned tables, it should've handled
> progress reporting in the fields where that's reported for CREATE INDEX.
> Or else we should document that "partitions_total/done are not populated for
> REINDEX of a partitioned table as they are for CREATE INDEX".

CREATE INDEX and REINDEX are two completely separate commands, with
separate code paths, and mostly separate logics.  When it comes to
REINDEX, the information that is currently showed to the user is not
incorrect, but in line with what the progress reporting of ~13 is able
to do: each index gets reported with its parent table one-by-one,
depending on if CONCURRENTLY is used or not, in consistency with what
ReindexMultipleTables() does for all the REINDEX commands working on
multiple objects, processing in one transaction each object listed
previously.

Now, coming back to the ask, I think that if we want to provide some
information in the REINDEX with the list of relations to work on, we
are going to need more fields than what we have now, to report:
1) The total number of indexes on which REINDEX is working on for the
current relation worked on.
2) The n-th index being worked on by REINDEX, as of the number of
indexes in 1).
3) The total number of relations a given command is working on, aka
the number of tables REINDEX SCHEMA, DATABASE, SYSTEM or REINDEX on a
partitioned relation has accumulated.
4) The n-th relation listed in 3) currently worked on.

The current columns partitions_total and partitions_done are partially
able to fill in the roles of 3) and 4), if we'd rename those columns
to relations_done and relations_total, still they could also mean 1)
and 2) in some contexts, like the number of indexes worked on for a
single relation.  So the problem is more complex than you make it
sound, and needs to consider a certain number of cases to be
consistent across all the REINDEX commands that exist.  In short, this
is not only a problem related to partitioned tables.

I have no issues with documenting more precisely on which commands
partitions_total and partitions_done apply currently, by citing the
commands where these are effective.  We do that for index_relid for
instance.
--
Michael


signature.asc
Description: PGP signature


Re: progress reporting for partitioned REINDEX

2021-02-17 Thread Justin Pryzby
On Wed, Feb 17, 2021 at 03:36:20PM +0900, Michael Paquier wrote:
> On Wed, Feb 17, 2021 at 12:10:43AM -0600, Justin Pryzby wrote:
> > On Wed, Feb 17, 2021 at 02:55:04PM +0900, Michael Paquier wrote:
> >> I see no bug here.
> > 
> > pg_stat_progress_create_index includes partitions_{done,total} for
> > CREATE INDEX p, so isn't it strange if it wouldn't do likewise for
> > REINDEX INDEX p ?
> 
> There is always room for improvement.  This stuff applies now only
> when creating an index in the non-concurrent case because an index
> cannot be created on a partitioned table concurrently, and this
> behavior is documented as such.  If we are going to improve this area,
> it seems to me that we may want to consider more cases than just the
> case of partitions, as it could also help the monitoring of REINDEX on
> schemas and databases.
> 
> I don't think that this fits as an open item.  That's just a different
> feature.

I see it as an omission in the existing feature.

Since v13, pg_stat_progress_create_index does progress reports for CREATE INDEX
(partitioned and nonpartitioned), and REINDEX of nonpartitioned tables.

When we implemented REINDEX of partitioned tables, it should've handled
progress reporting in the fields where that's reported for CREATE INDEX.
Or else we should document that "partitions_total/done are not populated for
REINDEX of a partitioned table as they are for CREATE INDEX".

-- 
Justin




Re: progress reporting for partitioned REINDEX

2021-02-16 Thread Michael Paquier
On Wed, Feb 17, 2021 at 12:10:43AM -0600, Justin Pryzby wrote:
> On Wed, Feb 17, 2021 at 02:55:04PM +0900, Michael Paquier wrote:
>> I see no bug here.
> 
> pg_stat_progress_create_index includes partitions_{done,total} for
> CREATE INDEX p, so isn't it strange if it wouldn't do likewise for
> REINDEX INDEX p ?

There is always room for improvement.  This stuff applies now only
when creating an index in the non-concurrent case because an index
cannot be created on a partitioned table concurrently, and this
behavior is documented as such.  If we are going to improve this area,
it seems to me that we may want to consider more cases than just the
case of partitions, as it could also help the monitoring of REINDEX on
schemas and databases.

I don't think that this fits as an open item.  That's just a different
feature.
--
Michael


signature.asc
Description: PGP signature


Re: progress reporting for partitioned REINDEX

2021-02-16 Thread Justin Pryzby
On Wed, Feb 17, 2021 at 02:55:04PM +0900, Michael Paquier wrote:
> I see no bug here.

pg_stat_progress_create_index includes partitions_{done,total} for
CREATE INDEX p, so isn't it strange if it wouldn't do likewise for
REINDEX INDEX p ?

-- 
Justin




Re: progress reporting for partitioned REINDEX

2021-02-16 Thread Michael Paquier
On Tue, Feb 16, 2021 at 12:39:08PM +0100, Matthias van de Meent wrote:
> These were added to report the index and table that are currently
> being worked on in concurrent reindexes of tables, schemas and
> databases. Before that commit, it would only report up to the last
> index being prepared in phase 1, leaving the user with no info on
> which index is being rebuilt.

Nothing much to add on top of what Matthias is saying here.  REINDEX
for partitioned relations builds first the full list of partitions to
work on, and then processes each one of them in a separate
transaction.  This is consistent with what we do for other commands
that need to handle an object different than a non-partitioned table
or a non-partitioned index.  The progress reporting has to report the
index whose storage is manipulated and its parent table.

> Why pgstat_progress_start_command specifically was chosen? That is
> because there is no method to update the
> beentry->st_progress_command_target other than through
> stat_progress_start_command, and according to the docs that field
> should contain the tableId of the index that is currently being worked
> on. This field needs a pgstat_progress_start_command because CIC / RiC
> reindexes all indexes concurrently at the same time (and not grouped
> by e.g. table), so we must re-start reporting for each index in each
> new phase in which we report data to get the heapId reported correctly
> for that index.

Using pgstat_progress_start_command() for this purpose is fine IMO.
This provides enough information for the user without complicating
more this API layer.

-   if (progress)
-   
pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID,
-
iRel->rd_rel->relam);
+   // Do this unconditionally?
+   pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID,
+
iRel->rd_rel->relam);
You cannot do that, this would clobber the progress information of any
upper layer that already reports something to the progress infra in
the backend's MyProc.  CLUSTER is one example calling reindex_relation()
that does *not* want progress reporting to happen in REINDEX. 

+   /* progress reporting for partitioned indexes */
+   if (relkind == RELKIND_PARTITIONED_INDEX)
+   {
+   const int   progress_index[3] = {
+   PROGRESS_CREATEIDX_COMMAND,
+   PROGRESS_CREATEIDX_INDEX_OID,
+   PROGRESS_CREATEIDX_PARTITIONS_TOTAL
+   };
This does not make sense in ReindexPartitions() IMO because this
relation is not reindexed as it has no storage, and you would lose the
context of each partition.

Something that we may want to study instead is whether we'd like to
report to the user the set of relations a REINDEX command is working
on and on which relation the work is currently done.  But I am not
really sure that we need that as long a we have a VERBOSE option that
lets us know via the logs what already happened in a single command.

I see no bug here.
--
Michael


signature.asc
Description: PGP signature


Re: progress reporting for partitioned REINDEX

2021-02-16 Thread Matthias van de Meent
On Tue, 16 Feb 2021, 07:42 Justin Pryzby,  wrote:
>
> It looks like we missed this in a6642b3ae.
>
> I think it's an odd behavior of pg_stat_progress_create_index to 
> simultaneously
> show the global progress as well as the progress for the current partition ...
>
> It seems like for partitioned reindex, reindex_index() should set the AM, 
> which
> is used in the view:
>
> src/backend/catalog/system_views.sql-   WHEN 2 THEN 
> 'building index' ||
> src/backend/catalog/system_views.sql:   COALESCE((': 
> ' || pg_indexam_progress_phasename(S.param9::oid, S.param11)),
>
> Maybe it needs a new flag, like:
> params->options & REINDEXOPT_REPORT_PROGRESS_AM
>
> I don't understand why e66bcfb4c added multiple calls to
> pgstat_progress_start_command().


These were added to report the index and table that are currently
being worked on in concurrent reindexes of tables, schemas and
databases. Before that commit, it would only report up to the last
index being prepared in phase 1, leaving the user with no info on
which index is being rebuilt.

Why pgstat_progress_start_command specifically was chosen? That is
because there is no method to update the
beentry->st_progress_command_target other than through
stat_progress_start_command, and according to the docs that field
should contain the tableId of the index that is currently being worked
on. This field needs a pgstat_progress_start_command because CIC / RiC
reindexes all indexes concurrently at the same time (and not grouped
by e.g. table), so we must re-start reporting for each index in each
new phase in which we report data to get the heapId reported correctly
for that index.


With regards,

Matthias van de Meent