Re: progress reporting for partitioned REINDEX
On Sat, Feb 20, 2021 at 10:37:08AM +0900, Michael Paquier wrote: > > Also, I noticed that vacuum recurses into partition heirarchies since v10, > > but > > pg_stat_progress_vacuum also doesn't show anything about the parent table or > > the progress of recursing through the hierarchy. > > Yeah, that's an area where it would be possible to improve the > monitoring, for both autovacuums and manual VACUUMs. I was thinking that instead of reporting partitions_done/partitions_total in the individual progress views, maybe the progress across partitions should be reported in a separate pg_stat_progress_partitioned. This would apply to my CLUSTER patch as well as VACUUM. I haven't though about the implementation, though. If the partitions_done/total were *removed* from the create_index view, that would resolve the odd behavior that a single row simultanously shows 1) the overall progress of the operation across partitions; and, 2) the detailed information about the status of the operation on the current leaf partition. However I guess it's not general enough to support progress reports of execution of planned (not utility) statements. -- Justin
Re: progress reporting for partitioned REINDEX
On Fri, Feb 19, 2021 at 12:12:54AM -0600, Justin Pryzby wrote: > Looks fine. Thanks, applied then to clarify things. > Also, I noticed that vacuum recurses into partition heirarchies since v10, but > pg_stat_progress_vacuum also doesn't show anything about the parent table or > the progress of recursing through the hierarchy. Yeah, that's an area where it would be possible to improve the monitoring, for both autovacuums and manual VACUUMs. -- Michael signature.asc Description: PGP signature
Re: progress reporting for partitioned REINDEX
On Fri, Feb 19, 2021 at 03:06:04PM +0900, Michael Paquier wrote: > On Thu, Feb 18, 2021 at 02:17:00PM +0900, Michael Paquier wrote: > > I have no issues with documenting more precisely on which commands > > partitions_total and partitions_done apply currently, by citing the > > commands where these are effective. We do that for index_relid for > > instance. > > Please find attached a patch to do that. Justin, what do you think? Looks fine. I removed this from opened items. Also, I noticed that vacuum recurses into partition heirarchies since v10, but pg_stat_progress_vacuum also doesn't show anything about the parent table or the progress of recursing through the hierarchy. -- Justin
Re: progress reporting for partitioned REINDEX
On Thu, Feb 18, 2021 at 02:17:00PM +0900, Michael Paquier wrote: > I have no issues with documenting more precisely on which commands > partitions_total and partitions_done apply currently, by citing the > commands where these are effective. We do that for index_relid for > instance. Please find attached a patch to do that. Justin, what do you think? -- Michael diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index c602ee4427..3513e127b7 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -5716,6 +5716,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, When creating an index on a partitioned table, this column is set to the total number of partitions on which the index is to be created. + This field is 0 during a REINDEX. @@ -5726,6 +5727,7 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, When creating an index on a partitioned table, this column is set to the number of partitions on which the index has been completed. + This field is 0 during a REINDEX. signature.asc Description: PGP signature
Re: progress reporting for partitioned REINDEX
On Wed, Feb 17, 2021 at 10:24:37AM -0600, Justin Pryzby wrote: > When we implemented REINDEX of partitioned tables, it should've handled > progress reporting in the fields where that's reported for CREATE INDEX. > Or else we should document that "partitions_total/done are not populated for > REINDEX of a partitioned table as they are for CREATE INDEX". CREATE INDEX and REINDEX are two completely separate commands, with separate code paths, and mostly separate logics. When it comes to REINDEX, the information that is currently showed to the user is not incorrect, but in line with what the progress reporting of ~13 is able to do: each index gets reported with its parent table one-by-one, depending on if CONCURRENTLY is used or not, in consistency with what ReindexMultipleTables() does for all the REINDEX commands working on multiple objects, processing in one transaction each object listed previously. Now, coming back to the ask, I think that if we want to provide some information in the REINDEX with the list of relations to work on, we are going to need more fields than what we have now, to report: 1) The total number of indexes on which REINDEX is working on for the current relation worked on. 2) The n-th index being worked on by REINDEX, as of the number of indexes in 1). 3) The total number of relations a given command is working on, aka the number of tables REINDEX SCHEMA, DATABASE, SYSTEM or REINDEX on a partitioned relation has accumulated. 4) The n-th relation listed in 3) currently worked on. The current columns partitions_total and partitions_done are partially able to fill in the roles of 3) and 4), if we'd rename those columns to relations_done and relations_total, still they could also mean 1) and 2) in some contexts, like the number of indexes worked on for a single relation. So the problem is more complex than you make it sound, and needs to consider a certain number of cases to be consistent across all the REINDEX commands that exist. In short, this is not only a problem related to partitioned tables. I have no issues with documenting more precisely on which commands partitions_total and partitions_done apply currently, by citing the commands where these are effective. We do that for index_relid for instance. -- Michael signature.asc Description: PGP signature
Re: progress reporting for partitioned REINDEX
On Wed, Feb 17, 2021 at 03:36:20PM +0900, Michael Paquier wrote: > On Wed, Feb 17, 2021 at 12:10:43AM -0600, Justin Pryzby wrote: > > On Wed, Feb 17, 2021 at 02:55:04PM +0900, Michael Paquier wrote: > >> I see no bug here. > > > > pg_stat_progress_create_index includes partitions_{done,total} for > > CREATE INDEX p, so isn't it strange if it wouldn't do likewise for > > REINDEX INDEX p ? > > There is always room for improvement. This stuff applies now only > when creating an index in the non-concurrent case because an index > cannot be created on a partitioned table concurrently, and this > behavior is documented as such. If we are going to improve this area, > it seems to me that we may want to consider more cases than just the > case of partitions, as it could also help the monitoring of REINDEX on > schemas and databases. > > I don't think that this fits as an open item. That's just a different > feature. I see it as an omission in the existing feature. Since v13, pg_stat_progress_create_index does progress reports for CREATE INDEX (partitioned and nonpartitioned), and REINDEX of nonpartitioned tables. When we implemented REINDEX of partitioned tables, it should've handled progress reporting in the fields where that's reported for CREATE INDEX. Or else we should document that "partitions_total/done are not populated for REINDEX of a partitioned table as they are for CREATE INDEX". -- Justin
Re: progress reporting for partitioned REINDEX
On Wed, Feb 17, 2021 at 12:10:43AM -0600, Justin Pryzby wrote: > On Wed, Feb 17, 2021 at 02:55:04PM +0900, Michael Paquier wrote: >> I see no bug here. > > pg_stat_progress_create_index includes partitions_{done,total} for > CREATE INDEX p, so isn't it strange if it wouldn't do likewise for > REINDEX INDEX p ? There is always room for improvement. This stuff applies now only when creating an index in the non-concurrent case because an index cannot be created on a partitioned table concurrently, and this behavior is documented as such. If we are going to improve this area, it seems to me that we may want to consider more cases than just the case of partitions, as it could also help the monitoring of REINDEX on schemas and databases. I don't think that this fits as an open item. That's just a different feature. -- Michael signature.asc Description: PGP signature
Re: progress reporting for partitioned REINDEX
On Wed, Feb 17, 2021 at 02:55:04PM +0900, Michael Paquier wrote: > I see no bug here. pg_stat_progress_create_index includes partitions_{done,total} for CREATE INDEX p, so isn't it strange if it wouldn't do likewise for REINDEX INDEX p ? -- Justin
Re: progress reporting for partitioned REINDEX
On Tue, Feb 16, 2021 at 12:39:08PM +0100, Matthias van de Meent wrote: > These were added to report the index and table that are currently > being worked on in concurrent reindexes of tables, schemas and > databases. Before that commit, it would only report up to the last > index being prepared in phase 1, leaving the user with no info on > which index is being rebuilt. Nothing much to add on top of what Matthias is saying here. REINDEX for partitioned relations builds first the full list of partitions to work on, and then processes each one of them in a separate transaction. This is consistent with what we do for other commands that need to handle an object different than a non-partitioned table or a non-partitioned index. The progress reporting has to report the index whose storage is manipulated and its parent table. > Why pgstat_progress_start_command specifically was chosen? That is > because there is no method to update the > beentry->st_progress_command_target other than through > stat_progress_start_command, and according to the docs that field > should contain the tableId of the index that is currently being worked > on. This field needs a pgstat_progress_start_command because CIC / RiC > reindexes all indexes concurrently at the same time (and not grouped > by e.g. table), so we must re-start reporting for each index in each > new phase in which we report data to get the heapId reported correctly > for that index. Using pgstat_progress_start_command() for this purpose is fine IMO. This provides enough information for the user without complicating more this API layer. - if (progress) - pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID, - iRel->rd_rel->relam); + // Do this unconditionally? + pgstat_progress_update_param(PROGRESS_CREATEIDX_ACCESS_METHOD_OID, + iRel->rd_rel->relam); You cannot do that, this would clobber the progress information of any upper layer that already reports something to the progress infra in the backend's MyProc. CLUSTER is one example calling reindex_relation() that does *not* want progress reporting to happen in REINDEX. + /* progress reporting for partitioned indexes */ + if (relkind == RELKIND_PARTITIONED_INDEX) + { + const int progress_index[3] = { + PROGRESS_CREATEIDX_COMMAND, + PROGRESS_CREATEIDX_INDEX_OID, + PROGRESS_CREATEIDX_PARTITIONS_TOTAL + }; This does not make sense in ReindexPartitions() IMO because this relation is not reindexed as it has no storage, and you would lose the context of each partition. Something that we may want to study instead is whether we'd like to report to the user the set of relations a REINDEX command is working on and on which relation the work is currently done. But I am not really sure that we need that as long a we have a VERBOSE option that lets us know via the logs what already happened in a single command. I see no bug here. -- Michael signature.asc Description: PGP signature
Re: progress reporting for partitioned REINDEX
On Tue, 16 Feb 2021, 07:42 Justin Pryzby, wrote: > > It looks like we missed this in a6642b3ae. > > I think it's an odd behavior of pg_stat_progress_create_index to > simultaneously > show the global progress as well as the progress for the current partition ... > > It seems like for partitioned reindex, reindex_index() should set the AM, > which > is used in the view: > > src/backend/catalog/system_views.sql- WHEN 2 THEN > 'building index' || > src/backend/catalog/system_views.sql: COALESCE((': > ' || pg_indexam_progress_phasename(S.param9::oid, S.param11)), > > Maybe it needs a new flag, like: > params->options & REINDEXOPT_REPORT_PROGRESS_AM > > I don't understand why e66bcfb4c added multiple calls to > pgstat_progress_start_command(). These were added to report the index and table that are currently being worked on in concurrent reindexes of tables, schemas and databases. Before that commit, it would only report up to the last index being prepared in phase 1, leaving the user with no info on which index is being rebuilt. Why pgstat_progress_start_command specifically was chosen? That is because there is no method to update the beentry->st_progress_command_target other than through stat_progress_start_command, and according to the docs that field should contain the tableId of the index that is currently being worked on. This field needs a pgstat_progress_start_command because CIC / RiC reindexes all indexes concurrently at the same time (and not grouped by e.g. table), so we must re-start reporting for each index in each new phase in which we report data to get the heapId reported correctly for that index. With regards, Matthias van de Meent