Re: [HACKERS] CLUSTER command progress monitor
On 2019-Sep-18, Michael Paquier wrote: > On Tue, Sep 17, 2019 at 10:50:22PM -0300, Alvaro Herrera wrote: > > On 2019-Sep-18, Michael Paquier wrote: > >> So, with the clock ticking and the release getting close by, what do > >> we do for this set of issues? REINDEX, CREATE INDEX and CLUSTER all > >> try to build indexes and the current infrastructure is not really > >> adapted to hold all that. Robert, Alvaro and Peter E, do you have any > >> comments to offer? > > > > Which part of it is not already fixed? > > I can still see at least two problems. There is one issue with > pgstat_progress_update_param() which gets called in reindex_table() > for a progress phase of CLUSTER, and this even if > REINDEXOPT_REPORT_PROGRESS is not set in the options. (You mean reindex_relation.) ... but that param update is there for CLUSTER, not for REINDEX, so if we made it dependent on the option to turn on CREATE INDEX progress updates, it would break CLUSTER progress reporting. Also, the parameter being updated is not used by CREATE INDEX, so there's no spurious change. I think this ain't broke, and thus it don't need fixin'. If anything, I would like the CLUSTER report to show index creation progress, which would go the opposite way. But that seems a patch for pg13. > Also it seems > to me that the calls to pgstat_progress_start_command() and > pgstat_progress_end_command() are at incorrect locations for > reindex_index() and that those should be one level higher on the stack > to avoid any kind of interactions with another command whose progress > has already started. That doesn't work, because the caller doesn't have the OID of the table, which pgstat_progress_start_command() needs. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] CLUSTER command progress monitor
On Tue, Sep 17, 2019 at 10:50:22PM -0300, Alvaro Herrera wrote: > On 2019-Sep-18, Michael Paquier wrote: >> So, with the clock ticking and the release getting close by, what do >> we do for this set of issues? REINDEX, CREATE INDEX and CLUSTER all >> try to build indexes and the current infrastructure is not really >> adapted to hold all that. Robert, Alvaro and Peter E, do you have any >> comments to offer? > > Which part of it is not already fixed? I can still see at least two problems. There is one issue with pgstat_progress_update_param() which gets called in reindex_table() for a progress phase of CLUSTER, and this even if REINDEXOPT_REPORT_PROGRESS is not set in the options. Also it seems to me that the calls to pgstat_progress_start_command() and pgstat_progress_end_command() are at incorrect locations for reindex_index() and that those should be one level higher on the stack to avoid any kind of interactions with another command whose progress has already started. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] CLUSTER command progress monitor
On 2019-Sep-18, Michael Paquier wrote: > So, with the clock ticking and the release getting close by, what do > we do for this set of issues? REINDEX, CREATE INDEX and CLUSTER all > try to build indexes and the current infrastructure is not really > adapted to hold all that. Robert, Alvaro and Peter E, do you have any > comments to offer? Which part of it is not already fixed? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] CLUSTER command progress monitor
On Sat, Sep 14, 2019 at 11:45:47AM +0900, Michael Paquier wrote: > I have provided a short summary of the two issues on the open item > page (https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items) as > the open item was too much evasive. Here is a copy-paste for the > archives of what I wrote: > 1) A progress may be started while another one is already in progress. > Hence, if progress gets stopped the previously-started state is > removed, causing all follow-up updates to not happen. > 2) Progress updates happening in a code path shared between those > three commands may clobber a previous state present. > > Regarding 1) and based on what I found in the code, you can blame > REINDEX reporting which has added progress_start calls in code paths > which are also taken by CREATE INDEX and CLUSTER, causing their > progress reporting to go to the void. In order to fix this one we > could do what I summarized in [1]. > > [1]: https://www.postgresql.org/message-id/20190905010316.gb14...@paquier.xyz So, with the clock ticking and the release getting close by, what do we do for this set of issues? REINDEX, CREATE INDEX and CLUSTER all try to build indexes and the current infrastructure is not really adapted to hold all that. Robert, Alvaro and Peter E, do you have any comments to offer? -- Michael signature.asc Description: PGP signature
Re: [HACKERS] CLUSTER command progress monitor
On Mon, Sep 16, 2019 at 03:26:10PM +0900, Tattsu Yama wrote: > I should have explained the API changes more. I added cmdtype as a given > parameter for all functions (See below). > Therefore, I suppose that my patch is similar to the following fix as you > mentioned on -hackers. Yes, that's an option I mentioned here, but it has drawbacks: https://www.postgresql.org/message-id/20190914024547.gb15...@paquier.xyz I have just looked at it again after a small rebase and there are issues with the design of your patch: - When aborting a transaction, we need to enforce a reset of the command ID used in st_progress_command to be PROGRESS_COMMAND_INVALID. Unfortunately, your patch does not consider the case where an error happens while a command ID is set, causing a command to still be tracked with the next transactions of the session. Even worse, it prevents pgstat_progress_start_command() to be called again in this case for another command. - CLUSTER can rebuild indexes, and we'd likely want to be able to track some of the information from CREATE INDEX for CLUSTER. The second issue is perhaps fine as it is not really straight-forward to share the same progress phases across multiple commands, and we could live without it for now, or require a follow-up patch to make the information of CREATE INDEX available to CLUSTER. Now, the first issue is of another caliber and a no-go :( On HEAD, pgstat_progress_end_command() has the limitation to not be able to stack multiple commands, so calling it in cascade has the disadvantage to perhaps erase the progress state of a command (and it is not designed for that anyway), which is what happens with CLUSTER when reindex_index() starts a new progress report, but the simplicity of the current infrastructure is very safe when it comes to failure handling, to make sure that an reset happens as long as the command ID is not invalid. Your patch makes that part unpredictable. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] CLUSTER command progress monitor
Hi Alvaro! Is this fix strictly necessary for pg12, or is this something that we can leave for pg13? Not only me but many DBA needs this progress report feature on PG12, therefore I'm trying to fix the problem. If you send other patch to fix the problem, and it is more elegant than mine, I can withdraw my patch. Anyway, I want to avoid this feature being reverted. Do you have any ideas to fix the problem? I committed a fix for the originally reported problem as da47e43dc32e in branch REL_12_STABLE. Is that insufficient, and if so why? Ooops, I misunderstood. I now realized you committed your patch to fix the problem. Thanks! I'll test it later. :) https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=da47e43dc32e3c5916396f0cbcfa974b371e4875 I tested your patch (da47e43d) and it works fine. Thanks! :) So, my patch improving progress reporting API can leave for PG13. #Test scenario === [Session #1] select * from pg_stat_progress_cluster ; \watch 0.0001 [Session #2] create table hoge as select a from generate_series(1, 10) a; create index ind_hoge1 on hoge(a); create index ind_hoge2 on hoge((a%2)); create index ind_hoge3 on hoge((a%3)); create index ind_hoge4 on hoge((a%4)); create index ind_hoge5 on hoge((a%5)); cluster hoge using ind_hoge1; === #Test result === 22283|13593|postgres|16384|CLUSTER|initializing|0|0|0|0|0|0 ... 22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|10|10|0|0|0 <= Increasing from 0 to 5 22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|10|10|0|0|1 22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|10|10|0|0|2 22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|10|10|0|0|3 22283|13593|postgres|16384|CLUSTER|rebuilding index|16387|10|10|0|0|4 22283|13593|postgres|16384|CLUSTER|performing final cleanup|16387|10|10|0|0|5 === Thanks, Tatsuro Yamada
Re: [HACKERS] CLUSTER command progress monitor
Hi Alvaro! Is this fix strictly necessary for pg12, or is this something that we can leave for pg13? Not only me but many DBA needs this progress report feature on PG12, therefore I'm trying to fix the problem. If you send other patch to fix the problem, and it is more elegant than mine, I can withdraw my patch. Anyway, I want to avoid this feature being reverted. Do you have any ideas to fix the problem? I committed a fix for the originally reported problem as da47e43dc32e in branch REL_12_STABLE. Is that insufficient, and if so why? Ooops, I misunderstood. I now realized you committed your patch to fix the problem. Thanks! I'll test it later. :) https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=da47e43dc32e3c5916396f0cbcfa974b371e4875 Thanks, Tatsuro Yamada
Re: [HACKERS] CLUSTER command progress monitor
On 2019-Sep-17, Tatsuro Yamada wrote: > On 2019/09/16 23:12, Alvaro Herrera wrote: > > Is this fix strictly necessary for pg12, or is this something that we > > can leave for pg13? > > Not only me but many DBA needs this progress report feature on PG12, > therefore I'm trying to fix the problem. If you send other patch to > fix the problem, and it is more elegant than mine, I can withdraw my patch. > Anyway, I want to avoid this feature being reverted. > Do you have any ideas to fix the problem? I committed a fix for the originally reported problem as da47e43dc32e in branch REL_12_STABLE. Is that insufficient, and if so why? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] CLUSTER command progress monitor
Hi Alvaro, On 2019/09/16 23:12, Alvaro Herrera wrote: On 2019-Sep-16, Tattsu Yama wrote: I should have explained the API changes more. I added cmdtype as a given parameter for all functions (See below). Therefore, I suppose that my patch is similar to the following fix as you mentioned on -hackers. Is this fix strictly necessary for pg12, or is this something that we can leave for pg13? Not only me but many DBA needs this progress report feature on PG12, therefore I'm trying to fix the problem. If you send other patch to fix the problem, and it is more elegant than mine, I can withdraw my patch. Anyway, I want to avoid this feature being reverted. Do you have any ideas to fix the problem? Thanks, Tatsuro Yamada
Re: [HACKERS] CLUSTER command progress monitor
On 2019-Sep-16, Tattsu Yama wrote: > I should have explained the API changes more. I added cmdtype as a given > parameter for all functions (See below). > Therefore, I suppose that my patch is similar to the following fix as you > mentioned on -hackers. Is this fix strictly necessary for pg12, or is this something that we can leave for pg13? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] CLUSTER command progress monitor
Hi Michael, >> > Attached file is WIP patch.In my patch, I added "command id" to all >> APIs of >> > progress reporting to isolate commands. Therefore, it doesn't allow to >> > cascade updating system views. And my patch is on WIP so it needs >> clean-up >> > and test. >> > I share it anyway. :) >> >> + if (cmdtype == PROGRESS_COMMAND_INVALID || >> beentry->st_progress_command == cmdtype) >> + { >> + PGSTAT_BEGIN_WRITE_ACTIVITY(beentry); >> + beentry->st_progress_param[index] = val; >> + PGSTAT_END_WRITE_ACTIVITY(beentry); >> + } >> You basically don't need the progress reports if the command ID is >> invalid, no? >> > > > Ah, right. > I'll check and fix that today. :) > > I fixed the patch based on your comment. Please find attached file. :) I should have explained the API changes more. I added cmdtype as a given parameter for all functions (See below). Therefore, I suppose that my patch is similar to the following fix as you mentioned on -hackers. - Allow only reporting for a given command ID, which would basically > require to pass down the command ID to progress update APIs and bypass an > update > if the command ID provided by caller does not match the existing one > started (?). #pgstat.c pgstat_progress_start_command(ProgressCommandType cmdtype,...) - Progress reporter starts when beentry->st_progress_command is PROGRESS_COMMAND_INVALID pgstat_progress_end_command(ProgressCommandType cmdtype,...) - Progress reporter ends when beentry->st_progress_command equals cmdtype pgstat_progress_update_param(ProgressCommandType cmdtype,...) and pgstat_progress_update_multi_param(ProgressCommandType cmdtype,...) - Progress reporter updates parameters if beentry->st_progress_command equals cmdtype Note: cmdtype means the ProgressCommandType below: # pgstat.h typedef enum ProgressCommandType { PROGRESS_COMMAND_INVALID, PROGRESS_COMMAND_VACUUM, PROGRESS_COMMAND_CLUSTER, PROGRESS_COMMAND_CREATE_INDEX } ProgressCommandType; Thanks, Tatsuro Yamada v2_fix_progress_report_for_cluster.patch Description: Binary data
Re: [HACKERS] CLUSTER command progress monitor
Hi Michael! > Attached file is WIP patch.In my patch, I added "command id" to all APIs > of > > progress reporting to isolate commands. Therefore, it doesn't allow to > > cascade updating system views. And my patch is on WIP so it needs > clean-up > > and test. > > I share it anyway. :) > > + if (cmdtype == PROGRESS_COMMAND_INVALID || > beentry->st_progress_command == cmdtype) > + { > + PGSTAT_BEGIN_WRITE_ACTIVITY(beentry); > + beentry->st_progress_param[index] = val; > + PGSTAT_END_WRITE_ACTIVITY(beentry); > + } > You basically don't need the progress reports if the command ID is > invalid, no? > Ah, right. I'll check and fix that today. :) > > Another note is that you don't actually fix the problems related to > the calls of pgstat_progress_end_command() which have been added for > REINDEX reporting, so a progress report started for CLUSTER can get > ended earlier than expected, preventing the follow-up progress updates > to show up. > > Hmm... I fixed the problem. Please confirm the test result repeated below. CLUSTER is able to get the last phase: performing final clean up by using the patch. # Test result postgres=# select * from pg_stat_progress_cluster ; \watch 0.001; 11636|13591|postgres|16384|CLUSTER|initializing|0|0|0|0|0|0 11636|13591|postgres|16384|CLUSTER|index scanning heap|16389|251|251|0|0|0 11636|13591|postgres|16384|CLUSTER|index scanning heap|16389|1|1|0|0|0 11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|0 <== The last column rebuild_index_count is increasing! 11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|1 11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|2 11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|3 11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|4 11636|13591|postgres|16384|CLUSTER|performing final cleanup|16389|1|1|0|0|5 <== The last phase of CLUSTER! Thanks, Tatsuro Yamada
Re: [HACKERS] CLUSTER command progress monitor
On Sat, Sep 14, 2019 at 01:06:32PM +0900, Tattsu Yama wrote: > Thanks! I can review your patch for fix it. > However, I was starting fixing the problem from the last day of PGConf.Asia > (11 Sep). > Attached file is WIP patch.In my patch, I added "command id" to all APIs of > progress reporting to isolate commands. Therefore, it doesn't allow to > cascade updating system views. And my patch is on WIP so it needs clean-up > and test. > I share it anyway. :) + if (cmdtype == PROGRESS_COMMAND_INVALID || beentry->st_progress_command == cmdtype) + { + PGSTAT_BEGIN_WRITE_ACTIVITY(beentry); + beentry->st_progress_param[index] = val; + PGSTAT_END_WRITE_ACTIVITY(beentry); + } You basically don't need the progress reports if the command ID is invalid, no? Another note is that you don't actually fix the problems related to the calls of pgstat_progress_end_command() which have been added for REINDEX reporting, so a progress report started for CLUSTER can get ended earlier than expected, preventing the follow-up progress updates to show up. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] CLUSTER command progress monitor
Hi Alvaro! > Hello Tatsuro, > On 2019-Aug-13, Tatsuro Yamada wrote: > > On 2019/08/02 3:43, Alvaro Herrera wrote: > > > Hmm, I'm trying this out now and I don't see the index_rebuild_count > > > ever go up. I think it's because the indexes are built using parallel > > > index build ... or maybe it was the table AM changes that moved things > > > around, not sure. There's a period at the end when the CLUSTER command > > > keeps working, but it's gone from pg_stat_progress_cluster. > > > > Thanks for your report. > > I'll investigate it. :) > I have fixed it. Can you please verify? > Thanks! I can review your patch for fix it. However, I was starting fixing the problem from the last day of PGConf.Asia (11 Sep). Attached file is WIP patch.In my patch, I added "command id" to all APIs of progress reporting to isolate commands. Therefore, it doesn't allow to cascade updating system views. And my patch is on WIP so it needs clean-up and test. I share it anyway. :) Here is a test result of my patch. The last column index_rebuild count is increased. postgres=# select * from pg_stat_progress_cluster ; \watch 0.001; 11636|13591|postgres|16384|CLUSTER|initializing|0|0|0|0|0|0 11636|13591|postgres|16384|CLUSTER|index scanning heap|16389|251|251|0|0|0 ... 11636|13591|postgres|16384|CLUSTER|index scanning heap|16389|1|1|0|0|0 11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|0... 11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|1 ... 11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|2 ... 11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|3 ... 11636|13591|postgres|16384|CLUSTER|rebuilding index|16389|1|1|0|0|4 ... 11636|13591|postgres|16384|CLUSTER|performing final cleanup|16389|1|1|0|0|5 Thanks, Tatsuro Yamada v1_fix_progress_report_for_cluster.patch Description: Binary data
Re: [HACKERS] CLUSTER command progress monitor
On Fri, Sep 13, 2019 at 12:48:40PM -0400, Robert Haas wrote: > On Fri, Sep 13, 2019 at 12:03 PM Alvaro Herrera > wrote: >> Ummm ... I've been operating --in this thread-- under the assumption >> that it is REINDEX to blame for this problem, not CREATE INDEX, because >> my recollection is that I tested CREATE INDEX together with CLUSTER and >> it worked fine. Has anybody done any actual research that the problem >> is to blame on CREATE INDEX and not REINDEX? > > I am not sure. I think, though, that the point is that all three > commands rebuild indexes. So unless they all expect the same things in > terms of which counters get set during that process, things will not > work correctly. I have provided a short summary of the two issues on the open item page (https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items) as the open item was too much evasive. Here is a copy-paste for the archives of what I wrote: 1) A progress may be started while another one is already in progress. Hence, if progress gets stopped the previously-started state is removed, causing all follow-up updates to not happen. 2) Progress updates happening in a code path shared between those three commands may clobber a previous state present. Regarding 1) and based on what I found in the code, you can blame REINDEX reporting which has added progress_start calls in code paths which are also taken by CREATE INDEX and CLUSTER, causing their progress reporting to go to the void. In order to fix this one we could do what I summarized in [1]. As mentioned by Robert, the problem summarized in 2) is much more complex using the current infrastructure, and one could blame all the commands per the way they do not share the same set of progress phases. There are a couple of potential solutions which have been discussed on the thread: - Allow commands to share the same set of phases, which requires some kind of mapping between the phases (?). - Allow progress reports to become a stack. This would also take care of any kind of issues in 1) for the future, and this can cause the incorrect command to be reported to pg_stat_activity if not being careful. - Allow only reporting for a given command ID, which would basically require to pass down the command ID to progress update APIs and bypass an update if the command ID provided by caller does not match the existing one started (?). 1) is pretty easy to fix based on the current state of the code, 2) requires much more consideration, and that's no material for v12. It could be perfectly possible as well that we have another solution not discussed yet on this thread. [1]: https://www.postgresql.org/message-id/20190905010316.gb14...@paquier.xyz -- Michael signature.asc Description: PGP signature
Re: [HACKERS] CLUSTER command progress monitor
On 2019-Sep-13, Robert Haas wrote: > On Fri, Sep 13, 2019 at 2:49 AM Michael Paquier wrote: > > On Fri, Sep 06, 2019 at 08:10:58AM -0400, Robert Haas wrote: > > > It's fine if things are updated as well -- it's just you need to make > > > sure that those places know whether or not they are supposed to be > > > doing those updates. Again, why can't we just pass down a value > > > telling them "do reindex-style progress updates" or "do cluster-style > > > progress updates" or "do no progress updates"? > > > > That's invasive. CREATE INDEX reporting goes pretty deep into the > > tree, with steps dedicated to the builds and scans of btree (nbtsort.c > > for example) and hash index AMs. > Well, if CREATE INDEX progress reporting can't be reasonably done > within the current infrastructure, then it should be reverted for v12 > and not committed again until somebody upgrades the infrastructure. Ummm ... I've been operating --in this thread-- under the assumption that it is REINDEX to blame for this problem, not CREATE INDEX, because my recollection is that I tested CREATE INDEX together with CLUSTER and it worked fine. Has anybody done any actual research that the problem is to blame on CREATE INDEX and not REINDEX? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] CLUSTER command progress monitor
Hello Tatsuro, On 2019-Aug-13, Tatsuro Yamada wrote: > On 2019/08/02 3:43, Alvaro Herrera wrote: > > Hmm, I'm trying this out now and I don't see the index_rebuild_count > > ever go up. I think it's because the indexes are built using parallel > > index build ... or maybe it was the table AM changes that moved things > > around, not sure. There's a period at the end when the CLUSTER command > > keeps working, but it's gone from pg_stat_progress_cluster. > > Thanks for your report. > I'll investigate it. :) I have fixed it. Can you please verify? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] CLUSTER command progress monitor
On Fri, Sep 13, 2019 at 12:03 PM Alvaro Herrera wrote: > Ummm ... I've been operating --in this thread-- under the assumption > that it is REINDEX to blame for this problem, not CREATE INDEX, because > my recollection is that I tested CREATE INDEX together with CLUSTER and > it worked fine. Has anybody done any actual research that the problem > is to blame on CREATE INDEX and not REINDEX? I am not sure. I think, though, that the point is that all three commands rebuild indexes. So unless they all expect the same things in terms of which counters get set during that process, things will not work correctly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLUSTER command progress monitor
On Fri, Sep 13, 2019 at 2:49 AM Michael Paquier wrote: > On Fri, Sep 06, 2019 at 08:10:58AM -0400, Robert Haas wrote: > > It's fine if things are updated as well -- it's just you need to make > > sure that those places know whether or not they are supposed to be > > doing those updates. Again, why can't we just pass down a value > > telling them "do reindex-style progress updates" or "do cluster-style > > progress updates" or "do no progress updates"? > > That's invasive. CREATE INDEX reporting goes pretty deep into the > tree, with steps dedicated to the builds and scans of btree (nbtsort.c > for example) and hash index AMs. In this case we have something that > does somewhat what you are looking for with report_progress which gets > set to true only for VACUUM. If we were to do something like that, we > would also need to keep some sort of mapping regarding which command > ID (as defined by ProgressCommandType) is able to use which set of > parameter flags (as defined by the arguments of > pgstat_progress_update_param() and its multi_* cousin). Then comes > the issue that some parameters may be used by multiple command types, > while other don't (REINDEX and CREATE INDEX have some shared > mapping). Well, if CREATE INDEX progress reporting can't be reasonably done within the current infrastructure, then it should be reverted for v12 and not committed again until somebody upgrades the infrastructure. I admit that I was a bit suspicious about that commit, but I figured Alvaro knew what he was doing and didn't study it in any depth. And perhaps he did know what he was doing and will disagree with your assessment. But so far I haven't heard an idea for evolving the infrastructure that sounds more than half-baked. It's generally clear, though, that the existing infrastructure is not well-suited to progress reporting for code that bounces all over the tree. It's not clear that *any* infrastructure is *entirely* well-suited to do that; such problems are inherently complex. But this is just a very simple system which was designed to be simple and very low cost to use, and it may be that it's been stretched outside of its comfort zone. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLUSTER command progress monitor
On Fri, Sep 06, 2019 at 08:10:58AM -0400, Robert Haas wrote: > It's fine if things are updated as well -- it's just you need to make > sure that those places know whether or not they are supposed to be > doing those updates. Again, why can't we just pass down a value > telling them "do reindex-style progress updates" or "do cluster-style > progress updates" or "do no progress updates"? That's invasive. CREATE INDEX reporting goes pretty deep into the tree, with steps dedicated to the builds and scans of btree (nbtsort.c for example) and hash index AMs. In this case we have something that does somewhat what you are looking for with report_progress which gets set to true only for VACUUM. If we were to do something like that, we would also need to keep some sort of mapping regarding which command ID (as defined by ProgressCommandType) is able to use which set of parameter flags (as defined by the arguments of pgstat_progress_update_param() and its multi_* cousin). Then comes the issue that some parameters may be used by multiple command types, while other don't (REINDEX and CREATE INDEX have some shared mapping). -- Michael signature.asc Description: PGP signature
Re: [HACKERS] CLUSTER command progress monitor
On Fri, Sep 6, 2019 at 5:11 AM Robert Haas wrote: > On Fri, Sep 6, 2019 at 1:44 AM Michael Paquier wrote: > > I don't see exactly why we could not switch to a fixed number of > > slots, say 8, with one code path to start a progress which adds an > > extra report on the stack, one to remove one entry from the stack, and > > a new one to reset the whole thing for a backend. This would not need > > much restructuration of course. > > You could do that, but I don't think it's probably that great of an > idea. Now you've built something which is significantly more complex > than the original design of this feature, but still not good enough to > report on the progress of a query tree. I tend to think we should > confine ourselves to the progress reporting that can reasonably be > done within the current infrastructure until somebody invents a really > general mechanism that can handle, essentially, an EXPLAIN-on-the-fly > of a current query tree. +1. Let's not complicate the progress reporting infrastructure for an uncertain benefit. CLUSTER/VACUUM FULL is fundamentally an awkward utility command to target with progress reporting infrastructure. I think that it's okay to redefine how progress reporting works with CLUSTER now, in order to fix the REINDEX/CLUSTER state clobbering bug. -- Peter Geoghegan
Re: [HACKERS] CLUSTER command progress monitor
On Fri, Sep 06, 2019 at 10:27:02AM -0400, Alvaro Herrera from 2ndQuadrant wrote: > That said, I did spend some time on this type of issue when doing CREATE > INDEX support; you can tell because I defined the columns for block > numbers in a scan separately from CREATE INDEX specific fields, > precisely to avoid multiple commands running concurrently from > clobbering unrelated columns: > > /* Block numbers in a generic relation scan */ > #define PROGRESS_SCAN_BLOCKS_TOTAL15 > #define PROGRESS_SCAN_BLOCKS_DONE 16 Hm. It is not really clear what is the intention by looking at the contents progress.h. > I would say that it's fairly useful to have CLUSTER report progress on > indexes being created underneath, but I understand that it might be too > late to be designing the CLUSTER report to take advantage of the CREATE > INDEX metrics. The same can be said about the reporting done in reindex_relation for PROGRESS_CLUSTER_INDEX_REBUILD_COUNT. I think that it should be removed for now. > I think a workable, not terribly invasive approach is to have REINDEX > process its commands conditionally: have the caller indicate whether > progress is to be reported, and skip the calls if not. That would > (should) prevent it from clobbering the state set up by CLUSTER. So, you would basically add an extra flag in the options of reindex_index() to decide if a progress report should be started or not? I am not a fan of that because it does not take care of the root issue which is that the start of the progress reports is too much internal. I think that it would be actually less error prone to move the start of the progress reporting for REINDEX out of reindex_index() and start it at a higher level. Looking again at the code, I would recommend that we should start the progress in ReindexIndex() before calling reindex_index(), ReindexMultipleTables() before calling reindex_relation() and ReindexRelationConcurrently(), and ReindexTable() before calling reindex_relation(). That will avoid each command to clobber each other's in-progress reports. It would be also very good to document clearly how the overlaps for the progress parameter values are not happening. For example for CREATE INDEX, we don't know why 1, 2 and 7 are not used. Note that there is an ID collision with PROGRESS_CREATEIDX_INDEX_OID updated in reindex_index() and the CLUSTER part PROGRESS_CLUSTER_HEAP_BLKS_SCANNED.. There could be an argument to use a completely different range of IDs for each command phase to avoid any overlap, but it is scary to consider that we may not have found all the issues with one command cloberring another one's state.. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] CLUSTER command progress monitor
On 2019-Sep-06, Michael Paquier wrote: > Finally comes the question of what do we do for v12? I am adding in > CC Peter, Alvaro being already present, who have been involved in the > commits with CREATE INDEX and REINDEX. It would be sad to revert a > this feature, but well I'd rather do that now than regret later > releasing the feature as it is currently shaped.. Let's see what the > others think. As far as I understand, CREATE INDEX is not affected -- only REINDEX is. Of course, it would be sad to revert even the latter, but it's not as bleak as reverting the whole thing. That said, I did spend some time on this type of issue when doing CREATE INDEX support; you can tell because I defined the columns for block numbers in a scan separately from CREATE INDEX specific fields, precisely to avoid multiple commands running concurrently from clobbering unrelated columns: /* Block numbers in a generic relation scan */ #define PROGRESS_SCAN_BLOCKS_TOTAL 15 #define PROGRESS_SCAN_BLOCKS_DONE 16 I would say that it's fairly useful to have CLUSTER report progress on indexes being created underneath, but I understand that it might be too late to be designing the CLUSTER report to take advantage of the CREATE INDEX metrics. I think a workable, not terribly invasive approach is to have REINDEX process its commands conditionally: have the caller indicate whether progress is to be reported, and skip the calls if not. That would (should) prevent it from clobbering the state set up by CLUSTER. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] CLUSTER command progress monitor
On Fri, Sep 6, 2019 at 1:44 AM Michael Paquier wrote: > One idea I got was to pass the command ID as an extra argument of the > update routine. I am not completely sure either if we need this level > of complication. I still don't think that's the right approach. > > Those are just weaknesses of the infrastructure. Perhaps there is a > > better solution, but there's no intrinsic reason that we can't avoid > > them by careful coding. > > Perhaps. The current infra allows the addition of a progress report > in code paths which are isolated from other things. For CLUSTER, most > things are fine as long as the progress is updated in cluster_rel(), > the rest is too internal. It's fine if things are updated as well -- it's just you need to make sure that those places know whether or not they are supposed to be doing those updates. Again, why can't we just pass down a value telling them "do reindex-style progress updates" or "do cluster-style progress updates" or "do no progress updates"? > > Well, it might be OK to do that if we're clear that this is the index > > progress-reporting view and the command is CLUSTER but it happens to > > be building an index now so we're showing it here. But I don't see > > how it would work anyway: you can't reported cascading progress > > reports in shared memory because you've only got a fixed amount of > > space. > > I don't see exactly why we could not switch to a fixed number of > slots, say 8, with one code path to start a progress which adds an > extra report on the stack, one to remove one entry from the stack, and > a new one to reset the whole thing for a backend. This would not need > much restructuration of course. You could do that, but I don't think it's probably that great of an idea. Now you've built something which is significantly more complex than the original design of this feature, but still not good enough to report on the progress of a query tree. I tend to think we should confine ourselves to the progress reporting that can reasonably be done within the current infrastructure until somebody invents a really general mechanism that can handle, essentially, an EXPLAIN-on-the-fly of a current query tree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLUSTER command progress monitor
On Fri, Sep 06, 2019 at 02:44:18PM +0900, Michael Paquier wrote: > I don't see exactly why we could not switch to a fixed number of > slots, say 8, with one code path to start a progress which adds an > extra report on the stack, one to remove one entry from the stack, and > a new one to reset the whole thing for a backend. This would not need > much restructuration of course. Wake up, Neo. Your last sentence is confusing. I meant that this would need more design efforts, so that's not in scope for v12. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] CLUSTER command progress monitor
On Thu, Sep 05, 2019 at 03:17:51PM -0400, Robert Haas wrote: > Oops. Yeah, that's bogus (as are some of the other things you > mention). I think we're going to have to fix this by passing down > some flags to these functions to tell them what kind of progress > updates to do (or to do none). Or else pass down a callback function > and a context object, but that seems like it might be overkill. One idea I got was to pass the command ID as an extra argument of the update routine. I am not completely sure either if we need this level of complication. > Those are just weaknesses of the infrastructure. Perhaps there is a > better solution, but there's no intrinsic reason that we can't avoid > them by careful coding. Perhaps. The current infra allows the addition of a progress report in code paths which are isolated from other things. For CLUSTER, most things are fine as long as the progress is updated in cluster_rel(), the rest is too internal. > Well, it might be OK to do that if we're clear that this is the index > progress-reporting view and the command is CLUSTER but it happens to > be building an index now so we're showing it here. But I don't see > how it would work anyway: you can't reported cascading progress > reports in shared memory because you've only got a fixed amount of > space. I don't see exactly why we could not switch to a fixed number of slots, say 8, with one code path to start a progress which adds an extra report on the stack, one to remove one entry from the stack, and a new one to reset the whole thing for a backend. This would not need much restructuration of course. Finally comes the question of what do we do for v12? I am adding in CC Peter, Alvaro being already present, who have been involved in the commits with CREATE INDEX and REINDEX. It would be sad to revert a this feature, but well I'd rather do that now than regret later releasing the feature as it is currently shaped.. Let's see what the others think. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] CLUSTER command progress monitor
On Wed, Sep 4, 2019 at 9:03 PM Michael Paquier wrote: > For CLUSTER, the progress starts and ends in cluster_rel(). CLUSTER > uses its code paths at the beginning, but then things get more > complicated, particularly with finish_heap_swap() which calls directly > reindex_table(). 6f97457 includes one progress update at point which > can be a problem per its shared nature in reindex_relation() with > PROGRESS_CLUSTER_INDEX_REBUILD_COUNT. This last part is wrong IMO, > why should cluster reporting take priority in this code path, > enforcing anything else? Oops. Yeah, that's bogus (as are some of the other things you mention). I think we're going to have to fix this by passing down some flags to these functions to tell them what kind of progress updates to do (or to do none). Or else pass down a callback function and a context object, but that seems like it might be overkill. > On top of those issues, I see some problems with the current state of > affairs, and I am repeating myself: > - It is possible that pgstat_progress_update_param() is called for a > given command for a code path taken by a completely different > command, and that we have a real risk of showing a status completely > buggy as the progress phases share the same numbers. > - We don't consider wisely end and start progress handling for > cascading calls, leading to a risk where we start command A, but > for shared code paths where we assume that only command B can run then > the processing abruptly ends for command A. Those are just weaknesses of the infrastructure. Perhaps there is a better solution, but there's no intrinsic reason that we can't avoid them by careful coding. > - Is it actually fine to report information about a command completely > different than the one provided by the client? It is for example > possible to call CLUSTER, but show up to the user progress report > about PROGRESS_COMMAND_CREATE_INDEX (see reindex_index). This could > actually make sense if we are able to handle cascading progress > reports. Well, it might be OK to do that if we're clear that this is the index progress-reporting view and the command is CLUSTER but it happens to be building an index now so we're showing it here. But I don't see how it would work anyway: you can't reported cascading progress reports in shared memory because you've only got a fixed amount of space. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLUSTER command progress monitor
On Wed, Sep 04, 2019 at 09:18:39AM -0400, Robert Haas wrote: > I think this is all going in the wrong direction. It's the > responsibility of the people who are calling the pgstat_progress_* > functions to only do so when it's appropriate. Having the > pgstat_progress_* functions try to untangle whether or not they ought > to ignore calls made to them is backwards. Check. > It seems to me that the problem here can be summarized in this way: > there's a bunch of code reuse in the relevant paths here, and somebody > wasn't careful enough when adding progress reporting for one of the > new commands, and so now things are broken, because e.g. CLUSTER > progress reporting gets ended by a pgstat_progress_end_command() call > that was intended for some other utility command but is reached in the > CLUSTER case anyway. The solution to that problem in my book is to > figure out which commit broke it, and then the person who made that > commit either needs to fix it or revert it. I am not sure that it is right as well to say that the first committed patch is right and that the follow-up ones are wrong. CLUSTER progress was committed first (6f97457), followed a couple of days after by CREATE INDEX (ab0dfc9) and then REINDEX (03f9e5c). So let's have a look at them.. For CLUSTER, the progress starts and ends in cluster_rel(). CLUSTER uses its code paths at the beginning, but then things get more complicated, particularly with finish_heap_swap() which calls directly reindex_table(). 6f97457 includes one progress update at point which can be a problem per its shared nature in reindex_relation() with PROGRESS_CLUSTER_INDEX_REBUILD_COUNT. This last part is wrong IMO, why should cluster reporting take priority in this code path, enforcing anything else? For CREATE INDEX, the progress reporting starts and ends once in DefineIndex(). However, we have updates of progress within each index AM build routine, which could be taken by many code paths. Is it actually fine to give priority to CREATE INDEX in those cases? Those paths can as well be taken by REINDEX or CLUSTER (right?), so having a counter for CREATE INDEX looks logically wrong to me. The part where we wait for snapshots looks actually good from the perspective of REINDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY. For REINDEX, we have a problematic start progress call in reindex_index() which is for example called by reindex_relation for each relation's index for a non-concurrent case (also in ReindexRelationConcurrently()). I think that these are incorrect locations, and I would have placed them in ReindexIndex(), ReindexTable() and ReindexMultipleTables() so as we avoid anything low-level. This has added two calls to pgstat_progress_update_param() in reindex_index(), which is shared between all. Why would it be fine to give the priority to a CREATE INDEX marker here if CLUSTER can also cross this way? On top of those issues, I see some problems with the current state of affairs, and I am repeating myself: - It is possible that pgstat_progress_update_param() is called for a given command for a code path taken by a completely different command, and that we have a real risk of showing a status completely buggy as the progress phases share the same numbers. - We don't consider wisely end and start progress handling for cascading calls, leading to a risk where we start command A, but for shared code paths where we assume that only command B can run then the processing abruptly ends for command A. - Is it actually fine to report information about a command completely different than the one provided by the client? It is for example possible to call CLUSTER, but show up to the user progress report about PROGRESS_COMMAND_CREATE_INDEX (see reindex_index). This could actually make sense if we are able to handle cascading progress reports. These are, at least it seems to me, fundamental problems we need to ponder more about if we begin to include more progress reporting, and we don't have that now, and that worries me. > It's quite possible here that we need a bigger redesign to make adding > progress reporting for new command easier and less prone to bugs, but > I don't think it's at all clear what such a redesign should look like > or even that we definitely need one, and September is not the right > time to be redesigning features for the pending release. Definitely. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] CLUSTER command progress monitor
On Tue, Sep 3, 2019 at 1:02 AM Masahiko Sawada wrote: > After more thought, even if we don't start a new command progress when > there is another one already started the progress update functions > could be called and these functions don't specify the command type. > Therefore, the progress information might be changed with wrong value > by different command. Probably we can change the caller of progress > updating function so that it doesn't call all of them if the command > could not start a new progress report, but it might be a big change. > > As an alternative idea, we can make pgstat_progress_end_command() have > one argument that is command the caller wants to end. That is, we > don't end the command progress when the specified command type doesn't > match to the command type of current running command progress. We > unconditionally clear the progress information at CommitTransaction() > and AbortTransaction() but we can do that by passing > PROGRESS_COMMAND_INVALID to pgstat_progress_end_command(). I think this is all going in the wrong direction. It's the responsibility of the people who are calling the pgstat_progress_* functions to only do so when it's appropriate. Having the pgstat_progress_* functions try to untangle whether or not they ought to ignore calls made to them is backwards. It seems to me that the problem here can be summarized in this way: there's a bunch of code reuse in the relevant paths here, and somebody wasn't careful enough when adding progress reporting for one of the new commands, and so now things are broken, because e.g. CLUSTER progress reporting gets ended by a pgstat_progress_end_command() call that was intended for some other utility command but is reached in the CLUSTER case anyway. The solution to that problem in my book is to figure out which commit broke it, and then the person who made that commit either needs to fix it or revert it. It's quite possible here that we need a bigger redesign to make adding progress reporting for new command easier and less prone to bugs, but I don't think it's at all clear what such a redesign should look like or even that we definitely need one, and September is not the right time to be redesigning features for the pending release. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLUSTER command progress monitor
On Wed, Sep 4, 2019 at 3:48 PM Michael Paquier wrote: > > On Tue, Sep 03, 2019 at 02:52:28PM +0900, Michael Paquier wrote: > > Indeed, good catch. This is wrong since b6fb647 which has introduced > > the progress reports. I'll fix that one and back-patch if there are > > no objections. > > OK, applied this part down to 9.6. Thank you! Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] CLUSTER command progress monitor
On Tue, Sep 03, 2019 at 02:52:28PM +0900, Michael Paquier wrote: > Indeed, good catch. This is wrong since b6fb647 which has introduced > the progress reports. I'll fix that one and back-patch if there are > no objections. OK, applied this part down to 9.6. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] CLUSTER command progress monitor
On Tue, Sep 03, 2019 at 01:59:00PM +0900, Masahiko Sawada wrote: > After more thought, even if we don't start a new command progress when > there is another one already started the progress update functions > could be called and these functions don't specify the command type. > Therefore, the progress information might be changed with wrong value > by different command. Probably we can change the caller of progress > updating function so that it doesn't call all of them if the command > could not start a new progress report, but it might be a big change. That's one issue. > As an alternative idea, we can make pgstat_progress_end_command() have > one argument that is command the caller wants to end. That is, we > don't end the command progress when the specified command type doesn't > match to the command type of current running command progress. We > unconditionally clear the progress information at CommitTransaction() > and AbortTransaction() but we can do that by passing > PROGRESS_COMMAND_INVALID to pgstat_progress_end_command(). Possibly. I don't dislike the idea of piling up the progress information for cascading calls and I would use that with a depth counter and a fixed-size array. > BTW the following condition in pgstat_progress_end_command() seems to > be wrong. We should return from the function when either > pgstat_track_activities is disabled or the current progress command is > invalid. > >if (!pgstat_track_activities > && beentry->st_progress_command == PROGRESS_COMMAND_INVALID) > return; > > I've attached the patch fixes the issue I newly found. Indeed, good catch. This is wrong since b6fb647 which has introduced the progress reports. I'll fix that one and back-patch if there are no objections. With my RMT hat on for v12, I don't think that it is really the moment to discuss how we want to change this API post beta3, and we have room for that in future development cycles. There are quite some questions which need answers I am unsure of: - Do we really want to support nested calls of progress reports for multiple command? - Perhaps for some commands it makes sense to have an overlap of the fields used, but we need a clear definition of what can be done or not. I am not really comfortable with the idea of having in reindex_relation() a progress report related only to CLUSTER, which is also a REINDEX code path. The semantics shared between both commands need to be thought a bit more. For example PROGRESS_CLUSTER_INDEX_REBUILD_COUNT could cause the system catalog to report PROGRESS_CREATEIDX_PHASE_WAIT_3 because of an incorrect command type, which would be just wrong for a CLUSTER command. - Which command should be reported to the user, only the upper-level one? - Perhaps we can live only with the approach of not registering a new command if one already exists, and actually be more flexible with the phase fields used, in short we use unique numbers for each phase? The most conservative bet from a release point of view, and actually my bet because that's safe, would be to basically revert 6f97457 (CLUSTER), 03f9e5c (REINDEX) and ab0dfc96 (CREATE INDEX which has overlaps with REINDEX in the build and validation paths). What I am really scared of is that we have just barely scratched the surface of the issues caused by the inter-dependencies between all the code paths of those commands, and that we have much more waiting behind this open item. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] CLUSTER command progress monitor
On Mon, Sep 2, 2019 at 6:32 PM Masahiko Sawada wrote: > > On Mon, Sep 2, 2019 at 4:59 PM Michael Paquier wrote: > > > > On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote: > > > > I tried 1. and it shown index_rebuild_count, but it also shown > > > > "initializing" phase again and other columns were empty. So, it is > > > > not suitable to fix the problem. :( > > > > I'm going to try 2. and 3., but, unfortunately, I can't get enough > > > > time to do that after PGConf.Asia 2019. > > > > If we selected 3., it affects following these progress reporting > > > > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay, > > > > I suppose. Any comments welcome! :) > > > > > > I looked at this open item. I prefer #3 but I think it would be enough > > > to have a small stack using a fixed length array to track nested > > > progress information and the current executing command (maybe 4 or 8 > > > would be enough for maximum nested level for now?). That way, we don't > > > need any change the interfaces. AFAICS there is no case where we call > > > only either pgstat_progress_start_command or > > > pgstat_progress_end_command without each other (although > > > pgstat_progress_update_param is called without start). So I think that > > > having a small stack for tracking multiple reports would work. > > > > > > Attached the draft patch that fixes this issue. Please review it. > > > > Do we actually want to show to the user information about CREATE INDEX > > which is different than CLUSTER? It could be confusing for the user > > to see a progress report from a command different than the one > > actually launched. > > I personally think it would be helpful for users. We provide the > progress reporting for each commands but it might not be detailed > enough. But we can provide more details of progress information of > each commands by combining them. Only users who want to confirm the > details need to see different progress reports. > > > There could be a method 4 here: do not start a new > > command progress when there is another one already started, and do not > > try to end it in the code path where it could not be started as it did > > not stack. So while I see the advantages of stacking the progress > > records as you do when doing cascading calls of the progress > > reporting, I am not sure that: > > 1) We should bloat more PgBackendStatus for that. > > 2) We want to add more complication in this area close to the > > release of 12. > > I agreed, especially to 2. We can live with #4 method in PG12 and the > patch I proposed could be discussed as a new feature. After more thought, even if we don't start a new command progress when there is another one already started the progress update functions could be called and these functions don't specify the command type. Therefore, the progress information might be changed with wrong value by different command. Probably we can change the caller of progress updating function so that it doesn't call all of them if the command could not start a new progress report, but it might be a big change. As an alternative idea, we can make pgstat_progress_end_command() have one argument that is command the caller wants to end. That is, we don't end the command progress when the specified command type doesn't match to the command type of current running command progress. We unconditionally clear the progress information at CommitTransaction() and AbortTransaction() but we can do that by passing PROGRESS_COMMAND_INVALID to pgstat_progress_end_command(). BTW the following condition in pgstat_progress_end_command() seems to be wrong. We should return from the function when either pgstat_track_activities is disabled or the current progress command is invalid. if (!pgstat_track_activities && beentry->st_progress_command == PROGRESS_COMMAND_INVALID) return; I've attached the patch fixes the issue I newly found. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center fix_progress_end_command.patch Description: Binary data
Re: [HACKERS] CLUSTER command progress monitor
On Mon, Sep 2, 2019 at 4:59 PM Michael Paquier wrote: > > On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote: > > > I tried 1. and it shown index_rebuild_count, but it also shown > > > "initializing" phase again and other columns were empty. So, it is > > > not suitable to fix the problem. :( > > > I'm going to try 2. and 3., but, unfortunately, I can't get enough > > > time to do that after PGConf.Asia 2019. > > > If we selected 3., it affects following these progress reporting > > > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay, > > > I suppose. Any comments welcome! :) > > > > I looked at this open item. I prefer #3 but I think it would be enough > > to have a small stack using a fixed length array to track nested > > progress information and the current executing command (maybe 4 or 8 > > would be enough for maximum nested level for now?). That way, we don't > > need any change the interfaces. AFAICS there is no case where we call > > only either pgstat_progress_start_command or > > pgstat_progress_end_command without each other (although > > pgstat_progress_update_param is called without start). So I think that > > having a small stack for tracking multiple reports would work. > > > > Attached the draft patch that fixes this issue. Please review it. > > Do we actually want to show to the user information about CREATE INDEX > which is different than CLUSTER? It could be confusing for the user > to see a progress report from a command different than the one > actually launched. I personally think it would be helpful for users. We provide the progress reporting for each commands but it might not be detailed enough. But we can provide more details of progress information of each commands by combining them. Only users who want to confirm the details need to see different progress reports. > There could be a method 4 here: do not start a new > command progress when there is another one already started, and do not > try to end it in the code path where it could not be started as it did > not stack. So while I see the advantages of stacking the progress > records as you do when doing cascading calls of the progress > reporting, I am not sure that: > 1) We should bloat more PgBackendStatus for that. > 2) We want to add more complication in this area close to the > release of 12. I agreed, especially to 2. We can live with #4 method in PG12 and the patch I proposed could be discussed as a new feature. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
Re: [HACKERS] CLUSTER command progress monitor
On Fri, Aug 30, 2019 at 07:45:57PM +0900, Masahiko Sawada wrote: > > I tried 1. and it shown index_rebuild_count, but it also shown > > "initializing" phase again and other columns were empty. So, it is > > not suitable to fix the problem. :( > > I'm going to try 2. and 3., but, unfortunately, I can't get enough > > time to do that after PGConf.Asia 2019. > > If we selected 3., it affects following these progress reporting > > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay, > > I suppose. Any comments welcome! :) > > I looked at this open item. I prefer #3 but I think it would be enough > to have a small stack using a fixed length array to track nested > progress information and the current executing command (maybe 4 or 8 > would be enough for maximum nested level for now?). That way, we don't > need any change the interfaces. AFAICS there is no case where we call > only either pgstat_progress_start_command or > pgstat_progress_end_command without each other (although > pgstat_progress_update_param is called without start). So I think that > having a small stack for tracking multiple reports would work. > > Attached the draft patch that fixes this issue. Please review it. Do we actually want to show to the user information about CREATE INDEX which is different than CLUSTER? It could be confusing for the user to see a progress report from a command different than the one actually launched. There could be a method 4 here: do not start a new command progress when there is another one already started, and do not try to end it in the code path where it could not be started as it did not stack. So while I see the advantages of stacking the progress records as you do when doing cascading calls of the progress reporting, I am not sure that: 1) We should bloat more PgBackendStatus for that. 2) We want to add more complication in this area close to the release of 12. Another solution as mentioned by Yamada-san is just to start again the progress for the cluster command in reindex_relation() however you got an issue here as reindex_relation() is also called by REINDEX TABLE. I find actually very weird that we have a cluster-related field added for REINDEX, so it seems to me that all the interactions between the code paths of CLUSTER and REINDEX have not been completely thought through. This part has been added by 6f97457 :( -- Michael signature.asc Description: PGP signature
Re: [HACKERS] CLUSTER command progress monitor
On Thu, Aug 15, 2019 at 12:48 PM Tatsuro Yamada wrote: > > Hi Michael, Alvaro and Robert! > > On 2019/08/14 11:52, Michael Paquier wrote: > > On Wed, Aug 14, 2019 at 11:38:01AM +0900, Tatsuro Yamada wrote: > >> On 2019/08/13 14:40, Tatsuro Yamada wrote: > >>> On 2019/08/02 3:43, Alvaro Herrera wrote: > Hmm, I'm trying this out now and I don't see the index_rebuild_count > ever go up. I think it's because the indexes are built using parallel > index build ... or maybe it was the table AM changes that moved things > around, not sure. There's a period at the end when the CLUSTER command > keeps working, but it's gone from pg_stat_progress_cluster. > >>> > >>> Thanks for your report. > >>> I'll investigate it. :) > >> > >> I did "git bisect" and found the commit: > >> > >> 03f9e5cba0ee1633af4abe734504df50af46fbd8 > >> Report progress of REINDEX operations > > > > I am adding an open item for this one. > > -- > > Michael > > > Okay, I checked it on the wiki. > >https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items >- index_rebuild_count in CLUSTER reporting never increments > > To be clear, 03f9e5cb broke CLUSTER progress reporting, but > I investigated little more and share my ideas to fix the problem. > > * Call stack > > cluster_rel >pgstat_progress_start_command(CLUSTER) *A1 >rebuild_relation > finish_heap_swap >reindex_relation > reindex_index >pgstat_progress_start_command(CREATE_INDEX) *B1 >pgstat_progress_end_command() *B2 > pgstat_progress_update_param(INDEX_REBUILD_COUNT, i) <- failed :( >pgstat_progress_end_command() *A2 > >Note > These are sets: >A1 and A2, >B1 and B2 > > > > * Ideas to fix >There are Three options, I guess. > >1. Call pgstat_progress_start_command(CLUSTER) again > before pgstat_progress_update_param(INDEX_REBUILD_COUNT, i). > >2. Add "save and restore" functions for the following two > variables of MyBeentry in pgstat.c. > - st_progress_command > - st_progress_command_target > >3. Use Hash or List to store multiple values for the two > variables in pgstat.c. > > > > I tried 1. and it shown index_rebuild_count, but it also shown > "initializing" phase again and other columns were empty. So, it is > not suitable to fix the problem. :( > I'm going to try 2. and 3., but, unfortunately, I can't get enough > time to do that after PGConf.Asia 2019. > If we selected 3., it affects following these progress reporting > features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay, > I suppose. Any comments welcome! :) I looked at this open item. I prefer #3 but I think it would be enough to have a small stack using a fixed length array to track nested progress information and the current executing command (maybe 4 or 8 would be enough for maximum nested level for now?). That way, we don't need any change the interfaces. AFAICS there is no case where we call only either pgstat_progress_start_command or pgstat_progress_end_command without each other (although pgstat_progress_update_param is called without start). So I think that having a small stack for tracking multiple reports would work. Attached the draft patch that fixes this issue. Please review it. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c index d362e7f7d7..99e4844e6c 100644 --- a/src/backend/postmaster/pgstat.c +++ b/src/backend/postmaster/pgstat.c @@ -3016,8 +3016,10 @@ pgstat_bestart(void) #endif lbeentry.st_state = STATE_UNDEFINED; - lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID; - lbeentry.st_progress_command_target = InvalidOid; + MemSet(&(lbeentry.st_progress_cmds), 0, + sizeof(PgBackendProgressInfo) * PGSTAT_MAX_PROGRESS_INFO); + /* Set invalid command index */ + lbeentry.st_current_cmd = -1; /* * we don't zero st_progress_param here to save cycles; nobody should @@ -3203,10 +3205,22 @@ pgstat_progress_start_command(ProgressCommandType cmdtype, Oid relid) if (!beentry || !pgstat_track_activities) return; + Assert(beentry->st_current_cmd >= -1); + + /* The given command is already started */ + if (beentry->st_progress_cmds[beentry->st_current_cmd].command == cmdtype) + return; + + /* The progress information queue is full */ + if (beentry->st_current_cmd >= PGSTAT_MAX_PROGRESS_INFO - 1) + elog(ERROR, "progress information per backends is full"); + PGSTAT_BEGIN_WRITE_ACTIVITY(beentry); - beentry->st_progress_command = cmdtype; - beentry->st_progress_command_target = relid; - MemSet(&beentry->st_progress_param, 0, sizeof(beentry->st_progress_param)); + beentry->st_current_cmd++; + MemSet(&(beentry->st_pr
Re: [HACKERS] CLUSTER command progress monitor
Hi Michael, Alvaro and Robert! On 2019/08/14 11:52, Michael Paquier wrote: On Wed, Aug 14, 2019 at 11:38:01AM +0900, Tatsuro Yamada wrote: On 2019/08/13 14:40, Tatsuro Yamada wrote: On 2019/08/02 3:43, Alvaro Herrera wrote: Hmm, I'm trying this out now and I don't see the index_rebuild_count ever go up. I think it's because the indexes are built using parallel index build ... or maybe it was the table AM changes that moved things around, not sure. There's a period at the end when the CLUSTER command keeps working, but it's gone from pg_stat_progress_cluster. Thanks for your report. I'll investigate it. :) I did "git bisect" and found the commit: 03f9e5cba0ee1633af4abe734504df50af46fbd8 Report progress of REINDEX operations I am adding an open item for this one. -- Michael Okay, I checked it on the wiki. https://wiki.postgresql.org/wiki/PostgreSQL_12_Open_Items - index_rebuild_count in CLUSTER reporting never increments To be clear, 03f9e5cb broke CLUSTER progress reporting, but I investigated little more and share my ideas to fix the problem. * Call stack cluster_rel pgstat_progress_start_command(CLUSTER) *A1 rebuild_relation finish_heap_swap reindex_relation reindex_index pgstat_progress_start_command(CREATE_INDEX) *B1 pgstat_progress_end_command() *B2 pgstat_progress_update_param(INDEX_REBUILD_COUNT, i) <- failed :( pgstat_progress_end_command() *A2 Note These are sets: A1 and A2, B1 and B2 * Ideas to fix There are Three options, I guess. 1. Call pgstat_progress_start_command(CLUSTER) again before pgstat_progress_update_param(INDEX_REBUILD_COUNT, i). 2. Add "save and restore" functions for the following two variables of MyBeentry in pgstat.c. - st_progress_command - st_progress_command_target 3. Use Hash or List to store multiple values for the two variables in pgstat.c. I tried 1. and it shown index_rebuild_count, but it also shown "initializing" phase again and other columns were empty. So, it is not suitable to fix the problem. :( I'm going to try 2. and 3., but, unfortunately, I can't get enough time to do that after PGConf.Asia 2019. If we selected 3., it affects following these progress reporting features: VACUUM, CLUSTER, CREATE_INDEX and ANALYZE. But it's okay, I suppose. Any comments welcome! :) Thanks, Tatsuro Yamada
Re: [HACKERS] CLUSTER command progress monitor
On Wed, Aug 14, 2019 at 11:38:01AM +0900, Tatsuro Yamada wrote: > On 2019/08/13 14:40, Tatsuro Yamada wrote: >> On 2019/08/02 3:43, Alvaro Herrera wrote: >>> Hmm, I'm trying this out now and I don't see the index_rebuild_count >>> ever go up. I think it's because the indexes are built using parallel >>> index build ... or maybe it was the table AM changes that moved things >>> around, not sure. There's a period at the end when the CLUSTER command >>> keeps working, but it's gone from pg_stat_progress_cluster. >> >> Thanks for your report. >> I'll investigate it. :) > > I did "git bisect" and found the commit: > > 03f9e5cba0ee1633af4abe734504df50af46fbd8 > Report progress of REINDEX operations I am adding an open item for this one. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] CLUSTER command progress monitor
Hi Alvaro and All, On 2019/08/13 14:40, Tatsuro Yamada wrote: Hi Alvaro! On 2019/08/02 3:43, Alvaro Herrera wrote: Hmm, I'm trying this out now and I don't see the index_rebuild_count ever go up. I think it's because the indexes are built using parallel index build ... or maybe it was the table AM changes that moved things around, not sure. There's a period at the end when the CLUSTER command keeps working, but it's gone from pg_stat_progress_cluster. Thanks for your report. I'll investigate it. :) I did "git bisect" and found the commit: 03f9e5cba0ee1633af4abe734504df50af46fbd8 Report progress of REINDEX operations In src/backend/catalog/index.c, CLUSTER progress reporting increases index_rebuild_count in reindex_relation() by pgstat_progress_update_param(). However, reindex_relation() calls reindex_index(), and REINDEX progress reporting is existing on the latter function, and it starts pgstat_progress_start_command() pgstat_progress_end_command() for REINDEX progress reporting. Therefore, CLUSTER progress reporting failed to update index_rebuild_count because it made a mistake to update the REINDEX's view, I think. My Idea to fix that is following: - Add a target view name parameter to Progress monitor's API For example: pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT, i). pgstat_progress_update_param(*PROGRESS_CLUSTER_VIEW*, PROGRESS_CLUSTER_INDEX_REBUILD_COUNT, i). However, I'm not sure whether it is able or not because I haven't read the code of the API yet. What do you think about that? :) Thanks, Tatsuro Yamada
Re: [HACKERS] CLUSTER command progress monitor
Hi Alvaro! On 2019/08/02 3:43, Alvaro Herrera wrote: Hmm, I'm trying this out now and I don't see the index_rebuild_count ever go up. I think it's because the indexes are built using parallel index build ... or maybe it was the table AM changes that moved things around, not sure. There's a period at the end when the CLUSTER command keeps working, but it's gone from pg_stat_progress_cluster. Thanks for your report. I'll investigate it. :) Thanks, Tatsuro Yamada
Re: [HACKERS] CLUSTER command progress monitor
Hmm, I'm trying this out now and I don't see the index_rebuild_count ever go up. I think it's because the indexes are built using parallel index build ... or maybe it was the table AM changes that moved things around, not sure. There's a period at the end when the CLUSTER command keeps working, but it's gone from pg_stat_progress_cluster. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] CLUSTER command progress monitor
On Tue, Mar 26, 2019 at 10:04:48AM +0900, Tatsuro Yamada wrote: > Hope this feature will help DBA and user. :) Congrats, Yamada-san. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] CLUSTER command progress monitor
Hi Robert and Reviewers! On 2019/03/26 0:02, Robert Haas wrote: On Sun, Mar 24, 2019 at 9:02 PM Tatsuro Yamada wrote: Please find attached files. :) Committed. Thanks! Thank you! Hope this feature will help DBA and user. :) Regards, Tatsuro Yamada NTT Open Source Software Center
Re: [HACKERS] CLUSTER command progress monitor
On Sun, Mar 24, 2019 at 9:02 PM Tatsuro Yamada wrote: > Please find attached files. :) Committed. Thanks! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLUSTER command progress monitor
Hi Robert! On 2019/03/23 3:31, Robert Haas wrote: On Tue, Mar 19, 2019 at 2:47 PM Robert Haas wrote: how close you were getting to rewriting the entire heap. This is the one thing I found but did not fix; any chance you could make this change and update the documentation to match? Hi, is anybody working on this? I sent this email using my personal email address: yamatattsu@gmail-. I re-send it with the patch and my test result. Thank you so much for reviewing the patch and sorry for the late reply. Today, I realized that you sent the email for the patch because I took a sick leave from work for a while. So, I created new patch based on your comments asap. I hope it is acceptable to you. :) Changes - Add new column *heap_tuples_written* in the view This column is updated when the phases are "seq scanning heap", "index scanning heap" or "writing new heap". - Fix document - Revised the patch on the current head: 940311e4bb32a5fe99155052e41179c88b5d48af. Please find attached files. :) Regards, Tatsuro Yamada diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index ac2721c..26a6899 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -340,7 +340,15 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser pg_stat_progress_vacuumpg_stat_progress_vacuum One row for each backend (including autovacuum worker processes) running VACUUM, showing current progress. - See . + See . + + + + + pg_stat_progress_clusterpg_stat_progress_cluster + One row for each backend running + CLUSTER or VACUUM FULL, showing current progress. + See . @@ -3394,9 +3402,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, PostgreSQL has the ability to report the progress of - certain commands during command execution. Currently, the only command - which supports progress reporting is VACUUM. This may be - expanded in the future. + certain commands during command execution. Currently, the only commands + which support progress reporting are VACUUM and + CLUSTER. This may be expanded in the future. @@ -3408,9 +3416,11 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, one row for each backend (including autovacuum worker processes) that is currently vacuuming. The tables below describe the information that will be reported and provide information about how to interpret it. - Progress reporting is not currently supported for VACUUM FULL - and backends running VACUUM FULL will not be listed in this - view. + Progress for VACUUM FULL commands is reported via + pg_stat_progress_cluster + because both VACUUM FULL and CLUSTER + rewrite the table, while regular VACUUM only modifies it + in place. See . @@ -3588,6 +3598,183 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, + + + CLUSTER Progress Reporting + + + Whenever CLUSTER or VACUUM FULL is + running, the pg_stat_progress_cluster view will + contain a row for each backend that is currently running either command. + The tables below describe the information that will be reported and + provide information about how to interpret it. + + + + pg_stat_progress_cluster View + + + + Column + Type + Description + + + + + + pid + integer + Process ID of backend. + + + datid + oid + OID of the database to which this backend is connected. + + + datname + name + Name of the database to which this backend is connected. + + + relid + oid + OID of the table being clustered. + + + command + text + + The command that is running. Either CLUSTER or VACUUM FULL. + + + + phase + text + + Current processing phase. See . + + + + cluster_index_relid + bigint + + If the table is being scanned using an index, this is the OID of the + index being used; otherwise, it is zero. + + + + heap_tuples_scanned + bigint + + Number of heap tuples scanned. + This counter only advances when the phase is + seq scanning heap, + index scanning heap + or writing new heap. + + + + heap_tuples_written + bigint + + Number of heap tuples written. + This counter only advances when the phase is + seq scanning heap, + index scanning heap + or writing new heap. + + + + heap_blks_total + bigint + + Total number of heap blocks in the table. This number is reported + as of the beginning of seq scanning heap. + + + + heap_blks_scanned + bigint + + Number of heap blocks scanned. This counter only advances when the + phase is seq scanning heap. +
Re: [HACKERS] CLUSTER command progress monitor
Hi Robert! >On Tue, Mar 19, 2019 at 2:47 PM Robert Haas wrote: >> how close you were getting to rewriting the entire heap. This is the >> one thing I found but did not fix; any chance you could make this >> change and update the documentation to match? > > >Hi, is anybody working on this? Thank you so much for reviewing the patch and sorry for the late reply. Today, I realized that you sent the email for the patch because I took a sick leave from work for a while. So, I created new patch based on your comments asap. I hope it is acceptable to you. :) Please find attached file. Changes - Add new column *heap_tuples_written* in the view This column is updated when the phases are "seq scanning heap", "index scanning heap" or "writing new heap". - Fix document - Revised the patch on 280a408b48d5ee42969f981bceb9e9426c3a344c Regards, Tatsuro Yamada progress_monitor_for_cluster_command_v12.patch Description: Binary data
Re: [HACKERS] CLUSTER command progress monitor
On Tue, Mar 19, 2019 at 2:47 PM Robert Haas wrote: > how close you were getting to rewriting the entire heap. This is the > one thing I found but did not fix; any chance you could make this > change and update the documentation to match? Hi, is anybody working on this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLUSTER command progress monitor
On Mon, Mar 18, 2019 at 10:03 PM Tatsuro Yamada wrote: > Attached patch is a rebased document patch. :) Attached is an updated patch. I went through this patch carefully today, in the hopes of committing it, and I think the attached version is pretty closet to being committable, but there's at least one open issue remaining, as described below. - The regression tests did not pass because expected/rules.out was not properly updated. I fixed that. - The documentation did not build because some tags were not properly terminated e.g. rather than . I also fixed that. - The documentation had two nearly-identical lists of phases. I merged them into one. There might be room for some further fine-tuning here. - cluster_rel() had multiple places where it could return without calling pgstat_progress_end_command(). I fixed that. - cluster_rel() inexplicably delayed updating PROGRESS_CLUSTER_COMMAND for longer than seems necessary. I fixed that. - copy_heap_data() zeroed out the heap-tuples-scanned, heap-blocks-scanned, and total-heap-blocks counters when it began PROGRESS_CLUSTER_PHASE_SORT_TUPLES and PROGRESS_CLUSTER_PHASE_SWAP_REL_FILES. This seems like throwing away useful information for no good reason. I changed it not to do that in all cases except the one mentioned in the next paragraph. - It *is* currently to reset PROGRESS_CLUSTER_HEAP_TUPLES_SCANNED because that counter gets reused to indicate the number of heap tuples *written back out*, but I think that is bad design for two reasons. First, the documentation does not explain that sometimes the number of heap tuples scanned is really reporting the number of heap tuples written. Second, it's bad for columns to have misleading names. Third, it would actually be really useful to store these values in separate columns, because then you could expect that the number tuples written would eventually equal the number scanned, and you'd still have the number that were scanned around so that you could clearly see how close you were getting to rewriting the entire heap. This is the one thing I found but did not fix; any chance you could make this change and update the documentation to match? - The comment about reporting that we are now reindexing relations was jammed in between an existing comment and the associated code. I moved it to a more logical place. - The new if-statement in pg_stat_get_progress_info was missing a space required by project style. I added the space. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company progress_monitor_for_cluster_command_v11.patch Description: Binary data
Re: [HACKERS] CLUSTER command progress monitor
At Tue, 19 Mar 2019 11:02:57 +0900, Tatsuro Yamada wrote in > On 2019/03/19 10:43, Tatsuro Yamada wrote: > > Hi Rafia! > > On 2019/03/18 20:42, Rafia Sabih wrote: > >> On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada > >> wrote: > >>> Attached file is rebased patch on current HEAD. > >>> I changed a status. :) > >>> > >>> > >> Looks like the patch needs a rebase. > >> I was on the commit fb5806533f9fe0433290d84c9b019399cd69e9c2 > >> > >> PFA reject file in case you want to have a look. > > Thanks for testing it. :) > > I rebased the patch on the current head: > > f2004f19ed9c9228d3ea2b12379ccb4b9212641f. > > Please find attached file. > > Also, I share my test case of progress monitor below. > > > Attached patch is a rebased document patch. :) The monitor view has four columns: heap_tuples_scanned : used while the "seq scan" and "index scan" phases. heap_blks_total, heap_blks_scanned : used only while the "seq scan" phase. index_rebuild_count: : used only while the "rebuilding index" phase. Couldn't we change the view like the following? .. | phase| heap tuples scanned | progress indicator | value ..-++-+-+ .. | seq scan ..| 2134214| heap blocks pct | 23.5% .. | index sc ..| 5453395| (null) | (null) .. | sorting ..| | (null) | (null) .. | swapping ..| | (null) | (null) .. | rebuildi ..| | index count | 2 .. | performi ..| | (null) | (null) Only seq scan phase has two kind of progress indicator so if we choose "heap blks pct" as the only indicator for the phase, it could be simplified as: .. | phase | progress indicator | value ..-+-+-+ .. | seq scan .. | heap blocks pct | 23.5% .. | index sc .. | heap tuples scanned | 2398457 .. | sorting .. | (null) | (null) .. | swapping .. | (null) | (null) .. | rebuildi .. | index count | 2 .. | performi .. | (null) | (null) A downside of the view is that it looks quite differently from pg_stat_progress_vacuum. Since I'm not sure it's a good design, feel free to oppose/reject this. finish_heap_swap is also called in matview code path but the function doesn't seem to detect that situation. Is it right behavior? (I didn't confirm what happens when it is called from matview refresh path, though.) regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: [HACKERS] CLUSTER command progress monitor
On 2019/03/19 10:43, Tatsuro Yamada wrote: Hi Rafia! On 2019/03/18 20:42, Rafia Sabih wrote: On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada wrote: Attached file is rebased patch on current HEAD. I changed a status. :) Looks like the patch needs a rebase. I was on the commit fb5806533f9fe0433290d84c9b019399cd69e9c2 PFA reject file in case you want to have a look. Thanks for testing it. :) I rebased the patch on the current head: f2004f19ed9c9228d3ea2b12379ccb4b9212641f. Please find attached file. Also, I share my test case of progress monitor below. Attached patch is a rebased document patch. :) Thanks, Tatsuro Yamada diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index ac2721c8ad..79d98bb601 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -344,6 +344,14 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + pg_stat_progress_clusterpg_stat_progress_cluster + One row for each backend running + CLUSTER or VACUUM FULL, showing current progress. + See . + + + @@ -3394,9 +3402,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, PostgreSQL has the ability to report the progress of - certain commands during command execution. Currently, the only command - which supports progress reporting is VACUUM. This may be - expanded in the future. + certain commands during command execution. Currently, the only commands which + support progress reporting are VACUUM and + CLUSTER. This may be expanded in the future. @@ -3408,9 +3416,10 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, one row for each backend (including autovacuum worker processes) that is currently vacuuming. The tables below describe the information that will be reported and provide information about how to interpret it. - Progress reporting is not currently supported for VACUUM FULL - and backends running VACUUM FULL will not be listed in this - view. + Running VACUUM FULL is listed in pg_stat_progress_cluster + because both VACUUM FULL and CLUSTER + rewrite the table, while regular VACUUM only modifies it + in place. See . @@ -3587,6 +3596,218 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, + + + + CLUSTER Progress Reporting + + + Whenever CLUSTER is running, the + pg_stat_progress_cluster view will contain + a row for each backend that is currently running CLUSTER or VACUUM FULL. + The tables below describe the information that will be reported and + provide information about how to interpret it. + + + + pg_stat_progress_cluster View + + + + Column + Type + Description + + + + + + pid + integer + Process ID of backend. + + + datid + oid + OID of the database to which this backend is connected. + + + datname + name + Name of the database to which this backend is connected. + + + relid + oid + OID of the table being clustered. + + + command + text + + The command that is running. Either CLUSTER or VACUUM FULL. + + + + phase + text + + Current processing phase. See or . + + + + cluster_index_relid + bigint + + If the table is being scanned using an index, this is the OID of the + index being used; otherwise, it is zero. + + + + heap_tuples_scanned + bigint + + Number of heap tuples scanned. + This counter only advances when the phase is seq scanning heap, + index scanning heap and writing new heap. + + + + heap_blks_total + bigint + + Total number of heap blocks in the table. This number is reported + as of the beginning of seq scanning heap. + + + + heap_blks_scanned + bigint + + Number of heap blocks scanned. + This counter only advances when the phase is seq scanning heap. + + + + index_rebuild_count + bigint + + Number of rebuilded indexes. + This counter only advances when the phase is rebuilding index. + + + + + + + + CLUSTER phases + + + + Phase + Description + + + + + + initializing + + CLUSTER is preparing to begin scanning the heap. This + phase is expected to be very brief. + + + + seq scanning heap + + CLUSTER is currently scanning heap from the table by + seq scan. + + + + index scanning heap + + CLUSTER is currently scanning heap from the table by + index scan. + + + + sorting tuples + + CLUSTER is currently sorting tuples. + + + + swapping relation files + + CLUSTER is currently
Re: [HACKERS] CLUSTER command progress monitor
Hi Rafia! On 2019/03/18 20:42, Rafia Sabih wrote: On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada wrote: Attached file is rebased patch on current HEAD. I changed a status. :) Looks like the patch needs a rebase. I was on the commit fb5806533f9fe0433290d84c9b019399cd69e9c2 PFA reject file in case you want to have a look. Thanks for testing it. :) I rebased the patch on the current head: f2004f19ed9c9228d3ea2b12379ccb4b9212641f. Please find attached file. Also, I share my test case of progress monitor below. === My test case === [Terminal1] Run this query on psql: \a \t select * from pg_stat_progress_cluster; \watch 0.05 [Terminal2] Run these queries on psql: drop table t1; create table t1 as select a, random() * 1000 as b from generate_series(0, 99) a; create index idx_t1 on t1(a); create index idx_t1_b on t1(b); analyze t1; -- index scan set enable_seqscan to off; cluster verbose t1 using idx_t1; -- seq scan set enable_seqscan to on; set enable_indexscan to off; cluster verbose t1 using idx_t1; -- only given table name to cluster command cluster verbose t1; -- only cluster command cluster verbose; -- vacuum full vacuum full t1; -- vacuum full vacuum full; Regards, Tatsuro Yamada diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index c339a2bb77..8a634dd57e 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -52,6 +52,7 @@ #include "catalog/storage.h" #include "commands/tablecmds.h" #include "commands/event_trigger.h" +#include "commands/progress.h" #include "commands/trigger.h" #include "executor/executor.h" #include "miscadmin.h" @@ -59,6 +60,7 @@ #include "nodes/nodeFuncs.h" #include "optimizer/optimizer.h" #include "parser/parser.h" +#include "pgstat.h" #include "rewrite/rewriteManip.h" #include "storage/bufmgr.h" #include "storage/lmgr.h" @@ -3851,6 +3853,7 @@ reindex_relation(Oid relid, int flags, int options) List *indexIds; bool is_pg_class; bool result; + int i; /* * Open and lock the relation. ShareLock is sufficient since we only need @@ -3938,6 +3941,7 @@ reindex_relation(Oid relid, int flags, int options) /* Reindex all the indexes. */ doneIndexes = NIL; + i = 1; foreach(indexId, indexIds) { Oid indexOid = lfirst_oid(indexId); @@ -3955,6 +3959,11 @@ reindex_relation(Oid relid, int flags, int options) if (is_pg_class) doneIndexes = lappend_oid(doneIndexes, indexOid); + + /* Set index rebuild count */ + pgstat_progress_update_param(PROGRESS_CLUSTER_INDEX_REBUILD_COUNT, + i); + i++; } } PG_CATCH(); diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index d962648bc5..87c0092787 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -907,6 +907,32 @@ CREATE VIEW pg_stat_progress_vacuum AS FROM pg_stat_get_progress_info('VACUUM') AS S LEFT JOIN pg_database D ON S.datid = D.oid; +CREATE VIEW pg_stat_progress_cluster AS +SELECT +S.pid AS pid, +S.datid AS datid, +D.datname AS datname, +S.relid AS relid, +CASE S.param1 WHEN 1 THEN 'CLUSTER' + WHEN 2 THEN 'VACUUM FULL' + END AS command, +CASE S.param2 WHEN 0 THEN 'initializing' + WHEN 1 THEN 'seq scanning heap' + WHEN 2 THEN 'index scanning heap' + WHEN 3 THEN 'sorting tuples' + WHEN 4 THEN 'writing new heap' + WHEN 5 THEN 'swapping relation files' + WHEN 6 THEN 'rebuilding index' + WHEN 7 THEN 'performing final cleanup' + END AS phase, +S.param3 AS cluster_index_relid, +S.param4 AS heap_tuples_scanned, +S.param5 AS heap_blks_total, +S.param6 AS heap_blks_scanned, +S.param7 AS index_rebuild_count +FROM pg_stat_get_progress_info('CLUSTER') AS S +LEFT JOIN pg_database D ON S.datid = D.oid; + CREATE VIEW pg_user_mappings AS SELECT U.oid AS umid, diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c index 3e2a807640..478894c869 100644 --- a/src/backend/commands/cluster.c +++ b/src/backend/commands/cluster.c @@ -36,10 +36,12 @@ #include "catalog/objectaccess.h" #include "catalog/toasting.h" #include "commands/cluster.h" +#include "commands/progress.h" #include "commands/tablecmds.h" #include "commands/vacuum.h" #include "miscadmin.h" #include "optimizer/optimizer.h" +#include "pgstat.h" #include "storage/bufmgr.h" #include "storage/lmgr.h" #include "storage/predicate.h" @@ -276,6 +278,8 @@ cluster_rel(Oid tableOid, Oid indexOid, int options) /* Check for user-requested abort. */ CHECK_FOR_INTERRUPTS(); + pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid); + /* * We grab exclusive access t
Re: [HACKERS] CLUSTER command progress monitor
On Fri, 8 Mar 2019 at 09:14, Tatsuro Yamada wrote: > > On 2019/03/06 15:38, Tatsuro Yamada wrote: > > On 2019/03/05 17:56, Tatsuro Yamada wrote: > >> On 2019/03/05 11:35, Robert Haas wrote: > >>> On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada > >>> wrote: > === Current design === > > CLUSTER command uses Index Scan or Seq Scan when scanning the heap. > Depending on which one is chosen, the command will proceed in the > following sequence of phases: > > * Scan method: Seq Scan > 0. initializing (*2) > 1. seq scanning heap(*1) > 3. sorting tuples (*2) > 4. writing new heap (*1) > 5. swapping relation files (*2) > 6. rebuilding index (*2) > 7. performing final cleanup (*2) > > * Scan method: Index Scan > 0. initializing (*2) > 2. index scanning heap (*1) > 5. swapping relation files (*2) > 6. rebuilding index (*2) > 7. performing final cleanup (*2) > > VACUUM FULL command will proceed in the following sequence of phases: > > 1. seq scanning heap(*1) > 5. swapping relation files (*2) > 6. rebuilding index (*2) > 7. performing final cleanup (*2) > > (*1): increasing the value in heap_tuples_scanned column > (*2): only shows the phase in the phase column > >>> > >>> All of that sounds good. > >>> > The view provides the information of CLUSTER command progress details as > follows > # \d pg_stat_progress_cluster > View "pg_catalog.pg_stat_progress_cluster" > Column | Type | Collation | Nullable | Default > ---+-+---+--+- > pid | integer | | | > datid | oid | | | > datname | name| | | > relid | oid | | | > command | text| | | > phase | text| | | > cluster_index_relid | bigint | | | > heap_tuples_scanned | bigint | | | > heap_tuples_vacuumed | bigint | | | > >>> > >>> Still not sure if we need heap_tuples_vacuumed. We could try to > >>> report heap_blks_scanned and heap_blks_total like we do for VACUUM, if > >>> we're using a Seq Scan. > >> > >> I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that > >> in > >> next patch. > >> > >> Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able > >> to > >> get those from initscan(). I'll investigate it more. > >> > >> cluster.c > >>copy_heap_data() > >> heap_beginscan() > >>heap_beginscan_internal() > >> initscan() > >> > >> > >> > === Discussion points === > > - Progress counter for "3. sorting tuples" phase > - Should we add pgstat_progress_update_param() in tuplesort.c like > a > "trace_sort"? > Thanks to Peter Geoghegan for the useful advice! > >>> > >>> How would we avoid an abstraction violation? > >> > >> Hmm... What do you mean an abstraction violation? > >> If it is difficult to solve, I'd not like to add the progress counter for > >> the sorting tuples. > >> > >> > - Progress counter for "6. rebuilding index" phase > - Should we add "index_vacuum_count" in the view like a vacuum > progress monitor? > If yes, I'll add pgstat_progress_update_param() to > reindex_relation() of index.c. > However, I'm not sure whether it is okay or not. > >>> > >>> Doesn't seem unreasonable to me. > >> > >> I see, I'll add it later. > > > > > > Attached file is revised and WIP patch including: > > > >- Remove heap_tuples_vacuumed > >- Add heap_blks_scanned and heap_blks_total > >- Add index_vacuum_count > > > > I tried to "add heap_blks_scanned and heap_blks_total" columns and I > > realized that > > "heap_tuples_scanned" column is suitable as a counter when a scan method is > > both index-scan and seq-scan because CLUSTER is on a tuple basis. > > > Attached file is rebased patch on current HEAD. > I changed a status. :) > > Looks like the patch needs a rebase. I was on the commit fb5806533f9fe0433290d84c9b019399cd69e9c2 PFA reject file in case you want to have a look. > Regards, > Tatsuro Yamada > > > -- Regards, Rafia Sabih --- src/backend/commands/cluster.c +++ src/backend/commands/cluster.c @@ -942,14 +960,33 @@ copy_heap_data(Oid OIDNewHeap, Oid OIDOld
Re: [HACKERS] CLUSTER command progress monitor
On 2019/03/06 15:38, Tatsuro Yamada wrote: On 2019/03/05 17:56, Tatsuro Yamada wrote: On 2019/03/05 11:35, Robert Haas wrote: On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada wrote: === Current design === CLUSTER command uses Index Scan or Seq Scan when scanning the heap. Depending on which one is chosen, the command will proceed in the following sequence of phases: * Scan method: Seq Scan 0. initializing (*2) 1. seq scanning heap (*1) 3. sorting tuples (*2) 4. writing new heap (*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) * Scan method: Index Scan 0. initializing (*2) 2. index scanning heap (*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) VACUUM FULL command will proceed in the following sequence of phases: 1. seq scanning heap (*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) (*1): increasing the value in heap_tuples_scanned column (*2): only shows the phase in the phase column All of that sounds good. The view provides the information of CLUSTER command progress details as follows # \d pg_stat_progress_cluster View "pg_catalog.pg_stat_progress_cluster" Column | Type | Collation | Nullable | Default ---+-+---+--+- pid | integer | | | datid | oid | | | datname | name | | | relid | oid | | | command | text | | | phase | text | | | cluster_index_relid | bigint | | | heap_tuples_scanned | bigint | | | heap_tuples_vacuumed | bigint | | | Still not sure if we need heap_tuples_vacuumed. We could try to report heap_blks_scanned and heap_blks_total like we do for VACUUM, if we're using a Seq Scan. I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that in next patch. Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able to get those from initscan(). I'll investigate it more. cluster.c copy_heap_data() heap_beginscan() heap_beginscan_internal() initscan() === Discussion points === - Progress counter for "3. sorting tuples" phase - Should we add pgstat_progress_update_param() in tuplesort.c like a "trace_sort"? Thanks to Peter Geoghegan for the useful advice! How would we avoid an abstraction violation? Hmm... What do you mean an abstraction violation? If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples. - Progress counter for "6. rebuilding index" phase - Should we add "index_vacuum_count" in the view like a vacuum progress monitor? If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c. However, I'm not sure whether it is okay or not. Doesn't seem unreasonable to me. I see, I'll add it later. Attached file is revised and WIP patch including: - Remove heap_tuples_vacuumed - Add heap_blks_scanned and heap_blks_total - Add index_vacuum_count I tried to "add heap_blks_scanned and heap_blks_total" columns and I realized that "heap_tuples_scanned" column is suitable as a counter when a scan method is both index-scan and seq-scan because CLUSTER is on a tuple basis. Attached file is rebased patch on current HEAD. I changed a status. :) Regards, Tatsuro Yamada diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index 1ee1ed2894..acda12bf52 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -51,6 +51,7 @@ #include "catalog/storage.h" #include "commands/tablecmds.h" #include "commands/event_trigger.h" +#include "commands/progress.h" #include "commands/trigger.h" #include "executor/executor.h" #include "miscadmin.h" @@ -58,6 +59,7 @@ #include "nodes/nodeFuncs.h" #include "optimizer/optimizer.h" #include "parser/parser.h" +#include "pgstat.h" #include "rewrite/rewriteManip.h" #include "storage/bufmgr.h" #include "storage/lmgr.h" @@ -3851,6 +3853,7 @@ reindex_relation(Oid relid, int flags, int options) List *indexIds; bool is_pg_class; bool result; + int i; /* * Open and lock the relation. ShareLock is sufficient since we only need @@ -3938,6 +3941,7 @@ reindex_relation(Oid relid, int flags, int options) /* Reindex all the indexes. */ doneIndexes = NIL; + i = 1; foreach
Re: [HACKERS] CLUSTER command progress monitor
On 2019-Mar-06, Robert Haas wrote: > On Tue, Mar 5, 2019 at 8:03 PM Alvaro Herrera > wrote: > > One, err, small issue with that idea is that we need the param numbers > > not to conflict for any "progress update providers" that are to be used > > simultaneously by any command. > > Is that really an issue? I think progress reporting -- at least with > the current infrastructure -- is only ever going to be possible for > utility commands, not queries. And those really shouldn't have very > many sorts going on at once. Well, I don't think it is, but I thought it was worth pointing out explicitly. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] CLUSTER command progress monitor
On Tue, Mar 5, 2019 at 8:03 PM Alvaro Herrera wrote: > One, err, small issue with that idea is that we need the param numbers > not to conflict for any "progress update providers" that are to be used > simultaneously by any command. Is that really an issue? I think progress reporting -- at least with the current infrastructure -- is only ever going to be possible for utility commands, not queries. And those really shouldn't have very many sorts going on at once. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLUSTER command progress monitor
On 2019/03/05 17:56, Tatsuro Yamada wrote: Hi Robert! On 2019/03/05 11:35, Robert Haas wrote: On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada wrote: === Current design === CLUSTER command uses Index Scan or Seq Scan when scanning the heap. Depending on which one is chosen, the command will proceed in the following sequence of phases: * Scan method: Seq Scan 0. initializing (*2) 1. seq scanning heap (*1) 3. sorting tuples (*2) 4. writing new heap (*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) * Scan method: Index Scan 0. initializing (*2) 2. index scanning heap (*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) VACUUM FULL command will proceed in the following sequence of phases: 1. seq scanning heap (*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) (*1): increasing the value in heap_tuples_scanned column (*2): only shows the phase in the phase column All of that sounds good. The view provides the information of CLUSTER command progress details as follows # \d pg_stat_progress_cluster View "pg_catalog.pg_stat_progress_cluster" Column | Type | Collation | Nullable | Default ---+-+---+--+- pid | integer | | | datid | oid | | | datname | name | | | relid | oid | | | command | text | | | phase | text | | | cluster_index_relid | bigint | | | heap_tuples_scanned | bigint | | | heap_tuples_vacuumed | bigint | | | Still not sure if we need heap_tuples_vacuumed. We could try to report heap_blks_scanned and heap_blks_total like we do for VACUUM, if we're using a Seq Scan. I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that in next patch. Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able to get those from initscan(). I'll investigate it more. cluster.c copy_heap_data() heap_beginscan() heap_beginscan_internal() initscan() === Discussion points === - Progress counter for "3. sorting tuples" phase - Should we add pgstat_progress_update_param() in tuplesort.c like a "trace_sort"? Thanks to Peter Geoghegan for the useful advice! How would we avoid an abstraction violation? Hmm... What do you mean an abstraction violation? If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples. - Progress counter for "6. rebuilding index" phase - Should we add "index_vacuum_count" in the view like a vacuum progress monitor? If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c. However, I'm not sure whether it is okay or not. Doesn't seem unreasonable to me. I see, I'll add it later. Attached file is revised and WIP patch including: - Remove heap_tuples_vacuumed - Add heap_blks_scanned and heap_blks_total - Add index_vacuum_count I tried to "add heap_blks_scanned and heap_blks_total" columns and I realized that "heap_tuples_scanned" column is suitable as a counter when a scan method is both index-scan and seq-scan because CLUSTER is on a tuple basis. Regards, Tatsuro Yamada diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c index d16c3d0ea5..a88e8d2492 100644 --- a/src/backend/catalog/index.c +++ b/src/backend/catalog/index.c @@ -51,6 +51,7 @@ #include "catalog/storage.h" #include "commands/tablecmds.h" #include "commands/event_trigger.h" +#include "commands/progress.h" #include "commands/trigger.h" #include "executor/executor.h" #include "miscadmin.h" @@ -58,6 +59,7 @@ #include "nodes/nodeFuncs.h" #include "optimizer/optimizer.h" #include "parser/parser.h" +#include "pgstat.h" #include "rewrite/rewriteManip.h" #include "storage/bufmgr.h" #include "storage/lmgr.h" @@ -3850,6 +3852,7 @@ reindex_relation(Oid relid, int flags, int options) List *indexIds; bool is_pg_class; bool result; + int i; /* * Open and lock the relation. ShareLock is sufficient since we only need @@ -3937,6 +3940,7 @@ reindex_relation(Oid relid, int flags, int options) /* Reindex all the indexes. */ doneIndexes = NIL; + i = 1; foreach(indexId, indexIds) { Oid indexOid = lfirst_oid(indexId); @@ -3954,6 +3958,11 @@ reindex_relation(Oi
Re: [HACKERS] CLUSTER command progress monitor
On 2019/03/06 1:13, Robert Haas wrote: On Tue, Mar 5, 2019 at 3:56 AM Tatsuro Yamada wrote: === Discussion points === - Progress counter for "3. sorting tuples" phase - Should we add pgstat_progress_update_param() in tuplesort.c like a "trace_sort"? Thanks to Peter Geoghegan for the useful advice! How would we avoid an abstraction violation? Hmm... What do you mean an abstraction violation? If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples. What I mean is... I think it would be useful to have this counter, but I'm not sure how the tuplesort code would know to update the counter in this case and not in other cases. The tuplesort code is used for lots of things; we can't update a counter for CLUSTER if the tuplesort is being used for CREATE INDEX or a Sort node in a query or whatever. So my question is how we would indicate to the tuplesort that it needs to do the counter update, and whether that would end up making for ugly code. Thanks for your explanation! I understood that now. I guess it means an API to get a progress for sort processing, so let us just put it aside now. I'd like to leave that as is for an appropriate person. Regards, Tatsuro Yamada
Re: [HACKERS] CLUSTER command progress monitor
On 2019-Mar-04, Robert Haas wrote: > On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada > wrote: > > === Discussion points === > > > > - Progress counter for "3. sorting tuples" phase > > - Should we add pgstat_progress_update_param() in tuplesort.c like a > >"trace_sort"? > >Thanks to Peter Geoghegan for the useful advice! > > How would we avoid an abstraction violation? The theory embodied in my patch at https://postgr.es/m/20190304204607.GA15946@alvherre.pgsql is that we don't; tuplesort.c functions (index.c's IndexBuildHeapScan in my case) would get a boolean parameter to indicate whether to update some params or not -- the param number(s) to update are supposed to be generic in the sense that it's not part of any individual command's implementation (PROGRESS_SCAN_BLOCKS_DONE for what you call "blks scanned", PROGRESS_SCAN_BLOCKS_TOTAL for "blks total"), but rather defined by the "progress update provider" (index.c or tuplesort.c). One, err, small issue with that idea is that we need the param numbers not to conflict for any "progress update providers" that are to be used simultaneously by any command. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] CLUSTER command progress monitor
On Tue, Mar 5, 2019 at 3:56 AM Tatsuro Yamada wrote: > >> === Discussion points === > >> > >>- Progress counter for "3. sorting tuples" phase > >> - Should we add pgstat_progress_update_param() in tuplesort.c like a > >> "trace_sort"? > >> Thanks to Peter Geoghegan for the useful advice! > > > > How would we avoid an abstraction violation? > > Hmm... What do you mean an abstraction violation? > If it is difficult to solve, I'd not like to add the progress counter for the > sorting tuples. What I mean is... I think it would be useful to have this counter, but I'm not sure how the tuplesort code would know to update the counter in this case and not in other cases. The tuplesort code is used for lots of things; we can't update a counter for CLUSTER if the tuplesort is being used for CREATE INDEX or a Sort node in a query or whatever. So my question is how we would indicate to the tuplesort that it needs to do the counter update, and whether that would end up making for ugly code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLUSTER command progress monitor
Hi Robert! On 2019/03/05 11:35, Robert Haas wrote: On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada wrote: === Current design === CLUSTER command uses Index Scan or Seq Scan when scanning the heap. Depending on which one is chosen, the command will proceed in the following sequence of phases: * Scan method: Seq Scan 0. initializing (*2) 1. seq scanning heap(*1) 3. sorting tuples (*2) 4. writing new heap (*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) * Scan method: Index Scan 0. initializing (*2) 2. index scanning heap (*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) VACUUM FULL command will proceed in the following sequence of phases: 1. seq scanning heap(*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) (*1): increasing the value in heap_tuples_scanned column (*2): only shows the phase in the phase column All of that sounds good. The view provides the information of CLUSTER command progress details as follows # \d pg_stat_progress_cluster View "pg_catalog.pg_stat_progress_cluster" Column | Type | Collation | Nullable | Default ---+-+---+--+- pid | integer | | | datid | oid | | | datname | name| | | relid | oid | | | command | text| | | phase | text| | | cluster_index_relid | bigint | | | heap_tuples_scanned | bigint | | | heap_tuples_vacuumed | bigint | | | Still not sure if we need heap_tuples_vacuumed. We could try to report heap_blks_scanned and heap_blks_total like we do for VACUUM, if we're using a Seq Scan. I have no strong opinion to add heap_tuples_vacuumed, so I'll remove that in next patch. Regarding heap_blks_scanned and heap_blks_total, I suppose that it is able to get those from initscan(). I'll investigate it more. cluster.c copy_heap_data() heap_beginscan() heap_beginscan_internal() initscan() === Discussion points === - Progress counter for "3. sorting tuples" phase - Should we add pgstat_progress_update_param() in tuplesort.c like a "trace_sort"? Thanks to Peter Geoghegan for the useful advice! How would we avoid an abstraction violation? Hmm... What do you mean an abstraction violation? If it is difficult to solve, I'd not like to add the progress counter for the sorting tuples. - Progress counter for "6. rebuilding index" phase - Should we add "index_vacuum_count" in the view like a vacuum progress monitor? If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c. However, I'm not sure whether it is okay or not. Doesn't seem unreasonable to me. I see, I'll add it later. Regards, Tatsuro Yamada
Re: [HACKERS] CLUSTER command progress monitor
Hi David, On 2019/03/05 17:29, David Steele wrote: On 3/1/19 7:48 AM, Etsuro Fujita wrote: (2019/03/01 14:17), Tatsuro Yamada wrote: Attached patch is wip patch. Is it possible to remove the following patch? Because I registered the patch twice on CF Mar. https://commitfest.postgresql.org/22/2049/ Please remove the above and keep this: https://commitfest.postgresql.org/22/1779/ which I moved from the January CF to the March one on behalf of him. I have closed the duplicate entry (#2049) and retained the entry which contains the CF history (#1779). Thank you! :) Regards, Tatsuro Yamada
Re: Re: [HACKERS] CLUSTER command progress monitor
On 3/1/19 7:48 AM, Etsuro Fujita wrote: (2019/03/01 14:17), Tatsuro Yamada wrote: Attached patch is wip patch. Is it possible to remove the following patch? Because I registered the patch twice on CF Mar. https://commitfest.postgresql.org/22/2049/ Please remove the above and keep this: https://commitfest.postgresql.org/22/1779/ which I moved from the January CF to the March one on behalf of him. I have closed the duplicate entry (#2049) and retained the entry which contains the CF history (#1779). Regards, -- -David da...@pgmasters.net
Re: [HACKERS] CLUSTER command progress monitor
On Mon, Mar 4, 2019 at 5:38 AM Tatsuro Yamada wrote: > === Current design === > > CLUSTER command uses Index Scan or Seq Scan when scanning the heap. > Depending on which one is chosen, the command will proceed in the > following sequence of phases: > >* Scan method: Seq Scan > 0. initializing (*2) > 1. seq scanning heap(*1) > 3. sorting tuples (*2) > 4. writing new heap (*1) > 5. swapping relation files (*2) > 6. rebuilding index (*2) > 7. performing final cleanup (*2) > >* Scan method: Index Scan > 0. initializing (*2) > 2. index scanning heap (*1) > 5. swapping relation files (*2) > 6. rebuilding index (*2) > 7. performing final cleanup (*2) > > VACUUM FULL command will proceed in the following sequence of phases: > > 1. seq scanning heap(*1) > 5. swapping relation files (*2) > 6. rebuilding index (*2) > 7. performing final cleanup (*2) > > (*1): increasing the value in heap_tuples_scanned column > (*2): only shows the phase in the phase column All of that sounds good. > The view provides the information of CLUSTER command progress details as > follows > # \d pg_stat_progress_cluster >View "pg_catalog.pg_stat_progress_cluster" >Column | Type | Collation | Nullable | Default > ---+-+---+--+- > pid | integer | | | > datid | oid | | | > datname | name| | | > relid | oid | | | > command | text| | | > phase | text| | | > cluster_index_relid | bigint | | | > heap_tuples_scanned | bigint | | | > heap_tuples_vacuumed | bigint | | | Still not sure if we need heap_tuples_vacuumed. We could try to report heap_blks_scanned and heap_blks_total like we do for VACUUM, if we're using a Seq Scan. > === Discussion points === > > - Progress counter for "3. sorting tuples" phase > - Should we add pgstat_progress_update_param() in tuplesort.c like a >"trace_sort"? >Thanks to Peter Geoghegan for the useful advice! How would we avoid an abstraction violation? > - Progress counter for "6. rebuilding index" phase > - Should we add "index_vacuum_count" in the view like a vacuum progress > monitor? >If yes, I'll add pgstat_progress_update_param() to reindex_relation() > of index.c. >However, I'm not sure whether it is okay or not. Doesn't seem unreasonable to me. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLUSTER command progress monitor
On 2019/03/02 4:15, Robert Haas wrote: On Thu, Feb 28, 2019 at 11:54 PM Tatsuro Yamada wrote: Attached patch is wip patch. I rewrote the current design of the progress monitor and also wrote discussion points in the middle of this email. I'd like to get any feedback from -hackers. === Current design === CLUSTER command uses Index Scan or Seq Scan when scanning the heap. Depending on which one is chosen, the command will proceed in the following sequence of phases: * Scan method: Seq Scan 0. initializing (*2) 1. seq scanning heap(*1) 3. sorting tuples (*2) 4. writing new heap (*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) * Scan method: Index Scan 0. initializing (*2) 2. index scanning heap (*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) VACUUM FULL command will proceed in the following sequence of phases: 1. seq scanning heap(*1) 5. swapping relation files (*2) 6. rebuilding index (*2) 7. performing final cleanup (*2) (*1): increasing the value in heap_tuples_scanned column (*2): only shows the phase in the phase column The view provides the information of CLUSTER command progress details as follows # \d pg_stat_progress_cluster View "pg_catalog.pg_stat_progress_cluster" Column | Type | Collation | Nullable | Default ---+-+---+--+- pid | integer | | | datid | oid | | | datname | name| | | relid | oid | | | command | text| | | phase | text| | | cluster_index_relid | bigint | | | heap_tuples_scanned | bigint | | | heap_tuples_vacuumed | bigint | | | === Discussion points === - Progress counter for "3. sorting tuples" phase - Should we add pgstat_progress_update_param() in tuplesort.c like a "trace_sort"? Thanks to Peter Geoghegan for the useful advice! - Progress counter for "6. rebuilding index" phase - Should we add "index_vacuum_count" in the view like a vacuum progress monitor? If yes, I'll add pgstat_progress_update_param() to reindex_relation() of index.c. However, I'm not sure whether it is okay or not. - pg_stat_progress_rewrite - TBA === My test case === I share my test case of progress monitor. If someone wants to watch the current progress monitor, you can use this test case as a example. [Terminal1] Run this query on psql: select * from pg_stat_progress_cluster; \watch 0.05 [Terminal2] Run these queries on psql: drop table t1; create table t1 as select a, random() * 1000 as b from generate_series(0, 99) a; create index idx_t1 on t1(a); create index idx_t1_b on t1(b); analyze t1; -- index scan set enable_seqscan to off; cluster verbose t1 using idx_t1; -- seq scan set enable_seqscan to on; set enable_indexscan to off; cluster verbose t1 using idx_t1; -- only given table name to cluster command cluster verbose t1; -- only cluster command cluster verbose; -- vacuum full vacuum full t1; -- vacuum full vacuum full; Thanks, Tatsuro Yamada
Re: [HACKERS] CLUSTER command progress monitor
On 2019/03/02 4:15, Robert Haas wrote: On Thu, Feb 28, 2019 at 11:54 PM Tatsuro Yamada wrote: Attached patch is wip patch. Thanks for your comments! :) I revised the code and the document. + CLUSTER and VACUUM FULL, showing current progress. and -> or Fixed. + certain commands during command execution. Currently, the suppoted + progress reporting commands are VACUUM and CLUSTER. suppoted -> supported But I'd just say: Currently, the only commands which support progress reporting are VACUUM and CLUSTER. I choose the latter. Fixed. + Running VACUUM FULL is listed in pg_stat_progress_cluster + view because it uses CLUSTER command internally. See . How about: Running VACUUM FULL is listed in pg_stat_progress_cluster because both VACUUM FULL and CLUSTER rewrite the table, while regular VACUUM only modifies it in place. Fixed. + Current processing command: CLUSTER/VACUUM FULL. The command that is running. Either CLUSTER or VACUUM FULL. Fixed. + Current processing phase of cluster/vacuum full. See or . Current processing phase of CLUSTER or VACUUM FULL. Or maybe better, just abbreviate to: Current processing phase. Fixed as you suggested. + Scan method of table: index scan/seq scan. Eh, shouldn't this be gone now? And likewise for the view definition? Fixed. Sorry, It was an oversight. + OID of the index. If the table is being scanned using an index, this is the OID of the index being used; otherwise, it is zero. Fixed. + heap_tuples_total Leftovers. Skipping over the rest of your documentation changes since it looks like a bunch of things there still need to be updated. I agree. Thanks a lot! I'll divide the patch into two patch such as code and document. + pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid); This now appears inside cluster_rel(), but also vacuum_rel() is still doing the same thing. That's wrong. It was an oversight too. I fixed. + if(OidIsValid(indexOid)) Missing space. Please pgindent. Fixed. I Will do pgindent later. Please find attached files. :) Thanks, Tatsuro Yamada diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql index 3e229c693c..0d0f8f0e31 100644 --- a/src/backend/catalog/system_views.sql +++ b/src/backend/catalog/system_views.sql @@ -906,6 +906,30 @@ CREATE VIEW pg_stat_progress_vacuum AS FROM pg_stat_get_progress_info('VACUUM') AS S LEFT JOIN pg_database D ON S.datid = D.oid; +CREATE VIEW pg_stat_progress_cluster AS +SELECT +S.pid AS pid, +S.datid AS datid, +D.datname AS datname, +S.relid AS relid, +CASE S.param1 WHEN 1 THEN 'CLUSTER' + WHEN 2 THEN 'VACUUM FULL' + END AS command, +CASE S.param2 WHEN 0 THEN 'initializing' + WHEN 1 THEN 'seq scanning heap' + WHEN 2 THEN 'index scanning heap' + WHEN 3 THEN 'sorting tuples' + WHEN 4 THEN 'writing new heap' + WHEN 5 THEN 'swapping relation files' + WHEN 6 THEN 'rebuilding index' + WHEN 7 THEN 'performing final cleanup' + END AS phase, +S.param3 AS cluster_index_relid, +S.param4 AS heap_tuples_scanned, +S.param5 AS heap_tuples_vacuumed +FROM pg_stat_get_progress_info('CLUSTER') AS S +LEFT JOIN pg_database D ON S.datid = D.oid; + CREATE VIEW pg_user_mappings AS SELECT U.oid AS umid, diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c index a74af4c171..f22ff590f0 100644 --- a/src/backend/commands/cluster.c +++ b/src/backend/commands/cluster.c @@ -35,10 +35,12 @@ #include "catalog/objectaccess.h" #include "catalog/toasting.h" #include "commands/cluster.h" +#include "commands/progress.h" #include "commands/tablecmds.h" #include "commands/vacuum.h" #include "miscadmin.h" #include "optimizer/optimizer.h" +#include "pgstat.h" #include "storage/bufmgr.h" #include "storage/lmgr.h" #include "storage/predicate.h" @@ -275,6 +277,8 @@ cluster_rel(Oid tableOid, Oid indexOid, int options) /* Check for user-requested abort. */ CHECK_FOR_INTERRUPTS(); + pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid); + /* * We grab exclusive access to the target rel and index for the duration * of the transaction. (This is redundant for the single-transaction @@ -385,6 +389,18 @@ cluster_rel(Oid tableOid, Oid indexOid, int options) */ CheckTableNotInUse(OldHeap, OidIsValid(indexOid) ? "CLUSTER" : "VACUUM"); + /* Set command to column */ + if (OidIsValid(indexOid)) + { + pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND, + PROGRESS_CLUSTER_COMMAND_CLUSTER); + } + else + { + pgstat_progress_update_param(PROGRESS_CLUSTER_COMMAND, + PROGRESS_CLUSTER_COMMAND_VACUUM_FULL); + } + /*
Re: [HACKERS] CLUSTER command progress monitor
On Thu, Feb 28, 2019 at 11:54 PM Tatsuro Yamada wrote: > Attached patch is wip patch. + CLUSTER and VACUUM FULL, showing current progress. and -> or + certain commands during command execution. Currently, the suppoted + progress reporting commands are VACUUM and CLUSTER. suppoted -> supported But I'd just say: Currently, the only commands which support progress reporting are VACUUM and CLUSTER. + Running VACUUM FULL is listed in pg_stat_progress_cluster + view because it uses CLUSTER command internally. See . How about: Running VACUUM FULL is listed in pg_stat_progress_cluster because both VACUUM FULL and CLUSTER rewrite the table, while regular VACUUM only modifies it in place. + Current processing command: CLUSTER/VACUUM FULL. The command that is running. Either CLUSTER or VACUUM FULL. + Current processing phase of cluster/vacuum full. See or . Current processing phase of CLUSTER or VACUUM FULL. Or maybe better, just abbreviate to: Current processing phase. + Scan method of table: index scan/seq scan. Eh, shouldn't this be gone now? And likewise for the view definition? + OID of the index. If the table is being scanned using an index, this is the OID of the index being used; otherwise, it is zero. + heap_tuples_total Leftovers. Skipping over the rest of your documentation changes since it looks like a bunch of things there still need to be updated. + pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid); This now appears inside cluster_rel(), but also vacuum_rel() is still doing the same thing. That's wrong. + if(OidIsValid(indexOid)) Missing space. Please pgindent. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLUSTER command progress monitor
(2019/03/01 14:17), Tatsuro Yamada wrote: Attached patch is wip patch. Is it possible to remove the following patch? Because I registered the patch twice on CF Mar. https://commitfest.postgresql.org/22/2049/ Please remove the above and keep this: https://commitfest.postgresql.org/22/1779/ which I moved from the January CF to the March one on behalf of him. Best regards, Etsuro Fujita
Re: [HACKERS] CLUSTER command progress monitor
Attached patch is wip patch. Is it possible to remove the following patch? Because I registered the patch twice on CF Mar. https://commitfest.postgresql.org/22/2049/ Thanks, Tatsuro Yamada
Re: [HACKERS] CLUSTER command progress monitor
On 2019/02/23 6:02, Robert Haas wrote: On Fri, Dec 28, 2018 at 3:20 AM Tatsuro Yamada wrote: This patch is rebased on HEAD. I'll tackle revising the patch based on feedbacks next month. + Running VACUUM FULL is listed in pg_stat_progress_cluster + view because it uses CLUSTER command internally. See . It's not really true to say that VACUUM FULL uses the CLUSTER command internally. It's not really true. It uses a good chunk of the same infrastructure, but it certainly doesn't use the actual command, and it's not really the exact same thing either, because internally it's doing a sequential scan but no sort, which never happens with CLUSTER. I'm not sure exactly how to rephrase this, but I think we need to make it more precise. One idea is that maybe we should try to think of a design that could also handle the rewriting variants of ALTER TABLE, and call it pg_stat_progress_rewrite. Maybe that's moving the goalposts too far, but I'm not saying we'd necessarily have to do all the work now, just have a view that we think could also handle that. Then again, maybe the needs are too different. Hmm..., I see. If possible, I'd like to stop thinking of VACUUM FULL to avoid complication of the implementation. For now, I haven't enough time to design pg_stat_progress_rewrite. I suppose that it's tough work. + Whenever CLUSTER is running, the + pg_stat_progress_cluster view will contain + one row for each backend that is currently clustering or vacuuming (VACUUM FULL). That sentence contradicts itself. Just say that it contains a row for each backend that is currently running CLUSTER or VACUUM FULL. Fixed. @@ -105,6 +107,7 @@ static void reform_and_rewrite_tuple(HeapTuple tuple, void cluster(ClusterStmt *stmt, bool isTopLevel) { + if (stmt->relation != NULL) { /* This is the single-relation case. */ Useless hunk. Fixed. @@ -186,7 +189,9 @@ cluster(ClusterStmt *stmt, bool isTopLevel) heap_close(rel, NoLock); /* Do the job. */ + pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid); cluster_rel(tableOid, indexOid, stmt->options); + pgstat_progress_end_command(); } else { It seems like that stuff should be inside cluster_rel(). Fixed. + /* Set reltuples to total_tuples */ + pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_TUPLES, OldHeap->rd_rel->reltuples); I object. If the user wants that, they can get it from pg_class themselves via an SQL query. It's also an estimate, not something we know to be accurate; I want us to only report facts here, not theories I understand that progress monitor should only report facts, so I removed that code. + pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE, PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP); + pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, PROGRESS_CLUSTER_METHOD_INDEX_SCAN); I think you should use pgstat_progress_update_multi_param() if updating multiple parameters at the same time. Also, some lines in this patch, such as this one, are very long. Consider techniques to reduce the line length to 80 characters or less, such as inserting a line break between the two arguments. Fixed. + /* Set scan_method to NULL */ + pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, -1); NULL and -1 are not the same thing. Oops, fixed. I think that we shouldn't have both PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP and PROGRESS_CLUSTER_PHASE_SCAN_HEAP. They're the same thing. Let's just use PROGRESS_CLUSTER_PHASE_SCAN_HEAP for both. Actually, better yet, let's get rid of PROGRESS_CLUSTER_SCAN_METHOD and have PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP and PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP. That seems noticeably simpler. Fixed. I agree that it's acceptable to report PROGRESS_CLUSTER_HEAP_TUPLES_VACUUMED and PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD, but I'm not sure I understand why it's valuable to do so in the context of a progress indicator. Actually, I'm not sure why I added it since so much time has passed. :( So, I'll remove PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD at least. Attached patch is wip patch. Thanks! Tatsuro Yamada diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 0e73cdcdda..8cf829e72c 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -344,6 +344,14 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + pg_stat_progress_clusterpg_stat_progress_cluster + One row for each backend running + CLUSTER and VACUUM FULL, showing current progress. + See . + + + @@ -3376,9 +3384,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, PostgreSQL has the ability to report the progress of - certain commands during command execution. Currently, the only command - which supports progress reporting is VACUUM. This may be - expanded in
Re: [HACKERS] CLUSTER command progress monitor
On Fri, Dec 28, 2018 at 3:20 AM Tatsuro Yamada wrote: > This patch is rebased on HEAD. > I'll tackle revising the patch based on feedbacks next month. + Running VACUUM FULL is listed in pg_stat_progress_cluster + view because it uses CLUSTER command internally. See . It's not really true to say that VACUUM FULL uses the CLUSTER command internally. It's not really true. It uses a good chunk of the same infrastructure, but it certainly doesn't use the actual command, and it's not really the exact same thing either, because internally it's doing a sequential scan but no sort, which never happens with CLUSTER. I'm not sure exactly how to rephrase this, but I think we need to make it more precise. One idea is that maybe we should try to think of a design that could also handle the rewriting variants of ALTER TABLE, and call it pg_stat_progress_rewrite. Maybe that's moving the goalposts too far, but I'm not saying we'd necessarily have to do all the work now, just have a view that we think could also handle that. Then again, maybe the needs are too different. + Whenever CLUSTER is running, the + pg_stat_progress_cluster view will contain + one row for each backend that is currently clustering or vacuuming (VACUUM FULL). That sentence contradicts itself. Just say that it contains a row for each backend that is currently running CLUSTER or VACUUM FULL. @@ -105,6 +107,7 @@ static void reform_and_rewrite_tuple(HeapTuple tuple, void cluster(ClusterStmt *stmt, bool isTopLevel) { + if (stmt->relation != NULL) { /* This is the single-relation case. */ Useless hunk. @@ -186,7 +189,9 @@ cluster(ClusterStmt *stmt, bool isTopLevel) heap_close(rel, NoLock); /* Do the job. */ + pgstat_progress_start_command(PROGRESS_COMMAND_CLUSTER, tableOid); cluster_rel(tableOid, indexOid, stmt->options); + pgstat_progress_end_command(); } else { It seems like that stuff should be inside cluster_rel(). + /* Set reltuples to total_tuples */ + pgstat_progress_update_param(PROGRESS_CLUSTER_TOTAL_HEAP_TUPLES, OldHeap->rd_rel->reltuples); I object. If the user wants that, they can get it from pg_class themselves via an SQL query. It's also an estimate, not something we know to be accurate; I want us to only report facts here, not theories + pgstat_progress_update_param(PROGRESS_CLUSTER_PHASE, PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP); + pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, PROGRESS_CLUSTER_METHOD_INDEX_SCAN); I think you should use pgstat_progress_update_multi_param() if updating multiple parameters at the same time. Also, some lines in this patch, such as this one, are very long. Consider techniques to reduce the line length to 80 characters or less, such as inserting a line break between the two arguments. + /* Set scan_method to NULL */ + pgstat_progress_update_param(PROGRESS_CLUSTER_SCAN_METHOD, -1); NULL and -1 are not the same thing. I think that we shouldn't have both PROGRESS_CLUSTER_PHASE_SCAN_HEAP_AND_WRITE_NEW_HEAP and PROGRESS_CLUSTER_PHASE_SCAN_HEAP. They're the same thing. Let's just use PROGRESS_CLUSTER_PHASE_SCAN_HEAP for both. Actually, better yet, let's get rid of PROGRESS_CLUSTER_SCAN_METHOD and have PROGRESS_CLUSTER_PHASE_SEQ_SCAN_HEAP and PROGRESS_CLUSTER_PHASE_INDEX_SCAN_HEAP. That seems noticeably simpler. I agree that it's acceptable to report PROGRESS_CLUSTER_HEAP_TUPLES_VACUUMED and PROGRESS_CLUSTER_HEAP_TUPLES_RECENTLY_DEAD, but I'm not sure I understand why it's valuable to do so in the context of a progress indicator. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLUSTER command progress monitor
Hi, Do you have a new version of this patch? If not, do you think you'll have something in time for the upcoming commitfest? Not yet, I'll be able to send only a rebased patch by the end of this month. I mean it has no design change because I can't catch up on how to get a progress from sort and index scan. However I'm going to register the patch on next CF. I'm happy if you have interested in the patch. This patch is rebased on HEAD. I'll tackle revising the patch based on feedbacks next month. Happy holidays! Tatsuro Yamada diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 96bcc3a63b..83421e5105 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -332,6 +332,14 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + pg_stat_progress_clusterpg_stat_progress_cluster + One row for each backend running + CLUSTER and VACUUM FULL, showing current progress. + See . + + + @@ -3351,9 +3359,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, PostgreSQL has the ability to report the progress of - certain commands during command execution. Currently, the only command - which supports progress reporting is VACUUM. This may be - expanded in the future. + certain commands during command execution. Currently, the suppoted + progress reporting commands are VACUUM and CLUSTER. + This may be expanded in the future. @@ -3365,9 +3373,8 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, one row for each backend (including autovacuum worker processes) that is currently vacuuming. The tables below describe the information that will be reported and provide information about how to interpret it. - Progress reporting is not currently supported for VACUUM FULL - and backends running VACUUM FULL will not be listed in this - view. + Running VACUUM FULL is listed in pg_stat_progress_cluster + view because it uses CLUSTER command internally. See . @@ -3544,6 +3551,228 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, + + + + CLUSTER Progress Reporting + + + Whenever CLUSTER is running, the + pg_stat_progress_cluster view will contain + one row for each backend that is currently clustering or vacuuming (VACUUM FULL). + The tables below describe the information that will be reported and + provide information about how to interpret it. + + + + pg_stat_progress_cluster View + + + + Column + Type + Description + + + + + + pid + integer + Process ID of backend. + + + datid + oid + OID of the database to which this backend is connected. + + + datname + name + Name of the database to which this backend is connected. + + + relid + oid + OID of the table being clustered. + + + command + text + + Current processing command: CLUSTER/VACUUM FULL. + + + + phase + text + + Current processing phase of cluster/vacuum full. See or . + + + + scan_method + text + + Scan method of table: index scan/seq scan. + + + + cluster_index_relid + bigint + + OID of the index. + + + + heap_tuples_total + bigint + + Total number of heap tuples in the table. This number is reported + as of the beginning of the scan; tuples added later will not be (and + need not be) visited by this CLUSTER and VACUUM FULL. + + + + heap_tuples_scanned + bigint + + Number of heap tuples scanned. + This counter only advances when the phase is scanning heap, + writing new heap and scan heap and write new heap. + + + + heap_tuples_vacuumed + bigint + + Number of heap tuples vacuumed. This counter only advances when the + command is VACUUM FULL and the phase is scanning heap. + + + + heap_tuples_recently_dead + bigint + + Number of heap tuples not vacuumed since these tuples marked recently dead. + This counter only advances when the command is VACUUM FULL and + the phase is scanning heap. + + + + + + + + CLUSTER phases + + + + Phase + Description + + + + + + initializing + + CLUSTER is preparing to begin scanning the heap. This + phase is expected to be very brief. + + + + scanning heap + + CLUSTER is currently scanning heap from the table by + seq scan. This phase is shown when the scan_method is seq scan. + + + + sorting tuples + + CLUSTER is currently sorting tuples. + This phase is shown when the scan_method is seq scan. + + + + scan hea
Re: [HACKERS] CLUSTER command progress monitor
Hi Alvaro, On 2018/12/19 2:23, Alvaro Herrera wrote: Hello Yamada-san, On 2018-Dec-03, Tatsuro Yamada wrote: Thank you for managing the CF and Sorry for the late reply. I'll rebase it for the next CF and also I'll clear my head because the patch needs design change to address the feedbacks, I guess. Therefore, the status is reconsidering the design of the patch. :) Do you have a new version of this patch? If not, do you think you'll have something in time for the upcoming commitfest? Not yet, I'll be able to send only a rebased patch by the end of this month. I mean it has no design change because I can't catch up on how to get a progress from sort and index scan. However I'm going to register the patch on next CF. I'm happy if you have interested in the patch. :) Thanks, Tatsuro Yamada
Re: [HACKERS] CLUSTER command progress monitor
On Tue, Dec 18, 2018 at 2:47 PM Alvaro Herrera wrote: > Well, if you think about individual blocks in terms of storage space, > maybe that's true, but I meant in an Heraclitus way of men never > stepping into the same river -- the second time you write the block, > it's not the same block you wrote before, so you count it twice. It's > not the actual disk space utilization that matters, but how much I/O > have you done (even if it is just to kernel cache, I suppose). Right. > I suppose mapping such numbers to actual progress is a bit of an art (or > intuition as you say), but it seems to be the best we can do, if we do > anything at all. I think that it's fairly useful. I suspect that you don't have to have my theoretical grounding in sorting to be able to do almost as well. All you need is a little bit of experience. > How good are those predictions? The feeling I get from this thread is > that if the estimation of the number of passes is unreliable, it's > better not to report it at all; just return how many we've done thus > far. It's undesirable to report that we're about 150% done (or take > hours to get to 40% done, then suddenly be over). Maybe it isn't that reliable. But on second thought I think that it might not matter, and maybe we should just not do that. "How slow can I make this sort go by subtracting work_mem?" is a game that I like to play sometimes. This blogpost plays that game, and reaches some pretty amusing conclusions: https://www.cybertec-postgresql.com/en/postgresql-improving-sort-performance/ It says that sorting numeric is 60% slower when you do an external sort. But it's an apples to asteroids comparison, because the comparison is made between 4MB of work_mem, and 1GB. I think that it's pretty damn impressive that it's only 60% slower! Besides, even that difference is probably on the high side of average, because numeric abbreviated keys work particularly well, and you won't get the same benefit with a unique numeric values when you happen to be doing a lot of merging. If you tried the same experiment with integers, or even text + abbreviated keys, I bet the difference would be a lot smaller. Despite the huge gap in the amount of memory used. On modern hardware, where doing some amount of random I/O is not that noticeable, you'll have a very hard time finding a case where even a paltry amount of memory with many passes does all that much worse than an internal sort (OTOH, it's not hard to find cases where an external sort is *faster*). Even if you make a generic estimate, it's still probably going to be pretty good, because there just isn't that much variation in how long the sort will take as you vary the amount of memory it can use. Some people will be surprised at this, but it's a pretty robust effect. (This is why I think that a hash_mem GUC might be a good medium term solution that improves upon work_mem -- the situation is dramatically different when it comes to hashing.) My point is that you could offer users the kind of insight they'd find very useful with only a very crude estimate of the amount of merging. Even if it was 60% slower than initially projected, that's still not an awful estimate to most users. That just leaves initial run generation, but it's relatively easy to accurately estimate the amount of initial runs. I rarely see a case where merging takes more than 40% of the total, barring parallel CREATE INDEX. > I wonder if internal sorts are really all that interesting from the PoV > of progress reporting. Also, I have the impression that quicksort isn't > very amenable to letting you know how much work is left. It is hard to predict the duration of one massive quicksort, but it's seems fairly easy to recognize a kind of cadence across multiple quicksorts/runs that each take seconds to a couple of minutes. That's going to be the vast, vast majority of cases we care about. -- Peter Geoghegan
Re: [HACKERS] CLUSTER command progress monitor
On 2018-Dec-18, Peter Geoghegan wrote: > On Tue, Dec 18, 2018 at 1:02 PM Alvaro Herrera > wrote: > > If we see this in terms of tapes and merges, we can report the total > > number of each of those that we have completed. As far as I understand, > > we write one tape to completion, and only then start another one, right? > > Since there's no way to know how many tapes/merges are needed in total, > > it's not possible to compute a percentage of completion. That's seems > > okay -- we're just telling the user that progress is being made, and we > > only report facts not theory. Perhaps we can (also?) indicate disk I/O > > utilization, in terms of the number of blocks written by tuplesort. > > The number of blocks tuplesort uses is constant from the end of > initial run generation, since logtape.c will recycle blocks. Well, if you think about individual blocks in terms of storage space, maybe that's true, but I meant in an Heraclitus way of men never stepping into the same river -- the second time you write the block, it's not the same block you wrote before, so you count it twice. It's not the actual disk space utilization that matters, but how much I/O have you done (even if it is just to kernel cache, I suppose). > > I suppose that in order to have tuplesort.c report progress, we would > > have to have some kind of API that tuplesort would invoke internally to > > indicate events such as "tape started/completed", "merge started/completed". > > One idea is to use a callback system; each tuplesort caller could > > optionally pass a callback to the "begin" function, for progress > > reporting purposes. Initially only cluster.c would use it, but I > > suppose eventually every tuplesort caller would want that. > > I think that you could have a callback that did something with the > information currently reported by trace_sort. That's not a bad way of > scoping the problem. That's how I myself monitor the progress of a > sort, and it works pretty well (whether or not that means other people > can do it is not exactly clear to me). Thanks, that looks useful. I suppose mapping such numbers to actual progress is a bit of an art (or intuition as you say), but it seems to be the best we can do, if we do anything at all. > We predict the number of merge passes within cost_sort() already. That > doesn't seem all that hard to generalize, so that you report the > expected number of passes against the current pass. Some passes are > much quicker than others, but you generally don't have that many with > realistic cases. I don't expect that it will work very well with an > internal sort, but in the case of CLUSTER that almost seems > irrelevant. And maybe even in all cases. How good are those predictions? The feeling I get from this thread is that if the estimation of the number of passes is unreliable, it's better not to report it at all; just return how many we've done thus far. It's undesirable to report that we're about 150% done (or take hours to get to 40% done, then suddenly be over). I wonder if internal sorts are really all that interesting from the PoV of progress reporting. Also, I have the impression that quicksort isn't very amenable to letting you know how much work is left. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] CLUSTER command progress monitor
On Tue, Dec 18, 2018 at 1:02 PM Alvaro Herrera wrote: > If we see this in terms of tapes and merges, we can report the total > number of each of those that we have completed. As far as I understand, > we write one tape to completion, and only then start another one, right? > Since there's no way to know how many tapes/merges are needed in total, > it's not possible to compute a percentage of completion. That's seems > okay -- we're just telling the user that progress is being made, and we > only report facts not theory. Perhaps we can (also?) indicate disk I/O > utilization, in terms of the number of blocks written by tuplesort. The number of blocks tuplesort uses is constant from the end of initial run generation, since logtape.c will recycle blocks. > I suppose that in order to have tuplesort.c report progress, we would > have to have some kind of API that tuplesort would invoke internally to > indicate events such as "tape started/completed", "merge started/completed". > One idea is to use a callback system; each tuplesort caller could > optionally pass a callback to the "begin" function, for progress > reporting purposes. Initially only cluster.c would use it, but I > suppose eventually every tuplesort caller would want that. I think that you could have a callback that did something with the information currently reported by trace_sort. That's not a bad way of scoping the problem. That's how I myself monitor the progress of a sort, and it works pretty well (whether or not that means other people can do it is not exactly clear to me). We predict the number of merge passes within cost_sort() already. That doesn't seem all that hard to generalize, so that you report the expected number of passes against the current pass. Some passes are much quicker than others, but you generally don't have that many with realistic cases. I don't expect that it will work very well with an internal sort, but in the case of CLUSTER that almost seems irrelevant. And maybe even in all cases. I think that the user is going to have to be willing to develop some intuition about the progress for it to be all that useful. They're really looking for something that gives a clue if they'll have to wait an hour, a day, or a week, which it seems like trace_sort-like information gives you some idea of. (BTW, dtrace probes can already give the user much the same information -- I think that more people should use those, since tracing technology on Linux has improved drastically in the last few years.) -- Peter Geoghegan
Re: [HACKERS] CLUSTER command progress monitor
On 2017-Nov-21, Peter Geoghegan wrote: > On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas wrote: > > Progress reporting on sorts seems like a tricky problem to me, as I > > said before. In most cases, a sort is going to involve an initial > > stage where it reads all the input tuples and writes out quicksorted > > runs, and then a merge phase where it merges all the output tapes into > > a sorted result. There are some complexities; for example, if the > > number of tapes is really large, then we might need multiple merge > > phases, only the last of which will produce tuples. > > This would ordinarily be the point at which I'd say "but you're very > unlikely to require multiple passes for an external sort these days". > But I won't say that on this thread, because CLUSTER generally has > unusually wide tuples, and so is much more likely to be I/O bound, to > require multiple passes, etc. (I bet the v10 enhancements > disproportionately improved CLUSTER performance.) When the seqscan-and-sort strategy is used, we feed tuplesort with every tuple from the scan. Once that's completed, we call `performsort`, then retrieve tuples. If we see this in terms of tapes and merges, we can report the total number of each of those that we have completed. As far as I understand, we write one tape to completion, and only then start another one, right? Since there's no way to know how many tapes/merges are needed in total, it's not possible to compute a percentage of completion. That's seems okay -- we're just telling the user that progress is being made, and we only report facts not theory. Perhaps we can (also?) indicate disk I/O utilization, in terms of the number of blocks written by tuplesort. I suppose that in order to have tuplesort.c report progress, we would have to have some kind of API that tuplesort would invoke internally to indicate events such as "tape started/completed", "merge started/completed". One idea is to use a callback system; each tuplesort caller could optionally pass a callback to the "begin" function, for progress reporting purposes. Initially only cluster.c would use it, but I suppose eventually every tuplesort caller would want that. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] CLUSTER command progress monitor
Hello Yamada-san, On 2018-Dec-03, Tatsuro Yamada wrote: > Thank you for managing the CF and Sorry for the late reply. > I'll rebase it for the next CF and also I'll clear my head because the patch > needs design change to address the feedbacks, I guess. Therefore, the status > is > reconsidering the design of the patch. :) Do you have a new version of this patch? If not, do you think you'll have something in time for the upcoming commitfest? Thanks -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] CLUSTER command progress monitor
On Mon, Dec 03, 2018 at 02:17:25PM -0300, Alvaro Herrera wrote: > I think we should mark it as Returned with Feedback then. +1. -- Michael signature.asc Description: PGP signature
Re: [HACKERS] CLUSTER command progress monitor
On 2018-Dec-03, Tatsuro Yamada wrote: > > In the meantime I'm moving it to the next CF. > > Thank you for managing the CF and Sorry for the late reply. > I'll rebase it for the next CF and also I'll clear my head because the patch > needs design change to address the feedbacks, I guess. Therefore, the status > is > reconsidering the design of the patch. :) I think we should mark it as Returned with Feedback then. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] CLUSTER command progress monitor
On 2018/11/29 21:20, Dmitry Dolgov wrote: On Fri, Aug 24, 2018 at 7:06 AM Tatsuro Yamada wrote: On 2017/11/22 6:07, Peter Geoghegan wrote: On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas wrote: Progress reporting on sorts seems like a tricky problem to me, as I said before. In most cases, a sort is going to involve an initial stage where it reads all the input tuples and writes out quicksorted runs, and then a merge phase where it merges all the output tapes into a sorted result. There are some complexities; for example, if the number of tapes is really large, then we might need multiple merge phases, only the last of which will produce tuples. This would ordinarily be the point at which I'd say "but you're very unlikely to require multiple passes for an external sort these days". But I won't say that on this thread, because CLUSTER generally has unusually wide tuples, and so is much more likely to be I/O bound, to require multiple passes, etc. (I bet the v10 enhancements disproportionately improved CLUSTER performance.) Hi, I came back to develop the feature for community. V4 patch is corrected these following points: - Rebase on master (143290efd) - Fix document - Replace the column name scan_index_relid with cluster_index_relid. Thanks to Jeff Janes! I'm now working on improving the patch based on Robert's comment related to "Seqscan and Sort case" and also considering how to handle the "Index scan case". Thank you, Unfortunately, this patch has some conflicts now, could you rebase it? Also what's is the status of your work on improving it based on the provided feedback? In the meantime I'm moving it to the next CF. Thank you for managing the CF and Sorry for the late reply. I'll rebase it for the next CF and also I'll clear my head because the patch needs design change to address the feedbacks, I guess. Therefore, the status is reconsidering the design of the patch. :) Regards, Tatsuro Yamada NTT Open Source Software Center
Re: [HACKERS] CLUSTER command progress monitor
> On Fri, Aug 24, 2018 at 7:06 AM Tatsuro Yamada > wrote: > > On 2017/11/22 6:07, Peter Geoghegan wrote: > > On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas wrote: > >> Progress reporting on sorts seems like a tricky problem to me, as I > >> said before. In most cases, a sort is going to involve an initial > >> stage where it reads all the input tuples and writes out quicksorted > >> runs, and then a merge phase where it merges all the output tapes into > >> a sorted result. There are some complexities; for example, if the > >> number of tapes is really large, then we might need multiple merge > >> phases, only the last of which will produce tuples. > > > > This would ordinarily be the point at which I'd say "but you're very > > unlikely to require multiple passes for an external sort these days". > > But I won't say that on this thread, because CLUSTER generally has > > unusually wide tuples, and so is much more likely to be I/O bound, to > > require multiple passes, etc. (I bet the v10 enhancements > > disproportionately improved CLUSTER performance.) > > Hi, > > I came back to develop the feature for community. > V4 patch is corrected these following points: > >- Rebase on master (143290efd) >- Fix document >- Replace the column name scan_index_relid with cluster_index_relid. >Thanks to Jeff Janes! > > I'm now working on improving the patch based on Robert's comment related to > "Seqscan and Sort case" and also considering how to handle the "Index scan > case". Thank you, Unfortunately, this patch has some conflicts now, could you rebase it? Also what's is the status of your work on improving it based on the provided feedback? In the meantime I'm moving it to the next CF.
Re: [HACKERS] CLUSTER command progress monitor
On 2017/11/22 6:07, Peter Geoghegan wrote: On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas wrote: Progress reporting on sorts seems like a tricky problem to me, as I said before. In most cases, a sort is going to involve an initial stage where it reads all the input tuples and writes out quicksorted runs, and then a merge phase where it merges all the output tapes into a sorted result. There are some complexities; for example, if the number of tapes is really large, then we might need multiple merge phases, only the last of which will produce tuples. This would ordinarily be the point at which I'd say "but you're very unlikely to require multiple passes for an external sort these days". But I won't say that on this thread, because CLUSTER generally has unusually wide tuples, and so is much more likely to be I/O bound, to require multiple passes, etc. (I bet the v10 enhancements disproportionately improved CLUSTER performance.) Hi, I came back to develop the feature for community. V4 patch is corrected these following points: - Rebase on master (143290efd) - Fix document - Replace the column name scan_index_relid with cluster_index_relid. Thanks to Jeff Janes! I'm now working on improving the patch based on Robert's comment related to "Seqscan and Sort case" and also considering how to handle the "Index scan case". Please find attached file. Regards, Tatsuro Yamada diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 0484cfa77a..5a4bd203ea 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -332,6 +332,14 @@ postgres 27093 0.0 0.0 30096 2752 ?Ss 11:34 0:00 postgres: ser + + pg_stat_progress_clusterpg_stat_progress_cluster + One row for each backend running + CLUSTER and VACUUM FULL, showing current progress. + See . + + + @@ -3338,9 +3346,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, PostgreSQL has the ability to report the progress of - certain commands during command execution. Currently, the only command - which supports progress reporting is VACUUM. This may be - expanded in the future. + certain commands during command execution. Currently, the suppoted + progress reporting commands are VACUUM and CLUSTER. + This may be expanded in the future. @@ -3352,9 +3360,8 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, one row for each backend (including autovacuum worker processes) that is currently vacuuming. The tables below describe the information that will be reported and provide information about how to interpret it. - Progress reporting is not currently supported for VACUUM FULL - and backends running VACUUM FULL will not be listed in this - view. + Running VACUUM FULL is listed in pg_stat_progress_cluster + view because it uses CLUSTER command internally. See . @@ -3531,6 +3538,228 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid, + + + + CLUSTER Progress Reporting + + + Whenever CLUSTER is running, the + pg_stat_progress_cluster view will contain + one row for each backend that is currently clustering or vacuuming (VACUUM FULL). + The tables below describe the information that will be reported and + provide information about how to interpret it. + + + + pg_stat_progress_cluster View + + + + Column + Type + Description + + + + + + pid + integer + Process ID of backend. + + + datid + oid + OID of the database to which this backend is connected. + + + datname + name + Name of the database to which this backend is connected. + + + relid + oid + OID of the table being clustered. + + + command + text + + Current processing command: CLUSTER/VACUUM FULL. + + + + phase + text + + Current processing phase of cluster/vacuum full. See or . + + + + scan_method + text + + Scan method of table: index scan/seq scan. + + + + cluster_index_relid + bigint + + OID of the index. + + + + heap_tuples_total + bigint + + Total number of heap tuples in the table. This number is reported + as of the beginning of the scan; tuples added later will not be (and + need not be) visited by this CLUSTER and VACUUM FULL. + + + + heap_tuples_scanned + bigint + + Number of heap tuples scanned. + This counter only advances when the phase is scanning heap, + writing new heap and scan heap and write new heap. + + + + heap_tuples_vacuumed + bigint + + Number of heap tuples vacuumed. This counter only advances when the + command is VACUUM FULL and the phase is scanning heap. + + + +
Re: [HACKERS] CLUSTER command progress monitor
On Wed, Nov 22, 2017 at 1:53 PM, Michael Paquier wrote: > On Wed, Nov 22, 2017 at 5:55 AM, Robert Haas wrote: >> I have been of the opinion all along that progress monitoring needs to >> report facts, not theories. The number of tuples read thus far is a >> fact, and is fine to report for whatever value it may have to someone. >> The number of tuples that will be read in the future is a theory, and >> as you say, progress monitoring is most likely to be used in cases >> where theory and practice ended up being very different. > > +1. We should never as well enter in things like trying to estimate > the amount of time remaining to finish a task [1]. > > [1]: https://www.xkcd.com/612/ +1 That is one reason I made pg_stat_replication.XXX_lag report the lag of WAL that has been processed, not (say) the time until we catch up. In some information-poor scenarios it interpolates which isn't perfect but the general idea is that is shows you measurements of the past (facts), not predictions about the future (theories). -- Thomas Munro http://www.enterprisedb.com
Re: [HACKERS] CLUSTER command progress monitor
On Wed, Nov 22, 2017 at 5:55 AM, Robert Haas wrote: > I have been of the opinion all along that progress monitoring needs to > report facts, not theories. The number of tuples read thus far is a > fact, and is fine to report for whatever value it may have to someone. > The number of tuples that will be read in the future is a theory, and > as you say, progress monitoring is most likely to be used in cases > where theory and practice ended up being very different. +1. We should never as well enter in things like trying to estimate the amount of time remaining to finish a task [1]. [1]: https://www.xkcd.com/612/ -- Michael
Re: [HACKERS] CLUSTER command progress monitor
On Tue, Nov 21, 2017 at 12:55 PM, Robert Haas wrote: > I agree. > > I have been of the opinion all along that progress monitoring needs to > report facts, not theories. The number of tuples read thus far is a > fact, and is fine to report for whatever value it may have to someone. That makes a lot of sense to me. I sometimes think that we're too hesitant to expose internal information due to concerns about it being hard to interpret. I see wait events as bucking this trend, which I welcome. We see similar trends in the Linux kernel, with tools like perf and BCC/eBPF now being regularly used to debug production issues. > The number of tuples that will be read in the future is a theory, and > as you say, progress monitoring is most likely to be used in cases > where theory and practice ended up being very different. You hit the nail on the head here. It's not that these things are not difficult to interpret - the concern itself is justified. It just needs to be weighed against the benefit of having some instrumentation to start with. People are much more likely to complain about obscure debug information, which makes them feel dumb, than they are to complain about the absence of any instrumentation, but I still think that the latter is the bigger problem. Besides, you don't necessarily have to understand something to act on it. The internals of Oracle are trade secrets, but they were the first to have wait events, I think. At least having something that you can Google can make all the difference. -- Peter Geoghegan
Re: [HACKERS] CLUSTER command progress monitor
On Mon, Oct 2, 2017 at 6:04 AM, Robert Haas wrote: > Progress reporting on sorts seems like a tricky problem to me, as I > said before. In most cases, a sort is going to involve an initial > stage where it reads all the input tuples and writes out quicksorted > runs, and then a merge phase where it merges all the output tapes into > a sorted result. There are some complexities; for example, if the > number of tapes is really large, then we might need multiple merge > phases, only the last of which will produce tuples. This would ordinarily be the point at which I'd say "but you're very unlikely to require multiple passes for an external sort these days". But I won't say that on this thread, because CLUSTER generally has unusually wide tuples, and so is much more likely to be I/O bound, to require multiple passes, etc. (I bet the v10 enhancements disproportionately improved CLUSTER performance.) -- Peter Geoghegan
Re: [HACKERS] CLUSTER command progress monitor
On Mon, Nov 20, 2017 at 12:05 PM, Antonin Houska wrote: > Robert Haas wrote: >> On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada >> wrote: >> > 1. scanning heap >> > 2. sort tuples >> >> These two phases overlap, though. I believe progress reporting for >> sorts is really hard. In the simple case where the data fits in >> work_mem, none of the work of the sort gets done until all the data is >> read. Once you switch to an external sort, you're writing batch >> files, so a lot of the work is now being done during data loading. >> But as the number of batch files grows, the final merge at the end >> becomes an increasingly noticeable part of the cost, and eventually >> you end up needing multiple merge passes. I think we need some smart >> way to report on sorts so that we can tell how much of the work has >> really been done, but I don't know how to do it. > > Whatever complexity is hidden in the sort, cost_sort() should have taken it > into consideration when called via plan_cluster_use_sort(). Thus I think that > once we have both startup and total cost, the current progress of the sort > stage can be estimated from the current number of input and output > rows. Please remind me if my proposal appears to be too simplistic. I think it is far too simplistic. If the sort is being fed by a sequential scan, reporting the number of blocks scanned so far as compared to the total number that will be scanned would be a fine way of reporting on the progress of the sequential scan -- and it's better to use blocks, which we know for sure about, than rows, at which we can only guess. But that's the *scan* progress, not the *sort* progress. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLUSTER command progress monitor
On Mon, Nov 20, 2017 at 12:25 PM, Tom Lane wrote: > Antonin Houska writes: >> Robert Haas wrote: >>> These two phases overlap, though. I believe progress reporting for >>> sorts is really hard. > >> Whatever complexity is hidden in the sort, cost_sort() should have taken it >> into consideration when called via plan_cluster_use_sort(). Thus I think that >> once we have both startup and total cost, the current progress of the sort >> stage can be estimated from the current number of input and output >> rows. Please remind me if my proposal appears to be too simplistic. > > Well, even if you assume that the planner's cost model omits nothing > (which I wouldn't bet on), its result is only going to be as good as the > planner's estimate of the number of rows to be sorted. And, in cases > where people actually care about progress monitoring, it's likely that > the planner got that wrong, maybe horribly so. I think it's a bad idea > for progress monitoring to depend on the planner's estimates in any way > whatsoever. I agree. I have been of the opinion all along that progress monitoring needs to report facts, not theories. The number of tuples read thus far is a fact, and is fine to report for whatever value it may have to someone. The number of tuples that will be read in the future is a theory, and as you say, progress monitoring is most likely to be used in cases where theory and practice ended up being very different. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] CLUSTER command progress monitor
Tom Lane wrote: > Antonin Houska writes: > > Robert Haas wrote: > >> These two phases overlap, though. I believe progress reporting for > >> sorts is really hard. > > > Whatever complexity is hidden in the sort, cost_sort() should have taken it > > into consideration when called via plan_cluster_use_sort(). Thus I think > > that > > once we have both startup and total cost, the current progress of the sort > > stage can be estimated from the current number of input and output > > rows. Please remind me if my proposal appears to be too simplistic. > > Well, even if you assume that the planner's cost model omits nothing > (which I wouldn't bet on), its result is only going to be as good as the > planner's estimate of the number of rows to be sorted. And, in cases > where people actually care about progress monitoring, it's likely that > the planner got that wrong, maybe horribly so. I think it's a bad idea > for progress monitoring to depend on the planner's estimates in any way > whatsoever. The general idea was that some sort of prediction of the total cost is needed anyway if we should tell during execution what fraction of work has already been done. And also that the cost computation that we perform during execution shouldn't (ideally) differ from cost_sort(). So I thought that it's easier to refine cost_sort() than to implement the same computation from scratch elsewhere. Besides that I see 2 circumstances that make the estimate of the number of input tuples simpler in the CLUSTER case: * There's only 1 input relation w/o any kind of clause. * CLUSTER uses SnapshotAny, so pg_class(reltuples) is closer to the actual number of input rows than it would be in general case. (Of course, pg_class would only be useful for the initial estimate.) Unlike planner, the executor could recalculate the cost estimate at some point(s) as it recognizes that the actual number of tuples per page appears to differ from the density derived from pg_class initially. Still wrong? -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at
Re: [HACKERS] CLUSTER command progress monitor
Antonin Houska writes: > Robert Haas wrote: >> These two phases overlap, though. I believe progress reporting for >> sorts is really hard. > Whatever complexity is hidden in the sort, cost_sort() should have taken it > into consideration when called via plan_cluster_use_sort(). Thus I think that > once we have both startup and total cost, the current progress of the sort > stage can be estimated from the current number of input and output > rows. Please remind me if my proposal appears to be too simplistic. Well, even if you assume that the planner's cost model omits nothing (which I wouldn't bet on), its result is only going to be as good as the planner's estimate of the number of rows to be sorted. And, in cases where people actually care about progress monitoring, it's likely that the planner got that wrong, maybe horribly so. I think it's a bad idea for progress monitoring to depend on the planner's estimates in any way whatsoever. regards, tom lane
Re: [HACKERS] CLUSTER command progress monitor
Robert Haas wrote: > On Wed, Aug 30, 2017 at 10:12 PM, Tatsuro Yamada > wrote: > > 1. scanning heap > > 2. sort tuples > > These two phases overlap, though. I believe progress reporting for > sorts is really hard. In the simple case where the data fits in > work_mem, none of the work of the sort gets done until all the data is > read. Once you switch to an external sort, you're writing batch > files, so a lot of the work is now being done during data loading. > But as the number of batch files grows, the final merge at the end > becomes an increasingly noticeable part of the cost, and eventually > you end up needing multiple merge passes. I think we need some smart > way to report on sorts so that we can tell how much of the work has > really been done, but I don't know how to do it. Whatever complexity is hidden in the sort, cost_sort() should have taken it into consideration when called via plan_cluster_use_sort(). Thus I think that once we have both startup and total cost, the current progress of the sort stage can be estimated from the current number of input and output rows. Please remind me if my proposal appears to be too simplistic. -- Antonin Houska Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: http://www.postgresql-support.de, http://www.cybertec.at