Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Jun 27, 2014 at 4:10 AM, Dilip kumar wrote:

> Updated patch is attached in the mail.

Thanks Dilip. I get a compiler warning when building on Windows. When I started looking into that, I saw that two files have too much code duplication between them:

src/bin/scripts/vac_parallel.c (new file)
src/bin/pg_dump/parallel.c (existing file)

In particular, pgpipe is almost an exact duplicate between them, except that the copy in vac_parallel.c has fallen behind the changes made to parallel.c. (Those changes would have fixed the Windows warnings.) I think that this function (and perhaps other parts as well -- "exit_horribly", for example) needs to be refactored into a common file that both files can include. I don't know where the best place for that would be, though; I haven't done this type of refactoring myself.

Also, there are several places in the patch which use spaces for indentation where tabs are called for by the coding style. It looks like you may have copied the code from one terminal window and pasted it into another, converting tabs to spaces in the process. This makes it hard to evaluate the amount of code duplication.

In some places the code spins in a tight loop while waiting for a worker process to become free. If I strace the process, I get a long list of selects with zero timeouts:

select(13, [6 8 10 12], NULL, NULL, {0, 0}) = 0 (Timeout)

I have not tried to track down the code that causes it, but I did notice that vacuumdb spends an awful lot of time at the top of the Linux "top" output, and this is probably why.

Cheers,

Jeff

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
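The zero-timeout select() loop Jeff straced above can be avoided by giving select() a real timeout so the master blocks until a worker pipe becomes readable instead of polling. A minimal sketch, with illustrative function names that are not from the patch:

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Block until fd becomes readable or timeout_secs elapse; returns
 * select()'s result (1 if readable, 0 on timeout, -1 on error).
 * A non-zero timeout keeps the master off the CPU, unlike the
 * {0, 0} zero-timeout polling seen in the strace output above. */
static int wait_for_worker(int fd, int timeout_secs)
{
    fd_set rfds;
    struct timeval tv;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_secs;
    tv.tv_usec = 0;
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}

/* Demo: a pipe that already has a byte pending is reported readable
 * immediately, without spinning. */
static int demo_wait_for_worker(void)
{
    int pfd[2];
    int ready;

    if (pipe(pfd) != 0)
        return -1;
    if (write(pfd[1], "x", 1) != 1)
        return -1;
    ready = wait_for_worker(pfd[0], 5);
    close(pfd[0]);
    close(pfd[1]);
    return ready;
}
```

In vacuumdb the master would select() over all worker pipes at once (as the strace line suggests it already does), just with a blocking timeout rather than a zero one.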
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Thu, Jun 26, 2014 at 2:35 AM, Dilip kumar wrote:

> Thank you for giving your time. Please review the updated patch attached in the mail.
>
> 1. Rebased the patch
> 2. Implemented parallel execution for the new option --analyze-in-stages

Hi Dilip,

Thanks for rebasing. I haven't done an architectural or code review on it; I just applied it and used it a little on Linux. Based on that, I find most importantly that it doesn't seem to correctly vacuum tables which have upper-case letters in their names, because it does not quote the table names when they need quotes. Of course that needs to be fixed, but taking the patch as it is, the resulting error message on the console is just:

: Execute failed

which is not very informative. I get the same error if I do a "pg_ctl shutdown -mi" while running the parallel vacuumdb. Without the -j option it produces a more descriptive error message, "FATAL: terminating connection due to administrator command", so something about the new feature suppresses the informative error messages.

I get some compiler warnings with the new patch:

vac_parallel.c: In function 'parallel_msg_master':
vac_parallel.c:147: warning: function might be possible candidate for 'gnu_printf' format attribute
vac_parallel.c:147: warning: function might be possible candidate for 'gnu_printf' format attribute
vac_parallel.c: In function 'exit_horribly':
vac_parallel.c:1071: warning: 'noreturn' function does return

In the usage message, the string has a tab embedded in it (immediately before "use") that should be converted to literal spaces, otherwise the output of --help gets misaligned:

printf(_(" -j, --jobs=NUM use this many parallel jobs to vacuum\n"));

Thanks for the work on this.

Cheers,

Jeff
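The unquoted-identifier bug Jeff reports is typically fixed by routing every table name through a quoting helper before it is interpolated into the VACUUM command. A self-contained sketch of such a helper (a simplified stand-in for the existing fmtId()/PQescapeIdentifier() routines, not code from the patch):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated, double-quoted copy of an SQL identifier,
 * doubling any embedded double quotes.  With this, a table named
 * MyTable becomes "MyTable" and survives case folding. */
static char *quote_identifier_copy(const char *name)
{
    size_t  len = strlen(name);
    char   *out = malloc(2 * len + 3);  /* worst case doubled + quotes + NUL */
    char   *p = out;

    if (out == NULL)
        return NULL;
    *p++ = '"';
    for (const char *s = name; *s; s++)
    {
        if (*s == '"')
            *p++ = '"';     /* escape by doubling, per the SQL standard */
        *p++ = *s;
    }
    *p++ = '"';
    *p = '\0';
    return out;
}
```

In the real patch the right fix is to call the server-aware libpq/fe-utils helpers rather than hand-rolling this, since they also handle encoding issues.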
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Hi,

I got the following FAILED hunks when I applied v3 to HEAD.

$ patch -d. -p1 < ../patch/vacuumdb_parallel_v3.patch
patching file doc/src/sgml/ref/vacuumdb.sgml
Hunk #1 succeeded at 224 (offset 20 lines).
patching file src/bin/scripts/Makefile
Hunk #2 succeeded at 65 with fuzz 2 (offset -1 lines).
patching file src/bin/scripts/vac_parallel.c
patching file src/bin/scripts/vac_parallel.h
patching file src/bin/scripts/vacuumdb.c
Hunk #3 succeeded at 61 with fuzz 2.
Hunk #4 succeeded at 87 (offset 2 lines).
Hunk #5 succeeded at 143 (offset 2 lines).
Hunk #6 succeeded at 158 (offset 5 lines).
Hunk #7 succeeded at 214 with fuzz 2 (offset 5 lines).
Hunk #8 FAILED at 223.
Hunk #9 succeeded at 374 with fuzz 1 (offset 35 lines).
Hunk #10 FAILED at 360.
Hunk #11 FAILED at 387.
3 out of 11 hunks FAILED -- saving rejects to file src/bin/scripts/vacuumdb.c.rej

---
Sawada Masahiko

On Friday, March 21, 2014, Dilip kumar wrote:

> On 16 January 2014 19:53, Euler Taveira wrote:
>
>> For the case where you have tables of varying size this would lead to a reduced overall processing time, as it prevents large (read: long processing time) tables from being processed in the last step, while processing large tables first and filling up "processing slots/jobs" as they come free with smaller tables, one after the other, would save overall execution time.
>
> Good point, I have made the change and attached the modified patch.
>
>> Didn't you submit it for a CF? Is it too late for this CF?
>
> Attached is the latest updated patch:
> 1. Rebased the patch to current git HEAD.
> 2. Doc is updated.
> 3. Supported parallel execution for the all-db option also.
>
> I will add the same to the current open commitfest.
>
> Regards,
> Dilip

--
Regards,
Sawada Masahiko
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 24 June 2014 11:02, Jeff wrote:

> I mean that the other commit, the one conflicting with your patch, is still not finished. It probably would not have been committed if we had realized the problem at the time. That other patch runs analyze in stages at different settings of default_statistics_target, but it has the loops in the wrong order, so it analyzes one database in all three stages, then moves to the next database. I think that these two changes are going to interact with each other, but I can't predict right now what that interaction will look like. So it is hard for me to evaluate your patch until the other one is resolved.
>
> Normally I would evaluate your patch in isolation, but since the conflicting patch is already committed (and is in the 9.4 branch) that would probably not be very useful in this case.

OK, got your point. I will also think about how these two patches can interact.

Regards,
Dilip
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Monday, June 23, 2014, Dilip kumar wrote:

> On 24 June 2014 03:31, Jeff wrote:
>
>>> Attached the latest updated patch
>>> 1. Rebased the patch to current git HEAD.
>>> 2. Doc is updated.
>>> 3. Supported parallel execution for the all-db option also.
>>
>> This patch needs to be rebased after the analyze-in-stages patch, c92c3d50d7fbe7391b5fc864b44434.
>
> Thank you for giving your attention to this, I will rebase it.
>
>> Although that patch still needs some work itself, despite being committed, as it still loops over the stages for each db, rather than the dbs for each stage.
>
> If I understood your comment properly, here you mean to say that in vacuum_all_databases, instead of running all DBs in parallel, we are running DB by DB in parallel?

I mean that the other commit, the one conflicting with your patch, is still not finished. It probably would not have been committed if we had realized the problem at the time. That other patch runs analyze in stages at different settings of default_statistics_target, but it has the loops in the wrong order, so it analyzes one database in all three stages, then moves to the next database. I think that these two changes are going to interact with each other, but I can't predict right now what that interaction will look like. So it is hard for me to evaluate your patch until the other one is resolved.

Normally I would evaluate your patch in isolation, but since the conflicting patch is already committed (and is in the 9.4 branch) that would probably not be very useful in this case.

Cheers,

Jeff
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 24 June 2014 03:31, Jeff wrote:

>> Attached the latest updated patch
>> 1. Rebased the patch to current git HEAD.
>> 2. Doc is updated.
>> 3. Supported parallel execution for the all-db option also.
>
> This patch needs to be rebased after the analyze-in-stages patch, c92c3d50d7fbe7391b5fc864b44434.

Thank you for giving your attention to this; I will rebase it.

> Although that patch still needs some work itself, despite being committed, as it still loops over the stages for each db, rather than the dbs for each stage.

If I understood your comment properly, you mean to say that in vacuum_all_databases, instead of running all DBs in parallel, we are running DB by DB in parallel? I think we can fix this.

> So I don't know if this patch is really reviewable at this point, as it is not clear how those things are going to interact with each other.

Exactly which points do you want to mention here?

Regards,
Dilip Kumar
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Mar 21, 2014 at 12:48 AM, Dilip kumar wrote:

> On 16 January 2014 19:53, Euler Taveira wrote:
>
>> For the case where you have tables of varying size this would lead to a reduced overall processing time, as it prevents large (read: long processing time) tables from being processed in the last step, while processing large tables first and filling up "processing slots/jobs" as they come free with smaller tables, one after the other, would save overall execution time.
>
> Good point, I have made the change and attached the modified patch.
>
>> Didn't you submit it for a CF? Is it too late for this CF?
>
> Attached is the latest updated patch:
> 1. Rebased the patch to current git HEAD.
> 2. Doc is updated.
> 3. Supported parallel execution for the all-db option also.

This patch needs to be rebased after the analyze-in-stages patch, c92c3d50d7fbe7391b5fc864b44434.

Although that patch still needs some work itself, despite being committed, as it still loops over the stages for each db, rather than the dbs for each stage.

So I don't know if this patch is really reviewable at this point, as it is not clear how those things are going to interact with each other.

Cheers,

Jeff
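For reference, the loop-order point is that the stage loop should be the outer one, so every database gets its cheap low-target statistics before any database proceeds to the next stage. A toy sketch of the intended ordering (the output format and names are made up purely for illustration):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build the order in which (stage, database) pairs would be analyzed.
 * The stage loop must be OUTER: every database finishes stage 0
 * before any database starts stage 1 -- the opposite of the committed
 * patch Jeff describes, which runs all stages of one database before
 * touching the next. */
static void stage_order(const char **dbs, int ndbs, int nstages,
                        char *buf, size_t buflen)
{
    buf[0] = '\0';
    for (int stage = 0; stage < nstages; stage++)
        for (int db = 0; db < ndbs; db++)
        {
            char item[64];

            snprintf(item, sizeof(item), "s%d:%s ", stage, dbs[db]);
            strncat(buf, item, buflen - strlen(buf) - 1);
        }
}
```

Each "stage" here stands for one default_statistics_target setting; the sketch only demonstrates the iteration order, not the ANALYZE calls themselves.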
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Euler Taveira wrote:

> On 08-11-2013 06:20, Dilip kumar wrote:
>> On 08 November 2013 13:38, Jan Lentfer wrote:
>>
>>> For this use case, would it make sense to queue work (tables) in order of their size, starting on the largest one?
>>>
>>> For the case where you have tables of varying size this would lead to a reduced overall processing time, as it prevents large (read: long processing time) tables from being processed in the last step, while processing large tables first and filling up "processing slots/jobs" as they come free with smaller tables, one after the other, would save overall execution time.
>>
>> Good point, I have made the change and attached the modified patch.
>
> Didn't you submit it for a CF? Is it too late for this CF?

Not too late.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 08-11-2013 06:20, Dilip kumar wrote:

> On 08 November 2013 13:38, Jan Lentfer wrote:
>
>> For this use case, would it make sense to queue work (tables) in order of their size, starting on the largest one?
>>
>> For the case where you have tables of varying size this would lead to a reduced overall processing time, as it prevents large (read: long processing time) tables from being processed in the last step, while processing large tables first and filling up "processing slots/jobs" as they come free with smaller tables, one after the other, would save overall execution time.
>
> Good point, I have made the change and attached the modified patch.

Didn't you submit it for a CF? Is it too late for this CF?

--
Euler Taveira                Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Thu, Nov 7, 2013 at 8:42 PM, Dilip kumar wrote:

> This patch implements the following TODO item:
>
> Allow parallel cores to be used by vacuumdb
> http://www.postgresql.org/message-id/4f10a728.7090...@agliodbs.com

Cool. Could you add this patch to the next commitfest for 9.4? It begins officially in a couple of days. Here is the URL for it:
https://commitfest.postgresql.org/action/commitfest_view?id=20

Regards,
--
Michael
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Thu, Nov 7, 2013 at 8:42 PM, Dilip kumar wrote:

> This patch implements the following TODO item:
>
> Allow parallel cores to be used by vacuumdb
> http://www.postgresql.org/message-id/4f10a728.7090...@agliodbs.com
>
> Like parallel pg_dump, vacuumdb is provided with the option to run the vacuum of multiple tables in parallel. [vacuumdb -j]
>
> 1. One new option is provided with vacuumdb to give the number of workers.
> 2. All workers will be started at the beginning, and all will be waiting for the vacuum instruction from the master.
> 3. Now, if a table list is provided in the vacuumdb command using -t, then it will send the vacuum of one table to one IDLE worker, the next table to the next IDLE worker, and so on.
> 4. If vacuum is given for one DB, then it will execute a select on pg_class to get the table list, fetch the table names one by one, and likewise assign the vacuum responsibility to IDLE workers.
>
> Performance data for parallel vacuumdb:
>
> Machine configuration:
> Core: 8
> RAM: 24GB
>
> Test scenario: 16 tables, all with 4M records. [Many records are deleted and inserted using some pattern; the file is attached in the mail.]
>
> Test result:
>
>              Time(s)  %CPU Usage  Avg Read(kB/s)  Avg Write(kB/s)
> Base code    521      3%          12000           2
>
> With parallel vacuum patch:
>
> Workers      Time(s)  %CPU Usage  Avg Read(kB/s)  Avg Write(kB/s)
> 1            518      3%          12000           2    (takes the same path as base code)
> 2            390      5%          14000           3
> 8            235      7%          18000           4
> 16           197      8%          2               5
>
> Conclusion: by running vacuumdb in parallel, CPU and I/O throughput increase, and it can give >50% performance improvement.
>
> Work to be done:
> 1. Documentation of the new command.
> 2. Parallel support for vacuuming all DBs.
>
> Is it required to move the common code for the parallel operation of pg_dump and vacuumdb to one place and reuse it?
> Prototype patch is attached in the mail, please provide your feedback/suggestions…
>
> Thanks & Regards,
> Dilip Kumar

--
Michael
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 08-11-2013 05:07, Jan Lentfer wrote:

> For the case where you have tables of varying size this would lead to a reduced overall processing time, as it prevents large (read: long processing time) tables from being processed in the last step, while processing large tables first and filling up "processing slots/jobs" as they come free with smaller tables, one after the other, would save overall execution time.

That is certainly a good strategy (not the optimal one [1] -- that is hard to achieve). Also, the strategy must: (i) consider the relation age before size (for vacuum); (ii) consider that you can't pick indexes for the same relation (for reindex).

[1] http://www.postgresql.org/message-id/CA+TgmobwxqsagXKtyQ1S8+gMpqxF_MLXv=4350tfzvqawke...@mail.gmail.com

--
Euler Taveira                Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
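Euler's point (i) could look like a comparator that ranks vacuum targets by wraparound urgency before size. A hedged sketch with made-up field names, not code from any of the patches discussed here:

```c
#include <assert.h>
#include <stdlib.h>

/* A vacuum target, with illustrative fields: how far the relation's
 * oldest transaction ID has aged, and its on-disk size. */
typedef struct
{
    long xid_age;   /* higher = closer to wraparound = more urgent */
    long size;      /* bytes */
} RelTarget;

/* qsort comparator: age first (wraparound safety), then size, both
 * descending -- one way to read Euler's "age before size" suggestion. */
static int reltarget_cmp(const void *a, const void *b)
{
    const RelTarget *x = a;
    const RelTarget *y = b;

    if (x->xid_age != y->xid_age)
        return (y->xid_age > x->xid_age) ? 1 : -1;
    if (x->size != y->size)
        return (y->size > x->size) ? 1 : -1;
    return 0;
}
```

Sorting the work queue with such a comparator keeps the largest-first heuristic while still vacuuming wraparound-threatened relations before merely big ones.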
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 08 November 2013 13:38, Jan Lentfer wrote:

> For this use case, would it make sense to queue work (tables) in order of their size, starting on the largest one?
>
> For the case where you have tables of varying size this would lead to a reduced overall processing time, as it prevents large (read: long processing time) tables from being processed in the last step, while processing large tables first and filling up "processing slots/jobs" as they come free with smaller tables, one after the other, would save overall execution time.

Good point, I have made the change and attached the modified patch.

Regards,
Dilip

Attachment: vacuumdb_parallel_v2.patch
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 08 November 2013 03:22, Euler Taveira wrote:

> On 07-11-2013 09:42, Dilip kumar wrote:
>
> Dilip, this is on my TODO for 9.4. I've already got a half-baked patch for it. Let's see what I can come up with.

OK, let me know if I can contribute to this.

>> Is it required to move the common code for parallel operation of pg_dump and vacuumdb to one place and reuse it?
>
> I'm not sure about that, because the pg_dump parallel code is tied to the TOC entry. Also, dependency ordering matters for pg_dump, while in the scripts case whatever order we choose will be used. However, vacuumdb can share the parallel code with clusterdb and reindexdb (my patch does this).

+1

> Of course, a refactor to unify the parallel code (pg_dump and scripts) can be done in a separate patch.
>
>> Prototype patch is attached in the mail, please provide your feedback/suggestions...
>
> I'll try to merge your patch with the one I have here before the next CF.

Regards,
Dilip
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 07.11.2013 12:42, Dilip kumar wrote:

> This patch implements the following TODO item:
>
> Allow parallel cores to be used by vacuumdb
> http://www.postgresql.org/message-id/4f10a728.7090...@agliodbs.com
>
> Like parallel pg_dump, vacuumdb is provided with the option to run the vacuum of multiple tables in parallel. [vacuumdb -j]
>
> 1. One new option is provided with vacuumdb to give the number of workers.
> 2. All workers will be started at the beginning, and all will be waiting for the vacuum instruction from the master.
> 3. Now, if a table list is provided in the vacuumdb command using -t, then it will send the vacuum of one table to one IDLE worker, the next table to the next IDLE worker, and so on.
> 4. If vacuum is given for one DB, then it will execute a select on pg_class to get the table list, fetch the table names one by one, and likewise assign the vacuum responsibility to IDLE workers.
>
> [...]

For this use case, would it make sense to queue work (tables) in order of their size, starting on the largest one?

For the case where you have tables of varying size this would lead to a reduced overall processing time, as it prevents large (read: long processing time) tables from being processed in the last step, while processing large tables first and filling up "processing slots/jobs" as they come free with smaller tables, one after the other, would save overall execution time.

Regards,
Jan

--
professional: http://www.oscar-consult.de
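Jan's intuition can be checked with a toy simulation: greedily hand each job to the earliest-free worker and compare the finish time when large tables go first versus last. The costs below are made up purely for illustration:

```c
#include <assert.h>

/* Simulate dispatching vacuum jobs of the given costs to nworkers
 * slots, always giving the next job to the worker that frees up
 * earliest, and return the finish time of the last worker (the
 * makespan).  The point is only that dispatching large tables first
 * balances the slots better than dispatching them last. */
static double makespan(const double *costs, int njobs, int nworkers)
{
    double free_at[16] = {0};   /* sketch assumes nworkers <= 16 */
    double end = 0;

    for (int i = 0; i < njobs; i++)
    {
        int w = 0;

        /* pick the worker that becomes idle first */
        for (int k = 1; k < nworkers; k++)
            if (free_at[k] < free_at[w])
                w = k;
        free_at[w] += costs[i];
    }
    for (int k = 0; k < nworkers; k++)
        if (free_at[k] > end)
            end = free_at[k];
    return end;
}
```

With two workers and costs {8, 4, 3, 3, 2}, largest-first finishes in 10 time units, while the reversed, smallest-first order needs 13: the big table ends up alone at the tail with one idle slot.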
Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On 07-11-2013 09:42, Dilip kumar wrote:

Dilip, this is on my TODO for 9.4. I've already got a half-baked patch for it. Let's see what I can come up with.

> Is it required to move the common code for parallel operation of pg_dump and vacuumdb to one place and reuse it?

I'm not sure about that, because the pg_dump parallel code is tied to the TOC entry. Also, dependency ordering matters for pg_dump, while in the scripts case whatever order we choose will be used. However, vacuumdb can share the parallel code with clusterdb and reindexdb (my patch does this).

Of course, a refactor to unify the parallel code (pg_dump and scripts) can be done in a separate patch.

> Prototype patch is attached in the mail, please provide your feedback/suggestions...

I'll try to merge your patch with the one I have here before the next CF.

--
Euler Taveira                Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento