Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-06-30 Thread Jeff Janes
On Fri, Jun 27, 2014 at 4:10 AM, Dilip kumar  wrote:
...
>
> Updated patch is attached in the mail..

Thanks Dilip.

I get a compiler warning when building on Windows.  When I started
looking into that, I saw that two files have too much code duplication
between them:

src/bin/scripts/vac_parallel.c   (new file)
src/bin/pg_dump/parallel.c  (existing file)

In particular, pgpipe is almost an exact duplicate between them,
except the copy in vac_parallel.c has fallen behind the changes made to
parallel.c.  (Those changes would have fixed the Windows warnings.)  I
think that this function (and perhaps other parts as
well--"exit_horribly", for example) needs to be refactored into a common
file that both files can include.  I don't know where the best place
for that would be, though.  (I haven't done this type of refactoring
myself.)

Also, there are several places in the patch that use spaces for
indentation where tabs are called for by the coding style.  It looks
like you may have copied the code from one terminal window and pasted
it into another, converting tabs to spaces in the process.  This
makes it hard to evaluate the amount of code duplication.

In some places the code spins in a tight loop while waiting for a
worker process to become free.  When I strace the process, I get a
long list of selects with zero timeouts:

select(13, [6 8 10 12], NULL, NULL, {0, 0}) = 0 (Timeout)

I have not tried to track down the code that causes it.  I did notice
that vacuumdb spends an awful lot of time at the top of the Linux
"top" output, and this is probably why.


Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-06-26 Thread Jeff Janes
On Thu, Jun 26, 2014 at 2:35 AM, Dilip kumar  wrote:

>
> Thank you for giving your time,  Please review the updated patch attached in
> the mail.
>
>
>
> 1.  Rebased the patch
>
> 2.  Implemented parallel execution for new option --analyze-in-stages

Hi Dilip,

Thanks for rebasing.

I haven't done an architectural or code review on it, I just applied
it and used it a little on Linux.

Based on that, I find most importantly that it doesn't seem to vacuum
tables with upper-case letters in their names correctly, because it
does not quote the table names when they need quotes.

Of course that needs to be fixed, but taking it as it is, the
resulting error message to the console is just:

 : Execute failed

That is not very informative.  I get the same error if I do a "pg_ctl
shutdown -mi" while running the parallel vacuumdb. Without the -j
option it produces a more descriptive error message, "FATAL:
terminating connection due to administrator command", so something
about the new feature suppresses the informative error messages.

I get some compiler warnings with the new patch:

vac_parallel.c: In function 'parallel_msg_master':
vac_parallel.c:147: warning: function might be possible candidate for
'gnu_printf' format attribute
vac_parallel.c:147: warning: function might be possible candidate for
'gnu_printf' format attribute
vac_parallel.c: In function 'exit_horribly':
vac_parallel.c:1071: warning: 'noreturn' function does return


In the usage message, the string has a tab embedded within it
(immediately before "use") that should be converted to literal spaces;
otherwise the output of --help gets misaligned:

printf(_("  -j, --jobs=NUM  use this many parallel jobs to vacuum\n"));

Thanks for the work on this.

Cheers,

Jeff




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-06-25 Thread Sawada Masahiko
Hi,

I got the following FAILED hunks when applying v3 to HEAD:

$ patch -d. -p1 < ../patch/vacuumdb_parallel_v3.patch
patching file doc/src/sgml/ref/vacuumdb.sgml
Hunk #1 succeeded at 224 (offset 20 lines).
patching file src/bin/scripts/Makefile
Hunk #2 succeeded at 65 with fuzz 2 (offset -1 lines).
patching file src/bin/scripts/vac_parallel.c
patching file src/bin/scripts/vac_parallel.h
patching file src/bin/scripts/vacuumdb.c
Hunk #3 succeeded at 61 with fuzz 2.
Hunk #4 succeeded at 87 (offset 2 lines).
Hunk #5 succeeded at 143 (offset 2 lines).
Hunk #6 succeeded at 158 (offset 5 lines).
Hunk #7 succeeded at 214 with fuzz 2 (offset 5 lines).
Hunk #8 FAILED at 223.
Hunk #9 succeeded at 374 with fuzz 1 (offset 35 lines).
Hunk #10 FAILED at 360.
Hunk #11 FAILED at 387.
3 out of 11 hunks FAILED -- saving rejects to file
src/bin/scripts/vacuumdb.c.rej

---
Sawada Masahiko

On Friday, March 21, 2014, Dilip kumar  wrote:

> On 16 January 2014 19:53, Euler Taveira Wrote,
>
> > >
> > >> For the case where you have tables of varying size this would lead
> > to a reduced overall processing time as it prevents large (read: long
> > processing time) tables to be processed in the last step. While
> > processing large tables at first and filling up "processing slots/jobs"
> > when they get free with smaller tables one after the other would save
> > overall execution time.
> > > Good point, I have made the change and attached the modified patch.
> > >
> > You didn't submit it to a CF, did you? Is it too late for this CF?
>
>  Attached the latest updated patch
>  1. Rebased the patch to current GIT head.
>  2. Doc is updated.
>  3. Supported parallel execution for all db option also.
>
> Same I will add to current open commitfest..
>
> Regards,
> Dilip
>


-- 
Regards,

---
Sawada Masahiko


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-06-23 Thread Dilip kumar
On 24 June 2014 11:02 Jeff Wrote,

>I mean that the other commit, the one conflicting with your patch, is still 
>not finished.  It probably would not have been committed if we realized the 
>problem at the time.  That other patch runs analyze in stages at
> different settings of default_statistics_target, but it has the loops in the 
> wrong order, so it analyzes one database in all three stages, then moves to 
> the next database.  I think that these two changes are going to
> interact with each other.  But I can't predict right now what that 
> interaction will look like.   So it is hard for me to evaluate your patch, 
> until the other one is resolved.

>Normally I would evaluate your patch in isolation, but since the conflicting 
>patch is already committed (and is in the 9.4 branch) that would probably not 
>be very useful in this case.


OK, got your point. I will also try to think about how these two patches
can interact together.

Regards,
Dilip


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-06-23 Thread Jeff Janes
On Monday, June 23, 2014, Dilip kumar  wrote:

> On 24 June 2014 03:31, Jeff Wrote,
>
> > >  Attached the latest updated patch
> > >  1. Rebased the patch to current GIT head.
> > >  2. Doc is updated.
> > >  3. Supported parallel execution for all db option also.
> >
> > This patch needs to be rebased after the analyze-in-stages patch,
> > c92c3d50d7fbe7391b5fc864b44434.
>
> Thank you for giving your attention to this, I will rebase this..
>
> > Although that patch still needs some work itself, despite being
> > committed, as it still loops over the stages for each db, rather than
> > the dbs for each stage.
>
> If I understood your comment properly, here you mean to say that
> in vacuum_all_databases, instead of running all DBs in parallel, we are
> running db by db in parallel?
>

I mean that the other commit, the one conflicting with your patch, is still
not finished.  It probably would not have been committed if we realized the
problem at the time.  That other patch runs analyze in stages at different
settings of default_statistics_target, but it has the loops in the wrong
order, so it analyzes one database in all three stages, then moves to the
next database.  I think that these two changes are going to interact with
each other.  But I can't predict right now what that interaction will look
like.   So it is hard for me to evaluate your patch, until the other one is
resolved.

Normally I would evaluate your patch in isolation, but since the
conflicting patch is already committed (and is in the 9.4 branch) that
would probably not be very useful in this case.

Cheers,

Jeff


Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-06-23 Thread Dilip kumar
On 24 June 2014 03:31, Jeff Wrote,

> >  Attached the latest updated patch
> >  1. Rebased the patch to current GIT head.
> >  2. Doc is updated.
> >  3. Supported parallel execution for all db option also.
> 
> This patch needs to be rebased after the analyze-in-stages patch,
> c92c3d50d7fbe7391b5fc864b44434.

Thank you for giving your attention to this, I will rebase this.. 
 
> Although that patch still needs some work itself, despite being
> committed, as it still loops over the stages for each db, rather than
> the dbs for each stage.

If I understood your comment properly, here you mean to say that in
vacuum_all_databases, instead of running all DBs in parallel, we are
running db by db in parallel?

I think we can fix this..

> 
> So I don't know if this patch is really reviewable at this point, as it
> is not clear how those things are going to interact with each other.

Exactly which points do you want to mention here?

Regards,
Dilip Kumar



Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-06-23 Thread Jeff Janes
On Fri, Mar 21, 2014 at 12:48 AM, Dilip kumar  wrote:
> On 16 January 2014 19:53, Euler Taveira Wrote,
>
>> >
>> >> For the case where you have tables of varying size this would lead
>> to a reduced overall processing time as it prevents large (read: long
>> processing time) tables to be processed in the last step. While
>> processing large tables at first and filling up "processing slots/jobs"
>> when they get free with smaller tables one after the other would save
>> overall execution time.
>> > Good point, I have made the change and attached the modified patch.
>> >
>> You didn't submit it to a CF, did you? Is it too late for this CF?
>
>  Attached the latest updated patch
>  1. Rebased the patch to current GIT head.
>  2. Doc is updated.
>  3. Supported parallel execution for all db option also.

This patch needs to be rebased after the analyze-in-stages patch,
c92c3d50d7fbe7391b5fc864b44434.

Although that patch still needs some work itself, despite being
committed, as it still loops over the stages for each db, rather than
the dbs for each stage.

So I don't know if this patch is really reviewable at this point, as
it is not clear how those things are going to interact with each
other.

Cheers,

Jeff




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-01-16 Thread Alvaro Herrera
Euler Taveira wrote:
> On 08-11-2013 06:20, Dilip kumar wrote:
> > On 08 November 2013 13:38, Jan Lentfer
> > 
> > 
> >> For this use case, would it make sense to queue work (tables) in order of 
> >> their size, starting on the largest one?
> > 
> >> For the case where you have tables of varying size this would lead to a 
> >> reduced overall processing time as it prevents large (read: long 
> >> processing time) tables to be processed in the last step. While processing 
> >> large tables at first and filling up "processing slots/jobs" when they get 
> >> free with smaller tables one after the other would save overall execution 
> >> time.
> > Good point, I have made the change and attached the modified patch.
> > 
> You didn't submit it to a CF, did you? Is it too late for this CF?

Not too late.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2014-01-16 Thread Euler Taveira
On 08-11-2013 06:20, Dilip kumar wrote:
> On 08 November 2013 13:38, Jan Lentfer
> 
> 
>> For this use case, would it make sense to queue work (tables) in order of 
>> their size, starting on the largest one?
> 
>> For the case where you have tables of varying size this would lead to a 
>> reduced overall processing time as it prevents large (read: long processing 
>> time) tables to be processed in the last step. While processing large tables 
>> at first and filling up "processing slots/jobs" when they get free with 
> smaller tables one after the other would save overall execution time.
> Good point, I have made the change and attached the modified patch.
> 
You didn't submit it to a CF, did you? Is it too late for this CF?


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2013-11-12 Thread Michael Paquier
On Thu, Nov 7, 2013 at 8:42 PM, Dilip kumar  wrote:
> This patch implements the following TODO item
>
> Allow parallel cores to be used by vacuumdb
> http://www.postgresql.org/message-id/4f10a728.7090...@agliodbs.com
Cool. Could you add this patch to the next commit fest for 9.4? It
begins officially in a couple of days. Here is the URL to it:
https://commitfest.postgresql.org/action/commitfest_view?id=20

Regards,
-- 
Michael




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2013-11-12 Thread Michael Paquier
On Thu, Nov 7, 2013 at 8:42 PM, Dilip kumar  wrote:
> This patch implements the following TODO item
>
> Allow parallel cores to be used by vacuumdb
> http://www.postgresql.org/message-id/4f10a728.7090...@agliodbs.com
>
>
>
> Like parallel pg_dump, vacuumdb is provided with the option to run the
> vacuum of multiple tables in parallel. [ vacuumdb -j ]
>
> 1.   One new option is provided with vacuumdb to give the number of
> workers.
>
> 2.   All workers will be started at the beginning, and all will wait
> for the vacuum instruction from the master.
>
> 3.   Now, if a table list is provided in the vacuumdb command using -t,
> it will send the vacuum of one table to one IDLE worker, the next table
> to the next IDLE worker, and so on.
>
> 4.   If vacuum is given for one DB, it will execute a select on
> pg_class to get the table list, fetch the table names one by one, and
> assign the vacuum responsibility to IDLE workers.
>
>
>
> Performance data by parallel vacuumdb:
>
> Machine configuration:
>     Cores: 8
>     RAM:   24GB
>
> Test scenario:
>     16 tables, all with 4M records. [many records are deleted and
>     inserted using some pattern (file is attached in the mail)]
>
> Test result:
>
>     {Base Code}
>     Time(s)   %CPU Usage   Avg Read(kB/s)   Avg Write(kB/s)
>     521       3%           12000            2
>
>     {With Parallel Vacuum Patch}
>     worker   Time(s)   %CPU Usage   Avg Read(kB/s)   Avg Write(kB/s)
>     1        518       3%           12000            2    --> this will take the same path as base code
>     2        390       5%           14000            3
>     8        235       7%           18000            4
>     16       197       8%           2                5
>
>
>
> Conclusion:
>
> By running vacuumdb in parallel, CPU and I/O throughput increase, and
> it can give a >50% performance improvement.
>
> Work to be done:
>
> 1.   Documentation of the new command.
>
> 2.   Parallel support for vacuuming all DBs.
>
> Is it required to move the common code for parallel operation of
> pg_dump and vacuumdb to one place and reuse it?
>
>
>
> Prototype patch is attached in the mail; please provide your
> feedback/suggestions…
>
>
>
> Thanks & Regards,
>
> Dilip Kumar
>
>
>
>
>
>
>
>



-- 
Michael




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2013-11-08 Thread Euler Taveira
On 08-11-2013 05:07, Jan Lentfer wrote:
> For the case where you have tables of varying size this would lead to
> a reduced overall processing time as it prevents large (read: long
> processing time) tables to be processed in the last step. While
> processing large tables at first and filling up "processing
> slots/jobs" when they get free with smaller tables one after the
> other would save overall execution time.
> 
That is certainly a good strategy (not the optimal [1] -- that is hard
to achieve). Also, the strategy must:

(i) consider the relation age before size (for vacuum);
(ii) consider that you can't pick indexes for the same relation (for
reindex).


[1]
http://www.postgresql.org/message-id/CA+TgmobwxqsagXKtyQ1S8+gMpqxF_MLXv=4350tfzvqawke...@mail.gmail.com


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2013-11-08 Thread Dilip kumar
On 08 November 2013 13:38, Jan Lentfer


> For this use case, would it make sense to queue work (tables) in order of 
> their size, starting on the largest one?

> For the case where you have tables of varying size this would lead to a 
> reduced overall processing time as it prevents large (read: long processing 
> time) tables to be processed in the last step. While processing large tables 
> at first and filling up "processing slots/jobs" when they get free with 
> smaller tables one after the other would save overall execution time.
Good point, I have made the change and attached the modified patch.


Regards,
Dilip


[Attachment: vacuumdb_parallel_v2.patch]



Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2013-11-08 Thread Dilip kumar
 On 08 November 2013 03:22, Euler Taveira Wrote

> On 07-11-2013 09:42, Dilip kumar wrote:
> 
> Dilip, this is on my TODO for 9.4. I already have a half-baked patch
> for it. Let's see what I can come up with.

OK, let me know if I can contribute to this.

> > Is it required to move the common code for parallel operation of
> pg_dump and vacuumdb to one place and reuse it ?
> >
> I'm not sure about that, because the pg_dump parallel code is tied to
> the TOC entry. Also, dependency matters for pg_dump, while in the
> scripts' case a chosen order will be used. However, vacuumdb can share
> the parallel code with clusterdb and reindexdb (my patch does it).

+1

> 
> Of course, a refactor to unify parallel code (pg_dump and scripts) can
> be done in a separate patch.
> 
> > Prototype patch is attached in the mail, please provide your
> feedback/Suggestions...
> >
> I'll try to merge your patch with the one I have here until the next CF.

Regards,
Dilip




Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2013-11-08 Thread Jan Lentfer
 

On 07.11.2013 12:42, Dilip kumar wrote:

> This patch implements the following TODO item:
>
> Allow parallel cores to be used by vacuumdb
> http://www.postgresql.org/message-id/4f10a728.7090...@agliodbs.com
>
> Like parallel pg_dump, vacuumdb is provided with the option to run the
> vacuum of multiple tables in parallel. [ vacuumdb -j ]
>
> 1. One new option is provided with vacuumdb to give the number of
> workers.
>
> 2. All workers will be started at the beginning, and all will wait for
> the vacuum instruction from the master.
>
> 3. Now, if a table list is provided in the vacuumdb command using -t,
> it will send the vacuum of one table to one IDLE worker, the next table
> to the next IDLE worker, and so on.
>
> 4. If vacuum is given for one DB, it will execute a select on pg_class
> to get the table list, fetch the table names one by one, and assign the
> vacuum responsibility to IDLE workers.
>
> [...]

For this use case, would it make sense to queue work (tables) in order
of their size, starting with the largest one?

For the case where you have tables of varying size, this would lead to a
reduced overall processing time, as it prevents large (read: long
processing time) tables from being processed in the last step.
Processing large tables first, and filling up "processing slots/jobs"
with smaller tables one after the other as they become free, would save
overall execution time.

Regards


Jan 

-- 
professional: http://www.oscar-consult.de





Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]

2013-11-07 Thread Euler Taveira
On 07-11-2013 09:42, Dilip kumar wrote:

Dilip, this is on my TODO for 9.4. I already have a half-baked patch
for it. Let's see what I can come up with.

> Is it required to move the common code for parallel operation of pg_dump and 
> vacuumdb to one place and reuse it ?
> 
I'm not sure about that, because the pg_dump parallel code is tied to
the TOC entry. Also, dependency matters for pg_dump, while in the
scripts' case a chosen order will be used. However, vacuumdb can share
the parallel code with clusterdb and reindexdb (my patch does it).

Of course, a refactor to unify parallel code (pg_dump and scripts) can
be done in a separate patch.

> Prototype patch is attached in the mail, please provide your 
> feedback/Suggestions...
> 
I'll try to merge your patch with the one I have here until the next CF.


-- 
   Euler Taveira   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



