On 27/09/14 01:36, Alvaro Herrera wrote:
Amit Kapila wrote:

Today, while thinking again about the strategy the patch uses to
parallelize the operation (vacuuming a database), I think we can
improve it for cases where the number of connections is
smaller than the number of tables in the database (which I presume
will normally be the case).  Currently we send the command
to vacuum one table per connection; how about sending multiple
commands (for example, VACUUM t1; VACUUM t2) on one connection?
It seems to me there is an extra round trip in cases where there
are many small tables and a few large tables in the database.  Do
you think we should optimize for such cases?
I don't think this is a good idea; at least not in a first cut of this
patch.  It's easy to imagine that a table you initially think is small
enough turns out to have grown much larger since the last ANALYZE.  In that
case, giving one worker that table together with some other
table could end up being bad for parallelism, if it later turns out that
some other worker has no table to process.  (Table t2 in your example
could grow between the time the command is sent and the time t1 is vacuumed.)

It's simpler to have workers do one thing at a time only.
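To make the trade-off concrete, here is a toy model in Python (an illustration only, not vacuumdb code). It assumes each table has a hypothetical vacuum duration in arbitrary time units, that every command sent costs a fixed round-trip overhead `rtt`, and that an idle worker takes the next unit of work from a queue. The function names and numbers are made up for illustration. It compares dispatching one table at a time with grouping tables into fixed batches up front:

```python
import heapq
from typing import List


def makespan_one_at_a_time(durations: List[int], workers: int, rtt: int = 0) -> int:
    """Greedy dispatch: whichever worker is idle first takes the next item from the queue."""
    idle_at = [0] * workers  # min-heap: time at which each worker becomes idle
    for d in durations:
        t = heapq.heappop(idle_at)             # earliest-idle worker
        heapq.heappush(idle_at, t + rtt + d)   # pays one round trip plus the work itself
    return max(idle_at)  # time at which the last worker finishes


def makespan_batched(durations: List[int], workers: int, batch: int, rtt: int = 0) -> int:
    """Tables are grouped into fixed batches up front; each batch costs a single round trip."""
    batches = [sum(durations[i:i + batch]) for i in range(0, len(durations), batch)]
    return makespan_one_at_a_time(batches, workers, rtt)


if __name__ == "__main__":
    print(makespan_one_at_a_time([8, 10, 1, 1], workers=2))      # → 10
    print(makespan_batched([8, 10, 1, 1], workers=2, batch=2))   # → 18
    print(makespan_one_at_a_time([1] * 8, workers=2, rtt=5))     # → 24
    print(makespan_batched([1] * 8, workers=2, batch=4, rtt=5))  # → 9
```

With `[8, 10, 1, 1]` on two workers and no overhead, one-at-a-time dispatch finishes at 10 while batching in pairs finishes at 18, because the table that grew is stuck in the same batch as another one while the second worker runs out of work. With eight tiny tables and `rtt=5`, batching in fours wins (9 versus 24), which is the round-trip saving Amit had in mind. The model ignores I/O contention between workers.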

I don't think it's a very good idea to call pg_relation_size() on every
table in the database from vacuumdb.

Curious: would it be both feasible and useful to have multiple workers process a 'large' table, without complicating things too much? They could each start at a different position in the file.
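The idea of each worker starting at a different position can be pictured as handing each worker a contiguous range of the table's blocks. The Python sketch below only shows that partitioning step under that assumption; `split_block_ranges` is a hypothetical helper, not a PostgreSQL or vacuumdb API, and it says nothing about how several processes would coordinate on the same relation (index cleanup, locking).

```python
from typing import List, Tuple


def split_block_ranges(nblocks: int, workers: int) -> List[Tuple[int, int]]:
    """Split block numbers [0, nblocks) into at most `workers` contiguous, half-open ranges."""
    base, extra = divmod(nblocks, workers)
    ranges: List[Tuple[int, int]] = []
    start = 0
    for i in range(workers):
        size = base + (1 if i < extra else 0)  # the first `extra` workers get one extra block
        if size == 0:
            break  # fewer blocks than workers: the remaining workers stay idle
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    print(split_block_ranges(10, 3))  # → [(0, 4), (4, 7), (7, 10)]
    print(split_block_ranges(2, 4))   # → [(0, 1), (1, 2)]
```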


Cheers,
Gavin


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
