I repeated your test on a ProLiant DL580 Gen9 with Xeon E7-8890 v3, using pgbench -s 100 and running vacuum pgbench_accounts after 10_000 transactions:

with: alter system set vacuum_cost_delay to DEFAULT;

 parallel_vacuum_workers | time
-------------------------+----------------
                       1 | 138.703,263 ms
                       2 | 83.751,064 ms
                       4 | 66.105,861 ms
                       8 | 59.820,171 ms

with: alter system set vacuum_cost_delay to 1;

 parallel_vacuum_workers | time
-------------------------+----------------
                       1 | 127.210,896 ms
                       2 | 75.300,278 ms
                       4 | 64.253,087 ms
                       8 | 60.130,953

---
Dmitry Vasilyev
Postgres Professional: http://www.postgrespro.ru
The Russian Postgres Company

2016-08-23 14:02 GMT+03:00 Masahiko Sawada <sawada.m...@gmail.com>:

> Hi all,
>
> I'd like to propose block level parallel VACUUM.
> This feature makes VACUUM possible to use multiple CPU cores.
>
> Vacuum Processing Logic
> ===================
>
> PostgreSQL VACUUM processing logic consists of 2 phases,
> 1. Collecting dead tuple locations on heap.
> 2. Reclaiming dead tuples from heap and indexes.
> These phases 1 and 2 are executed alternately, and once amount of dead
> tuple location reached maintenance_work_mem in phase 1, phase 2 will
> be executed.
>
> Basic Design
> ==========
>
> As for PoC, I implemented parallel vacuum so that each worker
> processes both 1 and 2 phases for particular block range.
> Suppose we vacuum 1000 blocks table with 4 workers, each worker
> processes 250 consecutive blocks in phase 1 and then reclaims dead
> tuples from heap and indexes (phase 2).
> To use visibility map efficiency, each worker scan particular block
> range of relation and collect dead tuple locations.
> After each worker finished task, the leader process gathers these
> vacuum statistics information and update relfrozenxid if possible.
>
> I also changed the buffer lock infrastructure so that multiple
> processes can wait for cleanup lock on a buffer.
> And the new GUC parameter vacuum_parallel_workers controls the number
> of vacuum workers.
>
> Performance(PoC)
> =========
>
> I ran parallel vacuum on 13GB table (pgbench scale 1000) with several
> workers (on my poor virtual machine).
> The result is,
>
> 1. Vacuum whole table without index (disable page skipping)
>   1 worker   : 33 sec
>   2 workers : 27 sec
>   3 workers : 23 sec
>   4 workers : 22 sec
>
> 2. Vacuum table and index (after 10000 transaction executed)
>   1 worker   : 12 sec
>   2 workers : 49 sec
>   3 workers : 54 sec
>   4 workers : 53 sec
>
> As a result of my test, since multiple process could frequently try to
> acquire the cleanup lock on same index buffer, execution time of
> parallel vacuum got worse.
> And it seems to be effective for only table vacuum so far, but is not
> improved as expected (maybe disk bottleneck).
>
> Another Design
> ============
> ISTM that processing index vacuum by multiple process is not good idea
> in most cases because many index items can be stored in a page and
> multiple vacuum worker could try to require the cleanup lock on the
> same index buffer.
> It's rather better that multiple workers process particular block
> range and then multiple workers process each particular block range,
> and then one worker per index processes index vacuum.
>
> Still lots of work to do but attached PoC patch.
> Feedback and suggestion are very welcome.
>
> Regards,
>
> --
> Masahiko Sawada
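
For reference, the block-range split described under "Basic Design" in the quoted proposal (1000 blocks, 4 workers, 250 consecutive blocks each) could be sketched roughly as below. This is only an illustration in Python, not code from the PoC patch: the function name split_block_ranges is hypothetical, and giving the leftover blocks to the first workers when the table size is not a multiple of the worker count is an assumption, since the proposal does not say how the remainder is handled.

```python
from typing import List, Tuple


def split_block_ranges(nblocks: int, nworkers: int) -> List[Tuple[int, int]]:
    """Split blocks 0..nblocks-1 into consecutive half-open ranges.

    Returns one (start, end) pair per worker, where 'end' is exclusive.
    Assumption: any remainder is spread over the first workers, one
    extra block each. The quoted proposal does not specify this.
    """
    if nblocks < 0:
        raise ValueError("nblocks must be non-negative")
    if nworkers < 1:
        raise ValueError("nworkers must be at least 1")

    base, extra = divmod(nblocks, nworkers)
    ranges = []
    start = 0
    for worker in range(nworkers):
        # The first 'extra' workers each take one additional block.
        size = base + (1 if worker < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


if __name__ == "__main__":
    # The example from the proposal: a 1000-block table and 4 workers.
    for worker, (start, end) in enumerate(split_block_ranges(1000, 4)):
        print(f"worker {worker}: blocks {start}..{end - 1}")
```

In the proposed design each worker would then run both phases (collecting dead tuple locations, then reclaiming them) over its own range only, which is why the heap work scales while the shared index pages become the point of contention.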