Jeff Janes wrote:

> I would only envision using the parallel feature for vacuumdb after a
> pg_upgrade or some other major maintenance window (that is the only
> time I ever envision using vacuumdb at all). I don't think autovacuum
> can be expected to handle such situations well, as it is designed to
> be a smooth background process.
That's a fair point. One thing that would be pretty neat, though I don't
think I could get anyone to implement it, is having the user control the
autovacuum launcher in some way. For instance "please vacuum this set of
tables as quickly as possible", and it would launch as many workers as are
configured. It would take months to get a UI settled for this, however.

> I guess the ideal solution would be for manual VACUUM to have a
> PARALLEL option, then vacuumdb could just invoke that one table at a
> time. That way you would get within-table parallelism which would be
> important if one table dominates the entire database cluster. But I
> don't foresee that happening any time soon.

I see this as a completely different feature, which might also be pretty
neat, at least if you're willing to spend more I/O bandwidth on processing
a single table: have several processes scanning the heap simultaneously.
Since I think vacuum is mostly I/O bound at the moment, I'm not sure there
is much point in this currently.

> I don't know how to calibrate the number of lines that is worthwhile.
> If you write in C and need to have cross-platform compatibility and
> robust error handling, it seems to take hundreds of lines to do much
> of anything. The code duplication is a problem, but I don't think
> just raw line count is, especially since it has already been written.

Well, there are (at least) two types of duplicate code: first, you have
common routines such as pgpipe that are duplicated for no good reason.
Just move them to src/port or something and it's all good. But the OP said
there is code that cannot be shared even though it's very similar in both
incarnations. That means we cannot (or it's difficult to) keep just one
copy, which means that as bugs are fixed in one copy we need to update the
other. This is bad -- witness the situation with ecpg's copy of the
date/time code, where bugs fixed in the backend version are still missing
from the ecpg version. It's difficult to keep track of these things.

> The trend in this project seems to be for shell scripts to eventually
> get converted into C programs. In fact, src/bin/scripts now has no
> scripts at all. Also it is important to vacuum/analyze tables in the
> same database at the same time, otherwise you will not get much
> speed-up in the ordinary case where there is only one meaningful
> database. Doing that in a shell script would be fairly hard. It
> should be pretty easy in Perl (at least for me--I'm sure others
> disagree), but that also doesn't seem to be the way we do things for
> programs intended for end users.

Yeah, shipping shell scripts doesn't work very well for us. I'm thinking
perhaps we can have sample scripts showing how to use parallel(1) to run
multiple vacuumdb invocations in parallel on Unix, and some similar
mechanism on Windows, and that's it (a rough sketch of such a script is
below). So we wouldn't provide the complete toolset, but the platform
surely has ways to make it happen.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
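To make the idea concrete, here is a minimal sketch of what such a sample
script might look like. It assumes GNU parallel(1) is installed and that
the usual libpq environment variables (PGHOST, PGUSER, ...) point at the
server; the database name "mydb" and the job count of 4 are placeholders,
not the definitive form any shipped script would take.

    #!/bin/sh
    # Sketch: vacuum-analyze every user table of one database, four at a time.
    # "mydb" and JOBS are placeholders; table names that need double-quoting
    # may require extra quoting care when passed through parallel(1).
    DB=mydb
    JOBS=4

    psql -At -d "$DB" -c "
        SELECT quote_ident(schemaname) || '.' || quote_ident(tablename)
        FROM pg_tables
        WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
    " | parallel -j "$JOBS" vacuumdb --analyze -d "$DB" -t {}

For the case of several databases, the same pattern would work by feeding
database names from pg_database to parallel and dropping the -t option,
at the cost of losing parallelism within any single large database.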