[HACKERS] pg_autovacuum bug and feature request
Hi,

I've been using pg_autovacuum for a couple of weeks now and have noticed one weird little bug: sometimes the daemon calculates that it spent a negative amount of time on the last vacuum it did, and then waits no time at all before checking whether it needs to run anything again. Sample output:

  2411 All DBs checked in: -717533400 usec, will sleep for 30 secs.

The 30 secs is only because I ran it like this:

  pg_autovacuum -d 2 -s 30 -S 0 -t 250 -T 0.01 -U postgres

I'm using PostgreSQL 7.3.2 on Debian Linux, kernel 2.4.21-rc3.

Also, I'd like to see a way to tell pg_autovacuum which tables it should monitor. I understand most setups would want all tables monitored, but in our setup pg_autovacuum wastes most of its time (and a fair amount of server load) vacuuming some large tables (several GBs of data; those vacuums regularly take on the order of half an hour per table), which doesn't buy much performance anyway. It should instead be focusing its efforts on a few intensively used small tables, where frequent vacuums are a much bigger performance win. I vacuum everything nightly anyway, so those large tables could be ignored entirely by pg_autovacuum in my setup.

As you can see from the unusual -t and -T parameters, I already tried to make it favor those smaller tables (which get about the same number of updates as the large tables), but I'm not quite sure I'm doing it the right way.

Regards,

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/
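PS: in case it helps, this is roughly how I pick out the small but heavily updated tables that deserve frequent manual vacuums. A minimal sketch, assuming the statistics collector is enabled (stats_start_collector = on, stats_row_level = on); the ordering is just a rough proxy for dead-tuple churn:

  -- List user tables with their update/delete activity next to their size
  -- in pages, so small tables with heavy churn stand out as candidates
  -- for frequent vacuuming.
  SELECT s.relname,
         s.n_tup_upd + s.n_tup_del AS updates_and_deletes,
         c.relpages
  FROM pg_stat_user_tables s
  JOIN pg_class c ON c.oid = s.relid
  ORDER BY (s.n_tup_upd + s.n_tup_del) DESC;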
Re: [HACKERS] Detecting corrupted pages earlier
On 2003-04-02 16:18:33 -0500, Tom Lane wrote:
> Kevin Brown [EMAIL PROTECTED] writes:
>> Hmm...I don't know that I'd want to go that far -- setting this
>> variable could be regarded as a policy decision. Some shops may have
>> very good reason for running with ZERO_DAMAGED_PAGES enabled all the
>> time, but I don't know what those reasons might be.
>
> I would buy this argument if I could imagine even a faintly plausible
> reason for doing that ... but I can't.
>
> regards, tom lane

I've been following this discussion with great interest, because I actually have a situation where running with ZERO_DAMAGED_PAGES on all the time would be somewhat plausible.

We use a PostgreSQL database purely for caching pages for a very busy website. When a user changes something that causes a page on the site to change, the HTML for the new page is generated from the data in another database (which holds all the actual data), and the generated HTML is inserted into this PG database. When a page is requested that isn't cached yet, it gets generated and inserted too. This makes it possible to invalidate the cached version of a large number of pages by simply deleting the relevant rows, without spending the time to regenerate all that data immediately (and it makes crash recovery more robust).

We can afford to lose all the data in the cache DB, because it's all generated from other data anyway. But losing all of it would be bad from a performance/uptime perspective, since all the cached data would need to be regenerated (which takes a few days). Making backups once a day and restoring such a backup when something goes wrong is also impractical, because in our situation old data is much worse than no data at all. (I'm working on a script to detect old data and delete it so a new page will be generated, but that isn't finished yet.)

Two weeks ago the server running this database screwed up (it crashed pretty badly) and made some data unreadable. Although I was running with fsync on, on an ext3 partition (with data=writeback, Linux 2.4.20, PG 7.2), some of the PG data files got damaged anyway (I blame IDE disks). The damage seemed light enough to keep running with this data set: it occasionally borked with 'heap_delete: (am)invalid tid', but since our application attempts a delete followed by an insert of newly generated data whenever it gets a DB error, it would repair itself most of the time. Two crashes later (weirdly patched kernels, hooray) the errors got progressively worse ('missing chunk number 0 for toast value 79960605' and the like), so we were forced to shut the website down, dump all the data we could (not everything), initdb and restore that dump. This cost us about 10 hours of downtime.

If I'd had the option, I would simply have set ZERO_DAMAGED_PAGES to true and let it run for a few days to sort itself out. Alternatively, an ALTER TABLE foo ZERO DAMAGED PAGES; would have worked as well, although that would mean a small downtime too.

I know I'm doing a lot of weird things, and that I could avoid many of the problems listed here by doing things differently, but the fact remains that I have a real-life situation where running with ZERO_DAMAGED_PAGES on (for a while at least) makes some kind of sense.

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/
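PS: to make the scenario concrete, this is roughly how I imagine using the setting being discussed in this thread. A sketch only, assuming it ends up as a superuser-settable GUC named zero_damaged_pages (the exact name, scope and syntax are whatever the eventual patch decides), and assuming 'cached_pages' as a placeholder table name:

  -- With the proposed setting on, reading a damaged page would zero it
  -- (losing its rows) and let the scan continue instead of aborting,
  -- so one full pass over the table could clear out every bad page.
  SET zero_damaged_pages = on;
  VACUUM FULL cached_pages;
  SET zero_damaged_pages = off;   -- switch it back off as soon as possible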
Re: [HACKERS] Detecting corrupted pages earlier
On 2003-04-03 18:40:54 -0500, Tom Lane wrote:
> Vincent van Leeuwen [EMAIL PROTECTED] writes:
>> ... This cost us about 10 hours downtime. If I'd had the option I just
>> would've set ZERO_DAMAGED_PAGES to true and let it run for a few days
>> to sort itself out.
>
> Yikes. If I understand this correctly, you had both critical data and
> cache data in the same database. As for the cache stuff, a few quick
> TRUNCATE TABLE commands would have gotten you out of the woods. As for
> the critical data (the stuff you actually needed to dump and restore),
> do you really want ZERO_DAMAGED_PAGES on for that? It's a heck of a
> blunt tool.
>
> regards, tom lane

No, it wasn't that bad :) The REAL data is on a different server which hasn't let us down so far (and has reliable hardware and software, and backups :)). Only the cache database was hurt.

The problem with truncating everything is that rebuilding the cache would cost about 48 hours of downtime, as there is A LOT of data to rebuild. This really is an interim solution; things will be built much better and more reliably in the future, but for now this is what we have.

Another reason we went for the dump/restore is that we upgraded to 7.3.2 at the same time, which we had been postponing because we weren't looking forward to that downtime :)

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/
Re: [HACKERS] Hard problem with concurrency
On 2003-02-18 20:02:29 +0100, Peter Eisentraut wrote:
> Christopher Kings-Lynne writes:
>> REPLACE INTO anyone? ;)
>
> The upcoming SQL 200x standard includes a MERGE command that appears to
> fulfill that purpose.

MySQL features a poor man's approach to this problem, their REPLACE command: http://www.mysql.com/doc/en/REPLACE.html

"REPLACE works exactly like INSERT, except that if an old record in the table has the same value as a new record on a UNIQUE index or PRIMARY KEY, the old record is deleted before the new record is inserted."

I'd love to see this kind of functionality in PG. I've got a database that caches data and only ever does conditional INSERT/UPDATEs, so it would save a lot of wasted SQL commands.

Vincent van Leeuwen
Media Design
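PS: for what it's worth, this is roughly how we emulate that behavior today. A minimal sketch, assuming a cache table with a 'url' primary key and an 'html' column (both names are just placeholders):

  -- MySQL's shortcut would be:
  --   REPLACE INTO cache (url, html) VALUES ('/foo', '...');
  -- Our PostgreSQL emulation: delete any existing row, then insert the
  -- new one, inside a single transaction so readers never see the key
  -- missing. (Two sessions doing this at once can still collide on the
  -- unique key, which is part of why a built-in command would be nicer.)
  BEGIN;
  DELETE FROM cache WHERE url = '/foo';
  INSERT INTO cache (url, html) VALUES ('/foo', '...');
  COMMIT;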