[HACKERS] pg_autovacuum bug and feature request

2003-07-04 Thread Vincent van Leeuwen
Hi,

I've been using pg_autovacuum for a couple of weeks now, and have noticed one
weird little bug: sometimes the daemon calculates that it spent a negative
amount of time on its last round of vacuuming, and then waits no time at all
before checking whether it needs to run anything again. Sample output:

2411 All DBs checked in: -717533400 usec, will sleep for 30 secs.

The 30 secs is only because I ran it like this:
pg_autovacuum -d 2 -s 30 -S 0 -t 250 -T 0.01 -U postgres

I'm using PostgreSQL 7.3.2 on Debian Linux, kernel 2.4.21-rc3.
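
My first guess, and it is only a guess, is a 32-bit overflow somewhere in the
microsecond arithmetic: 2^31 usec is only about 35.8 minutes, and a full round
over our large tables takes longer than that. Interpreting the number above as
a value that wrapped around 2^32 gives a plausible elapsed time:

SELECT (4294967296 - 717533400) / 1000000.0 AS seconds_if_wrapped;
-- ~3577 seconds, i.e. just under an hour, which would fit with vacuums that
-- take around half an hour per table on several tables.  Again, this is pure
-- speculation on my part.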


Also, I'd like to see a way to tell pg_autovacuum which tables it should
monitor. I understand most setups would want all tables monitored, but in our
setup pg_autovacuum is wasting most of its time (and a fair amount of server
load) vacuuming some large tables (several GBs of data; those vacuums regularly
take roughly half an hour per table), which doesn't give a large performance
win anyway, when it should be focusing its efforts on a few intensively used
small tables, where frequent vacuums are a much larger performance win. I
vacuum everything nightly anyway, so those large tables can be totally ignored
by pg_autovacuum in my setup. As you can see from the unusual -t and -T
parameters, I've already tried to make it favor those smaller tables (which get
about the same amount of updates as the large tables), but I'm not quite sure
I'm doing it the right way.
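
To make the request concrete: what I'm after is the effect of something like
the following, but driven by pg_autovacuum's own statistics instead of an
external job (table names are made up):

VACUUM ANALYZE sessions;    -- hypothetical small, heavily updated table
VACUUM ANALYZE page_hits;   -- another made-up hot table
-- ...run these every few minutes, while the multi-GB tables are left to the
-- existing nightly VACUUM and skipped by the daemon entirely.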


Regards,

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/



Re: [HACKERS] Detecting corrupted pages earlier

2003-04-03 Thread Vincent van Leeuwen
On 2003-04-02 16:18:33 -0500, Tom Lane wrote:
> Kevin Brown [EMAIL PROTECTED] writes:
> > Hmm...I don't know that I'd want to go that far -- setting this
> > variable could be regarded as a policy decision.  Some shops may have
> > very good reason for running with ZERO_DAMAGED_PAGES enabled all the
> > time, but I don't know what those reasons might be.
>
> I would buy this argument if I could imagine even a faintly plausible
> reason for doing that ... but I can't.
>
>                       regards, tom lane

I've been following this discussion with great interest, because I actually
have a situation where running with ZERO_DAMAGED_PAGES on all the time would
be somewhat plausible.

We use a PostgreSQL database purely for caching pages for a very busy website.
When a user changes something that causes a page on the site to change, the
HTML for the new page gets generated using the data from another database
(containing all the actual data), and the generated HTML is inserted into this
PG database. When a page is requested that isn't cached yet, it gets generated
and inserted too. This makes it possible to invalidate the cached versions of
a large number of pages by simply deleting the relevant rows, without spending
the time to regenerate all that data immediately (and it makes crash recovery
more robust).

We can afford to lose all the data in the cache DB, because it's all generated
from other data anyway. But losing all of it would be bad from a
performance/uptime perspective, as all the cached data would need to be
regenerated (which takes a few days). Making backups once a day and restoring
such a backup when something goes wrong is impractical as well, because in our
situation old data is much worse than no data at all. I'm working on a script
to detect old data and delete it so a new page will be generated, but that
isn't finished yet.

Two weeks ago the server running this database screwed up (it crashed pretty
badly) and made some data unreadable. Although I was running with fsync on, on
an ext3 partition (with data=writeback, Linux 2.4.20, PG 7.2), some of the PG
datafiles got damaged anyway (I blame IDE disks). The damage seemed light
enough to keep running with this dataset (it occasionally borked with
'heap_delete: (am)invalid tid', but since our application attempts a delete
followed by an insert of newly generated data in case of a db error it would
repair itself most of the time). Two crashes later (weirdly patched kernels
hooray) the errors got progressively worse ('missing chunk number 0 for toast
value 79960605' and stuff like that) so we were forced to shut the website
down, dump all the data we could dump (not everything), initdb and restore
that dump. This cost us about 10 hours downtime. If I'd had the option I just
would've set ZERO_DAMAGED_PAGES to true and let it run for a few days to sort
itself out. Alternatively an ALTER TABLE foo ZERO DAMAGED PAGES; would've
worked as well, although that would create a small downtime too.
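
To be concrete, what I have in mind is roughly this (assuming the setting ends
up being something you can turn on for a single superuser session, and with a
made-up table name):

SET zero_damaged_pages = on;   -- accept that broken pages get zeroed instead of raising errors
VACUUM html_cache;             -- hypothetical cache table; a full pass touches every page
SET zero_damaged_pages = off;

Anything that was on a zeroed page simply drops out of the cache and gets
regenerated on the next request, which is exactly the behaviour we want.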

I know I'm doing a lot of weird things, and that I could avoid a lot of the
problems listed here were I to do things differently, but the fact remains
that I actually have a real-life situation where running (for a while at
least) with ZERO_DAMAGED_PAGES on makes some kind of sense.

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/




Re: [HACKERS] Detecting corrupted pages earlier

2003-04-03 Thread Vincent van Leeuwen
On 2003-04-03 18:40:54 -0500, Tom Lane wrote:
> Vincent van Leeuwen [EMAIL PROTECTED] writes:
> > ... This cost us about 10 hours downtime. If I'd had the option I just
> > would've set ZERO_DAMAGED_PAGES to true and let it run for a few days to sort
> > itself out.
>
> Yikes.  If I understand this correctly, you had both critical data and
> cache data in the same database.  As for the cache stuff, a few quick
> TRUNCATE TABLE commands would have gotten you out of the woods.  As for
> the critical data (the stuff you actually needed to dump and restore),
> do you really want ZERO_DAMAGED_PAGES on for that?  It's a heck of a
> blunt tool.
>
>                       regards, tom lane

No, it wasn't that bad :) The REAL data is on a different server which hasn't
let us down so far (and has reliable hardware and software, and backups :)).
Only the cache database was hurt. The problem with truncating everything was
that rebuilding the cache would cost about 48 hours downtime, as there is A
LOT of data to rebuild. This really is an interim solution; things will be
constructed much better and more reliably in the future, but for now this is
what we have.

Another reason we went for the dump/restore is that we upgraded to 7.3.2 at
the same time, which we had been postponing because we weren't looking forward to
that downtime :)

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/




Re: [HACKERS] Hard problem with concurrency

2003-02-19 Thread Vincent van Leeuwen
On 2003-02-18 20:02:29 +0100, Peter Eisentraut wrote:
> Christopher Kings-Lynne writes:
>
> > REPLACE INTO anyone? ;)
>
> The upcoming SQL 200x standard includes a MERGE command that appears to
> fulfill that purpose.

MySQL features a poor man's approach to this problem, its REPLACE command:

http://www.mysql.com/doc/en/REPLACE.html
REPLACE works exactly like INSERT, except that if an old record in the table
has the same value as a new record on a UNIQUE index or PRIMARY KEY, the old
record is deleted before the new record is inserted.

I'd love to see this kind of functionality in PG. I've got a database that
caches data and only gets conditional INSERT/UPDATEs, so this would save a lot
of wasted SQL commands.
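
For illustration, the kind of conditional dance I mean looks something like
this (table and column names are made up); it is exactly the sort of thing a
REPLACE or MERGE would collapse into a single statement:

UPDATE html_cache SET html = '<html>...</html>' WHERE page_id = 42;
-- Insert only if the page wasn't already cached; the UPDATE above handled
-- the other case:
INSERT INTO html_cache (page_id, html)
SELECT 42, '<html>...</html>'
WHERE NOT EXISTS (SELECT 1 FROM html_cache WHERE page_id = 42);
-- Two concurrent sessions can still both pass the NOT EXISTS check, so one of
-- them hits the unique index error anyway, which is the concurrency problem
-- this thread is about.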


Vincent van Leeuwen
Media Design

