Re: [PATCHES] Synchronized scans

2007-06-10 Thread Jeff Davis
On Sat, 2007-06-09 at 09:58 -0400, Tom Lane wrote:
 Jeff Davis [EMAIL PROTECTED] writes:
   * For a large table, do lazy_scan_heap, scan_heap, and a sequential
  scan usually progress at approximately the same rate?
 
 scan_heap would probably be faster than a regular seqscan, since it
 isn't doing any where-clause-checking or data output.  Except if you've
 got vacuum-cost-limit enabled, which I think is likely to be true by
 default in future.  Another problem is that lazy_scan_heap stops every
 so often to make a pass over the table's indexes, which'd certainly
 cause it to fall out of sync with more typical seqscans.

I think that these problems are significant enough that I'm not sure
sync-scanning a VACUUM is the right way to approach the problem.

Maybe a better solution would be to try to get a sequential scan to do
some of the work required by a VACUUM. I don't think we can stop in the
middle of a sequential scan to vacuum the indexes, but perhaps we could
come up with some kind of scheme. It would be cheaper (perhaps) to spill
the list of deletable TIDs to disk than to rescan a big (mostly live)
table later. And if it was costly, we wouldn't need to do the scan part
of a VACUUM on every sequential scan.

I'm sure this has been brought up before, does someone have a pointer to
a discussion about doing VACUUM-like work in a sequential scan?

Regards,
Jeff Davis


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PATCHES] Synchronized scans

2007-06-10 Thread Heikki Linnakangas

Tom Lane wrote:

Jeff Davis [EMAIL PROTECTED] writes:

 * Just adding in the syncscan to scan_heap and lazy_scan_heap seems
very easy at first thought. Are there any complications that I'm
missing?


I believe there are assumptions buried in both full and lazy vacuum that
blocks are scanned in increasing order.  Not sure how hard that would be
to fix or work around.  The only one I can specifically recall in lazy
vacuum is that we assume the list of deletable TIDs is sorted a priori.
Possibly you could deal with that by forcing an index-vacuum pass at the
instant where the scan would wrap around, so that the list could be
cleared before putting any lower-numbered blocks into it.


In this case, we're still scanning the table in increasing order, the 
zero-point is just shifted. We can still do a binary search if we do it 
in a whacky module-arithmetic fashion.


I believe TID list ordering is the only reason why we need to scan in order.

I don't think sync-scanning vacuum is worth pursuing, though, because of 
the other issues: index scans, vacuum cost accounting, and the fact that 
the 2nd pass would be harder to synchronize. There's a lot of other 
interesting ideas for vacuum that are more generally applicable.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Synchronized scans

2007-06-10 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes:

 I don't think sync-scanning vacuum is worth pursuing, though, because of the
 other issues: index scans, vacuum cost accounting, and the fact that the 2nd
 pass would be harder to synchronize. There's a lot of other interesting ideas
 for vacuum that are more generally applicable.

I think we could probably arrange for vacuum to synchronize. If there's one
sequential scan running we have to imagine there are others coming along soon
too. so if we become desynchronized we'll just coerce the next one to start
where we want and follow it for a while.

However I have a another worry. Even if we did manage to get vacuum
synchronizing well what would it do to the sequential scan performance.
Instead of zipping along reading clean blocks into its small ring buffer and
discarding them when it's done it'll suddenly find many of its blocks dirty
when it goes to reuse them. Effectively we'll have just reinvented the problem
we had with vacuum previously albeit in a way which only hits sequential scans
particularly hard.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] Synchronized scans

2007-06-10 Thread Tom Lane
Jeff Davis [EMAIL PROTECTED] writes:
 I'm sure this has been brought up before, does someone have a pointer to
 a discussion about doing VACUUM-like work in a sequential scan?

Yeah, it's been discussed before; try looking for incremental vacuum
and such phrases.

The main stumbling block is cleaning out index entries for the
known-dead heap tuple.  The current VACUUM design amortizes that cost
across as many dead heap tuples as it can manage; doing it retail seems
inevitably to be a lot more expensive.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PATCHES] Synchronized scans

2007-06-10 Thread Alvaro Herrera
Tom Lane wrote:
 Jeff Davis [EMAIL PROTECTED] writes:
  I'm sure this has been brought up before, does someone have a pointer to
  a discussion about doing VACUUM-like work in a sequential scan?
 
 Yeah, it's been discussed before; try looking for incremental vacuum
 and such phrases.
 
 The main stumbling block is cleaning out index entries for the
 known-dead heap tuple.  The current VACUUM design amortizes that cost
 across as many dead heap tuples as it can manage; doing it retail seems
 inevitably to be a lot more expensive.

Maybe what we could do is have a seqscan save known-dead tuple IDs in a
file, and then in a different operation (initiated by autovacuum) we
would remove those TIDs from indexes, before the regular heap scan.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings