Re: [PATCHES] Synchronized scans
On Sat, 2007-06-09 at 09:58 -0400, Tom Lane wrote:
> Jeff Davis [EMAIL PROTECTED] writes:
> > * For a large table, do lazy_scan_heap, scan_heap, and a sequential
> > scan usually progress at approximately the same rate?
>
> scan_heap would probably be faster than a regular seqscan, since it
> isn't doing any where-clause-checking or data output. Except if you've
> got vacuum-cost-limit enabled, which I think is likely to be true by
> default in future.
>
> Another problem is that lazy_scan_heap stops every so often to make a
> pass over the table's indexes, which'd certainly cause it to fall out
> of sync with more typical seqscans.

I think that these problems are significant enough that I'm not sure
sync-scanning a VACUUM is the right way to approach the problem.

Maybe a better solution would be to try to get a sequential scan to do
some of the work required by a VACUUM. I don't think we can stop in the
middle of a sequential scan to vacuum the indexes, but perhaps we could
come up with some kind of scheme.

It would be cheaper (perhaps) to spill the list of deletable TIDs to
disk than to rescan a big (mostly live) table later. And if it was
costly, we wouldn't need to do the scan part of a VACUUM on every
sequential scan.

I'm sure this has been brought up before, does someone have a pointer to
a discussion about doing VACUUM-like work in a sequential scan?

Regards,
	Jeff Davis
Re: [PATCHES] Synchronized scans
Tom Lane wrote:
> Jeff Davis [EMAIL PROTECTED] writes:
> > * Just adding in the syncscan to scan_heap and lazy_scan_heap seems
> > very easy at first thought. Are there any complications that I'm
> > missing?
>
> I believe there are assumptions buried in both full and lazy vacuum
> that blocks are scanned in increasing order. Not sure how hard that
> would be to fix or work around. The only one I can specifically recall
> in lazy vacuum is that we assume the list of deletable TIDs is sorted
> a priori. Possibly you could deal with that by forcing an index-vacuum
> pass at the instant where the scan would wrap around, so that the list
> could be cleared before putting any lower-numbered blocks into it.

In this case, we're still scanning the table in increasing order; the
zero-point is just shifted. We can still do a binary search if we do it
in a wacky modular-arithmetic fashion.

I believe TID list ordering is the only reason why we need to scan in
order. I don't think sync-scanning vacuum is worth pursuing, though,
because of the other issues: index scans, vacuum cost accounting, and
the fact that the 2nd pass would be harder to synchronize. There are a
lot of other interesting ideas for vacuum that are more generally
applicable.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
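The modular-arithmetic lookup Heikki describes can be sketched roughly as below. This is only an illustration in Python, not code from the patch or from PostgreSQL: the names scan_key, tid_is_dead, start_block and nblocks are hypothetical, and a TID is modelled as a plain (block, offset) pair. The idea is that a list collected by a scan that started at start_block and wrapped around is still sorted, provided each block number is compared by its distance from the start block modulo the table size.

```python
from typing import List, Tuple

Tid = Tuple[int, int]  # (block number, line-pointer offset); a simplified model


def scan_key(tid: Tid, start_block: int, nblocks: int) -> Tuple[int, int]:
    """Map a TID to its position in a scan that began at start_block and wrapped."""
    block, offset = tid
    return ((block - start_block) % nblocks, offset)


def tid_is_dead(dead: List[Tid], tid: Tid, start_block: int, nblocks: int) -> bool:
    """Binary search over a list that is sorted in scan order, not block order."""
    target = scan_key(tid, start_block, nblocks)
    lo, hi = 0, len(dead)
    while lo < hi:
        mid = (lo + hi) // 2
        if scan_key(dead[mid], start_block, nblocks) < target:
            lo = mid + 1
        else:
            hi = mid
    return lo < len(dead) and dead[lo] == tid


# Hypothetical 10-block table scanned from block 7: blocks 7, 8, 9, then 0..6.
dead_tids = [(7, 3), (8, 1), (9, 5), (0, 2), (4, 1)]
```

With this ordering the list never needs re-sorting at the wrap-around point, which is the alternative to Tom's suggestion of forcing an index-vacuum pass there.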
Re: [PATCHES] Synchronized scans
Heikki Linnakangas [EMAIL PROTECTED] writes:
> I don't think sync-scanning vacuum is worth pursuing, though, because
> of the other issues: index scans, vacuum cost accounting, and the fact
> that the 2nd pass would be harder to synchronize. There's a lot of
> other interesting ideas for vacuum that are more generally applicable.

I think we could probably arrange for vacuum to synchronize. If there's
one sequential scan running we have to imagine there are others coming
along soon too, so if we become desynchronized we'll just coerce the
next one to start where we want and follow it for a while.

However, I have another worry. Even if we did manage to get vacuum
synchronizing well, what would it do to the sequential scan performance?
Instead of zipping along reading clean blocks into its small ring buffer
and discarding them when it's done, it'll suddenly find many of its
blocks dirty when it goes to reuse them. Effectively we'll have just
reinvented the problem we had with vacuum previously, albeit in a way
which only hits sequential scans particularly hard.

-- 
  Gregory Stark
  EnterpriseDB   http://www.enterprisedb.com
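Gregory's ring-buffer worry can be illustrated with a toy model. This is a sketch under heavy assumptions, not a model of the real buffer manager: forced_writes, ring_size and dirtied are hypothetical names, blocks are read strictly in order, and a "write" is just a counter increment. It counts how often a scan has to flush a dirty page before it can reuse a ring slot.

```python
def forced_writes(nblocks, ring_size, dirtied):
    """Count slot reuses that first require writing out a dirty page.

    dirtied is the set of block numbers that some other process (for
    example a vacuum travelling with the scan) has modified.
    """
    ring = [None] * ring_size  # each slot holds (block, is_dirty) or None
    writes = 0
    for block in range(nblocks):
        slot = block % ring_size
        if ring[slot] is not None and ring[slot][1]:
            writes += 1  # the old page is dirty: flush before reusing the slot
        ring[slot] = (block, block in dirtied)
    return writes


# A plain seqscan never finds a dirty page in its ring.
clean_scan = forced_writes(100, 8, set())

# If every second block has been dirtied, many slot reuses need a write first.
shared_scan = forced_writes(100, 8, set(range(0, 100, 2)))
```

In this model the plain scan performs no writes at all, while the scan sharing its ring with a page-dirtying process pays for a write on roughly half of its slot reuses.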
Re: [PATCHES] Synchronized scans
Jeff Davis [EMAIL PROTECTED] writes:
> I'm sure this has been brought up before, does someone have a pointer
> to a discussion about doing VACUUM-like work in a sequential scan?

Yeah, it's been discussed before; try looking for "incremental vacuum"
and such phrases.

The main stumbling block is cleaning out index entries for the
known-dead heap tuple. The current VACUUM design amortizes that cost
across as many dead heap tuples as it can manage; doing it retail seems
inevitably to be a lot more expensive.

			regards, tom lane
Re: [PATCHES] Synchronized scans
Tom Lane wrote:
> Jeff Davis [EMAIL PROTECTED] writes:
> > I'm sure this has been brought up before, does someone have a
> > pointer to a discussion about doing VACUUM-like work in a sequential
> > scan?
>
> Yeah, it's been discussed before; try looking for incremental vacuum
> and such phrases.
>
> The main stumbling block is cleaning out index entries for the
> known-dead heap tuple. The current VACUUM design amortizes that cost
> across as many dead heap tuples as it can manage; doing it retail
> seems inevitably to be a lot more expensive.

Maybe what we could do is have a seqscan save known-dead tuple IDs in a
file, and then in a different operation (initiated by autovacuum) we
would remove those TIDs from indexes, before the regular heap scan.

-- 
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
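The spill-file idea that Jeff and Alvaro float could look something like the following. It is a minimal sketch with assumptions stated up front: the 6-byte record layout (32-bit block number, 16-bit offset), the file name and the functions spill_dead_tids and load_dead_tids are all hypothetical, and nothing here touches real heap or index pages. Each sequential scan appends the dead TIDs it happens to see, and a later cleanup step reads them all back as one sorted, de-duplicated batch.

```python
import os
import struct
import tempfile

RECORD = struct.Struct("<IH")  # assumed layout: 32-bit block, 16-bit offset


def spill_dead_tids(path, tids):
    """Append TIDs found dead during a scan to the spill file."""
    with open(path, "ab") as f:
        for block, offset in tids:
            f.write(RECORD.pack(block, offset))


def load_dead_tids(path):
    """Read every spilled TID back, sorted and de-duplicated for index cleanup."""
    with open(path, "rb") as f:
        data = f.read()
    tids = {RECORD.unpack_from(data, pos) for pos in range(0, len(data), RECORD.size)}
    return sorted(tids)


spill_path = os.path.join(tempfile.mkdtemp(), "dead_tids.spill")
spill_dead_tids(spill_path, [(7, 3), (9, 5)])  # first seqscan
spill_dead_tids(spill_path, [(0, 2), (7, 3)])  # a later seqscan sees (7, 3) again
batch = load_dead_tids(spill_path)
```

Handing the index cleanup one sorted batch keeps the amortization Tom describes, since the cost of visiting the indexes is still spread over many dead tuples rather than paid per tuple.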