Re: [HACKERS] Vacuum questions...
On Thu, Sep 29, 2005 at 12:50:13AM +0300, Hannu Krosing wrote:
> On T, 2005-09-27 at 17:57 -0500, Jim C. Nasby wrote:
> > On Tue, Sep 27, 2005 at 02:47:46PM -0400, Jan Wieck wrote:
> > > On 9/24/2005 8:17 PM, Jim C. Nasby wrote:
> > > > Would it be difficult to vacuum as part of a dump? The reasoning behind
> > > > this is that you have to read the table to do the dump anyway,
> > >
> > > I think aside from what's been said so far, it would be rather difficult
> > > anyway. pg_dump relies on MVCC and needs to run in one transaction to
> > > see a consistent snapshot, while vacuum juggles transactions in some
> > > rather non-standard ways.
> >
> > Is this true even if they were in different connections?
> >
> > My (vague) understanding of the vacuum process is that it first vacuums
> > indexes, and then vacuums the heap.
>
> Actually, (lazy) vacuum does this:
>
> 1) scan heap, collect ctids of rows to remove
> 2) clean indexes
> 3) clean heap
>
> > Since we don't dump indexes, there's
> > nothing for backup to do while those are vacuumed, so my idea is:
> >
> > pg_dump:
> >   foreach (table)
> >     spawn vacuum
> >     wait for vacuum to hit heap
> >     start copy
> >     wait for analyze to finish
> >   next;
>
> Probably the first heap scan of vacuum would go faster than the dump, as it
> does not have to write anything out, and the second scan (nr 3 in the above
> list) would be either faster or slower, as it has to lock each page and
> rearrange the tuples there.
>
> So it would be very hard to synchronize vacuum with either of them.

Well, I guess it depends on what the dump is writing to. It also depends on
the available cache, I expect.

Is this something that could be hacked together fairly easily, just for
testing purposes? Would firing off a VACUUM tablename at the same time as a
COPY tablename be a good enough approximation?
--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf    cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faq
Re: [HACKERS] Vacuum questions...
On T, 2005-09-27 at 17:57 -0500, Jim C. Nasby wrote:
> On Tue, Sep 27, 2005 at 02:47:46PM -0400, Jan Wieck wrote:
> > On 9/24/2005 8:17 PM, Jim C. Nasby wrote:
> > > Would it be difficult to vacuum as part of a dump? The reasoning behind
> > > this is that you have to read the table to do the dump anyway,
> >
> > I think aside from what's been said so far, it would be rather difficult
> > anyway. pg_dump relies on MVCC and needs to run in one transaction to
> > see a consistent snapshot, while vacuum juggles transactions in some
> > rather non-standard ways.
>
> Is this true even if they were in different connections?
>
> My (vague) understanding of the vacuum process is that it first vacuums
> indexes, and then vacuums the heap.

Actually, (lazy) vacuum does this:

1) scan heap, collect ctids of rows to remove
2) clean indexes
3) clean heap

> Since we don't dump indexes, there's
> nothing for backup to do while those are vacuumed, so my idea is:
>
> pg_dump:
>   foreach (table)
>     spawn vacuum
>     wait for vacuum to hit heap
>     start copy
>     wait for analyze to finish
>   next;

Probably the first heap scan of vacuum would go faster than the dump, as it
does not have to write anything out, and the second scan (nr 3 in the above
list) would be either faster or slower, as it has to lock each page and
rearrange the tuples there.

So it would be very hard to synchronize vacuum with either of them.
--
Hannu Krosing <[EMAIL PROTECTED]>
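The three phases above can be sketched as a toy model (the in-memory "heap" and "index" structures here are invented for illustration; this is not the actual backend code):

```python
# Toy model of lazy VACUUM's three phases (not backend code):
# a "heap" is a list of pages, each page a dict of line-pointer -> tuple;
# an "index" maps key -> ctid (page, line pointer). Dead tuples are None.

def lazy_vacuum(heap, indexes):
    # Phase 1: scan heap, collect ctids of dead rows
    dead_ctids = [(p, lp)
                  for p, page in enumerate(heap)
                  for lp, tup in page.items()
                  if tup is None]

    # Phase 2: clean indexes -- drop entries pointing at dead ctids
    dead = set(dead_ctids)
    for index in indexes:
        for key in [k for k, ctid in index.items() if ctid in dead]:
            del index[key]

    # Phase 3: clean heap -- remove the dead line pointers page by page
    for p, lp in dead_ctids:
        del heap[p][lp]

    return dead_ctids

heap = [{1: "a", 2: None}, {1: None, 2: "b"}]
idx = {"a": (0, 1), "dead1": (0, 2), "dead2": (1, 1), "b": (1, 2)}
removed = lazy_vacuum(heap, [idx])
```

Note that the ordering matters just as in the real thing: indexes must be cleaned before the heap line pointers are removed, or an index entry could briefly point at a slot that has been reused.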
Re: [HACKERS] Vacuum questions...
On Tue, Sep 27, 2005 at 07:12:21PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > AFAIK, this should allow both to run in separate transactions.
>
> ... and pretty much destroy any synchronization between the two scans,
> which was sort of the point, wasn't it?

Aren't there ways to sync them outside of a transaction? My theory is that
you don't need to synchronize them at the tuple level, since whichever one
gets ahead reading the heap will be pulling the data off the drive, while
the one that's behind will just grab it out of the buffer (or at worst, the
kernel's cache). So all you should need to do is start both scans at about
(as in within a few seconds of) the same time.

Heck, if vacuum were made to put more verbose info in its process status,
then it could be as simple as having pg_dump start a vacuum of a table in a
separate connection and just watch for the status to indicate it had
started vacuuming the table.

I *think* this shouldn't be too hard to test, which is good, since it's all
theory right now. :)
--
Jim C. Nasby, Sr. Engineering Consultant
Re: [HACKERS] Vacuum questions...
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> AFAIK, this should allow both to run in separate transactions.

... and pretty much destroy any synchronization between the two scans,
which was sort of the point, wasn't it?

regards, tom lane
Re: [HACKERS] Vacuum questions...
On Tue, Sep 27, 2005 at 02:47:46PM -0400, Jan Wieck wrote:
> On 9/24/2005 8:17 PM, Jim C. Nasby wrote:
> > Would it be difficult to vacuum as part of a dump? The reasoning behind
> > this is that you have to read the table to do the dump anyway,
>
> I think aside from what's been said so far, it would be rather difficult
> anyway. pg_dump relies on MVCC and needs to run in one transaction to
> see a consistent snapshot, while vacuum juggles transactions in some
> rather non-standard ways.

Is this true even if they were in different connections?

My (vague) understanding of the vacuum process is that it first vacuums
indexes, and then vacuums the heap. Since we don't dump indexes, there's
nothing for backup to do while those are vacuumed, so my idea is:

pg_dump:
  foreach (table)
    spawn vacuum
    wait for vacuum to hit heap
    start copy
    wait for analyze to finish
  next;

dump_vacuum (table):
  foreach (index on table)
    vacuum index
  next;
  notify pg_dump we're going to start vacuum of heap
  vacuum heap
  if we should analyze {
    analyze table
  }
  notify pg_dump analyze is done
  exit

AFAIK, this should allow both to run in separate transactions. Granted, it
would slow down the dump, since it would have to wait while the indexes were
being vacuumed, but it would win when it came to the heap.
--
Jim C. Nasby, Sr. Engineering Consultant
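The handshake in that pseudocode could be prototyped with two threads and two events (a toy sketch only: the real processes would be separate backend connections, and the signal names here are invented):

```python
import threading

# Invented signal points matching the pseudocode above: the vacuum
# worker announces "starting heap vacuum" and "analyze finished".
heap_vacuum_started = threading.Event()
analyze_done = threading.Event()
log = []

def dump_vacuum(table):
    log.append(f"vacuum indexes of {table}")  # vacuum each index first
    heap_vacuum_started.set()                 # notify pg_dump: heap next
    log.append(f"vacuum heap of {table}")
    log.append(f"analyze {table}")
    analyze_done.set()                        # notify pg_dump: all done

def pg_dump(table):
    worker = threading.Thread(target=dump_vacuum, args=(table,))
    worker.start()                            # spawn vacuum
    heap_vacuum_started.wait()                # wait for vacuum to hit heap
    log.append(f"copy {table}")               # start COPY alongside it
    analyze_done.wait()                       # wait for analyze to finish
    worker.join()

pg_dump("t1")
```

The point the sketch makes is simply that the COPY never starts while the indexes are being vacuumed, but runs concurrently with the heap pass, which is where both scans read the same pages.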
Re: [HACKERS] Vacuum questions...
On 9/24/2005 8:17 PM, Jim C. Nasby wrote:
> Would it be difficult to vacuum as part of a dump? The reasoning behind
> this is that you have to read the table to do the dump anyway,

I think aside from what's been said so far, it would be rather difficult
anyway. pg_dump relies on MVCC and needs to run in one transaction to see a
consistent snapshot, while vacuum juggles transactions in some rather
non-standard ways.

Jan
--
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.               [EMAIL PROTECTED]  #
Re: [HACKERS] Vacuum questions...
Gaetano Mendola wrote:
> Alvaro Herrera wrote:
> > On Mon, Sep 26, 2005 at 05:41:24PM +0200, Gaetano Mendola wrote:
> > > Joshua D. Drake wrote:
> > > > Autovacuum is integrated into the backend for 8.1
> > > Can I set the autovacuum parameters per table instead of per
> > > engine?
> > Yes.

Reading the 8.1 release notes I found:

  Move /contrib/pg_autovacuum into the main server (Alvaro)

  Integrating autovacuum into the server allows it to be automatically
  started and stopped in sync with the database server, and allows
  autovacuum to be configured from postgresql.conf.

Maybe it would be useful to mention that it is not exactly pg_autovacuum
that was moved: for example, you can now set parameters per table, and
pg_autovacuum could not.

Regards
Gaetano Mendola
Re: [HACKERS] Vacuum questions...
Alvaro Herrera wrote:
> On Mon, Sep 26, 2005 at 05:41:24PM +0200, Gaetano Mendola wrote:
> > Joshua D. Drake wrote:
> > > Autovacuum is integrated into the backend for 8.1
> > Can I set the autovacuum parameters per table instead of per
> > engine?
>
> Yes.

Finally :-) Good work.

Regards
Gaetano Mendola
Re: [HACKERS] Vacuum questions...
On Sat, Sep 24, 2005 at 08:25:30PM -0700, Joshua D. Drake wrote:
> Jim C. Nasby wrote:
> > Would it be difficult to vacuum as part of a dump? The reasoning behind
> > this is that you have to read the table to do the dump anyway, so it
> > would be a good time to be able to piggy-back other operations that need
> > to read the entire table on top. I know vacuuming of indexes complicates
> > this, so it's probably not as simple as just firing off a vacuum and
> > copy at the same time (although that idea is probably worth testing,
> > since it might still be a win).
>
> This would be a nightmare on a large database. Think of how
> long it takes to dump 20 gig, now add how long it is going to
> take to vacuum that size of DB, now think about a 500 gig
> database.

What says that the duration of the dump is limited by the disks the
database is on? I suspect in many cases it's not. Anyway, this would
certainly be an optional step, so if it turns out it hurts you in your
environment, you don't have to use it.

> Actually this also probably would not gain you much in 8.1
> as vacuum in theory is already dealing with itself.

Maybe, maybe not. If you're already reading the entire table to dump it,
why read the entire table again some other time to vacuum it? Just get it
all done at once. Whenever we get a 'map of pages that need vacuuming',
which will presumably greatly speed up vacuum, then maybe your argument
makes sense. Right now I don't see how it helps anything.
--
Jim C. Nasby, Sr. Engineering Consultant
Re: [HACKERS] Vacuum questions...
On Sun, Sep 25, 2005 at 11:50:14AM -0400, Tom Lane wrote:
> Hannu Krosing <[EMAIL PROTECTED]> writes:
> > Perhaps VACUUM could send some statistics after each N pages and this
> > would then be available through something similar to the pg_statistics
> > table.
>
> Why not just have it send some text to be displayed in the "current
> command" field of pg_stat_activity? The infrastructure is all there
> already for that.

If someone wanted to write a nice interface showing the status of a vacuum,
it would be easier if they didn't have to parse a text field... but
*anything* would be a vast improvement over what we have now.

Newbie TODO?
--
Jim C. Nasby, Sr. Engineering Consultant
Re: [HACKERS] Vacuum questions...
On Mon, Sep 26, 2005 at 05:41:24PM +0200, Gaetano Mendola wrote:
> Joshua D. Drake wrote:
> > Autovacuum is integrated into the backend for 8.1
>
> Can I set the autovacuum parameters per table instead of per
> engine?

Yes.
--
Alvaro Herrera      Architect, http://www.EnterpriseDB.com
Jude: I wish humans laid eggs
Ringlord: Why would you want humans to lay eggs?
Jude: So I can eat them
Re: [HACKERS] Vacuum questions...
Joshua D. Drake wrote:
> Hannu Krosing wrote:
> > On L, 2005-09-24 at 20:25 -0700, Joshua D. Drake wrote:
> > > Actually this also probably would not gain you much in 8.1
> > > as vacuum in theory is already dealing with itself.
> >
> > Interesting. Could you explain it in a more detailed way?
> > How does vacuum "deal with itself" in 8.1?
>
> Autovacuum is integrated into the backend for 8.1

Can I set the autovacuum parameters per table instead of per engine? I'm
using pg_autovacuum right now in 7.4, and it is not enough, because some
tables (one that implements a materialized view, for example) fall well
outside the average usage pattern of the database, and other tables are so
huge that they go unanalyzed for months.

Regards
Gaetano Mendola
Re: [HACKERS] Vacuum questions...
Hannu Krosing <[EMAIL PROTECTED]> writes:
> Perhaps VACUUM could send some statistics after each N pages and this
> would then be available through something similar to the pg_statistics
> table.

Why not just have it send some text to be displayed in the "current
command" field of pg_stat_activity? The infrastructure is all there
already for that.

regards, tom lane
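If VACUUM did publish progress text that way, a monitoring client would only need to poll pg_stat_activity and parse the string. The status format below is purely invented for illustration; no such format exists today:

```python
import re

# Hypothetical status line VACUUM might expose in pg_stat_activity's
# current-command field; "heap scan 1234/56789 pages" is an invented
# format, not a real one.
def vacuum_progress(current_query):
    m = re.match(r"VACUUM (\w+): (\w+ \w+) (\d+)/(\d+) pages",
                 current_query)
    if not m:
        return None  # not a vacuum, or no progress info published
    phase = m.group(2)
    done, total = int(m.group(3)), int(m.group(4))
    return phase, done / total

result = vacuum_progress("VACUUM mytable: heap scan 1234/56789 pages")
phase, frac = result
```

This is exactly the "parse a text field" downside Jim points out in the follow-up: a structured view would make the fraction available directly instead of via a regex over free text.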
Re: [HACKERS] Vacuum questions...
Hannu Krosing wrote:
> On L, 2005-09-24 at 20:25 -0700, Joshua D. Drake wrote:
> > Actually this also probably would not gain you much in 8.1
> > as vacuum in theory is already dealing with itself.
>
> Interesting. Could you explain it in a more detailed way?
> How does vacuum "deal with itself" in 8.1?

Autovacuum is integrated into the backend for 8.1.

> Not sure of a "quick scan" approach, especially for tables big enough
> for the progress info to be interesting (in my experience a scan is
> never quick).

It would be a seqscan, so on a larger table it would probably take a long
time. I was thinking of whether there is some mapping of known dead rows or
something, so we didn't have to scan the pages just for the statistics. We
would of course scan to do the actual work, but if vacuum cleared the map
while doing the work, it may not be that bad.

> Perhaps VACUUM could send some statistics after each N pages and this
> would then be available through something similar to the pg_statistics
> table.

Sincerely,
Joshua D. Drake
--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
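The "mapping of known dead rows" floated above could be as simple as one bit per heap page, set when a page acquires dead tuples and cleared once vacuum cleans it (a toy sketch; the class and method names are invented, and nothing like this exists in the server yet):

```python
# Toy per-page dead-row map: needs_vacuum[p] is True when page p is
# known to hold dead tuples, so vacuum can skip clean pages entirely
# instead of seqscanning the whole table just to gather statistics.
class DeadRowMap:
    def __init__(self, n_pages):
        self.needs_vacuum = [False] * n_pages

    def mark_dead(self, page):
        # Set by DELETE/UPDATE when a tuple on this page dies.
        self.needs_vacuum[page] = True

    def pages_to_vacuum(self):
        return [p for p, dirty in enumerate(self.needs_vacuum) if dirty]

    def vacuumed(self, page):
        # Cleared by vacuum once the page has been cleaned.
        self.needs_vacuum[page] = False

m = DeadRowMap(5)
m.mark_dead(1)
m.mark_dead(3)
targets = m.pages_to_vacuum()  # only the two dirty pages, not all five
for p in targets:
    m.vacuumed(p)
```

The map doubles as a progress/statistics source: counting set bits gives a cheap estimate of how much work a vacuum has left without scanning any heap pages.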
Re: [HACKERS] Vacuum questions...
On L, 2005-09-24 at 20:25 -0700, Joshua D. Drake wrote:
> Actually this also probably would not gain you much in 8.1
> as vacuum in theory is already dealing with itself.

Interesting. Could you explain it in a more detailed way? How does vacuum
"deal with itself" in 8.1?

> > Also, would it be possible to add some means to check the status of a
> > running vacuum? Even with vacuum verbose, once it starts in on a large
> > table you have no way to know how far along it is.
>
> That is an interesting thought... Perhaps a quick scan of
> the table to see how many dead rows there are? Then check
> back every n/10? Hmmm... I am not a C guy so I don't know if
> that is technically feasible (although probably possible), but it
> is interesting from a DBA perspective.

Not sure of a "quick scan" approach, especially for tables big enough for
the progress info to be interesting (in my experience a scan is never
quick).

Perhaps VACUUM could send some statistics after each N pages and this would
then be available through something similar to the pg_statistics table.
--
Hannu Krosing <[EMAIL PROTECTED]>
Re: [HACKERS] Vacuum questions...
On Sat, Sep 24, 2005 at 07:17:38PM -0500, Jim C. Nasby wrote:
> Finally, if vacuum_delay is enabled, does vacuum_cost_page_miss consider
> a miss as not in the database buffer, or not in the kernel buffer?

The database buffer.

> I remember discussions about trying to track IO request times to try and
> determine if something came out of kernel buffers or not, but AFAIK
> that's all vaporware right now...

I don't remember the discussion, but it certainly hasn't been implemented.
--
Alvaro Herrera      http://www.PlanetPostgreSQL.org
"Listen and you will forget; see and you will remember; do and you will
understand" (Confucius)
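The bookkeeping behind these parameters works roughly like this: each page access adds its cost (hit, miss, or dirty) to a running balance, and when the balance reaches vacuum_cost_limit the backend naps for vacuum_cost_delay and resets. A rough sketch, using the default cost values from the 8.x docs (the loop structure here is a simplification, not the server's code):

```python
# Rough sketch of cost-based vacuum delay accounting. Cost values are
# the documented 8.x defaults; the accounting loop is simplified.
VACUUM_COST_PAGE_HIT = 1     # page found in shared buffers
VACUUM_COST_PAGE_MISS = 10   # page not in the *database* buffers
VACUUM_COST_PAGE_DIRTY = 20  # page dirtied by vacuum
VACUUM_COST_LIMIT = 200
VACUUM_COST_DELAY_MS = 10

def count_sleeps(page_costs):
    """Return how many naps vacuum would take while accruing these costs."""
    balance, sleeps = 0, 0
    for cost in page_costs:
        balance += cost
        if balance >= VACUUM_COST_LIMIT:
            sleeps += 1      # sleep VACUUM_COST_DELAY_MS milliseconds
            balance = 0      # then reset the balance
    return sleeps

# 30 buffer misses at cost 10: the balance reaches 200 at the 20th
# page, triggering one nap; the remaining 100 stays under the limit.
sleeps = count_sleeps([VACUUM_COST_PAGE_MISS] * 30)
```

Since a miss is charged only when the page is absent from the database's own buffers, a page sitting in the kernel cache still costs the full vacuum_cost_page_miss, which is exactly the distinction being asked about.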
Re: [HACKERS] Vacuum questions...
Jim C. Nasby wrote:
> Would it be difficult to vacuum as part of a dump? The reasoning behind
> this is that you have to read the table to do the dump anyway, so it
> would be a good time to be able to piggy-back other operations that need
> to read the entire table on top. I know vacuuming of indexes complicates
> this, so it's probably not as simple as just firing off a vacuum and
> copy at the same time (although that idea is probably worth testing,
> since it might still be a win).

This would be a nightmare on a large database. Think of how long it takes
to dump 20 gig, now add how long it is going to take to vacuum that size of
DB, now think about a 500 gig database.

Actually this also probably would not gain you much in 8.1, as vacuum in
theory is already dealing with itself.

> When dropping a table or index, is its space immediately released in
> the FSM?

I would have to double check, but I believe you would have to vacuum to
reclaim the space to the FSM, because the relationship is still there, just
like when you delete (but not truncate).

> Also, would it be possible to add some means to check the status of a
> running vacuum? Even with vacuum verbose, once it starts in on a large
> table you have no way to know how far along it is.

That is an interesting thought... Perhaps a quick scan of the table to see
how many dead rows there are? Then check back every n/10? Hmmm... I am not
a C guy so I don't know if that is technically feasible (although probably
possible), but it is interesting from a DBA perspective. Although that
could be an issue on a large table as well, I think.

> Finally, if vacuum_delay is enabled, does vacuum_cost_page_miss consider
> a miss as not in the database buffer, or not in the kernel buffer? I
> remember discussions about trying to track IO request times to try and
> determine if something came out of kernel buffers or not, but AFAIK
> that's all vaporware right now...

Good question, anyone else?

Sincerely,
Joshua D. Drake
--
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
[HACKERS] Vacuum questions...
Would it be difficult to vacuum as part of a dump? The reasoning behind
this is that you have to read the table to do the dump anyway, so it would
be a good time to be able to piggy-back other operations that need to read
the entire table on top. I know vacuuming of indexes complicates this, so
it's probably not as simple as just firing off a vacuum and copy at the
same time (although that idea is probably worth testing, since it might
still be a win).

When dropping a table or index, is its space immediately released in the
FSM?

Also, would it be possible to add some means to check the status of a
running vacuum? Even with vacuum verbose, once it starts in on a large
table you have no way to know how far along it is.

Finally, if vacuum_delay is enabled, does vacuum_cost_page_miss consider a
miss as not in the database buffer, or not in the kernel buffer? I remember
discussions about trying to track IO request times to try and determine if
something came out of kernel buffers or not, but AFAIK that's all vaporware
right now...
--
Jim C. Nasby, Sr. Engineering Consultant