Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of insert/delete/update
Just for the humor of it, as well as to confirm Nick's perspective: years ago, on our in-house Burroughs mainframe DBMS, we had a process called "garbage collect".

Nicolai Tufar wrote:
> I always wondered if VACUUM is the right name for the process. Now, when PostgreSQL is actively challenging in the enterprise space, it might be a good idea to give it a more enterprise-like name. Think how it looks to an outside person to see us database professionals holding lengthy discussions about the ways we vacuum a database. Why should you need to vacuum a database? Is it dirty? In my personal opinion, something like "space reclaiming daemon", "free-list organizer", "tuple recycle job", or "segment coalesce process" would sound more business-like.
>
> Regards,
> Nick
Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of insert/delete/update
I always wondered if VACUUM is the right name for the process. Now, when PostgreSQL is actively challenging in the enterprise space, it might be a good idea to give it a more enterprise-like name. Think how it looks to an outside person to see us database professionals holding lengthy discussions about the ways we vacuum a database. Why should you need to vacuum a database? Is it dirty? In my personal opinion, something like "space reclaiming daemon", "free-list organizer", "tuple recycle job", or "segment coalesce process" would sound more business-like.

Regards,
Nick

----- Original Message -----
From: Bruce Momjian
To: Curtis Faith
Cc: Tom Lane; Ron Johnson; PgSQL Performance ML
Sent: Tuesday, November 26, 2002 9:09 PM
Subject: Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of insert/delete/update

> Good ideas. I think the master solution is to hook the statistics daemon information into an automatic vacuum that could _know_ which tables need attention.
Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of insert/delete/update
Or just "reorg".

On Wednesday, 27 November 2002 at 15:02, Nicolai Tufar wrote:
> In my personal opinion, something like "space reclaiming daemon", "free-list organizer", "tuple recycle job", or "segment coalesce process" would sound more business-like.

--
Dr. Eckhardt
Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of insert/delete/update
-----Original Message-----
From: Nicolai Tufar
Sent: 27 November 2002 14:02
To: PgSQL Performance ML
Subject: Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of insert/delete/update

> In my personal opinion, something like "space reclaiming daemon", "free-list organizer", "tuple recycle job", or "segment coalesce process" would sound more business-like.

As inspired by the SQL Server Enterprise Manager I've just been swearing at: "Database Optimizer".

Regards, Dave.
[PERFORM] [HACKERS] Realtime VACUUM, was: performance of insert/delete/update
Tom Lane wrote:
> Sure, it's just shuffling the housekeeping work from one place to another. The thing that I like about Postgres' approach is that we put the housekeeping in a background task (VACUUM) rather than in the critical path of foreground transaction commit.

Thinking with my marketing hat on, MVCC would be a much bigger win if VACUUM were not required (or were done automagically). The need for periodic VACUUM just gives ammunition to PostgreSQL's opponents, who can claim we are merely deferring work and that it amounts to the same thing. A fully automatic background VACUUM will significantly reduce, but not eliminate, this perceived weakness.

However, it has always seemed to me that there should be some way to reuse the space more dynamically and quickly than a background VACUUM, thereby reducing the percentage of tuples that are expired in heavy-update cases. If only a very tiny number of tuples on disk are expired, the aggregate performance/space penalty of MVCC becomes insignificant for the majority of uses.

Couldn't we reuse tuple and index space as soon as there are no transactions that depend on the old tuple or index values? I have imagined that this was always part of the long-term master plan. Couldn't we keep a list of dead tuples in shared memory and look in that list first when deciding where to place new values for inserts or updates, so we don't have to rely on VACUUM (even a background one)? If there are expired tuple slots in the list, these would be used before allocating a new slot from the tuple heap. The only issue is determining the lowest transaction ID among in-process transactions, which seems relatively easy to do (if it's not already done somewhere). In the normal shutdown and startup case, a tuple VACUUM could be performed automatically. This would normally be very fast, since there would not be many tuples in the list.

Index slots would be handled differently, since these cannot be substituted one for another. However, they could be recovered as part of every index page update. Pages would be scanned before being written, and any expired slots with transaction IDs lower than the lowest active one would be removed. This could be done for non-leaf pages as well, and would only reorganize a page that is already going to be written, thereby not adding much to the overall work. I don't think internal pages that still point to values no longer present in the leaf nodes (because of this partial expired-entry elimination) will cause a problem, since searches and scans will still work fine.

Does VACUUM do something that could not be handled in this realtime manner?

- Curtis
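The slot-reuse scheme described above can be sketched in miniature. This is a hypothetical illustration of the proposal, not PostgreSQL code: `DeadSlot`, `FreeSlotList`, and `oldest_active_xid` are invented names, and the locking and crash-safety a real implementation would need are omitted.

```python
# Sketch of a shared free list of expired tuple slots. A slot becomes
# reusable once the transaction that expired it (its xmax) precedes the
# oldest still-running transaction, so no snapshot can see the old version.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeadSlot:
    page: int    # heap page holding the expired tuple
    offset: int  # line-pointer offset within that page
    xmax: int    # transaction ID that deleted/updated the tuple


class FreeSlotList:
    def __init__(self) -> None:
        self.slots: List[DeadSlot] = []

    def add(self, slot: DeadSlot) -> None:
        """Record a newly expired tuple slot for later reuse."""
        self.slots.append(slot)

    def take_reusable(self, oldest_active_xid: int) -> Optional[DeadSlot]:
        """Return a slot no running transaction can still see, or None."""
        for i, s in enumerate(self.slots):
            if s.xmax < oldest_active_xid:
                return self.slots.pop(i)
        return None  # nothing safely dead yet; caller extends the heap
```

A backend inserting a new row would call `take_reusable(oldest_active_xid)` first and fall back to extending the heap only when it returns `None`, which is exactly the behavior the proposal is after.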
Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of insert/delete/update
Curtis Faith [EMAIL PROTECTED] writes:
> Couldn't we reuse tuple and index space as soon as there are no transactions that depend on the old tuple or index values?
> Couldn't we keep a list of dead tuples in shared memory and look in the list first when deciding where to place new values for inserts or updates so we don't have to rely on VACUUM (even a background one)?

ISTM that either of these ideas would lead to pushing VACUUM overhead into the foreground transactions, which is exactly what we don't want to do.

Keep in mind also that shared memory is finite ... *very* finite. It's bad enough trying to keep per-page status in there (cf. FSM) --- per-tuple status is right out.

I agree that automatic background VACUUMing would go a long way towards reducing operational problems.

			regards, tom lane
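Tom's shared-memory objection is easy to quantify with rough numbers. The figures below are illustrative assumptions (8 kB pages, 100-byte rows, 16 bytes of tracking state per entry, a 10 GB table), not measurements of PostgreSQL:

```python
# Back-of-envelope comparison: per-page bookkeeping (FSM-style) versus
# per-tuple bookkeeping (the proposed shared dead-slot list).
PAGE_SIZE = 8192                 # assumed 8 kB heap pages
ROW_SIZE = 100                   # assumed average row size in bytes
ENTRY_BYTES = 16                 # assumed tracking state per entry
TABLE_BYTES = 10 * 1024**3       # a 10 GB table

pages = TABLE_BYTES // PAGE_SIZE
tuples = TABLE_BYTES // ROW_SIZE

per_page_overhead = pages * ENTRY_BYTES    # per-page state, as in the FSM
per_tuple_overhead = tuples * ENTRY_BYTES  # per-tuple state, as proposed

print(per_page_overhead // 1024**2, "MB of shared memory for per-page state")
print(per_tuple_overhead // 1024**2, "MB of shared memory for per-tuple state")
```

Under these assumptions, per-tuple state costs roughly eighty times as much shared memory as per-page state for the same table, which is the gap behind "per-tuple status is right out".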
Re: [PERFORM] [HACKERS] Realtime VACUUM, was: performance of insert/delete/update
Good ideas. I think the master solution is to hook the statistics daemon information into an automatic vacuum that could _know_ which tables need attention.

Curtis Faith wrote:
> Couldn't we keep a list of dead tuples in shared memory and look in the list first when deciding where to place new values for inserts or updates so we don't have to rely on VACUUM (even a background one)?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
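Bruce's suggestion of wiring the statistics daemon into an automatic vacuum can be sketched as a simple threshold check. Everything here (`TableStats`, `tables_needing_vacuum`, the 20% cutoff) is a hypothetical illustration of the idea, not the actual statistics-collector interface:

```python
# Sketch of stats-driven automatic vacuuming: vacuum only the tables
# whose fraction of dead tuples exceeds a configured threshold.
from dataclasses import dataclass
from typing import List

DEAD_FRACTION = 0.2  # assumed cutoff: vacuum when >20% of tuples are dead


@dataclass
class TableStats:
    name: str
    live_tuples: int
    dead_tuples: int


def tables_needing_vacuum(stats: List[TableStats]) -> List[str]:
    """Return the names of tables whose dead-tuple ratio is too high."""
    due = []
    for t in stats:
        total = t.live_tuples + t.dead_tuples
        if total and t.dead_tuples / total > DEAD_FRACTION:
            due.append(t.name)
    return due
```

A background daemon would periodically pull such counters from the statistics collector and issue VACUUM only for the tables returned, so quiet tables cost nothing; this is essentially what tables "needing attention" means above.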