[GENERAL] reserving space in a rec for future update
Hi: I have a situation where I will be inserting thousands of records into a table but leaving 2 of it's columns null. Later on, I will be updating most of those records and putting real values in place of those 2 nulls. As for the ones that do not get updated, I want to leave them null. My concern has to do with record fragmentation at the time of update because there's no room to expand them to accept the non-null data. (BTW, the columns are floating point). Is there a way to initially insert nulls, but reserve space for the future update (and avoid record fragmentation)? Is my record fragmentation concern unfounded? Thanks in Advance !
Re: [GENERAL] reserving space in a rec for future update
On Nov 14, 2007 9:28 AM, Gauthier, Dave [EMAIL PROTECTED] wrote: Hi: I have a situation where I will be inserting thousands of records into a table but leaving 2 of it's columns null. Later on, I will be updating most of those records and putting real values in place of those 2 nulls. As for the ones that do not get updated, I want to leave them null. My concern has to do with record fragmentation at the time of update because there's no room to expand them to accept the non-null data. (BTW, the columns are floating point). I don't think you really understand how PostgreSQL storage works. every update is the exact same as a delete / insert in terms of storage. So, you're worrying about a problem that doesn't exist. Read up Read up on it here: http://www.postgresql.org/docs/8.2/static/mvcc.html ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] reserving space in a rec for future update
On Wed, Nov 14, 2007 at 10:28:30AM -0500, Gauthier, Dave wrote: null. My concern has to do with record fragmentation at the time of update because there's no room to expand them to accept the non-null data. (BTW, the columns are floating point). You have a mistaken idea about how this works. Is there a way to initially insert nulls, but reserve space for the future update (and avoid record fragmentation)? No. Is my record fragmentation concern unfounded? Sort of. The way this will work in Postgres is that, when you UPDATE the row, the old row will be marked dead, and a _new_ row will be written out with the new data. You will need to perform VACUUM in order to keep the table from bloating. You'll want to read the manual carefully about this topic, in order to keep your table from getting so bloated that your free space map becomes useless. One of the weakest areas for PostgreSQL is its behaviour under this sort of most rows updated scenario, and it is wise to plan carefully how you will accomplish these sorts of activities without causing yourself extreme pain. A -- Andrew Sullivan Old sigs will return after re-constitution of blue smoke ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] reserving space in a rec for future update
In this usage scenario, doesn't the new HOT (heap only tuples) feature of PG8.3 help, in terms of the DB requiring less VACUUM maintenance? I am similarly performing a huge number of inserts, followed by a huge number of updates to fill in a few null fields. The data is indexed by insert time. My problem is, selects using that index degrade over time as updates are performed, presumably because data is no longer ordered sequentially across pages after updates are performed. I was hoping that HOT would help here and am actually installing PG8.3 now in order to perform some testing... Mike Andrew Sullivan wrote: On Wed, Nov 14, 2007 at 10:28:30AM -0500, Gauthier, Dave wrote: null. My concern has to do with record fragmentation at the time of update because there's no room to expand them to accept the non-null data. (BTW, the columns are floating point). You have a mistaken idea about how this works. Is there a way to initially insert nulls, but reserve space for the future update (and avoid record fragmentation)? No. Is my record fragmentation concern unfounded? Sort of. The way this will work in Postgres is that, when you UPDATE the row, the old row will be marked dead, and a _new_ row will be written out with the new data. You will need to perform VACUUM in order to keep the table from bloating. You'll want to read the manual carefully about this topic, in order to keep your table from getting so bloated that your free space map becomes useless. One of the weakest areas for PostgreSQL is its behaviour under this sort of most rows updated scenario, and it is wise to plan carefully how you will accomplish these sorts of activities without causing yourself extreme pain. A ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [GENERAL] reserving space in a rec for future update
On Nov 14, 2007, at 10:44 AM, Mike Charnoky wrote: In this usage scenario, doesn't the new HOT (heap only tuples) feature of PG8.3 help, in terms of the DB requiring less VACUUM maintenance? I am similarly performing a huge number of inserts, followed by a huge number of updates to fill in a few null fields. The data is indexed by insert time. My problem is, selects using that index degrade over time as updates are performed, presumably because data is no longer ordered sequentially across pages after updates are performed. I was hoping that HOT would help here and am actually installing PG8.3 now in order to perform some testing... Mike Some, what HOT does is keeps index rows from being updated when updates are made to column values that aren't indexed. The same insert/delete still happens in the table data. Erik Jones Software Developer | Emma® [EMAIL PROTECTED] 800.595.4401 or 615.292.5888 615.292.0777 (fax) Emma helps organizations everywhere communicate market in style. Visit us online at http://www.myemma.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] reserving space in a rec for future update
On Wed, Nov 14, 2007 at 11:31:11AM -0500, Gauthier, Dave wrote: Thanks for the advanced warning about problems with vaccuum ! Note this isn't a _problem_ with vacuum, exactly, it's just the set of compromises that PostgreSQL has settled on. There are other ways of cleaning up the system (defrag, of the sort you were implying is one, pay the cost during transaction is another one, c.). The piper has to be paid, and all we're doing is arguing about what currency we'll use :) A -- Andrew Sullivan Old sigs will return after re-constitution of blue smoke ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [GENERAL] reserving space in a rec for future update
On Wed, Nov 14, 2007 at 11:44:55AM -0500, Mike Charnoky wrote: In this usage scenario, doesn't the new HOT (heap only tuples) feature of PG8.3 help, in terms of the DB requiring less VACUUM maintenance? It should, yes. We'll probably know more once 8.3 is in the field. For new work, though, I would certainly suggest trying this on 8.3. I know that's the _point_ of the feature. But if you've already got an application you need to field today, doing it on a beta is risky. A -- Andrew Sullivan Old sigs will return after re-constitution of blue smoke ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [GENERAL] reserving space in a rec for future update
OK, I didn't know Postgres did it this way. I was hoping it would retain the old rec and update in place (if the updated values could fit). I guess not. I can rewrite the DB loading algorithm to get those values in advance, load into program memory, and reference at the time of the initial load. Thanks for the advanced warning about problems with vaccuum ! -dave -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Sullivan Sent: Wednesday, November 14, 2007 11:13 AM To: pgsql-general@postgresql.org Subject: Re: [GENERAL] reserving space in a rec for future update On Wed, Nov 14, 2007 at 10:28:30AM -0500, Gauthier, Dave wrote: null. My concern has to do with record fragmentation at the time of update because there's no room to expand them to accept the non-null data. (BTW, the columns are floating point). You have a mistaken idea about how this works. Is there a way to initially insert nulls, but reserve space for the future update (and avoid record fragmentation)? No. Is my record fragmentation concern unfounded? Sort of. The way this will work in Postgres is that, when you UPDATE the row, the old row will be marked dead, and a _new_ row will be written out with the new data. You will need to perform VACUUM in order to keep the table from bloating. You'll want to read the manual carefully about this topic, in order to keep your table from getting so bloated that your free space map becomes useless. One of the weakest areas for PostgreSQL is its behaviour under this sort of most rows updated scenario, and it is wise to plan carefully how you will accomplish these sorts of activities without causing yourself extreme pain. A -- Andrew Sullivan Old sigs will return after re-constitution of blue smoke ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org/
Re: [GENERAL] reserving space in a rec for future update
Erik Jones wrote: On Nov 14, 2007, at 10:44 AM, Mike Charnoky wrote: In this usage scenario, doesn't the new HOT (heap only tuples) feature of PG8.3 help, in terms of the DB requiring less VACUUM maintenance? I am similarly performing a huge number of inserts, followed by a huge number of updates to fill in a few null fields. The data is indexed by insert time. My problem is, selects using that index degrade over time as updates are performed, presumably because data is no longer ordered sequentially across pages after updates are performed. I was hoping that HOT would help here and am actually installing PG8.3 now in order to perform some testing... Some, what HOT does is keeps index rows from being updated when updates are made to column values that aren't indexed. The same insert/delete still happens in the table data. But another thing HOT does is make it possible to vacuum the *page* that the update is taking place on. So if there are dead tuples that nobody needs, they can be removed and the new tuple can be placed there. -- Alvaro Herrerahttp://www.advogato.org/person/alvherre Use it up, wear it out, make it do, or do without ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings