Re: [HACKERS] Frequent Update Project: Design Overview of HOT
On Fri, 2006-11-10 at 09:51 +0200, Hannu Krosing wrote:
> > What are the advantages of HOT over SITC (other than cool name) ?
>
> still wondering this, is it just the ability to span multiple pages ?

Multiple page spanning, copy back/VACUUM support, and a separate overflow relation to prevent heap growth were the main ones. There are ideas in HOT from lots of people and places; SITC was just one of those. There wasn't a dust-off between different designs, just an attempt to take the best designs from wherever/whoever they came from, so HOT isn't one person's design. If anybody wants to change the name, we can, to whatever you like.

> Maybe just make HOT an extended SITC which can span pages.

There is a strong willingness to get the design optimal, which will require change.

> > > http://www.postgresql.org/about/donate

That's exactly what HOT is all about

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
On 11/10/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> > Yes. The last bit in the t_infomask is used up to mark presence of overflow
> > tuple header. But I believe there are few more bits that can be reused.
> > There are three bits available in the t_ctid field as well (since ip_posid
> > needs maximum 13 bits).
>
> No, you cannot have those bits --- BLCKSZ is not limited to 8K, and even
> if it were, we will not hold still for sticking flag bits into an
> unrelated datatype.

BLCKSZ is not limited to 8K, but it is limited to 32K because of lp_off and lp_len, which are 15 bits in size. OffsetNumber is limited to (BLCKSZ / sizeof(ItemIdData)). Since sizeof(ItemIdData) is 4 bytes, MaxOffsetNumber is 8192. So we need only 13 bits to represent the MaxOffsetNumber.

Regards, Pavan
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
Tom Lane wrote:
> "Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> > Yes. The last bit in the t_infomask is used up to mark presence of overflow
> > tuple header. But I believe there are few more bits that can be reused.
> > There are three bits available in the t_ctid field as well (since ip_posid
> > needs maximum 13 bits).
>
> No, you cannot have those bits --- BLCKSZ is not limited to 8K, and even
> if it were, we will not hold still for sticking flag bits into an
> unrelated datatype. You can probably fix this by inventing multiple
> context-dependent interpretations of t_infomask bits, but that strikes
> me as a mighty ugly and fragile way to proceed.

Yeah, that seems ugly. I think the best place to grab more bits is the t_natts field. The max value for that is MaxTupleAttributeNumber == 1664, which fits in 11 bits. That leaves 5 bits for other use. We'll have to replace all direct access to that field with an accessor macro, but according to grep there aren't that many files that need to be changed.

> (Actually, the assumption that you can throw an additional back-pointer
> into overflow tuple headers is the worst feature of this proposal in
> that regard --- it's really not that easy to support multiple header
> formats.)

Well, we already have a variable-length null bitmap in the header. It seems quite straightforward to me to add the new field before the null bitmap. It certainly requires some changes, in particular to places that access the null bitmap, but it's not an insurmountable effort. Or am I missing some less obvious consequences?

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
> > > As more UPDATEs take place these tuple chains would grow, making
> > > locating the latest tuple take progressively longer.
>
> > More generally, do we need an overflow table at all, rather
> > than having these overflow tuples living in the same file as
> > the root tuples? As long as there's a bit available to mark
> > a tuple as being this special not-separately-indexed type,
> > you don't need a special location to know what it is.

Yes, I have that same impression.

1. It doubles the IO (original page + hot page), if the new row would have fit into the original page.
2. Locking should be easier if only the original heap page is involved.
3. It makes the overflow pages really hot because all concurrent updates compete for the few overflow pages.
4. Although at first it might seem so, I see no advantage for vacuum with overflow.
5. The size reduction of the heap is imho moot because you trade it for a growing overflow (size reduction only comes from reusing dead tuples and not adding index tuples --> SITC).

Could you try to explain the reasoning behind separate overflow storage? What has been stated so far was not really conclusive to me in this regard, e.g. a different header seems no easier in overflow than in heap.

Andreas
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
On 11/10/06, Heikki Linnakangas <[EMAIL PROTECTED]> wrote:
> Tom Lane wrote:
> > (Actually, the assumption that you can throw an additional back-pointer
> > into overflow tuple headers is the worst feature of this proposal in
> > that regard --- it's really not that easy to support multiple header
> > formats.)
>
> Well, we already have a variable-length null bitmap in the header. It
> seems quite straightforward to me to add the new field before the null
> bitmap. It certainly requires some changes, in particular to places that
> access the null bitmap, but it's not an insurmountable effort. Or am I
> missing some less obvious consequences?

We have added the overflow header (which right now contains a single entry, i.e. the back pointer) on very similar lines to the optional Oid field in the tuple header. A flag (the last free bit in the t_infomask) is used to check if there is an additional overflow header, and if so t_hoff is adjusted appropriately. So in the current prototype, the overflow header is after the null bitmap and before the Oid, if it exists.

Regards, Pavan
Re: [HACKERS] beta3 CFLAGS issue on openbsd
On Friday, 10 November 2006 08:29, Jeremy Drake wrote:
> I figured out that the -g flag was being surreptitiously added to my
> CFLAGS. It was like pulling teeth trying to get the -g flag out. I tried
> --disable-debug to configure, which did not work. I had to do
> CFLAGS=-O2 ./configure ...

Apparently you have some CFLAGS setting in your environment.

-- Peter Eisentraut http://developer.postgresql.org/~petere/
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
On Thu, 2006-11-09 at 18:28 -0500, Tom Lane wrote:
> "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > As more UPDATEs take place these tuple chains would grow, making
> > locating the latest tuple take progressively longer.
>
> This is the part that bothers me --- particularly the random-access
> nature of the search. I wonder whether you couldn't do something
> involving an initial table fill-factor of less than 50%, and having
> the first updated version living on the same heap page as its parent.
> Only when the active chain length was more than one (which you
> hypothesize is rare) would it actually be necessary to do a random
> access into the overflow table.

That's appropriate sometimes, not others, but I'll investigate this further so that it's possible to take advantage of non-zero fillfactors when they exist. There are a number of distinct use-cases here: if you have a very small, heavily updated table it makes a lot of sense to use lower fillfactors as well. If you have a larger table, using fillfactor 50% immediately doubles the size of the table. If the updates are uneven, as they mostly are because of the Benford distribution/Pareto principle, then it has been found that leaving space on a block doesn't help the heavily updated portions of a table, whereas it hinders the lightly updated portions. TPC-C and TPC-B both have uniformly distributed UPDATEs, so it's easy to use the fillfactor to great advantage there.

> More generally, do we need an overflow table at all, rather than having
> these overflow tuples living in the same file as the root tuples? As
> long as there's a bit available to mark a tuple as being this special
> not-separately-indexed type, you don't need a special location to know
> what it is. This might break down in the presence of seqscans though.

HOT currently attempts to place a subsequent UPDATE on the same page of the overflow relation, but this doesn't happen (yet) for placing multiple versions on the same page. IMHO it could, but I will think about it.

> > This allows the length of a typical tuple chain to be extremely short in
> > practice. For a single connection issuing a stream of UPDATEs the chain
> > length will be no more than 1 at any time.
>
> Only if there are no other transactions being held open, which makes
> this claim a lot weaker.

True, but Nikhil has run tests that clearly show HOT outperforming the current situation in the case of long-running transactions. The need to optimise HeapTupleSatisfiesVacuum() and avoid long chains does still remain a difficulty for both HOT and the current situation.

> > HOT can only work in cases where a tuple does not modify one of the
> > columns defined in an index on the table, and when we do not alter the
> > row length of the tuple.
>
> Seems like "altering the row length" isn't the issue, it's just "is
> there room on the page for the new version". Again, a generous
> fillfactor would give you more flexibility.

The copy-back operation can only work if the tuple fits in the same space as the root tuple. If it doesn't, you end up with a tuple permanently in the overflow relation. That might not worry us, I guess. Also, my understanding was that an overwrite operation could not vary the length of a tuple (at least according to code comments).

> > [We'll be able to do that more efficiently when
> > we have plan invalidation]
>
> Uh, what's that got to do with it?

Currently the HOT code dynamically tests to see if the index columns have been touched. If we had plan invalidation, that would be able to be assessed more easily at planning time, in cases where there is no BEFORE trigger.

-- Simon Riggs EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
Simon Riggs wrote:
> On Thu, 2006-11-09 at 18:28 -0500, Tom Lane wrote:
> > "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > > HOT can only work in cases where a tuple does not modify one of the
> > > columns defined in an index on the table, and when we do not alter the
> > > row length of the tuple.
> >
> > Seems like "altering the row length" isn't the issue, it's just "is
> > there room on the page for the new version". Again, a generous
> > fillfactor would give you more flexibility.
>
> The copy-back operation can only work if the tuple fits in the same
> space as the root tuple. If it doesn't you end up with a tuple
> permanently in the overflow relation. That might not worry us, I guess.
> Also, my understanding was that an overwrite operation could not vary
> the length of a tuple (at least according to code comments).

You can't If someone else has the page pinned,

> > > [We'll be able to do that more efficiently when
> > > we have plan invalidation]
> >
> > Uh, what's that got to do with it?
>
> Currently the HOT code dynamically tests to see if the index columns
> have been touched. If we had plan invalidation that would be able to be
> assessed more easily at planning time, in cases where there is no
> BEFORE trigger.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
Oops, pressed send too early. Ignore the one-line reply I just sent...

Simon Riggs wrote:
> On Thu, 2006-11-09 at 18:28 -0500, Tom Lane wrote:
> > "Simon Riggs" <[EMAIL PROTECTED]> writes:
> > > HOT can only work in cases where a tuple does not modify one of the
> > > columns defined in an index on the table, and when we do not alter the
> > > row length of the tuple.
> >
> > Seems like "altering the row length" isn't the issue, it's just "is
> > there room on the page for the new version". Again, a generous
> > fillfactor would give you more flexibility.
>
> The copy-back operation can only work if the tuple fits in the same
> space as the root tuple. If it doesn't you end up with a tuple
> permanently in the overflow relation. That might not worry us, I guess.

You can't move tuples around in a page without holding a vacuum lock on the page. Some other backend might have the page pinned and have a pointer to a tuple on the page that would become bogus if the tuple is moved. Maybe you could try to get a vacuum lock when doing the update and pre-reserve space for the new version if you can get one.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
"Zeugswetter Andreas ADI SD" <[EMAIL PROTECTED]> writes:
> 1. It doubles the IO (original page + hot page), if the new row would
> have fit into the original page.

That's an awfully big IF there. Even if you use a fillfactor of 50%, in which case you're paying a 100% performance penalty *all* the time (not just when dealing with a table that's been bloated by multiple versions), you still have no guarantee the extra versions will fit on the same page.

> 4. although at first it might seem so I see no advantage for vacuum with
> overflow

The main problem with vacuum now is that it must scan the entire table (and the entire index) even if only a few records are garbage. If we isolate the garbage in a separate area then vacuum doesn't have to scan unrelated tuples. I'm not sure this really solves that problem, because there are still DELETEs to consider, but it does remove one factor that exacerbates it unnecessarily. I think the vision is that the overflow table would never be very large because it can be vacuumed very aggressively. It has only tuples that are busy and will need vacuuming as soon as a transaction ends, unlike the main table, which is mostly tuples that don't need vacuuming.

> 5. the size reduction of heap is imho moot because you trade it for a
> growing overflow
> (size reduction only comes from reusing dead tuples and not
> adding index tuples --> SITC)

I think you're comparing the wrong thing. Size isn't a problem in itself; size is a problem because it causes extra I/O. So a heap that's double the necessary size takes twice as long as necessary to scan. The fact that the overflow tables are taking up space isn't interesting if they don't have to be scanned. Hitting the overflow tables should be quite rare; it only comes into play when looking at concurrently updated tuples. It certainly happens, but most tuples in the table will be committed and not being concurrently updated by anyone else.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
"Simon Riggs" <[EMAIL PROTECTED]> writes:
> > Seems like "altering the row length" isn't the issue, it's just "is
> > there room on the page for the new version". Again, a generous
> > fillfactor would give you more flexibility.
>
> The copy-back operation can only work if the tuple fits in the same
> space as the root tuple. If it doesn't you end up with a tuple
> permanently in the overflow relation. That might not worry us, I guess.

I think he's suggesting that you can put the new version in the available space rather than use the space from the existing tuple. You can keep the same line pointer, so index entries still refer to the correct tuple. The only problem I see is that if you determine that there's space available when you do the update, that space may have disappeared by the time you come along to do the move-back. Perhaps you can do something clever with reserving the space at that time for the later move-back, but I fear that'll complicate vacuum and open up risks if the system crashes in that state.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
Hi,

> > > This allows the length of a typical tuple chain to be extremely short in
> > > practice. For a single connection issuing a stream of UPDATEs the chain
> > > length will be no more than 1 at any time.
> >
> > Only if there are no other transactions being held open, which makes
> > this claim a lot weaker.
>
> True, but Nikhil has run tests that clearly show HOT outperforming the
> current situation in the case of long running transactions. The need to
> optimise HeapTupleSatisfiesVacuum() and avoid long chains does still
> remain a difficulty for both HOT and the current situation.

Yes, I carried out some pgbench runs comparing our current HOT update patch with PG82BETA2 sources for the long running transaction case. For an apples-to-apples comparison, we got roughly 170% improvement with the HOT update patch over BETA2. In the case of BETA2, since all versions are in the main heap, we end up doing multiple index scans for them. In the case of HOT updates, we have a single index entry, with the chains getting traversed from the overflow relation. So, as Simon has mentioned, the need to avoid long chains remains a difficulty in both situations.

Regards, Nikhils

-- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
Hi,

> I think the vision is that the overflow table would never be very large
> because it can be vacuumed very aggressively. It has only tuples that
> are busy and will need vacuuming as soon as a transaction ends. Unlike
> the main table which is mostly tuples that don't need vacuuming.

That's right. VACUUM, if it gets a chance to work on the overflow relation, seems to be doing a decent job in our runs. If autovacuum/vacuum gets to run optimally, the FSM information generated for the overflow relations will be able to serve a lot of new-tuple requests, avoiding undue/large bloat in the overflow relations.

Regards, Nikhils

-- EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
On Fri, 2006-11-10 at 12:32 +0100, Zeugswetter Andreas ADI SD wrote:
> > > As more UPDATEs take place these tuple chains would grow, making
> > > locating the latest tuple take progressively longer.
>
> > More generally, do we need an overflow table at all, rather
> > than having these overflow tuples living in the same file as
> > the root tuples? As long as there's a bit available to mark
> > a tuple as being this special not-separately-indexed type,
> > you don't need a special location to know what it is.
>
> Yes, I have that same impression.

I think this might work; I'll think about it. If you can come up with a test case where we would need this optimisation, then I'm sure that can be prototyped. In many cases though an overflow relation is desirable, so ISTM that we might want to be able to do both: have an updated tuple with the not-indexed bit set in the heap if on the same page, else go to the overflow relation.

> 1. It doubles the IO (original page + hot page), if the new row would
> have fit into the original page.

Yes, but that's a big if, and once the page is full that optimisation goes away. Some thoughts: first, it presumes that the second block is not in cache (see later). Second, my observation is that this can happen for some part of the time, but under heavy updates this window of opportunity is not the common case, and so this optimisation would not make much difference. However, in some cases it would be appropriate, so I'll investigate. HOT optimises the case where a tuple in the overflow relation is UPDATEd, so it can place the subsequent tuple version on the same page. So if you perform 100 UPDATEs on a single tuple, the first will write to the overflow relation in a new block, then the further 99 will attempt to write to the same block, if they can. So in many cases we would do only 101 block accesses and no real I/O.

> 2. locking should be easier if only the original heap page is involved.

Yes, but multi-page update already happens now, so HOT is not different on that point.

> 3. It makes the overflow pages really hot because all concurrent updates
> compete for the few overflow pages.

Which ensures they are in shared_buffers, rather than causing I/O. The way FSM works, it will cause concurrent updaters to spread out their writes to many blocks. So in the case of a single high-frequency updater, all of the updates go into the same block of the overflow relation, so the optimisation you referred to in (1) does take effect strongly in that case, yet without causing contention with other updaters. The FSM doesn't change with HOT, but the effects of having inserted additional tuples into the main heap are much harder to undo afterwards. The overflow relation varies in size according to the number of updates, not the actual number of tuples, as does the main heap, so VACUUMing will focus on the hotspots and be more efficient, especially since no indexes need be scanned. [Sure, we can use heapblock-needs-vacuum bitmaps, but there will still be a mix of updated/not-updated tuples in there, so VACUUM would still be less efficient than with HOT.] So VACUUM can operate more frequently on the overflow relation and keep the size reasonable for more of the time, avoiding I/O. Contention is and will remain a HOT topic ;-) I understand your concerns and we should continue to monitor this on the various performance tests that will be run.

> 4. although at first it might seem so I see no advantage for vacuum with
> overflow

No need to VACUUM the indexes, which is the most expensive part. The more indexes you have, the more VACUUM costs; not so with HOT.

> 5. the size reduction of heap is imho moot because you trade it for a
> growing overflow
> (size reduction only comes from reusing dead tuples and not
> adding index tuples --> SITC)

HOT doesn't claim to reduce the size of the heap. In the presence of a long-running transaction, SizeOfHOT(heap + overflow) = SizeOfCurrent(Heap). VACUUM is still required in both cases to manage total heap size. If we have solely UPDATEs and no DELETEs, then only the overflow relation need be VACUUMed.

> Could you try to explain the reasoning behind separate overflow storage?

I think the answers above cover the main points, which seem to make the case clearly enough from a design rationale perspective, even without the performance test results to confirm them.

> What has been stated so far was not really conclusive to me in this
> regard.

Personally, I understand. I argued against them for about a month after I first heard of the idea, but they make sense to me now. HOT has evolved considerably from the various ideas of each of the original idea team (Jonah, Bruce, Jan, myself) and will continue to do so as better ideas replace poor ones, based on performance tests. All of the ideas within it need to be strongly challenged to ensure we arrive at the best solution.

> e.g. a different header seems no easier in overflow than in heap
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
On 11/10/06, Simon Riggs <[EMAIL PROTECTED]> wrote:
> On Fri, 2006-11-10 at 12:32 +0100, Zeugswetter Andreas ADI SD wrote:
> > e.g. a different header seems no easier in overflow than in heap
>
> True. The idea there is that we can turn frequent update on/off fairly
> easily for normal tables since there are no tuple format changes in the
> main heap. It also allows HOT to avoid wasting space when a table is
> heavily updated in certain places only.

In general though, it would make implementation a bit simpler when tuples with different headers are isolated in a different relation.

Regards, Pavan
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
> > I think the vision is that the overflow table would never be very
> > large because it can be vacuumed very aggressively. It has only tuples
> > that are busy and will need vacuuming as soon as a transaction ends.
> > Unlike the main table which is mostly tuples that don't need
> > vacuuming.

Except when deleted :-)

> That's right. vacuum if it gets a chance to work on the
> overflow relation seems to be doing a decent job in our runs.
> If autovacuum/vacuum gets to run optimally, the FSM
> information generated for the overflow relations will be able
> to serve a lot of new tuple requests avoiding undue/large
> bloat in the overflow relations.

It seems like we would want to create a chain into overflow for deleted rows also (header + all cols null), so we can vacuum those too only by looking at overflow, at least optionally. I think the overflow would really need to solve deletes too, or the bitmap idea is more generally useful to vacuum. Generally, for a clear distinction, I think we are talking about two things here:

1. reduce index bloat and maintenance work
2. allow vacuum a cheaper focus on what needs to be done

Andreas
[HACKERS] how & from where to start & admin pgsql on red hat ES 3
Hi Group! I am trying to work and connect PHP with PGSQL on Red Hat ES 3. I found some important paths like: /usr/share/pgsql/contrib /usr/include/pgsql /usr/bin But I don't know how to start/stop the pgsql server, insert data from the pgsql client, view the database files, or administer the database and configuration file. Please help me. Thanks in advance
Re: [HACKERS] how & from where to start & admin pgsql on red hat
jatrojoomla wrote:
> Hi Group! I am trying to work and connect PHP with PGSQL on Red Hat ES 3.
> I found some important paths like: /usr/share/pgsql/contrib
> /usr/include/pgsql /usr/bin But I don't know how to start/stop the pgsql
> server, insert data from the pgsql client, view the database files, or
> administer the database and configuration file.

This is absolutely the wrong forum in which to ask questions like this - it is only for discussion relating to development of postgres, not to usage questions. Please ask on pgsql-general.

cheers

andrew
[HACKERS] Protocol specs
Folks, Does anyone know where I can find information about the PG communication protocol specifications between backend and frontend? Regards, Gevik.
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
On 11/10/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> > On 11/10/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> > > (2) Isn't this full of race conditions?
> >
> > I agree, there could be race conditions. But IMO we can handle those.
>
> Doubtless you can prevent races by introducing a bunch of additional
> locking. The question was really directed to how much concurrent
> performance is left, once you get done locking it down.

I understand your point, and I can clearly see a chance to improve upon the current locking implementation in the prototype, even though we are seeing a good performance boost for 50 clients and a scaling factor of 50 with the pgbench runs mentioned by Nikhil.

Regards, Pavan
Re: [HACKERS] Protocol specs
Gevik Babakhani wrote:
> Folks, Does anyone know where I can find information about the PG
> communication protocol specifications between backend and frontend?

Yeah, there's a chapter about that in the manual: http://www.postgresql.org/docs/8.1/interactive/protocol.html

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
> > 1. It doubles the IO (original page + hot page), if the new row would
> > have fit into the original page.
>
> That's an awfully big IF there. Even if you use a fillfactor
> of 50% in which case you're paying a 100% performance penalty

I don't see where the 50% comes from? That's only needed if you update all rows on the page, and that in a timeframe that does not allow reuse of other outdated tuples.

> > 4. although at first it might seem so I see no advantage for vacuum
> > with overflow
>
> The main problem with vacuum now is that it must scan the
> entire table (and the entire index) even if only a few
> records are garbage. If we isolate the garbage in a separate
> area then vacuum doesn't have to scan unrelated tuples.
>
> I'm not sure this really solves that problem because there
> are still DELETEs to consider but it does remove one factor
> that exacerbates it unnecessarily.

Yes, so you still need to vacuum the large table regularly.

> I think the vision is that the overflow table would never be
> very large because it can be vacuumed very aggressively. It
> has only tuples that are busy and will need vacuuming as soon
> as a transaction ends. Unlike the main table which is mostly
> tuples that don't need vacuuming.

Ok, but you have to provide an extra vacuum that does only that then (and it randomly touches heap pages, and only does partial work there).

> > 5. the size reduction of heap is imho moot because you trade it for a
> > growing overflow
> > (size reduction only comes from reusing dead tuples and not
> > adding index tuples --> SITC)
>
> I think you're comparing the wrong thing.

I mean, unless you individually vacuum the overflow more frequently.

> Size isn't a problem in itself, size is a problem because it causes
> extra i/o.

Yes, and I state that at all possible occasions :-) On-disk size is a problem, really.

> So a heap that's double in size necessary takes twice as
> long as necessary to scan. The fact that the overflow tables
> are taking up space isn't interesting if they don't have to
> be scanned.

The overflow does have to be read for each seq scan. And it was stated that it would be accessed with random access (follow tuple chain). But maybe we can read the overflow the same as if it were an additional segment file?

> Hitting the overflow tables should be quite rare, it only
> comes into play when looking at concurrently updated tuples.
> It certainly happens but most tuples in the table will be
> committed and not being concurrently updated by anyone else.

The first update moves the row to overflow; only the 2nd next might be able to pull it back. So on average you would have at least 66% of all updated rows after the last vacuum in the overflow. The problem with needing very frequent vacuums is that you might not be able to do any work because of long transactions.

Andreas
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
> > 2. locking should be easier if only the original heap page is
> > involved.
>
> Yes, but multi-page update already happens now, so HOT is not
> different on that point.

I was thinking about the case when you "pull back" a tuple, which seems
to be more difficult than what we have now.

Andreas

PS: I think it is great that you are doing all this work and explaining
it for us. Thanks.
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
> > True, but Nikhil has run tests that clearly show HOT outperforming
> > the current situation in the case of long-running transactions. The
> > need to optimise HeapTupleSatisfiesVacuum() and avoid long chains
> > does still remain a difficulty for both HOT and the current
> > situation.
>
> Yes, I carried out some pgbench runs comparing our current
> HOT update patch with PG82BETA2 sources for the long running
> transaction case. For an apples to apples comparison we got

Vacuums every 5 minutes, or no vacuums?

> roughly 170% improvement with the HOT update patch over BETA2.

Wow, that must be smaller indexes and generally less index maintenance.
What this also suggests, imho, is that following tuple chains is not as
expensive as maintaining indexes (at least in a heavy-update scenario
like pgbench).

Maybe we should try a version where the only difference from now is
that, when the index keys stay the same, the indexes are not updated
and the tuple chain is followed instead when selecting with an index.
(Maybe, like the current alive flag, the index pointer could even be
refreshed to the oldest visible tuple by readers.)

Andreas
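Andreas's suggestion, keeping the index entry pointed at the root version and letting readers walk the update chain, can be sketched roughly as follows. The tuple layout and the visibility test here are simplified stand-ins, not the real HeapTupleSatisfiesMVCC logic:

```python
# Simplified sketch: an index scan fetches the root tuple and follows
# t_ctid-style links until it finds a version visible to the snapshot.
# "Visible" below is a crude stand-in for real MVCC visibility rules:
# inserted at or before the snapshot, and not yet deleted as of it.

def index_fetch(heap, root_tid, snapshot_xid):
    tid = root_tid
    while tid is not None:
        tup = heap[tid]
        inserted = tup["xmin"] <= snapshot_xid
        not_deleted = tup["xmax"] is None or tup["xmax"] > snapshot_xid
        if inserted and not_deleted:
            return tup["data"]
        tid = tup["next"]  # link to the newer version in the chain
    return None

# Tiny example chain: root version (1,1), updated by xid 200 -> (9,1).
chain = {
    (1, 1): {"xmin": 100, "xmax": 200, "next": (9, 1), "data": "v1"},
    (9, 1): {"xmin": 200, "xmax": None, "next": None, "data": "v2"},
}
```

An old snapshot (xid 150) sees "v1"; a newer one (xid 250) walks past the dead root version and sees "v2" — which is the cost Andreas is weighing against index maintenance.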
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
On Fri, 2006-11-10 at 17:00 +0100, Zeugswetter Andreas ADI SD wrote:
> > > 2. locking should be easier if only the original heap page is
> > > involved.
> >
> > Yes, but multi-page update already happens now, so HOT is not
> > different on that point.
>
> I was thinking about the case when you "pull back" a tuple, which
> seems to be more difficult than what we have now.

Well, it sounds like it would be really bad; at least that's what I
thought at first. We tried it anyway and performance shows it ain't
that bad.

> PS: I think it is great that you are doing all this work and
> explaining it for us. Thanks.

Thanks for your feedback; I'm certain we'll be able to improve on where
HOT is now with some objective thinking.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
On Fri, 2006-11-10 at 12:19, Simon Riggs wrote:
> On Thu, 2006-11-09 at 18:28 -0500, Tom Lane wrote:
> > > HOT can only work in cases where a tuple does not modify one of
> > > the columns defined in an index on the table, and when we do not
> > > alter the row length of the tuple.
> >
> > Seems like "altering the row length" isn't the issue, it's just "is
> > there room on the page for the new version". Again, a generous
> > fillfactor would give you more flexibility.
>
> The copy-back operation can only work if the tuple fits in the same
> space as the root tuple. If it doesn't, you end up with a tuple
> permanently in the overflow relation. That might not worry us, I
> guess.
>
> Also, my understanding was that an overwrite operation could not vary
> the length of a tuple (at least according to code comments).

But you can change the t_ctid pointer in the heap tuple to point to the
oldest visible tuple in the overflow.

What I did not understand very well is how you determine which overflow
tuples are safe to remove. Do you mark/remove them when doing the
copy-back?

--
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
On Fri, 2006-11-10 at 20:38 +0200, Hannu Krosing wrote:
> But you can change the t_ctid pointer in the heap tuple to point to
> the oldest visible tuple in the overflow.
>
> What I did not understand very well is how you determine which
> overflow tuples are safe to remove. Do you mark/remove them when
> doing the copy-back?

Yes, the dead chain is marked as we walk it to the next visible tuple
prior to copy-back. That's done with an exclusive lock held on the root
block. Pavan can answer those details better than me, so I'll leave
that to him, but the copy-back operation needs to be closely examined.

We hope to post the code by Tuesday. It's being cleaned now to remove
the remains of previous prototypes to allow it to be reviewed without
confusion.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
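A rough sketch of the copy-back step as Simon describes it: walk the dead portion of the chain, copy the first live version into the root slot, then mark the dead members so the overflow vacuum can reclaim them. All structures here are invented for illustration; the real operation runs under an exclusive lock on the root block:

```python
def copy_back(heap, root_tid, oldest_running_xid):
    """Mark dead chain members, then copy the first live version into root."""
    dead = []
    tid = heap[root_tid]["next"]
    while tid is not None:
        tup = heap[tid]
        if tup["xmax"] is not None and tup["xmax"] < oldest_running_xid:
            dead.append(tid)            # no snapshot can still see this one
            tid = tup["next"]
        else:
            break
    if tid is not None:
        heap[root_tid]["data"] = heap[tid]["data"]  # overwrite root in place
        heap[root_tid]["next"] = None
    for d in dead:
        heap[d]["dead"] = True          # reclaimable by the overflow vacuum
    return dead

# Example: root -> o1 (dead, xmax 8) -> o2 (live); oldest running xid = 10.
chain = {
    "root": {"data": "v1", "next": "o1"},
    "o1":   {"data": "v2", "next": "o2", "xmax": 8},
    "o2":   {"data": "v3", "next": None, "xmax": None},
}
removed = copy_back(chain, "root", oldest_running_xid=10)
```

This mirrors the ordering in the email: the dead chain is identified while walking to the next visible tuple, and only then is the visible version copied back.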
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
Hi,

On 11/10/06, Pavan Deolasee <[EMAIL PROTECTED]> wrote:
> On 11/10/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> > "Pavan Deolasee" <[EMAIL PROTECTED]> writes:
> > > > (2) Isn't this full of race conditions?
> > > I agree, there could be race conditions. But IMO we can handle
> > > those.
> >
> > Doubtless you can prevent races by introducing a bunch of
> > additional locking. The question was really directed to how much
> > concurrent performance is left, once you get done locking it down.
>
> I understand your point, and I can clearly see a chance to improve
> upon the current locking implementation in the prototype, even though
> we are seeing a good performance boost for 50 clients and 50 scaling
> factor with pgbench runs, as mentioned by Nikhil.
>
> Regards,
> Pavan

Yes, we have done a number of runs with and without autovacuum with
parameters like 50 clients, 50 scaling factor and 25000 transactions
per client. 50 clients should introduce a decent amount of concurrency.
The tps values observed with the HOT update patch (850 tps) were
approximately 200+% better than with the PG82 sources (270 tps). Runs
with 25 clients, 25 scaling factor and 25000 transactions produce
similar percentage increases with the HOT update patch.

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
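As a quick arithmetic check of the figures quoted: 850 tps against 270 tps works out to roughly a 215% improvement, which is consistent with the "200+%" claim:

```python
def pct_improvement(new_tps, old_tps):
    """Percentage improvement of new_tps over old_tps."""
    return (new_tps - old_tps) / old_tps * 100.0

# 850 tps vs 270 tps -> about 215%, i.e. the "200+%" quoted above.
```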
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
Hi,

On 11/10/06, Zeugswetter Andreas ADI SD <[EMAIL PROTECTED]> wrote:
> > > True, but Nikhil has run tests that clearly show HOT
> > > outperforming the current situation in the case of long-running
> > > transactions. The need to optimise HeapTupleSatisfiesVacuum() and
> > > avoid long chains does still remain a difficulty for both HOT and
> > > the current situation.
> >
> > Yes, I carried out some pgbench runs comparing our current
> > HOT update patch with PG82BETA2 sources for the long running
> > transaction case. For an apples to apples comparison we got
>
> Vacuums every 5 minutes, or no vacuums?

We tried both. Vacuum seems to do little to help in the long-running
transaction case. Generally, in most of the pgbench runs that we
carried out, autovacuum did not seem to be of much help even to PG82.

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
On Fri, 2006-11-10 at 17:04 +0100, Zeugswetter Andreas ADI SD wrote:
> > Yes, I carried out some pgbench runs comparing our current
> > HOT update patch with PG82BETA2 sources for the long running
> > transaction case. For an apples to apples comparison we got
>
> Vacuums every 5 minutes, or no vacuums?

Same either way, since vacuum would not remove any tuples.

> > roughly 170% improvement with the HOT update patch over BETA2.
>
> Wow, must be smaller indexes and generally less index maintenance.
> What this also suggests, imho, is that following tuple chains is not
> as expensive as maintaining indexes (at least in a heavy-update
> scenario like pgbench).
>
> Maybe we should try a version where the only difference from now is
> that, when the index keys stay the same, the indexes are not updated
> and the tuple chain is followed instead when selecting with an index.
> (Maybe, like the current alive flag, the index pointer could even be
> refreshed to the oldest visible tuple by readers.)

I think that is SITC, nearly. It's also exactly what HOT does, with the
exception that the updated tuple goes to a block in the overflow,
rather than a block later in the heap.

The overflow relation isn't critical to the HOT mechanism, but it does
allow two optimisations: not requiring all tuple headers to be modified
with the back pointer, and improving the locality of VACUUM. We might
do that with some other structuring, but if it works for TOAST, I
figure it's OK for HOT also.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
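The applicability condition restated in this subthread — HOT only applies when no indexed column changes and the new version fits in the available space — can be written as a simple predicate. The names and the free-space check are illustrative only, not the actual heapam test:

```python
def can_hot_update(old_row, new_row, indexed_cols, free_space, new_tuple_len):
    """HOT-style update is possible only if no indexed column changes
    and the new version fits in the target block's free space."""
    keys_unchanged = all(old_row[c] == new_row[c] for c in indexed_cols)
    return keys_unchanged and new_tuple_len <= free_space
```

For example, updating a non-indexed column qualifies, while changing the indexed `id` column (or running out of page space) forces a regular, index-maintaining update.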
Re: [HACKERS] Frequent Update Project: Design Overview of HOT Updates
On Fri, 2006-11-10 at 16:46 +0100, Zeugswetter Andreas ADI SD wrote:
> > I'm not sure this really solves that problem because there
> > are still DELETEs to consider, but it does remove one factor
> > that exacerbates it unnecessarily.
>
> Yea, so you still need to vacuum the large table regularly.

HOT covers the use-case of heavy updating, which in many common cases
occurs on tables with few inserts/deletes. HOT would significantly
reduce the need to vacuum, since deletes and wraparound issues would be
the only remaining reasons to do so. [I have some ideas for how to
optimize tables with heavy INSERT/DELETE activity, but that case is
much less prevalent than heavy UPDATEs.]

> > I think the vision is that the overflow table would never be
> > very large because it can be vacuumed very aggressively. It
> > has only tuples that are busy and will need vacuuming as soon
> > as a transaction ends. Unlike the main table, which is mostly
> > tuples that don't need vacuuming.
>
> Ok, but then you have to provide an extra vacuum that does only that
> (and it randomly touches heap pages, and only does partial work
> there).

Sure, HOT needs a specially optimised VACUUM.

> > So a heap that's double the necessary size takes twice as
> > long as necessary to scan. The fact that the overflow tables
> > are taking up space isn't interesting if they don't have to
> > be scanned.
>
> The overflow does have to be read for each seq scan. And it was
> stated that it would be accessed with random access (following the
> tuple chain). But maybe we can read the overflow as if it were an
> additional segment file?

Not without taking a write-avoiding lock on the table, unfortunately.

> > Hitting the overflow tables should be quite rare, it only
> > comes into play when looking at concurrently updated tuples.
> > It certainly happens, but most tuples in the table will be
> > committed and not being concurrently updated by anyone else.
> The first update moves the row to overflow; only the second-next
> update might be able to pull it back. So on average you would have at
> least 66% of all rows updated since the last vacuum in the overflow.
>
> The problem with needing very frequent vacuums is that you might not
> be able to do any work because of long transactions.

HOT doesn't need more frequent VACUUMs; it is just more efficient and
so can allow them, when needed, to avoid I/O. Space usage in the
overflow relation is at its worst in the case of an enormous table with
low-volume random updates, but note that it is *never* worse than
current space usage. In the best case, which is actually fairly common
in practice (a small number of rows of a large table being updated by a
steady stream of concurrent updates), we find the overflow relation
needs only a few hundred tuples, so regular vacuuming will be both easy
and effective.

As an aside, note that HOT works best in real-world situations, not
benchmarks such as TPC where the I/Os are deliberately randomised to
test the scalability of the RDBMS. But even then, HOT works better.

The long-running transaction issue remains unsolved in this proposal,
but I have some ideas for later.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
Re: [HACKERS] PostgreSQL 8.2 (from CVS devel) first impressions
[EMAIL PROTECTED] writes:
> On Sun, Nov 05, 2006 at 09:11:07AM +0000, Simon Riggs wrote:
> > If that hit you then we're gonna get a few more people trip the
> > same way. Do you have any suggestions as to how to avoid that
> > experience for others?
>
> I believe the release notes are inadequate. I've read them three
> times before and it never stood out for me.

I put a note about this into "Migration to 8.2".

regards, tom lane
Re: [HACKERS] PostgreSQL 8.2 (from CVS devel) first impressions
[EMAIL PROTECTED] writes:
> On Sun, Nov 05, 2006 at 11:01:40AM -0500, Neil Conway wrote:
> > Presumably those are just the standard warnings we can't easily
> > eliminate. If not, can you post them please?
>
> They all appear harmless.

The reason those "uninitialized variable" warnings got away from us is
that gcc doesn't emit them at -O2 or below, so most of us never saw 'em
before. I've cleaned them up.

The "find_rule" gripe is really a flex bug :-( ... not easy to avoid.

regards, tom lane
Re: [PATCHES] [HACKERS] Indicate disabled triggers in \d
On Tue, 2006-11-07 at 16:21 +1100, Brendan Jurd wrote:
> Minor fix to the previous patch; result7 was not being cleared at the
> end of the block.

The patch still leaks result7 circa line 1400 (CVS HEAD). I didn't look
closely, but you probably also leak result7 circa line 1209, if result6
is NULL.

(Yeah, we definitely need to refactor describeOneTableDetails().)

-Neil
Re: [HACKERS] PostgreSQL 8.2 (from CVS devel) first impressions
On Fri, Nov 10, 2006 at 08:17:09PM -0500, Tom Lane wrote:
> [EMAIL PROTECTED] writes:
> > On Sun, Nov 05, 2006 at 11:01:40AM -0500, Neil Conway wrote:
> > > Presumably those are just the standard warnings we can't easily
> > > eliminate. If not, can you post them please?
> >
> > They all appear harmless.
>
> The reason those "uninitialized variable" warnings got away from us
> is that gcc doesn't emit them at -O2 or below, so most of us never
> saw 'em before. I've cleaned them up.

Cool. Thanks. I like my compiles warning-free. :-)

> The "find_rule" gripe is really a flex bug :-( ... not easy to avoid.

*nod*

Cheers,
mark

--
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED]
Ottawa, Ontario, Canada
http://mark.mielke.cc/
Re: [PATCHES] [HACKERS] Indicate disabled triggers in \d
On 11/11/06, Neil Conway <[EMAIL PROTECTED]> wrote:
> The patch still leaks result7 circa line 1400 (CVS HEAD). I didn't
> look closely, but you probably also leak result7 circa line 1209, if
> result6 is NULL.

New version of the patch attached (against CVS HEAD) that fixes these
two issues.

> (Yeah, we definitely need to refactor describeOneTableDetails().)

I'd be interested in doing some work on this. What did you have in
mind?

BJ

[Attachment: describe_disabled_trigs.diff]
[HACKERS] Failure on tapir / only 10 max connections?
Tapir appears to be failing because make check wants more than 10
connections for testing. What I don't understand is why it's being
limited to 10. initdb -d doesn't help either...

...
selecting default max_connections ... 10
selecting default shared_buffers/max_fsm_pages ... 32MB/204800
creating configuration files ... ok
...

AFAIK, my shared memory settings should be high enough to support more
than 10 connections...

kern.ipc.shmall: 524288
kern.ipc.shmseg: 128
kern.ipc.shmmni: 192
kern.ipc.shmmin: 1
kern.ipc.shmmax: 1073741824

Is 10 just the new default?

BTW, does make check log its initdb output anywhere? It'd be handy if
it did...

--
Jim Nasby [EMAIL PROTECTED]
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Re: [HACKERS] Failure on tapir / only 10 max connections?
"Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> Tapir appears to be failing because make check wants more than 10
> connections for testing. What I don't understand is why it's being
> limited to 10.

Your SysV IPC limits are too small --- apparently it's not so much
shared memory as semaphores that are the problem. See the
FreeBSD-specific notes in
http://developer.postgresql.org/pgdocs/postgres/kernel-resources.html

regards, tom lane
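As an illustration of why initdb settled on 10: it probes a descending list of candidate max_connections values and keeps the first one the kernel's SysV limits can satisfy. The candidate list below matches initdb's probing order, but the semaphore cost model is a deliberate simplification (PostgreSQL needs roughly one semaphore per connection, allocated in sets of 16, so both SEMMNI and SEMMNS matter):

```python
def pick_max_connections(semmni, sems_per_set=16,
                         candidates=(100, 50, 40, 30, 20, 10)):
    """Return the first candidate whose semaphore sets fit within SEMMNI.
    Simplified model: ceil(n / sems_per_set) sets, plus one spare set."""
    for n in candidates:
        sets_needed = -(-n // sems_per_set) + 1   # ceil division, plus one
        if sets_needed <= semmni:
            return n
    return candidates[-1]
```

With a generous SEMMNI the probe stops at 100 right away; with a tight limit it falls all the way down to 10, which matches the behaviour Jim observed on tapir.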