Re: [HACKERS] Another idea for dealing with cmin/cmax
On Oct 3, 2006, at 2:23 PM, Gregory Stark wrote: If the space set aside for these transaction ids is full when you're inserting i suppose you could just go back to the FSM for another page. But I don't see any way out when you're deleting. You have to mark xmax one way or another and if there's no space left in the footer and you only have 4 bits in the tuple what are you going to do? As an aside doing vacuum freeze more aggressively might reduce the pressure on these ITL slots. But I don't see any way to guarantee a slot is available for xmax when deleting. We would need some sort of scheme where the space for transaction ids is able to grow but we're already growing from both ends of the page. We would either have to interleave transaction ids with line pointers or store them on another special page somewhere. Well, worst-case you could just re-do the whole page if you need to expand the list of transaction slots; I don't think that's a huge deal. What did have me baffled was how to deal with xmax though, since (as you mentioned), you can end up in a situation where you can't delete a tuple because there's no more room on the page for another xmax. But I just thought of a way around that which might be better than a separate store for transaction info: allow for moving a tuple off the current page by placing a link to it's new location, similar to how ctid works. We probably wouldn't want to try and cram that into the item list, but I think we should be able to create a special version of a tuple header (AddressForwardingHeader) that simply states the tuple has moved to this new ctid; go there. Of course, anytime you have to follow that link you're going to pay a penalty, but I think this should only be needed when trying to delete a tuple on a page that's basically full. Theoretically, there shouldn't be too many people trying to hit that deleted tuple, but to further reduce the number of people hitting it, we could include the visibility info (or a pointer to it) in the AddressForwardingHeader. -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Another idea for dealing with cmin/cmax
ITAGAKI Takahiro [EMAIL PROTECTED] writes: ITL-like approach is more efficient than per-tuple XIDs unless all tuples in a page are locked at the same time. However, MAXTRANS and PCTFREE issues may bother us. I'm not sure how Oracle gets away with MAXTRANS. Somehow it seems to never arise as a practical problem yet it seems like there must be instances where it would cause a serious problems. It's worse for Postgres since as I understand it Oracle only needs to track transaction ids that are actively locking the record. Postgres needs to track transaction ids for the inserting and deleting transaction even when there's no lock (or no lock remaining). I can imagine having a scheme where there's a short list of transaction ids in the footer and then each tuple just stores an index into that list. If you had space for 16 transaction ids set aside then you could store xmin and xmax in a single byte for example. If the space set aside for these transaction ids is full when you're inserting i suppose you could just go back to the FSM for another page. But I don't see any way out when you're deleting. You have to mark xmax one way or another and if there's no space left in the footer and you only have 4 bits in the tuple what are you going to do? As an aside doing vacuum freeze more aggressively might reduce the pressure on these ITL slots. But I don't see any way to guarantee a slot is available for xmax when deleting. We would need some sort of scheme where the space for transaction ids is able to grow but we're already growing from both ends of the page. We would either have to interleave transaction ids with line pointers or store them on another special page somewhere. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Another idea for dealing with cmin/cmax
Ühel kenal päeval, E, 2006-10-02 kell 01:30, kirjutas Tom Lane: Jim C. Nasby [EMAIL PROTECTED] writes: ... place a limit on the number of transactions that can be live in a table at once. Urk, well maybe, but ... you could shrink all the visibility info to 1 byte if you wanted to. ... 256 of 'em is surely not an acceptable limit. I have been thinking about this, and it seems that especially for OLAP loads it would be much better to keep tuple visibility info in a separate file, lets call it Tuple Visibility Map (TVM) TVM would have the following benefits: 1) TVM could be uses for index-only lookups as well as heap-only lookups, also other index lookups could be filtered against it fast before going to heap. 2) TVM could be heavily compressed, especially for bulk loads something like a single (xmin, xmax,cmin,cmax) tuple plus RLE-encoded list of pointers to it will do. 3) In case TVM space is needed in in any page, it would be easy to just throw away cmin/cmax from tuples from committed/aborted transactions. 4) First pass of VACUUM would be much faster, as it has to scan only TVM. Pages with no expired tuples would not need to be touched. If we can come up with a good design for TVM, it may also be an overall win for many kinds of OLTP queries, as it may result in less writes to disk and almost the same amount of writing to WAL. Maybe bitmap or btree index would be something to use as a starting point when designing TVM. Another idea to consider would be to merge FSM and TVM and then use them also for keeping data in cluster order. -- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Another idea for dealing with cmin/cmax
Tom Lane [EMAIL PROTECTED] writes: Jim C. Nasby [EMAIL PROTECTED] writes: ... place a limit on the number of transactions that can be live in a table at once. Urk, well maybe, but ... you could shrink all the visibility info to 1 byte if you wanted to. ... 256 of 'em is surely not an acceptable limit. The plan Gavin Sherry and I were discussing at the Code Sprint was to store a single most common xmin xmin in the per-page special area. Then have a bit on each tuple indicating that the xmin isn't present in the tuple and instead to use the xmin from the page footer. Any tuples with other values of xmin would have to store that xmin normally. The use case here is primarily tables loaded in large batch jobs that have large swaths of the table, possibly the entire table, loaded in the same transaction. My thinking was that most common xmin could be very approximate. In fact my my plan was to just store the first xmin the page sees there. This catches the common use cases of pg_restore or COPY loading many records and even catches most cases of large inserts into existing tables whenever they extend the table. A further refinement could be to have vacuum look for the actual most common xmin and rewrite the page using it. If there's enough free space it may be worthwhile storing InvalidTransactionId and allowing the next insert to set the most common xmin. However I'm not convinced of the importance of this refinement. The thing to remember is that shaving bits off tuples is not a goal in itself. It's a means to an end, namely packing more tuples on a page. If we shave off 4 bytes off every tuple when we're loading thousands of tuples then we get to put more of them on a page. If we save space on tuples when we're running vacuum that just gives us more free space in the free space map and we only see a benefit down the line if we end up eventually filling up that space. -- greg ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Another idea for dealing with cmin/cmax
Tom Lane [EMAIL PROTECTED] wrote: Jim C. Nasby [EMAIL PROTECTED] writes: Dumb question... wouldn't getting down to 20 bytes buy us something? BTW, the apparently useless byte after the 27- or 23-byte header actually has some good use: in a table of up to 8 columns, you can fit a null bitmap there for free. In a scheme that took us down to 20 rather than 19 bytes, even a narrow table would pay the full maxalign price for having a null. We may use free bytes for other purposes. For example, if we increase the size of XIDs from 4 to 6 bytes, we can get rid of transaction wraparound problem. There may be some other uses. I'm in favor of combining cmin/cmax/xvac to get us down to 23 bytes, but I think anything beyond that is going to face a serious problem of greatly increased cost for diminishing returns. If we really want to reduce the size of headers to 16 bytes, we maybe need to do with HeapTupleHeaderData.t_ctid . Under the assumption that tuples are offen updated in the same page, we only need offsets in the page to link an old tuple to new one. We can reduce the size of t_ctid from 6 to 2 bytes. When tuples are updated to other pages, we probably need phantom ctid. In another assumption that tuples are offen updated after frozen, we can overlap ctid and xmin to a physical field using union. But phantom xids are needed to update unfrozen tuples. However, I think both assumptions have less probability than the one assumed when we introduce phantom cids. The above ideas probably do not work well. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Another idea for dealing with cmin/cmax
Jim C. Nasby [EMAIL PROTECTED] writes: ... place a limit on the number of transactions that can be live in a table at once. Urk, well maybe, but ... you could shrink all the visibility info to 1 byte if you wanted to. ... 256 of 'em is surely not an acceptable limit. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [HACKERS] Another idea for dealing with cmin/cmax
ITAGAKI Takahiro wrote: However, I think our next goal to shrink the headers is 16 bytes. The headers become 23 bytes using phantom cids and we are limited by alignments, so we will have no more advantages unless we delete extra 7 bytes in the headers. ...and it seems to be very difficult. Yeah, I thought about that too earlier. If we get rid of VACUUM FULL, or replace it with something that doesn't need xvac, and keep cmin and cmax in backend-private storage, we could get rid of the overlayed t_field4, which is 4 bytes. Then we're down to 19 bytes. We could get rid of t_hoff, because we should always be able to calculate the header size. Then we're down to 18 bytes. There's currently 15 bits in use in the infomask. After we remove the HEAP_MOVED_* fields that we don't need without VACUUM FULL, that's down to 13 bits. t_natts only needs 11 bits, because MaxHeapAttributeNumber is 1600. We could move 5 of the bits in infomask to the high 5 bits of t_natts, and save one byte. We're now down to 17 bytes. That's as far as I got. So it seems we could shave off some bytes, but we still can't get down to 16. And the changes needed in total would be quite massive. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Another idea for dealing with cmin/cmax
On Fri, Sep 29, 2006 at 09:35:31AM +0100, Heikki Linnakangas wrote: We could get rid of t_hoff, because we should always be able to calculate the header size. Then we're down to 18 bytes. Without t_hoff, how do you know the size of the null bitmap? You could probably do it only if you assume the null bitmap, if present, always covers all the columns... Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate. signature.asc Description: Digital signature
Re: [HACKERS] Another idea for dealing with cmin/cmax
Martijn van Oosterhout wrote: On Fri, Sep 29, 2006 at 09:35:31AM +0100, Heikki Linnakangas wrote: We could get rid of t_hoff, because we should always be able to calculate the header size. Then we're down to 18 bytes. Without t_hoff, how do you know the size of the null bitmap? You could probably do it only if you assume the null bitmap, if present, always covers all the columns... I think we assume that already. heap_form_tuple reserves space for the bitmap like this: if (hasnull) len += BITMAPLEN(numberOfAttributes); -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Another idea for dealing with cmin/cmax
On Fri, Sep 29, 2006 at 10:59:13AM +0100, Heikki Linnakangas wrote: Martijn van Oosterhout wrote: On Fri, Sep 29, 2006 at 09:35:31AM +0100, Heikki Linnakangas wrote: We could get rid of t_hoff, because we should always be able to calculate the header size. Then we're down to 18 bytes. Without t_hoff, how do you know the size of the null bitmap? You could probably do it only if you assume the null bitmap, if present, always covers all the columns... I think we assume that already. heap_form_tuple reserves space for the bitmap like this: if (hasnull) len += BITMAPLEN(numberOfAttributes); Ok, now we do an ALTER TABLE blah ADD COLUMN ..., and we have to expand the bitmaps for the entire table? Have a nice day, -- Martijn van Oosterhout kleptog@svana.org http://svana.org/kleptog/ From each according to his ability. To each according to his ability to litigate. signature.asc Description: Digital signature
Re: [HACKERS] Another idea for dealing with cmin/cmax
Martijn van Oosterhout wrote: On Fri, Sep 29, 2006 at 10:59:13AM +0100, Heikki Linnakangas wrote: Martijn van Oosterhout wrote: On Fri, Sep 29, 2006 at 09:35:31AM +0100, Heikki Linnakangas wrote: We could get rid of t_hoff, because we should always be able to calculate the header size. Then we're down to 18 bytes. Without t_hoff, how do you know the size of the null bitmap? You could probably do it only if you assume the null bitmap, if present, always covers all the columns... I think we assume that already. heap_form_tuple reserves space for the bitmap like this: if (hasnull) len += BITMAPLEN(numberOfAttributes); Ok, now we do an ALTER TABLE blah ADD COLUMN ..., and we have to expand the bitmaps for the entire table? No, you'd still have the the number of attributes (t_natts) in the header. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Another idea for dealing with cmin/cmax
On Fri, Sep 29, 2006 at 01:15:06PM +0900, ITAGAKI Takahiro wrote: Jim C. Nasby [EMAIL PROTECTED] wrote: The reason I thought of this is because once the transaction commits, we have no use for the cid info. So we could do something like have bgwriter look for tuples that belong to committed transactions before it writes a page, and strip the cid out of them. Your concept is just like as the experimental method that I suggested before in http://archives.postgresql.org/pgsql-hackers/2005-08/msg01193.php We can remove cmin and cmax from commited tuples and xmin from frozen tuples and we might save some bytes in tuple headers. However, I think our next goal to shrink the headers is 16 bytes. The headers become 23 bytes using phantom cids and we are limited by alignments, so we will have no more advantages unless we delete extra 7 bytes in the headers. ...and it seems to be very difficult. Dumb question... wouldn't getting down to 20 bytes buy us something? -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] Another idea for dealing with cmin/cmax
Jim C. Nasby [EMAIL PROTECTED] writes: Dumb question... wouldn't getting down to 20 bytes buy us something? Only on 32-bit machines, which are getting less interesting as database servers every day. (Just last night I was reading somebody opining that the transition to 64-bit hardware would be effectively complete by 2008 ... and he was talking about desktop PCs, not serious iron.) BTW, the apparently useless byte after the 27- or 23-byte header actually has some good use: in a table of up to 8 columns, you can fit a null bitmap there for free. In a scheme that took us down to 20 rather than 19 bytes, even a narrow table would pay the full maxalign price for having a null. I'm in favor of combining cmin/cmax/xvac to get us down to 23 bytes, but I think anything beyond that is going to face a serious problem of greatly increased cost for diminishing returns. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] Another idea for dealing with cmin/cmax
Jim C. Nasby wrote: In addition to/instead of abstracting cmin/cmax to a phantom ID, what about allowing for two versions of the tuple header, one with cid info and one without? That would allow for cid info to be stripped out when pages were written to disk. How exactly would that help? You can't just strip out cid info when writing to disk, if you don't want to lose the information. And it's certainly a lot more complicated than the phantom id thing. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Another idea for dealing with cmin/cmax
On Thu, Sep 28, 2006 at 05:13:11PM +0100, Heikki Linnakangas wrote: Jim C. Nasby wrote: In addition to/instead of abstracting cmin/cmax to a phantom ID, what about allowing for two versions of the tuple header, one with cid info and one without? That would allow for cid info to be stripped out when pages were written to disk. How exactly would that help? You can't just strip out cid info when writing to disk, if you don't want to lose the information. Erm, sorry, brainfart... yeah, we'd need to be able to write the info out to disk. The reason I thought of this is because once the transaction commits, we have no use for the cid info. So we could do something like have bgwriter look for tuples that belong to committed transactions before it writes a page, and strip the cid out of them. The problem with *that* is that (AFAIK) you'll need cid info again once you go to update or delete that tuple. And that might obviously need to spill to disk before the transaction commits. Back to the drawing board... -- Jim Nasby[EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] Another idea for dealing with cmin/cmax
Jim C. Nasby [EMAIL PROTECTED] wrote: The reason I thought of this is because once the transaction commits, we have no use for the cid info. So we could do something like have bgwriter look for tuples that belong to committed transactions before it writes a page, and strip the cid out of them. Your concept is just like as the experimental method that I suggested before in http://archives.postgresql.org/pgsql-hackers/2005-08/msg01193.php We can remove cmin and cmax from commited tuples and xmin from frozen tuples and we might save some bytes in tuple headers. However, I think our next goal to shrink the headers is 16 bytes. The headers become 23 bytes using phantom cids and we are limited by alignments, so we will have no more advantages unless we delete extra 7 bytes in the headers. ...and it seems to be very difficult. Regards, --- ITAGAKI Takahiro NTT Open Source Software Center ---(end of broadcast)--- TIP 6: explain analyze is your friend