Re: [HACKERS] Reducing overhead for repeat de-TOASTing
On Mon, 2008-06-16 at 15:35 -0400, Tom Lane wrote:

> Recent discussions with the PostGIS hackers led me to think about ways
> to reduce overhead when the same TOAST value is repeatedly detoasted.
> In the example shown here
> http://archives.postgresql.org/pgsql-hackers/2008-06/msg00384.php
> 90% of the runtime is being consumed by repeated detoastings of a
> single datum. That is certainly an outlier case, but we've heard
> complaints of TOAST being slow before. The solution I'm about to
> propose should fix this, and as a bonus it will reduce the problem of
> memory leakage when a detoasted value doesn't get pfreed. What I am
> imagining is that the tuple toaster will maintain a cache of
> recently-detoasted values...
>
> Comments, better ideas? Anyone think this is too much trouble to take
> for the problem?

You've not covered the idea that we just alter the execution so that we detoast only once. If we tried harder to reduce the number of detoastings, we would benefit all of the cases you mention, including internal decompression. We would use memory, yes, but then so would a cache of recently detoasted values.

If we see that the index scan key is toastable/toasted, then the lowest levels of the plan can create an expanded copy of the tuple and pass that upwards. We may need to do this in a longer-lived context and explicitly free previous tuples to avoid memory bloat, but we'd have the same memory usage and the same memory-freeing issues as with caching. It just seems more direct and more obvious, especially since it is just an internal version of the workaround, which was to create a function to perform early detoasting.

Maybe this could be done inside the IndexScan node when a tuple arrives with toasted value(s) for the scan key attribute(s). I presume there are various reasons why you've ruled that out, but with such a complex proposal it seems worth revisiting the alternatives, even if just to document them for the archives.
-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
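Simon's "detoast once at the scan, pass the expanded copy upwards" idea can be illustrated with a toy model. This is not PostgreSQL source; all names are invented for illustration, and the "detoast" is just a counter standing in for the expensive expansion step.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model (not PostgreSQL source): a "toasted" datum that must be
 * expanded before use.  num_detoastings counts how often we pay the cost. */
static int num_detoastings = 0;

typedef struct ToastedDatum {
    const char *compressed;   /* stand-in for the on-disk TOAST pointer */
} ToastedDatum;

static const char *detoast(ToastedDatum *d) {
    num_detoastings++;        /* pretend this is the expensive part */
    return d->compressed;
}

/* Status quo: every consumer in the plan detoasts for itself. */
static void run_plan_naive(ToastedDatum *d, int consumers) {
    for (int i = 0; i < consumers; i++)
        (void) detoast(d);
}

/* Simon's suggestion, roughly: the lowest level of the plan expands the
 * value once and passes the expanded copy upwards. */
static void run_plan_early_detoast(ToastedDatum *d, int consumers) {
    const char *expanded = detoast(d);   /* done once at the scan */
    for (int i = 0; i < consumers; i++)
        (void) expanded;                 /* consumers reuse the copy */
}
```

The memory trade-off Simon mentions is visible here: the expanded copy must live in a context that outlasts all the consumers, which is exactly the lifetime problem a cache would also have.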
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
Simon Riggs [EMAIL PROTECTED] writes:

> Agreed. Yet I'm thinking that a more coherent approach to optimising
> the tuple memory usage in the executor tree might be better than the
> special cases we seem to have in various places. I don't know what
> that is, or even if it's possible, though.

Yeah. I had tried to think of a way to manage the cached detoasted value as part of the TupleTableSlot in which the toasted datum is (normally) stored, but there seems no way to know which slot that is at the point where pg_detoast_datum is invoked --- and as mentioned earlier, speculatively detoasting things at the point of the slot access seems a loser.

[ thinks a bit more ... ]

But there's always more than one way to skin a cat. Right now, when you fetch a toasted attribute value out of a Slot, what you get is a pointer to a stored-on-disk TOAST pointer, ie

	0x80 or 0x01
	length (18)
	struct varatt_external

Now the Slot knows which attributes are varlena (it has a tupdesc), so it could easily check whether it's about to return one of these. It could instead return a pointer to, say,

	0x80 or 0x01
	length (more than 18)
	struct varatt_external
	pointer to Slot
	pointer to detoasted value, or NULL if not detoasted yet

and that pointer-to-Slot would give us the hook we need to manage the detoasting when and if pg_detoast_datum gets called. Both this struct and the ultimately decompressed value would be auxiliary memory belonging to the Slot, and would go away at slot clear. (This is certain to work since a not-toasted pass-by-ref datum in the tuple would have that same lifetime.)

Come to think of it, if Slots are going to manage detoasted copies of attributes, we could have them auto-detoast inline-compressed Datums at the time of fetch. The argument that this might be wasted work has a lot less force for that case.

I am not sure this is a better scheme than the backend-wide cache, but it's worth thinking about. It would have a lot less management overhead.
On the other hand, it couldn't amortize detoastings across repeated tuple fetches (such as could happen in a join, or successive queries on the same value).

			regards, tom lane
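The wider in-slot indirect pointer Tom describes can be sketched as follows. This is a simplified model, not the real varatt machinery: the struct names, the slot type, and the accounting field are all invented, and a plain `malloc` copy stands in for actual detoasting.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model of Tom's wider in-slot TOAST pointer (invented names; the
 * real varatt_external layout and slot code differ). */
typedef struct varatt_external_sim {
    unsigned rawsize;       /* original data size */
    unsigned extsize;       /* external stored size */
    unsigned valueid;       /* OID of value within the TOAST table */
    unsigned toastrelid;    /* OID of the TOAST table */
} varatt_external_sim;

struct TupleTableSlotSim;

typedef struct SlotIndirectToast {
    varatt_external_sim       ondisk;     /* copy of the on-disk pointer */
    struct TupleTableSlotSim *slot;       /* back-pointer: the management hook */
    char                     *detoasted;  /* NULL if not detoasted yet */
} SlotIndirectToast;

typedef struct TupleTableSlotSim {
    int aux_allocs;   /* auxiliary memory to release at slot clear */
} TupleTableSlotSim;

/* pg_detoast_datum-alike: detoast on first call, reuse afterwards.
 * The detoasted copy belongs to the slot and dies at slot clear. */
static char *detoast_indirect(SlotIndirectToast *p, const char *source) {
    if (p->detoasted == NULL) {
        size_t n = strlen(source) + 1;    /* stand-in for real detoasting */
        p->detoasted = malloc(n);
        memcpy(p->detoasted, source, n);
        p->slot->aux_allocs++;
    }
    return p->detoasted;
}

static void slot_clear(TupleTableSlotSim *slot, SlotIndirectToast *p) {
    free(p->detoasted);
    p->detoasted = NULL;
    slot->aux_allocs--;
}
```

The key property is the one Tom states: the cached copy has exactly the lifetime of the slot's other pass-by-ref data, so no separate invalidation machinery is needed, at the cost of not amortizing across slot refills.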
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
On Jun 16, 2008, at 3:35 PM, Tom Lane wrote:

> ... to a cache entry rather than a freshly palloc'd value. The cache
> lookup key is the toast table OID plus value OID. Now
> pg_detoast_datum() has no ...
> ... the result of decompressing an inline-compressed datum, because
> those have no unique ID that could be used for a lookup key. This puts
> a bit of a ...

Wouldn't the tid fit this? Or table oid + tid?

--
Jeff Trout [EMAIL PROTECTED]
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
Jeff [EMAIL PROTECTED] writes:

> On Jun 16, 2008, at 3:35 PM, Tom Lane wrote:
>> ... the result of decompressing an inline-compressed datum, because
>> those have no unique ID that could be used for a lookup key. This
>> puts a bit of a ...
> Wouldn't the tid fit this? Or table oid + tid?

No. The killer reason why not is that at the time we need to decompress a datum, we don't know what row it came from. There are some other problems too...

			regards, tom lane
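The cache key under discussion — toast table OID plus value OID, per the quoted proposal — can be sketched like this. Names and the linear scan are invented stand-ins (the real patch would use a hash table); the point is that inline-compressed datums simply have no such key, which is the gap Jeff is asking about.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int Oid;

/* Sketch of the proposed cache lookup key: toast table OID plus the
 * OID of the value within it.  (Invented struct names.) */
typedef struct ToastCacheKey {
    Oid toastrelid;   /* OID of the TOAST table */
    Oid valueid;      /* OID of the value within it */
} ToastCacheKey;

typedef struct ToastCacheEntry {
    ToastCacheKey key;
    const char   *detoasted;
} ToastCacheEntry;

/* Trivial linear lookup standing in for a real hash table. */
static ToastCacheEntry *toast_cache_lookup(ToastCacheEntry *entries, int n,
                                           ToastCacheKey key) {
    for (int i = 0; i < n; i++)
        if (entries[i].key.toastrelid == key.toastrelid &&
            entries[i].key.valueid == key.valueid)
            return &entries[i];
    return NULL;   /* miss: caller must detoast and insert */
}
```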
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
> But we can resolve that by ruling that the required lifetime is the
> same as the value would have had if it'd really been palloc'd --- IOW,
> until the memory context that was current at the time gets deleted or
> reset.

Many support functions of GiST/GIN live in a very short-lived memory context - one that exists only for a single call. So, that cache invalidation technique doesn't give any advantage without rearranging this part.

--
Teodor Sigaev
E-mail: [EMAIL PROTECTED]
WWW: http://www.sigaev.ru/
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
Teodor Sigaev [EMAIL PROTECTED] writes:

>> But we can resolve that by ruling that the required lifetime is the
>> same as the value would have had if it'd really been palloc'd ---
>> IOW, until the memory context that was current at the time gets
>> deleted or reset.
> Many support functions of GiST/GIN live in very short memory context -
> only for one call. So, that cache invalidation technique doesn't give
> any advantage without rearranging this part.

Right, but I think I've got that covered. The memory context reset won't actually flush the toast cache entry; it effectively just drops its reference count. We'll only drop cache entries when under memory pressure (or if they're invalidated by toast table updates/deletes).

			regards, tom lane
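The lifetime rule Tom describes here can be modeled in a few lines. This is a hedged sketch with invented names, not the actual patch: a context reset merely drops the entry's reference count, and eviction happens only under memory pressure or after invalidation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the proposed cache entry lifetime (invented names). */
typedef struct ToastCacheEntrySim {
    int  refcount;      /* references from still-live memory contexts */
    bool invalidated;   /* toast table update/delete was seen */
    bool present;       /* entry is still in the cache */
} ToastCacheEntrySim;

/* A memory-context reset only drops the reference count; it does NOT
 * flush the entry, so short-lived GiST/GIN contexts still benefit. */
static void context_reset(ToastCacheEntrySim *e) {
    if (e->refcount > 0)
        e->refcount--;
}

/* Entries are dropped only when unreferenced AND either we are under
 * memory pressure or the underlying value was invalidated. */
static void maybe_evict(ToastCacheEntrySim *e, bool memory_pressure) {
    if (e->refcount == 0 && (memory_pressure || e->invalidated))
        e->present = false;
}
```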
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
>> I definitely think it's worth it, even if it doesn't handle an
>> inline-compressed datum.
> Yeah. I'm not certain how much benefit we could get there anyway. If
> the datum isn't out-of-line then there's a small upper limit on how
> big it can be and hence a small upper limit on how long it takes to
> decompress. It's not clear that a complicated caching scheme would pay
> for itself.

Well, there's a small upper limit per instance, but the aggregate could still be significant if you have a situation like btree scans which are repeatedly detoasting the same datum.

Note that the inline-compressed case includes packed varlenas which are being copied just to get their alignment right. It would be nice to get rid of that palloc/pfree bandwidth. I don't really see a way to do this, though. If we hook into the original datum's mcxt we could use the pointer itself as a key, but if the original datum comes from a buffer that doesn't work.

One thought I had -- which doesn't seem to go anywhere, but I thought was worth mentioning in case you see a way to leverage it that I don't -- is that if the toast key is already in the cache then deform_tuple could substitute the cached value directly instead of waiting for someone to detoast it. That means we could save all the subsequent trips to the toast cache manager. I'm not sure that would give us a convenient way to know when to unpin the toast cache entry, though. It's also possible that some code is aware that deform_tuple doesn't allocate anything currently, and therefore doesn't set the memory context to anything that will live as long as the data it returns.

Incidentally, I'm on vacation and reading this via an awful webmail interface, so I'm likely to miss some interesting stuff for a couple of weeks. I suppose the S/N ratio of the list is likely to move, but I'm not sure in which direction...
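The packed-varlena point above turns on the header format Tom's earlier diagram alludes to: a 1-byte header (tag byte 0x01 or 0x80 depending on endianness) leaves the payload unaligned, so aligned consumers get a fresh 4-byte-header copy. A simplified sketch, assuming the little-endian convention and not the real `VARATT_*` macros:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of varlena header detection (little-endian
 * convention assumed; the real macros in postgres.h differ). */
#define SHORT_HEADER_TAG 0x01

/* A set tag bit in the first byte marks the packed, 1-byte-header form. */
static bool varlena_is_short(const uint8_t *datum) {
    return (datum[0] & SHORT_HEADER_TAG) != 0;
}

/* The payload of a packed varlena starts at offset 1, so it is aligned
 * only by luck; getting guaranteed alignment means palloc'ing a
 * 4-byte-header copy -- the palloc/pfree bandwidth lamented above. */
static size_t payload_offset(const uint8_t *datum) {
    return varlena_is_short(datum) ? 1 : 4;
}
```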
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
* Tom Lane ([EMAIL PROTECTED]) wrote:

> One unsolved problem is that this scheme doesn't provide any way to
> cache the result of decompressing an inline-compressed datum, because
> those have no unique ID that could be used for a lookup key.

That's pretty unfortunate.

> Ideas?

Not at the moment, but given the situation it really does strike me as something we want to solve. Inventing an ID would likely be overkill or wouldn't solve the problem anyway, I'm guessing...

> Comments, better ideas? Anyone think this is too much trouble to take
> for the problem?

I definitely think it's worth it, even if it doesn't handle an inline-compressed datum. PostGIS is certainly a good use case for why, but I doubt it's the only one.

	Thanks!

		Stephen
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
Stephen Frost [EMAIL PROTECTED] writes:

> * Tom Lane ([EMAIL PROTECTED]) wrote:
>> Comments, better ideas? Anyone think this is too much trouble to take
>> for the problem?
> I definitely think it's worth it, even if it doesn't handle an
> inline-compressed datum.

Yeah. I'm not certain how much benefit we could get there anyway. If the datum isn't out-of-line then there's a small upper limit on how big it can be, and hence a small upper limit on how long it takes to decompress. It's not clear that a complicated caching scheme would pay for itself.

The profile shown here:
http://postgis.refractions.net/pipermail/postgis-devel/2008-June/003081.html
shows that the problem the PostGIS guys are looking at is definitely an out-of-line case (in fact, it looks like the datum wasn't even compressed).

			regards, tom lane