Re: [HACKERS] Reducing overhead for repeat de-TOASTing
Simon Riggs <[EMAIL PROTECTED]> writes:
> Agreed. Yet I'm thinking that a more coherent approach to optimising the
> tuple memory usage in the executor tree might be better than the special
> cases we seem to have in various places. I don't know what that is, or
> even if its possible though.

Yeah.  I had tried to think of a way to manage the cached detoasted value
as part of the TupleTableSlot in which the toasted datum is (normally)
stored, but there seems no way to know which slot that is at the point
where pg_detoast_datum is invoked --- and as mentioned earlier,
speculatively detoasting things at the point of the slot access seems
a loser.

[ thinks a bit more ... ]

But there's always more than one way to skin a cat.  Right now, when you
fetch a toasted attribute value out of a Slot, what you get is a pointer
to a stored-on-disk TOAST pointer, ie

	0x80 or 0x01
	length (18)
	struct varatt_external

Now the Slot knows which attributes are varlena (it has a tupdesc) so it
could easily check whether it's about to return one of these.  It could
instead return a pointer to, say

	0x80 or 0x01
	length (more than 18)
	struct varatt_external
	pointer to Slot
	pointer to detoasted value, or NULL if not detoasted yet

and that pointer-to-Slot would give us the hook we need to manage the
detoasting when and if pg_detoast_datum gets called.  Both this struct
and the ultimately decompressed value would be auxiliary memory belonging
to the Slot, and would go away at slot clear.  (This is certain to work
since a not-toasted pass-by-ref datum in the tuple would have that same
lifetime.)

Come to think of it, if Slots are going to manage detoasted copies of
attributes, we could have them auto-detoast inline-compressed Datums at
the time of fetch.  The argument that this might be wasted work has a lot
less force for that case.

I am not sure this is a better scheme than the backend-wide cache, but
it's worth thinking about.  It would have a lot less management overhead.
On the other hand it couldn't amortize detoastings across repeated tuple
fetches (such as could happen in a join, or successive queries on the
same value).

			regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
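[Editor's note: the slot-owned indirect pointer Tom sketches above can be illustrated with a toy stand-in. None of this is PostgreSQL source --- the struct names mimic the real ones but the fields, the malloc-based "detoasting", and the slot layout are all invented for illustration; the point is only the lifetime rule: the wrapper carries a back-pointer to its Slot, detoasts lazily on first access, and the cached copy dies at slot clear.]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the on-disk TOAST pointer (fields are illustrative). */
typedef struct varatt_external {
    unsigned rawsize;      /* size of the detoasted value */
    unsigned extsize;      /* size as stored externally */
    unsigned valueid;      /* value OID in the toast table */
    unsigned toastrelid;   /* toast table OID */
} varatt_external;

struct TupleTableSlot;

/* The "more than 18 bytes" wrapper the Slot would hand out instead. */
typedef struct varatt_indirect {
    varatt_external toast_ptr;       /* copy of the stored TOAST pointer */
    struct TupleTableSlot *slot;     /* owning slot, for lifetime management */
    char *detoasted;                 /* cached copy, NULL until first access */
} varatt_indirect;

typedef struct TupleTableSlot {
    varatt_indirect aux;             /* auxiliary memory owned by the slot */
} TupleTableSlot;

/* pg_detoast_datum analogue: detoast on first call, reuse thereafter. */
static char *detoast(varatt_indirect *ind)
{
    if (ind->detoasted == NULL) {
        /* pretend to fetch and decompress; the result belongs to the slot */
        ind->detoasted = malloc(ind->toast_ptr.rawsize);
        memset(ind->detoasted, 'x', ind->toast_ptr.rawsize);
    }
    return ind->detoasted;           /* repeat calls do no work */
}

/* Slot clear releases the cached value along with other slot memory,
 * the same lifetime a not-toasted pass-by-ref datum would have had. */
static void slot_clear(TupleTableSlot *slot)
{
    free(slot->aux.detoasted);
    slot->aux.detoasted = NULL;
}
```

As the message notes, this scheme is simple precisely because the Slot already owns the memory: there is no separate cache to size, invalidate, or evict, but also nothing survives past the slot, so repeated fetches of the same tuple pay the detoast cost again.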
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
On Wed, 2008-06-18 at 09:45 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > You've not covered the idea that we just alter the execution so we just
> > detoast once.
>
> That's because I already considered and rejected that idea.  There's
> no very good place to do it.  See thread on postgis-devel:
>
> http://postgis.refractions.net/pipermail/postgis-devel/2008-June/003091.html
>
> Aside from the problems mentioned there, there's the issue that a lower
> plan level doesn't have any way to know whether the value will be needed
> at all.  We could look for references to the Var but it's entirely
> possible that the Var is being passed to some function that doesn't
> require a fully detoasted result.  It wouldn't do for this
> "optimization" to disable the slice-fetch feature...

Agreed. Yet I'm thinking that a more coherent approach to optimising the
tuple memory usage in the executor tree might be better than the special
cases we seem to have in various places. I don't know what that is, or
even if its possible though.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
Simon Riggs <[EMAIL PROTECTED]> writes:
> You've not covered the idea that we just alter the execution so we just
> detoast once.

That's because I already considered and rejected that idea.  There's
no very good place to do it.  See thread on postgis-devel:

http://postgis.refractions.net/pipermail/postgis-devel/2008-June/003091.html

Aside from the problems mentioned there, there's the issue that a lower
plan level doesn't have any way to know whether the value will be needed
at all.  We could look for references to the Var but it's entirely
possible that the Var is being passed to some function that doesn't
require a fully detoasted result.  It wouldn't do for this
"optimization" to disable the slice-fetch feature...

			regards, tom lane
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
On Mon, 2008-06-16 at 15:35 -0400, Tom Lane wrote:
> Recent discussions with the PostGIS hackers led me to think about ways
> to reduce overhead when the same TOAST value is repeatedly detoasted.
> In the example shown here
> http://archives.postgresql.org/pgsql-hackers/2008-06/msg00384.php
> 90% of the runtime is being consumed by repeated detoastings of a single
> datum.  That is certainly an outlier case, but we've heard complaints of
> TOAST being slow before.  The solution I'm about to propose should fix
> this, and as a bonus it will reduce the problem of memory leakage when
> a detoasted value doesn't get pfreed.
>
> What I am imagining is that the tuple toaster will maintain a cache of
> recently-detoasted values...
>
> Comments, better ideas?  Anyone think this is too much trouble to take
> for the problem?

You've not covered the idea that we just alter the execution so we just
detoast once.

If we tried harder to reduce the number of detoastings then we would
benefit all of the cases you mention, including internal decompression.
We would use memory, yes, but then so would a cache of recently
detoasted values.

If we see that the index scan key is toastable/ed then the lowest levels
of the plan can create an expanded copy of the tuple and pass that
upwards. We may need to do this in a longer lived context and explicitly
free previous tuples to avoid memory bloat, but we'd have the same
memory usage and same memory freeing issues as with caching. It just
seems more direct and more obvious, especially since it is just an
internal version of the workaround, which was to create a function to
perform early detoasting.

Maybe this could be done inside the IndexScan node when a tuple arrives
with toasted value(s) for the scan key attribute(s).

I presume there's various reasons why you've ruled that out, but with
such a complex proposal it seems worth revisiting the alternatives, even
if just to document them for the archives.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
> > I definitely think it's worth it, even if it doesn't handle an
> > inline-compressed datum.
>
> Yeah.  I'm not certain how much benefit we could get there anyway.
> If the datum isn't out-of-line then there's a small upper limit on how
> big it can be and hence a small upper limit on how long it takes to
> decompress.  It's not clear that a complicated caching scheme would
> pay for itself.

Well, there's a small upper limit per-instance, but the aggregate could
still be significant if you have a situation like btree scans which are
repeatedly detoasting the same datum.

Note that the "inline compressed" case includes packed varlenas which
are being copied just to get their alignment right. It would be nice to
get rid of that palloc/pfree bandwidth. I don't really see a way to do
this though. If we hook into the original datum's mcxt we could use the
pointer itself as a key. But if the original datum comes from a buffer
that doesn't work.

One thought I had -- which doesn't seem to go anywhere, but I thought
was worth mentioning in case you see a way to leverage it that I
don't -- is that if the toast key is already in the cache then
deform_tuple could substitute the cached value directly instead of
waiting for someone to detoast it. That means we can save all the
subsequent trips to the toast cache manager. I'm not sure that would
give us a convenient way to know when to unpin the toast cache entry
though. It's possible that some code is aware that deform_tuple doesn't
allocate anything currently and therefore doesn't set the memory context
to anything that will live as long as the data it returns.

Incidentally, I'm on vacation and reading this via an awful webmail
interface, so I'm likely to miss some interesting stuff for a couple of
weeks. I suppose the S/N ratio of the list is likely to move but I'm not
sure which direction...
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
Teodor Sigaev <[EMAIL PROTECTED]> writes:
>> But we can resolve that by ruling that the required lifetime is the same
>> as the value would have had if it'd really been palloc'd --- IOW, until
>> the memory context that was current at the time gets deleted or reset.

> Many support functions of GiST/GIN live in very short memory context -
> only for one call. So, that cache invalidation technique doesn't give
> any advantage without rearranging this part.

Right, but I think I've got that covered.  The memory context reset won't
actually flush the toast cache entry, it effectively just drops its
reference count.  We'll only drop cache entries when under memory
pressure (or if they're invalidated by toast table updates/deletes).

			regards, tom lane
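[Editor's note: the lifetime rule Tom describes --- a context reset merely unpins the cache entry, and eviction happens separately under memory pressure or invalidation --- can be sketched with a toy refcount. This is not PostgreSQL source; the names and the standalone structure are invented for illustration.]

```c
#include <assert.h>
#include <stdlib.h>

/* One cached detoasted value.  A real implementation would live in a
 * hash table; here a single entry is enough to show the lifetime rule. */
typedef struct ToastCacheEntry {
    unsigned valueid;   /* lookup key component (toast value OID) */
    int      refcount;  /* one pin per memory context holding a reference */
    char    *value;     /* the detoasted bytes */
    int      evicted;
} ToastCacheEntry;

/* pg_detoast_datum analogue: hand out the cached value and pin it. */
static ToastCacheEntry *cache_fetch(ToastCacheEntry *e)
{
    e->refcount++;
    return e;
}

/* Called from the memory-context reset/delete hook: just unpin.
 * Crucially, this does NOT free the value, so a GiST/GIN support
 * function's one-call context can reset without losing the cache. */
static void context_reset_hook(ToastCacheEntry *e)
{
    assert(e->refcount > 0);
    e->refcount--;
}

/* Called only under memory pressure (or on a sinval message):
 * evict the entry, but only once nothing references it. */
static int maybe_evict(ToastCacheEntry *e)
{
    if (e->refcount == 0 && !e->evicted) {
        free(e->value);
        e->value = NULL;
        e->evicted = 1;
        return 1;
    }
    return 0;
}
```

The decoupling is the whole point: a short-lived context can come and go many times between two detoastings of the same value, and the entry survives every reset because only eviction, not unpinning, frees it.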
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
> But we can resolve that by ruling that the required lifetime is the same
> as the value would have had if it'd really been palloc'd --- IOW, until
> the memory context that was current at the time gets deleted or reset.

Many support functions of GiST/GIN live in very short memory context -
only for one call. So, that cache invalidation technique doesn't give any
advantage without rearranging this part.

--
Teodor Sigaev                      E-mail: [EMAIL PROTECTED]
                                   WWW: http://www.sigaev.ru/
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
Jeff <[EMAIL PROTECTED]> writes:
> On Jun 16, 2008, at 3:35 PM, Tom Lane wrote:
>> the result of decompressing an inline-compressed datum, because those
>> have no unique ID that could be used for a lookup key.  This puts a
>> bit of a

> Wouldn't the tid fit this? or table oid + tid?

No.  The killer reason why not is that at the time we need to decompress
a datum, we don't know what row it came from.  There are some other
problems too...

			regards, tom lane
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
On Jun 16, 2008, at 3:35 PM, Tom Lane wrote:
> to a cache entry rather than a freshly palloc'd value.  The cache lookup
> key is the toast table OID plus value OID.  Now pg_detoast_datum() has no
...
> the result of decompressing an inline-compressed datum, because those
> have no unique ID that could be used for a lookup key.  This puts a
> bit of a

Wouldn't the tid fit this? or table oid + tid?

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
Stephen Frost <[EMAIL PROTECTED]> writes:
> * Tom Lane ([EMAIL PROTECTED]) wrote:
>> Comments, better ideas?  Anyone think this is too much trouble to take
>> for the problem?

> I definitely think it's worth it, even if it doesn't handle an
> inline-compressed datum.

Yeah.  I'm not certain how much benefit we could get there anyway.
If the datum isn't out-of-line then there's a small upper limit on how
big it can be and hence a small upper limit on how long it takes to
decompress.  It's not clear that a complicated caching scheme would
pay for itself.

The profile shown here:
http://postgis.refractions.net/pipermail/postgis-devel/2008-June/003081.html
shows that the problem the PostGIS guys are looking at is definitely an
out-of-line case (in fact, it looks like the datum wasn't even
compressed).

			regards, tom lane
Re: [HACKERS] Reducing overhead for repeat de-TOASTing
* Tom Lane ([EMAIL PROTECTED]) wrote:
> One unsolved problem is that this scheme doesn't provide any way to
> cache the result of decompressing an inline-compressed datum, because
> those have no unique ID that could be used for a lookup key.

That's pretty unfortunate.

> Ideas?

Not at the moment, but given the situation it really does strike me as
something we want to solve. Inventing an ID would likely be overkill or
wouldn't solve the problem anyway, I'm guessing...

> Comments, better ideas?  Anyone think this is too much trouble to take
> for the problem?

I definitely think it's worth it, even if it doesn't handle an
inline-compressed datum. PostGIS is certainly a good use case for why,
but I doubt it's the only one.

	Thanks!

		Stephen
[HACKERS] Reducing overhead for repeat de-TOASTing
Recent discussions with the PostGIS hackers led me to think about ways
to reduce overhead when the same TOAST value is repeatedly detoasted.
In the example shown here
http://archives.postgresql.org/pgsql-hackers/2008-06/msg00384.php
90% of the runtime is being consumed by repeated detoastings of a single
datum.  That is certainly an outlier case, but we've heard complaints of
TOAST being slow before.  The solution I'm about to propose should fix
this, and as a bonus it will reduce the problem of memory leakage when a
detoasted value doesn't get pfreed.

What I am imagining is that the tuple toaster will maintain a cache of
recently-detoasted values, and that pg_detoast_datum() returns a pointer
to a cache entry rather than a freshly palloc'd value.  The cache lookup
key is the toast table OID plus value OID.

Now pg_detoast_datum() has no idea where the pointer it returns will go,
so the big problem with this scheme is that it's hard to tell how long a
cache entry needs to live.  But we can resolve that by ruling that the
required lifetime is the same as the value would have had if it'd really
been palloc'd --- IOW, until the memory context that was current at the
time gets deleted or reset.  This can be implemented by allowing the
toaster cache to hook into the memory context delete/reset calls.  (We
have no such capability now, but there have been some previous cases
where it would've come in handy, so I think this would be a useful
extension anyhow.  It should be an actual hook that can be used by
anyone, not something where mcxt.c is specifically passing control to
tuptoaster.)

Once a cache entry is no longer required anywhere, it can be removed,
and will be if needed to shrink the cache to some target size (work_mem
perhaps, or is it worth inventing a separate GUC for this?).  But we can
hang onto the value longer if it's not taking up needed space.  This is
important since the use pattern exhibited in the PostGIS problem
involves repeat detoastings that are separated by MemoryContextResets
--- if we free the cache entry as soon as possible, we'll not gain
anything.

The other cache management problem that would have to be solved is to
invalidate entries when the underlying TOAST table or individual TOAST
value is deleted.  This is exactly like the problem for tuples in the
syscaches, and can be solved via sinval messaging.  (The reason we need
to worry about this is to guard against the possibility that the same
identifier is re-used for a new TOAST value; otherwise we could just let
dead values age out of the cache.)

A change of this sort has the potential to break a lot of code, but I
think the only thing we'd really have to do is turn PG_FREE_IF_COPY()
into a no-op.  In general, functions are not supposed to scribble on
pass-by-reference input datums, and since most functions don't actually
distinguish whether their inputs got detoasted (except perhaps by using
PG_FREE_IF_COPY()), that means that they won't be scribbling on the
toast cache entry either.

It would be possible to extend this concept to caching toast slice
fetches (pg_detoast_datum_slice() calls) if we add the offset/length
arguments to the cache lookup key.  I'm not certain if that's worth the
trouble though --- any feelings about that?  Also, that case is much
more likely to have callers that think they can scribble on the result,
since they "know" it must be a palloc'd value.

One unsolved problem is that this scheme doesn't provide any way to
cache the result of decompressing an inline-compressed datum, because
those have no unique ID that could be used for a lookup key.  This puts
a bit of a damper on the idea of making PG_FREE_IF_COPY() a no-op ---
if it is, then detoasting a datum of that type would indeed cause a
memory leak.  The best idea I have at the moment is to continue to have
inline decompression produce a palloc'd value, leave PG_FREE_IF_COPY()
as-is, and arrange for pfree() on toast cache entries to be a no-op.
(Which we can do by setting up a special memory context methods pointer
for them.)  It'd be cool if there were a way to cache decompression
though.  Ideas?

Comments, better ideas?  Anyone think this is too much trouble to take
for the problem?

			regards, tom lane
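[Editor's note: the core of the proposal --- a backend-wide cache of detoasted values keyed by toast table OID plus value OID, with invalidation on toast value delete/update --- can be sketched as follows. This is a toy stand-in, not PostgreSQL source: the fixed-size direct-mapped table, the malloc-based "detoasting", and all names are invented for illustration; a real implementation would use dynahash, palloc, memory-context hooks, and sinval messaging.]

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int Oid;

/* Cache lookup key: toast table OID plus value OID, as proposed. */
typedef struct ToastCacheKey {
    Oid toastrelid;
    Oid valueid;
} ToastCacheKey;

typedef struct ToastCacheEntry {
    ToastCacheKey key;
    char *value;               /* the detoasted bytes */
    int   used;
} ToastCacheEntry;

#define CACHE_SIZE 8
static ToastCacheEntry cache[CACHE_SIZE];
static long total_detoasts = 0;    /* counts the expensive path */

/* Stand-in for the expensive toast-table fetch + decompression. */
static char *do_detoast(ToastCacheKey key)
{
    (void) key;
    total_detoasts++;
    char *v = malloc(16);
    memset(v, 'x', 16);
    return v;
}

static unsigned cache_slot(ToastCacheKey key)
{
    return (key.toastrelid ^ key.valueid) % CACHE_SIZE;
}

/* pg_detoast_datum analogue: return the cache entry's value instead of
 * a freshly palloc'd copy; repeat detoastings become pointer returns. */
static char *pg_detoast_datum_cached(ToastCacheKey key)
{
    ToastCacheEntry *e = &cache[cache_slot(key)];
    if (e->used && e->key.toastrelid == key.toastrelid
                && e->key.valueid == key.valueid)
        return e->value;           /* hit: no I/O, no decompression */
    free(e->value);                /* miss or collision: refill slot */
    e->key = key;
    e->used = 1;
    e->value = do_detoast(key);
    return e->value;
}

/* sinval analogue: drop the entry when its toast value is deleted or
 * updated, so a re-used value OID can never return stale bytes. */
static void toast_cache_invalidate(ToastCacheKey key)
{
    ToastCacheEntry *e = &cache[cache_slot(key)];
    if (e->used && e->key.toastrelid == key.toastrelid
                && e->key.valueid == key.valueid) {
        free(e->value);
        e->value = NULL;
        e->used = 0;
    }
}
```

The sketch deliberately omits the two hard parts discussed in the thread: entry lifetime (reference counts dropped by memory-context reset hooks, eviction only under memory pressure) and the inline-compressed case, which has no (toastrelid, valueid) identity to key on at all.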