On 20/10/2023 05:06, Stephen Frost wrote:
Greetings,

* Andrei Lepikhov (a.lepik...@postgrespro.ru) wrote:
On 19/10/2023 02:00, Stephen Frost wrote:
* Andrei Lepikhov (a.lepik...@postgrespro.ru) wrote:
On 29/9/2023 09:52, Andrei Lepikhov wrote:
On 22/5/2023 22:59, reid.thomp...@crunchydata.com wrote:
Attach patches updated to master.
Pulled a change from patch 2 back into patch 1, since it was also
pertinent to patch 1.
+1 to the idea, though I have doubts about the implementation.

I have a question. I see the feature raises an ERROR when the memory
limit is exceeded. The enclosing PG_CATCH() section will handle the error.
As far as I can see, many such sections allocate memory. What if some
routine, like CopyErrorData(), exceeds the limit too? In that case, we
would keep repeating the error all the way up to the top-level
PG_CATCH(). Is this the intended behaviour? Maybe
exceeds_max_total_bkend_mem() should check for recursion and allow error
handlers to slightly exceed this hard limit?

With the attached patch, I try to show the sort of problem I'm worried
about. In some PG_CATCH() sections we call CopyErrorData() (which
allocates memory) before aborting the transaction. So an allocation
failure can throw us out of that section before the abort happens: we
expect a soft ERROR message but face much harder consequences.
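
Roughly, the pattern I mean looks like this (a minimal sketch of the
usual PG_TRY()/PG_CATCH() idiom; do_work() is just a placeholder for
code whose allocations may hit the limit):

#include "postgres.h"

extern void do_work(void);		/* placeholder */

void
demo(void)
{
	MemoryContext oldcontext = CurrentMemoryContext;

	PG_TRY();
	{
		do_work();				/* may ERROR out on the memory limit */
	}
	PG_CATCH();
	{
		ErrorData  *edata;

		/*
		 * CopyErrorData() palloc's a copy of the error.  If the backend
		 * is already at the limit, this allocation can raise the
		 * memory-limit ERROR again, throwing us out of this handler
		 * before the transaction is aborted.
		 */
		MemoryContextSwitchTo(oldcontext);
		edata = CopyErrorData();
		FlushErrorState();

		/* ... inspect edata, clean up ... */

		FreeErrorData(edata);
	}
	PG_END_TRY();
}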

While it's an interesting idea to consider making exceptions to the
limit, and perhaps we'll do that (or have some kind of 'reserve' for
such cases), this isn't really any different than today, is it?  We
might have a malloc() failure in the main path, end up in PG_CATCH() and
then try to do a CopyErrorData() and have another malloc() failure.

If we can rearrange the code to make this less likely to happen, by
doing a bit more work to free() resources used in the main path before
trying to do new allocations, then, sure, let's go ahead and do that,
but that's independent from this effort.
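
For instance, building on the handler sketched above (big_buffer is a
stand-in for whatever the main path has allocated), the handler could
release main-path memory before allocating anything new:

	PG_CATCH();
	{
		ErrorData  *edata;

		/* Free main-path memory first, so CopyErrorData() has head-room. */
		if (big_buffer != NULL)
		{
			pfree(big_buffer);
			big_buffer = NULL;
		}

		MemoryContextSwitchTo(oldcontext);
		edata = CopyErrorData();
		FlushErrorState();
		FreeErrorData(edata);
	}
	PG_END_TRY();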

I agree that such rearranging can be done independently. The code in the
message above was shown just as a demo of the case I'm worried about.
IMO, what should be implemented here is a recursion level for the memory
limit: if we hit the limit again while already processing an error, we
should ignore it. I can imagine custom extensions that use PG_CATCH() and
allocate some data there. At the least, we could raise the error level to
FATAL; a sketch of the recursion idea follows.
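
A minimal sketch of what I mean (all names here are made up for
illustration, not the patch's API): error-cleanup code brackets itself
with enter/leave calls, and the limit check is skipped while the counter
is nonzero:

#include "postgres.h"

static int	error_cleanup_depth = 0;	/* hypothetical counter */

void
enter_error_cleanup(void)
{
	error_cleanup_depth++;
}

void
leave_error_cleanup(void)
{
	Assert(error_cleanup_depth > 0);
	error_cleanup_depth--;
}

/* Consulted by the allocation path before enforcing the limit. */
bool
memory_limit_enforced(void)
{
	/* allocations made during error cleanup bypass the hard limit */
	return error_cleanup_depth == 0;
}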

Ignoring the limit in such cases would defeat much of the point of this
effort, which is to get to a position where we can say with some
confidence that we're not going to go over a limit the user has set, and
therefore won't end up getting OOM-killed.  These are all issues that
already exist today on systems which don't allow overcommit; there isn't
anything new here in regards to these risks, so I'm not really keen to
complicate this to deal with issues that are already there.

Perhaps once we've got the basics in place we could consider reserving
some space for handling such cases, but I don't think it'll actually be
very clean.  What if we have an allocation that goes beyond that reserved
space anyway?  Then we're in the same spot again, with the choice of
either failing the allocation in a less elegant way than we might like
for handling that error, or risking getting outright killed by the
kernel.  Of those choices, failing the allocation sure seems like the
better way to go.
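
To make the trade-off concrete, such a reserve might look something like
this (purely illustrative; ERROR_MEM_RESERVE, in_error_recovery() and
the accounting arguments are made-up names, not the patch's API).  Note
that a request larger than the reserve still fails during error
handling, which is exactly the residual risk above:

#include "postgres.h"

#define ERROR_MEM_RESERVE	(256 * 1024)	/* hypothetical head-room */

extern bool in_error_recovery(void);		/* hypothetical predicate */

static bool
allocation_allowed(uint64 request, uint64 allocated, uint64 limit)
{
	/*
	 * Normal path: stop short of the limit, keeping the reserve free.
	 * Error path: may dip into the reserve, but a request larger than
	 * what remains of it still fails.
	 */
	uint64		effective = in_error_recovery() ? limit : limit - ERROR_MEM_RESERVE;

	return allocated + request <= effective;
}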

I take your point.

The only issue I worry about is the uncertainty and clutter this feature
can create. In the worst case, when we have a complex error stack
(including an extension's PG_CATCH() sections, exceptions in stored
procedures, etc.), the backend will throw the memory-limit error
repeatedly. Of course, one failed backend looks better than a
surprisingly killed postmaster, but the mix of different error reports
and details is terrible and challenging to debug when trouble strikes.
So, may we throw a FATAL error if we reach this limit while handling an
exception?
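
Something like this, sketched (elog_in_error_recursion() is a made-up
predicate; exceeds_max_total_bkend_mem() is the check from the patch
under discussion):

	if (exceeds_max_total_bkend_mem(request))
		ereport(elog_in_error_recursion() ? FATAL : ERROR,
				(errcode(ERRCODE_OUT_OF_MEMORY),
				 errmsg("backend memory limit exceeded")));

That way the backend exits once, instead of looping through handlers
that themselves trip the limit.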

--
regards,
Andrey Lepikhov
Postgres Professional


