On 11/16/22 02:47, Amit Langote wrote:
A customer seems to have run into $subject. Here's a reproducer they shared:
With the following logged: LOG: server process (PID 121846) was terminated by signal 9: Killed
That's the Linux OOM killer. Was this running in a container or under systemd with memory.limit_in_bytes set? If so, perhaps they need a higher setting.
The problem seems to be spi_printtup() continuing to allocate memory to expand _SPI_current->tuptable to store the result of crosstab()'s input query that's executed using: /* Retrieve the desired rows */ ret = SPI_execute(sql, true, 0); Note that this asks SPI to retrieve and store *all* result rows of the query in _SPI_current->tuptable, and if there happen to be so many rows, as in the case of above example, spi_printtup() ends up asking for a bit too much memory.
check
The easiest fix for this seems to be for crosstab() to use open a cursor (SPI_cursor_open) and fetch the rows in batches (SPI_cursor_fetch) rather than all in one go. I have implemented that in the attached. Maybe the patch should address other functions that potentially have the same problem.
Seems reasonable. I didn't look that closely at the patch, but I do think that there needs to be some justification for the selected batch size and/or make it configurable.
I also wondered about fixing this by making _SPI_current->tuptable use a tuplestore that can spill to disk as its backing store rather than a plain C HeapTuple array, but haven't checked how big of a change that would be; SPI_tuptable is referenced in many places across the tree. Though I suspect that idea has enough merits to give that a try someday.
Seems like a separate patch at the very least
Thoughts on whether this should be fixed and the fix be back-patched?
-1 on backpatching -- this is not a bug, and the changes are non-trivial Joe -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com