Re: pg_walinspect memory leaks
On Mon, Feb 20, 2023 at 11:34:03AM -0800, Jeff Davis wrote: > Committed to 16 with the changes to GetXLogSummaryStats() as well. > Committed unmodified version of your 15 backport. Thank you! Thanks for taking care of the FPI code path, Jeff! -- Michael signature.asc Description: PGP signature
Re: pg_walinspect memory leaks
On Mon, 2023-02-20 at 15:17 +0530, Bharath Rupireddy wrote: > Similarly, the loops in GetXLogSummaryStats() too > don't palloc any memory, so no memory leak. Break on palloc in gdb in that loop and you'll see a palloc in CStringGetTextDatum(name). In general, you should expect *GetDatum() to palloc unless you're sure that it's pass-by-value. Even Float8GetDatum() has code to account for pass-by-ref float8s. There are also a couple calls to psprintf() in the stats_per_record path. > I've seen no memory growth > during execution of pg_get_wal_stats_till_end_of_wal() for 35million > WAL records, see [1] PID 543967 (during the execution of the stats > function, the memory usage remained constant). Therefore, I feel that > the fix isn't required for GetWalStats(). That is true because the loops in GetXLogSummaryStats() are based on constants. It does at most RM_MAX_ID * MAX_XLINFO_TYPES calls to FillXLogStatsRow() regardless of the number of WAL records. It's not a significant amount of memory, at least today. But, since we're already using the temp context pattern, we might as well use it here for clarity so that we don't have to guess about whether the amount of memory is significant or not. Committed to 16 with the changes to GetXLogSummaryStats() as well. Committed unmodified version of your 15 backport. Thank you! -- Jeff Davis PostgreSQL Contributor Team - AWS
Re: pg_walinspect memory leaks
On Sat, Feb 18, 2023 at 5:07 AM Jeff Davis wrote: > > On Thu, 2023-02-16 at 18:00 +0530, Bharath Rupireddy wrote: > > I'm attaching the patches here. For HEAD, I'd > > want to be a bit defensive and use the temporary memory context for > > pg_get_wal_fpi_info() too. > > I don't see why we shouldn't backpatch that, too? pg_get_wal_fpi_info() is added in v16, so backpatching isn't necessary. > Also, it seems like we should do the same thing for the loop in > GetXLogSummaryStats(). Maybe just for the outer loop is fine (the inner > loop is only 16 elements); though again, there's not an obvious > downside to fixing that, too. Firstly, WAL record traversing loop in GetWalStats() really doesn't leak memory, because it just increments some counters and doesn't palloc any memory. Similarly, the loops in GetXLogSummaryStats() too don't palloc any memory, so no memory leak. I've seen no memory growth during execution of pg_get_wal_stats_till_end_of_wal() for 35million WAL records, see [1] PID 543967 (during the execution of the stats function, the memory usage remained constant). Therefore, I feel that the fix isn't required for GetWalStats(). [1] PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 543967 ubuntu20 0 168668 152056 149988 R 99.7 0.5 1:33.72 postgres 412271 ubuntu20 0 1101852 252724 42904 S 1.3 0.8 2:18.36 node 412208 ubuntu20 0 965000 112488 36012 S 0.3 0.4 0:23.46 node 477193 ubuntu20 0 5837096 34172 9420 S 0.3 0.1 0:00.93 cpptools-srv -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pg_walinspect memory leaks
On Thu, 2023-02-16 at 18:00 +0530, Bharath Rupireddy wrote: > I'm attaching the patches here. For HEAD, I'd > want to be a bit defensive and use the temporary memory context for > pg_get_wal_fpi_info() too. I don't see why we shouldn't backpatch that, too? Also, it seems like we should do the same thing for the loop in GetXLogSummaryStats(). Maybe just for the outer loop is fine (the inner loop is only 16 elements); though again, there's not an obvious downside to fixing that, too. -- Jeff Davis PostgreSQL Contributor Team - AWS
Re: pg_walinspect memory leaks
On Thu, Feb 16, 2023 at 06:00:00PM +0530, Bharath Rupireddy wrote: > The memory usage goes up with many WAL records in GetWALRecordsInfo(). > The affected functions are pg_get_wal_records_info() and > pg_get_wal_records_info_till_end_of_wal(). I think the best way to fix > this is to use a temporary memory context (like the jsonfuncs.c), > reset it after every tuple is put into the tuple store. This fix keeps > the memory under limits. I'm attaching the patches here. What you are doing here looks OK, at quick glance. That's common across the code, see also dblink or file_fdw. > For HEAD, I'd > want to be a bit defensive and use the temporary memory context for > pg_get_wal_fpi_info() too. If there is a burst of FPWs across the range you are scanning, the problem could be equally worse. Sorry for missing that. -- Michael signature.asc Description: PGP signature
Re: pg_walinspect memory leaks
On Tue, Feb 14, 2023 at 4:07 PM Bharath Rupireddy wrote: > > On Tue, Feb 14, 2023 at 6:25 AM Andres Freund wrote: > > > > Hi, > > > > On 2023-02-13 15:22:02 -0800, Peter Geoghegan wrote: > > > More concretely, it looks like GetWALRecordInfo() calls > > > CStringGetTextDatum/cstring_to_text in a way that accumulates way too > > > much memory in ExprContext. > > > > Additionally, we leak two stringinfos for each record. > > > > > > > This could be avoided by using a separate memory context that is reset > > > periodically, or something else along the same lines. > > > > Everything other than a per-row memory context that's reset each time seems > > hard to manage in this case. > > > > Somehwat funnily, GetWALRecordsInfo() then ends up being unnecessarily > > dilligent about cleaning up O(1) memory, after not caring about O(N) > > memory... > > Thanks for reporting. I'll get back to you on this soon. The memory usage goes up with many WAL records in GetWALRecordsInfo(). The affected functions are pg_get_wal_records_info() and pg_get_wal_records_info_till_end_of_wal(). I think the best way to fix this is to use a temporary memory context (like the jsonfuncs.c), reset it after every tuple is put into the tuple store. This fix keeps the memory under limits. I'm attaching the patches here. For HEAD, I'd want to be a bit defensive and use the temporary memory context for pg_get_wal_fpi_info() too. And, the fix also needs to be back-patched to PG15. [1] HEAD: PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 1105979 ubuntu20 0 28.5g 28.4g 150492 R 80.7 93.0 1:47.12 postgres PATCHED: PID USER PR NIVIRTRESSHR S %CPU %MEM TIME+ COMMAND 13149 ubuntu20 0 173244 156872 150688 R 79.0 0.5 1:25.09 postgres postgres=# select count(*) from pg_get_wal_records_info_till_end_of_wal('0/100'); count -- 35285649 (1 row) postgres=# select pg_backend_pid(); pg_backend_pid 13149 (1 row) -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com From 56ce258a6c4b4118a4cbe6612fbafb6cb172f7e3 Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Wed, 15 Feb 2023 06:26:30 + Subject: [PATCH v1] Limit memory usage of pg_walinspect functions Some of the pg_walinspect functions can overuse and leak memory across WAL records iterations and can cause OOM if there are many WAL records to output are present. Fix this by using a temporary memory context that's reset for each WAL record iteraion. Reported-by: Peter Geoghegan Author: Bharath Rupireddy Discussion: https://www.postgresql.org/message-id/CAH2-WznLEJjn7ghmKOABOEZYuJvkTk%3DGKU3m0%2B-XBAH%2BerPiJQ%40mail.gmail.com Backpatch-through: 15 --- contrib/pg_walinspect/pg_walinspect.c | 28 +++ 1 file changed, 28 insertions(+) diff --git a/contrib/pg_walinspect/pg_walinspect.c b/contrib/pg_walinspect/pg_walinspect.c index 91b740ed27..9d429c6d73 100644 --- a/contrib/pg_walinspect/pg_walinspect.c +++ b/contrib/pg_walinspect/pg_walinspect.c @@ -304,6 +304,8 @@ pg_get_wal_fpi_info(PG_FUNCTION_ARGS) XLogRecPtr start_lsn; XLogRecPtr end_lsn; XLogReaderState *xlogreader; + MemoryContext old_cxt; + MemoryContext tmp_cxt; start_lsn = PG_GETARG_LSN(0); end_lsn = PG_GETARG_LSN(1); @@ -314,14 +316,26 @@ pg_get_wal_fpi_info(PG_FUNCTION_ARGS) xlogreader = InitXLogReaderState(start_lsn); + tmp_cxt = AllocSetContextCreate(CurrentMemoryContext, + "pg_get_wal_fpi_info temporary cxt", + ALLOCSET_DEFAULT_SIZES); + while (ReadNextXLogRecord(xlogreader) && xlogreader->EndRecPtr <= end_lsn) { + /* Use the tmp context so we can clean up after each tuple is done */ + old_cxt = MemoryContextSwitchTo(tmp_cxt); + GetWALFPIInfo(fcinfo, xlogreader); + /* clean up and switch back */ + MemoryContextSwitchTo(old_cxt); + MemoryContextReset(tmp_cxt); + CHECK_FOR_INTERRUPTS(); } + MemoryContextDelete(tmp_cxt); pfree(xlogreader->private_data); XLogReaderFree(xlogreader); @@ -440,23 +454,37 @@ GetWALRecordsInfo(FunctionCallInfo fcinfo, XLogRecPtr start_lsn, ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo; Datum values[PG_GET_WAL_RECORDS_INFO_COLS] = {0}; bool nulls[PG_GET_WAL_RECORDS_INFO_COLS] = {0}; + MemoryContext old_cxt; + MemoryContext tmp_cxt; InitMaterializedSRF(fcinfo, 0); xlogreader = InitXLogReaderState(start_lsn); + tmp_cxt = AllocSetContextCreate(CurrentMemoryContext, + "GetWALRecordsInfo temporary cxt", + ALLOCSET_DEFAULT_SIZES); + while (ReadNextXLogRecord(xlogreader) && xlogreader->EndRecPtr <= end_lsn) { + /* Use the tmp context so we can clean up after each tuple is done */ + old_cxt = MemoryContextSwitchTo(tmp_cxt); + GetWALRecordInfo(xlogreader, values, nulls, PG_GET_WAL_RECORDS_INFO_COLS); tuplestore_putvalues(rsinfo->setResult, rsinfo->setDesc,
Re: pg_walinspect memory leaks
On Tue, Feb 14, 2023 at 6:25 AM Andres Freund wrote: > > Hi, > > On 2023-02-13 15:22:02 -0800, Peter Geoghegan wrote: > > More concretely, it looks like GetWALRecordInfo() calls > > CStringGetTextDatum/cstring_to_text in a way that accumulates way too > > much memory in ExprContext. > > Additionally, we leak two stringinfos for each record. > > > > This could be avoided by using a separate memory context that is reset > > periodically, or something else along the same lines. > > Everything other than a per-row memory context that's reset each time seems > hard to manage in this case. > > Somehwat funnily, GetWALRecordsInfo() then ends up being unnecessarily > dilligent about cleaning up O(1) memory, after not caring about O(N) memory... Thanks for reporting. I'll get back to you on this soon. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pg_walinspect memory leaks
Hi, On 2023-02-13 15:22:02 -0800, Peter Geoghegan wrote: > More concretely, it looks like GetWALRecordInfo() calls > CStringGetTextDatum/cstring_to_text in a way that accumulates way too > much memory in ExprContext. Additionally, we leak two stringinfos for each record. > This could be avoided by using a separate memory context that is reset > periodically, or something else along the same lines. Everything other than a per-row memory context that's reset each time seems hard to manage in this case. Somehwat funnily, GetWALRecordsInfo() then ends up being unnecessarily dilligent about cleaning up O(1) memory, after not caring about O(N) memory... Greetings, Andres Freund