Re: [BUGS] Memory-leak in BackgroundWriter(and Checkpointer)
Any idea when it will be available in official PostgreSQL release 9.1.x or 9.2.x ? -- View this message in context: http://postgresql.1045698.n5.nabble.com/Memory-leak-in-BackgroundWriter-and-Checkpointer-tp5757869p5758783.html Sent from the PostgreSQL - bugs mailing list archive at Nabble.com. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Memory-leak in BackgroundWriter(and Checkpointer)
* Pawel Kozik (pawel.ko...@alcatel-lucent.com) wrote: Any idea when it will be available in official PostgreSQL release 9.1.x or 9.2.x ? Yes, the next set of point releases should include Tom's patch to fix this leak. Thanks, Stephen signature.asc Description: Digital signature
[BUGS] Memory-leak in BackgroundWriter(and Checkpointer)
Hi All. I've found a memory-leak bug in PostgreSQL 9.1.9's background writer process. When following parameter is set, it occurs every time CHECKPOINT runs. wal_level = hot_standby I've also confirmed REL9_1_STABLE and it is not fixed yet. In additional, it also occurs in 9.3.beta1's checkpointer. I created a patch (attached file) for 9.1.9, so could anyone confirm? Best regards. --- Naoya Anzai NEC Soft,Ltd. http://www.necsoft.com/eng/ standby.patch Description: Binary data -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Memory-leak in BackgroundWriter(and Checkpointer)
* Naoya Anzai (anzai-na...@mxu.nes.nec.co.jp) wrote: I've found a memory-leak bug in PostgreSQL 9.1.9's background writer process. This looks legit, but probably not the right approach to fixing it. Looks like it'd be better to work out a way to use a static variable to reuse the same memory, ala what GetRunningTransactionData() does, and avoid having to do allocation while holding all the locks (or at least, not very often). Thanks, Stephen signature.asc Description: Digital signature
Re: [BUGS] Memory-leak in BackgroundWriter(and Checkpointer)
On 04.06.2013 15:27, Stephen Frost wrote: * Naoya Anzai (anzai-na...@mxu.nes.nec.co.jp) wrote: I've found a memory-leak bug in PostgreSQL 9.1.9's background writer process. This looks legit, but probably not the right approach to fixing it. Looks like it'd be better to work out a way to use a static variable to reuse the same memory, ala what GetRunningTransactionData() does, and avoid having to do allocation while holding all the locks (or at least, not very often). I can't get too excited about the overhead of a single palloc here. It's a fairly heavy operation anyway, and only runs once per checkpoint. And we haven't heard any actual complaints of latency hiccups with wal_level=hot_standby. - Heikki -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Memory-leak in BackgroundWriter(and Checkpointer)
On 2013-06-04 16:23:00 +0300, Heikki Linnakangas wrote: On 04.06.2013 15:27, Stephen Frost wrote: * Naoya Anzai (anzai-na...@mxu.nes.nec.co.jp) wrote: I've found a memory-leak bug in PostgreSQL 9.1.9's background writer process. This looks legit, but probably not the right approach to fixing it. Looks like it'd be better to work out a way to use a static variable to reuse the same memory, ala what GetRunningTransactionData() does, and avoid having to do allocation while holding all the locks (or at least, not very often). I can't get too excited about the overhead of a single palloc here. It's a fairly heavy operation anyway, and only runs once per checkpoint. And we haven't heard any actual complaints of latency hiccups with wal_level=hot_standby. I think we will have to resort to running it more frequently in the not too far away future, its way to easy to get into a situation where all of the checkpoints/xl_running_xact's are suboverflowed delaying consistency considerably. Seems more consistent with the rest of the code too. But anyway, I am fine with fixing it either way. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Memory-leak in BackgroundWriter(and Checkpointer)
* Andres Freund (and...@2ndquadrant.com) wrote: On 2013-06-04 16:23:00 +0300, Heikki Linnakangas wrote: I can't get too excited about the overhead of a single palloc here. It's a fairly heavy operation anyway, and only runs once per checkpoint. And we haven't heard any actual complaints of latency hiccups with wal_level=hot_standby. Perhaps, but we don't always hear about things that we should- this long standing memory leak being one of them. Seems more consistent with the rest of the code too. But anyway, I am fine with fixing it either way. And this is really the other point- having LogStandbySnapshot() need to clean up after GetRunningTransactionLocks() but not GetRunningTransactionData() would strike me as very odd. Thanks, Stephen signature.asc Description: Digital signature
Re: [BUGS] Memory-leak in BackgroundWriter(and Checkpointer)
Stephen Frost sfr...@snowman.net writes: * Andres Freund (and...@2ndquadrant.com) wrote: Seems more consistent with the rest of the code too. But anyway, I am fine with fixing it either way. And this is really the other point- having LogStandbySnapshot() need to clean up after GetRunningTransactionLocks() but not GetRunningTransactionData() would strike me as very odd. Meh. I'm not impressed with permanently allocating an array large enough to hold all the locks GetRunningTransactionLocks might return --- that's potentially much larger than the other array, and in fact I don't think we have a hard limit on its size at all. Besides which, it's not like there is *no* cleanup for GetRunningTransactionData --- it has a lock that has to be released ... I think the proposed fix is fine code-wise; the real problem here is crummy commenting. GetRunningTransactionLocks isn't documented as returning a palloc'd array, and why the heck do we have a long comment about its implementation in LogStandbySnapshot? regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Memory-leak in BackgroundWriter(and Checkpointer)
* Andres Freund (and...@2ndquadrant.com) wrote: Seems more consistent with the rest of the code too. But anyway, I am fine with fixing it either way. Patch attached which mirrors what GetRunningTransactionData() (the other function called from LogStandbySnapshot) does, more-or-less- uses a static variable to keep track of memory allocated for the data being returned to the caller. Looks like this will need to be back-patched to 9.0. Barring objections or better ideas, I'll do that in a few hours. Thanks, Stephen colordiff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c new file mode 100644 index 8cd871f..1fb7884 *** a/src/backend/storage/lmgr/lock.c --- b/src/backend/storage/lmgr/lock.c *** GetLockStatusData(void) *** 3401,3415 * Returns a list of currently held AccessExclusiveLocks, for use * by GetRunningTransactionData(). */ xl_standby_lock * GetRunningTransactionLocks(int *nlocks) { PROCLOCK *proclock; HASH_SEQ_STATUS seqstat; int i; int index; ! int els; ! xl_standby_lock *accessExclusiveLocks; /* * Acquire lock on the entire shared lock data structure. --- 3401,3445 * Returns a list of currently held AccessExclusiveLocks, for use * by GetRunningTransactionData(). */ + #define INIT_STANDBY_LOCK_ARR_SIZE 32 /* Initial return workspace size */ xl_standby_lock * GetRunningTransactionLocks(int *nlocks) { + /* result workspace */ + static xl_standby_locks *accessExclusiveLocks; + PROCLOCK *proclock; HASH_SEQ_STATUS seqstat; int i; int index; ! ! /* ! * Allocating space for all locks is overkill. We only need space for ! * access exclusive locks, but we can't count the number of locks without ! * locking all of the lock partitions. We don't really want to do ! * allocations while holding all those locks, so instead do a bit of ! * allocation up front, to hopefully avoid having to do them later. ! */ ! if (accessExclusiveLocks == NULL) ! { ! /* ! * First call ! * ! * Note: We are tracking the allocated space for locks in -nlocks, ! * not the actual number of locks found, that's handled with index ! * below. Also, allocating the structure will start us off with space ! * for one, so discount that from the malloc request. ! */ ! accessExclusiveLocks = (xl_standby_locks *) ! malloc(sizeof(xl_standby_locks) + ! ((INIT_STANDBY_LOCK_ARR_SIZE - 1) * sizeof(xl_standby_lock))); ! if (accessExclusiveLocks == NULL) ! ereport(ERROR, ! (errcode(ERRCODE_OUT_OF_MEMORY), ! errmsg(out of memory))); ! ! accessExclusiveLocks-nlocks = INIT_STANDBY_LOCK_ARR_SIZE; ! } /* * Acquire lock on the entire shared lock data structure. *** GetRunningTransactionLocks(int *nlocks) *** 3419,3433 for (i = 0; i NUM_LOCK_PARTITIONS; i++) LWLockAcquire(FirstLockMgrLock + i, LW_SHARED); - /* Now we can safely count the number of proclocks */ - els = hash_get_num_entries(LockMethodProcLockHash); - - /* - * Allocating enough space for all locks in the lock table is overkill, - * but it's more convenient and faster than having to enlarge the array. - */ - accessExclusiveLocks = palloc(els * sizeof(xl_standby_lock)); - /* Now scan the tables to copy the data */ hash_seq_init(seqstat, LockMethodProcLockHash); --- 3449,3454 *** GetRunningTransactionLocks(int *nlocks) *** 3459,3467 if (!TransactionIdIsValid(xid)) continue; ! accessExclusiveLocks[index].xid = xid; ! accessExclusiveLocks[index].dbOid = lock-tag.locktag_field1; ! accessExclusiveLocks[index].relOid = lock-tag.locktag_field2; index++; } --- 3480,3508 if (!TransactionIdIsValid(xid)) continue; ! /* ! * Check if we need to allocate more space in our workspace due to ! * the number of access exclusive locks actually found. If we do ! * run out, double the size of our return area to hopefully avoid ! * having to do this again any time soon. ! */ ! if (index = accessExclusiveLocks-nlocks) ! { ! accessExclusiveLocks-nlocks *= 2; ! ! accessExclusiveLocks = (xl_standby_locks *) ! realloc(accessExclusiveLocks, ! sizeof(xl_standby_locks) + ! ((accessExclusiveLocks-nlocks - 1) * ! sizeof(xl_standby_lock))); ! if (accessExclusiveLocks == NULL) ! ereport(ERROR, ! (errcode(ERRCODE_OUT_OF_MEMORY), ! errmsg(out of memory))); ! } ! accessExclusiveLocks-locks[index].xid = xid; ! accessExclusiveLocks-locks[index].dbOid = lock-tag.locktag_field1; ! accessExclusiveLocks-locks[index].relOid = lock-tag.locktag_field2; index++; } *** GetRunningTransactionLocks(int *nlocks) *** 3477,3484 for (i = NUM_LOCK_PARTITIONS; --i = 0;) LWLockRelease(FirstLockMgrLock + i); *nlocks = index; ! return accessExclusiveLocks; }
Re: [BUGS] Memory-leak in BackgroundWriter(and Checkpointer)
* Tom Lane (t...@sss.pgh.pa.us) wrote: Meh. I'm not impressed with permanently allocating an array large enough to hold all the locks GetRunningTransactionLocks might return --- that's potentially much larger than the other array, and in fact I don't think we have a hard limit on its size at all. Well, sure, which is why I didn't actually do that- but I did end up having to make it resize when necessary, which isn't entirely ideal either. Besides which, it's not like there is *no* cleanup for GetRunningTransactionData --- it has a lock that has to be released ... That's true.. I guess my general feeling is that it'd be good to do this all one way or the other- having it use a static variable into which we stick the pointer to some reused space for one and then doing a palloc for the other which needs to be pfree'd struck me as odd. I think the proposed fix is fine code-wise; the real problem here is crummy commenting. GetRunningTransactionLocks isn't documented as returning a palloc'd array, and why the heck do we have a long comment about its implementation in LogStandbySnapshot? Certainly good questions and better comments would have helped here. I can go back and rework the patch either way. Thanks, Stephen signature.asc Description: Digital signature
Re: [BUGS] Memory-leak in BackgroundWriter(and Checkpointer)
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: I think the proposed fix is fine code-wise; the real problem here is crummy commenting. GetRunningTransactionLocks isn't documented as returning a palloc'd array, and why the heck do we have a long comment about its implementation in LogStandbySnapshot? Certainly good questions and better comments would have helped here. I can go back and rework the patch either way. I'm already working on back-patching the attached. regards, tom lane diff --git a/src/backend/storage/ipc/standby.c b/src/backend/storage/ipc/standby.c index 615278b8ca2adf3057e5f21057c3cedfc66480aa..c704412366d5bc4b6512e9ab673855c46f7d993f 100644 *** a/src/backend/storage/ipc/standby.c --- b/src/backend/storage/ipc/standby.c *** LogStandbySnapshot(void) *** 865,880 /* * Get details of any AccessExclusiveLocks being held at the moment. - * - * XXX GetRunningTransactionLocks() currently holds a lock on all - * partitions though it is possible to further optimise the locking. By - * reference counting locks and storing the value on the ProcArray entry - * for each backend we can easily tell if any locks need recording without - * trying to acquire the partition locks and scanning the lock table. */ locks = GetRunningTransactionLocks(nlocks); if (nlocks 0) LogAccessExclusiveLocks(nlocks, locks); /* * Log details of all in-progress transactions. This should be the last --- 865,875 /* * Get details of any AccessExclusiveLocks being held at the moment. */ locks = GetRunningTransactionLocks(nlocks); if (nlocks 0) LogAccessExclusiveLocks(nlocks, locks); + pfree(locks); /* * Log details of all in-progress transactions. This should be the last diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c index 8cd871f4b40d7bfc8b0aaa7650241d5865c79ffe..273c72230274b46ba6a46bb8b295ef5375aaa36d 100644 *** a/src/backend/storage/lmgr/lock.c --- b/src/backend/storage/lmgr/lock.c *** GetLockStatusData(void) *** 3398,3415 } /* ! * Returns a list of currently held AccessExclusiveLocks, for use ! * by GetRunningTransactionData(). */ xl_standby_lock * GetRunningTransactionLocks(int *nlocks) { PROCLOCK *proclock; HASH_SEQ_STATUS seqstat; int i; int index; int els; - xl_standby_lock *accessExclusiveLocks; /* * Acquire lock on the entire shared lock data structure. --- 3398,3423 } /* ! * Returns a list of currently held AccessExclusiveLocks, for use by ! * LogStandbySnapshot(). The result is a palloc'd array, ! * with the number of elements returned into *nlocks. ! * ! * XXX This currently takes a lock on all partitions of the lock table, ! * but it's possible to do better. By reference counting locks and storing ! * the value in the ProcArray entry for each backend we could tell if any ! * locks need recording without having to acquire the partition locks and ! * scan the lock table. Whether that's worth the additional overhead ! * is pretty dubious though. */ xl_standby_lock * GetRunningTransactionLocks(int *nlocks) { + xl_standby_lock *accessExclusiveLocks; PROCLOCK *proclock; HASH_SEQ_STATUS seqstat; int i; int index; int els; /* * Acquire lock on the entire shared lock data structure. *** GetRunningTransactionLocks(int *nlocks) *** 3467,3472 --- 3475,3482 } } + Assert(index = els); + /* * And release locks. We do this in reverse order for two reasons: (1) * Anyone else who needs more than one of the locks will be trying to lock -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] Memory-leak in BackgroundWriter(and Checkpointer)
* Tom Lane (t...@sss.pgh.pa.us) wrote: I'm already working on back-patching the attached. Works for me, Thanks! Stephen signature.asc Description: Digital signature
Re: [BUGS] Memory-leak in BackgroundWriter(and Checkpointer)
Tom, Stephen, Heikki and Andres, You do FAST work! Thanks for your prompt responses. --- Naoya Anzai NEC Soft,Ltd. http://www.necsoft.com/eng/ -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs