Hi All,
I think that this is the second memory leak that I mentioned below. It does not
have to do with the send-queue-full message, but only with simply configuring
more than one active host. For example, if you configure three hosts such as “{
host1, host2, host3 }”, then the following leak will happen --
LogHostList::preproc_and_try_delete(LogBuffer *lb)
{
…
ink_release_assert(lb->m_references == 0);
m_references starts as zero
nr_host = nr = count();
ink_atomic_increment(&lb->m_references, nr_host);
increments the m_references by 3 _not_ by 1
for (LogHost *host = first(); host && nr; host = next(host)) {
LogHost *lh = host;
available_host = lh;
do {
ink_atomic_increment(&lb->m_references, 1);
increments the m_references by 1 for each host tried
success = lh->preproc_and_try_delete(lb);
LogHost::preproc_and_try_delete() will consume 1 ref-count
need_orphan = need_orphan && (success == false);
} while (lb && (success == false) && (lh = lh->failover_link.next));
nr--;
}
…
if (lb != nullptr) {
LogBuffer::destroy(lb);
this only consumes 1 ref-count
}
return 0;
}
So since the ink_atomic_increment() and LogBuffer::destroy() are not balanced
this will produce a memory leak. In my test scenario, at the end of 100K
transactions, ATS had allocated 5961 and only freed 4348 LogBuffers. Proposed
fix is to change ink_atomic_increment(&lb->m_references, nr_host) to
ink_atomic_increment(&lb->m_references, 1) to balance things. Appreciate any
second opinions, and I opened a PR #4041 with both memory leaks.
Thanks,
Peter
-----Original Message-----
From: Chou, Peter
Sent: Thursday, July 26, 2018 1:42 PM
To: dev <[email protected]>
Subject: RE: Issue #2416 - Log Collation Memory Leak.
Leif,
1) No problem. It is a two line change so we can carry in our custom 7.1.x if
necessary.
2) Yeah this feature has issues. After fixing this memory leak, we realized
that there is ANOTHER leak if you over-run the log collation transfer capacity
triggering the "send-queue full" message. Perhaps this can be mitigated by
increasing the send queue size (collation_max_send_buffers). This second issue
can be triggered by increased transaction rate and/or configuring multiple
active hosts, i.e., { host1, host2 } rather than active-standby { { host1 ,
host2 } }, to over-load the log sending capacity. I will take your proposal
back to discuss with my users.
Thanks,
Peter
-----Original Message-----
From: Leif Hedstrom [mailto:[email protected]]
Sent: Thursday, July 26, 2018 12:54 PM
To: dev <[email protected]>
Subject: Re: Issue #2416 - Log Collation Memory Leak.
> On Jul 26, 2018, at 3:14 AM, Chou, Peter <[email protected]> wrote:
>
> Hi All,
>
> After some additional study on how the ink_atomic_increment() and
> LogBuffer::destroy() work together in the LogFile::, LogHost::, and
> LogHostList::preproc_and_try_delete() functions, the following is probably a
> better patch to address this issue. Appreciate any second opinions on this
> before I open a PR.
Nice find.
However, I have two comments on this:
1) I’d really prefer not to respin the 7.1.x release over this. Most people
don’t use this (IMO broken) feature. I hope that’s ok.
2) I’d like to restart the discussions of removing this feature from v9.0.0
again (in fact, I kinda wish we could nuke its from 8.0.0, but a little late
for that).
For #2, I feel that there are a lot better tools out there, like Kafka, Elastic
Search (or Splunk for a commercial solution).
I’ll make a separate thread on asking for the deprecation and removal of this
feature :-)
Cheers,
— Leif