Re: [SR-Users] Fwd: Possible memory leak on 5.5.x (new)?

2022-01-07 Thread Daniel-Constantin Mierla
Hello,

On 07.01.22 13:04, George Diamantopoulos wrote:
> Hello Daniel,
>
> I see, thanks for the response. I guess I'll try upgrading all
> instances to 5.5.x for now, and hopefully that will fix it. If not,
> I'll revert to 5.4.x and post here again. If the issue does manifest
> with all instances on 5.5.x however, what information should I collect
> to investigate this further? I'm guessing shm stats is a start, but is
> it enough?

It is hard to say; starting with the stats and the shm summary is good!

>
> Lastly, is sip_msg_shm_clone growing consistent with DMQ
> incompatibility between 5.5.x <-> 5.4.x, or should such memory leaks
> manifest in some other way in shm stats?

If the leak manifests only when dmq is used, then troubleshooting has to
be done with the same Kamailio version on all nodes, otherwise it can be
a waste of time. The cloning function is used for other features as
well, not only for dmq, so there could be other reasons for a leak from
it.
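
A minimal sketch of how the stats could be compared over time: save the
output of `kamcmd mod.stats all shm` twice, some hours apart, and diff the
two dumps per allocation site. The Python below assumes each entry in the
dump has the shape `name(number): bytes`; that line shape and the helper
names `parse_snapshot` and `growth` are assumptions made for illustration,
so adjust the pattern to what your Kamailio version actually prints.

```python
import re

# Assumed entry shape inside a saved "kamcmd mod.stats all shm" dump:
#     sip_msg_shm_clone(496): 959632
# Lines such as "Module: core", "{", "}" and "Total: ..." do not match.
ENTRY = re.compile(r"^\s*([A-Za-z_]\w*)\(\d+\):\s*(\d+)\s*$")


def parse_snapshot(text):
    """Return {function_name: bytes}, summing entries that share a name."""
    usage = {}
    for raw in text.splitlines():
        match = ENTRY.match(raw)
        if match:
            name, size = match.group(1), int(match.group(2))
            usage[name] = usage.get(name, 0) + size
    return usage


def growth(before, after):
    """Return (name, delta_bytes) pairs, largest growth first."""
    names = set(before) | set(after)
    deltas = [(n, after.get(n, 0) - before.get(n, 0)) for n in names]
    return sorted(deltas, key=lambda item: (-item[1], item[0]))
```

Running `growth(parse_snapshot(old_dump), parse_snapshot(new_dump))` on two
dumps taken under similar traffic and reading the first few entries should
show whether sip_msg_shm_clone, or some other allocation site, is the one
that keeps growing.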

Cheers,
Daniel


>
> Cheers,
> George
>
> On Fri, 7 Jan 2022 at 13:20, Daniel-Constantin Mierla
>  wrote:
>
> Hello,
>
> if you do dmq replication between Kamailio systems running
> different major versions, you are likely to get memory leaks from
> the replicated data, and that most probably cannot be fixed. This
> is because the internal structures of modules (and the dmq
> commands) can change, so what one instance does is not guaranteed
> to happen on the other instance. Just as an example, off the top
> of my head, htable got some changes during past releases, and dmq
> also got significant enhancements by adding support for more
> transport protocols.
>
> If you get memory leaks when you run the same Kamailio major
> version on all Kamailio nodes, then that can be troubleshot and
> fixed.
>
> Happy new year,
> Daniel
>
> On 07.01.22 11:33, George Diamantopoulos wrote:
>> Hello all and happy new year,
>>
>> I have some new information to share regarding this issue. I
>> believe the previous metrics I sent to the list might not be
>> indicative of the way the problem manifests. Here's what I
>> believe so far:
>>   - Issue is exacerbated (or manifests) during moderate-to-high
>> cps, or grows linearly with total traffic processed since last
>> restart
>>   - shm stats show a lot of memory consumed by sip_msg_shm_clone
>>   - also reproduced this time on bullseye with kamailio 5.5.3
>>
>> Here's some more meaningful stats taken at more appropriate times
>> (i.e. after more traffic has been processed) than the previous
>> ones. These two kamailio instances have identical configuration
>> and traffic patterns:
>> - https://pastebin.com/gHa803kB for kamailio 5.5.3 showing high
>> sip_msg_shm_clone on debian bullseye
>> - https://pastebin.com/JbcZbbSQ for kamailio 5.4.6 on debian buster
>>
>> There is still DMQ use for these instances despite the version
>> mismatch. Unfortunately I can't migrate all DMQ nodes to 5.5.x at
>> this time, not unless I can have assurances that it is DMQ that
>> causes this issue with shm memory exhaustion...
>>
>> After shmem was exhausted on 5.5.3, it stopped processing
>> traffic. I issued a kamctl trap at that time but I'm assuming the
>> backtrace won't show much except for the inability to allocate
>> shm? If you think the backtrace at that point would be useful in
>> any way, let me know and I'll try to share it privately. In case
>> it isn't useful, what other debugging information can be gathered
>> to dissect this issue? Thanks!
>>
>> BR,
>> George
>>
>> On Wed, 30 Jun 2021 at 19:20, Daniel-Constantin Mierla
>>  wrote:
>>
>> Hello,
>>
>> for the sake of completeness: autoexpire should clean up the
>> items if they are not used during the expiration interval. If
>> you want them always deleted after the first expiration
>> interval, see the updateexpire attribute for the htable modparam.
>>
>> Also, note that replication should be done only between
>> Kamailio instances with the same major version, because there
>> can be internal differences between major versions that can lead
>> to unexpected behaviour. In other words, if you replicate, do it
>> between two Kamailio instances with version 5.5.x or between two
>> with version 5.4.x, but not between a Kamailio 5.5.x and a
>> Kamailio 5.4.x.
>>
>> The total amount of used memory in the stats file for 5.5
>> does not seem to be high as a rough estimation. The highest
>> by module is in htable, but it is around 20MB. Maybe you took
>> the stats too early, shortly after a restart?
>>
>> Cheers,
>> Daniel
>>
>> On 30.06.21 17:20, George Diamantopoulos wrote:
>>> Hello Daniel,
>>>
>>> Thanks for the feedback. I think I might have been too quick
>>> to blame 

Re: [SR-Users] Fwd: Possible memory leak on 5.5.x (new)?

2022-01-07 Thread George Diamantopoulos
Hello Daniel,

I see, thanks for the response. I guess I'll try upgrading all instances to
5.5.x for now, and hopefully that will fix it. If not, I'll revert to 5.4.x
and post here again. If the issue does manifest with all instances on 5.5.x
however, what information should I collect to investigate this further? I'm
guessing shm stats is a start, but is it enough?

Lastly, is sip_msg_shm_clone growing consistent with DMQ incompatibility
between 5.5.x <-> 5.4.x, or should such memory leaks manifest in some other
way in shm stats?

Cheers,
George

On Fri, 7 Jan 2022 at 13:20, Daniel-Constantin Mierla 
wrote:

> Hello,
>
> if you do dmq replication between Kamailio systems running different major
> versions, you are likely to get memory leaks from the replicated data, and
> that most probably cannot be fixed. This is because the internal structures
> of modules (and the dmq commands) can change, so what one instance does is
> not guaranteed to happen on the other instance. Just as an example, off the
> top of my head, htable got some changes during past releases, and dmq also
> got significant enhancements by adding support for more transport protocols.
>
> If you get memory leaks when you run the same Kamailio major version on all
> Kamailio nodes, then that can be troubleshot and fixed.
>
> Happy new year,
> Daniel
>
> On 07.01.22 11:33, George Diamantopoulos wrote:
>
> Hello all and happy new year,
>
> I have some new information to share regarding this issue. I believe the
> previous metrics I sent to the list might not be indicative of the way the
> problem manifests. Here's what I believe so far:
>   - Issue is exacerbated (or manifests) during moderate-to-high cps, or
> grows linearly with total traffic processed since last restart
>   - shm stats show a lot of memory consumed by sip_msg_shm_clone
>   - also reproduced this time on bullseye with kamailio 5.5.3
>
> Here's some more meaningful stats taken at more appropriate times (i.e.
> after more traffic has been processed) than the previous ones. These two
> kamailio instances have identical configuration and traffic patterns:
> - https://pastebin.com/gHa803kB for kamailio 5.5.3 showing high
> sip_msg_shm_clone on debian bullseye
> - https://pastebin.com/JbcZbbSQ for kamailio 5.4.6 on debian buster
>
> There is still DMQ use for these instances despite the version mismatch.
> Unfortunately I can't migrate all DMQ nodes to 5.5.x at this time, not
> unless I can have assurances that it is DMQ that causes this issue with shm
> memory exhaustion...
>
> After shmem was exhausted on 5.5.3, it stopped processing traffic. I
> issued a kamctl trap at that time but I'm assuming the backtrace won't show
> much except for the inability to allocate shm? If you think the backtrace
> at that point would be useful in any way, let me know and I'll try to share
> it privately. In case it isn't useful, what other debugging information can
> be gathered to dissect this issue? Thanks!
>
> BR,
> George
>
> On Wed, 30 Jun 2021 at 19:20, Daniel-Constantin Mierla 
> wrote:
>
>> Hello,
>>
>> for the sake of completeness: autoexpire should clean up the items if
>> they are not used during the expiration interval. If you want them always
>> deleted after the first expiration interval, see the updateexpire
>> attribute for the htable modparam.
>>
>> Also, note that replication should be done only between Kamailio
>> instances with the same major version, because there can be internal
>> differences between major versions that can lead to unexpected behaviour.
>> In other words, if you replicate, do it between two Kamailio instances
>> with version 5.5.x or between two with version 5.4.x, but not between a
>> Kamailio 5.5.x and a Kamailio 5.4.x.
>>
>> The total amount of used memory in the stats file for 5.5 does not seem
>> to be high as a rough estimation. The highest by module is in htable, but
>> it is around 20MB. Maybe you took the stats too early, shortly after a
>> restart?
>>
>> Cheers,
>> Daniel
>>
>> On 30.06.21 17:20, George Diamantopoulos wrote:
>>
>> Hello Daniel,
>>
>> Thanks for the feedback. I think I might have been too quick to blame
>> htable for this behaviour. In fact, version 5.4 seems to consume more
>> memory than 5.5 (175129776 bytes vs 20581096), which makes sense since it
>> has been running for longer (I missed the extra digit previously).
>>
>> So I'm not sure htable is to blame. On the other hand, I don't see any
>> other modules using up too much of shmem either, so maybe memory stats
>> can't provide the answer here?
>>
>> To answer your question, though, I do use DMQ and both tables that use it
>> have autoexpire set to the same value on both 5.4 and 5.5:
>>
>> /etc/kamailio# grep dmq kamailio-module-params.cfg
>> modparam("dmq", "server_address", "sip:172.30.43.1:5090")
>> modparam("dmq", "notification_address",
>> "sip:dmq.services.mydomain.com:5090")
>> modparam("dmq", "multi_notify", 1)
>> modparam("htable", "enable_dmq", 1)
>> modparam("htable", "htable",
>> 

Re: [SR-Users] Fwd: Possible memory leak on 5.5.x (new)?

2022-01-07 Thread Daniel-Constantin Mierla
Hello,

if you do dmq replication between Kamailio systems running different
major versions, you are likely to get memory leaks from the replicated
data, and that most probably cannot be fixed. This is because the
internal structures of modules (and the dmq commands) can change, so
what one instance does is not guaranteed to happen on the other
instance. Just as an example, off the top of my head, htable got some
changes during past releases, and dmq also got significant enhancements
by adding support for more transport protocols.

If you get memory leaks when you run the same Kamailio major version on
all Kamailio nodes, then that can be troubleshot and fixed.
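
A minimal sketch of a pre-check for the rule above, assuming the version
string of each node has already been collected (for example from
`kamcmd core.version`) and contains an `X.Y.Z` number such as 5.5.3. The
helper names `release_series` and `group_by_series` and the node names in
the usage note are hypothetical.

```python
import re

VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def release_series(version_text):
    """Extract 'X.Y' from text such as 'kamailio 5.5.3 (x86_64/linux)'."""
    match = VERSION.search(version_text)
    if match is None:
        raise ValueError(f"no X.Y.Z version found in {version_text!r}")
    return f"{match.group(1)}.{match.group(2)}"


def group_by_series(nodes):
    """Map each release series to the sorted node names running it.

    `nodes` is {node_name: version_text}. More than one key in the
    result means the DMQ peers mix release series (e.g. 5.4.x and 5.5.x).
    """
    groups = {}
    for name, text in sorted(nodes.items()):
        groups.setdefault(release_series(text), []).append(name)
    return groups
```

For example, `group_by_series({"node-a": "kamailio 5.5.3", "node-b":
"kamailio 5.4.6"})` returns two groups, which is the mixed setup that
should be avoided before spending time on leak troubleshooting.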

Happy new year,
Daniel

On 07.01.22 11:33, George Diamantopoulos wrote:
> Hello all and happy new year,
>
> I have some new information to share regarding this issue. I believe
> the previous metrics I sent to the list might not be indicative of the
> way the problem manifests. Here's what I believe so far:
>   - Issue is exacerbated (or manifests) during moderate-to-high cps,
> or grows linearly with total traffic processed since last restart
>   - shm stats show a lot of memory consumed by sip_msg_shm_clone
>   - also reproduced this time on bullseye with kamailio 5.5.3
>
> Here's some more meaningful stats taken at more appropriate times
> (i.e. after more traffic has been processed) than the previous ones.
> These two kamailio instances have identical configuration and traffic
> patterns:
> - https://pastebin.com/gHa803kB for kamailio 5.5.3 showing high
> sip_msg_shm_clone on debian bullseye
> - https://pastebin.com/JbcZbbSQ for kamailio 5.4.6 on debian buster
>
> There is still DMQ use for these instances despite the version
> mismatch. Unfortunately I can't migrate all DMQ nodes to 5.5.x at this
> time, not unless I can have assurances that it is DMQ that causes this
> issue with shm memory exhaustion...
>
> After shmem was exhausted on 5.5.3, it stopped processing traffic. I
> issued a kamctl trap at that time but I'm assuming the backtrace won't
> show much except for the inability to allocate shm? If you think the
> backtrace at that point would be useful in any way, let me know and
> I'll try to share it privately. In case it isn't useful, what other
> debugging information can be gathered to dissect this issue? Thanks!
>
> BR,
> George
>
> On Wed, 30 Jun 2021 at 19:20, Daniel-Constantin Mierla
>  wrote:
>
> Hello,
>
> for the sake of completeness: autoexpire should clean up the items
> if they are not used during the expiration interval. If you want
> them always deleted after the first expiration interval, see the
> updateexpire attribute for the htable modparam.
>
> Also, note that replication should be done only between Kamailio
> instances with the same major version, because there can be
> internal differences between major versions that can lead to
> unexpected behaviour. In other words, if you replicate, do it
> between two Kamailio instances with version 5.5.x or between two
> with version 5.4.x, but not between a Kamailio 5.5.x and a
> Kamailio 5.4.x.
>
> The total amount of used memory in the stats file for 5.5 does not
> seem to be high as a rough estimation. The highest by module is in
> htable, but it is around 20MB. Maybe you took the stats too early,
> shortly after a restart?
>
> Cheers,
> Daniel
>
> On 30.06.21 17:20, George Diamantopoulos wrote:
>> Hello Daniel,
>>
>> Thanks for the feedback. I think I might have been too quick to
>> blame htable for this behaviour. In fact, version 5.4 seems to
>> consume more memory than 5.5 (175129776 bytes vs 20581096), which
>> makes sense since it has been running for longer (I missed the
>> extra digit previously).
>>
>> So I'm not sure htable is to blame. On the other hand, I don't
>> see any other modules using up too much of shmem either, so maybe
>> memory stats can't provide the answer here?
>>
>> To answer your question, though, I do use DMQ and both tables
>> that use it have autoexpire set to the same value on both 5.4 and
>> 5.5:
>>
>> /etc/kamailio# grep dmq kamailio-module-params.cfg
>> modparam("dmq", "server_address", "sip:172.30.43.1:5090")
>> modparam("dmq", "notification_address",
>> "sip:dmq.services.mydomain.com:5090")
>> modparam("dmq", "multi_notify", 1)
>> modparam("htable", "enable_dmq", 1)
>> modparam("htable", "htable",
>> 'cid2hi=>size=8;autoexpire=600;dmqreplicate=1')
>> modparam("htable", "htable",
>> 'xcid2count=>size=8;autoexpire=600;dmqreplicate=1')
>>
>> On Wed, 30 Jun 2021 at 17:43, Daniel-Constantin Mierla
>>  wrote:
>>
>> Hello,
>>
>> do you replicate items in the htable via dmq? Does the htable
>> have autoexpire value set?
>>
>> Cheers,
>> Daniel
>>
>> On 30.06.21 13:54, George 

Re: [SR-Users] Fwd: Possible memory leak on 5.5.x (new)?

2022-01-07 Thread George Diamantopoulos
Hello all and happy new year,

I have some new information to share regarding this issue. I believe the
previous metrics I sent to the list might not be indicative of the way the
problem manifests. Here's what I believe so far:
  - Issue is exacerbated (or manifests) during moderate-to-high cps, or
grows linearly with total traffic processed since last restart
  - shm stats show a lot of memory consumed by sip_msg_shm_clone
  - also reproduced this time on bullseye with kamailio 5.5.3

Here's some more meaningful stats taken at more appropriate times (i.e.
after more traffic has been processed) than the previous ones. These two
kamailio instances have identical configuration and traffic patterns:
- https://pastebin.com/gHa803kB for kamailio 5.5.3 showing high
sip_msg_shm_clone on debian bullseye
- https://pastebin.com/JbcZbbSQ for kamailio 5.4.6 on debian buster

There is still DMQ use for these instances despite the version mismatch.
Unfortunately I can't migrate all DMQ nodes to 5.5.x at this time, not
unless I can have assurances that it is DMQ that causes this issue with shm
memory exhaustion...

After shmem was exhausted on 5.5.3, it stopped processing traffic. I issued
a kamctl trap at that time but I'm assuming the backtrace won't show much
except for the inability to allocate shm? If you think the backtrace at
that point would be useful in any way, let me know and I'll try to share it
privately. In case it isn't useful, what other debugging information can be
gathered to dissect this issue? Thanks!
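
A rough sketch of how the "grows linearly" observation above could be used
to estimate when the pool runs out, given two samples of used shared
memory (for instance the `used` value reported by `kamcmd core.shmmem`)
and the configured pool size. It assumes traffic stays roughly constant
between and after the samples, which is an assumption; the function name
`seconds_until_exhaustion` is hypothetical.

```python
def seconds_until_exhaustion(sample_a, sample_b, pool_bytes):
    """Linear extrapolation from two (uptime_seconds, used_bytes) samples.

    Returns the estimated seconds after the second sample until used
    memory reaches pool_bytes, or None when usage is flat or shrinking.
    """
    (t0, used0), (t1, used1) = sample_a, sample_b
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    rate = (used1 - used0) / (t1 - t0)  # bytes per second
    if rate <= 0:
        return None
    return max(0.0, (pool_bytes - used1) / rate)
```

A result of None across several sample pairs would suggest usage has
plateaued and is not a leak; a steadily shrinking estimate under similar
load points the other way.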

BR,
George

On Wed, 30 Jun 2021 at 19:20, Daniel-Constantin Mierla 
wrote:

> Hello,
>
> for the sake of completeness: autoexpire should clean up the items if they
> are not used during the expiration interval. If you want them always
> deleted after the first expiration interval, see the updateexpire
> attribute for the htable modparam.
>
> Also, note that replication should be done only between Kamailio instances
> with the same major version, because there can be internal differences
> between major versions that can lead to unexpected behaviour. In other
> words, if you replicate, do it between two Kamailio instances with version
> 5.5.x or between two with version 5.4.x, but not between a Kamailio 5.5.x
> and a Kamailio 5.4.x.
>
> The total amount of used memory in the stats file for 5.5 does not seem to
> be high as a rough estimation. The highest by module is in htable, but it
> is around 20MB. Maybe you took the stats too early, shortly after a restart?
>
> Cheers,
> Daniel
>
> On 30.06.21 17:20, George Diamantopoulos wrote:
>
> Hello Daniel,
>
> Thanks for the feedback. I think I might have been too quick to blame
> htable for this behaviour. In fact, version 5.4 seems to consume more
> memory than 5.5 (175129776 bytes vs 20581096), which makes sense since it
> has been running for longer (I missed the extra digit previously).
>
> So I'm not sure htable is to blame. On the other hand, I don't see any
> other modules using up too much of shmem either, so maybe memory stats
> can't provide the answer here?
>
> To answer your question, though, I do use DMQ and both tables that use it
> have autoexpire set to the same value on both 5.4 and 5.5:
>
> /etc/kamailio# grep dmq kamailio-module-params.cfg
> modparam("dmq", "server_address", "sip:172.30.43.1:5090")
> modparam("dmq", "notification_address",
> "sip:dmq.services.mydomain.com:5090")
> modparam("dmq", "multi_notify", 1)
> modparam("htable", "enable_dmq", 1)
> modparam("htable", "htable",
> 'cid2hi=>size=8;autoexpire=600;dmqreplicate=1')
> modparam("htable", "htable",
> 'xcid2count=>size=8;autoexpire=600;dmqreplicate=1')
>
> On Wed, 30 Jun 2021 at 17:43, Daniel-Constantin Mierla 
> wrote:
>
>> Hello,
>>
>> do you replicate items in the htable via dmq? Does the htable have
>> autoexpire value set?
>>
>> Cheers,
>> Daniel
>>
>> On 30.06.21 13:54, George Diamantopoulos wrote:
>>
>> Forwarding my reply to the list, using gmail's reply button set Henning
>> as the sole recipient :-\
>>
>> ---------- Forwarded message ---------
>> From: George Diamantopoulos 
>> Date: Sat, 26 Jun 2021 at 02:25
>> Subject: Re: [SR-Users] Possible memory leak on 5.5.x (new)?
>> To: Henning Westerholt 
>>
>>
>> Hello Henning,
>>
>> Thanks for your reply. Here's what has come up after a few hours:
>>
>> shm55: https://pastebin.com/h9JCePmc
>> shm54: https://pastebin.com/Nx5xEEnA
>>
>> It seems to me htable is the culprit? Are you seeing anything different?
>> 54 has been running for 77020 seconds, 55 for 28521 (significantly less).
>>
>> I'm going to turn it off until we figure something out...
>>
>> BR,
>> George
>>
>> On Fri, 25 Jun 2021 at 18:17, Henning Westerholt  wrote:
>>
>>> Hello,
>>>
>>> Good observation. Please run the memory statistics CLI commands to get
>>> more hints about the module that might cause it (as per below link). Then
>>> please report more details. If you can point to a particular module, you
>>> can also open an issue on our tracker.
>>>
>>>
>>>