Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
Hi @charlesrchance,

thanks for taking the time to look into this. What you say totally matches the "weird" behaviors I was seeing. You really gave me a lot of clarity on what is going on behind the scenes.

To answer your questions:

1. Yes, I have a database, but my idea was to move away from it for everything I handle using DMQ. This also matches the chronology of events: I have been restarting nodes here and there, and I wasn't "losing" dialogs but unknowingly making them "orphaned" instead.

2. There are (at least) a couple of different options here:

2a) The stat counters would include all dialogs (local + replicated), which in reality is the true value, since from Kamailio's perspective those dialogs are there and they are active, aren't they?

2b) The stat counters would include only local dialogs, so to get the full dialog count you would have to add up the values from all nodes.

There is also `dlg.stats_active`, which gives you the current dialogs (not sure if it also shows replicated ones), so maybe we can benefit from both options: a counter that includes both local and replicated dialogs, and a different counter (dlg.stats_active?) that has only the current local count of dialogs.

Since I fully restarted both SBCs (which in fact cleared the orphaned dialogs), I haven't seen a spike (negative counter).

At this point I don't know what the best option is, though I do understand the problem perfectly.

@miconda what do you think about all this?

--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-409303111
___
Kamailio (SER) - Development Mailing List
sr-dev@lists.kamailio.org
https://lists.kamailio.org/cgi-bin/mailman/listinfo/sr-dev
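To make the difference between options 2a and 2b concrete, here is a minimal sketch in C. It does not use the dialog module's real data structures: `dlg_entry_t`, `count_active_all`, `count_active_local` and `sum_local_counts` are hypothetical names invented for illustration, and the `replicated` flag is an assumption about how a node might tell local dialogs from those learned over DMQ.

```c
#include <stddef.h>

/* Hypothetical, simplified dialog record (not the dialog module's struct). */
typedef struct dlg_entry {
	int active;     /* 1 if the dialog is currently active */
	int replicated; /* 1 if learned from a peer, 0 if handled locally */
} dlg_entry_t;

/* Option 2a: count every active dialog this node knows about. */
size_t count_active_all(const dlg_entry_t *tbl, size_t n)
{
	size_t count = 0;
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].active)
			count++;
	}
	return count;
}

/* Option 2b: count only the active dialogs handled by this node. */
size_t count_active_local(const dlg_entry_t *tbl, size_t n)
{
	size_t count = 0;
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].active && !tbl[i].replicated)
			count++;
	}
	return count;
}

/* With option 2b, the cluster-wide total is the sum of per-node values. */
size_t sum_local_counts(const size_t *per_node, size_t nodes)
{
	size_t total = 0;
	for (size_t i = 0; i < nodes; i++)
		total += per_node[i];
	return total;
}
```

Under these assumptions, a node holding one local and two replicated active dialogs would report 3 under option 2a and 1 under option 2b. With 2b a monitoring system has to query every node and sum the results, whereas with 2a any single node gives the cluster view, provided replication is complete.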
Re: [sr-dev] [kamailio/kamailio] RPC command stats.get_statistics randomly reporting 9223372036854776000 for current active/early dialogs (#1591)
Having run some tests around this, whilst I have not yet been able to reproduce the negative counter issue, I do think there needs to be some further thought around dialog replication.

First thing to note - stats are not affected by replicated dialogs, so I don't think DMQ is _directly_ responsible for the negative counting.

However - and this may be _indirectly_ related - if a node is restarted, any dialogs owned by it at the time will be forever 'stuck'. This is because, in the current implementation, the dialog owner is responsible for triggering update/removal across the rest of the cluster. If the owner no longer exists - or has been restarted and has no idea that it was once the owner of some/all of the dialogs it receives in its initial re-sync - then this link is broken permanently. The problem is compounded by the fact that these orphaned dialogs never (in my tests, anyway) time out.

I need to spend some more time on the DMQ side, since this is the first time I have looked at it properly.

In the meantime, @joelsdc, regarding your issue:

1. Do you have database enabled alongside DMQ replication or was it only for testing? I suspect this is where the recent 38 'expired' dialogs came from - conversely, the earlier 'bad' dialogs you mentioned were likely a result of the owning node having been restarted (these would not have been included in the 'expired' counter).

2. Are you expecting the stats counters (the original ones, not the new 'dlg.stats_active') to reflect all dialogs across the cluster or just those handled directly by the local instance?

--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/kamailio/kamailio/issues/1591#issuecomment-409205552
[sr-dev] git:master:a1627221: modules: readme files regenerated - carrierroute ... [skip ci]
Module: kamailio
Branch: master
Commit: a1627221ba6dc2921356c691ca51e3dea3f9e82f
URL: https://github.com/kamailio/kamailio/commit/a1627221ba6dc2921356c691ca51e3dea3f9e82f

Author: Kamailio Dev
Committer: Kamailio Dev
Date: 2018-07-31T14:17:02+02:00

modules: readme files regenerated - carrierroute ... [skip ci]

---

Modified: src/modules/carrierroute/README

---

Diff: https://github.com/kamailio/kamailio/commit/a1627221ba6dc2921356c691ca51e3dea3f9e82f.diff
Patch: https://github.com/kamailio/kamailio/commit/a1627221ba6dc2921356c691ca51e3dea3f9e82f.patch
[sr-dev] git:master:a1f5fbe2: dmq: release resources instead of just doing continue to next job
Module: kamailio
Branch: master
Commit: a1f5fbe2c18246d4afefa44fd8a52612a5182a46
URL: https://github.com/kamailio/kamailio/commit/a1f5fbe2c18246d4afefa44fd8a52612a5182a46

Author: Daniel-Constantin Mierla
Committer: Daniel-Constantin Mierla
Date: 2018-07-31T13:59:44+02:00

dmq: release resources instead of just doing continue to next job

- for cases when processing of the job is not fully completed

---

Modified: src/modules/dmq/worker.c

---

Diff: https://github.com/kamailio/kamailio/commit/a1f5fbe2c18246d4afefa44fd8a52612a5182a46.diff
Patch: https://github.com/kamailio/kamailio/commit/a1f5fbe2c18246d4afefa44fd8a52612a5182a46.patch

---

diff --git a/src/modules/dmq/worker.c b/src/modules/dmq/worker.c
index 9c7873560c..14aadfc998 100644
--- a/src/modules/dmq/worker.c
+++ b/src/modules/dmq/worker.c
@@ -114,7 +114,7 @@ void worker_loop(int id)
 						current_job->msg, &peer_response, dmq_node);
 				if(ret_value < 0) {
 					LM_ERR("running job failed\n");
-					continue;
+					goto nextjob;
 				}
 				/* add the body to the reply */
 				if(peer_response.body.s) {
@@ -122,7 +122,7 @@ void worker_loop(int id)
 							   &peer_response.content_type)
 							< 0) {
 						LM_ERR("error adding lumps\n");
-						continue;
+						goto nextjob;
 					}
 				}
 				/* send the reply */
@@ -130,8 +130,12 @@ void worker_loop(int id)
 						   &peer_response.reason)
 						< 0) {
 					LM_ERR("error sending reply\n");
+				} else {
+					LM_DBG("done sending reply\n");
 				}
+				worker->jobs_processed++;
 
+nextjob:
 				/* if body given, free the lumps and free the body */
 				if(peer_response.body.s) {
 					del_nonshm_lump_rpl(&current_job->msg->reply_lump);
@@ -141,10 +145,8 @@ void worker_loop(int id)
 					free_to(current_job->msg->from->parsed);
 				}
 
-				LM_DBG("sent reply\n");
 				shm_free(current_job->msg);
 				shm_free(current_job);
-				worker->jobs_processed++;
 			}
 		}
 	}
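The commit above swaps an early `continue` for a jump to a shared cleanup label, so that a job which fails part-way still has its resources released. The following is a minimal, standalone sketch of that pattern, not the dmq worker itself: `job_t`, `job_new`, `job_free`, `process_jobs`, the `fail_at` field and the `live_allocs` counter are all hypothetical names made up for illustration, and plain `malloc`/`free` stand in for Kamailio's shared-memory allocator.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical job record; not the dmq module's real structure. */
typedef struct job {
	char *payload; /* heap-allocated, owned by the job */
	int fail_at;   /* 0 = succeed, 1 = fail while running, 2 = fail building reply */
} job_t;

/* Number of outstanding allocations, used to demonstrate that nothing leaks. */
int live_allocs = 0;

static job_t *job_new(int fail_at)
{
	job_t *j = malloc(sizeof(*j));
	if (j == NULL)
		return NULL;
	j->payload = malloc(16);
	if (j->payload == NULL) {
		free(j);
		return NULL;
	}
	strcpy(j->payload, "request");
	j->fail_at = fail_at;
	live_allocs += 2;
	return j;
}

static void job_free(job_t *j)
{
	free(j->payload);
	free(j);
	live_allocs -= 2;
}

/* Runs n jobs described by plan[]; returns how many completed fully. */
int process_jobs(const int *plan, int n)
{
	int processed = 0;

	for (int i = 0; i < n; i++) {
		job_t *job = job_new(plan[i]);
		if (job == NULL)
			continue; /* safe here: nothing was allocated for this iteration */

		if (job->fail_at == 1) {
			fprintf(stderr, "running job failed\n");
			goto nextjob; /* a bare 'continue' here would leak the job */
		}
		if (job->fail_at == 2) {
			fprintf(stderr, "error building reply\n");
			goto nextjob;
		}
		processed++; /* counted only when the job ran to completion */

nextjob:
		/* single cleanup path shared by the success and failure cases */
		job_free(job);
	}

	assert(live_allocs == 0);
	return processed;
}
```

Calling `process_jobs` with a plan such as `{0, 1, 2, 0}` completes two jobs and leaves `live_allocs` at zero. If the two `goto nextjob` statements were replaced with `continue`, the failed jobs would skip `job_free` and the final assertion would trip, which is the kind of leak the commit message describes.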
[sr-dev] git:master:4a64fb95: carrierroute: docs - removed mi commands section
Module: kamailio
Branch: master
Commit: 4a64fb95680b9efac47c79888f2a68bdcfb29ad6
URL: https://github.com/kamailio/kamailio/commit/4a64fb95680b9efac47c79888f2a68bdcfb29ad6

Author: Daniel-Constantin Mierla
Committer: Daniel-Constantin Mierla
Date: 2018-07-31T13:58:29+02:00

carrierroute: docs - removed mi commands section

---

Modified: src/modules/carrierroute/doc/carrierroute_admin.xml

---

Diff: https://github.com/kamailio/kamailio/commit/4a64fb95680b9efac47c79888f2a68bdcfb29ad6.diff
Patch: https://github.com/kamailio/kamailio/commit/4a64fb95680b9efac47c79888f2a68bdcfb29ad6.patch