Re: [asterisk-users] PJSIP Lockup

2020-04-06 Thread George Joseph
On Thu, Apr 2, 2020 at 11:34 AM Nick Olsen 
wrote:

> Paddy, it's pretty easy to spot from the CLI.
>
> A voicemail gets called, and the screen basically stops scrolling from
> there. Eventually you'll get the "Task processors exceeded 500 queued
> tasks" warning or something like that, and maybe channels attempting to
> hang up due to lack of RTP (if you have no-RTP timers configured).
>
> Once you find the problem mailbox, you can call it via any method and
> it'll deadlock every time, as soon as it tries to play the mailbox's
> unavail greeting. I've never had it occur when there is no unavail
> greeting. In each case, deleting the problem recording from the database
> fixes the issue, and subsequent recordings for the same mailbox have no
> issue.
>

Given that the issue appears to be related to specific rows and not the
database in general, you might want to get a backtrace while the system is
locked, as Josh suggested earlier. Once you have the backtraces, open an
issue at https://issues.asterisk.org.

https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace
NOTE: You do NOT need to recompile with DEBUG_THREADS, MALLOC_DEBUG,
DONT_OPTIMIZE or BETTER_BACKTRACES, but the Asterisk binaries still need to
have their symbols in them (un-stripped).
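
As a rough sketch of what getting a backtrace from the running (locked) process can look like, the Python snippet below builds a gdb command line that dumps a backtrace for every thread, and checks the output of `file` for the "not stripped" marker. The binary path /usr/sbin/asterisk, the output file name and the helper names are assumptions made for illustration; the wiki page above is the reference if anything differs.

```python
import subprocess


def gdb_backtrace_argv(pid, binary="/usr/sbin/asterisk"):
    """Build a gdb command that prints a backtrace of every thread, then exits."""
    # The binary path is an assumption; adjust it to where Asterisk is installed.
    return ["gdb", "-ex", "thread apply all bt", "--batch", binary, str(pid)]


def has_symbols(file_output):
    """True if the output of `file <binary>` reports the binary as not stripped."""
    return "not stripped" in file_output


def capture_backtrace(pid, out_path="/tmp/backtrace-threads.txt"):
    """Run gdb against the locked-up process and save what it prints.

    Needs gdb installed and enough privileges to attach to the process.
    The output path is a hypothetical choice.
    """
    result = subprocess.run(gdb_backtrace_argv(pid), capture_output=True, text=True)
    with open(out_path, "w") as fh:
        fh.write(result.stdout)
    return out_path
```

Run `capture_backtrace()` with the Asterisk PID while the system is still locked up, before restarting the service, and attach the resulting file to the issue.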




Re: [asterisk-users] PJSIP Lockup

2020-04-02 Thread Nick Olsen
Paddy, it's pretty easy to spot from the CLI.

A voicemail gets called, and the screen basically stops scrolling from
there. Eventually you'll get the "Task processors exceeded 500 queued
tasks" warning or something like that, and maybe channels attempting to
hang up due to lack of RTP (if you have no-RTP timers configured).

Once you find the problem mailbox, you can call it via any method and it'll
deadlock every time, as soon as it tries to play the mailbox's unavail
greeting. I've never had it occur when there is no unavail greeting. In each
case, deleting the problem recording from the database fixes the issue, and
subsequent recordings for the same mailbox have no issue.

*Nick Olsen*
Network Engineer
Office: 321-408-5000 x103
Mobile: 321-794-0763




Re: [asterisk-users] PJSIP Lockup

2020-04-01 Thread Paddy Grice
Hi All

This sounds just like a problem I have had, and am still investigating, since
moving to 16.9 using chan_sip. I am still trying to reproduce the problem. It
looks from debug as though the issue is either voicemail or call transfer, but
I can't consistently reproduce it.

Voicemail is using ODBC, and I just imported the data from the old system
into the new database.

Nick - if you have any more info I would be grateful.

TIA

Paddy
 

-- 
_
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

Check out the new Asterisk community forum at: https://community.asterisk.org/

New to Asterisk? Start here:
  https://wiki.asterisk.org/wiki/display/AST/Getting+Started

asterisk-users mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-users

Re: [asterisk-users] PJSIP Lockup

2020-04-01 Thread Nick Olsen
We ultimately found this to be a voicemail issue. The voicemail is held in
MySQL as well (via ODBC), and we found that the deadlock would occur when
attempting to play back a customer's voicemail unavail greeting (immediately,
every time, throwing the same "task processors" errors and making PJSIP
completely unresponsive). We had imported a number of greetings from a
legacy Asterisk system and the vast majority of them worked. When we deleted
the row containing the customer's unavail greeting (making Asterisk revert to
reading out the mailbox number), all issues went away. If we re-record the
customer's unavail greeting, it works fine and the problem doesn't recur.
This was one out of ~250 voicemails imported.

Since then we've done a few more migrations and they've all gone smoothly,
with the exception of the most recent one: ~50% of the imported greetings
have caused Asterisk to deadlock. We now check them at the time of
migration.

What I can't figure out is what it doesn't like about the greeting. It was
working fine on a previous Asterisk system. The row looks identical to a
working one. The only thing I can guess is that something about the blob for
the recording goes wrong. It would be nice if Asterisk handled that more
gracefully.
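
For anyone who wants to check greetings at migration time a bit more systematically, here is a minimal Python sketch of a blob sanity check. It assumes the greeting is stored as a RIFF/WAVE file, which may not hold for every setup, and it only checks that the blob is non-empty and that the size declared in the RIFF header matches the blob length. Nothing in this thread confirms that such a mismatch is what triggers the deadlock; the function name is hypothetical.

```python
import struct


def check_wav_blob(blob):
    """Return a list of problems found in a greeting blob (empty list = looks sane)."""
    problems = []
    if not blob:
        return ["blob is empty"]
    if len(blob) < 12 or blob[0:4] != b"RIFF" or blob[8:12] != b"WAVE":
        return ["no RIFF/WAVE header (different format, or damaged on import)"]
    # Bytes 4..8 hold the little-endian size of everything after the first 8 bytes.
    (declared,) = struct.unpack("<I", blob[4:8])
    if declared != len(blob) - 8:
        problems.append(
            "RIFF size says %d bytes, blob has %d" % (declared, len(blob) - 8)
        )
    return problems
```

Feed it the bytes read from whichever column holds the recording in your voicemail table, and compare what it reports for a greeting that deadlocks against one that works.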

I'm posting this mostly for internet history, to hopefully help the next
guy out who has this same issue.

*Nick Olsen*
Network Engineer
Office: 321-408-5000 x103
Mobile: 321-794-0763




Re: [asterisk-users] PJSIP Lockup

2020-03-02 Thread Joshua C. Colp
On Mon, Mar 2, 2020 at 4:24 PM Nick Olsen 
wrote:

> Thanks for the info, Joshua.
>
> Does PJSIP handle database access the same way chan_sip did? We had a
> number of boxes running chan_sip referencing the same MySQL server without
> issue.
>
> We're going to attempt to get a backtrace on the next occurrence. We're
> also going to run a local copy of the database on the same physical
> Asterisk instance and have the system reference it, just to "throw
> everything at the wall".
>

It uses the same underlying API and layer. It can do more frequent database
access though due to queries and because PJSIP is multithreaded.

-- 
Joshua C. Colp
Asterisk Technical Lead
Sangoma Technologies
Check us out at www.sangoma.com and www.asterisk.org

Re: [asterisk-users] PJSIP Lockup

2020-03-02 Thread Nick Olsen
Thanks for the info, Joshua.

Does PJSIP handle database access the same way chan_sip did? We had a
number of boxes running chan_sip referencing the same MySQL server without
issue.

We're going to attempt to get a backtrace on the next occurrence. We're also
going to run a local copy of the database on the same physical Asterisk
instance and have the system reference it, just to "throw everything at the
wall".

*Nick Olsen*
Network Engineer
Office: 321-408-5000 x103
Mobile: 321-794-0763




Re: [asterisk-users] PJSIP Lockup

2020-03-02 Thread Joshua C. Colp
On Mon, Mar 2, 2020 at 2:52 PM Nick Olsen 
wrote:

> Hello All,
> I'm using Asterisk 16.8.0 on a CentOS 7 box. It was previously on 16.5.0,
> but I recently upgraded to attempt to resolve this issue. It uses bundled
> PJSIP. The PBX is using MySQL realtime for most functions. The MySQL server
> is on the same LAN as the Asterisk box.
>
> As more users have been moved to this box, it's become unstable. Randomly,
> I'll start seeing "WARNING[12667] taskprocessor.c: The
> 'pjsip/distributor-0173' task processor queue reached 500 scheduled
> tasks."
>
> At that time, running "pjsip show contacts" and "pjsip show endpoints"
> returns nothing, and the box stops responding to all SIP.
>
> The only way I've found thus far to resolve the issue is a "service
> asterisk restart".
>
> I can confirm that, at the time of the issue, running "asterisk -x 'core
> show taskprocessors' | grep 'distributor'" does show many items pending
> across all queues, and the number just increases. Normally, when all is
> fine, they're all at 0.
>
> Google-fu hasn't produced anything for me apart from issues from 13.x that
> claim to be resolved. Since Asterisk isn't fully crashing, I don't think I
> can get a backtrace. Someone please correct me if I'm wrong.
> Any ideas? Tips?
>

The wiki[1] has instructions for getting a backtrace for a deadlock from a
running process. It can be used to isolate why things are blocked.
Generally, though, when realtime is involved I've found that it usually
ends up being the database or that interaction in some way. Any hiccup or
issue there can result in blocking in Asterisk.

[1]
https://wiki.asterisk.org/wiki/display/AST/Getting+a+Backtrace#GettingaBacktrace-GettingInformationForADeadlock

-- 
Joshua C. Colp
Asterisk Technical Lead
Sangoma Technologies
Check us out at www.sangoma.com and www.asterisk.org

[asterisk-users] PJSIP Lockup

2020-03-02 Thread Nick Olsen
Hello All,
I'm using Asterisk 16.8.0 on a CentOS 7 box. It was previously on 16.5.0, but
I recently upgraded to attempt to resolve this issue. It uses bundled PJSIP.
The PBX is using MySQL realtime for most functions. The MySQL server is on
the same LAN as the Asterisk box.

As more users have been moved to this box, it's become unstable. Randomly,
I'll start seeing "WARNING[12667] taskprocessor.c: The
'pjsip/distributor-0173' task processor queue reached 500 scheduled
tasks."

At that time, running "pjsip show contacts" and "pjsip show endpoints"
returns nothing, and the box stops responding to all SIP.

The only way I've found thus far to resolve the issue is a "service
asterisk restart".

I can confirm that, at the time of the issue, running "asterisk -x 'core show
taskprocessors' | grep 'distributor'" does show many items pending across
all queues, and the number just increases. Normally, when all is fine,
they're all at 0.
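
For anyone who wants to watch for this condition automatically, here is a minimal Python sketch of the same check. It assumes the "core show taskprocessors" output is whitespace-separated with the processor name in the first column and the in-queue count in the third; check that against the header your version prints. The sample text and the function name are made up for illustration and are not output from the affected box.

```python
def backed_up_queues(cli_output, name_filter="distributor", in_queue_col=2):
    """Return {processor_name: in_queue} for matching processors with a non-empty queue."""
    backed_up = {}
    for line in cli_output.splitlines():
        fields = line.split()
        # Skip short lines and processors that do not match the filter.
        if len(fields) <= in_queue_col or name_filter not in fields[0]:
            continue
        # Skip anything where the assumed in-queue column is not a number.
        if not fields[in_queue_col].isdigit():
            continue
        depth = int(fields[in_queue_col])
        if depth > 0:
            backed_up[fields[0]] = depth
    return backed_up


# Hypothetical sample laid out like the assumed column format.
SAMPLE = """\
Processor                    Processed   In Queue  Max Depth  Low water High water
pjsip/distributor-0000000a        1200          0         12        450        500
pjsip/distributor-0000000b        1187        512        512        450        500
app_voicemail                        0          0          0        450        500
"""

if __name__ == "__main__":
    print(backed_up_queues(SAMPLE))
```

In practice the text would come from running the same "asterisk -x 'core show taskprocessors'" command and passing its output to the function; an empty result means every matching queue is at 0.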

Google-fu hasn't produced anything for me apart from issues from 13.x that
claim to be resolved. Since Asterisk isn't fully crashing, I don't think I
can get a backtrace. Someone please correct me if I'm wrong.
Any ideas? Tips?

*Nick Olsen*
Network Engineer
Office: 321-408-5000 x103
Mobile: 321-794-0763