Re: [asterisk-dev] Asterisk Load Performance

2017-01-10 Thread bala murugan
How do I disable the channel_varset message from Stasis in Asterisk 12? Can
you please provide the steps or configuration?

thanks,
bala
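
For the archives, this is normally done with stasis.conf's declined message
types (the mechanism Matthew describes further down in this thread); a minimal
sketch, with the internal type name being an assumption on my part:

    ; stasis.conf
    [declined_message_types]
    ; assumed type name for the channel variable set messages
    decline = ast_channel_varset_type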

On Fri, Jun 17, 2016 at 5:31 PM, Matthew Jordan  wrote:

> On Fri, Jun 17, 2016 at 1:37 PM, Richard Mudgett 
> wrote:
> >
> >
> > On Fri, Jun 17, 2016 at 12:36 PM, Michael Petruzzello
> >  wrote:
> >>
> >> Hello,
> >>
> >> I am currently working on determining bottlenecks in Asterisk and a
> Stasis
> >> App. I'm currently trying to handle 83.3 calls/second. For the most
> part,
> >> Asterisk and the Stasis APP handle that well, but there is a 60+ second
> >> delay in response time.
> >>
> >> On the Asterisk side, I am seeing the following warnings. [Jun 17
> >> 12:00:16] WARNING[23561]: taskprocessor.c:803 taskprocessor_push: The
> >> 'subm:cdr_engine-0003' task processor queue reached 500 scheduled
> tasks.
> >> [Jun 17 12:00:18] WARNING[25477][C-0068]: taskprocessor.c:803
> >> taskprocessor_push: The 'subm:devService-test-0038' task processor
> queue
> >> reached 500 scheduled tasks.
> >> [Jun 17 12:00:21] WARNING[26298][C-00a3]: taskprocessor.c:803
> >> taskprocessor_push: The 'subp:PJSIP/sippeer-0022' task processor
> queue
> >> reached 500 scheduled tasks.
> >> [Jun 17 12:00:23] WARNING[27339][C-010d]: taskprocessor.c:803
> >> taskprocessor_push: The 'subm:ast_channel_topic_all-cached-0032'
> task
> >> processor queue reached 500 scheduled tasks.
> >> [Jun 17 12:01:32] WARNING[31697][C-03b2]: taskprocessor.c:803
> >> taskprocessor_push: The 'subm:ast_channel_topic_all-0036' task
> processor
> >> queue reached 500 scheduled tasks.
> >> [Jun 17 12:05:55] WARNING[23280]: taskprocessor.c:803
> taskprocessor_push:
> >> The 'SIP' task processor queue reached 500 scheduled tasks.
> >>
> >> I have not seen a configuration setting on Asterisk to prevent these
> >> warnings from occurring (I'm trying to avoid modifying Asterisk source
> code
> >> if possible). Looking at the task processors, I see the queue to the
> stasis
> >> app bottlenecks:
> >> subm:devService-test-00384560990  0
> >> 1041689. It does clear up relatively quickly. The CDR engine also bottle
> >> necks (extremely badly), but I don't use that. Nothing else comes close
> to
> >> having a large queue.
> >>
> >> The stasis app itself is extremely streamlined and is very capable of
> >> handling a large number of messages at a time. The app runs with the
> JVM so
> >> I am also researching into that as well as the netty library I am using
> for
> >> the websocket connections.
> >>
> >> Any insight into Asterisk's side of the equation and how it scales on 40
> >> vCPUs would be greatly appreciated.
> >
> >
> > There are no options to disable those taskprocessor queue size warnings.
> > They are a
> > symptom of the system being severely stressed.  If the stress continues
> it
> > is possible
> > that the system could consume all memory in those taskprocessor queues.
> >
> > Recent changes to the Asterisk v13 branch were made to help throttle back
> > incoming
> > SIP requests on PJSIP when the taskprocessors become backlogged like you
> are
> > seeing.
> > These changes will be in the forthcoming v13.10.0 release.  If you want,
> you
> > can test with
> > the current v13 branch to see how these changes affect your stress
> testing.
> >
> > If you don't need CDR's then you really need to disable them as they
> consume
> > a lot of
> > processing time and the CDR taskprocessor queue backlog can take minutes
> to
> > clear.
> >
>
> To echo what Richard said, because Asterisk is now sharing state
> across the Stasis message bus, turning off subscribers to that bus
> will help performance. Some easy ones to disable, if you aren't using
> them, are CDRs, CEL, and AMI. Those all do a reasonable amount of
> processing, and you can get some noticeable improvement by disabling
> them.
>
> Once you get past that, you can start fiddling with some of the lower
> level options. To start, you can throttle things back further by
> disabling certain internal messages in stasis.conf. As stasis.conf
> notes, functionality within Asterisk can break (or just not happen) if
> some messages are removed. For example, disabling
> 'ast_channel_snapshot_type' would break ... most things. You may
> however be able to streamline your application by looking at what ARI
> messages it cares about, what messages it doesn't, inspecting the
> code, and disabling those that you don't care about. Lots of testing
> should occur before doing this, of course.
>
> You may also be able to get some different performance characteristics
> by changing the threadpool options for the message bus in stasis.conf.
> This may make a difference, depending on the underlying machine.
>
> --
> Matthew Jordan
> Digium, Inc. | CTO
> 445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
> Check us out at: http://digium.com & http://asterisk.org

Re: [asterisk-dev] Asterisk Load Performance

2016-07-12 Thread Michael Petruzzello
On Wed, Jul 6, 2016 at 2:41 PM, Matthew Jordan wrote:
>  While that's definitely a more sustainable approach, it has been awfully
> entertaining/interesting to see how far you were able to take it. I think
> everyone was pretty impressed when you hit 5000 channels in a single
> bridge. Thanks for giving it a shot!
>
> As an aside, what were you using to simulate the callers? SIPp + a pcap
> file, or something else?
>
> Matt

Before doing this level of load testing, I was using StarTrinity's SIP
Tester. Unfortunately, its licensing scheme is a ripoff. The number of
concurrent calls is capped by the level of the license.

To handle the current testing, I made an internally developed Stasis
application for Asterisk. Right now it is pretty bare bones, but I plan
on making it open source.


*Michael J. Petruzzello*
Software Engineer
P.O. Box 4689
Greenwich, CT 06831
203-618-1811 ext.289 (office)
www.civi.com

Re: [asterisk-dev] Asterisk Load Performance

2016-07-06 Thread Matthew Jordan
On Wed, Jul 6, 2016 at 2:20 PM, Michael Petruzzello <
michael.petruzze...@civi.com> wrote:

> On Tue, Jul 5, 2016 at 4:03 PM, Jonathan Rose wrote:
>
> > If you don't need all of your participants actually to be speaking at a
> > time (and I hope not with that kind of volume), you could use holding
> > bridges for the vast majority of the participants. Link the bridges using a
> > local channel with the Hold bridge side being set to use the 'announcer'
> > bridge role and the hold bridge will effectively just be voiceless
> > conference participants. If you want, you can listen for DTMF events to
> > move the participants back and forth between the different bridges.
>
> Doing the conference this way results in the same kind of long voice queue
> warnings/errors as before and eventually the DNS lookup for the server
> fails. All 5,000 callers were able to get in though, which is a bit better
> than before.
>
> On Tue, Jul 5, 2016 at 5:09 PM, Richard Mudgett wrote:
>
> > The exceptionally long voice queue length messages can be a symptom of
> > thread
> > starvation as the Local channels frame queue has developed an excessive
> > backlog.
> >
> > The forthcoming v13.10.0 release should indirectly take care of the
> EEXISTS
> > messages
> > as part of the https://issues.asterisk.org/jira/browse/ASTERISK-26088
> fix.
> > Working on
> > that issue I saw the EEXISTS messages for REGISTER and SUBSCRIBE message
> > processing.  The issue was a result of the original message and
> > retransmissions getting
> > backlogged in the serializer/taskprocessor and responses sent using
> another
> > serializer.
> >
> > Looks like your system's DNS resolver has gotten overwhelmed.
>
> Is there any configuration changes I can make to help alleviate the thread
> starvation on the Local channels frame queue?
>
> It does not make sense to me that the system's DNS resolver is getting
> overwhelmed. When I have 10,000 calls in one bridge, this does not occur.
> When I have multiple bridges with locally originated channels bridging them
> then the DNS errors occur.
>
> > Wow.  Thanks Jonathan.  I hadn't thought of doing it that way.  That
> should
> > really drop the mixing load.
> > Probably should allow only ulaw or alaw (pick one) for all participants
> to
> > minimize translation costs.
> > One additional thing I should add is that those linking Local channel
> > bridges should just allow the
> > chosen alaw/ulaw to reduce translation to each participant in the holding
> > bridge.  The forthcoming
> > v13.10.0 adds the ability to specify formats when ARI originates a
> channel
> > (Local in this case) and
> > an originator channel is not available.  (See CHANGES file)
>
> I have only been allowing ulaw. That is very interesting to note about the
> Local channels. I'll keep that in mind.
>
> Well, thank you Jonathan, Richard, and Matthew. You have all been really
> helpful. This has been really interesting trying to get 10,000 callers on
> one Asterisk server. As Asterisk is not capable on one server for what I am
> trying to do, I am going to design a scalable, multi-server architecture
> instead.
>

 While that's definitely a more sustainable approach, it has been awfully
entertaining/interesting to see how far you were able to take it. I think
everyone was pretty impressed when you hit 5000 channels in a single
bridge. Thanks for giving it a shot!

As an aside, what were you using to simulate the callers? SIPp + a pcap
file, or something else?

Matt

-- 
Matthew Jordan
Digium, Inc. | CTO
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: http://digium.com & http://asterisk.org

Re: [asterisk-dev] Asterisk Load Performance

2016-07-06 Thread Michael Petruzzello
On Tue, Jul 5, 2016 at 4:03 PM, Jonathan Rose  wrote:

> If you don't need all of your participants actually to be speaking at a
> time (and I hope not with that kind of volume), you could use holding
> bridges for the vast majority of the participants. Link the bridges using a
> local channel with the Hold bridge side being set to use the 'announcer'
> bridge role and the hold bridge will effectively just be voiceless
> conference participants. If you want, you can listen for DTMF events to
> move the participants back and forth between the different bridges.

Doing the conference this way results in the same kind of long voice queue
warnings/errors as before and eventually the DNS lookup for the server
fails. All 5,000 callers were able to get in though, which is a bit better
than before.

On Tue, Jul 5, 2016 at 5:09 PM, Richard Mudgett wrote:

> The exceptionally long voice queue length messages can be a symptom of
> thread
> starvation as the Local channels frame queue has developed an excessive
> backlog.
>
> The forthcoming v13.10.0 release should indirectly take care of the
EEXISTS
> messages
> as part of the https://issues.asterisk.org/jira/browse/ASTERISK-26088 fix.
> Working on
> that issue I saw the EEXISTS messages for REGISTER and SUBSCRIBE message
> processing.  The issue was a result of the original message and
> retransmissions getting
> backlogged in the serializer/taskprocessor and responses sent using
another
> serializer.
>
> Looks like your system's DNS resolver has gotten overwhelmed.

Are there any configuration changes I can make to help alleviate the thread
starvation on the Local channels frame queue?

It does not make sense to me that the system's DNS resolver is getting
overwhelmed. When I have 10,000 calls in one bridge, this does not occur.
When I have multiple bridges with locally originated channels bridging them,
the DNS errors occur.

> Wow.  Thanks Jonathan.  I hadn't thought of doing it that way.  That
should
> really drop the mixing load.
> Probably should allow only ulaw or alaw (pick one) for all participants to
> minimize translation costs.
> One additional thing I should add is that those linking Local channel
> bridges should just allow the
> chosen alaw/ulaw to reduce translation to each participant in the holding
> bridge.  The forthcoming
> v13.10.0 adds the ability to specify formats when ARI originates a channel
> (Local in this case) and
> an originator channel is not available.  (See CHANGES file)

I have only been allowing ulaw. That is very interesting to note about the
Local channels. I'll keep that in mind.

Well, thank you Jonathan, Richard, and Matthew. You have all been really
helpful. This has been really interesting trying to get 10,000 callers on
one Asterisk server. Since Asterisk cannot handle what I am trying to do on a
single server, I am going to design a scalable, multi-server architecture
instead.


*Michael J. Petruzzello*
Software Engineer
P.O. Box 4689
Greenwich, CT 06831
203-618-1811 ext.289 (office)
www.civi.com

Re: [asterisk-dev] Asterisk Load Performance

2016-07-05 Thread Richard Mudgett
On Tue, Jul 5, 2016 at 4:03 PM, Jonathan Rose <
jonathan.r...@motorolasolutions.com> wrote:

>
> On Tue, Jul 5, 2016 at 3:43 PM, Michael Petruzzello <
> michael.petruzze...@civi.com> wrote:
>
>> On Wed, Jun 29 at 11:14:04 AM, Richard Mudgett wrote:
>> > Each softmix bridge has only one thread performing all of the media
>> mixing
>> > for the bridge.  To
>> > get better mixing performance for such a large conference, you will
>> need to
>> > create several
>> > softmix bridges in a hierarchy with the bridges linked by local
>> channels.
>>
>> A bridge is only able to handle around 2000-2500 channels, so I created
>> 15 bridges with 14 channels bridging the bridges together.
>>
>> When doing this an error I see a lot is WARNING[98920]: channel.c:1101
>> __ast_queue_frame: Exceptionally long voice queue length queuing to
>> Local/**@default-;2, which then turns into WARNING[47525]:
>> pjproject:0 :  sip_transactio .Unable to register INVITE transaction
>> (key exists) and ERROR[47525]: res_pjsip.c:2777 ast_sip_create_dialog_uas:
>> Could not create dialog with endpoint sippeer. Object already exists
>> (PJ_EEXISTS). Finally the following repeats over and over again, [Jun 30
>> 12:22:21] ERROR[84189][C-0958]: netsock2.c:305 ast_sockaddr_resolve:
>> getaddrinfo("domain.name
>> ",
>> "(null)", ...): Temporary failure in name resolution
>> [Jun 30 12:22:21] WARNING[84189][C-0958]: acl.c:800 resolve_first:
>> Unable to lookup 'domain.name
>> 
>> '.
>>
>
The exceptionally long voice queue length messages can be a symptom of
thread
starvation as the Local channels frame queue has developed an excessive
backlog.

The forthcoming v13.10.0 release should indirectly take care of the EEXISTS
messages
as part of the https://issues.asterisk.org/jira/browse/ASTERISK-26088 fix.
Working on
that issue I saw the EEXISTS messages for REGISTER and SUBSCRIBE message
processing.  The issue was a result of the original message and
retransmissions getting
backlogged in the serializer/taskprocessor and responses sent using another
serializer.

Looks like your system's DNS resolver has gotten overwhelmed.


>
>> The last error just keeps on repeating and calls can no longer join (only
>> around 3,500 make it on before this starts to occur). Calling in manually I
>> receive an "all circuits are busy" message.
>>
>> I'm going to try halving the number of bridges, but is there anything
>> else I can do to improve performance? This seems to be the last hurdle to
>> use one server for 10,000 callers.
>>
>
> If you don't need all of your participants actually to be speaking at a
> time (and I hope not with that kind of volume), you could use holding
> bridges for the vast majority of the participants. Link the bridges using a
> local channel with the Hold bridge side being set to use the 'announcer'
> bridge role and the hold bridge will effectively just be voiceless
> conference participants. If you want, you can listen for DTMF events to
> move the participants back and forth between the different bridges.
>

Wow.  Thanks Jonathan.  I hadn't thought of doing it that way.  That should
really drop the mixing load.
Probably should allow only ulaw or alaw (pick one) for all participants to
minimize translation costs.
One additional thing I should add is that those linking Local channel
bridges should just allow the
chosen alaw/ulaw to reduce translation to each participant in the holding
bridge.  The forthcoming
v13.10.0 adds the ability to specify formats when ARI originates a channel
(Local in this case) and
an originator channel is not available.  (See CHANGES file)

Richard

Re: [asterisk-dev] Asterisk Load Performance

2016-07-05 Thread Jonathan Rose
On Tue, Jul 5, 2016 at 3:43 PM, Michael Petruzzello <
michael.petruzze...@civi.com> wrote:

> On Wed, Jun 29 at 11:14:04 AM, Richard Mudgett wrote:
> > Each softmix bridge has only one thread performing all of the media
> mixing
> > for the bridge.  To
> > get better mixing performance for such a large conference, you will need
> to
> > create several
> > softmix bridges in a hierarchy with the bridges linked by local channels.
>
> A bridge is only able to handle around 2000-2500 channels, so I created 15
> bridges with 14 channels bridging the bridges together.
>
> When doing this an error I see a lot is WARNING[98920]: channel.c:1101
> __ast_queue_frame: Exceptionally long voice queue length queuing to
> Local/**@default-;2, which then turns into WARNING[47525]:
> pjproject:0 :  sip_transactio .Unable to register INVITE transaction
> (key exists) and ERROR[47525]: res_pjsip.c:2777 ast_sip_create_dialog_uas:
> Could not create dialog with endpoint sippeer. Object already exists
> (PJ_EEXISTS). Finally the following repeats over and over again, [Jun 30
> 12:22:21] ERROR[84189][C-0958]: netsock2.c:305 ast_sockaddr_resolve:
> getaddrinfo("domain.name
> ",
> "(null)", ...): Temporary failure in name resolution
> [Jun 30 12:22:21] WARNING[84189][C-0958]: acl.c:800 resolve_first:
> Unable to lookup 'domain.name
> 
> '.
>
> The last error just keeps on repeating and calls can no longer join (only
> around 3,500 make it on before this starts to occur). Calling in manually I
> receive an "all circuits are busy" message.
>
> I'm going to try halving the number of bridges, but is there anything else
> I can do to improve performance? This seems to be the last hurdle to use
> one server for 10,000 callers.
>

If you don't need all of your participants actually to be speaking at a
time (and I hope not with that kind of volume), you could use holding
bridges for the vast majority of the participants. Link the bridges using a
local channel with the Hold bridge side being set to use the 'announcer'
bridge role and the hold bridge will effectively just be voiceless
conference participants. If you want, you can listen for DTMF events to
move the participants back and forth between the different bridges.
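
For reference, a rough sketch of that layout driven from ARI, assuming the
requests library and an ARI user asterisk/secret on localhost; the /bridges
and addChannel routes and the 'announcer' role are standard ARI and
bridge_holding pieces, everything else is illustrative:

    import requests

    ARI = "http://localhost:8088/ari"   # assumed ARI base URL
    AUTH = ("asterisk", "secret")        # assumed ari.conf credentials

    def create_holding_bridge(bridge_id):
        # A holding bridge does no full mixing; participants only hear announcers.
        r = requests.post(f"{ARI}/bridges",
                          params={"type": "holding", "bridgeId": bridge_id},
                          auth=AUTH)
        r.raise_for_status()
        return r.json()

    def add_listener(bridge_id, channel_id):
        # Ordinary callers join as plain (voiceless) holding participants.
        requests.post(f"{ARI}/bridges/{bridge_id}/addChannel",
                      params={"channel": channel_id},
                      auth=AUTH).raise_for_status()

    def add_announcer_link(bridge_id, local_channel_id):
        # The Local channel leg that links back to the mixing bridge joins with
        # the 'announcer' role, so its audio is played to every holder.
        requests.post(f"{ARI}/bridges/{bridge_id}/addChannel",
                      params={"channel": local_channel_id, "role": "announcer"},
                      auth=AUTH).raise_for_status()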

-- 

*Jonathan R. Rose*
Senior Systems Engineer

Emergency CallWorks
Motorola Solutions

email: jonathan.r...@motorolasolutions.com

Re: [asterisk-dev] Asterisk Load Performance

2016-07-05 Thread Michael Petruzzello
On Wed, Jun 29 at 11:14:04 AM, Richard Mudgett
wrote:
> Each softmix bridge has only one thread performing all of the media mixing
> for the bridge.  To
> get better mixing performance for such a large conference, you will need
to
> create several
> softmix bridges in a hierarchy with the bridges linked by local channels.

A bridge is only able to handle around 2000-2500 channels, so I created 15
bridges with 14 channels bridging the bridges together.

When doing this an error I see a lot is WARNING[98920]: channel.c:1101
__ast_queue_frame: Exceptionally long voice queue length queuing to
Local/**@default-;2, which then turns into WARNING[47525]:
pjproject:0 :  sip_transactio .Unable to register INVITE transaction
(key exists) and ERROR[47525]: res_pjsip.c:2777 ast_sip_create_dialog_uas:
Could not create dialog with endpoint sippeer. Object already exists
(PJ_EEXISTS). Finally the following repeats over and over again, [Jun 30
12:22:21] ERROR[84189][C-0958]: netsock2.c:305 ast_sockaddr_resolve:
getaddrinfo("domain.name", "(null)", ...): Temporary failure in name
resolution
[Jun 30 12:22:21] WARNING[84189][C-0958]: acl.c:800 resolve_first:
Unable to lookup 'domain.name'.

The last error just keeps on repeating and calls can no longer join (only
around 3,500 make it on before this starts to occur). Calling in manually I
receive an "all circuits are busy" message.

I'm going to try halving the number of bridges, but is there anything else
I can do to improve performance? This seems to be the last hurdle to use
one server for 10,000 callers.




*Michael J. Petruzzello*
Software Engineer
P.O. Box 4689
Greenwich, CT 06831
203-618-1811 ext.289 (office)
www.civi.com

On Wed, Jun 29, 2016 at 10:55 AM, Michael Petruzzello <
michael.petruzze...@civi.com> wrote:

> It is very interesting how threading issues on both a stasis application
> and Asterisk escalate each other. Using 15 websockets in one stasis
> application and removing all thread locking from the application have made
> the ARI messages flow smoothly. Right now I am using about 900 threads to
> process messages from Asterisk and Asterisk has at least 320 in stasis,
> though that can increase to infinity.
>
> I have also disabled the channel_varset from stasis because it becomes
> really unwieldy. When having thousands of callers in a bridge, every time a
> channel is added to a bridge or removed, every channel receives a channel
> var set message because of the BridgePeer variable.
>
> As of now, I have two remaining problems:
>
> 1. At around having 5,000 channels in a bridge (whether majority are muted
> or not), the audio breaks down. Anyone talking can only be heard in 3
> second bursts approximately every 5-10 seconds. At 10,000 channels only
> static can be heard in these 3 second bursts.
>
> Is there anything I can optimize so that Asterisk can handle all these
> channels in a bridge?
>
> 2. Every time a channel joins the bridge, the websocket responsible for
> that channel is then subscribed to the bridge. Then any events that occur
> on that bridge (such as another channel entering or exiting it) are sent to
> that websocket. Because every websocket then ends up receiving these
> messages, it defeats the point of having multiple websockets. To get around
> this I have been unsubscribing the websockets from the bridge anytime a
> channel from that websocket enters the bridge, but this isn't perfect as
> timing is an issue.
>
> Is there any way to disable this automatic subscription behavior for a
> bridge?
>
>
> *Michael J. Petruzzello*
> Software Engineer
> P.O. Box 4689
> Greenwich, CT 06831
> 203-618-1811 ext.289 (office)
> www.civi.com
>
> On Tue, Jun 21, 2016 at 3:29 PM, Michael Petruzzello <
> michael.petruzze...@civi.com> wrote:
>
>> On Tue, Jun 21, 2016 at 12:16 PM, Richard Mudgett 
>> wrote:
>> > The subm:devService-test-0038 taskprocessor is servicing the stasis
>> > message bus
>> > communication with your devService-test ARI application.  Since each
>> > taskprocessor is
>> > executed by one thread, that is going to be a bottleneck.  One thing you
>> > can try is to
>> > register multiple copies of your ARI application and randomly spread the
>> > calls to the
>> > different copies of the application.  (devService-test1,
>> > devService-test2,...)
>>
>> Ah, that explains it! Everything else has been running well in Asterisk
>> as far as handling the actual channels and the SIP messaging with the large
>> calls / second.
>>
>> I was thinking about the potential of parallelizing the stasis message
>> bus communication to use multiple task processors, but that would introduce
>> other issues. Messages would be sent out of order, and the ARI application
>> would need to handle that.
>>
>> Your suggestion sounds like the best approach. That way I still have only
>> one application with multiple connections to Asterisk. No need to have
>> multiple applications and servers that would need to communicate together.

Re: [asterisk-dev] Asterisk Load Performance

2016-06-29 Thread Richard Mudgett
On Wed, Jun 29, 2016 at 9:55 AM, Michael Petruzzello <
michael.petruzze...@civi.com> wrote:

> It is very interesting how threading issues on both a stasis application
> and Asterisk escalate each other. Using 15 websockets in one stasis
> application and removing all thread locking from the application have made
> the ARI messages flow smoothly. Right now I am using about 900 threads to
> process messages from Asterisk and Asterisk has at least 320 in stasis,
> though that can increase to infinity.
>
> I have also disabled the channel_varset from stasis because it becomes
> really unwieldy. When having thousands of callers in a bridge, every time a
> channel is added to a bridge or removed, every channel receives a channel
> var set message because of the BridgePeer variable.
>
> As of now, I have two remaining problems:
>
> 1. At around having 5,000 channels in a bridge (whether majority are muted
> or not), the audio breaks down. Anyone talking can only be heard in 3
> second bursts approximately every 5-10 seconds. At 10,000 channels only
> static can be heard in these 3 second bursts.
>
> Is there anything I can optimize so that Asterisk can handle all these
> channels in a bridge?
>

Each softmix bridge has only one thread performing all of the media mixing
for the bridge.  To
get better mixing performance for such a large conference, you will need to
create several
softmix bridges in a hierarchy with the bridges linked by local channels.
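
A rough sketch of one way the link between a parent and child softmix bridge
might be created from ARI, assuming the requests library, an ARI user
asterisk/secret, an app named confapp, and a dialplan context ari-link whose
'link' extension simply calls Stasis(confapp,link) so both halves of the
Local channel land in the application:

    import requests

    ARI = "http://localhost:8088/ari"   # assumed ARI base URL
    AUTH = ("asterisk", "secret")        # assumed ari.conf credentials

    def originate_link_channel():
        """Create the Local channel whose two halves will join the two bridges."""
        r = requests.post(f"{ARI}/channels", auth=AUTH, params={
            "endpoint": "Local/link@ari-link",  # ;2 half runs the ari-link dialplan
            "app": "confapp",                   # ;1 half enters the Stasis app
            "appArgs": "link",
            # 13.10+ is said (per the CHANGES note mentioned above) to allow
            # specifying formats on originate; pinning the link to ulaw keeps
            # translation down.  Parameter name assumed here.
            "formats": "ulaw",
        })
        r.raise_for_status()
        return r.json()["id"]

    def add_to_bridge(bridge_id, channel_id):
        # On the StasisStart event for each half, put one half in the parent
        # bridge and the other in the child bridge.
        requests.post(f"{ARI}/bridges/{bridge_id}/addChannel",
                      params={"channel": channel_id},
                      auth=AUTH).raise_for_status()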

Richard

Re: [asterisk-dev] Asterisk Load Performance

2016-06-29 Thread Michael Petruzzello
It is very interesting how threading issues on both a stasis application
and Asterisk escalate each other. Using 15 websockets in one stasis
application and removing all thread locking from the application have made
the ARI messages flow smoothly. Right now I am using about 900 threads to
process messages from Asterisk and Asterisk has at least 320 in stasis,
though that can increase to infinity.

I have also disabled the channel_varset message type in Stasis because it
becomes really unwieldy. With thousands of callers in a bridge, every time a
channel is added or removed, every channel receives a channel varset message
because of the BridgePeer variable.

As of now, I have two remaining problems:

1. At around 5,000 channels in a bridge (whether the majority are muted or
not), the audio breaks down. Anyone talking can only be heard in 3-second
bursts approximately every 5-10 seconds. At 10,000 channels only static can
be heard in these 3-second bursts.

Is there anything I can optimize so that Asterisk can handle all these
channels in a bridge?

2. Every time a channel joins the bridge, the websocket responsible for
that channel is then subscribed to the bridge. Then any events that occur
on that bridge (such as another channel entering or exiting it) are sent to
that websocket. Because every websocket then ends up receiving these
messages, it defeats the point of having multiple websockets. To get around
this I have been unsubscribing the websockets from the bridge anytime a
channel from that websocket enters the bridge, but this isn't perfect as
timing is an issue.

Is there any way to disable this automatic subscription behavior for a bridge?
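
A minimal sketch of the unsubscribe call involved, assuming the requests
library and an ARI user asterisk/secret; the applications subscription
endpoint and the bridge:<id> event source format are the standard ARI ones:

    import requests

    ARI = "http://localhost:8088/ari"   # assumed ARI base URL
    AUTH = ("asterisk", "secret")        # assumed ari.conf credentials

    def unsubscribe_from_bridge(app_name, bridge_id):
        """Drop this application's implicit subscription to a bridge's events."""
        requests.delete(
            f"{ARI}/applications/{app_name}/subscription",
            params={"eventSource": f"bridge:{bridge_id}"},
            auth=AUTH,
        ).raise_for_status()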


*Michael J. Petruzzello*
Software Engineer
P.O. Box 4689
Greenwich, CT 06831
203-618-1811 ext.289 (office)
www.civi.com

On Tue, Jun 21, 2016 at 3:29 PM, Michael Petruzzello <
michael.petruzze...@civi.com> wrote:

> On Tue, Jun 21, 2016 at 12:16 PM, Richard Mudgett 
> wrote:
> > The subm:devService-test-0038 taskprocessor is servicing the stasis
> > message bus
> > communication with your devService-test ARI application.  Since each
> > taskprocessor is
> > executed by one thread, that is going to be a bottleneck.  One thing you
> > can try is to
> > register multiple copies of your ARI application and randomly spread the
> > calls to the
> > different copies of the application.  (devService-test1,
> > devService-test2,...)
>
> Ah, that explains it! Everything else has been running well in Asterisk as
> far as handling the actual channels and the SIP messaging with the large
> calls / second.
>
> I was thinking about the potential of parallelizing the stasis message bus
> communication to use multiple task processors, but that would introduce
> other issues. Messages would be sent out of order, and the ARI application
> would need to handle that.
>
> Your suggestion sounds like the best approach. That way I still have only
> one application with multiple connections to Asterisk. No need to have
> multiple applications and servers that would need to communicate together.
>
> Thank you for the insight.
>
> On Tue, Jun 21, 2016 at 1:03 PM, Matthew Jordan
> wrote:
> > To follow up with Richard's suggestion:
> >
> > Events being written out (either over a WebSocket in ARI or over a
> > direct TCP socket in AMI) have to be fully written before the next
> > event is written. That means that the client application processing
> > the events can directly slow down the rate at which events are sent if
> > the process that is reading the event does not keep reading from the
> > socket as quickly as possible. You may already be doing this - in
> > which case, disregard the suggestion - but you may want to have one
> > thread/process read from the ARI WebSocket, and farm out the
> > processing of the events to some other thread/process.
>
> I am already doing this, but thank you for the suggestion. Database access
> really slows down the processing of events so I have had to do this from
> the start of my project.
>
>
> *Michael J. Petruzzello*
> Software Engineer
> P.O. Box 4689
> Greenwich, CT 06831
> 203-618-1811 ext.289 (office)
> www.civi.com
>

Re: [asterisk-dev] Asterisk Load Performance

2016-06-21 Thread Michael Petruzzello
On Tue, Jun 21, 2016 at 12:16 PM, Richard Mudgett 
wrote:
> The subm:devService-test-0038 taskprocessor is servicing the stasis
> message bus
> communication with your devService-test ARI application.  Since each
> taskprocessor is
> executed by one thread, that is going to be a bottleneck.  One thing you
> can try is to
> register multiple copies of your ARI application and randomly spread the
> calls to the
> different copies of the application.  (devService-test1,
> devService-test2,...)

Ah, that explains it! Everything else has been running well in Asterisk as
far as handling the actual channels and the SIP messaging at the high
calls-per-second rate.

I was thinking about the potential of parallelizing the stasis message bus
communication to use multiple task processors, but that would introduce
other issues. Messages would be sent out of order, and the ARI application
would need to handle that.

Your suggestion sounds like the best approach. That way I still have only
one application with multiple connections to Asterisk. No need to have
multiple applications and servers that would need to communicate together.

Thank you for the insight.
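
A minimal dialplan sketch of spreading the calls, assuming the endpoints land
in a from-sip context and four copies of the application have been registered
(the names and context are illustrative):

    [from-sip]
    exten => _X.,1,NoOp(Spread call across ARI app copies)
     same => n,Set(APP_COPY=${RAND(1,4)})
     same => n,Stasis(devService-test${APP_COPY},${EXTEN})
     same => n,Hangup()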

On Tue, Jun 21, 2016 at 1:03 PM, Matthew Jordan
wrote:
> To follow up with Richard's suggestion:
>
> Events being written out (either over a WebSocket in ARI or over a
> direct TCP socket in AMI) have to be fully written before the next
> event is written. That means that the client application processing
> the events can directly slow down the rate at which events are sent if
> the process that is reading the event does not keep reading from the
> socket as quickly as possible. You may already be doing this - in
> which case, disregard the suggestion - but you may want to have one
> thread/process read from the ARI WebSocket, and farm out the
> processing of the events to some other thread/process.

I am already doing this, but thank you for the suggestion. Database access
really slows down the processing of events so I have had to do this from
the start of my project.
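
For anyone following along, the shape of that reader/worker split is roughly
this (a generic Python sketch; the real application is on the JVM with netty,
and read_event/handle_event stand in for the WebSocket read and the actual
event handling):

    import json
    import queue
    import threading

    events = queue.Queue(maxsize=10000)

    def reader(read_event):
        # Single thread: drain the ARI WebSocket as fast as possible and
        # hand events off without doing any real work here.
        while True:
            events.put(json.loads(read_event()))

    def worker(handle_event):
        # Worker threads do the slow parts (database access, ARI requests).
        while True:
            event = events.get()
            try:
                handle_event(event)
            finally:
                events.task_done()

    def start_workers(handle_event, n=8):
        for _ in range(n):
            threading.Thread(target=worker, args=(handle_event,), daemon=True).start()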


*Michael J. Petruzzello*
Software Engineer
P.O. Box 4689
Greenwich, CT 06831
203-618-1811 ext.289 (office)
www.civi.com

Re: [asterisk-dev] Asterisk Load Performance

2016-06-21 Thread Matthew Jordan
On Tue, Jun 21, 2016 at 12:16 PM, Richard Mudgett  wrote:
>
>
> On Tue, Jun 21, 2016 at 11:12 AM, Michael Petruzzello
>  wrote:
>>
>> >On Fri, Jun 17, 2016 at 1:37 PM, Richard Mudgett 
>> > wrote:
>> >>
>> >>
>> >> On Fri, Jun 17, 2016 at 12:36 PM, Michael Petruzzello
>> >>  wrote:
>> >>>
>> >>> Hello,
>> >>>
>> >>> I am currently working on determining bottlenecks in Asterisk and a
>> >>> Stasis
>> >>> App. I'm currently trying to handle 83.3 calls/second. For the most
>> >>> part,
>> >>> Asterisk and the Stasis APP handle that well, but there is a 60+
>> >>> second
>> >>> delay in response time.
>> >>>
>> >>> On the Asterisk side, I am seeing the following warnings. [Jun 17
>> >>> 12:00:16] WARNING[23561]: taskprocessor.c:803 taskprocessor_push: The
>> >>> 'subm:cdr_engine-0003' task processor queue reached 500 scheduled
>> >>> tasks.
>> >>> [Jun 17 12:00:18] WARNING[25477][C-0068]: taskprocessor.c:803
>> >>> taskprocessor_push: The 'subm:devService-test-0038' task processor
>> >>> queue
>> >>> reached 500 scheduled tasks.
>> >>> [Jun 17 12:00:21] WARNING[26298][C-00a3]: taskprocessor.c:803
>> >>> taskprocessor_push: The 'subp:PJSIP/sippeer-0022' task processor
>> >>> queue
>> >>> reached 500 scheduled tasks.
>> >>> [Jun 17 12:00:23] WARNING[27339][C-010d]: taskprocessor.c:803
>> >>> taskprocessor_push: The 'subm:ast_channel_topic_all-cached-0032'
>> >>> task
>> >>> processor queue reached 500 scheduled tasks.
>> >>> [Jun 17 12:01:32] WARNING[31697][C-03b2]: taskprocessor.c:803
>> >>> taskprocessor_push: The 'subm:ast_channel_topic_all-0036' task
>> >>> processor
>> >>> queue reached 500 scheduled tasks.
>> >>> [Jun 17 12:05:55] WARNING[23280]: taskprocessor.c:803
>> >>> taskprocessor_push:
>> >>> The 'SIP' task processor queue reached 500 scheduled tasks.
>> >>>
>> >>> I have not seen a configuration setting on Asterisk to prevent these
>> >>> warnings from occurring (I'm trying to avoid modifying Asterisk source
>> >>> code
>> >>> if possible). Looking at the task processors, I see the queue to the
>> >>> stasis
>> >>> app bottlenecks:
>> >>> subm:devService-test-00384560990  0
>> >>> 1041689. It does clear up relatively quickly. The CDR engine also
>> >>> bottle
>> >>> necks (extremely badly), but I don't use that. Nothing else comes
>> >>> close to
>> >>> having a large queue.
>> >>>
>> >>> The stasis app itself is extremely streamlined and is very capable of
>> >>> handling a large number of messages at a time. The app runs with the
>> >>> JVM so
>> >>> I am also researching into that as well as the netty library I am
>> >>> using for
>> >>> the websocket connections.
>> >>>
>> >>> Any insight into Asterisk's side of the equation and how it scales on
>> >>> 40
>> >>> vCPUs would be greatly appreciated.
>> >>
>> >>
>> >> There are no options to disable those taskprocessor queue size
>> >> warnings.
>> >> They are a
>> >> symptom of the system being severely stressed.  If the stress continues
>> >> it
>> >> is possible
>> >> that the system could consume all memory in those taskprocessor queues.
>> >>
>> >> Recent changes to the Asterisk v13 branch were made to help throttle
>> >> back
>> >> incoming
>> >> SIP requests on PJSIP when the taskprocessors become backlogged like
>> >> you are
>> >> seeing.
>> >> These changes will be in the forthcoming v13.10.0 release.  If you
>> >> want, you
>> >> can test with
>> >> the current v13 branch to see how these changes affect your stress
>> >> testing.
>> >>
>> >> If you don't need CDR's then you really need to disable them as they
>> >> consume
>> >> a lot of
>> >> processing time and the CDR taskprocessor queue backlog can take
>> >> minutes to
>> >> clear.
>> >>
>> >
>> >To echo what Richard said, because Asterisk is now sharing state
>> >across the Stasis message bus, turning off subscribers to that bus
>> >will help performance. Some easy ones to disable, if you aren't using
>> >them, are CDRs, CEL, and AMI. Those all do a reasonable amount of
>> >processing, and you can get some noticeable improvement by disabling
>> >them.
>> >
>> >Once you get past that, you can start fiddling with some of the lower
>> >level options. To start, you can throttle things back further by
>> >disabling certain internal messages in stasis.conf. As stasis.conf
>> >notes, functionality within Asterisk can break (or just not happen) if
>> >some messages are removed. For example, disabling
>> >'ast_channel_snapshot_type' would break ... most things. You may
>> >however be able to streamline your application by looking at what ARI
>> >messages it cares about, what messages it doesn't, inspecting the
>> >code, and disabling those that you don't care about. Lots of testing
>> >should occur before doing this, of course.
>> >
>> >You may also be able to get some different performance characteristics
>> >by changing the threadpool options for the message bus in stasis.conf.
>> >This may make a difference, depending on the underlying machine.

Re: [asterisk-dev] Asterisk Load Performance

2016-06-21 Thread Richard Mudgett
On Tue, Jun 21, 2016 at 11:12 AM, Michael Petruzzello <
michael.petruzze...@civi.com> wrote:

> >On Fri, Jun 17, 2016 at 1:37 PM, Richard Mudgett 
> wrote:
> >>
> >>
> >> On Fri, Jun 17, 2016 at 12:36 PM, Michael Petruzzello
> >>  wrote:
> >>>
> >>> Hello,
> >>>
> >>> I am currently working on determining bottlenecks in Asterisk and a
> Stasis
> >>> App. I'm currently trying to handle 83.3 calls/second. For the most
> part,
> >>> Asterisk and the Stasis APP handle that well, but there is a 60+ second
> >>> delay in response time.
> >>>
> >>> On the Asterisk side, I am seeing the following warnings. [Jun 17
> >>> 12:00:16] WARNING[23561]: taskprocessor.c:803 taskprocessor_push: The
> >>> 'subm:cdr_engine-0003' task processor queue reached 500 scheduled
> tasks.
> >>> [Jun 17 12:00:18] WARNING[25477][C-0068]: taskprocessor.c:803
> >>> taskprocessor_push: The 'subm:devService-test-0038' task processor
> queue
> >>> reached 500 scheduled tasks.
> >>> [Jun 17 12:00:21] WARNING[26298][C-00a3]: taskprocessor.c:803
> >>> taskprocessor_push: The 'subp:PJSIP/sippeer-0022' task processor
> queue
> >>> reached 500 scheduled tasks.
> >>> [Jun 17 12:00:23] WARNING[27339][C-010d]: taskprocessor.c:803
> >>> taskprocessor_push: The 'subm:ast_channel_topic_all-cached-0032'
> task
> >>> processor queue reached 500 scheduled tasks.
> >>> [Jun 17 12:01:32] WARNING[31697][C-03b2]: taskprocessor.c:803
> >>> taskprocessor_push: The 'subm:ast_channel_topic_all-0036' task
> processor
> >>> queue reached 500 scheduled tasks.
> >>> [Jun 17 12:05:55] WARNING[23280]: taskprocessor.c:803
> taskprocessor_push:
> >>> The 'SIP' task processor queue reached 500 scheduled tasks.
> >>>
> >>> I have not seen a configuration setting on Asterisk to prevent these
> >>> warnings from occurring (I'm trying to avoid modifying Asterisk source
> code
> >>> if possible). Looking at the task processors, I see the queue to the
> stasis
> >>> app bottlenecks:
> >>> subm:devService-test-00384560990  0
> >>> 1041689. It does clear up relatively quickly. The CDR engine also
> bottle
> >>> necks (extremely badly), but I don't use that. Nothing else comes
> close to
> >>> having a large queue.
> >>>
> >>> The stasis app itself is extremely streamlined and is very capable of
> >>> handling a large number of messages at a time. The app runs with the
> JVM so
> >>> I am also researching into that as well as the netty library I am
> using for
> >>> the websocket connections.
> >>>
> >>> Any insight into Asterisk's side of the equation and how it scales on
> 40
> >>> vCPUs would be greatly appreciated.
> >>
> >>
> >> There are no options to disable those taskprocessor queue size warnings.
> >> They are a
> >> symptom of the system being severely stressed.  If the stress continues
> it
> >> is possible
> >> that the system could consume all memory in those taskprocessor queues.
> >>
> >> Recent changes to the Asterisk v13 branch were made to help throttle
> back
> >> incoming
> >> SIP requests on PJSIP when the taskprocessors become backlogged like
> you are
> >> seeing.
> >> These changes will be in the forthcoming v13.10.0 release.  If you
> want, you
> >> can test with
> >> the current v13 branch to see how these changes affect your stress
> testing.
> >>
> >> If you don't need CDR's then you really need to disable them as they
> consume
> >> a lot of
> >> processing time and the CDR taskprocessor queue backlog can take
> minutes to
> >> clear.
> >>
> >
> >To echo what Richard said, because Asterisk is now sharing state
> >across the Stasis message bus, turning off subscribers to that bus
> >will help performance. Some easy ones to disable, if you aren't using
> >them, are CDRs, CEL, and AMI. Those all do a reasonable amount of
> >processing, and you can get some noticeable improvement by disabling
> >them.
> >
> >Once you get past that, you can start fiddling with some of the lower
> >level options. To start, you can throttle things back further by
> >disabling certain internal messages in stasis.conf. As stasis.conf
> >notes, functionality within Asterisk can break (or just not happen) if
> >some messages are removed. For example, disabling
> >'ast_channel_snapshot_type' would break ... most things. You may
> >however be able to streamline your application by looking at what ARI
> >messages it cares about, what messages it doesn't, inspecting the
> >code, and disabling those that you don't care about. Lots of testing
> >should occur before doing this, of course.
> >
> >You may also be able to get some different performance characteristics
> >by changing the threadpool options for the message bus in stasis.conf.
> >This may make a difference, depending on the underlying machine.
>
> Thank you for the suggestions.
>
> I'm running Asterisk on 40 vCPUs with 120 GB of RAM. Changing the thread
> pool options to many more threads is not increasing performance, and at a
> certain point it decreases performance.

Re: [asterisk-dev] Asterisk Load Performance

2016-06-21 Thread Michael Petruzzello
>On Fri, Jun 17, 2016 at 1:37 PM, Richard Mudgett 
wrote:
>>
>>
>> On Fri, Jun 17, 2016 at 12:36 PM, Michael Petruzzello
>>  wrote:
>>>
>>> Hello,
>>>
>>> I am currently working on determining bottlenecks in Asterisk and a
Stasis
>>> App. I'm currently trying to handle 83.3 calls/second. For the most
part,
>>> Asterisk and the Stasis APP handle that well, but there is a 60+ second
>>> delay in response time.
>>>
>>> On the Asterisk side, I am seeing the following warnings. [Jun 17
>>> 12:00:16] WARNING[23561]: taskprocessor.c:803 taskprocessor_push: The
>>> 'subm:cdr_engine-0003' task processor queue reached 500 scheduled
tasks.
>>> [Jun 17 12:00:18] WARNING[25477][C-0068]: taskprocessor.c:803
>>> taskprocessor_push: The 'subm:devService-test-0038' task processor
queue
>>> reached 500 scheduled tasks.
>>> [Jun 17 12:00:21] WARNING[26298][C-00a3]: taskprocessor.c:803
>>> taskprocessor_push: The 'subp:PJSIP/sippeer-0022' task processor
queue
>>> reached 500 scheduled tasks.
>>> [Jun 17 12:00:23] WARNING[27339][C-010d]: taskprocessor.c:803
>>> taskprocessor_push: The 'subm:ast_channel_topic_all-cached-0032'
task
>>> processor queue reached 500 scheduled tasks.
>>> [Jun 17 12:01:32] WARNING[31697][C-03b2]: taskprocessor.c:803
>>> taskprocessor_push: The 'subm:ast_channel_topic_all-0036' task
processor
>>> queue reached 500 scheduled tasks.
>>> [Jun 17 12:05:55] WARNING[23280]: taskprocessor.c:803
taskprocessor_push:
>>> The 'SIP' task processor queue reached 500 scheduled tasks.
>>>
>>> I have not seen a configuration setting on Asterisk to prevent these
>>> warnings from occurring (I'm trying to avoid modifying Asterisk source
code
>>> if possible). Looking at the task processors, I see the queue to the
stasis
>>> app bottlenecks:
>>> subm:devService-test-00384560990  0
>>> 1041689. It does clear up relatively quickly. The CDR engine also bottle
>>> necks (extremely badly), but I don't use that. Nothing else comes close
to
>>> having a large queue.
>>>
>>> The stasis app itself is extremely streamlined and is very capable of
>>> handling a large number of messages at a time. The app runs with the
JVM so
>>> I am also researching into that as well as the netty library I am using
for
>>> the websocket connections.
>>>
>>> Any insight into Asterisk's side of the equation and how it scales on 40
>>> vCPUs would be greatly appreciated.
>>
>>
>> There are no options to disable those taskprocessor queue size warnings.
>> They are a
>> symptom of the system being severely stressed.  If the stress continues
it
>> is possible
>> that the system could consume all memory in those taskprocessor queues.
>>
>> Recent changes to the Asterisk v13 branch were made to help throttle back
>> incoming
>> SIP requests on PJSIP when the taskprocessors become backlogged like you
are
>> seeing.
>> These changes will be in the forthcoming v13.10.0 release.  If you want,
you
>> can test with
>> the current v13 branch to see how these changes affect your stress
testing.
>>
>> If you don't need CDR's then you really need to disable them as they
consume
>> a lot of
>> processing time and the CDR taskprocessor queue backlog can take minutes
to
>> clear.
>>
>
>To echo what Richard said, because Asterisk is now sharing state
>across the Stasis message bus, turning off subscribers to that bus
>will help performance. Some easy ones to disable, if you aren't using
>them, are CDRs, CEL, and AMI. Those all do a reasonable amount of
>processing, and you can get some noticeable improvement by disabling
>them.
>
>Once you get past that, you can start fiddling with some of the lower
>level options. To start, you can throttle things back further by
>disabling certain internal messages in stasis.conf. As stasis.conf
>notes, functionality within Asterisk can break (or just not happen) if
>some messages are removed. For example, disabling
>'ast_channel_snapshot_type' would break ... most things. You may
>however be able to streamline your application by looking at what ARI
>messages it cares about, what messages it doesn't, inspecting the
>code, and disabling those that you don't care about. Lots of testing
>should occur before doing this, of course.
>
>You may also be able to get some different performance characteristics
>by changing the threadpool options for the message bus in stasis.conf.
>This may make a difference, depending on the underlying machine.

Thank you for the suggestions.

I'm running Asterisk on 40 vCPUs with 120 GB of RAM. Changing the thread
pool options to many more threads is not increasing performance, and at a
certain point it decreases performance.

With further testing and having implemented your suggestions, I am
realizing the subm:devService-test-0038 task processor is a major
bottleneck. I have always read that Asterisk can handle as many calls as
the hardware allows, but I'm not seeing that.

I can go 

Re: [asterisk-dev] Asterisk Load Performance

2016-06-17 Thread Matthew Jordan
On Fri, Jun 17, 2016 at 1:37 PM, Richard Mudgett  wrote:
>
>
> On Fri, Jun 17, 2016 at 12:36 PM, Michael Petruzzello
>  wrote:
>>
>> Hello,
>>
>> I am currently working on determining bottlenecks in Asterisk and a Stasis
>> App. I'm currently trying to handle 83.3 calls/second. For the most part,
>> Asterisk and the Stasis APP handle that well, but there is a 60+ second
>> delay in response time.
>>
>> On the Asterisk side, I am seeing the following warnings. [Jun 17
>> 12:00:16] WARNING[23561]: taskprocessor.c:803 taskprocessor_push: The
>> 'subm:cdr_engine-0003' task processor queue reached 500 scheduled tasks.
>> [Jun 17 12:00:18] WARNING[25477][C-0068]: taskprocessor.c:803
>> taskprocessor_push: The 'subm:devService-test-0038' task processor queue
>> reached 500 scheduled tasks.
>> [Jun 17 12:00:21] WARNING[26298][C-00a3]: taskprocessor.c:803
>> taskprocessor_push: The 'subp:PJSIP/sippeer-0022' task processor queue
>> reached 500 scheduled tasks.
>> [Jun 17 12:00:23] WARNING[27339][C-010d]: taskprocessor.c:803
>> taskprocessor_push: The 'subm:ast_channel_topic_all-cached-0032' task
>> processor queue reached 500 scheduled tasks.
>> [Jun 17 12:01:32] WARNING[31697][C-03b2]: taskprocessor.c:803
>> taskprocessor_push: The 'subm:ast_channel_topic_all-0036' task processor
>> queue reached 500 scheduled tasks.
>> [Jun 17 12:05:55] WARNING[23280]: taskprocessor.c:803 taskprocessor_push:
>> The 'SIP' task processor queue reached 500 scheduled tasks.
>>
>> I have not seen a configuration setting on Asterisk to prevent these
>> warnings from occurring (I'm trying to avoid modifying Asterisk source code
>> if possible). Looking at the task processors, I see the queue to the stasis
>> app bottlenecks:
>> subm:devService-test-00384560990  0
>> 1041689. It does clear up relatively quickly. The CDR engine also bottle
>> necks (extremely badly), but I don't use that. Nothing else comes close to
>> having a large queue.
>>
>> The stasis app itself is extremely streamlined and is very capable of
>> handling a large number of messages at a time. The app runs with the JVM so
>> I am also researching into that as well as the netty library I am using for
>> the websocket connections.
>>
>> Any insight into Asterisk's side of the equation and how it scales on 40
>> vCPUs would be greatly appreciated.
>
>
> There are no options to disable those taskprocessor queue size warnings.
> They are a
> symptom of the system being severely stressed.  If the stress continues it
> is possible
> that the system could consume all memory in those taskprocessor queues.
>
> Recent changes to the Asterisk v13 branch were made to help throttle back
> incoming
> SIP requests on PJSIP when the taskprocessors become backlogged like you are
> seeing.
> These changes will be in the forthcoming v13.10.0 release.  If you want, you
> can test with
> the current v13 branch to see how these changes affect your stress testing.
>
> If you don't need CDR's then you really need to disable them as they consume
> a lot of
> processing time and the CDR taskprocessor queue backlog can take minutes to
> clear.
>

To echo what Richard said, because Asterisk is now sharing state
across the Stasis message bus, turning off subscribers to that bus
will help performance. Some easy ones to disable, if you aren't using
them, are CDRs, CEL, and AMI. Those all do a reasonable amount of
processing, and you can get some noticeable improvement by disabling
them.
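
Concretely, those usually come down to a few config knobs, something like the
following (a sketch; option names are the ones from the stock sample configs):

    ; cdr.conf
    [general]
    enable = no

    ; cel.conf
    [general]
    enable = no

    ; manager.conf (AMI)
    [general]
    enabled = no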

Once you get past that, you can start fiddling with some of the lower
level options. To start, you can throttle things back further by
disabling certain internal messages in stasis.conf. As stasis.conf
notes, functionality within Asterisk can break (or just not happen) if
some messages are removed. For example, disabling
'ast_channel_snapshot_type' would break ... most things. You may
however be able to streamline your application by looking at what ARI
messages it cares about, what messages it doesn't, inspecting the
code, and disabling those that you don't care about. Lots of testing
should occur before doing this, of course.

You may also be able to get some different performance characteristics
by changing the threadpool options for the message bus in stasis.conf.
This may make a difference, depending on the underlying machine.
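
For the threadpool, the knobs live in the same file; a sketch, assuming the
option names from the stasis.conf sample:

    ; stasis.conf
    [threadpool]
    initial_size = 10      ; threads created at startup
    idle_timeout_sec = 20  ; how long an idle thread lingers
    max_size = 50          ; upper bound on pool growth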

-- 
Matthew Jordan
Digium, Inc. | CTO
445 Jan Davis Drive NW - Huntsville, AL 35806 - USA
Check us out at: http://digium.com & http://asterisk.org



Re: [asterisk-dev] Asterisk Load Performance

2016-06-17 Thread Richard Mudgett
On Fri, Jun 17, 2016 at 12:36 PM, Michael Petruzzello <
michael.petruzze...@civi.com> wrote:

> Hello,
>
> I am currently working on determining bottlenecks in Asterisk and a Stasis
> App. I'm currently trying to handle 83.3 calls/second. For the most part,
> Asterisk and the Stasis APP handle that well, but there is a 60+ second
> delay in response time.
>
> On the Asterisk side, I am seeing the following warnings. [Jun 17
> 12:00:16] WARNING[23561]: taskprocessor.c:803 taskprocessor_push: The
> 'subm:cdr_engine-0003' task processor queue reached 500 scheduled tasks.
> [Jun 17 12:00:18] WARNING[25477][C-0068]: taskprocessor.c:803
> taskprocessor_push: The 'subm:devService-test-0038' task processor
> queue reached 500 scheduled tasks.
> [Jun 17 12:00:21] WARNING[26298][C-00a3]: taskprocessor.c:803
> taskprocessor_push: The 'subp:PJSIP/sippeer-0022' task processor queue
> reached 500 scheduled tasks.
> [Jun 17 12:00:23] WARNING[27339][C-010d]: taskprocessor.c:803
> taskprocessor_push: The 'subm:ast_channel_topic_all-cached-0032' task
> processor queue reached 500 scheduled tasks.
> [Jun 17 12:01:32] WARNING[31697][C-03b2]: taskprocessor.c:803
> taskprocessor_push: The 'subm:ast_channel_topic_all-0036' task
> processor queue reached 500 scheduled tasks.
> [Jun 17 12:05:55] WARNING[23280]: taskprocessor.c:803 taskprocessor_push:
> The 'SIP' task processor queue reached 500 scheduled tasks.
>
> I have not seen a configuration setting on Asterisk to prevent these
> warnings from occurring (I'm trying to avoid modifying Asterisk source code
> if possible). Looking at the task processors, I see the queue to the stasis
> app bottlenecks:
> subm:devService-test-00384560990  0
> 1041689. It does clear up relatively quickly. The CDR engine also bottle
> necks (extremely badly), but I don't use that. Nothing else comes close to
> having a large queue.
>
> The stasis app itself is extremely streamlined and is very capable of
> handling a large number of messages at a time. The app runs with the JVM so
> I am also researching into that as well as the netty library I am using for
> the websocket connections.
>
> Any insight into Asterisk's side of the equation and how it scales on 40
> vCPUs would be greatly appreciated.
>

There are no options to disable those taskprocessor queue size warnings.
They are a
symptom of the system being severely stressed.  If the stress continues it
is possible
that the system could consume all memory in those taskprocessor queues.

Recent changes to the Asterisk v13 branch were made to help throttle back
incoming
SIP requests on PJSIP when the taskprocessors become backlogged like you
are seeing.
These changes will be in the forthcoming v13.10.0 release.  If you want,
you can test with
the current v13 branch to see how these changes affect your stress testing.

If you don't need CDR's then you really need to disable them as they
consume a lot of
processing time and the CDR taskprocessor queue backlog can take minutes to
clear.

Richard

[asterisk-dev] Asterisk Load Performance

2016-06-17 Thread Michael Petruzzello
Hello,

I am currently working on determining bottlenecks in Asterisk and a Stasis
App. I'm currently trying to handle 83.3 calls/second. For the most part,
Asterisk and the Stasis APP handle that well, but there is a 60+ second
delay in response time.

On the Asterisk side, I am seeing the following warnings. [Jun 17 12:00:16]
WARNING[23561]: taskprocessor.c:803 taskprocessor_push: The
'subm:cdr_engine-0003' task processor queue reached 500 scheduled tasks.
[Jun 17 12:00:18] WARNING[25477][C-0068]: taskprocessor.c:803
taskprocessor_push: The 'subm:devService-test-0038' task processor
queue reached 500 scheduled tasks.
[Jun 17 12:00:21] WARNING[26298][C-00a3]: taskprocessor.c:803
taskprocessor_push: The 'subp:PJSIP/sippeer-0022' task processor queue
reached 500 scheduled tasks.
[Jun 17 12:00:23] WARNING[27339][C-010d]: taskprocessor.c:803
taskprocessor_push: The 'subm:ast_channel_topic_all-cached-0032' task
processor queue reached 500 scheduled tasks.
[Jun 17 12:01:32] WARNING[31697][C-03b2]: taskprocessor.c:803
taskprocessor_push: The 'subm:ast_channel_topic_all-0036' task
processor queue reached 500 scheduled tasks.
[Jun 17 12:05:55] WARNING[23280]: taskprocessor.c:803 taskprocessor_push:
The 'SIP' task processor queue reached 500 scheduled tasks.

I have not seen a configuration setting on Asterisk to prevent these
warnings from occurring (I'm trying to avoid modifying Asterisk source code
if possible). Looking at the task processors, I see the queue to the stasis
app bottlenecks:
subm:devService-test-00384560990  0
1041689. It does clear up relatively quickly. The CDR engine also
bottlenecks (extremely badly), but I don't use that. Nothing else comes close to
having a large queue.
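
The per-taskprocessor counters above come from the Asterisk CLI; the listing
I'm reading is, if I recall the command correctly:

    *CLI> core show taskprocessors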

The stasis app itself is extremely streamlined and is very capable of
handling a large number of messages at a time. The app runs with the JVM so
I am also researching into that as well as the netty library I am using for
the websocket connections.

Any insight into Asterisk's side of the equation and how it scales on 40
vCPUs would be greatly appreciated.

Thanks,


*Michael J. Petruzzello*
Software Engineer
P.O. Box 4689
Greenwich, CT 06831
203-618-1811 ext.289 (office)
www.civi.com