-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviewboard.asterisk.org/r/3568/
-----------------------------------------------------------

Review request for Asterisk Developers.


Bugs: ASTERISK-23811
    https://issues.asterisk.org/jira/browse/ASTERISK-23811


Repository: Asterisk


Description
-------

During some performance testing of Asterisk with AGI, ARI, and lots of Local 
channels, we noticed that there's quite a hit in performance during channel 
creation and releasing to the dialplan (ARI continue). After investigating the 
performance spike that occurs during channel creation, we discovered that we 
create a lot of channel snapshots that are technically unnecessary. This 
includes creating snapshots during:

* AGI execution
* Returning objects for ARI commands
* During some Local channel operations
* During some dialling operations
* During variable setting
* During some bridging operations

And more.

This patch does the following:
 - It removes a number of fields from channel snapshots. These fields were 
rarely used, were expensive to have on the snapshot, and hurt performance. This 
included formats, translation paths, Log Call ID, callgroup, pickup group, and 
all channel variables. As a result, AMI Status, "core show channel", "core show 
channelvar", and "pjsip show channel" were modified to either hit the live 
channel or not show certain pieces of data. While this is unfortunate, the 
performance gain from this patch is worth the loss in behaviour.
 - It adds a mechanism to publish a cached snapshot + blob. A large number of 
publications were changed to use this, including:
   - During Dial begin
   - During Variable assignment (if no AMI variables are emitted - if AMI 
variables are set, we have to make snapshots when a variable is changed)
   - During channel pickup
   - When a channel is put on hold/unhold
   - When a DTMF digit is begun/ended
   - When creating a bridge snapshot
   - When an AOC event is raised
   - During Local channel optimization/Local bridging
   - When endpoint snapshots are generated
   - All AGI events
   - All ARI responses that return a channel
   - Events in the AgentPool, MeetMe, and some in Queue
 - Additionally, some extraneous channel snapshots were being made that were 
unnecessary. These were removed.
 - The result of ast_hashtab_hash_string is now cached in stasis_cache. This 
reduces a large number of calls to ast_hashtab_hash_string, which reduced the 
amount of time spent in this function in gprof by around 50%.


Diffs
-----

  /branches/12/res/res_pjsip/pjsip_configuration.c 414971 
  /branches/12/res/res_agi.c 414971 
  /branches/12/res/ari/resource_channels.c 414971 
  /branches/12/main/stasis_channels.c 414971 
  /branches/12/main/stasis_cache.c 414971 
  /branches/12/main/stasis_bridges.c 414971 
  /branches/12/main/pickup.c 414971 
  /branches/12/main/pbx.c 414971 
  /branches/12/main/manager.c 414971 
  /branches/12/main/endpoints.c 414971 
  /branches/12/main/dial.c 414971 
  /branches/12/main/core_local.c 414971 
  /branches/12/main/cli.c 414971 
  /branches/12/main/channel_internal_api.c 414971 
  /branches/12/main/channel.c 414971 
  /branches/12/main/cel.c 414971 
  /branches/12/main/bridge_channel.c 414971 
  /branches/12/main/aoc.c 414971 
  /branches/12/include/asterisk/stasis_channels.h 414971 
  /branches/12/include/asterisk/channel.h 414971 
  /branches/12/apps/app_queue.c 414971 
  /branches/12/apps/app_meetme.c 414971 
  /branches/12/apps/app_agent_pool.c 414971 

Diff: https://reviewboard.asterisk.org/r/3568/diff/


Testing
-------

All sorts of different testing was done. For automated tests that cover these 
changes, see https://bamboo.asterisk.org/bamboo/browse/ATB-TEAMSP-6. Note that 
the test failures on that branch are currently also failing on the 12 
branch/trunk. A few new tests were written and two Parking test's CDR results 
were tweaked (but not in a fashion that violates the spec or the intent of the 
records) - for those changes, see: https://reviewboard.asterisk.org/r/3578/

The first round of testing was profiling Asterisk with gprof to see if I could 
reduce the amount of time spent creating channel snapshots. This was tested 
using an ARI script that created a Local channel, subscribed to it, then 
released that channel to the dialplan. The released Local channel would Dial 10 
or so Local channels, which went back into the Stasis application. This 
increased the number of Local channels rapidly on the system - letting this run 
for awhile (until hilarity ensued) was a good mechanism to test the affects of 
removing channel snapshots.

Once snapshot creation was no longer easily moved down the gprof list, I went 
ahead and did a bit of investigation on the effects of this patch on CPU using 
just the dialplan. The fact that snapshots were getting made a lot during 
AGI/ARI was clearly causing problems, but since dialplan execution still has to 
create a snapshot, I was curious what effect this patch would have on more 
"traditional" scenarios. For each of these rounds of tests, I let Asterisk run 
with SIPp for about 3000 channels. Watching top, I recorded the min/max that 
showed up during that period of time. Not exactly scientific - but it gives a 
general impression of the effect of the changes.

** Note that wherever I reference CPU, it should be noted that this was run on 
my laptop (which is rather old) while running Firefox, Pidgin, and a whole host 
of other stuff. I wasn't attempting to run a true performance analysis; 
instead, I was just trying to get a relative measurement of things with and 
without these changes. **

** ROUND 1 **

The first used an inbound PJSIP channel that had a variety of variables set on 
it, and then got hung up:

Dialplan:

exten => test_sip,1,NoOp()
 same => n,Set(var1=foo)
 same => n,Set(var2=bar)
 same => n,Set(var3=yackity)
 same => n,Set(var4=schmackity)
 same => n,Answer()
 same => n,Echo()

SIPp:

sipp 127.0.0.1:5060 -sn uac -i 127.0.0.1 -p 5061 -s test_sip -d 5000

With this patch, CPU was around 9-10%; without this patch, around 10-11%. So 
not much difference. This isn't too surprising: there's no dialling, there's no 
interfaces being used, and very few changes being made to the channel. Dialplan 
execution is the same with/without this patch - the largest 'savings' here is 
the usage of cached snapshots during variable assignment.

** ROUND 2 **

This round threw in a Local channel and Dial operation. There's a slightly 
larger improvement here with/without the patch:

Dialplan 2:

exten => test_dial_sip,1,NoOp()
 same => n,Set(var1=foo)
 same => n,Set(var2=bar)
 same => n,Set(var3=yackity)
 same => n,Set(var4=schmackity)
 same => n,Dial(Local/term@default)
 same => n,Echo()

exten => term,1,NoOp()
 same => n,Set(var1=foo)
 same => n,Set(var2=bar)
 same => n,Set(var3=yackity)
 same => n,Set(var4=schmackity)
 same => n,Answer()
 same => n,Echo()

With this patch, CPU was around 14-18%; without this patch, around 16-21%. 
Saving a few snapshots during Local channel "bridging", dialling, and setting 
variables on them shaves 2-3% off the CPU.

** ROUND 3 **

Same as round 2, only with a parallel Dial of multiple Local channels. 
Improvement was actually about the same as Round 2; with the patch, around 
19-23% CPU, without the patch, around 22-26% CPU.

Dialplan 3:

exten => test_dial_sip,1,NoOp()
 same => n,Set(var1=foo)
 same => n,Set(var2=bar)
 same => n,Set(var3=yackity)
 same => n,Set(var4=schmackity)
 same => 
n,Dial(Local/term@default&Local/term@default&Local/term@default&Local/term@default&Local/term@default)
 same => n,Echo()

exten => term,1,NoOp()
 same => n,Set(var1=foo)
 same => n,Set(var2=bar)
 same => n,Set(var3=yackity)
 same => n,Set(var4=schmackity)
 same => n,Answer()
 same => n,Echo()


There's a few things to draw from all of this:
(1) Snapshots aren't free. If you can use a cached version, use it. If you 
can't, stage the snapshot until all attributes have been changed (which we do 
quite frequently; this patch did not need to modify any of that behaviour).
(2) Snapshots should be lean. Don't put something on there that doesn't need to 
be there. While there is benefit in not hitting live data to get information 
(reduces the interference of inspection commands in APIs on real time 
operations), it isn't worth hurting the performance of the overall system.
(3) We should pay attention to the cost of things in astobj2. Shifting the 
burden from creating snapshots to cached ones means the Stasis cache - which 
uses ao2 heavily - has to have good performance.
(4) It would be worthwhile to see if we can't optimize creation of snapshots 
further. Often, when we create a snapshot, we only modify one or two properties 
on it. Usage of memory pools, fixed sized fields with memcpy, and other 
techniques _may_ improve performance further.


Thanks,

Matt Jordan

-- 
_____________________________________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev

Reply via email to