Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

2016-03-02 Thread George Joseph
On Wed, Mar 2, 2016 at 12:51 PM, Michael Ulitskiy 
wrote:

> Hello,
>
>
>
> Since you started to look at it again, let me repeat myself.
>
> The problem is described in detail here:
> http://lists.digium.com/pipermail/asterisk-dev/2015-October/075128.html
>
> It has to do with the fact that at initial load pjsip realtime issues
> separate db query for each endpoint/aor/etc in the system.
>
> In my case of ~10K endpoints it took asterisk ~1.5minutes to load.
>
> Further in that discussion I suggested that having the following API call
> to populate sorcery cache would go a long way to
>
> reducing the scale of this problem:
>
>
>
> ast_sorcery_retrieve_by_fields(sip_sorcery,
> "endpoint",AST_RETRIEVE_FLAG_MULTIPLE | AST_RETRIEVE_FLAG_ALL, NULL);
>
>
>
> I haven't looked at pjsip since the time of that discussion as that's
> clearly a show-stopper for me, but I doubt anything changed.
>
> Also I haven't received any feedback if that suggestion is viable, so I'd
> love to hear your (and/or other developers) opinion on it.
>
> Any other idea on how to deal with it is more than welcome as well.
>

​So part of this I just fixed in review 2312.  The cache is now populated
at startup.  Together with full_backend_cache there should be *some*
relief.

One of the base issues however is that we can't use the power of SQL to
narrow down the result set before shipping them all back to Asterisk
because not all the backends support SQL.  If they did, we could, for
instance, 'select * from ps_endpoints a, ps_aors b where a.id = b.id and
b.qualify_frequency > 0' (well almost) to get only the endpoints that need
to be scheduled.  Or 'select * from ps_endpoints where id in
('user','user@domain')'.  We just can't do that right now.
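To make the idea concrete, here is a minimal sketch of narrowing the result set inside the database instead of shipping every row back. It uses Python's built-in sqlite3 with a deliberately simplified, hypothetical schema (two columns per table, endpoints joined to AORs on a shared id, which as said above is only almost right). It is not how sorcery or any realtime backend actually issues queries, and the helper names are made up for illustration.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a toy ps_endpoints/ps_aors schema (assumed, simplified)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ps_endpoints (id TEXT PRIMARY KEY, context TEXT)")
    conn.execute("CREATE TABLE ps_aors (id TEXT PRIMARY KEY, qualify_frequency INTEGER)")
    # 1000 endpoints, of which only every 100th has qualify enabled.
    for n in range(1000):
        name = f"user{n}"
        freq = 60 if n % 100 == 0 else 0
        conn.execute("INSERT INTO ps_endpoints VALUES (?, 'default')", (name,))
        conn.execute("INSERT INTO ps_aors VALUES (?, ?)", (name, freq))
    conn.commit()
    return conn

def endpoints_needing_qualify(conn):
    """Let the database filter: return only endpoints whose AOR has qualify_frequency > 0."""
    rows = conn.execute(
        "SELECT a.id FROM ps_endpoints a JOIN ps_aors b ON a.id = b.id "
        "WHERE b.qualify_frequency > 0 ORDER BY a.id"
    )
    return [r[0] for r in rows]

def endpoints_by_id(conn, ids):
    """Fetch a specific set of endpoints with a single IN (...) query."""
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id FROM ps_endpoints WHERE id IN ({placeholders}) ORDER BY id",
        list(ids),
    )
    return [r[0] for r in rows]

conn = build_demo_db()
to_qualify = endpoints_needing_qualify(conn)
print(len(to_qualify), "of 1000 endpoints shipped back")
```

The point of the sketch is only that the filtering happens before any rows leave the database; a backend that cannot run SQL has no equivalent of the WHERE clause.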

All is not lost however.  We've been noodling with some ideas on how to
make this work more efficiently but it's not something that's going to
happen this week, :)


>
>
> Thanks,
>
> Michael
>
>
>
>
>
-- 
_
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev

[asterisk-dev] dahdi-tools 2.11.1 : fix for relocation error on build

2016-03-02 Thread sean darcy

Building dahdi-tools 2.11.1:

gcc -DHAVE_CONFIG_H -I. -g -Wall -O2 
-I/home/asterisk/rpmbuild/BUILD/dahdi-tools-2.11.1/include  -c -o 
fxstest.o fxstest.c
/bin/sh ./libtool  --tag=CC   --mode=link gcc  -g -Wall -O2 
-I/home/asterisk/rpmbuild/BUILD/dahdi-tools-2.11.1/include 
-Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -o fxstest 
fxstest.o libtonezone.la -lpthread -lm
libtool: link: gcc -g -Wall -O2 
-I/home/asterisk/rpmbuild/BUILD/dahdi-tools-2.11.1/include -Wl,-z 
-Wl,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -o .libs/fxstest 
fxstest.o  ./.libs/libtonezone.so -lpthread -lm -Wl,-rpath -Wl,/usr/lib64
/usr/bin/ld: fxstest.o: relocation R_X86_64_32 against `.rodata.str1.1' 
can not be used when making a shared object; recompile with -fPIC


Even though:

+ CFLAGS='-O2 -march=native -mtune=native -ftree-vectorize -ffast-math 
-fPIC'

+ export CFLAGS
+ CXXFLAGS='-O2 -march=native -mtune=native -ftree-vectorize -ffast-math 
-fPIC'

+ export CXXFLAGS

This happens because the CFLAGS exported in the environment don't override the
CFLAGS set in the Makefile. Notice there is no -fPIC in the gcc commands above.

The fix: add -fPIC to CFLAGS in Makefile.am, rerun ./bootstrap.sh, and rebuild.

sean





Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

2016-03-02 Thread Michael Ulitskiy
Hello,

Since you started to look at it again, let me repeat myself.
The problem is described in detail here: 
http://lists.digium.com/pipermail/asterisk-dev/2015-October/075128.html
It has to do with the fact that at initial load pjsip realtime issues separate 
db query for each endpoint/aor/etc in the system.
In my case of ~10K endpoints it took asterisk ~1.5minutes to load.
Further in that discussion I suggested that having the following API call to 
populate sorcery cache would go a long way to 
reducing the scale of this problem:

ast_sorcery_retrieve_by_fields(sip_sorcery, 
"endpoint",AST_RETRIEVE_FLAG_MULTIPLE | AST_RETRIEVE_FLAG_ALL, NULL);

I haven't looked at pjsip since the time of that discussion as that's clearly a 
show-stopper for me, but I doubt anything changed.
Also I haven't received any feedback if that suggestion is viable, so I'd love 
to hear your (and/or other developers) opinion on it.
Any other idea on how to deal with it is more than welcome as well.
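As a rough illustration of why one bulk retrieval helps, the sketch below counts backend queries for the two loading patterns. Everything in it is an assumption made for the example: the `CountingBackend` class, the toy single-table schema, and the use of an in-memory sqlite3 database, which has no network round trip at all. It only shows the query counts, not real load times, and it is not sorcery's implementation.

```python
import sqlite3

class CountingBackend:
    """Toy realtime backend that counts how many queries it is asked to run."""

    def __init__(self, num_endpoints):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE ps_endpoints (id TEXT PRIMARY KEY, context TEXT)")
        self.conn.executemany(
            "INSERT INTO ps_endpoints VALUES (?, 'default')",
            [(f"user{n}",) for n in range(num_endpoints)],
        )
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()

def load_one_by_one(backend, ids):
    """Startup pattern described above: one query per endpoint."""
    cache = {}
    for endpoint_id in ids:
        rows = backend.query(
            "SELECT id, context FROM ps_endpoints WHERE id = ?", (endpoint_id,)
        )
        if rows:
            cache[endpoint_id] = rows[0]
    return cache

def load_all_at_once(backend):
    """Suggested pattern: a single retrieve-everything query that fills the cache."""
    return {row[0]: row for row in backend.query("SELECT id, context FROM ps_endpoints")}

ids = [f"user{n}" for n in range(10_000)]

slow = CountingBackend(len(ids))
cache_a = load_one_by_one(slow, ids)

fast = CountingBackend(len(ids))
cache_b = load_all_at_once(fast)

print(slow.queries, "queries vs", fast.queries)
```

Both patterns end up with the same cache contents; with a remote database each of the per-endpoint queries would also pay a round trip.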

Thanks,
Michael


Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

2016-03-02 Thread Ross Beer
Hi George,
 
I've just rolled back to 13.7.2 and the following modules are in the latest git 
repository and not in 13.7.2:
 
res_pjproject.so
res_odbc_transaction.so
res_pjsip_history.so
 
Not sure if any of these would make a difference to the load time?
 
Regards,
 
Ross
 

Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

2016-03-02 Thread Ross Beer
Hi George,
 
I have commented out those lines and it hasn't really improved the load times; it's
still taking 15 mins. It has improved it a little.
 
Regards,
 
Ross
 

Re: [asterisk-dev] app_swift crash asterisk 11.20.0-rc1

2016-03-02 Thread Jeremy Kister

On 3/2/16 6:28 AM, Joshua Colp wrote:

The frame in app_swift should be memset to zeroes to ensure it is
completely clean.



damn, you hit the nail right on the head.  app_swift fixed up.

thank you for your help

--

Jeremy Kister
http://jeremy.kister.net/



[asterisk-dev] Asterisk Crash

2016-03-02 Thread Ross Beer
 Hi,
 
PJSIP just caused a seg fault at the following location:
 
(gdb) bt
#0  __pthread_mutex_unlock_usercnt (mutex=0x0) at pthread_mutex_unlock.c:289
#1  __pthread_mutex_unlock (mutex=0x0) at pthread_mutex_unlock.c:290
#2  0x2b686ac0f203 in pj_mutex_unlock () from /usr/lib64/libpj.so.2
#3  0x2b6868c641f6 in pjsip_dlg_dec_lock () from /usr/lib64/libpjsip.so.2
#4  0x2b68708393f8 in distributor (rdata=0x2b6904021178) at res_pjsip/pjsip_distributor.c:301
#5  0x2b6868c47e16 in pjsip_endpt_process_rx_data () from /usr/lib64/libpjsip.so.2
#6  0x2b6868c4809a in endpt_on_rx_msg () from /usr/lib64/libpjsip.so.2
#7  0x2b6868c50368 in pjsip_tpmgr_receive_packet () from /usr/lib64/libpjsip.so.2
#8  0x2b6868c51e3b in udp_on_read_complete () from /usr/lib64/libpjsip.so.2
#9  0x2b686ac0be96 in ioqueue_dispatch_read_event () from /usr/lib64/libpj.so.2
#10 0x2b686ac0db24 in pj_ioqueue_poll () from /usr/lib64/libpj.so.2
#11 0x2b6868c47b21 in pjsip_endpt_handle_events2 () from /usr/lib64/libpjsip.so.2
#12 0x2b6868c47bcf in pjsip_endpt_handle_events () from /usr/lib64/libpjsip.so.2
#13 0x2b6870823a9c in monitor_thread_exec (endpt=0x0) at res_pjsip.c:3555
#14 0x2b686ac0e88f in thread_main () from /usr/lib64/libpj.so.2
#15 0x2b66fa7e2a51 in start_thread (arg=0x2b68416a5700) at pthread_create.c:301
#16 0x2b66fb67493d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(gdb) bt full
#0  __pthread_mutex_unlock_usercnt (mutex=0x0) at pthread_mutex_unlock.c:289
type = 
#1  __pthread_mutex_unlock (mutex=0x0) at pthread_mutex_unlock.c:290
No locals.
#2  0x2b686ac0f203 in pj_mutex_unlock () from /usr/lib64/libpj.so.2
No symbol table info available.
#3  0x2b6868c641f6 in pjsip_dlg_dec_lock () from /usr/lib64/libpjsip.so.2
No symbol table info available.
#4  0x2b68708393f8 in distributor (rdata=0x2b6904021178) at 
res_pjsip/pjsip_distributor.c:301
dlg = 0x2b68b00f58a8
dist = 0x2b6b893723c8
serializer = 0x2b6b89673578
req_serializer = 0x2b6b89673578
clone = 0x2b69040517e8
#5  0x2b6868c47e16 in pjsip_endpt_process_rx_data () from 
/usr/lib64/libpjsip.so.2
No symbol table info available.
#6  0x2b6868c4809a in endpt_on_rx_msg () from /usr/lib64/libpjsip.so.2
No symbol table info available.
#7  0x2b6868c50368 in pjsip_tpmgr_receive_packet () from 
/usr/lib64/libpjsip.so.2
No symbol table info available.
#8  0x2b6868c51e3b in udp_on_read_complete () from /usr/lib64/libpjsip.so.2
No symbol table info available.
#9  0x2b686ac0be96 in ioqueue_dispatch_read_event () from 
/usr/lib64/libpj.so.2
No symbol table info available.
#10 0x2b686ac0db24 in pj_ioqueue_poll () from /usr/lib64/libpj.so.2
No symbol table info available.
#11 0x2b6868c47b21 in pjsip_endpt_handle_events2 () from 
/usr/lib64/libpjsip.so.2
No symbol table info available.
#12 0x2b6868c47bcf in pjsip_endpt_handle_events () from 
/usr/lib64/libpjsip.so.2
No symbol table info available.
#13 0x2b6870823a9c in monitor_thread_exec (endpt=0x0) at res_pjsip.c:3555
delay = {sec = 0, msec = 10}
#14 0x2b686ac0e88f in thread_main () from /usr/lib64/libpj.so.2
No symbol table info available.
#15 0x2b66fa7e2a51 in start_thread (arg=0x2b68416a5700) at 
pthread_create.c:301
__res = 
pd = 0x2b68416a5700
now = 
unwind_buf = {cancel_jmp_buf = {{jmp_buf = {47726774081280, 
3565715072785538793, 140727544518944, 47726774081984, 44910560, 3, 
7470192104369774313, 7473162878733999849}, mask_was_saved = 0}}, priv = {pad = 
{0x0, 0x0, 0x0, 0x0}, 
data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
not_first_call = 
pagesize_m1 = 
sp = 
freesize = 
#16 0x2b66fb67493d in clone () at 
../sysdeps/unix/sysv/linux/x86_64/clone.S:115
No locals.
 
Can anyone advise if this is fixed by PJSIP commit 5243?
 
Thanks,
 
Ross
 
 

Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

2016-03-02 Thread George Joseph
On Wed, Mar 2, 2016 at 2:56 AM, Ross Beer  wrote:

> Hi George,
>
> I have re-built the 'c1bf014ea08cf66835a6f000e2bd6c7da588da6b' commit
> and PJSIP and Asterisk hasn't crashed after reload. However it did take 25
> mins to load.
>
> As requested I have opened a ticket for the realtime issue:
>
> https://issues.asterisk.org/jira/browse/ASTERISK-25826
>

​Got it, thanks.​


>
>
> Basically, I think this could be resolved by a configuration option that
> stops sorcery/pjsip loading all peers at start-up as this is not
> needed for the current setup. This has been discussed before on the mailing
> list however it doesn't look like it progressed any further.
>

If you're up for trying something, you can comment out the
qualify_and_schedule_all function in lines 1135-1147 of
res/res_pjsip/pjsip_options.c, then comment out its 2 references on
lines 1245 and 1281.  If that drops your startup times, then we know we're
on the right track.


>
> I would like to thank you for all of your help trying to identify the issue
> and hope that we can resolve it soon.
>

​No worries!​


>
> Kind regards,
>
> Ross
>
> --
> From: george.jos...@fairview5.com
> Date: Tue, 1 Mar 2016 16:27:06 -0700
>
> To: asterisk-dev@lists.digium.com
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
>
>
>
> On Tue, Mar 1, 2016 at 3:07 PM, Ross Beer  wrote:
>
> ok,
>
> That took 15 mins to load and then crashed. This will be due to the
> pjsip_dlg_create_uas_and_inc_lock commit.
>
>
> ​It should not have crashed.  That commit had the fix for it.  If it did
> crash with that commit, open a Jira issue and ​attach a full backtrace.
>
>
>
>
> However 15 mins to start is a long time and would cause issues in a
> production environment.
>
>
> ​Would you open a Jira issue on the realtime problem (if one isn't already
> open).
> I'm starting to look at alternatives.
>
>
>
>
> Thank you for your help here,
>
> Ross
>
>
> --
> From: george.jos...@fairview5.com
> Date: Tue, 1 Mar 2016 14:02:38 -0700
>
> To: asterisk-dev@lists.digium.com
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
>
>
>
> On Tue, Mar 1, 2016 at 1:04 PM, Ross Beer  wrote:
>
> Hi George,
>
> Using a development test box for testing!!
>
> Asterisk 13.7.2 with no cache takes 4:12 to load, that with PJSIP Commit
> 5240
>
>
> ​Ok, try this combination...
> "git checkout c1bf014ea08cf66835a6f000e2bd6c7da588da6b"
> pjproject from trunk.
> with caching.
>
> The commit I referenced is the one that handles the
> pjsip_dlg_create_uas_and_inc_lock
> ​
>
>
>
>
>
>
> Qualify time on the aor is set to zero, I guess a query could be made to
> check for a value greater than zero instead of loading all endpoints.
>
> Ross
>
> --
> From: george.jos...@fairview5.com
> Date: Tue, 1 Mar 2016 12:45:28 -0700
>
> To: asterisk-dev@lists.digium.com
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
>
>
>
> On Tue, Mar 1, 2016 at 12:21 PM, Ross Beer  wrote:
>
> Hi George,
>
> No endpoints are qualified, there are 20,000 endpoints with only 75 static
> contacts defined in the aors. The database is a MySQL cluster.
>
> With the current Asterisk 13 branch with cache disabled and the latest
> PJSIP it takes 5 mins and then before finishing it crashes.
>
> With Asterisk 13.7.2 with cache it takes around 1 1/2 min to load, however
> due to the bug with PJSIP Commit 5241 asterisk crashes when using TLS
> devices.
>
>
> ​Try 13.7.2 without the cache.  I'm trying to understand where the time is
> being spent.​  I know it will crash because of that bug.  You're not doing
> this on a production system are you??
>
>
>
> The main issue here is that the endpoints are loaded as soon as PJSIP
> loads, ideally endpoints would only be loaded once a device registers or
> attempts to make a call. Much in the same way as Asterisk 1.8 chan_sip
> manages realtime.
>
> There is no need to load the endpoints as they are not qualified.
>
>
> ​How do you know they're not qualified if you don't load them? :)
>
> Time to load up a database with 20,000 endpoints I guess.​
>
>
>
> Ross
>
> --
> From: george.jos...@fairview5.com
> Date: Tue, 1 Mar 2016 11:58:15 -0700
> To: asterisk-dev@lists.digium.com
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
>
>
>
> On Tue, Mar 1, 2016 at 11:38 AM, Michael Ulitskiy 
> wrote:
>
> Hello,
>
>
>
> Please see this discussion
> http://lists.digium.com/pipermail/asterisk-dev/2015-October/075122.html
>
> I guess you're talking about the same problem.
>
>
> ​It's possible.​
>
>
>
>
>
> Michael
>
>
>
> On Tuesday, March 01, 2016 06:26:27 PM Ross Beer wrote:
>
> > Hi George,
>
> >
>
> > We need to store contacts in realtime for our system. However not all
> endpoints are registered only about 

Re: [asterisk-dev] app_swift crash asterisk 11.20.0-rc1

2016-03-02 Thread Joshua Colp

Jeremy Kister wrote:

On 3/1/2016 8:21 PM, Jeremy Kister wrote:

FYI, the problem is caused by the translate.c changes in commit
c7f7c7c35d

https://code.asterisk.org/code/changelog/asterisk?cs=c7f8c8c35db2fe1c4ce9f27c4a28649452dc5463



[can't keep up with myself here--]

the exact changes causing pain appear on lines 516-521

reverting back to 'framein(p,out);' on line 515 makes app_swift happy again.

/me begs.


This has probably exposed a bug in app_swift where the Asterisk frame it
generates is not completely zeroed out. As a result, depending on the
state of the memory, it may be treated as a chain of frames when
really it is not. This causes the above change to try to translate the
rest of the chain and, since that points to nothing, it crashes.


The frame in app_swift should be memset to zeroes to ensure it is 
completely clean.
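A small sketch of the failure mode and the fix, written in Python with ctypes so it can be run anywhere. The `Frame` structure here is a hypothetical, much-simplified stand-in, not the real Asterisk frame layout, and the garbage fill merely simulates whatever happened to be in memory. The only real calls used are `ctypes.memset`, `ctypes.addressof` and `ctypes.sizeof`.

```python
import ctypes

class Frame(ctypes.Structure):
    """Hypothetical, much-simplified stand-in for an Asterisk frame."""
    _fields_ = [
        ("frametype", ctypes.c_int),
        ("datalen", ctypes.c_int),
        ("data", ctypes.c_void_p),
        ("next", ctypes.c_void_p),  # link to the next frame in a chain, NULL if none
    ]

def has_chained_frame(frame):
    """A consumer that treats any non-NULL link as 'there is another frame'."""
    return frame.next is not None

# Simulate uninitialized memory by filling the backing storage with garbage bytes.
storage = bytearray(b"\xAA" * ctypes.sizeof(Frame))
frame = Frame.from_buffer(storage)
garbage_looks_chained = has_chained_frame(frame)

# Zero the whole struct first, then fill in only the fields that matter.
ctypes.memset(ctypes.addressof(frame), 0, ctypes.sizeof(frame))
frame.frametype = 2   # arbitrary illustrative values
frame.datalen = 160
zeroed_looks_chained = has_chained_frame(frame)

print(garbage_looks_chained, zeroed_looks_chained)
```

In C the equivalent is a single memset over the frame before any fields are assigned, so every field the module does not set explicitly is left at zero.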


Cheers,

--
Joshua Colp
Digium, Inc. | Senior Software Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - US
Check us out at: www.digium.com & www.asterisk.org




Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

2016-03-02 Thread Ross Beer
Hi George,
 
I have re-built the 'c1bf014ea08cf66835a6f000e2bd6c7da588da6b' commit and PJSIP 
and Asterisk hasn't crashed after reload. However it did take 25 mins to load.
 
As requested I have opened a ticket for the realtime issue:
 
https://issues.asterisk.org/jira/browse/ASTERISK-25826
 
Basically, I think this could be resolved by a configuration option that stops 
sorcery/pjsip loading all peers at start-up as this is not needed for the 
current setup. This has been discussed before on the mailing list however it 
doesn't look like it progressed any further.
 
I would like to thank you for all of your help trying to identify the issue and 
hope that we can resolve it soon.
 
Kind regards,
 
Ross
 
From: george.jos...@fairview5.com
Date: Tue, 1 Mar 2016 16:27:06 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241



On Tue, Mar 1, 2016 at 3:07 PM, Ross Beer  wrote:



ok,
 
That took 15 mins to load and then crashed. This will be due to the 
pjsip_dlg_create_uas_and_inc_lock commit.
​It should not have crashed.  That commit had the fix for it.  If it did crash 
with that commit, open a Jira issue and ​attach a full backtrace. 
 
However 15 mins to start is a long time and would cause issues in a production 
environment.
Would you open a Jira issue on the realtime problem (if one isn't already 
open).
I'm starting to look at alternatives.


 
Thank you for your help here,
 
Ross

 
From: george.jos...@fairview5.com
Date: Tue, 1 Mar 2016 14:02:38 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241



On Tue, Mar 1, 2016 at 1:04 PM, Ross Beer  wrote:



Hi George,
 
Using a development test box for testing!!
 
Asterisk 13.7.2 with no cache takes 4:12 to load, that with PJSIP Commit 5240

Ok, try this combination...
"git checkout c1bf014ea08cf66835a6f000e2bd6c7da588da6b"
pjproject from trunk.
with caching.
The commit I referenced is the one that handles the 
pjsip_dlg_create_uas_and_inc_lock


  
Qualify time on the aor is set to zero, I guess a query could be made to check 
for a value greater than zero instead of loading all endpoints.
 
Ross
 
From: george.jos...@fairview5.com
Date: Tue, 1 Mar 2016 12:45:28 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241



On Tue, Mar 1, 2016 at 12:21 PM, Ross Beer  wrote:



Hi George,
 
No endpoints are qualified, there are 20,000 endpoints with only 75 static 
contacts defined in the aors. The database is a MySQL cluster.
 
With the current Asterisk 13 branch with cache disabled and the latest PJSIP it 
takes 5 mins and then before finishing it crashes.
 
With Asterisk 13.7.2 with cache it takes around 1 1/2 min to load, however due 
to the bug with PJSIP Commit 5241 asterisk crashes when using TLS devices.

​Try 13.7.2 without the cache.  I'm trying to understand where the time is 
being spent.​  I know it will crash because of that bug.  You're not doing this 
on a production system are you??  
The main issue here is that the endpoints are loaded as soon as PJSIP loads, 
ideally endpoints would only be loaded once a device registers or attempts to 
make a call. Much in the same way as Asterisk 1.8 chan_sip manages realtime.
 
There is no need to load the endpoints as they are not qualified.
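The load-on-first-use behaviour being asked for can be sketched as a small cache that only touches the backend when an endpoint is actually requested. This is an illustration under stated assumptions: `LazyEndpointCache` and the in-memory `database` dict are hypothetical stand-ins, not sorcery or chan_sip code, and the sketch deliberately ignores qualify scheduling, which is exactly the objection raised in the reply that follows.

```python
class LazyEndpointCache:
    """Load an endpoint from the backend only the first time it is asked for."""

    def __init__(self, fetch_endpoint):
        self._fetch = fetch_endpoint   # callable: endpoint id -> record or None
        self._cache = {}
        self.backend_hits = 0

    def get(self, endpoint_id):
        if endpoint_id not in self._cache:
            self.backend_hits += 1
            self._cache[endpoint_id] = self._fetch(endpoint_id)
        return self._cache[endpoint_id]

# Stand-in for the realtime database: 20,000 defined endpoints.
database = {
    f"user{n}": {"id": f"user{n}", "qualify_frequency": 0} for n in range(20_000)
}
cache = LazyEndpointCache(database.get)

# Only devices that register or place a call trigger a lookup.
for n in range(200):
    cache.get(f"user{n}")
cache.get("user0")  # served from the cache, no second backend hit

print(cache.backend_hits, "backend lookups for", len(database), "defined endpoints")
```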

​How do you know they're not qualified if you don't load them? :)
Time to load up a database with 20,000 endpoints I guess.​  
Ross
 
From: george.jos...@fairview5.com
Date: Tue, 1 Mar 2016 11:58:15 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241



On Tue, Mar 1, 2016 at 11:38 AM, Michael Ulitskiy  wrote:


Hello,
 
Please see this discussion 
http://lists.digium.com/pipermail/asterisk-dev/2015-October/075122.html
I guess you're talking about the same problem.
​It's possible.​
 

 
Michael
 
On Tuesday, March 01, 2016 06:26:27 PM Ross Beer wrote:
> Hi George,
>  
> We need to store contacts in realtime for our system. However not all 
> endpoints are registered only about 200, yet asterisk loops through every 
> endpoint which has been defined. It does this if contacts are in realtime or 
> not.
>  
> It's almost like pjsip is loading them to check if they need to be qualified 
> etc.
>  
> Asterisk 1.8 only put things into cache once they were accessed, is this an 
> option for sorcery?
Well, in order to initiate qualify of contacts, Asterisk does have to "access" 
them all so I'm not quite sure what the problem is.
Can we reset to a known config and see what happens?

pjproject from the published 2.4.5 tarball.
Asterisk from the published 13.7.2 tarball.
Disable memory_cache altogether in sorcery.conf.

See what happens.
Give me an estimate of how many endpoints and aors there are in the database, 
how many of those aors have static contacts