Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
On Wed, Mar 2, 2016 at 12:51 PM, Michael Ulitskiy wrote:

> Hello,
>
> Since you started to look at it again, let me repeat myself.
> The problem is described in detail here:
> http://lists.digium.com/pipermail/asterisk-dev/2015-October/075128.html
>
> It has to do with the fact that at initial load pjsip realtime issues a
> separate db query for each endpoint/aor/etc in the system.
> In my case of ~10K endpoints it took asterisk ~1.5 minutes to load.
>
> Further in that discussion I suggested that having the following API call
> to populate sorcery cache would go a long way to reducing the scale of
> this problem:
>
> ast_sorcery_retrieve_by_fields(sip_sorcery, "endpoint",
>     AST_RETRIEVE_FLAG_MULTIPLE | AST_RETRIEVE_FLAG_ALL, NULL);
>
> I haven't looked at pjsip since the time of that discussion as that's
> clearly a show-stopper for me, but I doubt anything changed.
> Also I haven't received any feedback if that suggestion is viable, so I'd
> love to hear your (and/or other developers) opinion on it.
> Any other idea on how to deal with it is more than welcome as well.

So part of this I just fixed in review 2312. The cache is now populated at startup. Together with full_backend_cache there should be *some* relief.

One of the base issues however is that we can't use the power of SQL to narrow down the result set before shipping them all back to Asterisk because not all the backends support SQL. If they did, we could, for instance, 'select * from ps_endpoints a, ps_aors b where a.id = b.id and b.qualify_frequency > 0' (well almost) to get only the endpoints that need to be scheduled. Or 'select * from ps_endpoints where id in ('user','user@domain')'. We just can't do that right now.

All is not lost however. We've been noodling with some ideas on how to make this work more efficiently but it's not something that's going to happen this week, :)

> Thanks,
>
> Michael

--
-- Bandwidth and Colocation Provided by http://www.api-digital.com --
asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev
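The contrast discussed in the message above, one backend query per object versus a single bulk fetch that is then filtered inside Asterisk, can be sketched in C roughly as follows. This is only an illustration under stated assumptions: demo_endpoint, demo_backend and every demo_* function are hypothetical stand-ins that merely count round trips. None of them are sorcery or realtime APIs, and the real code paths are not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one row of an endpoint/aor table. */
struct demo_endpoint {
    int id;
    int qualify_frequency; /* 0 means the object is never qualified */
};

/* Hypothetical backend that only counts how many round trips it served. */
struct demo_backend {
    const struct demo_endpoint *rows;
    size_t row_count;
    size_t queries;
};

/* One round trip per requested object. */
static const struct demo_endpoint *demo_fetch_one(struct demo_backend *be, int id)
{
    size_t i;

    be->queries++;
    for (i = 0; i < be->row_count; i++) {
        if (be->rows[i].id == id) {
            return &be->rows[i];
        }
    }
    return NULL;
}

/* One round trip that ships every row back to the caller. */
static size_t demo_fetch_all(struct demo_backend *be, const struct demo_endpoint **out)
{
    be->queries++;
    *out = be->rows;
    return be->row_count;
}

/* Per-object pattern: n ids cost n queries. */
size_t demo_count_qualified_per_object(struct demo_backend *be, const int *ids, size_t n)
{
    size_t i, qualified = 0;

    assert(be != NULL && ids != NULL);
    for (i = 0; i < n; i++) {
        const struct demo_endpoint *ep = demo_fetch_one(be, ids[i]);
        if (ep != NULL && ep->qualify_frequency > 0) {
            qualified++;
        }
    }
    return qualified;
}

/* Bulk pattern: one query, then the filtering happens in-process. */
size_t demo_count_qualified_bulk(struct demo_backend *be)
{
    const struct demo_endpoint *rows = NULL;
    size_t i, n, qualified = 0;

    assert(be != NULL);
    n = demo_fetch_all(be, &rows);
    for (i = 0; i < n; i++) {
        if (rows[i].qualify_frequency > 0) {
            qualified++;
        }
    }
    return qualified;
}
```

Both functions return the same count; they differ only in how many times the backend is hit. Note that the bulk variant still ships every row back before filtering, which is the limitation described above: narrowing the result set in SQL would avoid that, but not all backends support SQL.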
[asterisk-dev] dahdi-tools 2.11.1 : fix for relocation error on build
Building dahdi-tools 2.11.1:

gcc -DHAVE_CONFIG_H -I. -g -Wall -O2 -I/home/asterisk/rpmbuild/BUILD/dahdi-tools-2.11.1/include -c -o fxstest.o fxstest.c
/bin/sh ./libtool --tag=CC --mode=link gcc -g -Wall -O2 -I/home/asterisk/rpmbuild/BUILD/dahdi-tools-2.11.1/include -Wl,-z,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -o fxstest fxstest.o libtonezone.la -lpthread -lm
libtool: link: gcc -g -Wall -O2 -I/home/asterisk/rpmbuild/BUILD/dahdi-tools-2.11.1/include -Wl,-z -Wl,relro -specs=/usr/lib/rpm/redhat/redhat-hardened-ld -o .libs/fxstest fxstest.o ./.libs/libtonezone.so -lpthread -lm -Wl,-rpath -Wl,/usr/lib64
/usr/bin/ld: fxstest.o: relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; recompile with -fPIC

This happens even though the build environment sets:

+ CFLAGS='-O2 -march=native -mtune=native -ftree-vectorize -ffast-math -fPIC'
+ export CFLAGS
+ CXXFLAGS='-O2 -march=native -mtune=native -ftree-vectorize -ffast-math -fPIC'
+ export CXXFLAGS

because the exported CFLAGS doesn't override the Makefile CFLAGS. Notice there is no -fPIC in the gcc lines above.

The fix: add -fPIC to CFLAGS in Makefile.am, run ./bootstrap.sh, then build.

sean
Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
Hello,

Since you started to look at it again, let me repeat myself.
The problem is described in detail here:
http://lists.digium.com/pipermail/asterisk-dev/2015-October/075128.html

It has to do with the fact that at initial load pjsip realtime issues a separate db query for each endpoint/aor/etc in the system.
In my case of ~10K endpoints it took asterisk ~1.5 minutes to load.

Further in that discussion I suggested that having the following API call to populate sorcery cache would go a long way to reducing the scale of this problem:

ast_sorcery_retrieve_by_fields(sip_sorcery, "endpoint", AST_RETRIEVE_FLAG_MULTIPLE | AST_RETRIEVE_FLAG_ALL, NULL);

I haven't looked at pjsip since the time of that discussion as that's clearly a show-stopper for me, but I doubt anything changed.
Also I haven't received any feedback if that suggestion is viable, so I'd love to hear your (and/or other developers) opinion on it.
Any other idea on how to deal with it is more than welcome as well.

Thanks,

Michael

On Wednesday, March 02, 2016 06:04:15 PM Ross Beer wrote:
> Hi George,
>
> I have commented out those lines and it hasn't improved the load times, it's still taking 15 mins. It has improved it a little.
>
> Regards,
>
> Ross
>
> From: george.jos...@fairview5.com
> Date: Wed, 2 Mar 2016 08:19:01 -0700
> To: asterisk-dev@lists.digium.com
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
>
> On Wed, Mar 2, 2016 at 2:56 AM, Ross Beer wrote:
>
> Hi George,
>
> I have re-built the 'c1bf014ea08cf66835a6f000e2bd6c7da588da6b' commit and PJSIP and Asterisk hasn't crashed after reload. However it did take 25 mins to load.
>
> As requested I have opened a ticket for the realtime issue:
>
> https://issues.asterisk.org/jira/browse/ASTERISK-25826
>
> Got it, thanks.
>
> Basically, I think this could be resolved by a configuration option that stops sorcery/pjsip loading all peers at start-up as this is not needed for the current setup. This has been discussed before on the mailing list however it doesn't look like it progresses any further.
>
> If you're up for trying something, you can comment out the qualify_and_schedule_all function in lines 1135-1147 of res/res_pjsip/pjsip_options.c, then comment out its 2 references on lines 1245 and 1281. If that drops your startup times, then we know we're on the right track.
>
> I would like to thank you for all of your help trying to identify the issue and hope that we can resolve it soon.
>
> No worries!
>
> Kind regards,
>
> Ross
>
> From: george.jos...@fairview5.com
> Date: Tue, 1 Mar 2016 16:27:06 -0700
> To: asterisk-dev@lists.digium.com
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
>
> On Tue, Mar 1, 2016 at 3:07 PM, Ross Beer wrote:
>
> ok,
>
> That took 15 mins to load and then crashed. This will be due to the pjsip_dlg_create_uas_and_inc_lock commit.
>
> It should not have crashed. That commit had the fix for it. If it did crash with that commit, open a Jira issue and attach a full backtrace.
>
> However 15 mins to start is a long time and would cause issues in a production environment.
>
> Would you open a Jira issue on the realtime problem (if one isn't already open). I'm starting to look at alternatives.
>
> Thank you for your help here,
>
> Ross
>
> From: george.jos...@fairview5.com
> Date: Tue, 1 Mar 2016 14:02:38 -0700
> To: asterisk-dev@lists.digium.com
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
>
> On Tue, Mar 1, 2016 at 1:04 PM, Ross Beer wrote:
>
> Hi George,
>
> Using a development test box for testing!!
>
> Asterisk 13.7.2 with no cache takes 4:12 to load, that with PJSIP Commit 5240
>
> Ok, try this combination...
> "git checkout c1bf014ea08cf66835a6f000e2bd6c7da588da6b"
> pjproject from trunk.
> with caching.
>
> The commit I referenced is the one that handles the pjsip_dlg_create_uas_and_inc_lock
>
> Qualify time on the aor is set to zero, I guess a query could be made to check for a value greater than zero instead of loading all endpoints.
>
> Ross
>
> From: george.jos...@fairview5.com
> Date: Tue, 1 Mar 2016 12:45:28 -0700
> To: asterisk-dev@lists.digium.com
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
>
> On Tue, Mar 1, 2016 at 12:21 PM, Ross Beer wrote:
>
> Hi George,
>
> No endpoints are qualified, there are 20,000 endpoints with only 75 static contacts defined in the aors. The database is a MySQL cluster.
>
> With the current Asterisk 13 branch with cache disabled and the latest PJSIP it takes 5 mins and then before finishing it crashes.
>
> With Asterisk 13.7.2 with cache it takes around 1 1/2 min to load, however due to the bug with PJSIP Commit 5241 asterisk crashes
Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
Hi George,

I've just rolled back to 13.7.2 and the following modules are in the latest git repository and not in 13.7.2:

res_pjproject.so
res_odbc_transaction.so
res_pjsip_history.so

Not sure if any of these would make a difference to the load time?

Regards,

Ross

From: ross.b...@outlook.com
To: asterisk-dev@lists.digium.com
Date: Wed, 2 Mar 2016 18:04:15 +
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

Hi George,

I have commented out those lines and it hasn't improved the load times, it's still taking 15 mins. It has improved it a little.

Regards,

Ross

From: george.jos...@fairview5.com
Date: Wed, 2 Mar 2016 08:19:01 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

On Wed, Mar 2, 2016 at 2:56 AM, Ross Beer wrote:

Hi George,

I have re-built the 'c1bf014ea08cf66835a6f000e2bd6c7da588da6b' commit and PJSIP and Asterisk hasn't crashed after reload. However it did take 25 mins to load.

As requested I have opened a ticket for the realtime issue:

https://issues.asterisk.org/jira/browse/ASTERISK-25826

Got it, thanks.

Basically, I think this could be resolved by a configuration option that stops sorcery/pjsip loading all peers at start-up as this is not needed for the current setup. This has been discussed before on the mailing list however it doesn't look like it progresses any further.

If you're up for trying something, you can comment out the qualify_and_schedule_all function in lines 1135-1147 of res/res_pjsip/pjsip_options.c, then comment out its 2 references on lines 1245 and 1281. If that drops your startup times, then we know we're on the right track.

I would like to thank you for all of your help trying to identify the issue and hope that we can resolve it soon.

No worries!

Kind regards,

Ross

From: george.jos...@fairview5.com
Date: Tue, 1 Mar 2016 16:27:06 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

On Tue, Mar 1, 2016 at 3:07 PM, Ross Beer wrote:

ok,

That took 15 mins to load and then crashed. This will be due to the pjsip_dlg_create_uas_and_inc_lock commit.

It should not have crashed. That commit had the fix for it. If it did crash with that commit, open a Jira issue and attach a full backtrace.

However 15 mins to start is a long time and would cause issues in a production environment.

Would you open a Jira issue on the realtime problem (if one isn't already open). I'm starting to look at alternatives.

Thank you for your help here,

Ross

From: george.jos...@fairview5.com
Date: Tue, 1 Mar 2016 14:02:38 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

On Tue, Mar 1, 2016 at 1:04 PM, Ross Beer wrote:

Hi George,

Using a development test box for testing!!

Asterisk 13.7.2 with no cache takes 4:12 to load, that with PJSIP Commit 5240

Ok, try this combination...
"git checkout c1bf014ea08cf66835a6f000e2bd6c7da588da6b"
pjproject from trunk.
with caching.

The commit I referenced is the one that handles the pjsip_dlg_create_uas_and_inc_lock

Qualify time on the aor is set to zero, I guess a query could be made to check for a value greater than zero instead of loading all endpoints.

Ross

From: george.jos...@fairview5.com
Date: Tue, 1 Mar 2016 12:45:28 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

On Tue, Mar 1, 2016 at 12:21 PM, Ross Beer wrote:

Hi George,

No endpoints are qualified, there are 20,000 endpoints with only 75 static contacts defined in the aors. The database is a MySQL cluster.

With the current Asterisk 13 branch with cache disabled and the latest PJSIP it takes 5 mins and then before finishing it crashes.

With Asterisk 13.7.2 with cache it takes around 1 1/2 min to load, however due to the bug with PJSIP Commit 5241 asterisk crashes when using TLS devices.

Try 13.7.2 without the cache. I'm trying to understand where the time is being spent. I know it will crash because of that bug. You're not doing this on a production system are you??

The main issue here is that the endpoints are loaded as soon as PJSIP loads, ideally endpoints would only be loaded once a device registers or attempts to make a call. Much in the same way as Asterisk 1.8 chan_sip manages realtime.

There is no need to load the endpoints as they are not qualified.

How do you know they're not qualified if you don't load them? :)

Time to load up a database with 20,000 endpoints I guess.

Ross

From: george.jos...@fairview5.com
Date: Tue, 1 Mar 2016 11:58:15 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

On Tue, Mar 1, 2016 at 11:38 AM, Michael
Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
Hi George,

I have commented out those lines and it hasn't improved the load times, it's still taking 15 mins. It has improved it a little.

Regards,

Ross

From: george.jos...@fairview5.com
Date: Wed, 2 Mar 2016 08:19:01 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

On Wed, Mar 2, 2016 at 2:56 AM, Ross Beer wrote:

Hi George,

I have re-built the 'c1bf014ea08cf66835a6f000e2bd6c7da588da6b' commit and PJSIP and Asterisk hasn't crashed after reload. However it did take 25 mins to load.

As requested I have opened a ticket for the realtime issue:

https://issues.asterisk.org/jira/browse/ASTERISK-25826

Got it, thanks.

Basically, I think this could be resolved by a configuration option that stops sorcery/pjsip loading all peers at start-up as this is not needed for the current setup. This has been discussed before on the mailing list however it doesn't look like it progresses any further.

If you're up for trying something, you can comment out the qualify_and_schedule_all function in lines 1135-1147 of res/res_pjsip/pjsip_options.c, then comment out its 2 references on lines 1245 and 1281. If that drops your startup times, then we know we're on the right track.

I would like to thank you for all of your help trying to identify the issue and hope that we can resolve it soon.

No worries!

Kind regards,

Ross

From: george.jos...@fairview5.com
Date: Tue, 1 Mar 2016 16:27:06 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

On Tue, Mar 1, 2016 at 3:07 PM, Ross Beer wrote:

ok,

That took 15 mins to load and then crashed. This will be due to the pjsip_dlg_create_uas_and_inc_lock commit.

It should not have crashed. That commit had the fix for it. If it did crash with that commit, open a Jira issue and attach a full backtrace.

However 15 mins to start is a long time and would cause issues in a production environment.

Would you open a Jira issue on the realtime problem (if one isn't already open). I'm starting to look at alternatives.

Thank you for your help here,

Ross

From: george.jos...@fairview5.com
Date: Tue, 1 Mar 2016 14:02:38 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

On Tue, Mar 1, 2016 at 1:04 PM, Ross Beer wrote:

Hi George,

Using a development test box for testing!!

Asterisk 13.7.2 with no cache takes 4:12 to load, that with PJSIP Commit 5240

Ok, try this combination...
"git checkout c1bf014ea08cf66835a6f000e2bd6c7da588da6b"
pjproject from trunk.
with caching.

The commit I referenced is the one that handles the pjsip_dlg_create_uas_and_inc_lock

Qualify time on the aor is set to zero, I guess a query could be made to check for a value greater than zero instead of loading all endpoints.

Ross

From: george.jos...@fairview5.com
Date: Tue, 1 Mar 2016 12:45:28 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

On Tue, Mar 1, 2016 at 12:21 PM, Ross Beer wrote:

Hi George,

No endpoints are qualified, there are 20,000 endpoints with only 75 static contacts defined in the aors. The database is a MySQL cluster.

With the current Asterisk 13 branch with cache disabled and the latest PJSIP it takes 5 mins and then before finishing it crashes.

With Asterisk 13.7.2 with cache it takes around 1 1/2 min to load, however due to the bug with PJSIP Commit 5241 asterisk crashes when using TLS devices.

Try 13.7.2 without the cache. I'm trying to understand where the time is being spent. I know it will crash because of that bug. You're not doing this on a production system are you??

The main issue here is that the endpoints are loaded as soon as PJSIP loads, ideally endpoints would only be loaded once a device registers or attempts to make a call. Much in the same way as Asterisk 1.8 chan_sip manages realtime.

There is no need to load the endpoints as they are not qualified.

How do you know they're not qualified if you don't load them? :)

Time to load up a database with 20,000 endpoints I guess.

Ross

From: george.jos...@fairview5.com
Date: Tue, 1 Mar 2016 11:58:15 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

On Tue, Mar 1, 2016 at 11:38 AM, Michael Ulitskiy wrote:

Hello,

Please see this discussion
http://lists.digium.com/pipermail/asterisk-dev/2015-October/075122.html
I guess you're talking about the same problem.

It's possible.

Michael

On Tuesday, March 01, 2016 06:26:27 PM Ross Beer wrote:
> Hi George,
>
> We need to store contacts in realtime for our system. However not all endpoints are registered, only about 200, yet asterisk loops through every
>
Re: [asterisk-dev] app_swift crash asterisk 11.20.0-rc1
On 3/2/16 6:28 AM, Joshua Colp wrote:
> The frame in app_swift should be memset to zeroes to ensure it is
> completely clean.

damn, you hit the nail right on the head. app_swift fixed up.

thank you for your help

--
Jeremy Kister
http://jeremy.kister.net/
[asterisk-dev] Asterisk Crash
Hi,

PJSIP just caused a seg fault at the current location:

(gdb) bt
#0  __pthread_mutex_unlock_usercnt (mutex=0x0) at pthread_mutex_unlock.c:289
#1  __pthread_mutex_unlock (mutex=0x0) at pthread_mutex_unlock.c:290
#2  0x2b686ac0f203 in pj_mutex_unlock () from /usr/lib64/libpj.so.2
#3  0x2b6868c641f6 in pjsip_dlg_dec_lock () from /usr/lib64/libpjsip.so.2
#4  0x2b68708393f8 in distributor (rdata=0x2b6904021178) at res_pjsip/pjsip_distributor.c:301
#5  0x2b6868c47e16 in pjsip_endpt_process_rx_data () from /usr/lib64/libpjsip.so.2
#6  0x2b6868c4809a in endpt_on_rx_msg () from /usr/lib64/libpjsip.so.2
#7  0x2b6868c50368 in pjsip_tpmgr_receive_packet () from /usr/lib64/libpjsip.so.2
#8  0x2b6868c51e3b in udp_on_read_complete () from /usr/lib64/libpjsip.so.2
#9  0x2b686ac0be96 in ioqueue_dispatch_read_event () from /usr/lib64/libpj.so.2
#10 0x2b686ac0db24 in pj_ioqueue_poll () from /usr/lib64/libpj.so.2
#11 0x2b6868c47b21 in pjsip_endpt_handle_events2 () from /usr/lib64/libpjsip.so.2
#12 0x2b6868c47bcf in pjsip_endpt_handle_events () from /usr/lib64/libpjsip.so.2
#13 0x2b6870823a9c in monitor_thread_exec (endpt=0x0) at res_pjsip.c:3555
#14 0x2b686ac0e88f in thread_main () from /usr/lib64/libpj.so.2
#15 0x2b66fa7e2a51 in start_thread (arg=0x2b68416a5700) at pthread_create.c:301
#16 0x2b66fb67493d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

(gdb) bt full
#0  __pthread_mutex_unlock_usercnt (mutex=0x0) at pthread_mutex_unlock.c:289
        type =
#1  __pthread_mutex_unlock (mutex=0x0) at pthread_mutex_unlock.c:290
No locals.
#2  0x2b686ac0f203 in pj_mutex_unlock () from /usr/lib64/libpj.so.2
No symbol table info available.
#3  0x2b6868c641f6 in pjsip_dlg_dec_lock () from /usr/lib64/libpjsip.so.2
No symbol table info available.
#4  0x2b68708393f8 in distributor (rdata=0x2b6904021178) at res_pjsip/pjsip_distributor.c:301
        dlg = 0x2b68b00f58a8
        dist = 0x2b6b893723c8
        serializer = 0x2b6b89673578
        req_serializer = 0x2b6b89673578
        clone = 0x2b69040517e8
#5  0x2b6868c47e16 in pjsip_endpt_process_rx_data () from /usr/lib64/libpjsip.so.2
No symbol table info available.
#6  0x2b6868c4809a in endpt_on_rx_msg () from /usr/lib64/libpjsip.so.2
No symbol table info available.
#7  0x2b6868c50368 in pjsip_tpmgr_receive_packet () from /usr/lib64/libpjsip.so.2
No symbol table info available.
#8  0x2b6868c51e3b in udp_on_read_complete () from /usr/lib64/libpjsip.so.2
No symbol table info available.
#9  0x2b686ac0be96 in ioqueue_dispatch_read_event () from /usr/lib64/libpj.so.2
No symbol table info available.
#10 0x2b686ac0db24 in pj_ioqueue_poll () from /usr/lib64/libpj.so.2
No symbol table info available.
#11 0x2b6868c47b21 in pjsip_endpt_handle_events2 () from /usr/lib64/libpjsip.so.2
No symbol table info available.
#12 0x2b6868c47bcf in pjsip_endpt_handle_events () from /usr/lib64/libpjsip.so.2
No symbol table info available.
#13 0x2b6870823a9c in monitor_thread_exec (endpt=0x0) at res_pjsip.c:3555
        delay = {sec = 0, msec = 10}
#14 0x2b686ac0e88f in thread_main () from /usr/lib64/libpj.so.2
No symbol table info available.
#15 0x2b66fa7e2a51 in start_thread (arg=0x2b68416a5700) at pthread_create.c:301
        __res =
        pd = 0x2b68416a5700
        now =
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {47726774081280, 3565715072785538793, 140727544518944, 47726774081984, 44910560, 3, 7470192104369774313, 7473162878733999849}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call =
        pagesize_m1 =
        sp =
        freesize =
#16 0x2b66fb67493d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
No locals.

Can anyone advise if this is fixed by PJSIP commit 5243?

Thanks,

Ross
Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
On Wed, Mar 2, 2016 at 2:56 AM, Ross Beer wrote:

> Hi George,
>
> I have re-built the 'c1bf014ea08cf66835a6f000e2bd6c7da588da6b' commit
> and PJSIP and Asterisk hasn't crashed after reload. However it did take 25
> mins to load.
>
> As requested I have opened a ticket for the realtime issue:
>
> https://issues.asterisk.org/jira/browse/ASTERISK-25826

Got it, thanks.

> Basically, I think this could be resolved by a configuration option that
> stops sorcery/pjsip loading all peers at start-up as this is not
> needed for the current setup. This has been discussed before on the mailing
> list however it doesn't look like it progresses any further.

If you're up for trying something, you can comment out the qualify_and_schedule_all function in lines 1135-1147 of res/res_pjsip/pjsip_options.c, then comment out its 2 references on lines 1245 and 1281. If that drops your startup times, then we know we're on the right track.

> I would like to thank you for all of your help trying to identify the issue
> and hope that we can resolve it soon.

No worries!

> Kind regards,
>
> Ross
>
> --
> From: george.jos...@fairview5.com
> Date: Tue, 1 Mar 2016 16:27:06 -0700
> To: asterisk-dev@lists.digium.com
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
>
> On Tue, Mar 1, 2016 at 3:07 PM, Ross Beer wrote:
>
> ok,
>
> That took 15 mins to load and then crashed. This will be due to the
> pjsip_dlg_create_uas_and_inc_lock commit.
>
> It should not have crashed. That commit had the fix for it. If it did
> crash with that commit, open a Jira issue and attach a full backtrace.
>
> However 15 mins to start is a long time and would cause issues in a
> production environment.
>
> Would you open a Jira issue on the realtime problem (if one isn't already
> open).
> I'm starting to look at alternatives.
>
> Thank you for your help here,
>
> Ross
>
> --
> From: george.jos...@fairview5.com
> Date: Tue, 1 Mar 2016 14:02:38 -0700
> To: asterisk-dev@lists.digium.com
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
>
> On Tue, Mar 1, 2016 at 1:04 PM, Ross Beer wrote:
>
> Hi George,
>
> Using a development test box for testing!!
>
> Asterisk 13.7.2 with no cache takes 4:12 to load, that with PJSIP Commit
> 5240
>
> Ok, try this combination...
> "git checkout c1bf014ea08cf66835a6f000e2bd6c7da588da6b"
> pjproject from trunk.
> with caching.
>
> The commit I referenced is the one that handles the
> pjsip_dlg_create_uas_and_inc_lock
>
> Qualify time on the aor is set to zero, I guess a query could be made to
> check for a value greater than zero instead of loading all endpoints.
>
> Ross
>
> --
> From: george.jos...@fairview5.com
> Date: Tue, 1 Mar 2016 12:45:28 -0700
> To: asterisk-dev@lists.digium.com
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
>
> On Tue, Mar 1, 2016 at 12:21 PM, Ross Beer wrote:
>
> Hi George,
>
> No endpoints are qualified, there are 20,000 endpoints with only 75 static
> contacts defined in the aors. The database is a MySQL cluster.
>
> With the current Asterisk 13 branch with cache disabled and the latest
> PJSIP it takes 5 mins and then before finishing it crashes.
>
> With Asterisk 13.7.2 with cache it takes around 1 1/2 min to load, however
> due to the bug with PJSIP Commit 5241 asterisk crashes when using TLS
> devices.
>
> Try 13.7.2 without the cache. I'm trying to understand where the time is
> being spent. I know it will crash because of that bug. You're not doing
> this on a production system are you??
>
> The main issue here is that the endpoints are loaded as soon as PJSIP
> loads, ideally endpoints would only be loaded once a device registers or
> attempts to make a call. Much in the same way as Asterisk 1.8 chan_sip
> manages realtime.
>
> There is no need to load the endpoints as they are not qualified.
>
> How do you know they're not qualified if you don't load them? :)
>
> Time to load up a database with 20,000 endpoints I guess.
>
> Ross
>
> --
> From: george.jos...@fairview5.com
> Date: Tue, 1 Mar 2016 11:58:15 -0700
> To: asterisk-dev@lists.digium.com
> Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
>
> On Tue, Mar 1, 2016 at 11:38 AM, Michael Ulitskiy wrote:
>
> > Hello,
> >
> > Please see this discussion
> > http://lists.digium.com/pipermail/asterisk-dev/2015-October/075122.html
> > I guess you're talking about the same problem.
>
> It's possible.
>
> > Michael
> >
> > On Tuesday, March 01, 2016 06:26:27 PM Ross Beer wrote:
> > > Hi George,
> > >
> > > We need to store contacts in realtime for our system. However not all
> > > endpoints are registered only about
Re: [asterisk-dev] app_swift crash asterisk 11.20.0-rc1
Jeremy Kister wrote:
> On 3/1/2016 8:21 PM, Jeremy Kister wrote:
>> FYI, the problem is caused by the translate.c changes in commit c7f7c7c35d
>> https://code.asterisk.org/code/changelog/asterisk?cs=c7f8c8c35db2fe1c4ce9f27c4a28649452dc5463
>
> [can't keep up with myself here--]
> the exact changes causing pain appear on lines 516-521
> reverting back to 'framein(p,out);' on line 515 makes app_swift happy again.
>
> /me begs.

This has probably exposed a bug in app_swift where the Asterisk frame it generates is not completely zeroed out. As a result, depending on the state of the memory, it may think that it is a chain of frames when really there is not. This causes the above change to try to translate it and, since it points to nothing, it crashes.

The frame in app_swift should be memset to zeroes to ensure it is completely clean.

Cheers,

--
Joshua Colp
Digium, Inc. | Senior Software Developer
445 Jan Davis Drive NW - Huntsville, AL 35806 - US
Check us out at: www.digium.com & www.asterisk.org
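As a minimal sketch of the advice in the message above, the C fragment below zeroes a frame-like structure before filling it in, so that a consumer following its chain pointer stops after one frame. The struct demo_frame type and both demo_* functions are hypothetical stand-ins invented for this illustration; they are not Asterisk's struct ast_frame, and app_swift's actual code is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified frame that can be chained to a following frame.
 * A stale non-NULL 'next' makes a consumer believe more frames follow.
 */
struct demo_frame {
    const void *data;
    size_t datalen;
    struct demo_frame *next;
};

/*
 * Prepare a single, unchained frame. The memset comes first so that every
 * field not assigned below, including 'next', starts out as zero.
 */
void demo_frame_init(struct demo_frame *f, const void *data, size_t datalen)
{
    assert(f != NULL);
    memset(f, 0, sizeof(*f));
    f->data = data;
    f->datalen = datalen;
}

/* Walk the chain the way a consumer might, counting the frames it sees. */
size_t demo_frame_chain_length(const struct demo_frame *f)
{
    size_t n = 0;

    for (; f != NULL; f = f->next) {
        n++;
    }
    return n;
}
```

Without the memset, a frame declared on the stack keeps whatever bytes were there before, so the chain pointer may hold garbage and the walk would dereference it, which matches the failure mode described in the message.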
Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241
Hi George,

I have re-built the 'c1bf014ea08cf66835a6f000e2bd6c7da588da6b' commit and PJSIP and Asterisk hasn't crashed after reload. However it did take 25 mins to load.

As requested I have opened a ticket for the realtime issue:

https://issues.asterisk.org/jira/browse/ASTERISK-25826

Basically, I think this could be resolved by a configuration option that stops sorcery/pjsip loading all peers at start-up as this is not needed for the current setup. This has been discussed before on the mailing list however it doesn't look like it progresses any further.

I would like to thank you for all of your help trying to identify the issue and hope that we can resolve it soon.

Kind regards,

Ross

From: george.jos...@fairview5.com
Date: Tue, 1 Mar 2016 16:27:06 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

On Tue, Mar 1, 2016 at 3:07 PM, Ross Beer wrote:

ok,

That took 15 mins to load and then crashed. This will be due to the pjsip_dlg_create_uas_and_inc_lock commit.

It should not have crashed. That commit had the fix for it. If it did crash with that commit, open a Jira issue and attach a full backtrace.

However 15 mins to start is a long time and would cause issues in a production environment.

Would you open a Jira issue on the realtime problem (if one isn't already open). I'm starting to look at alternatives.

Thank you for your help here,

Ross

From: george.jos...@fairview5.com
Date: Tue, 1 Mar 2016 14:02:38 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

On Tue, Mar 1, 2016 at 1:04 PM, Ross Beer wrote:

Hi George,

Using a development test box for testing!!

Asterisk 13.7.2 with no cache takes 4:12 to load, that with PJSIP Commit 5240

Ok, try this combination...
"git checkout c1bf014ea08cf66835a6f000e2bd6c7da588da6b"
pjproject from trunk.
with caching.

The commit I referenced is the one that handles the pjsip_dlg_create_uas_and_inc_lock

Qualify time on the aor is set to zero, I guess a query could be made to check for a value greater than zero instead of loading all endpoints.

Ross

From: george.jos...@fairview5.com
Date: Tue, 1 Mar 2016 12:45:28 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

On Tue, Mar 1, 2016 at 12:21 PM, Ross Beer wrote:

Hi George,

No endpoints are qualified, there are 20,000 endpoints with only 75 static contacts defined in the aors. The database is a MySQL cluster.

With the current Asterisk 13 branch with cache disabled and the latest PJSIP it takes 5 mins and then before finishing it crashes.

With Asterisk 13.7.2 with cache it takes around 1 1/2 min to load, however due to the bug with PJSIP Commit 5241 asterisk crashes when using TLS devices.

Try 13.7.2 without the cache. I'm trying to understand where the time is being spent. I know it will crash because of that bug. You're not doing this on a production system are you??

The main issue here is that the endpoints are loaded as soon as PJSIP loads, ideally endpoints would only be loaded once a device registers or attempts to make a call. Much in the same way as Asterisk 1.8 chan_sip manages realtime.

There is no need to load the endpoints as they are not qualified.

How do you know they're not qualified if you don't load them? :)

Time to load up a database with 20,000 endpoints I guess.

Ross

From: george.jos...@fairview5.com
Date: Tue, 1 Mar 2016 11:58:15 -0700
To: asterisk-dev@lists.digium.com
Subject: Re: [asterisk-dev] Asterisk Segfault After PJSIP Commit 5241

On Tue, Mar 1, 2016 at 11:38 AM, Michael Ulitskiy wrote:

Hello,

Please see this discussion
http://lists.digium.com/pipermail/asterisk-dev/2015-October/075122.html
I guess you're talking about the same problem.

It's possible.

Michael

On Tuesday, March 01, 2016 06:26:27 PM Ross Beer wrote:
> Hi George,
>
> We need to store contacts in realtime for our system. However not all endpoints are registered, only about 200, yet asterisk loops through every endpoint which has been defined. It does this if contacts are in realtime or not.
>
> It's almost like pjsip is loading them to check if they need to be qualified etc.
>
> Asterisk 1.8 only put things into cache once they were accessed, is this an option for sorcery?

Well, in order to initiate qualify of contacts, Asterisk does have to "access" them all so I'm not quite sure what the problem is.

Can we reset to a known config and see what happens?
pjproject from the published 2.4.5 tarball.
Asterisk from the published 13.7.2 tarball.
Disable memory_cache altogether in sorcery.conf.
See what happens.

Give me an estimate of how many endpoints and aors there are in the database, how many of those aors have static contacts