Hi Pavel,

thank you for your answer - it inspired me a lot, and we're now much closer to a resolution (I hope). It seems that something is wrong with the memory allocation for RTP frames (probably in res_rtp_asterisk.c). I explain the details below, and I hope that one of the Asterisk gurus will help us.

First I have to correct something I wrote before. The frame with src=RTP, which caused the segfault, didn't come from DAHDI; it came from the IP network (SIP). I also verified this by dropping UDP RTP packets on the network - the RTP frames in the V.21 detection function then disappeared too. I'm not sure, but it seems that frames from the network are stored in memory by the res_rtp_asterisk.c module (or something closely related to it), and that is probably where our bug lives.

You were right when you wrote that there's something wrong with datalen. As far as I know, a decoded a-law sample is a 13-bit integer, usually stored in a 16-bit integer for easier manipulation. We cannot store a 13-bit integer in an 8-bit integer without losing information. Also, libspandsp expects 16-bit samples for V.21 detection. The Asterisk module res_fax_spandsp calls the spandsp function modem_connect_tones_rx(), which is declared as: int modem_connect_tones_rx(modem_connect_tones_rx_state_t *s, const int16_t amp[], int len) where "amp" is an array of 16-bit integers (samples) and "len" is the number of samples (not the number of bytes!), as you can see from the modem_connect_tones_rx() source code. When Asterisk passes the "amp" pointer to modem_connect_tones_rx() together with "len" = 160, libspandsp will read a 16-bit integer 160 times, starting from the pointer address, so it will read 320 bytes.
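
The arithmetic above can be written down as a rough sketch in plain C. The two helper names below are made up for illustration (they are not part of spandsp or Asterisk), and the sketch assumes only what the declaration says: the detector reads "len" elements of type int16_t starting at "amp".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of spandsp or Asterisk): number of bytes
 * touched by a routine that reads `len` samples from a
 * `const int16_t amp[]` buffer. */
size_t bytes_read_by_detector(int len)
{
    return (size_t)len * sizeof(int16_t);
}

/* Hypothetical helper: how many bytes beyond a payload of `datalen` bytes
 * such a read would reach (0 if the payload is large enough). */
size_t overread_bytes(int datalen, int len)
{
    size_t needed = bytes_read_by_detector(len);

    return needed > (size_t)datalen ? needed - (size_t)datalen : 0;
}
```

With len = 160 the first helper gives 320 bytes, so for a payload with datalen = 160 the read would reach 160 bytes past the end of the data.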

Let's look again at the ast_frame which caused the segfault:
- frametype = 2
- datalen = 160
- samples = 160
- mallocd = 1
- mallocd_hdr_len = 562
- offset = 64
- src = RTP
- flags = 1
- ts = 9140
- len = 20
- seqno = 1489
- data.ptr = 0xb4ef4f30

I am not sure about mallocd_hdr_len and the other values, but I think that 160 bytes of space (datalen) is _definitely_ not enough for 160 a-law samples decoded to slin (16 bits each).
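
A check of this kind could be sketched as follows. This is only an illustration under my own assumptions: "struct frame_sizes" is a stand-in holding just the two fields discussed here, not the real ast_frame, and "holds_16bit_samples" is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two ast_frame fields used here; NOT the real struct. */
struct frame_sizes {
    int datalen;  /* payload size in bytes */
    int samples;  /* number of audio samples in the payload */
};

/* Hypothetical check: true if the payload is large enough to hold
 * `samples` 16-bit linear samples. */
bool holds_16bit_samples(const struct frame_sizes *f)
{
    if (f->datalen < 0 || f->samples < 0)
        return false;
    return (size_t)f->datalen >= (size_t)f->samples * sizeof(int16_t);
}
```

For the two frames from the log, the src=alawtolin frame (datalen=320, samples=160) would pass this check and the src=RTP frame (datalen=160, samples=160) would fail it.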

As far as I know, a segfault happens when an application tries to access memory which doesn't belong to it. If we have 160 bytes allocated and we try to read 320 bytes from this memory, we will probably also read something else that we didn't expect. If this memory is at the border of the application's memory region, we could be trying to read from memory which does not belong to this application - and this will cause a segfault. Definitely.

So now it seems that the problem is neither in res_fax_spandsp nor in libspandsp, but somewhere in Asterisk, where memory for RTP frames (coming from the IP network) is allocated.

--
Michal Rybarik


On 01/27/2014 06:53 PM, Pavel Troller wrote:
Hello Michal,
   I'm afraid that I can't help, but I'm observing exactly the same crashes
as you are. They are very rare here, about one crash per 2 - 3 million
successful calls (but of course most of the calls are voice ones), so
debugging is very problematic. However, it's crashing at exactly the same
place in the spandsp code.
   Please note that not only mallocd_hdr_len is changing, but primarily
datalen is too! If you subtract those mallocd_hdr_len values (722 - 562),
you will get 160, which is exactly the difference between the datalen
values. I think that the primary cause is the different datalen, and the
size of allocated memory just reflects this. However, another value,
samples, is the same in both cases: 160. Isn't it suspicious? Why do I need
twice as much data length for the same number of samples? Oh, possibly
because they are in linear format, thus 16 bits wide (because the alaw2lin
conversion produces 13-bit samples), while in the second case they are in
some other format (see src=alawtolin in the "good" case and src=RTP in the
"wrong" one). But which one is it? Native a-law? Possibly... But could it
also be u-law? How does the routine get the actual codec in which the
samples are encoded?
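
If the src=RTP frame really does carry native a-law (which is only a guess here), every 8-bit byte would have to be expanded to a 16-bit linear sample before being handed to a routine that expects int16_t. Below is a minimal sketch of the usual G.711 a-law expansion, just to show why 160 bytes turn into 320. The names "alaw_to_linear" and "expand_alaw" are hypothetical; this is not the code Asterisk itself uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand one 8-bit G.711 a-law byte to a 16-bit linear sample. */
int16_t alaw_to_linear(uint8_t a)
{
    int t;
    int seg;

    a ^= 0x55;                 /* undo the even-bit inversion */
    t = (a & 0x0F) << 4;       /* 4-bit mantissa */
    seg = (a & 0x70) >> 4;     /* 3-bit segment number */
    if (seg == 0) {
        t += 8;
    } else {
        t += 0x108;
        if (seg > 1)
            t <<= (seg - 1);
    }
    return (int16_t)((a & 0x80) ? t : -t);   /* sign bit set = positive */
}

/* Expand n a-law bytes into n 16-bit samples; `out` must have room for
 * n * sizeof(int16_t) bytes, i.e. twice the size of the input. */
void expand_alaw(const uint8_t *in, int16_t *out, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        out[i] = alaw_to_linear(in[i]);
}
```

The output buffer is twice the size of the input, which matches the datalen=320 seen on the src=alawtolin frame for the same samples=160.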
   So, we dug up at least some information about the crash, but in my case,
my theoretical background in V.21 detection is almost none, so I can't
find more. Maybe this is enough for some more skilled person, like the
res_fax author, to judge what the primary cause of this problem can be?
   With regards,
     Pavel

Hello,

I have a problem with random Asterisk segfaults on a machine which I use
as a T.38 gateway between DAHDI and SIP. I would like to kindly ask somebody
to take a look at it and help me find what's wrong... Asterisk is
version 11 from SVN, r382022 (I'm using this because of other dependencies
- I compared the relevant sources to the current v11 SVN and they are almost
unchanged).

The segfault happens on voice calls, during detection of the fax preamble.
Segfaults happen randomly - sometimes there is a segfault after 50.000
calls, sometimes after 5 calls. In coredumps I see that the segfault happens
in libspandsp2.so (version 0.06-pre21, and the latest snapshot too).

I asked Steve Underwood (the spandsp author) about this, and he pointed me to
the application itself - probably there is something wrong with "amp"
(the pointer to the audio sample data), because this pointer is first
used in function fsk_rx(), where the segfault happens. So I looked deeper into
this, and added some debug info to the res_fax_spandsp.c source, in
function spandsp_v21_detect(), just before the call to modem_connect_tones_rx()
(the function which calls fsk_rx() later). Now I see the contents of the frame
which caused the segfault, and also the "amp" pointer (in Asterisk it is
f->data.ptr), but I'm not sure what's wrong with it.

[Jan 27 14:00:22] VERBOSE[30694][C-000006cb] app_dial.c:     -- Called DAHDI/G2/123456789
[Jan 27 14:00:27] VERBOSE[30694][C-000006cb] app_dial.c:     -- DAHDI/57-1 is proceeding passing it to SIP/mypbx-00000729
[Jan 27 14:00:27] VERBOSE[30694][C-000006cb] app_dial.c:     -- DAHDI/57-1 is ringing
[Jan 27 14:00:32] VERBOSE[30694][C-000006cb] app_dial.c:     -- DAHDI/57-1 answered SIP/mypbx-00000729
[Jan 27 14:00:32] NOTICE[30694][C-000006cb] res_fax_spandsp.c: frame={ frametype=2, datalen=320, samples=160, mallocd=1, mallocd_hdr_len=722, offset=64, src=alawtolin, flags=0, ts=0, len=0, seqno=0, data.ptr=0xb50c91b8  }
[Jan 27 14:00:32] NOTICE[30694][C-000006cb] res_fax_spandsp.c: frame={ frametype=2, datalen=160, samples=160, mallocd=1, mallocd_hdr_len=562, offset=64, src=RTP, flags=1, ts=9140, len=20, seqno=1489, data.ptr=0xb4ef4f30  }
  (... segfault now ...)

Core was generated by `/usr/sbin/asterisk -f -p -U asterisk -vvvg -c'.
Program terminated with signal 11, Segmentation fault.
#0  fsk_rx (s=0x83ea7e8, amp=0xb4ef4f30, len=160) at fsk.c:381
381                 s->window[j][buf_ptr].re = (ph.re*amp[i]) >> s->scaling_shift;

The last line from the Asterisk log shows the contents of the ast_frame
struct *f which caused the segfault. I see that the segfault was caused by
the first frame which arrived from DAHDI (src=RTP) and which was passed to
spandsp_v21_detect(), and then to modem_connect_tones_rx(), and then to
fsk_rx().

The only unusual thing which I see in this frame is that
f->mallocd_hdr_len=562. Many other frames have this set to 722 (if
f->mallocd==1) or to 0 (if f->mallocd==0). But in a few cases, I saw frames
with mallocd_hdr_len set to different values, and these frames didn't cause
a segfault.

Is there anybody who can help?
Many thanks..

--
Michal Rybarik


--
_____________________________________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-dev mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-dev

