Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1
Hi all,

sorry for joining so late, but I am on holiday (until the end of the week) and rarely checking my mailbox. Thanks to bad weather I did that today :)

To the OP: while reading the first posts I thought it was the old problem with the REL/RSC loop (persistent on start with ANSI signaling), which was fixed in libss7 instead of sig_ss7, but I am not sure whether this is the same issue or a similar yet different one.

It really is a (remaining) problem if we receive the RLC for a previous REL after we have already sent RSC. I was thinking of clearing the old status bits after we receive RLC, but that will not fix the double-RLC problem, and we can't ignore the first one (or just clear the SENT_REL flag), because we may never get a second one. So it would probably be better to skip sending a second RSC inside isup_handle_unexpected() if the previous one was sent less than T17 (timer seconds) ago. Because the timer is stopped on RLC, this would need another timer, or some flag to ignore its expiration and not reset again ... I will work on this next week when I am back.

The code in my branch is actually Domjan Attila's version (the patches attached to the SS7-27 issue) ported to later Asterisk versions with very few additions/modifications, so the muffins are for him, while the bugs are from me :)

P.S. Apologies for top posting - the connection is unstable and I had to write the post offline and just copy/paste it.

On 2013-06-26 06:42, Pavel Troller wrote:
> Hi!
>
> So, I'm replying to my own original post, to keep the question and a possible answer together without any excessive or unrelated information.
>
> I hope I've found the cause of the problem and I hope I've solved it. A modified libss7 is now online and I'm waiting for the busy hours to see whether it helps.
>
> The problem is that in the isup_rel() function all the important got_sent_msg flags are cleared, so the stack forgets the preceding call state:
>
> ...
> isup_rel():
>     c->got_sent_msg |= ISUP_SENT_REL;
>     c->got_sent_msg &= ~(ISUP_SENT_IAM | ISUP_PENDING_IAM | ISUP_CALL_CONNECTED | ISUP_GOT_IAM | ISUP_GOT_CCR | ISUP_SENT_INR);
> ...
>
> So an incoming MSU, which was perfectly legitimate before sending REL, is now handled as unexpected.
>
> My solution adds the following code to the isup_receive() function for every message that can confuse the stack for the reason just described (an example for the ACM message):
>
>     case ISUP_ACM:
> +       if (c->got_sent_msg & ISUP_SENT_REL) {
> +           ss7_message(ss7, "Got unexpected ACM after sending REL on CIC %d PC %d, ignoring\n", c->cic, opc);
> +           return 0;
> +       }
>         if (!(c->got_sent_msg & ISUP_SENT_IAM)) {
>             ss7_message(ss7, "Got ACM but we didn't send IAM on CIC %d PC %d\n", c->cic, opc);
>             return isup_handle_unexpected(ss7, c, opc);
>         }
>
> If my change proves good, I'm planning to remove the ss7_message() calls to limit the stack verbosity, as these situations are relatively frequent under heavy load and I think they are more or less logical and normal.
>
> I would be glad for some words from the KNK branch maintainer(s) on whether to create a JIRA issue and put my patch there, or how to proceed now in general.
>
> With regards, Pavel
>
> > Hi!
> >
> > I would like to share my experience with deployment of this experimental SS7 branch. The first impressions are good, especially the timers seem to work well, saving many calls from being frozen. However, there are still some strange things, which I would like to discuss here, one by one.
> >
> > The first one is that the channel sometimes doesn't recognize a message (mostly RLC), even though it comes in response to an action initiated by the channel itself. Typically, the following appears often:
> >
> > [Jun 24 13:33:41] ERROR[3975]: chan_dahdi.c:14406 dahdi_ss7_error: [1] ISUP timer t17 expired on CIC 27 DPC 4097
> > [1] Got RLC but we didn't send REL/RSC on CIC 27 PC 4097 reseting the cic
> >
> > As I understand it, there were some timeouts and now the channel tries to recover by sending RSC and starting T17. However, it seems that it immediately rejects the RLC which comes back as a response to the RSC that was just sent upon expiry of T17. This repeats again and again in the rhythm of T17, and the channel is not operational. "ss7 show calls" shows the following line for the misbehaving CIC:
> >
> > 27 4097 11 IAM IAM
> >
> > Or, a very similar situation:
> >
> > [2] Got SUS but no call on CIC 48 PC 4096 reseting the CIC
> > [2] Got RLC but we didn't send REL/RSC on CIC 48 PC 4096 reseting the CIC
> >
> > The first question is why there was no call when SUS was received. My idea is that both parties hung up their phones at the same time and that the call was being torn down on the Asterisk side (REL just sent or something like this) when SUS arrived. Maybe the call was marked as cleared even before RLC came back? OK, I can understand this. But if the CIC was reset as the first message says (i.e. RSC was sent), why is the RLC coming back not recognized then? Or, just
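
A minimal sketch of the flag handling quoted from isup_rel() and of the guard Pavel adds to isup_receive(), for illustration only. The struct, the flag values, and the sketch_* function names are assumptions made up for this example; they are not libss7's definitions, and only a subset of the flags is modelled.

```c
/* Hypothetical flag values; the real ones are defined inside libss7. */
enum {
    ISUP_SENT_IAM       = 1 << 0,
    ISUP_SENT_REL       = 1 << 1,
    ISUP_CALL_CONNECTED = 1 << 2
};

/* Hypothetical stand-in for the per-call state kept by the stack. */
struct call_sketch {
    int cic;
    unsigned int got_sent_msg;
};

enum acm_action { ACM_ACCEPT, ACM_IGNORE, ACM_UNEXPECTED };

/* Models the quoted isup_rel() lines: remember that REL was sent,
 * forget the call setup state. */
void sketch_send_rel(struct call_sketch *c)
{
    c->got_sent_msg |= ISUP_SENT_REL;
    c->got_sent_msg &= ~(ISUP_SENT_IAM | ISUP_CALL_CONNECTED);
}

/* Models the ACM case of isup_receive().
 * with_guard == 0: behaviour before the patch.
 * with_guard == 1: behaviour with the ISUP_SENT_REL check added. */
enum acm_action sketch_receive_acm(const struct call_sketch *c, int with_guard)
{
    if (with_guard && (c->got_sent_msg & ISUP_SENT_REL))
        return ACM_IGNORE;      /* late ACM for a call already being released */
    if (!(c->got_sent_msg & ISUP_SENT_IAM))
        return ACM_UNEXPECTED;  /* the real stack would reset the circuit here */
    return ACM_ACCEPT;
}
```

In this model, an ACM that arrives after sketch_send_rel() is classed as ACM_UNEXPECTED without the guard and as ACM_IGNORE with it, which corresponds to the surplus RSC described in the quoted message.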
Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1
Almost forgot. Please do not post patches (if any) on this list, but attach them to the SS7-27 issue instead, with the proper license agreement, so that they can be included in the Asterisk codebase.
Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1
Thanks Kaloyan. Before this thread there was no mention at all of a KNK tree, so I thought this was stock libss7.

I'm using my own patched libss7. I have processed over one million call setups on ten servers, with many difficult setups (third-party STPs, STPs 1000 miles away with transmission lines that do fail from time to time, connections with almost a dozen types of ISUP switches, sharing two links with an STP to half a dozen switches), and the issues reported don't happen at all. So this looks like a bug in your patch or in Attila's code.

My patch is for paying customers only (they get the source and could release it if they wanted to, but chose not to). I have made very small changes to the ISUP side of things, but some fairly major changes to MTP2, MTP3 and the DAHDI mtp2 mode. I even implemented very basic STP functionality, and MTP2 over UDP signalling (between Asterisk boxes only).

It might be worth looking at the diffs from stock to your branch.
Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1
The problem with stock libss7 is that one will never complete the tests required by telcos in Europe, as it is missing functionality which the ITU has described in the test procedures. Without the ISUP timers (the main functionality added by the patches) it is just not possible, and the link may not even come up in some cases. It probably works fine in the ANSI world, but not in the ITU world.

One of the difficulties was to keep the code working as before when the timers are not defined in chan_dahdi.conf. Another is that the ANSI standard and its requirements are hard to find (freely available). On top of that, the code base and functionality have changed quite a lot since 1.6, with sig_ss7 separated from chan_dahdi, etc.

I am sure there are bugs and room for improvement in that branch, but the original version from Domjan has been used by me and many others for a few years already (that's why I said the bugs in that branch are from me :) ) and we are stuck with 1.6 because of that. I have tried to get the changes into Asterisk 11, but my (below average) C skills and available time did not allow me to do that, and the more time passes, the more difficult it will be to keep it up to date with the rest of the Asterisk code.

I hope that with some help from others (testing and patches) this code will finally find its way into Asterisk, and then we may look at adding the cluster/routing/STP functionality.
Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1
Hi Kaloyan,

> Hi all, sorry for joining so late, but i am on holidays (by the end of the week) and rarely checking my mailbox. Thanks to bad weather i did that today :)

Never mind, I'm happy you're here!

> To the OP: while reading the first posts i thought it is an old problem with REL/RSC loop (persistent on start with ANSI signaling) which was fixed in libss7 instead of sig_ss7, but not sure if it is a similar yet different one or it is the same issue. It really is a (remaining) problem if we receive RLC on previous REL, but after we have sent RSC. I was thinking to clear the old status bits after we receive RLC, but this will not fix the double RLC received problem and we can't ignore the first one (or just clear the SENT_REL flag), because we may never get a second one, so it should probably be better to ignore sending second RSC inside isup_handle_unexpected() if the previous one was sent T17 (timer seconds) ago. Because the timer is stopped on RLC it should be another timer or some flag to ignore it's expiration and not reset again ... will work on this next week when i am back.

I think it's another problem. Sometimes I also have this kind of loop, lasting for hours, until it somehow settles by itself. But the error I've reported here is that we clear the old status flags immediately after sending our REL. If an MSU is already on its way back (it may be any common MSU like ACM, CPG, ANM, SUS, RES, REL..., at least I've encountered all of these), we don't expect it, we call isup_handle_unexpected() and we send RSC. That RSC is completely superfluous, because there is nothing wrong with the call state; we just have to ignore this (and possibly any other) MSU until we get the RLC acknowledging our REL.

My patch does this by checking ISUP_SENT_REL. However, it might be better to postpone clearing the got_sent_msg flags from isup_rel() to the ISUP_RLC case in isup_receive(). I didn't know whether leaving these flags set after sending REL would do harm somewhere, so I did it as written, and about 300 thousand calls yesterday didn't reveal any problem with the patch. So today I removed the ss7_message() calls from my patch, and since then Asterisk has been very quiet and seems very happy, and the cooperating EWSDs as well :-).

With regards, Pavel
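
The alternative Pavel mentions, postponing the clearing of the got_sent_msg flags from isup_rel() to the RLC handling, can be sketched as follows. This is a toy model under stated assumptions, not libss7 code: the struct, the flag values, and the deferred_* function names are hypothetical, and the thread does not confirm whether leaving the setup flags set after REL is safe elsewhere in the stack.

```c
/* Hypothetical flag values; the real ones are defined inside libss7. */
enum {
    ISUP_SENT_IAM       = 1 << 0,
    ISUP_SENT_REL       = 1 << 1,
    ISUP_CALL_CONNECTED = 1 << 2
};

#define SETUP_FLAGS (ISUP_SENT_IAM | ISUP_CALL_CONNECTED)

/* Hypothetical stand-in for the per-call state kept by the stack. */
struct call_sketch {
    int cic;
    unsigned int got_sent_msg;
};

/* Sending REL only records that REL is outstanding;
 * the setup flags are left untouched for now. */
void deferred_send_rel(struct call_sketch *c)
{
    c->got_sent_msg |= ISUP_SENT_REL;
}

/* Receiving RLC is the point where the call state is finally forgotten.
 * Returns 1 if the RLC was expected, 0 if no REL was outstanding. */
int deferred_receive_rlc(struct call_sketch *c)
{
    if (!(c->got_sent_msg & ISUP_SENT_REL))
        return 0;
    c->got_sent_msg &= ~(ISUP_SENT_REL | SETUP_FLAGS);
    return 1;
}

/* An ACM counts as expected while the IAM flag is still set. */
int deferred_acm_is_expected(const struct call_sketch *c)
{
    return (c->got_sent_msg & ISUP_SENT_IAM) != 0;
}
```

With this ordering a late ACM between REL and RLC is no longer classed as unexpected, so no RSC would be triggered. The message handlers would still have to avoid acting on such an MSU for a call that is being released, and a second RLC for the same call is still unexpected in this model.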
Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1
Per usual, read the fine manual. Wait, there's no manual!

Since you seem to have done your part and actually know some SS7 and ISUP, here comes a hint. You created two or more linksets where you must have a single one. libss7 doesn't have the SS7 routing feature. In libss7 the linkset concept is different from the official SS7 linkset. All signalling links that carry ISUP traffic for a given set of channels must be kept in a single linkset, as well as all ISUP channels that go through those links. It looks like you're getting incoming signalling for ISUP channels that are on another linkset.

I'm sure you didn't find any libss7 bug. I have a highly customized version of libss7/dahdi/asterisk, fixing lots of issues, but this isn't one of them. It has processed over one million call setups, with a very complex setup (6 linksets, 7 links, 6 E1s on a single switch, plus another 6 E1s on remote switches using my simple STP solution, sharing the local links over SS7 over UDP - my simpler proprietary alternative to SIGTRAN).

If you need commercial support, contact me off list.

On 06/24/13 09:02, Pavel Troller wrote:
> Hi!
>
> I would like to share my experience with deployment of this experimental SS7 branch. The first impressions are good, especially the timers seem to work well, saving many calls from being frozen. However, there are still some strange things, which I would like to discuss here, one by one.
>
> The first one is that the channel sometimes doesn't recognize a message (mostly RLC), even though it comes in response to an action initiated by the channel itself. Typically, the following appears often:
>
> [Jun 24 13:33:41] ERROR[3975]: chan_dahdi.c:14406 dahdi_ss7_error: [1] ISUP timer t17 expired on CIC 27 DPC 4097
> [1] Got RLC but we didn't send REL/RSC on CIC 27 PC 4097 reseting the cic
>
> As I understand it, there were some timeouts and now the channel tries to recover by sending RSC and starting T17. However, it seems that it immediately rejects the RLC which comes back as a response to the RSC that was just sent upon expiry of T17. This repeats again and again in the rhythm of T17, and the channel is not operational. "ss7 show calls" shows the following line for the misbehaving CIC:
>
> 27 4097 11 IAM IAM
>
> Or, a very similar situation:
>
> [2] Got SUS but no call on CIC 48 PC 4096 reseting the CIC
> [2] Got RLC but we didn't send REL/RSC on CIC 48 PC 4096 reseting the CIC
>
> The first question is why there was no call when SUS was received. My idea is that both parties hung up their phones at the same time and that the call was being torn down on the Asterisk side (REL just sent or something like this) when SUS arrived. Maybe the call was marked as cleared even before RLC came back? OK, I can understand this. But if the CIC was reset as the first message says (i.e. RSC was sent), why is the RLC coming back not recognized then?
>
> Or, just now the following appeared:
>
> [1] Got ACM but we didn't send IAM on CIC 10 PC 4097 reseting the cic
> [1] Got RLC but we didn't send REL/RSC on CIC 10 PC 4097 reseting the cic
>
> Again, it's questionable why this happened, but the second line seems to indicate some brokenness again.
>
> To explain: the channel is operating on a gateway equipped with 16 E1s and the current traffic is about 10 CAPS. There are two linksets to two cooperating exchanges. They are EWSDs, which have a very mature and stable SS7 implementation, so I'm almost sure that they are not making signalling errors.
>
> With regards, Pavel

--
_____________________________________________________________________
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-ss7 mailing list
To UNSUBSCRIBE or update options visit:
   http://lists.digium.com/mailman/listinfo/asterisk-ss7
Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1
Hello Marcelo,

> Per usual, read the fine manual. Wait, there's no manual !

You're right :-).

> Since you seem to have done your part and actually knows some ss7 and isup, here comes a hint. You created two or more linksets where you must have a single one. libss7 don't have the ss7 routing feature.

It seems strange to me. Let me try to explain this in a more detailed way. There is 1 (one) Asterisk box. It has 2 (two) linksets configured, with 1 (one) signalling link per linkset. Linkset 1 is configured for one DPC and with CICs 1 - 496. Linkset 2 is configured for another (different) DPC and also with CICs 1 - 496. Both systems connected to this Asterisk box are configured to respond directly over the linkset between them and the Asterisk, so it is certain that an MSU from DPC1 cannot arrive over LS2 and vice versa. I hope that this extremely simple setup is within the scope of current libss7 functionality. Or am I wrong?

> In libss7 linkset concept is diferent from official ss7 linkset. All signalling links that carry ISUP traffic for a given set of channels must be kept on a single linkset, as well as all ISUP channels that go through those links.

I hope that my setup conforms to this limitation.

> It looks like you're getting incoming signalling for ISUP channels that are on another linkset.

It really looks like this, but I still hope it's not the case. Please note that the traffic on the box is rather high, such an error occurs for one of, say, 1 call attempts. I think that with a fatal routing problem of the kind you are talking about, it wouldn't be possible to use the system regularly.

> I'm sure you didn't find any libss7 bug.

Really strong words! I wouldn't say that about any of my programs :-).

> I have a highly customized version of libss7/dahdi/asterisk, fixing lots of issue, but this isn't one of them.

Possibly your setup/usage scenario is a bit different?

> Processed over one million call setups, with a very complex setup (6 linksets, 7 links, 6E1 on a single switch, plus another 6E1 on remote switches using my simple STP solution, sharing the local links over SS7 over UDP - my simpler proprietary alternative to sigtran).

These switches (I have two of them, but the second one is still on a regular unpatched SS7 stack) make approx. 3 million call setups per week. My record (without restarting/crashing Asterisk) is about 3 weeks with more than 10 million calls.

> If you need commercial support, contact me off list.

Thanks for your offer.

With regards, Pavel
Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1
Another possibility is you're mixing the whole thing in a single linkset where you must use two linksets in the way you explained.

Can you see those errors with just a few test calls?

I found about 20 bugs / structural design flaws in stock libss7 / dahdi mtp2 support. With my changes the mtp2/mtp3 layers are far more robust than stock libss7. Fixed all but a single one, related to knowing when the linkset is up or down, and not trying to send isup messages, especially IAM, through a down linkset - all sigchans down.

If there's a bug, use ss7 set debug on linkset X to trace ss7 messages and track isup message flow.

I used libss7 successfully with telcobridges tmedia, digitro switches, ericsson AXE, huawei NGN, Nortel DMS, several STPs, EWSS, Nec NEAX, and I'm probably missing a couple of switch types. I never ran into SS7 / ISUP bugs of other switches, always libss7, but the nature of the bugs found is nothing like what you're reporting. I started testing libss7 with those kinds of switches 5 years ago, so I have some mileage to make those statements, especially from reading and understanding a large portion of the libss7 / sig_ss7 / chan_dahdi code.

The issue you're describing is caused by Asterisk getting ss7 messages that belong to another linkset or sending ss7 messages on the wrong ss7 link. Check for UCIC or CFN ISUP responses.

You need to define chan_dahdi.conf basically like this:

; basic ss7 / isup parameters, usually the same for the whole libss7 setup
signalling=ss7
ss7type=itu/ansi
ss7_called_nai=subscriber/national/international/unknown
ss7_calling_nai=subscriber/national/international/unknown
networkindicator=national/international/...

; Your local pointcode
pointcode = X

; Start definition for linkset N
linkset = N
adjpointcode = STP point code otherwise switch point code
; Instantiate a signalling link on channel 16 belonging to linkset N, with adjacency to adjpointcode
sigchan = 16
; Define more signalling links if needed, with adjpointcode and sigchan
defaultdpc = pointcode for ISUP messages
cicbeginswith= CIC of the next voice channel defined
; Instantiate voice channel on linkset N, talking to PC defaultdpc, CIC numbering incremented automatically
channel = dahdi channel range
cicbeginswith= next CIC range, if non contiguous
channel = dahdi channel range
defaultdpc = another point code belonging to the same linkset (if links share signalling to multiple switches, typically links through an STP)
;repeat cicbeginswith, channel

; Starts definition of another linkset
linkset = M
; repeat same sequence as above

On 06/25/13 05:13, Pavel Troller wrote:
> Hello Marcelo,
>
> > Per usual, read the fine manual. Wait, there's no manual!
>
> You're right :-).
>
> > Since you seem to have done your part and actually know some ss7 and isup, here comes a hint. You created two or more linksets where you must have a single one. libss7 doesn't have the ss7 routing feature.
>
> It seems strange to me. Let me try to explain this in a more detailed way. There is 1 (one) Asterisk box. It has 2 (two) linksets configured, with 1 (one) signalling link per linkset. Linkset 1 is configured for one DPC and with CICs 1 - 496. Linkset 2 is configured for another (different) DPC and also with CICs 1 - 496. Both the systems connected to this Asterisk box are configured to respond directly to the linkset between them and the Asterisk, so it's sure that a MSU from DPC1 cannot come over LS2 and vice versa. I hope that this extremely simple setup is in the scope of current libss7 functionality. Or am I wrong?
>
> > In libss7 the linkset concept is different from the official ss7 linkset.
Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1
Hello Marcelo,

> Another possibility is you're mixing the whole thing in a single linkset where you must use two linksets in the way you explained.

I hope I'm not doing this.

> Can you see those errors with just a few test calls?

No. These errors occur only in the high traffic periods.

> I found about 20 bugs / structural design flaws in stock libss7 / dahdi mtp2 support. With my changes the mtp2/mtp3 layers are far more robust than stock libss7. Fixed all but a single one, related to knowing when the linkset is up or down, and not trying to send isup messages, especially IAM, through a down linkset - all sigchans down.

I believe that you fixed most of the bugs in the stock libss7, but now I'm trying to use the patched one, which contains isup timers (it was a pain to live with them) and improved diagnostics, more dumping commands etc. This is the reason I'm trying to use it.

> If there's a bug, use ss7 set debug on linkset X to trace ss7 messages and track isup message flow.

The problem is that the format of the dump is good for manual viewing, but not for machine processing (grepping for patterns etc.). And running a minute of ss7 debug on a single linkset creates a really huge file (10+ MB), which is very hard to go through by hand.

> I used libss7 successfully with telcobridges tmedia, digitro switches, ericsson AXE, huawei NGN, Nortel DMS, several STPs, EWSS, Nec NEAX, and I'm probably missing a couple of switch types.

Generally, your experience is really large.

> I never ran into SS7 / ISUP bugs of other switches, always libss7, but the nature of the bugs found is nothing like what you're reporting. I started testing libss7 with those kinds of switches 5 years ago, so I have some mileage to make those statements, especially from reading and understanding a large portion of the libss7 / sig_ss7 / chan_dahdi code. The issue you're describing is caused by Asterisk getting ss7 messages that belong to another linkset or sending ss7 messages on the wrong ss7 link. Check for UCIC or CFN ISUP responses.

There will be no UCIC messages, because both the linksets have identical CICs, so even in the case that the messages are mixed between linksets, the CIC will always be there. Sometimes I can see CFN, but always easily understandable in the regular call context (invalid parameters being sent etc.).

> You need to define chan_dahdi.conf basically like this:

Here is my config:

signalling=ss7
ss7type=itu
ss7_called_nai=dynamic
ss7_calling_nai=dynamic
ss7_internationalprefix=00
ss7_nationalprefix=
ss7_subscriberprefix=
ss7_unknownprefix=
ss7_explictacm=yes

; == ALI 01 ===
; All settings apply to linkset 1
linkset=1
slc=0
pointcode=8
adjpointcode=4097
defaultdpc=4097
networkindicator=national
; First signalling channel
sigchan=1
mtp3_timer.t21=1
isup_timer.t1 = 15000 ; Wait for RLC
isup_timer.t2 = 18 ; User SUS received
;isup_timer.t3 = 12 ; Overload
;isup_timer.t4 = 30 ; MTP Inaccessible Remote User Timer
isup_timer.t5 = 30 ; Wait for RLC after initial REL
isup_timer.t6 = 3 ; Network SUS received
isup_timer.t7 = 3 ; Last Address Message, waiting for ACM/CON
;isup_timer.t11 = 15000 ; Automatic ACM timer
isup_timer.t12 = 15000 ; BLO - BLA timer
isup_timer.t13 = 30 ; Initial BLO - BLA timer
isup_timer.t14 = 15000 ; UBL - UBA timer
isup_timer.t15 = 30 ; Initial UBL - UBA timer
isup_timer.t16 = 15000 ; RSC timer due to T5 expiry
isup_timer.t17 = 30 ; Initial RSC -''-
isup_timer.t18 = 15000 ; CGB - CGBA timer
isup_timer.t19 = 30 ; Initial CGB - CGBA timer
isup_timer.t20 = 15000 ; CGU - CGUA timer
isup_timer.t21 = 30 ; Initial CGU - CGUA timer
isup_timer.t22 = 15000 ; CGR - CGRA timer
isup_timer.t23 = 3 ; Initial CGR - CGRA timer
isup_timer.t27 = 24 ; COT failure
isup_timer.t33 = 15000 ; INR - INF timer
isup_timer.t35 = 15000 ; Overlap dialling timer
group=1
context=from_ss7
faxdetect=no
; Begin CIC (Circuit indication codes) count with this number
cicbeginswith=2
; Channels to associate with CICs on this linkset
channel=2-31
cicbeginswith=33
channel=32-62
... for all other spans

; == ALI 02 ===
linkset=2
pointcode=8
adjpointcode=4096
defaultdpc=4096
networkindicator=national
; First signalling channel
sigchan=125
mtp3_timer.t21=1
... the rest the same as for Linkset 1
... of course channel numbers differ in the channel= definitions

So, it differs from your suggestion below in the following:
- Own pointcode is stated in the linkset sections, but it's the same in all the linksets.
- There are both adjpointcode and defaultdpc specified in every linkset definition, both being the same to be sure.

Thank you again for your help with my problems!

With regards, Pavel

> ; basic ss7 / isup parameters, usually the same for the whole libss7 setup
> signalling=ss7
> ss7type=itu/ansi
> ss7_called_nai=subscriber/national/international/unknown
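As a side note on the CIC numbering discussed above, here is a minimal Python sketch that walks a chan_dahdi.conf-style fragment and lists which (DPC, CIC) pair each DAHDI channel ends up with. It assumes the simplified semantics described in this thread: cicbeginswith sets the next CIC, and every channel in a channel= range takes the following one. The function name map_cics is hypothetical, this is not how chan_dahdi itself parses the file, and the range 126-155 for linkset 2 is an invented placeholder because the real values were elided in the post.

```python
def map_cics(conf_text):
    """Return {(dpc, cic): (linkset, channel)} for a chan_dahdi.conf-style fragment.

    Simplified model: only linkset, defaultdpc, cicbeginswith and channel are read.
    """
    linkset = dpc = next_cic = None
    table = {}
    for raw in conf_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop ';' comments
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == "linkset":
            linkset = int(value)
        elif key == "defaultdpc":
            dpc = int(value)
        elif key == "cicbeginswith":
            next_cic = int(value)
        elif key == "channel":
            first, _, last = value.partition("-")
            for chan in range(int(first), int(last or first) + 1):
                if (dpc, next_cic) in table:
                    raise ValueError(f"CIC {next_cic} towards DPC {dpc} defined twice")
                table[(dpc, next_cic)] = (linkset, chan)
                next_cic += 1
    return table


EXAMPLE = """
linkset=1
defaultdpc=4097
sigchan=1
cicbeginswith=2
channel=2-31
cicbeginswith=33
channel=32-62
linkset=2
defaultdpc=4096
sigchan=125
cicbeginswith=2
channel=126-155   ; hypothetical range, not from the post
"""

if __name__ == "__main__":
    for (dpc, cic), (linkset, chan) in sorted(map_cics(EXAMPLE).items()):
        print(f"DPC {dpc} CIC {cic:3d} -> linkset {linkset}, channel {chan}")
```

Keying the table on (DPC, CIC) rather than on CIC alone reflects the point made in the thread: the two linksets reuse the same CIC values, so only the point code tells the circuits apart.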
Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1
Hello Marcelo,

so I did some tracing. It was really hard to isolate MSUs for one particular connection, I had to collect them from about 5 MB file, but ok, it's done, and it's in total harmony with my original ideas. So, let's look at it with me:

Initial conditions: There is a call running on LS1, DPC4097, CIC 12. Our Asterisk decided to clear this call down:

[1] ISUP timer t1 (15000ms) started on CIC 12 DPC 4097
[1] ISUP timer t5 (30ms) started on CIC 12 DPC 4097
[1] Len = 16 [ bc c3 0d 85 01 10 02 c0 0c 00 0c 02 00 02 81 90 ]
[1] FSN: 67 FIB 1
[1] BSN: 60 BIB 1
[1] [4097:0] MSU
[1] [ bc c3 0d ]
[1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
[1] [ 85 ]
[1] OPC 8 DPC 4097 SLS 12
[1] [ 01 10 02 c0 ]
[1] CIC: 12
[1] [ 0c 00 ]
[1] Message Type: REL(0x0c)
[1] [ 0c ]
[1] --VARIABLE LENGTH PARMS[1]--
[1] Cause Indicator:
[1] Coding Standard: 0
[1] Location: 1
[1] Cause Class: 1
[1] Cause Subclass: 0
[1] Cause: Normal call clearing (16)
[1] [ 02 81 90 ]
[1]

But the remote party also decided to hang up, and our REL just crossed their SUS going back (please look at BSN and compare with our FSN, they don't know about our REL yet).

[1] Len = 13 [ c0 bd 0a 85 08 40 00 c4 0c 00 0d 01 00 ]
[1] FSN: 61 FIB 1
[1] BSN: 64 BIB 1
[1] [4097:0] MSU
[1] [ c0 bd 0a ]
[1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
[1] [ 85 ]
[1] OPC 4097 DPC 8 SLS 12
[1] [ 08 40 00 c4 ]
[1] CIC: 12
[1] [ 0c 00 ]
[1] Message Type: SUS(0x0d)
[1] [ 0d ]
[1] --FIXED LENGTH PARMS[1]--
[1] Suspend/Resume Indicators:
[1] SUS/RES indicator: Network initiated (1)
[1] [ 01 ]
[1]

And what happens now is a clear BUG in libss7: As RLC has not been received yet, the call must still be considered as active! But we already forgot it and now we are surprised that we got some MSU about it.

[1] Got SUS but no call on CIC 12 PC 4097
[1] reseting the cic

The situation is getting complicated, we are sending RSC.

[1] ISUP timer t1 stopped on CIC 12 DPC: 4097
[1] ISUP timer t5 stopped on CIC 12 DPC: 4097
[1] ISUP timer t17 (30ms) started on CIC 12 DPC 4097
[1] Len = 11 [ bd c4 08 85 01 10 02 c0 0c 00 12 ]
[1] FSN: 68 FIB 1
[1] BSN: 61 BIB 1
[1] [4097:0] MSU
[1] [ bd c4 08 ]
[1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
[1] [ 85 ]
[1] OPC 8 DPC 4097 SLS 12
[1] [ 01 10 02 c0 ]
[1] CIC: 12
[1] [ 0c 00 ]
[1] Message Type: RSC(0x12)
[1] [ 12 ]

And we get a RLC. IMHO it is a RLC confirming our REL, not RSC (according to BSN, the peer already received all our MSUs, but they probably already had the RLC queued, so they sent it).

[1]
[1] Len = 12 [ c4 be 09 85 08 40 00 c4 0c 00 10 00 ]
[1] FSN: 62 FIB 1
[1] BSN: 68 BIB 1
[1] [4097:0] MSU
[1] [ c4 be 09 ]
[1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
[1] [ 85 ]
[1] OPC 4097 DPC 8 SLS 12
[1] [ 08 40 00 c4 ]
[1] CIC: 12
[1] [ 0c 00 ]
[1] Message Type: RLC(0x10)
[1] [ 10 ]
[1]
[1] ISUP timer t17 stopped on CIC 12 DPC: 4097
Linkset 1: Processing event: ISUP_EVENT_RLC

And now, we get a second RLC, probably to our RSC. There is a jump in FSN because there was a MSU sent from them, which was not related to our call.

[1] Len = 12 [ c4 c0 09 85 08 40 00 c4 0c 00 10 00 ]
[1] FSN: 64 FIB 1
[1] BSN: 68 BIB 1
[1] [4097:0] MSU
[1] [ c4 c0 09 ]
[1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
[1] [ 85 ]
[1] OPC 4097 DPC 8 SLS 12
[1] [ 08 40 00 c4 ]
[1] CIC: 12
[1] [ 0c 00 ]
[1] Message Type: RLC(0x10)
[1] [ 10 ]
[1]

And this RLC seems unsolicited to us, because we were taking the first RLC as a response to our RSC, which was not the case.

[1] Got RLC but we didn't send REL/RSC on CIC 12 PC 4097

So, no MSUs were received from other linksets, all is perfectly fitting together...

This trace is a clear demonstration of an existing bug in libss7, which may be formulated as follows: When we are terminating the call and sending REL to the remote party, we must keep the record of the connection and silently accept and absorb all MSUs which may come back, until we receive a RLC or T5 expires.

What do you think about it?

With regards, Pavel

> Another possibility is you're mixing the whole thing in a single linkset where you must use two linksets in the way you explained. Can you see those errors with just a few test calls? I found about 20 bugs / structural design flaws in stock libss7 / dahdi mtp2 support. With my changes the mtp2/mtp3 layers are far more robust than stock libss7. Fixed all but a single one, related
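The FSN/BSN comparison used in the trace above can be reproduced straight from the raw bytes. The following is a minimal Python sketch, assuming the ITU layout visible in the dump (three MTP2 header octets, the SIO, a 4-octet routing label with 14-bit point codes, then CIC and ISUP message type). The function name decode_itu_msu and the small type table are illustrative only and are not part of libss7.

```python
# Only the message types that appear in the trace; anything else is shown as hex.
ISUP_TYPES = {0x0C: "REL", 0x0D: "SUS", 0x10: "RLC", 0x12: "RSC"}


def decode_itu_msu(hex_dump):
    """Decode the MTP2 header, ITU routing label, CIC and ISUP type of one MSU."""
    b = bytes.fromhex(hex_dump)
    # Routing label is little-endian: 14-bit DPC, 14-bit OPC, 4-bit SLS.
    label = int.from_bytes(b[4:8], "little")
    msg_type = b[10]
    return {
        "bsn": b[0] & 0x7F, "bib": b[0] >> 7,
        "fsn": b[1] & 0x7F, "fib": b[1] >> 7,
        "li": b[2] & 0x3F,
        "user_part": b[3] & 0x0F,
        "priority": (b[3] >> 4) & 0x03,
        "network_indicator": b[3] >> 6,
        "dpc": label & 0x3FFF,
        "opc": (label >> 14) & 0x3FFF,
        "sls": label >> 28,
        "cic": int.from_bytes(b[8:10], "little") & 0x0FFF,
        "type": ISUP_TYPES.get(msg_type, hex(msg_type)),
    }


if __name__ == "__main__":
    our_rel = decode_itu_msu("bc c3 0d 85 01 10 02 c0 0c 00 0c 02 00 02 81 90")
    their_sus = decode_itu_msu("c0 bd 0a 85 08 40 00 c4 0c 00 0d 01 00")
    # The SUS acknowledges up to BSN 64, but our REL went out with FSN 67,
    # so the peer had not yet seen the REL when it sent the SUS.
    print(our_rel["type"], "FSN", our_rel["fsn"])
    print(their_sus["type"], "BSN", their_sus["bsn"])
```

Decoding the first two dumps this way gives the same FSN, BSN, OPC, DPC, SLS and CIC values as the debug output, which supports the reading that the SUS was sent before the peer had processed the REL.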
Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1
What code are you using? Is this not stock libss7? Stock libss7 can't decode ISUP SUS/RES like that. In my code, I explicitly ignore ALL SUS / RES, they have no needed processing associated with Brazilian ISUP.

Asterisk and kernel dahdi version?

If you enable dahdi_pcap:

# dahdi_pcap -c 16 -f /tmp/mycap.ss7
Capturing protocol mtp2 on channels 16 to file /tmp/mycap.ss7
Packets captured: 7

Then you can analyze the capture in wireshark / ethereal. But it has one bug: if you shut down the owner of the link while dahdi_pcap is running, the system will reset on its own. As long as you don't leave dahdi_pcap running around, it's not a problem.

On 06/25/13 09:38, Pavel Troller wrote:
> Hello Marcelo,
>
> so I did some tracing. It was really hard to isolate MSUs for one particular connection, I had to collect them from about 5 MB file, but ok, it's done, and it's in total harmony with my original ideas. So, let's look at it with me:
>
> Initial conditions: There is a call running on LS1, DPC4097, CIC 12. Our Asterisk decided to clear this call down:
>
> [1] ISUP timer t1 (15000ms) started on CIC 12 DPC 4097
> [1] ISUP timer t5 (30ms) started on CIC 12 DPC 4097
> [1] Len = 16 [ bc c3 0d 85 01 10 02 c0 0c 00 0c 02 00 02 81 90 ]
> [1] FSN: 67 FIB 1
> [1] BSN: 60 BIB 1
> [1] [4097:0] MSU
> [1] [ bc c3 0d ]
> [1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
> [1] [ 85 ]
> [1] OPC 8 DPC 4097 SLS 12
> [1] [ 01 10 02 c0 ]
> [1] CIC: 12
> [1] [ 0c 00 ]
> [1] Message Type: REL(0x0c)
> [1] [ 0c ]
> [1] --VARIABLE LENGTH PARMS[1]--
> [1] Cause Indicator:
> [1] Coding Standard: 0
> [1] Location: 1
> [1] Cause Class: 1
> [1] Cause Subclass: 0
> [1] Cause: Normal call clearing (16)
> [1] [ 02 81 90 ]
> [1]
>
> But the remote party also decided to hang up, and our REL just crossed their SUS going back (please look at BSN and compare with our FSN, they don't know about our REL yet).
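For readers who only have the textual ss7 debug output rather than a packet capture, the per-circuit filtering that was done by hand earlier in the thread can be approximated with a short script. This is a rough Python sketch under two assumptions: the console output has one item per line in the form shown in the trace, and every MSU dump begins with a "Len = N [ ... ]" line. The helper name msus_for_cic is hypothetical.

```python
import re

# An MSU dump starts with a line such as "[1] Len = 16 [ bc c3 0d ... ]".
MSU_START = re.compile(r"^\[\d+\] Len = \d+ \[")


def msus_for_cic(lines, cic):
    """Group debug lines into MSU blocks and keep the blocks for one CIC."""
    blocks, current = [], None
    for line in lines:
        line = line.rstrip("\n")
        if MSU_START.match(line):
            current = [line]
            blocks.append(current)
        elif current is not None and line.startswith("["):
            current.append(line)
    wanted = f"CIC: {cic}"
    # Compare the text after the "[linkset] " prefix exactly, so that
    # "CIC: 12" does not also match "CIC: 120".
    return [
        block for block in blocks
        if any(item.split("] ", 1)[-1].strip() == wanted for item in block)
    ]


if __name__ == "__main__":
    sample = [
        "[1] Len = 11 [ bd c4 08 85 01 10 02 c0 0c 00 12 ]",
        "[1] FSN: 68 FIB 1",
        "[1] CIC: 12",
        "[1] Message Type: RSC(0x12)",
        "[1] Len = 12 [ c4 be 09 85 08 40 00 c4 30 00 10 00 ]",
        "[1] CIC: 48",
        "[1] Message Type: RLC(0x10)",
    ]
    for block in msus_for_cic(sample, 12):
        print("\n".join(block))
        print()
```

Timer and error lines that follow a dump are attached to the preceding block in this sketch, which is usually what one wants when reconstructing the message flow on one circuit.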
Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1
On 2013-06-25 14:56, Marcelo Pacheco wrote:
> What code are you using? Is this not stock libss7? Stock libss7 can't decode ISUP SUS/RES like that. In my code, I explicitly ignore ALL SUS / RES, they have no needed processing associated with Brazilian ISUP.
>
> Asterisk and kernel dahdi version?
>
> If you enable dahdi_pcap:
>
> # dahdi_pcap -c 16 -f /tmp/mycap.ss7
> Capturing protocol mtp2 on channels 16 to file /tmp/mycap.ss7
> Packets captured: 7
>
> Then you can analyze the capture in wireshark / ethereal. But it has one bug: if you shut down the owner of the link while dahdi_pcap is running, the system will reset on its own. As long as you don't leave dahdi_pcap running around, it's not a problem.

Which hardware for your E1s are you guys using? I have seen this nasty behaviour on Sangoma cards only.

Where to report problems for this KNK SS7-27 branch? Jira or this list?

Thanks
Hans

--
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-ss7 mailing list
To UNSUBSCRIBE or update options visit:
http://lists.digium.com/mailman/listinfo/asterisk-ss7
Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1
Hello Marcelo,

> What code are you using? Is this not stock libss7? Stock libss7 can't decode ISUP SUS/RES like that.

No, it's not stock libss7. It's written in the subject, as well as in my first sentence in the first post. It's a special branch, available for both Asterisk and libss7 (version 2), which must be applied together. And when I started playing with it, I was told that I can post my experiences/problems to this ML, which is what I just did. But you are still the only one who responded to me.

> In my code, I explicitly ignore ALL SUS / RES, they have no needed processing associated with Brazilian ISUP.

We have SUS/RES in the Czech Republic in the national ISUP spec, so we must handle it properly. However, it's not just a problem with SUS. This problem may appear at any time when the A-side clears down while a MSU from the B-side (any obvious MSU like ACM, ANM, CON, CPG...) is already underway.

> Asterisk and kernel dahdi version?

Asterisk 11 branch, dahdi kernel the last state available from SVN (they have now moved to git and I still haven't adapted my working copy, as I also have many private patches in it and it will be a pain to incorporate them into my local git repo).

> If you enable dahdi_pcap:
>
> # dahdi_pcap -c 16 -f /tmp/mycap.ss7
> Capturing protocol mtp2 on channels 16 to file /tmp/mycap.ss7
> Packets captured: 7
>
> Then you can analyze the capture in wireshark / ethereal. But it has one bug: if you shut down the owner of the link while dahdi_pcap is running, the system will reset on its own. As long as you don't leave dahdi_pcap running around, it's not a problem.

A good hint, I really didn't know about it! Thanks, I will use it (with care to prevent a system crash :-) ).

With regards, Pavel

> On 06/25/13 09:38, Pavel Troller wrote:
> > Hello Marcelo, so I did some tracing. It was really hard to isolate MSUs for one particular connection, I had to collect them from about 5 MB file, but ok, it's done, and it's in total harmony with my original ideas.
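The race described in this thread (a message from the B-side crossing our REL) can be illustrated with a toy state model. The Python sketch below is a deliberate simplification and not libss7 code: the Circuit class, its flag names and the absorb_after_rel switch are all hypothetical, and the single CALL_CONNECTED flag stands in for the several call-state bits that the real stack keeps. It only mirrors the behaviour reported here: sending REL clears the call state, a late call message is then treated as unexpected and answered with RSC, and the two resulting RLCs get out of step.

```python
class Circuit:
    """Toy model of one CIC; flag names are illustrative, not libss7's."""

    def __init__(self, absorb_after_rel):
        self.flags = {"CALL_CONNECTED"}
        self.absorb_after_rel = absorb_after_rel
        self.log = []

    def send_rel(self):
        # As reported in the thread: REL is recorded, call state is forgotten.
        self.flags.discard("CALL_CONNECTED")
        self.flags.add("SENT_REL")
        self.log.append("tx REL")

    def _unexpected(self, msg):
        self.flags.add("SENT_RSC")
        self.log.append(f"unexpected {msg}: tx RSC")

    def receive(self, msg):
        if msg == "RLC":
            if self.flags & {"SENT_REL", "SENT_RSC"}:
                self.flags.clear()
                self.log.append("rx RLC: circuit idle")
            else:
                self._unexpected(msg)
        elif "CALL_CONNECTED" in self.flags:
            self.log.append(f"rx {msg}: handled")
        elif self.absorb_after_rel and "SENT_REL" in self.flags:
            # Proposed behaviour: silently absorb late messages until RLC.
            self.log.append(f"rx {msg}: ignored, waiting for RLC")
        else:
            self._unexpected(msg)


def run(absorb_after_rel, incoming):
    """Release a connected call, then feed the given incoming messages."""
    circuit = Circuit(absorb_after_rel)
    circuit.send_rel()
    for msg in incoming:
        circuit.receive(msg)
    return circuit.log


if __name__ == "__main__":
    print(run(False, ["SUS", "RLC", "RLC"]))  # behaviour seen in the trace
    print(run(True, ["SUS", "RLC"]))          # with late messages absorbed
```

In the first run the second RLC finds no pending REL or RSC and triggers another reset, which matches the "Got RLC but we didn't send REL/RSC" line. In the second run no RSC is sent, so the peer answers with a single RLC and the circuit goes idle cleanly.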
Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1
Hi Hans!

> On 2013-06-25 14:56, Marcelo Pacheco wrote:
> > What code are you using? Is this not stock libss7? Stock libss7 can't decode ISUP SUS/RES like that. In my code, I explicitly ignore ALL SUS / RES, they have no needed processing associated with Brazilian ISUP.
> >
> > Asterisk and kernel dahdi version?
> >
> > If you enable dahdi_pcap:
> >
> > # dahdi_pcap -c 16 -f /tmp/mycap.ss7
> > Capturing protocol mtp2 on channels 16 to file /tmp/mycap.ss7
> > Packets captured: 7
> >
> > Then you can analyze the capture in wireshark / ethereal. But it has one bug: if you shut down the owner of the link while dahdi_pcap is running, the system will reset on its own. As long as you don't leave dahdi_pcap running around, it's not a problem.
>
> Which hardware for your E1s are you guys using? I have seen this nasty behaviour on Sangoma cards only.

On the older server, we are using Sangoma cards with wanpipe drivers. But we are in the process of abandoning them and the new server is using Digium TE820P cards (our target is 4 per server, i.e. 32 E1s, but now we run on two).

> Where to report problems for this KNK SS7-27 branch? Jira or this list?

That's exactly the question. I was told (by somebody who seemed familiar with this branch and who sent me a direct link to the patchset) that I have to post my experiences here on this list.

> Thanks
> Hans

You're welcome!

With regards, Pavel
Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1
Hello Gentlemen,

I find this post very interesting. I have set up many traditional T1s with my father here in Canada (I am 17), and I am looking to experiment with SS7 signaling. I have been reading up on it quite a bit and have some questions.

I see you guys are using T1 cards for the interconnect. Is this after the link has been muxed? Or is the signaling coming in on T1/E1s? Sorry if this is a stupid question. More specifically, is the SS7 interconnect with the CO done using grouped PRI trunks mapped in a TE1/3 transport layer, or A-Links, STM-1?

I really want to experiment with an interconnect using SS7 signaling to help bring to light what I have been studying. What is the minimum I would need to ask a service provider for in terms of service (i.e., T1 with SS7 signaling?) and hardware?

Last question, I promise! It seems that this setup (Asterisk + libss7 + Digium cards) acts as a media gateway? This is really cool once stable!

Kind Regards,
Nick.