Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1

2013-06-26 Thread Kaloyan Kovachev

Hi all,
sorry for joining so late, but I am on holiday (until the end of the week)
and rarely check my mailbox. Thanks to bad weather I did so today :)


To the OP:
While reading the first posts I thought this was an old problem with a
REL/RSC loop (persistent on start with ANSI signaling), which was fixed
in libss7 instead of sig_ss7, but I am not sure whether this is the same
issue or a similar yet different one. There really is a (remaining)
problem if we receive the RLC for a previous REL after we have already
sent RSC. I was thinking of clearing the old status bits after we receive
RLC, but that will not fix the double-RLC problem, and we can't ignore
the first RLC (or just clear the SENT_REL flag), because we may never get
a second one. So it is probably better to skip sending a second RSC
inside isup_handle_unexpected() if the previous one was sent within the
last T17 (timer) seconds. Because the timer is stopped on RLC, this needs
another timer or some flag to ignore its expiration and not reset again ...
I will work on this next week when I am back.


The code in my branch is actually Domjan Attila's version (the patches 
attached to the SS7-27 issue) ported to later Asterisk versions with 
very few additions/modifications, so the muffins are for him, while the 
bugs are from me :)


P.S.
apologies for top posting - the connection is unstable and I had to
write the post offline and just copy/paste it


On 2013-06-26 06:42, Pavel Troller wrote:

Hi!
So, I'm replying to my own original post, to keep the question and a
possible answer together without any excessive or unrelated 
information.

I hope I've found the cause of the problem and I hope I solved it. A
modified libss7 is now online and I'm waiting for busy hours to see
whether it will help.
The problem is that in the isup_rel() function, all the important
got_sent_msg flags are cleared, so the stack forgets the preceding call
state:
... isup_rel():
        c->got_sent_msg |= ISUP_SENT_REL;
        c->got_sent_msg &= ~(ISUP_SENT_IAM | ISUP_PENDING_IAM |
            ISUP_CALL_CONNECTED | ISUP_GOT_IAM | ISUP_GOT_CCR | ISUP_SENT_INR);
...
So an incoming MSU that was perfectly legitimate before sending REL
is now handled as unexpected.
My solution adds the following code to the isup_receive() function for
every message that can confuse the stack in this way
(an example for the ACM message):
    case ISUP_ACM:
+       if (c->got_sent_msg & ISUP_SENT_REL) {
+           ss7_message(ss7, "Got unexpected ACM after sending REL on CIC %d PC %d, ignoring\n", c->cic, opc);
+           return 0;
+       }

        if (!(c->got_sent_msg & ISUP_SENT_IAM)) {
            ss7_message(ss7, "Got ACM but we didn't send IAM on CIC %d PC %d ", c->cic, opc);
            return isup_handle_unexpected(ss7, c, opc);
        }

If my change proves good, I'm planning to remove the ss7_message() calls
to limit the stack verbosity, as these situations are relatively frequent
under heavy load and I think they are more or less logical and normal.

I would be glad for some words from the KNK branch maintainer(s) on
whether to create a JIRA issue and put my patch there, or how to proceed
now in general.


With regards,
Pavel



Hi!
I would like to share my experience with deployment of this experimental
SS7 branch.
The first impressions are good; especially the timers seem to work well,
saving many calls from being frozen.
However, there are still some strange things, which I would like to
discuss here, one by one.
The first one is that the channel sometimes doesn't recognize a message
(mostly RLC), even when it comes in response to an action initiated by
the channel itself.

Typically, the following appears often:

[Jun 24 13:33:41] ERROR[3975]: chan_dahdi.c:14406 dahdi_ss7_error: [1] ISUP timer t17 expired on CIC 27 DPC 4097
[1] Got RLC but we didn't send REL/RSC on CIC 27 PC 4097 reseting the cic

As I understand it, there were some timeouts and now the channel tries to
recover by sending RSC and starting T17. However, it seems that it
immediately rejects the RLC which comes back as a response to the RSC
that was just sent upon expiry of T17. This repeats again and again in
the rhythm of T17, and the channel is not operational.
ss7 show calls shows the following line for the misbehaving CIC:
27  4097  11  IAM   IAM

Or, a very similar situation:
[2] Got SUS but no call on CIC 48 PC 4096 reseting the CIC
[2] Got RLC but we didn't send REL/RSC on CIC 48 PC 4096 reseting the CIC

The first question is why there was no call when SUS was received. My
idea is that both parties hung up their phones at the same time and
that the call was undergoing destruction on the Asterisk side (REL just
sent or something like this) when SUS arrived. Maybe the call was marked
as cleared even before RLC came back? OK, I can understand this. But
if the CIC was reset as the first message says (i.e. RSC was sent), why
is the returning RLC not recognized then?

Or, just 

Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1

2013-06-26 Thread Kaloyan Kovachev
Almost forgot. Please do not post patches (if any) to this list, but
attach them to the SS7-27 issue instead, with a proper license agreement,
so they can be included in the Asterisk codebase.



Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1

2013-06-26 Thread Marcelo Pacheco
Thanks Kaloyan.
Before this thread, there was no mention at all of a KNK tree, so I
thought this was stock libss7.
I'm using my own patched libss7.
I have processed over one million call setups with ten servers, with many
difficult setups (third party STPs, STPs 1000 miles away with
transmission lines that do fail from time to time, connections with
almost a dozen types of ISUP switches, sharing two links with an STP
to half a dozen switches), and the issues reported don't happen at all.
So this looks like a bug in your patch or in Attila's code.
My patch is for paying customers only (they get the source, and could
release it if they wanted to, but chose not to).
I have made very small changes to the ISUP side of things, but some
fairly major changes to MTP2, MTP3 and DAHDI mtp2 mode.
I even implemented very basic STP functionality, and MTP2 over UDP
signalling (between Asterisk boxes only).
It might be worth looking at the diffs from stock to your branch.


Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1

2013-06-26 Thread Kaloyan Kovachev
The problem with stock libss7 is that one will never complete the tests
required by telcos in Europe, as it is missing functionality which the ITU
has described in its test procedures.
Without the ISUP timers (the main functionality added by the patches)
it is just not possible, and the link may not even come up in some cases.
It probably works fine in the ANSI world, but not in the ITU world.
One of the difficulties was to keep the code working as before when the
timers are not defined in chan_dahdi.conf; another is that the ANSI
standard and its requirements are hard to find (freely available); and the
code base/functionality has changed quite a lot since 1.6, separating
sig_ss7 from chan_dahdi etc.
I am sure there are bugs and room for improvement in that branch, but
the original version from Domjan has been used by me and many others for
a few years already (that's why I said the bugs in that branch are from
me :) ) and we are stuck with 1.6 because of that. I have tried to get the
changes into Asterisk 11, but my (below average) C skills and available
time did not allow me to do that, and the more time passes, the more
difficult it will be to keep it up to date with the rest of the Asterisk
code.
I hope that with some help from others (testing and patches) this code
will finally find its way into Asterisk, and then we may look at adding
the cluster/routing/STP functionality.



Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1

2013-06-26 Thread Pavel Troller
Hi Kaloyan,

 Hi all,
 sorry for joining so late, but i am on holidays (by the end of the week) 
 and rarely checking my mailbox. Thanks to bad weather i did that today :)

Never mind, I'm happy you're here!


 To the OP:
 while reading the first posts i thought it is an old problem with REL/RSC 
 loop (persistent on start with ANSI signaling) which was fixed in libss7 
 instead of sig_ss7, but not sure if it is a similar yet different one or it 
 is the same issue. It really is a (remaining) problem if we receive RLC on 
 previous REL, but after we have sent RSC. I was thinking to clear the old 
 status bits after we receive RLC, but this will not fix the double RLC 
 received problem and we can't ignore the first one (or just clear the 
 SENT_REL flag), because we may never get a second one, so it should 
 probably be better to ignore sending second RSC inside 
 isup_handle_unexpected() if the previous one was sent T17 (timer seconds) 
 ago. Because the timer is stopped on RLC it should be another timer or some 
 flag to ignore it's expiration and not reset again ... will work on this 
 next week when i am back.

I think it's another problem. Sometimes I also have this kind of loop, lasting
for hours, until it somehow settles by itself. But the error I've reported here
is that we clear the old status flags immediately after sending our REL, and
if an MSU is already on its way back (it may be any common MSU like ACM, CPG,
ANM, SUS, RES, REL..., at least I've encountered all of these), we don't expect
it, we call isup_handle_unexpected() and we send RSC, which is absolutely
superfluous, because there is nothing wrong with the call state; we just have
to ignore this (and possibly any other) MSU until we get the RLC acknowledging
our REL.
  My patch does it by checking ISUP_SENT_REL; however, it might be better to
postpone clearing the got_sent_msg flags from isup_rel() to the ISUP_RLC case
in isup_receive(). However, I didn't know whether leaving these flags set after
sending REL would do harm somewhere, so I did it as written, and about 300
thousand calls yesterday didn't reveal any problem with the patch.
So today I removed the ss7_message() calls from my patch and since then
Asterisk has been very quiet and seems very happy, and the cooperating EWSDs
as well :-).

With regards,
  Pavel



Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1

2013-06-25 Thread Marcelo Pacheco
Per usual, read the fine manual. Wait, there's no manual!
Since you seem to have done your part and actually know some SS7 and
ISUP, here comes a hint.

You created two or more linksets where you must have a single one.
libss7 doesn't have the SS7 routing feature.
In libss7 the linkset concept is different from the official SS7 linkset.

All signalling links that carry ISUP traffic for a given set of channels
must be kept on a single linkset, as well as all ISUP channels that go
through those links.

It looks like you're getting incoming signalling for ISUP channels that
are on another linkset.

I'm sure you didn't find any libss7 bug.
I have a highly customized version of libss7/dahdi/asterisk, fixing lots
of issues, but this isn't one of them.
I have processed over one million call setups with a very complex setup (6
linksets, 7 links, 6 E1s on a single switch, plus another 6 E1s on remote
switches using my simple STP solution, sharing the local links via SS7
over UDP, my simpler proprietary alternative to SIGTRAN).

If you need commercial support, contact me off list.

On 06/24/13 09:02, Pavel Troller wrote:
 Hi!
   I would like to share my experience with deployment of this experimental SS7
 branch.
   The first impressions are good, especially the timers seem to work well,
 saving many calls from being frozen.
   However, there are still some strange things, which I would like to discuss
 here, one by one.
   The first one is, that the channel sometimes doesn't recognize a message
 (mostly RLC), even it comes from an action initiated by the channel itself.
 Typically, the following is appearing often:

 [Jun 24 13:33:41] ERROR[3975]: chan_dahdi.c:14406 dahdi_ss7_error: [1] ISUP 
 timer t17 expired on CIC 27 DPC 4097
 [1] Got RLC but we didn't send REL/RSC on CIC 27 PC 4097 reseting the cic

   As I understand, there were some timeouts and now the channel tries to
 recover by sending RSC and firing T17. However, it seems that it immediately
 rejects RLC, which comes back as a response to the RSC which was just sent
 upon expiry of T17. And this appears again and again in the rhythm of T17,
 and the channel is not operational.
 ss7 show calls shows the following line for the misbehaving CIC:
27  4097  11  IAM   IAM
  
   Or, a very similar situation:
 [2] Got SUS but no call on CIC 48 PC 4096 reseting the CIC
 [2] Got RLC but we didn't send REL/RSC on CIC 48 PC 4096 reseting the CIC

   The first question is, why there was no call while SUS was received. My
 idea is, that both the parties hung up their phones in the same time and
 that the call was undergoing destruction on Asterisk side (REL just sent
 or something like this), while SUS arrived. Maybe the call was marked as
 cleared even before RLC came back ? OK, I can understand this. But
 if the CIC was reset as the first message says (i.e. RSC was sent), why the
 RLC going back is not recognized then ?

 Or, just now the following appeared:

 [1] Got ACM but we didn't send IAM on CIC 10 PC 4097 reseting the cic
 [1] Got RLC but we didn't send REL/RSC on CIC 10 PC 4097 reseting the cic

 Again, it's questionable why this happened, but the second line seems
 to indicate some brokenness again.

 To explain: The channel is operating on a gateway equipped with 16 E1s
 and current traffic is about 10 CAPS, there are two linksets to two
 cooperating exchanges. They are EWSDs, which have very mature and stable
 SS7, so I'm almost sure that they are not making signalling errors.

 With regards,
   Pavel

 --
 _
 -- Bandwidth and Colocation Provided by http://www.api-digital.com --

 asterisk-ss7 mailing list
 To UNSUBSCRIBE or update options visit:
http://lists.digium.com/mailman/listinfo/asterisk-ss7




Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1

2013-06-25 Thread Pavel Troller
Hello Marcelo,

 Per usual, read the fine manual. Wait, there's no manual !

You're right :-).

 Since you seem to have done your part and actually knows some ss7 and
 isup, here comes a hint.
 
 You created two or more linksets where you must have a single one.
 libss7 don't have the ss7 routing feature.

That seems strange to me. Let me try to explain the setup in more detail.
There is 1 (one) Asterisk box.
It has 2 (two) linksets configured, with 1 (one) signalling link per linkset.
Linkset 1 is configured for one DPC and with CICs 1 - 496.
Linkset 2 is configured for another (different) DPC and also with CICs 1 - 496.
Both systems connected to this Asterisk box are configured to respond
directly over the linkset between them and the Asterisk box, so it's certain
that an MSU from DPC1 cannot come over LS2 and vice versa.
I hope that this extremely simple setup is within the scope of current libss7
functionality. Or am I wrong?
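
To make the setup concrete, here is a small sketch of why overlapping CIC ranges on two linksets are unambiguous as long as each MSU arrives on its own linkset. This is not libss7's lookup code: struct demo_circuit and demo_find_circuit() are hypothetical, and the pairing of linkset 1 with PC 4097 and linkset 2 with PC 4096 is only inferred from the log lines quoted earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical circuit record: one entry per (linkset, DPC, CIC). */
struct demo_circuit {
    int linkset;
    int dpc;
    int cic;
};

/*
 * Finds the circuit an incoming MSU belongs to. The MSU's originating
 * point code (opc) must match the DPC configured on the linkset it
 * arrived on; the CIC alone is not enough because both linksets use
 * the same CIC numbers.
 */
static const struct demo_circuit *demo_find_circuit(const struct demo_circuit *tbl,
                                                    size_t n, int linkset,
                                                    int opc, int cic)
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].linkset == linkset && tbl[i].dpc == opc && tbl[i].cic == cic) {
            return &tbl[i];
        }
    }
    return NULL;  /* e.g. an MSU from DPC1 arriving over LS2 */
}
```

In this model the failure Marcelo suspects would show up as a NULL lookup, that is, signalling for a circuit arriving on the wrong linkset.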

 In libss7 linkset concept is diferent from official ss7 linkset.
 
 All signalling links that carry ISUP traffic for a given set of channels
 must be kept on a single linkset, as well as all ISUP channels that go
 through those links.

I hope that my setup is conformant with this limitation.

 
 It looks like you're getting incoming signalling for ISUP channels that
 are on another linkset.

It really looks like this, but I still hope it's not the case. Please note that
the traffic on the box is rather high, such an error occurs for one of, say,
1 call attempts. I think that in case of such a fatal routing problem,
which you are talking about, it wouldn't be possible to use the system
regularly.

 
 I'm sure you didn't find any libss7 bug.

Really strong words! I wouldn't say it for any of my programs :-).

 I have a highly customized version of libss7/dahdi/asterisk, fixing lots
 of issue, but this isn't one of them.

Possibly your setup/usage scenario is a bit different ?


 Processed over one million call setups, with a very complex setup (6
 linksets, 7 links, 6E1 on a single switch, plus another 6E1 on remote
 switches using my simple STP solution, sharing the local links over SS7
 over UDP - my simpler proprietary alternative to sigtran).

These switches (I have two of them, but the second one is still on a regular
unpatched SS7 stack) handle approx. 3 million call setups per week. My
record (without restarting/crashing Asterisk) is about 3 weeks with more than
10 million calls.

 
 If you need commercial support, contact me off list.

Thanks for your offer.

With regards, Pavel

 
 On 06/24/13 09:02, Pavel Troller wrote:
  Hi!
I would like to share my experience with deployment of this experimental 
  SS7
  branch.
The first impressions are good, especially the timers seem to work well,
  saving many calls from being frozen.
However, there are still some strange things, which I would like to 
  discuss
  here, one by one.
The first one is, that the channel sometimes doesn't recognize a message
  (mostly RLC), even it comes from an action initiated by the channel itself.
  Typically, the following is appearing often:
 
  [Jun 24 13:33:41] ERROR[3975]: chan_dahdi.c:14406 dahdi_ss7_error: [1] ISUP timer t17 expired on CIC 27 DPC 4097
  [1] Got RLC but we didn't send REL/RSC on CIC 27 PC 4097 reseting the cic
 
  As I understand it, there were some timeouts and now the channel tries to
  recover by sending RSC and starting T17. However, it seems that it immediately
  rejects the RLC, which comes back as a response to the RSC which was just sent
  upon expiry of T17. This repeats again and again in the rhythm of T17,
  and the channel is not operational.
  ss7 show calls shows the following line for the misbehaving CIC:
 27  4097  11  IAM   IAM
   
Or, a very similar situation:
  [2] Got SUS but no call on CIC 48 PC 4096 reseting the CIC
  [2] Got RLC but we didn't send REL/RSC on CIC 48 PC 4096 reseting the CIC
 
  The first question is why there was no call when SUS was received. My
  idea is that both parties hung up their phones at the same time and
  that the call was undergoing destruction on the Asterisk side (REL just sent
  or something like this) when SUS arrived. Maybe the call was marked as
  cleared even before RLC came back? OK, I can understand this. But
  if the CIC was reset as the first message says (i.e. RSC was sent), why is the
  RLC coming back not recognized then?
 
  Or, just now the following appeared:
 
  [1] Got ACM but we didn't send IAM on CIC 10 PC 4097 reseting the cic
  [1] Got RLC but we didn't send REL/RSC on CIC 10 PC 4097 reseting the cic
 
  Again, it's questionable why this happened, but the second line seems
  to indicate some brokenness again.
 
  To explain: The channel is operating on a gateway equipped with 16 E1s
  and current traffic is about 10 CAPS, there are two linksets to two
  cooperating exchanges. They are EWSDs, which have very mature and stable
  SS7, so I'm almost sure 

Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1

2013-06-25 Thread Marcelo Pacheco
Another possibility is you're mixing the whole thing in a single linkset
where you must use two linksets in the way you explained.

Can you see those errors with just a few test calls ?


I found about 20 bugs / structural design flaws in stock libss7 / dahdi
mtp2 support. With my changes the mtp2/mtp3 layers are far more robust
than stock libss7.
Fixed all but a single one, related to knowing when the linkset is up or
down, and not trying to send ISUP messages, especially IAM, through a down
linkset (all sigchans down).

If there's a bug, use ss7 set debug on linkset X to trace ss7 messages
and track isup message flow.

I used libss7 successfully with telcobridges tmedia, digitro switches,
ericsson AXE, huawei NGN, Nortel DMS, several STPs, EWSS, Nec NEAX, and
I'm probably missing a couple of switch types.
I never ran into SS7 / ISUP bugs in other switches, always in libss7, but
the nature of the bugs I found is nothing like what you're reporting.
I started testing libss7 with those kinds of switches 5 years ago, so I
have some mileage to make those statements, especially from reading and
understanding a large portion of the libss7 / sig_ss7 / chan_dahdi code.

The issue you're describing is caused by Asterisk getting ss7 messages
that belong to another linkset or sending ss7 messages on the wrong ss7
link.
Check for UCIC or CFN ISUP responses.



You need to define chan_dahdi.conf basically like this:

; basic ss7 / isup parameters, usually the same for the whole libss7 setup
signalling=ss7
ss7type=itu/ansi
ss7_called_nai=subscriber/national/international/unknown
ss7_calling_nai=subscriber/national/international/unknown
networkindicator=national/international/...

; Your local pointcode
pointcode = X

; Start definition for linkset N
linkset = N

adjpointcode = STP point code otherwise switch point code
; Instantiate a signalling link on channel 16 belonging to linkset N,
; with adjacency to adjpointcode
sigchan = 16
; Define more signalling links if needed, with adjpointcode and sigchan

defaultdpc = pointcode for ISUP messages
cicbeginswith= CIC of the next voice channel defined
; Instantiate voice channel on linkset N, talking to PC defaultdpc, CIC
; numbering incremented automatically
channel = dahdi channel range

cicbeginswith= next CIC range, if non contiguous
channel = dahdi channel range

; Another point code belonging to the same linkset (if links share
; signalling to multiple switches, typically links through an STP)
defaultdpc = another point code
;repeat cicbeginswith, channel

; Starts definition of another linkset
linkset = M
; repeat same sequence as above


On 06/25/13 05:13, Pavel Troller wrote:
 Hello Marcelo,

 Per usual, read the fine manual. Wait, there's no manual !
 You're right :-).

 Since you seem to have done your part and actually know some SS7 and
 ISUP, here comes a hint.

 You created two or more linksets where you must have a single one.
 libss7 doesn't have the SS7 routing feature.
 It seems strange to me. Let me try to explain this in a more detailed way.
 There is 1 (one) Asterisk box.
 It has 2 (two) linksets configured, with 1 (one) signalling link per linkset.
 Linkset 1 is configured for one DPC and with CICs 1 - 496.
 Linkset 2 is configured for another (different) DPC and also with CICs 1 - 496.
 Both the systems connected to this Asterisk box are configured to respond
 directly to the linkset between them and the Asterisk, so it's sure that
 a MSU from DPC1 cannot come over LS2 and vice versa.
 

Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1

2013-06-25 Thread Pavel Troller
Hello Marcelo,

 Another possibility is you're mixing the whole thing in a single linkset
 where you must use two linksets in the way you explained.

I hope I'm not doing this.

 
 Can you see those errors with just a few test calls ?
 

No. These errors occur only in high-traffic periods.

 
 I found about 20 bugs / structural design flaws in stock libss7 / dahdi
 mtp2 support. With my changes the mtp2/mtp3 layers are far more robust
 than stock libss7.
 Fixed all but a single one, related to knowing when the linkset is up or
 down, and not trying to send ISUP messages, especially IAM, through a down
 linkset (all sigchans down).

I believe that you fixed most of the bugs in the stock libss7, but now I'm
trying to use the patched one, which contains ISUP timers (it was a pain
to live without them), improved diagnostics, more dumping commands etc.
This is the reason I'm trying to use it.

 
 If there's a bug, use ss7 set debug on linkset X to trace ss7 messages
 and track isup message flow.

The problem is, that the format of the dump is good for manual viewing,
but not for machine processing (grepping for patterns etc.). And running
a minute of ss7 debug on a single linkset creates a really huge file (10+ MB),
which is very hard to view personally.
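
A minimal Python sketch of how such a dump could be made greppable, assuming
the dump format shown in the traces of this thread (each MSU starts with a
"[n] Len = ..." line and carries "CIC:" and "Message Type:" lines). The
function names and the SAMPLE excerpt are illustrative only; the CIC 13 entry
is made up to show the filtering.

```python
import re

# Patterns taken from the dump format seen in this thread (an assumption,
# other libss7 versions may print differently).
MSU_START = re.compile(r"^\[(\d+)\] Len = \d+ \[")
CIC_LINE = re.compile(r"CIC: (\d+)")
TYPE_LINE = re.compile(r"Message Type: (\w+)\(0x[0-9a-fA-F]+\)")


def split_msus(lines):
    """Group dump lines into blocks, one per MSU, each starting at 'Len ='."""
    block = None
    for line in lines:
        line = line.strip()
        if MSU_START.match(line):
            if block:
                yield block
            block = [line]
        elif block is not None:
            block.append(line)
    if block:
        yield block


def summarize(block):
    """Reduce one MSU block to linkset, CIC and ISUP message type."""
    text = "\n".join(block)
    cic = CIC_LINE.search(text)
    mtype = TYPE_LINE.search(text)
    return {
        "linkset": int(MSU_START.match(block[0]).group(1)),
        "cic": int(cic.group(1)) if cic else None,
        "type": mtype.group(1) if mtype else None,
    }


def msus_for_cic(lines, cic):
    """Return the summaries of all MSUs that belong to one CIC."""
    return [s for s in map(summarize, split_msus(lines)) if s["cic"] == cic]


SAMPLE = """\
[1] Len = 16 [ bc c3 0d 85 01 10 02 c0 0c 00 0c 02 00 02 81 90 ]
[1] FSN: 67 FIB 1
[1] BSN: 60 BIB 1
[1] CIC: 12
[1] Message Type: REL(0x0c)
[1] Len = 11 [ bd c4 08 85 01 10 02 c0 0d 00 12 ]
[1] CIC: 13
[1] Message Type: RSC(0x12)
[1] Len = 12 [ c4 be 09 85 08 40 00 c4 0c 00 10 00 ]
[1] CIC: 12
[1] Message Type: RLC(0x10)
"""

if __name__ == "__main__":
    for msu in msus_for_cic(SAMPLE.splitlines(), 12):
        print(msu)
```

On a real capture one would pass the lines of the log file instead of SAMPLE;
the same grouping also makes it easy to filter by message type or linkset.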

 
 I used libss7 successfully with telcobridges tmedia, digitro switches,
 ericsson AXE, huawei NGN, Nortel DMS, several STPs, EWSS, Nec NEAX, and
 I'm probably missing a couple of switch types.

Generally, your experience is really large. 

 I never ran into SS7 / ISUP bugs in other switches, always in libss7, but
 the nature of the bugs I found is nothing like what you're reporting.
 I started testing libss7 with those kinds of switches 5 years ago, so I
 have some mileage to make those statements, especially from reading and
 understanding a large portion of the libss7 / sig_ss7 / chan_dahdi code.
 
 The issue you're describing is caused by Asterisk getting ss7 messages
 that belong to another linkset or sending ss7 messages on the wrong ss7
 link.
 Check for UCIC or CFN ISUP responses.
 

There will be no UCIC messages, because both linksets have identical
CICs, so even if messages were mixed between linksets, the CIC
would always exist.
Sometimes I can see CFN, but it is always easily understandable in the regular
call context (invalid parameters being sent etc.).



 
 
 You need to define chan_dahdi.conf basically like this:

Here is my config:

signalling=ss7
ss7type=itu
ss7_called_nai=dynamic
ss7_calling_nai=dynamic
ss7_internationalprefix=00
ss7_nationalprefix=
ss7_subscriberprefix=
ss7_unknownprefix=
ss7_explictacm=yes

; == ALI 01 ===

; All settings apply to linkset 1
linkset=1
slc=0
pointcode=8
adjpointcode=4097
defaultdpc=4097
networkindicator=national

; First signalling channel
sigchan=1
mtp3_timer.t21=1
isup_timer.t1 = 15000   ; Wait for RLC
isup_timer.t2 = 18  ; User SUS received
;isup_timer.t3 = 12  ; Overload
;isup_timer.t4 = 30  ; MTP Inaccessible Remote User Timer
isup_timer.t5 = 30  ; Wait for RLC after initial REL
isup_timer.t6 = 3   ; Network SUS received
isup_timer.t7 = 3   ; Last Address Message, waiting for ACM/CON
;isup_timer.t11 = 15000  ; Automatic ACM timer
isup_timer.t12 = 15000  ; BLO - BLA timer
isup_timer.t13 = 30 ; Initial BLO - BLA timer
isup_timer.t14 = 15000  ; UBL - UBA timer
isup_timer.t15 = 30 ; Initial UBL - UBA timer
isup_timer.t16 = 15000  ; RSC timer due to T5 expiry
isup_timer.t17 = 30 ; Initial RSC -''-
isup_timer.t18 = 15000  ; CGB - CGBA timer
isup_timer.t19 = 30 ; Initial CGB - CGBA timer
isup_timer.t20 = 15000  ; CGU - CGUA timer
isup_timer.t21 = 30 ; Initial CGU - CGUA timer
isup_timer.t22 = 15000  ; CGR - CGRA timer
isup_timer.t23 = 3  ; Initial CGR - CGRA timer
isup_timer.t27 = 24 ; COT failure
isup_timer.t33 = 15000  ; INR - INF timer
isup_timer.t35 = 15000  ; Overlap dialling timer

group=1
context=from_ss7
faxdetect=no

; Begin CIC (Circuit Identification Code) count with this number
cicbeginswith=2
; Channels to associate with CICs on this linkset
channel=2-31
cicbeginswith=33
channel=32-62
... for all other spans

; == ALI 02 ===
linkset=2
pointcode=8
adjpointcode=4096
defaultdpc=4096
networkindicator=national

; First signalling channel
sigchan=125
mtp3_timer.t21=1
... the rest the same as for Linkset 1
... of course channel numbers differ in the channel= definitions


So, it differs in the following from your suggestion below:
- Own pointcode is stated in the linkset sections, but it's the same
  in all the linksets.
- There are both adjpointcode and defaultdpc specified in every linkset
  definition, both being the same to be sure.
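
To make the CIC numbering in the config above concrete, here is a minimal
Python sketch of the rule "cicbeginswith sets the next CIC, each channel takes
the next one". It is not chan_dahdi's real parser: it only understands the
four keys used below, assumes cicbeginswith precedes channel, and keys
circuits by (linkset, DPC, CIC) purely for illustration. The channel numbers
for linkset 2 are made up, since the config above elides them.

```python
def parse_range(spec):
    """Expand a channel spec such as '2-31' or '5' into a list of ints."""
    out = []
    for part in spec.split(","):
        lo, _, hi = part.strip().partition("-")
        out.extend(range(int(lo), int(hi or lo) + 1))
    return out


def build_cic_map(conf_text):
    """Map (linkset, dpc, cic) -> DAHDI channel for a simplified config."""
    linkset = dpc = next_cic = None
    table = {}
    for raw in conf_text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        key, value = (s.strip() for s in line.split("=", 1))
        if key == "linkset":
            linkset = int(value)
        elif key == "defaultdpc":
            dpc = int(value)
        elif key == "cicbeginswith":
            next_cic = int(value)
        elif key == "channel":
            for chan in parse_range(value):
                table[(linkset, dpc, next_cic)] = chan
                next_cic += 1  # CIC numbering increments automatically
    return table


EXAMPLE = """
linkset=1
defaultdpc=4097
cicbeginswith=2
channel=2-31
cicbeginswith=33
channel=32-62

linkset=2
defaultdpc=4096
cicbeginswith=2
channel=126-155   ; hypothetical channel numbers
"""

if __name__ == "__main__":
    table = build_cic_map(EXAMPLE)
    print(table[(1, 4097, 12)], table[(2, 4096, 12)])
```

The same CIC 12 exists once per linkset and is told apart only by the point
code, which is why UCIC cannot reveal a mix-up between the two linksets.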

Thank you again for your help with my problems!

With regards,
  Pavel


 
 ; basic ss7 / isup parameters, usually the same for the whole libss7 setup
 signalling=ss7
 ss7type=itu/ansi
 ss7_called_nai=subscriber/national/international/unknown
 

Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1

2013-06-25 Thread Pavel Troller
Hello Marcelo,
  so I did some tracing. It was really hard to isolate the MSUs for one particular
connection, I had to collect them from a file of about 5 MB, but OK, it's done,
and it's in total harmony with my original ideas. So, let's look at it with
me:

Initial conditions: There is a call running on LS1, DPC4097, CIC 12.

Our Asterisk decided to clear this call down:

[1] ISUP timer t1 (15000ms) started on CIC 12 DPC 4097
[1] ISUP timer t5 (30ms) started on CIC 12 DPC 4097
[1] Len = 16 [ bc c3 0d 85 01 10 02 c0 0c 00 0c 02 00 02 81 90 ]
[1] FSN: 67 FIB 1
[1] BSN: 60 BIB 1
[1] [4097:0] MSU
[1] [ bc c3 0d ]
[1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
[1] [ 85 ]
[1] OPC 8 DPC 4097 SLS 12
[1] [ 01 10 02 c0 ]
[1] CIC: 12
[1] [ 0c 00 ]
[1] Message Type: REL(0x0c)
[1] [ 0c ]
[1] --VARIABLE LENGTH PARMS[1]--
[1] Cause Indicator:
[1] Coding Standard: 0
[1] Location: 1
[1] Cause Class: 1
[1] Cause Subclass: 0
[1] Cause: Normal call clearing (16)
[1] [ 02 81 90 ]
[1] 

But, the remote party also decided to hang up, and our REL just crossed
their SUS going back (please look at BSN and compare with our FSN, they
don't know about our REL yet).

[1] Len = 13 [ c0 bd 0a 85 08 40 00 c4 0c 00 0d 01 00 ]
[1] FSN: 61 FIB 1
[1] BSN: 64 BIB 1
[1] [4097:0] MSU
[1] [ c0 bd 0a ]
[1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
[1] [ 85 ]
[1] OPC 4097 DPC 8 SLS 12
[1] [ 08 40 00 c4 ]
[1] CIC: 12
[1] [ 0c 00 ]
[1] Message Type: SUS(0x0d)
[1] [ 0d ]
[1] --FIXED LENGTH PARMS[1]--
[1] Suspend/Resume Indicators:
[1] SUS/RES indicator: Network initiated (1)
[1] [ 01 ]
[1] 
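
The crossing of REL and SUS can be checked from the raw bytes of the two MSUs
above. The following Python sketch decodes the MTP2 header and the ITU routing
label (14-bit point codes, 4-bit SLS, 12-bit CIC); decode_msu is an
illustrative helper, not part of libss7, and it assumes ITU-style ISUP MSUs
like the ones in this trace.

```python
def decode_msu(hex_bytes):
    """Decode MTP2 header, SIO, ITU routing label, CIC and message type."""
    b = bytes.fromhex(hex_bytes)
    label = int.from_bytes(b[4:8], "little")  # routing label, little endian
    return {
        "bsn": b[0] & 0x7F, "bib": b[0] >> 7,
        "fsn": b[1] & 0x7F, "fib": b[1] >> 7,
        "li": b[2] & 0x3F,
        "service": b[3] & 0x0F,   # 5 = ISUP
        "ni": b[3] >> 6,          # network indicator
        "dpc": label & 0x3FFF,
        "opc": (label >> 14) & 0x3FFF,
        "sls": label >> 28,
        "cic": int.from_bytes(b[8:10], "little") & 0x0FFF,
        "msg_type": b[10],
    }


# Bytes copied from the two "Len =" lines of the trace.
REL = decode_msu("bc c3 0d 85 01 10 02 c0 0c 00 0c 02 00 02 81 90")
SUS = decode_msu("c0 bd 0a 85 08 40 00 c4 0c 00 0d 01 00")

# Sequence numbers are modulo 128: how many of our MSUs, up to and
# including the REL, had the peer not acknowledged when it sent SUS?
outstanding = (REL["fsn"] - SUS["bsn"]) % 128

if __name__ == "__main__":
    print(REL)
    print(SUS)
    print("unacknowledged MSUs when SUS was sent:", outstanding)
```

The REL went out with FSN 67, while the SUS still carries BSN 64, so our MSUs
with FSN 65 to 67 were not yet acknowledged when the peer sent its SUS.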

And what happens now is a clear BUG in libss7: As RLC
has not been received yet, the call must still be considered active!
But we have already forgotten it and now we are surprised that we got an MSU
about it.

[1] Got SUS but no call on CIC 12 PC 4097 reseting the cic

The situation is getting complicated, we are sending RSC.

[1] ISUP timer t1 stopped on CIC 12 DPC: 4097
[1] ISUP timer t5 stopped on CIC 12 DPC: 4097
[1] ISUP timer t17 (30ms) started on CIC 12 DPC 4097
[1] Len = 11 [ bd c4 08 85 01 10 02 c0 0c 00 12 ]
[1] FSN: 68 FIB 1
[1] BSN: 61 BIB 1
[1] [4097:0] MSU
[1] [ bd c4 08 ]
[1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
[1] [ 85 ]
[1] OPC 8 DPC 4097 SLS 12
[1] [ 01 10 02 c0 ]
[1] CIC: 12
[1] [ 0c 00 ]
[1] Message Type: RSC(0x12)
[1] [ 12 ]

And we get an RLC. IMHO it is an RLC confirming our REL, not our
RSC (according to BSN, the peer has already received all our MSUs,
but they probably already had the RLC queued, so they sent it).

[1] 
[1] Len = 12 [ c4 be 09 85 08 40 00 c4 0c 00 10 00 ]
[1] FSN: 62 FIB 1
[1] BSN: 68 BIB 1
[1] [4097:0] MSU
[1] [ c4 be 09 ]
[1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
[1] [ 85 ]
[1] OPC 4097 DPC 8 SLS 12
[1] [ 08 40 00 c4 ]
[1] CIC: 12
[1] [ 0c 00 ]
[1] Message Type: RLC(0x10)
[1] [ 10 ]
[1] 
[1] ISUP timer t17 stopped on CIC 12 DPC: 4097
Linkset 1: Processing event: ISUP_EVENT_RLC

And now, we get a second RLC, probably to our RSC. There is a jump
in FSN because there was an MSU sent from them, which was not
related to our call.

[1] Len = 12 [ c4 c0 09 85 08 40 00 c4 0c 00 10 00 ]
[1] FSN: 64 FIB 1
[1] BSN: 68 BIB 1
[1] [4097:0] MSU
[1] [ c4 c0 09 ]
[1] Network Indicator: 2 Priority: 0 User Part: ISUP (5)
[1] [ 85 ]
[1] OPC 4097 DPC 8 SLS 12
[1] [ 08 40 00 c4 ]
[1] CIC: 12
[1] [ 0c 00 ]
[1] Message Type: RLC(0x10)
[1] [ 10 ]
[1] 

And this RLC seems unsolicited to us, because we were taking the
first RLC as a response to our RSC, which was not the case.

[1] Got RLC but we didn't send REL/RSC on CIC 12 PC 4097 

So, no MSUs were received from other linksets, and everything fits
together perfectly...

This trace is a clear demonstration of an existing bug in libss7, which
may be formulated as follows: When we are terminating the call and sending
REL to the remote party, we must keep the record of the connection and 
silently accept and absorb all MSUs, which may come back, until we receive
a RLC or T5 expires.
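
The rule formulated above can be sketched as a tiny state model. This is a
Python illustration only, not libss7 code: the class, its flags and the return
strings are hypothetical names chosen for this example.

```python
class Circuit:
    """Toy model of one CIC that remembers a sent REL until RLC or T5."""

    def __init__(self):
        self.call_active = False
        self.sent_rel = False
        self.sent_rsc = False
        self.outbox = []  # messages we would send to the peer

    def start_call(self):
        self.call_active = True

    def release(self):
        # Keep the call record; only note that REL is outstanding.
        self.sent_rel = True
        self.outbox.append("REL")

    def _reset(self):
        self.call_active = False
        self.sent_rel = False
        self.sent_rsc = True
        self.outbox.append("RSC")

    def on_t5_expiry(self):
        if self.sent_rel:
            self._reset()

    def receive(self, msg):
        if msg == "RLC":
            if self.sent_rel or self.sent_rsc:
                self.call_active = False
                self.sent_rel = self.sent_rsc = False
                return "released"
            self._reset()
            return "unexpected"
        if self.sent_rel:
            return "absorbed"  # crossed our REL: accept silently
        if not self.call_active:
            self._reset()
            return "unexpected"
        return "handled"


if __name__ == "__main__":
    c = Circuit()
    c.start_call()
    c.release()
    print(c.receive("SUS"), c.receive("RLC"), c.outbox)
```

With this behaviour the crossed SUS from the trace is absorbed, the single RLC
completes the release, and no RSC is ever sent for the circuit.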

What do you think about it ?

With regards,
  Pavel


Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1

2013-06-25 Thread Marcelo Pacheco
What code are you using ?
Is this not stock libss7 ? Stock libss7 can't decode ISUP SUS/RES like that.

In my code, I explicitly ignore ALL SUS / RES, they have no needed
processing associated with Brazilian ISUP.

Asterisk and kernel dahdi version ?

If you enable dahdi_pcap:
# dahdi_pcap -c 16 -f /tmp/mycap.ss7
Capturing protocol mtp2 on channels 16 to file /tmp/mycap.ss7
Packets captured: 7

Then you can analyze the capture in wireshark / ethereal.
But it has one bug: if you shut down the owner of the link while
dahdi_pcap is running, the system will reset on its own.
As long as you don't leave dahdi_pcap running around, it's not a problem.




Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1

2013-06-25 Thread Johann Steinwendtner

On 2013-06-25 14:56, Marcelo Pacheco wrote:

What code are you using ?
Is this not stock libss7 ? Stock libss7 can't decode ISUP SUS/RES like that.

In my code, I explicitly ignore ALL SUS / RES, they have no needed
processing associated with Brazilian ISUP.

Asterisk and kernel dahdi version ?

If you enable dahdi_pcap:
# dahdi_pcap -c 16 -f /tmp/mycap.ss7
Capturing protocol mtp2 on channels 16 to file /tmp/mycap.ss7
Packets captured: 7

Then you can analyze the capture in wireshark / ethereal.
But it has one bug: if you shut down the owner of the link while
dahdi_pcap is running, the system will reset on its own.
As long as you don't leave dahdi_pcap running around, it's not a problem.



Which hardware for your E1s are you guys using? I have seen this nasty
behaviour on Sangoma cards only.

Where to report problems for this KNK SS7-27 branch? Jira or this list?

Thanks Hans


--
_
-- Bandwidth and Colocation Provided by http://www.api-digital.com --

asterisk-ss7 mailing list
To UNSUBSCRIBE or update options visit:
  http://lists.digium.com/mailman/listinfo/asterisk-ss7


Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1

2013-06-25 Thread Pavel Troller
Hello Marcelo,

 What code are you using ?
 Is this not stock libss7 ? Stock libss7 can't decode ISUP SUS/RES like that.

No, it's not stock libss7. It's written in the subject, as well as in my first
sentence in the first post. It's a special branch, available for both Asterisk
and libss7 (version 2), which must be applied together. And when I started
playing with it, I was told that I can post my experiences/problems to this ML,
which is what I just did. But you are still the only one who responded to me.

 
 In my code, I explicitly ignore ALL SUS / RES, they have no needed
 processing associated with Brazilian ISUP.

We have SUS/RES in the Czech Republic in the national ISUP spec, so we must
handle it properly. However, it's not just a problem with SUS; this problem
may appear at any time when the A-side clears down while an MSU from the B-side
(any ordinary MSU like ACM, ANM, CON, CPG...) is already underway.

 
 Asterisk and kernel dahdi version ?

Asterisk 11 branch, and the DAHDI kernel drivers in the last state available
from SVN (they have now moved to git and I still haven't adapted my working
copy, as I also have many private patches in it and it will be a pain to
incorporate them into my local git repo).

 
 If you enable dahdi_pcap:
 # dahdi_pcap -c 16 -f /tmp/mycap.ss7
 Capturing protocol mtp2 on channels 16 to file /tmp/mycap.ss7
 Packets captured: 7
 
 Then you can analyze the capture in wireshark / ethereal.
 But it has one bug: if you shut down the owner of the link while
 dahdi_pcap is running, the system will reset on its own.
 As long as you don't leave dahdi_pcap running around, it's not a problem.

A good hint, I really didn't know about it! Thanks, I will use it (with care
to prevent system crash :-) ).

With regards,
  Pavel


 
 
 

Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1

2013-06-25 Thread Pavel Troller
Hi Hans!

 On 2013-06-25 14:56, Marcelo Pacheco wrote:
 What code are you using ?
 Is this not stock libss7 ? Stock libss7 can't decode ISUP SUS/RES like that.

 In my code, I explicitly ignore ALL SUS / RES, they have no needed
 processing associated with Brazilian ISUP.

 Asterisk and kernel dahdi version ?

 If you enable dahdi_pcap:
 # dahdi_pcap -c 16 -f /tmp/mycap.ss7
 Capturing protocol mtp2 on channels 16 to file /tmp/mycap.ss7
 Packets captured: 7

 Then you can analyze the capture in wireshark / ethereal.
 But it has one bug: if you shut down the owner of the link while
 dahdi_pcap is running, the system will reset on its own.
 As long as you don't leave dahdi_pcap running around, it's not a problem.


 Which hardware for your E1s are you guys using? I have seen this nasty
 behaviour on Sangoma cards only.

On the older server, we are using Sangoma cards with wanpipe drivers. But we
are in the process of abandoning them and the new server is using Digium TE820P
cards (our target is 4 per server, i.e. 32 E1s, but now we run on two).


 Where to report problems for this KNK SS7-27 branch? Jira or this list?

It's exactly the question. I was told (from somebody who seemed familiar with
this branch and who sent me a direct link to the patchset), that I have to post
my experiences here in this list.


 Thanks Hans

You're welcome! With regards, Pavel




Re: [asterisk-ss7] KNK SS7-27 - first experiences - part 1

2013-06-25 Thread Nick Khamis
Hello Gentlemen,

I find this post very interesting. I have set up many traditional T1s
with my father here in Canada (I am 17), and I am looking to experiment
with SS7 signaling. I have been reading up on it quite a bit and have
some questions. I see you guys are using T1 cards for the
interconnect. Is this after the link has been muxed? Or is the
signaling coming in on T1/E1s? Sorry if this is a stupid question.
More specifically, is the SS7 interconnect with the CO done using
grouped PRI trunks mapped in a TE1/3 transport layer, or A-Links, or
STM-1?

I really want to experiment with an interconnect using the SS7
signaling to help bring to light what I have been studying. What is
the minimum I would need to ask a service provider for in terms of
service (i.e., T1 with SS7 signaling?), and hardware.

Last question, I promise! It seems that this setup
(Asterisk + libss7 + Digium cards) acts as a media gateway? This will be really
cool once stable!

Kind Regards,

Nick.
