Fwd: Weird binding issue that causes queues to build up.

2010-09-23 Thread brett skinner
Hi guys

We have just hit this issue AGAIN this morning. Can ANYONE please give some
guidance here? I have had zero response on this critical issue for over two
weeks now. Can someone please help? This issue is becoming increasingly
urgent.

Regards,
Brett

-- Forwarded message --
From: brett skinner tatty.dishcl...@gmail.com
Date: Fri, Sep 17, 2010 at 2:16 PM
Subject: Re: Weird binding issue that causes queues to build up.
To: Users users@kannel.org


Hi

We have experienced this problem again. A couple of our binds to one
particular SMSC (the rest were okay) had connectivity issues last night at
12 AM. The binds were re-established and were reported as online on the
status pages. However, a queue for one of the binds built up on the
bearerbox. Only after I ran stop-smsc and start-smsc for that bind did its
queue start processing again.
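For reference, the stop-smsc/start-smsc cycle above is driven through bearerbox's HTTP admin interface. A minimal sketch of bouncing a single bind; the host, admin port, password and smsc-id here are placeholders, not values from this thread:

```shell
# Placeholders: adjust to the admin-port / admin-password configured in the
# core group of kannel.conf, and the smsc-id of the stuck bind.
ADMIN_URL="http://localhost:13000"
ADMIN_PW="secret"
SMSC_ID="XXX"

stop_url="$ADMIN_URL/stop-smsc?smsc=$SMSC_ID&password=$ADMIN_PW"
start_url="$ADMIN_URL/start-smsc?smsc=$SMSC_ID&password=$ADMIN_PW"

# Print the URLs; uncomment the curl lines to actually bounce the bind.
echo "$stop_url"
echo "$start_url"
# curl -s "$stop_url"
# sleep 2
# curl -s "$start_url"
```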

In the logs at 12 AM we have a series of errors:

2010-09-16 23:59:35 [32641] [44] ERROR: Error reading from fd 57:
2010-09-16 23:59:35 [32641] [44] ERROR: System error 104: Connection reset
by peer
2010-09-16 23:59:35 [32641] [44] ERROR: SMPP[XXX]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-16 23:59:38 [32641] [38] ERROR: Error reading from fd 52:
2010-09-16 23:59:38 [32641] [38] ERROR: System error 104: Connection reset
by peer
2010-09-16 23:59:38 [32641] [38] ERROR: SMPP[YYY]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-16 23:59:39 [32641] [46] ERROR: Error reading from fd 50:
2010-09-16 23:59:39 [32641] [46] ERROR: System error 104: Connection reset
by peer
2010-09-16 23:59:39 [32641] [46] ERROR: SMPP[AAA]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-16 23:59:47 [32641] [39] ERROR: Error reading from fd 49:
2010-09-16 23:59:47 [32641] [39] ERROR: System error 104: Connection reset
by peer
2010-09-16 23:59:47 [32641] [39] ERROR: SMPP[YYY]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-16 23:59:51 [32641] [48] ERROR: Error reading from fd 61:
2010-09-16 23:59:51 [32641] [48] ERROR: System error 104: Connection reset
by peer
2010-09-16 23:59:51 [32641] [48] ERROR: SMPP[BBB]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-17 00:00:00 [32641] [47] ERROR: Error reading from fd 40:
2010-09-17 00:00:00 [32641] [47] ERROR: System error 104: Connection reset
by peer
2010-09-17 00:00:00 [32641] [47] ERROR: SMPP[XXX]: Couldn't connect to SMS
center (retrying in 10 seconds).


I am not sure how Kannel works internally, but it is almost as if, when the
bind is re-established, the old one is disposed of and a new one is created,
while the queue and its pointers still reference the old one and have not
been updated. This results in messages sitting in the queue and never being
routed to the bind, even though it reports as online.

I see that there might have been similar issues in the past:
http://www.kannel.org/pipermail/users/2009-May/007166.html. It might be
related, maybe not.

We have already set our binds up as a separate transmitter and receiver. We
are not running a transceiver.

Regards,


On Thu, Sep 9, 2010 at 3:42 PM, brett skinner tatty.dishcl...@gmail.com wrote:

 Thanks Alvaro for your response.

 I am running a build from SVN from about two weeks ago. I am a bit wary of
 turning the loggers to debug mode, because we are doing a lot of traffic and
 debug mode is so verbose that we would eat through our disk in no time. It
 would be different if it were reproducible, or if we could anticipate the
 problem, because then we could just turn on the loggers at the right time.
 This happens so sporadically that we would have to leave the loggers in
 debug mode. The last time this happened was last week.

 I will go check out that tool you mentioned.

 I am not that interested in the extra TLVs. They were just making a bit of
 noise in our logs :)

 Thanks again for your help.




Weird binding issue that causes queues to build up.

2010-09-23 Thread brett skinner
Hi

We are using a separate transmitter and receiver because there are reportedly
performance issues when using a transceiver, due to both transmitting and
receiving being handled on a single thread. We were advised to use Tx and Rx.

We have contacted the provider, and while they acknowledge that they had an
outage for a couple of seconds, everyone else was able to reconnect without
an issue; it was just us. But this is not limited to them: it seems that for
any bind that dies and comes back, there is a chance that bearerbox will
start queuing.

Do you have any extra information on the work-around?

Regards,


On Thu, Sep 23, 2010 at 2:24 PM, Benaiad bena...@gmail.com wrote:

 Hi Brett,

 Which type of connection are you using? If it's not a transceiver, I
 suggest you use it, if your provider supports it.
 There is a known bug regarding separated connections for Tx & Rx.
 I believe there is another workaround for this: defining two smsc
 groups, one for Tx and the other for Rx.

 Regards

 --
 Benaiad
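Benaiad's two-group workaround can be sketched as a kannel.conf fragment. This is an untested outline with hypothetical host, ports and credentials; transmit-only vs receive-only binds are selected by which port settings are non-zero:

```
# Transmit-only bind (hypothetical values throughout)
group = smsc
smsc = smpp
smsc-id = provider-tx
host = smsc.example.com
port = 2775
receive-port = 0
transceiver-mode = false
smsc-username = user
smsc-password = pass

# Receive-only bind to the same SMSC
group = smsc
smsc = smpp
smsc-id = provider-rx
host = smsc.example.com
port = 0
receive-port = 2775
transceiver-mode = false
smsc-username = user
smsc-password = pass
```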




Re: Weird binding issue that causes queues to build up.

2010-09-23 Thread Benaiad
Hi Brett,

I've found the workaround, which was offered by Mr. Donald Jackson.
You may find it at
http://www.mail-archive.com/users@kannel.org/msg15958.html

Regards
--
Abdulmnem Benaiad
Almontaha CTO
www.almontaha.ly
Tripoli-Libya




Re: Weird binding issue that causes queues to build up.

2010-09-23 Thread brett skinner
Thank you VERY MUCH for your help. We will give that a try.



Fwd: Weird binding issue that causes queues to build up.

2010-09-09 Thread brett skinner
Hi everyone

Just wondering if anyone has had a chance to look at this yet?

Thanks and appreciate any help.

-- Forwarded message --
From: brett skinner tatty.dishcl...@gmail.com
Date: Tue, Sep 7, 2010 at 10:47 AM
Subject: Weird binding issue that causes queues to build up.
To: Users users@kannel.org


Hi

We are experiencing a rather weird, occasional issue with Kannel. We have two
different boxes, each with a Kannel installation. Every now and then, one of
the boxes stops processing SMS queues and the queues just build up. This
happens to both boxes (just not at the same time). When we have a look at the
status page we can see the queue: there are SMS queued to the bearerbox.
I assume that it is the bearerbox queue. It looks as follows (from the
status page):

SMS: received 123 (0 queued), sent 123 (456 queued), store size -1
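For monitoring, that queued figure can be scraped from the plain-text status page (typically fetched with curl from the configured admin port, e.g. /status.txt). A sketch against a captured sample line rather than a live bearerbox:

```shell
# Sample status line as captured above; a live setup would fetch it, e.g.:
#   status=$(curl -s "http://localhost:13000/status.txt" | grep 'SMS: received')
status='SMS: received 123 (0 queued), sent 123 (456 queued), store size -1'

# Extract the outbound ("sent") queue depth from the line.
queued=$(printf '%s\n' "$status" | sed -n 's/.*sent [0-9]* (\([0-9]*\) queued).*/\1/p')
echo "$queued"   # 456
```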

It is the 456 queued part that we are concerned about. All the binds report
as being online with 0 in their queues, but that 456 queue does not
disappear. If I sit restarting bind after bind, one of them usually does the
trick and the queue disappears. The problem is we usually have no idea which
bind it is, and they are all reporting as online. Looking through logs from
upstream applications, it appears that there was a network outage at around
the same time. I have not yet confirmed this with the hosting company. Also,
this is what appears in the syslog:

Sep  6 23:02:46 123-123-123-123 avahi-daemon[17943]: Received response from
host 64.150.181.120 with invalid source port 53756 on interface 'eth0.0'
Sep  6 23:17:01 123-123-123-123 CRON[16934]: (root) CMD (   cd / &&
run-parts --report /etc/cron.hourly)
Sep  6 23:32:46 123-123-123-123 avahi-daemon[17943]: Received response from
host 64.150.181.120 with invalid source port 33895 on interface 'eth0.0'
Sep  7 00:02:45 123-123-123-123 avahi-daemon[17943]: Received response from
host 64.150.181.120 with invalid source port 55945 on interface 'eth0.0'
Sep  7 00:17:01 123-123-123-123 CRON[17231]: (root) CMD (   cd / &&
run-parts --report /etc/cron.hourly)
Sep  7 00:32:45 123-123-123-123 avahi-daemon[17943]: Received response from
host 64.150.181.120 with invalid source port 45291 on interface 'eth0.0'
Sep  7 01:02:45 123-123-123-123 avahi-daemon[17943]: Received response from
host 64.150.181.120 with invalid source port 39067 on interface 'eth0.0'
Sep  7 01:17:01 123-123-123-123 CRON[17479]: (root) CMD (   cd / &&
run-parts --report /etc/cron.hourly)

That IP address is not in our kannel.conf file. I am not sure what these
errors are about; I might need to investigate further. I am not a
security expert, so I have no idea whether this is malicious or not.

This is what appears in the bearerbox logs at about the same time as the
outage:

2010-09-06 23:02:46 [32641] [12] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032503580) for PDU type (deliver_sm) received!
2010-09-06 23:03:07 [32641] [12] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032605180) for PDU type (deliver_sm) received!
2010-09-06 23:08:04 [32641] [10] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032113180) for PDU type (deliver_sm) received!
2010-09-06 23:14:32 [32641] [9] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032711480) for PDU type (deliver_sm) received!
2010-09-06 23:26:12 [32641] [6] ERROR: SMPP[AAA]: I/O error or other error.
Re-connecting.
2010-09-06 23:26:12 [32641] [6] ERROR: SMPP[AAA]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-06 23:26:12 [32641] [18] WARNING: SMPP[BBB]: Not ACKED message
found, will retransmit. SENT<94 sec. ago>, SEQ<423861>, DST<+x>
2010-09-06 23:26:12 [32641] [18] WARNING: SMPP[BBB]: Not ACKED message
found, will retransmit. SENT<94 sec. ago>, SEQ<423862>, DST<+x>
2010-09-06 23:26:12 [32641] [18] ERROR: SMPP[BBB]: I/O error or other error.
Re-connecting.
2010-09-06 23:26:12 [32641] [18] ERROR: SMPP[BBB]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-06 23:26:12 [32641] [24] WARNING: Time-out waiting for concatenated
message ''+x' '+yyy' 'EEE' '10' '2' '''. Send message
parts as is.
2010-09-06 23:26:12 [32641] [24] WARNING: Time-out waiting for concatenated
message ''+x' '+yyy' 'EEE' '85' '2' '''. Send message
parts as is.
2010-09-06 23:26:12 [32641] [24] WARNING: Time-out waiting for concatenated
message ''+x' '+yyy' 'EEE' '152' '2' '''. Send message
parts as is.
2010-09-06 23:26:42 [32641] [17] ERROR: SMPP[CCC]: I/O error or other error.
Re-connecting.
2010-09-06 23:26:42 [32641] [17] ERROR: SMPP[CCC]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-06 23:27:08 [32641] [13] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032035180) for PDU type (deliver_sm) received!
2010-09-06 23:27:12 [32641] [19] ERROR: SMPP[BBB]: I/O error or other error.
Re-connecting.
2010-09-06 23:27:12 [32641] [19] ERROR: SMPP[BBB]: Couldn't connect to SMS
center (retrying in 10 seconds).

Re: Weird binding issue that causes queues to build up.

2010-09-09 Thread Alvaro Cornejo
Have you checked what the system logs show in debug mode?

Regarding the queue, there is a tool created by Alejandro Guerreri
that allows you to view the queue content and delete messages... well,
Kannel does have several queues, so I don't know if it works for
the one you mention. I don't remember the details, but you can check
his blog: http://www.blogalex.com/archives/72

About the TLVs you are receiving, you should ask your provider
what they mean and what info they are sending. If it's of
interest, you can configure meta-data to capture that info;
otherwise you can safely ignore them. As the PDU type is deliver_sm, I
suspect that it might be the DLR status... and that is why you have
that queue.
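If capturing those TLVs were ever wanted, recent Kannel releases let unknown optional parameters be declared via an smpp-tlv group. A hypothetical sketch matching the tag (0x1406) and length (7) seen in the logs; the name is invented:

```
group = smpp-tlv
name = x-provider-tlv
tag = 0x1406
type = octetstring
length = 7
```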

Also, if you upgrade to a recent version: the status page was improved
and now shows separate counters for MT and DLRs; in older versions the
MT/DLR counters were mixed.


Hope it helps

Alvaro



On Thu, Sep 9, 2010 at 2:42 AM, brett skinner tatty.dishcl...@gmail.com wrote:
 Hi everyone
 Just wondering if anyone has had a chance to look at this yet?
 Thanks and appreciate any help.


Fwd: Weird binding issue that causes queues to build up.

2010-09-09 Thread brett skinner
Thanks Alvaro for your response.

I am running a build from SVN from about two weeks ago. I am a bit wary
of turning the loggers to debug mode because we are doing a lot of
traffic, and debug mode is so verbose that we would eat through our disk
in no time. It would be different if the problem were reproducible or if
we could anticipate it, because then we could just turn on the loggers at
the right time. This happens so sporadically that we would have to leave
the loggers in debug mode permanently. The last time this happened was
last week.
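If disk space is the main obstacle to running in debug mode, one
workaround is aggressive size-based log rotation. A sketch, assuming a
logrotate setup; the log path and process name are placeholders for your
installation (bearerbox re-opens its log files on SIGHUP):

```
# Hypothetical /etc/logrotate.d/kannel -- adjust paths to your install.
/var/log/kannel/bearerbox.log {
    size 500M        # rotate as soon as the file exceeds 500 MB
    rotate 10        # keep at most 10 rotated files
    compress
    missingok
    notifempty
    postrotate
        # Kannel re-opens its log files on SIGHUP
        killall -HUP bearerbox
    endscript
}
```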

I will go check out that tool you mentioned.

I am not that interested in the extra TLVs. They were just making a bit of
noise in our logs :)

Thanks again for your help.




Weird binding issue that causes queues to build up.

2010-09-07 Thread brett skinner
Hi

We are experiencing a rather weird occasional issue with Kannel. We have two
different boxes, each with a Kannel installation. Every now and then one of
the boxes stops processing SMS queues and the queues just build up. This
happens to both boxes (just not at the same time). When we have a look at the
status page we can see the queue, and there are SMS queued to the bearerbox.
I assume that it is the bearerbox queue. It looks as follows (from the
status page):

SMS: received 123 (0 queued), sent 123 (456 queued), store size -1

It is the 456 queued part that we are concerned about. All the binds report
as being online with 0 in their queues, but that 456 queue does not
disappear. If I sit restarting bind after bind, one of them usually does the
trick and the queue disappears. The problem is we usually have no idea which
bind it is, and they are all reporting as being online. Looking through our
logs from upstream applications, it appears there was a network outage at
round about the same time. I have not yet confirmed this with the hosting
company. Also, this is what appears in the syslog:

Sep  6 23:02:46 123-123-123-123 avahi-daemon[17943]: Received response from
host 64.150.181.120 with invalid source port 53756 on interface 'eth0.0'
Sep  6 23:17:01 123-123-123-123 CRON[16934]: (root) CMD (   cd / &&
run-parts --report /etc/cron.hourly)
Sep  6 23:32:46 123-123-123-123 avahi-daemon[17943]: Received response from
host 64.150.181.120 with invalid source port 33895 on interface 'eth0.0'
Sep  7 00:02:45 123-123-123-123 avahi-daemon[17943]: Received response from
host 64.150.181.120 with invalid source port 55945 on interface 'eth0.0'
Sep  7 00:17:01 123-123-123-123 CRON[17231]: (root) CMD (   cd / &&
run-parts --report /etc/cron.hourly)
Sep  7 00:32:45 123-123-123-123 avahi-daemon[17943]: Received response from
host 64.150.181.120 with invalid source port 45291 on interface 'eth0.0'
Sep  7 01:02:45 123-123-123-123 avahi-daemon[17943]: Received response from
host 64.150.181.120 with invalid source port 39067 on interface 'eth0.0'
Sep  7 01:17:01 123-123-123-123 CRON[17479]: (root) CMD (   cd / &&
run-parts --report /etc/cron.hourly)

That IP address is not in our kannel.conf file. I am not sure what these
messages are about and might need to investigate further. I am no
security expert, so I have no idea if this is malicious or not.

This is what appears in the bearerbox logs at about the same time as the
outage:

2010-09-06 23:02:46 [32641] [12] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032503580) for PDU type (deliver_sm) received!
2010-09-06 23:03:07 [32641] [12] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032605180) for PDU type (deliver_sm) received!
2010-09-06 23:08:04 [32641] [10] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032113180) for PDU type (deliver_sm) received!
2010-09-06 23:14:32 [32641] [9] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032711480) for PDU type (deliver_sm) received!
2010-09-06 23:26:12 [32641] [6] ERROR: SMPP[AAA]: I/O error or other error.
Re-connecting.
2010-09-06 23:26:12 [32641] [6] ERROR: SMPP[AAA]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-06 23:26:12 [32641] [18] WARNING: SMPP[BBB]: Not ACKED message
found, will retransmit. SENT94sec. ago, SEQ423861, DST+x
2010-09-06 23:26:12 [32641] [18] WARNING: SMPP[BBB]: Not ACKED message
found, will retransmit. SENT94sec. ago, SEQ423862, DST+x
2010-09-06 23:26:12 [32641] [18] ERROR: SMPP[BBB]: I/O error or other error.
Re-connecting.
2010-09-06 23:26:12 [32641] [18] ERROR: SMPP[BBB]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-06 23:26:12 [32641] [24] WARNING: Time-out waiting for concatenated
message ''+x' '+yyy' 'EEE' '10' '2' '''. Send message
parts as is.
2010-09-06 23:26:12 [32641] [24] WARNING: Time-out waiting for concatenated
message ''+x' '+yyy' 'EEE' '85' '2' '''. Send message
parts as is.
2010-09-06 23:26:12 [32641] [24] WARNING: Time-out waiting for concatenated
message ''+x' '+yyy' 'EEE' '152' '2' '''. Send message
parts as is.
2010-09-06 23:26:42 [32641] [17] ERROR: SMPP[CCC]: I/O error or other error.
Re-connecting.
2010-09-06 23:26:42 [32641] [17] ERROR: SMPP[CCC]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-06 23:27:08 [32641] [13] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032035180) for PDU type (deliver_sm) received!
2010-09-06 23:27:12 [32641] [19] ERROR: SMPP[BBB]: I/O error or other error.
Re-connecting.
2010-09-06 23:27:12 [32641] [19] ERROR: SMPP[BBB]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-06 23:27:14 [32641] [12] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032031280) for PDU type (deliver_sm) received!
2010-09-06 23:27:25 [32641] [16] ERROR: SMPP[CCC]: I/O error or other error.
Re-connecting.
2010-09-06 23:27:25 [32641] [16] ERROR: SMPP[CCC]: Couldn't connect to SMS
center (retrying in 10 seconds).
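To catch the stuck-queue condition without watching the status page by
hand, the "SMS:" line quoted above can be parsed by a short script. A
monitoring sketch (Python assumed; the status-line format is the one
shown in this thread, and the alert threshold is arbitrary):

```python
import re

# Matches the bearerbox status line, e.g. as fetched from the admin
# interface at http://host:13000/status.txt (default admin port).
STATUS_RE = re.compile(
    r"SMS: received (\d+) \((\d+) queued\), sent (\d+) \((\d+) queued\)"
)

def sent_queue(status_line):
    """Return the 'sent (... queued)' counter -- the one that sticks."""
    m = STATUS_RE.search(status_line)
    if m is None:
        raise ValueError("unrecognised status line: %r" % status_line)
    return int(m.group(4))

line = "SMS: received 123 (0 queued), sent 123 (456 queued), store size -1"
if sent_queue(line) > 100:  # arbitrary alert threshold
    print("bearerbox send queue is stuck at", sent_queue(line))
```

If the counter stays high while every smsc shows 0 queued, the binds can
then be cycled one by one via the admin interface's stop-smsc and
start-smsc commands, which is what eventually clears the queue in this
thread.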