Hi

We are experiencing a rather odd intermittent issue with Kannel. We have two
different boxes, each with a Kannel installation. Every now and then one of
the boxes stops processing its SMS queues and the queues just build up. This
happens to both boxes (just not at the same time). When we look at the
status page we can see the queue, and there are SMS queued to the bearerbox.
I assume that it is the bearerbox queue. It looks as follows (from the
status page):

SMS: received 123 (0 queued), sent 123 (456 queued), store size -1
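
For monitoring purposes, here is a minimal sketch of pulling the counters out
of that line. It assumes the plain-text format shown above; the function name
and dictionary keys are made up for illustration and are not part of Kannel.

```python
import re

# Matches the "SMS:" summary line as it appears on the status page above.
STATUS_RE = re.compile(
    r"SMS: received (?P<received>\d+) \((?P<received_queued>\d+) queued\), "
    r"sent (?P<sent>\d+) \((?P<sent_queued>\d+) queued\), "
    r"store size (?P<store_size>-?\d+)"
)


def parse_sms_status(line):
    """Return the counters from an 'SMS:' status line as ints, or None."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


status = parse_sms_status(
    "SMS: received 123 (0 queued), sent 123 (456 queued), store size -1"
)
# The outgoing backlog we are worried about is status["sent_queued"].
print(status["sent_queued"])
```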

It is the 456 queued part that we are concerned about. All the binds report
as being online with 0 in their queues, but that queue of 456 does not
disappear. If I restart bind after bind, one of them usually does the trick
and the queue disappears. The problem is that we usually have no idea which
bind it is, since they all report as being online. Looking through the logs
from our upstream applications, I have noticed that there appears to have
been a network outage at around the same time. I have not yet confirmed this
with the hosting company. Also, this is what appears in the syslog:

Sep  6 23:02:46 123-123-123-123 avahi-daemon[17943]: Received response from
host 64.150.181.120 with invalid source port 53756 on interface 'eth0.0'
Sep  6 23:17:01 123-123-123-123 CRON[16934]: (root) CMD (   cd / &&
run-parts --report /etc/cron.hourly)
Sep  6 23:32:46 123-123-123-123 avahi-daemon[17943]: Received response from
host 64.150.181.120 with invalid source port 33895 on interface 'eth0.0'
Sep  7 00:02:45 123-123-123-123 avahi-daemon[17943]: Received response from
host 64.150.181.120 with invalid source port 55945 on interface 'eth0.0'
Sep  7 00:17:01 123-123-123-123 CRON[17231]: (root) CMD (   cd / &&
run-parts --report /etc/cron.hourly)
Sep  7 00:32:45 123-123-123-123 avahi-daemon[17943]: Received response from
host 64.150.181.120 with invalid source port 45291 on interface 'eth0.0'
Sep  7 01:02:45 123-123-123-123 avahi-daemon[17943]: Received response from
host 64.150.181.120 with invalid source port 39067 on interface 'eth0.0'
Sep  7 01:17:01 123-123-123-123 CRON[17479]: (root) CMD (   cd / &&
run-parts --report /etc/cron.hourly)

That IP address is not in our kannel.conf file. I am not sure what these
errors are about, and I might need to investigate this further. I am not a
security expert, so I have no idea whether this is malicious or not.

This is what appears in the bearerbox logs at about the same time as the
outage:

2010-09-06 23:02:46 [32641] [12] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032503580) for PDU type (deliver_sm) received!
2010-09-06 23:03:07 [32641] [12] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032605180) for PDU type (deliver_sm) received!
2010-09-06 23:08:04 [32641] [10] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032113180) for PDU type (deliver_sm) received!
2010-09-06 23:14:32 [32641] [9] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032711480) for PDU type (deliver_sm) received!
2010-09-06 23:26:12 [32641] [6] ERROR: SMPP[AAA]: I/O error or other error.
Re-connecting.
2010-09-06 23:26:12 [32641] [6] ERROR: SMPP[AAA]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-06 23:26:12 [32641] [18] WARNING: SMPP[BBB]: Not ACKED message
found, will retransmit. SENT<94>sec. ago, SEQ<423861>, DST<+xxxxxxxxxxxxx>
2010-09-06 23:26:12 [32641] [18] WARNING: SMPP[BBB]: Not ACKED message
found, will retransmit. SENT<94>sec. ago, SEQ<423862>, DST<+xxxxxxxxxxxxx>
2010-09-06 23:26:12 [32641] [18] ERROR: SMPP[BBB]: I/O error or other error.
Re-connecting.
2010-09-06 23:26:12 [32641] [18] ERROR: SMPP[BBB]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-06 23:26:12 [32641] [24] WARNING: Time-out waiting for concatenated
message ''+xxxxxxxxxxxxx>' '+yyyyyyyyyyy' 'EEE' '10' '2' '''. Send message
parts as is.
2010-09-06 23:26:12 [32641] [24] WARNING: Time-out waiting for concatenated
message ''+xxxxxxxxxxxxx>' '+yyyyyyyyyyy' 'EEE' '85' '2' '''. Send message
parts as is.
2010-09-06 23:26:12 [32641] [24] WARNING: Time-out waiting for concatenated
message ''+xxxxxxxxxxxxx>' '+yyyyyyyyyyy' 'EEE' '152' '2' '''. Send message
parts as is.
2010-09-06 23:26:42 [32641] [17] ERROR: SMPP[CCC]: I/O error or other error.
Re-connecting.
2010-09-06 23:26:42 [32641] [17] ERROR: SMPP[CCC]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-06 23:27:08 [32641] [13] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032035180) for PDU type (deliver_sm) received!
2010-09-06 23:27:12 [32641] [19] ERROR: SMPP[BBB]: I/O error or other error.
Re-connecting.
2010-09-06 23:27:12 [32641] [19] ERROR: SMPP[BBB]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-06 23:27:14 [32641] [12] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032031280) for PDU type (deliver_sm) received!
2010-09-06 23:27:25 [32641] [16] ERROR: SMPP[CCC]: I/O error or other error.
Re-connecting.
2010-09-06 23:27:25 [32641] [16] ERROR: SMPP[CCC]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-06 23:28:11 [32641] [6] WARNING: SMPP: Unknown
TLV(0x140e,0x000c,323738373735303030303200) for PDU type (deliver_sm)
received!
2010-09-06 23:55:18 [32641] [8] ERROR: SMPP[DDD]: I/O error or other error.
Re-connecting.
2010-09-06 23:55:18 [32641] [8] ERROR: SMPP[DDD]: Couldn't connect to SMS
center (retrying in 10 seconds).
2010-09-06 23:55:21 [32641] [11] WARNING: SMPP: Unknown
TLV(0x1406,0x0007,01906032858280) for PDU type (deliver_sm) received!

I am looking for any guidance on where to start in resolving this issue. Any
comments will also be most welcome. In general I would like to know:


   1. Is there any way that I can see what is in that queue of 456? I would
   like to know which bind is down. I thought store-status might be it, but
   it does not appear to be.
   2. What could be causing this issue? (If you suspect that it is something
   to do with configuration, I will post the configuration file.)
   3. I notice some unknown TLV warnings. Is this something we should be
   concerned about?
   4. It seems that there was some sort of network problem and all the
   connections (to different SMSCs) disconnected and reconnected. Why does
   the queue not disappear after they reconnect?

I greatly appreciate your time and effort. Thanks

Regards,
