Re: [PATH] convert internal charset to UTF-8

Peter Christensen Thu, 20 Jul 2006 07:15:32 -0700

Hi,

Alexander Malysh wrote:

Hi,
On 20.07.2006 at 12:48, Peter Christensen <[EMAIL PROTECTED]> wrote:
Hi Alex,

Awesome initiative! I've been hoping for this to happen for quite a while.
Thanks!
There are a few issues though:
1. In gwlib/latin1_to_gsm.h, <SP> (space) is replaced with <ESC> (0x1B), and <ESC> is mapped to NRP instead of just <ESC>. (If you follow me)
OK, that was a typo; I changed <SP> to 0x20, but <ESC> should be NRP because it's not representable in GSM.

I see your point. Assuming that Kannel is updated if and when the GSM charset is extended further in the future, the <ESC> really should be NRP, but then again, I've experienced a few gateways which required you to transmit the escape sign yourself for some reason... They probably used the ISO-8859-1 charset and I needed € or whatever, and in such cases charset_utf8_to_gsm wouldn't be called anyway. My thought was primarily in case the GSM charset was changed further. (In short, I can live without the <ESC> :D)
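
To make the <SP>/<ESC> point concrete, here is a minimal sketch of the mapping rule being discussed, written in Python rather than as the actual gwlib table. The tables are only excerpts of the GSM 03.38 alphabet, the name to_gsm is hypothetical, and using '?' as the NRP placeholder is an assumption, since the thread does not say which value NRP has.

```python
# Assumption: '?' (0x3F) stands in for NRP ("non-representable");
# the real value used by gwlib is not stated in this thread.
NRP = 0x3F
GSM_ESC = 0x1B  # escape code that introduces the GSM extension table

# Excerpt of the GSM 03.38 default alphabet (not the full table).
BASIC = {"@": 0x00, "£": 0x01, "$": 0x02, " ": 0x20, "A": 0x41, "a": 0x61}

# Excerpt of the extension table: each entry is sent as ESC + code.
EXTENSION = {"^": 0x14, "{": 0x28, "}": 0x29, "\\": 0x2F,
             "[": 0x3C, "~": 0x3D, "]": 0x3E, "|": 0x40, "€": 0x65}


def to_gsm(text):
    """Map text to a list of GSM codes (hypothetical helper)."""
    out = []
    for ch in text:
        if ch in BASIC:
            out.append(BASIC[ch])
        elif ch in EXTENSION:
            # The converter emits ESC itself for extension characters.
            out.extend([GSM_ESC, EXTENSION[ch]])
        else:
            # Anything else, including a literal ESC (0x1B) in the input,
            # is treated as non-representable.
            out.append(NRP)
    return out


space_codes = to_gsm(" ")      # space stays 0x20
esc_codes = to_gsm("\x1b")     # a raw ESC in the input becomes NRP
euro_codes = to_gsm("A€")      # € is sent as ESC followed by 0x65
```

With this rule a caller never has to send ESC by hand, which is the case argued for above; a gateway that expects a hand-written escape would bypass such a converter entirely.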

2. For some odd reason, smsbox trims the message to 160 characters while it is in UTF-8 format... My usual charset test message, which contains all GSM characters except the Greek ones (wasn't possible before now), looks like this:
Test: @£$¥èéùìòÇ
Øø
Åå_ÆæßÉ!"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà^{}\[~]|€
Which in UTF-8 takes up 163 octets, but only 141 septets in GSM. When transmitting, the € is omitted, and judging from an ngrep of the data transferred from smsbox to bearerbox, it is smsbox which does the trimming. For the record, the string is exactly 160 octets long when the € is omitted.
Apparently it uses the size of the GSM string to determine when to split, but the trimming/splitting is done on the UTF-8 string. Obviously it is sms_split which is to blame, but why is this function used at all if splitting is done in bearerbox (according to comments in the source)? This problem is probably not directly related to the UTF-8 patch.
hmm, strange... I will look in smsbox code if you don't beat me ;)
smsbox checks the maximum number of allowed messages from the config and tries to split the message with sms_split. If there are more parts than allowed, smsbox sends only the allowed count.
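
The suspected mismatch (length judged in GSM septets, cut made on UTF-8 octets) can be reproduced outside Kannel with a small sketch. This is not smsbox or sms_split code; the helper names are hypothetical, and counting every character outside the extension table as one septet is a simplifying assumption. Note that € takes three octets in UTF-8, which fits the 163 versus 160 octets reported above.

```python
# Characters from the GSM 03.38 extension table; each is sent as
# ESC + code and therefore costs two septets.
GSM_EXT = "^{}\\[~]|€"


def gsm_septets(text):
    """Septets needed in GSM (hypothetical helper).

    Assumption: every character outside the extension table costs one
    septet, either as a basic-alphabet code or as a single placeholder.
    """
    return sum(2 if ch in GSM_EXT else 1 for ch in text)


def trim_utf8_octets(text, limit):
    """Cut the UTF-8 encoding at `limit` octets, dropping a partial character."""
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


# 79 two-octet characters followed by a three-octet euro sign.
text = "é" * 79 + "€"

octets = len(text.encode("utf-8"))     # 161 octets in UTF-8
septets = gsm_septets(text)            # 81 septets in GSM, fits in one SMS
trimmed = trim_utf8_octets(text, 160)  # octet-based cut loses the trailing €
```

Here the message is far below 160 septets, yet a 160-octet cut on the UTF-8 form silently drops the final €, the same symptom as with the test message.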


Heh, beating you probably wouldn't solve anything :o)

Actually I would have looked at it myself, if it wasn't because it apparently split the message just to combine the lot again, which seemed kinda silly.

Med venlig hilsen / Best regards

Peter Christensen

Developer
------------------
Cool Systems ApS

Tel: +45 2888 1600
Mai: [EMAIL PROTECTED]
www: www.coolsystems.dk


Alexander Malysh wrote:
Hi all,
at http://www.kannel.org/~amalysh/kannel-utf8.patch is a not so huge patch that converts the internal Kannel charset to UTF-8. Please note that I didn't add smsbox compatibility code, which means smsbox expects the text body to be encoded in UTF-8 by default; MOs will also be forwarded in UTF-8. This can be worked around with the charset cgi variable.
Please test it and send feedback/patches.
I will maintain this patch for a while, as long as we don't decide to commit it to CVS.
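
As a rough illustration of the charset workaround, the sketch below builds a sendsms request whose text is percent-encoded in ISO-8859-1 and declared as such. It assumes the variable meant is the charset parameter of smsbox's sendsms interface; the host, port, credentials and recipient are placeholders, and the helper name sendsms_url is hypothetical.

```python
from urllib.parse import urlencode


def sendsms_url(base, username, password, to, text, charset):
    """Build a sendsms URL (hypothetical helper, placeholder values below)."""
    # Percent-encode the values in the same charset that is declared,
    # so that the receiver can convert the text to its internal UTF-8.
    query = urlencode(
        {
            "username": username,
            "password": password,
            "to": to,
            "text": text,
            "charset": charset,
        },
        encoding=charset,
    )
    return f"{base}/cgi-bin/sendsms?{query}"


# Placeholder host, port, credentials and recipient.
url = sendsms_url("http://localhost:13013", "user", "pass",
                  "4512345678", "Prøve £5", "ISO-8859-1")
```

A client that already sends UTF-8 would simply leave the parameter out, since UTF-8 is the default with this patch.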
--
Thanks,
Alex
