Friends,

Why not use iconv?
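A minimal sketch of what that could look like, assuming a POSIX system whose iconv knows ISO-8859-7 and UTF-8 (glibc does). The helper name `convert_charset` is hypothetical and is not part of the gateway code; it only illustrates the iconv_open / iconv / iconv_close sequence.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from smsbox.c): convert `inlen` bytes of `in`
 * from charset `from` to charset `to`. Writes at most outcap - 1 bytes
 * into `out` and NUL-terminates it.
 * Returns the number of bytes written, or -1 on any failure.
 */
static long convert_charset(const char *from, const char *to,
                            const char *in, size_t inlen,
                            char *out, size_t outcap)
{
    assert(in != NULL && out != NULL && outcap > 0);

    /* Note the argument order: target charset first, source second. */
    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return -1;                      /* charset pair not supported */

    char *inp = (char *)in;             /* iconv() wants char **, input is not modified */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap - 1;        /* keep room for the terminating NUL */

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush any shift state */
    iconv_close(cd);

    if (rc == (size_t)-1)
        return -1;                      /* invalid input, unmappable char, or buffer too small */

    *outp = '\0';
    return (long)(outp - out);
}
```

Called as `convert_charset("ISO-8859-7", "UTF-8", "\xCC", 1, buf, sizeof buf)`, this should turn the single ISO-8859-7 byte for capital Mu into its two-byte UTF-8 form. What happens to characters the target charset cannot represent is implementation-defined, so that case would need checking on each platform.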
I have been experimenting with it as I want character sets that libxml
cannot deal with.

Worik

Bruno David Simões Rodrigues <[EMAIL PROTECTED]> writes:

> On Wed, 2001-11-21 at 17:34, Nektarios K. Papadopoulos wrote:
> Yes, it should be a bug.
> When I coded it, I've asked for help in this part because: 1st I didn't know
> the xml_* functions and
> 2nd: I usually only use iso-8859-1 or ucs-2 directly.
> That's why there's so many debug lines around it.
> Feel free to correct it.
> BTW, there should be a bug somewhere in this code that panics, I've seen it
> once but I don't
> recall what I've done (besides passing some different charsets and codings)
>
> Andreas Fink wrote:
> > >
> > > Index: gw/smsbox.c
> > >> ===================================================================
> > >> RCS file: /home/cvs/gateway/gw/smsbox.c,v
> > >> retrieving revision 1.156
> > >> diff -r1.156 smsbox.c
> > >> 1392,1395d1391
> > >> <     if (charset_processing(charset, &body, coding) == -1) {
> > >> <         *status = 415;
> > >> <         ret = octstr_create("Charset or body misformed, rejected");
> > >> <     }
> > >
> > >votings from the smsbox hackers for the proposed change?! Andreas?
> > >Nick?
> >
> > if its a bug, lets fix it. I had a user complaining that he has
> > problems with greek characters. Sounds like the source of the problem.
> >
>
> Actually this is a bug (I think) I found trying to solve the problem
> with greek characters.
>
> Removing this line is not enough.
>
> The code in charset processing does well when coding==DC_UCS2 (well this
> is the easy case).
>
> It also does well when coding==DC_7BIT and charset=="ISO-8859-1" (well
> that is even easier: just do nothing)
>
> But when coding==DC_7BIT and charset!="ISO-8859-1" it seems to be trying
> to do something like this:
> first ... encode to UTF-8
> then UTF-8 to ISO-8859-1
> always using libxml calls.
>
> Actually the code for UTF-8 to ISO-8859-1 is wrong and commented out.
> /* UTF-8 to ISO-8859-1 */
> /* charset = octstr_create("ISO-8859-1");
>    if (charset_from_utf8(new_body, &temp, charset) >= 0) {
>        octstr_destroy(new_body);
>        new_body = temp;
>        octstr_dump(new_body, 0);
>
>        octstr_destroy(charset);
>    } else {
>        octstr_destroy(charset);
>        octstr_destroy(new_body);
>        return NULL;
>    }
>    debug("sms.http", 0, "coding=7bit, after iso8859-1, msgdata is %s",
>          octstr_get_cstr(new_body));
> */
>
> Anyway it would *NOT* do the job. libxml maps any characters that do not
> map directly to ISO-8859-1 to something like this &#924; (the XML way).
> Which is not good!
>
> I am working on a solution for the greek characters, which must be
> relatively easy since GSM default alphabet has all the GREEK capital
> letters.
>
> But I don't know how to give more general solution for all the possible
> charsets (other than ISO-8859-7 which is for Greek).
>

--
Worik Macky Turei Stanton
Whew!
[EMAIL PROTECTED]
Aotearoa