Hi list,

I have just committed the following (from the ChangeLog):

2003-03-17  Stipe Tolj  <[EMAIL PROTECTED]>
    * gw/wap-error.[ch]: added error_converting() for reporting smart error
      messages if the converter failed, e.g. the libxml2 parser failed
      because the encoding of the xml source was not indicated.
    * gw/xml_shared.[ch]: added find_charset_encoding() to scan the xml
      preamble line, e.g. <?xml version="1.0" encoding="iso-8859-1" ?>, and
      extract the charset encoding definition if one is given.
    * gwlib/octstr.h, gwlib/dbpool.c: cosmetic fixes
    * gw/wml_compiler.c: make wml_compile() more "intelligent" concerning
      the charset encoding encapsulated in the xml source. This also fixes
      BUG#6. The problem was/is that both the HTTP response header and the
      xml preamble can deliver a charset definition. So we now cascade as
      follows: if the xml preamble has an encoding specified, it overrides
      the HTTP response header; if not, we take the HTTP response header
      charset if available. If neither is available, we assume UTF-8 as
      the default charset encoding. The charset information is stored in
      the libxml2 document tree and re-used afterwards while text elements
      are parsed and have to be transcoded to the target charset.
    * gw/wap-appl.c: added smart error messaging if a converter failed
      while converting or compiling a supported content-type, e.g. when a
      WML compilation failed.

The main problem was that we did not take into account an encoding
definition in the xml preamble line, e.g.

  <?xml version="1.0" encoding="iso-8859-1" ?>

defines the WML source to be in ISO-8859-1 encoding. Kannel nevertheless
treated the source as UTF-8 and also set the WBXML charset flag to
UTF-8. Hence you could *not* tell Kannel to encode the wml_compile()
output in any charset encoding other than UTF-8.
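To make the preamble scanning concrete, here is a minimal C sketch of how the encoding could be pulled out of the XML declaration. It is *not* the committed find_charset_encoding(): the name xml_preamble_encoding() is hypothetical, and it works on plain NUL-terminated C strings instead of Kannel's Octstr type.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not Kannel's API): look at the XML declaration at
 * the start of `src` and copy the value of its encoding="..." pseudo-
 * attribute into `out`. Returns 1 if an encoding was found, 0 otherwise.
 */
int xml_preamble_encoding(const char *src, char *out, size_t outlen)
{
    const char *end, *p, *q, *val;
    char quote;
    size_t len;

    /* Lenient: skip leading whitespace (a UTF-8 BOM is not handled). */
    while (isspace((unsigned char)*src))
        src++;
    /* Require "<?xml" followed by whitespace (rejects "<?xml-stylesheet"). */
    if (strncmp(src, "<?xml", 5) != 0 || !isspace((unsigned char)src[5]))
        return 0;
    end = strstr(src, "?>");
    if (end == NULL)
        return 0;                        /* unterminated declaration */

    /* Only search for "encoding" inside the declaration itself. */
    for (p = src + 5; p < end; p++) {
        if (strncmp(p, "encoding", 8) != 0)
            continue;
        q = p + 8;
        while (q < end && isspace((unsigned char)*q)) q++;
        if (q >= end || *q != '=') continue;
        q++;
        while (q < end && isspace((unsigned char)*q)) q++;
        if (q >= end || (*q != '"' && *q != '\'')) continue;
        quote = *q++;                    /* value may use ' or " quotes */
        val = q;
        while (q < end && *q != quote) q++;
        if (q >= end)
            return 0;                    /* unterminated value */
        len = (size_t)(q - val);
        if (len == 0 || len >= outlen)
            return 0;                    /* empty, or does not fit in out */
        memcpy(out, val, len);
        out[len] = '\0';
        return 1;
    }
    return 0;                            /* declaration without encoding */
}
```

Applied to the preamble above, the sketch copies iso-8859-1 into `out` and returns 1; for a declaration without an encoding pseudo-attribute it returns 0, which is the cue to fall back to the HTTP response header.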

Now we do a little more to be encoding-safe, as follows:

If the HTTP response body (hence the WML source) contains an encoding
definition in its xml preamble, we rely on that for further
processing. If not, we at least take the charset definition from the
HTTP response header, if available. If neither is present, we have to
assume UTF-8.
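The cascade boils down to a simple precedence rule. Here is a hypothetical C sketch of it; choose_source_charset() is an illustrative name, not a Kannel function, and it assumes both charset values have already been extracted as C strings (or are NULL when absent).

```c
#include <stddef.h>

/*
 * Hypothetical sketch of the charset precedence rule.
 * Either argument may be NULL (or empty) when that source carries
 * no charset information.
 */
const char *choose_source_charset(const char *xml_preamble_charset,
                                  const char *http_header_charset)
{
    if (xml_preamble_charset != NULL && *xml_preamble_charset != '\0')
        return xml_preamble_charset;  /* 1. the xml preamble wins */
    if (http_header_charset != NULL && *http_header_charset != '\0')
        return http_header_charset;   /* 2. then the HTTP header charset */
    return "UTF-8";                   /* 3. otherwise assume UTF-8 */
}
```

Keeping the rule in one small function makes the precedence easy to review and test on its own, independently of the parsing code.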

The hack includes transcoding the text elements while libxml2 parses
the tree. libxml2 uses UTF-8 as its internal encoding, hence we have
to transcode to our target charset (e.g. ISO-8859-1) at the point where
we inject the characters into the WBXML code.
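For the ISO-8859-1 target the transcoding step is easy to illustrate, because Latin-1 maps code points U+0000 to U+00FF one-to-one onto single bytes. The sketch below is only an illustration under that assumption: utf8_to_latin1() is a hypothetical name, it is not the code in wml_compiler.c, and other target charsets would need a general-purpose converter (for example iconv).

```c
#include <stddef.h>

/*
 * Hypothetical sketch: convert UTF-8 (libxml2's internal encoding) to
 * ISO-8859-1 before the bytes are written into the WBXML output.
 * Only 1- and 2-byte UTF-8 sequences can map into Latin-1.
 * Returns the number of bytes written, or -1 on malformed input, a
 * character outside Latin-1, or insufficient output space.
 */
long utf8_to_latin1(const unsigned char *in, size_t inlen,
                    unsigned char *out, size_t outlen)
{
    size_t i = 0, o = 0;

    while (i < inlen) {
        unsigned int c = in[i];
        if (o >= outlen)
            return -1;                      /* output buffer too small */
        if (c < 0x80) {                     /* ASCII: copy unchanged */
            out[o++] = (unsigned char)c;
            i++;
        } else if ((c & 0xE0) == 0xC0 && i + 1 < inlen
                   && (in[i + 1] & 0xC0) == 0x80) {
            /* 2-byte sequence: decode the code point */
            unsigned int cp = ((c & 0x1F) << 6) | (in[i + 1] & 0x3F);
            if (cp < 0x80 || cp > 0xFF)
                return -1;                  /* overlong, or not Latin-1 */
            out[o++] = (unsigned char)cp;
            i += 2;
        } else {
            return -1;  /* 3/4-byte sequence (not Latin-1) or malformed */
        }
    }
    return (long)o;
}
```

Given the UTF-8 bytes for "Düsseldorf" (11 bytes), it writes 10 Latin-1 bytes with 0xFC for "ü". A character such as the euro sign (U+20AC) has no ISO-8859-1 byte, so the sketch returns -1 rather than guessing a replacement; whether to fail or substitute is a policy choice left open here.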

This should now work smoothly. Please test it yourself and report if
something has been broken by the changes.

Stipe

[EMAIL PROTECTED]
-------------------------------------------------------------------
Wapme Systems AG

Vogelsanger Weg 80
40470 Düsseldorf

Tel: +49-211-74845-0
Fax: +49-211-74845-299

E-Mail: [EMAIL PROTECTED]
Internet: http://www.wapme-systems.de
-------------------------------------------------------------------
wapme.net - wherever you are