Hi

I have an XML file with the following contents, and I use Xerces-C++ 2.8.0 to parse it.

<?xml version="1.0"  encoding="UTF-8"  standalone="yes" ?>
<test>
    <value><![CDATA[123,<>\344\270\255\346\226\207]]></value>
</test>

\344\270\255\346\226\207 is the UTF-8 byte sequence (in octal escapes) of the Chinese word 中文.

I use the SAX2 interface to get the contents of the value element. In the
callback function characters(const XMLCh* const chars, const
unsigned int length), the chars argument contains the correct contents:

chars:  49 50 51 44 60 62 20013 25991
length:  8

When I use XMLString::transcode to convert chars to a
std::string, I only get 123,<> and the Chinese characters are lost.

What I want to do is convert the contents of value to the GB18030
encoding. I know that I should first convert UTF-8 to Unicode and then
Unicode to GB18030.

The first thing I want is the UTF-8 byte sequence
instead of XMLCh, which is 16 bits wide. How can I get that?
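
To make the question concrete, here is a minimal sketch of the conversion I am after, written in plain C++ without any Xerces calls. The helper name utf16ToUtf8 is hypothetical, and unsigned short stands in for XMLCh on the assumption that XMLCh holds 16-bit UTF-16 code units. Xerces may well offer this through its transcoding service (XMLTransService::makeNewTranscoderFor with "UTF-8"), but I have not verified that in 2.8.0.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces): encodes UTF-16 code units as UTF-8.
// Assumption: XMLCh is a 16-bit UTF-16 code unit; unsigned short stands in for it.
// Unpaired surrogates are not validated; they are encoded as if they were
// ordinary 16-bit values.
std::string utf16ToUtf8(const unsigned short* chars, std::size_t length)
{
    std::string out;
    for (std::size_t i = 0; i < length; ++i) {
        unsigned long cp = chars[i];

        // Combine a high/low surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < length &&
            chars[i + 1] >= 0xDC00 && chars[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (chars[i + 1] - 0xDC00);
            ++i;
        }

        if (cp < 0x80) {
            // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // 3 bytes: covers 20013 and 25991 from the callback
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // 4 bytes: code points above the Basic Multilingual Plane
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Called with the eight values received in characters() (49 50 51 44 60 62 20013 25991), this should give back the 12 bytes that are in the file: 123,<> followed by \344\270\255\346\226\207.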

Another question: does Xerces support the GBK encoding (or GB2312 or
GB18030)? It reports an encoding exception if I use the following
declaration:

<?xml version="1.0"  encoding="GBK"  standalone="yes" ?>

Appreciate your answers.

Thanks very much for your help

BRs

Zongjun
