Hi
I have an XML file with the following contents and use Xerces 2.8.0 to parse it.
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<test>
<value><![CDATA[123,<>\344\270\255\346\226\207]]></value>
</test>
\344\270\255\346\226\207 is the UTF-8 representation of the Chinese word 中文.
I use the SAX2 interface to get the contents of the value element in the
callback function characters(const XMLCh* const chars, const
unsigned int length).
The chars argument passed in holds the correct contents:
chars: 49 50 51 44 60 62 20013 25991
length: 8
When I use XMLString::transcode to convert chars to a
std::string, I only get 123,<> and the Chinese characters are lost.
What I want to do is convert the contents of value to GB18030.
I understand that I should first convert the UTF-8 to Unicode and then
the Unicode to GB18030.
The first thing I want to do is get the UTF-8 byte sequence
instead of XMLCh, which is 16 bits wide. How can I get that?
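As a fallback I could presumably encode the code units by hand, along the lines of the sketch below, but I would prefer a facility in Xerces itself if one exists. The sketch assumes that XMLCh holds 16-bit UTF-16 code units; std::uint16_t stands in for XMLCh here, and utf16ToUtf8 is a hypothetical helper name, not a Xerces function.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of Xerces: encodes UTF-16 code units as UTF-8 bytes.
// Assumption: the input uses the same layout as the XMLCh buffer passed to characters().
std::string utf16ToUtf8(const std::uint16_t* chars, std::size_t length) {
    std::string out;
    for (std::size_t i = 0; i < length; ++i) {
        std::uint32_t cp = chars[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < length &&
            chars[i + 1] >= 0xDC00 && chars[i + 1] <= 0xDFFF) {
            // Valid surrogate pair: combine into one code point above U+FFFF.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (chars[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            // Unpaired surrogate: substitute U+FFFD (replacement character).
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For the eight code units listed above (49 50 51 44 60 62 20013 25991) this should give the 12 bytes 123,<>\344\270\255\346\226\207, which I would then still have to convert to GB18030 in a separate step.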
Another question: does Xerces support the GBK encoding (or GB2312 or
GB18030)? It reports an encoding exception if I use the following header:
<?xml version="1.0" encoding="GBK" standalone="yes" ?>
Appreciate your answers.
Thanks very much for your help
BRs
Zongjun