RE: Axis with UTF-8

McCullough, Ryan Tue, 03 Feb 2009 17:54:57 -0800

The Xerces parser does not correctly decode 3-byte UTF-8 characters. It should
return all 3 bytes of the character, but it returns 1 byte of uninitialized memory (0xcd) instead.

From: McCullough, Ryan [mailto:rmccullo...@rightnow.com]
Sent: Tuesday, February 03, 2009 6:48 PM
To: Apache AXIS C User List
Cc: Antonczyk, Ryszard
Subject: Axis with UTF-8

We are using Axis1 checked out from Subversion, along with Xerces-C version
2.2.0.

We are having trouble using Axis to retrieve UTF-8 characters. Is there any 
additional setup needed?

Here is where we think things are going awry.

axis\xml\xerces\XMLParserXerces.cpp::parse(bool ignoreWhitespace, bool peekIt)
At about line 125 there is this:

    // parse next token
    m_bCanParseMore = m_pParser->parseNext(m_ScanToken);

It looks like the parseNext() function is converting the 3 bytes of a single
UTF-8 character into 1 byte.
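For comparison, a correct decoder consumes all three bytes of such a sequence and produces one code point. Below is a minimal C++ sketch of decoding a single 3-byte UTF-8 sequence (lead byte 1110xxxx followed by two 10xxxxxx continuation bytes). The function name decodeUtf8ThreeByte is hypothetical; it is not part of Axis or Xerces-C, and the caller is assumed to supply at least 3 bytes.

```cpp
#include <cassert>

// Hypothetical helper (not an Axis or Xerces-C API): decodes one 3-byte
// UTF-8 sequence starting at p into a Unicode code point.
// The caller must supply at least 3 bytes.
// Returns -1 if the bytes are not a well-formed 3-byte sequence.
long decodeUtf8ThreeByte(const unsigned char* p)
{
    assert(p != nullptr);

    // Lead byte of a 3-byte sequence must match 1110xxxx.
    if ((p[0] & 0xF0) != 0xE0) return -1;
    // Both continuation bytes must match 10xxxxxx.
    if ((p[1] & 0xC0) != 0x80 || (p[2] & 0xC0) != 0x80) return -1;

    // Take 4 payload bits from the lead byte and 6 from each continuation byte.
    long cp = (static_cast<long>(p[0] & 0x0F) << 12)
            | (static_cast<long>(p[1] & 0x3F) << 6)
            |  static_cast<long>(p[2] & 0x3F);

    // Reject overlong encodings (below U+0800) and UTF-16 surrogate values.
    if (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF)) return -1;
    return cp;
}
```

Applied to the bytes EF A4 85 from the hex dump in this message, the sketch yields U+F905 (a CJK compatibility ideograph): one character encoded in three bytes. So a parser that hands back a single byte for this input is dropping data somewhere.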

Here is the hex data being returned from our web service:
00000808h: EF A4 85                                        ; ï¤...
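Before digging further into parseNext(), it may help to confirm that the bytes coming back from the service are well-formed UTF-8. Below is a minimal C++ sketch of a structural check that counts code points in a buffer. countUtf8CodePoints is a hypothetical diagnostic helper, not part of Axis or Xerces-C, and it checks only the lead/continuation byte patterns.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical diagnostic helper (not an Axis or Xerces-C API): returns the
// number of code points in a UTF-8 buffer, or -1 if the byte structure is
// malformed. Structural check only: it does not reject overlong forms,
// surrogates, or values above U+10FFFF.
long countUtf8CodePoints(const unsigned char* buf, std::size_t len)
{
    assert(buf != nullptr || len == 0);

    long count = 0;
    std::size_t i = 0;
    while (i < len) {
        const unsigned char lead = buf[i];
        std::size_t need;  // number of continuation bytes this lead byte requires

        if (lead < 0x80)                need = 0;  // 0xxxxxxx (ASCII)
        else if ((lead & 0xE0) == 0xC0) need = 1;  // 110xxxxx
        else if ((lead & 0xF0) == 0xE0) need = 2;  // 1110xxxx
        else if ((lead & 0xF8) == 0xF0) need = 3;  // 11110xxx
        else return -1;  // stray continuation byte or invalid lead byte

        // The sequence must not run past the end of the buffer.
        if (len - i <= need) return -1;

        // Every continuation byte must match 10xxxxxx.
        for (std::size_t k = 1; k <= need; ++k) {
            if ((buf[i + k] & 0xC0) != 0x80) return -1;
        }

        i += need + 1;
        ++count;
    }
    return count;
}
```

For the three bytes above, the sketch returns 1 (one character in three bytes). The lone 0xcd byte that the parser returns gives -1, because 0xCD is the lead byte of a 2-byte sequence with no continuation byte after it.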

I have also attached the XML that was returned from the web service
(xmlout14670.txt; this is logged on the server).

Ryan McCullough | RightNow Technologies | Integration Tools Engineer
406-556-3162 office | Bozeman, MT | 
rmccullo...@rightnow.com | http://www.rightnow.com
