On Wednesday 05 July 2006 23:04, Matthew Brown wrote:
> Manuel,
>
> I believe you've hit the nail on the head: the response prolog says
> utf-8, but (according to EtherPeek) the BOM is ff/ef. Incidentally,
> by the time the response XML gets logged by Axis, these initial
> characters are logged as ef bf bd ef bf bd.
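The sequence ef bf bd is the UTF-8 encoding of U+FFFD, the Unicode replacement character. A minimal Java sketch, assuming the two leading bytes are actually the little-endian UTF-16 BOM ff fe (a UTF-16 BOM is either ff fe or fe ff), shows how decoding them as UTF-8 and re-encoding them for a log would produce exactly the logged bytes. The class and method names are hypothetical; this is not Axis code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reproduces what a UTF-8 decoder plus UTF-8 logger
// does to bytes that are not valid UTF-8.
final class BomGarbling {
    // Decode raw bytes as UTF-8; each invalid byte becomes U+FFFD.
    static String decodeAsUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Render bytes as lower-case hex separated by spaces, like a packet dump.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Decode as UTF-8, re-encode as UTF-8, and return the resulting hex dump.
    static String roundTripHex(byte[] raw) {
        return hex(decodeAsUtf8(raw).getBytes(StandardCharsets.UTF_8));
    }
}
```

For example, `BomGarbling.roundTripHex(new byte[]{(byte) 0xff, (byte) 0xfe})` yields `ef bf bd ef bf bd`: neither ff nor fe can appear in well-formed UTF-8, so each is replaced by U+FFFD.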
>
Matt,

What about the rest of the byte stream when you look at it in EtherPeek?
Is it UTF-16 encoded (2 bytes per char) or UTF-8 encoded (1 byte per
char for all typical ASCII characters)?
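One rough way to answer this from a captured payload, offered as a hypothetical heuristic and not as anything Xerces itself does: for mostly-ASCII XML, UTF-16 puts a 0x00 byte beside every ASCII byte, while UTF-8 produces none. The class name and the one-quarter threshold are assumptions.

```java
// Hypothetical heuristic, not part of Xerces or Axis.
final class EncodingGuess {
    // Counts 0x00 bytes at even and odd offsets in the first sampleSize bytes.
    // Mostly-ASCII UTF-16BE has zeros at even offsets, UTF-16LE at odd offsets,
    // and UTF-8 has none. A 2-byte BOM, if present, does not change the parity.
    static String guess(byte[] data, int sampleSize) {
        int n = Math.min(data.length, sampleSize);
        int zerosEven = 0;
        int zerosOdd = 0;
        for (int i = 0; i < n; i++) {
            if (data[i] == 0) {
                if (i % 2 == 0) {
                    zerosEven++;
                } else {
                    zerosOdd++;
                }
            }
        }
        // Require zeros in more than a quarter of the sample before calling it UTF-16.
        if (zerosEven > n / 4 && zerosEven > zerosOdd) {
            return "UTF-16BE";
        }
        if (zerosOdd > n / 4 && zerosOdd > zerosEven) {
            return "UTF-16LE";
        }
        return "UTF-8"; // or another ASCII-compatible 8-bit encoding
    }
}
```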

Manuel
> Unfortunately, we may be in a tough spot getting the producer of
> the XML to change it; the customer whose web services we are
> consuming doesn't seem to see any issue with this (they are fine
> with their .NET tools).
>
> If we are seeing a UTF-16 BOM but a prolog that declares UTF-8, is
> there any way to instruct Axis/Xerces to parse it as UTF-16? Sorry
> if this question doesn't make much sense, but I'm not too familiar
> with how Axis and/or Xerces decide which character encoding to use
> when reading the XML.
>
> Thanks again
> Matt
>
> -----Original Message-----
> From: Manuel Mall [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, July 05, 2006 10:58 AM
> To: axis-user@ws.apache.org
> Subject: Re: Two questions - BOM in UTF-8, and manually cleaning XML
>
> On Wednesday 05 July 2006 22:16, Axel Bock wrote:
> > Yes, there is a workaround. It works if you encode the file as
> > UTF-8 (for example) and do not include the BOM at the beginning. I
> > use Notepad++ for that task, where you can save in "UTF-8 without
> > BOM".
> >
> > The process for that is easy:
> > 1. Open the file in Notepad++.
> > 2. Select everything via CTRL-A.
> > 3. Cut (not copy!).
> > 4. In the Format menu, choose "ANSI" formatting, then select "UTF-8
> > without BOM" at the bottom.
> > 5. Paste.
> > 6. Save.
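For files, the same result can be had without an editor. A minimal sketch, assuming the file is already UTF-8 and only the leading ef bb bf needs to go; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: programmatic "save as UTF-8 without BOM".
final class Utf8BomStripper {
    // Returns the input without a leading UTF-8 BOM (ef bb bf), or the same array if absent.
    static byte[] stripUtf8Bom(byte[] data) {
        if (data.length >= 3
                && (data[0] & 0xff) == 0xef
                && (data[1] & 0xff) == 0xbb
                && (data[2] & 0xff) == 0xbf) {
            return Arrays.copyOfRange(data, 3, data.length);
        }
        return data;
    }

    // Rewrites the file in place, but only if a BOM was actually removed.
    // Assumes the content is already UTF-8; no transcoding is done.
    static void stripFile(Path file) throws IOException {
        byte[] original = Files.readAllBytes(file);
        byte[] stripped = stripUtf8Bom(original);
        if (stripped != original) {
            Files.write(file, stripped);
        }
    }
}
```

This only covers the UTF-8 case; a UTF-16 file would first need to be transcoded, which plain BOM removal does not do.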
> >
> > That is a crap workaround, but it works for me. For automatically
> > generated files... I dunno :-)
> >
> >
> > Greetings,
> > Axel.
> >
> >
> > On 7/5/06, Matthew Brown <[EMAIL PROTECTED]> wrote:
> >
> > Hi all,
> >
> > I hate to do this, but can anyone please help me with either of
> > these issues? I've tried upgrading Xerces to 2.8.0, but to no
> > avail.
> >
> > Is there anything else I could be doing?
>
> Just wondering whether your file in question starts with hex 'ef bb bf',
> 'ff fe' or 'fe ff'. If it is one of the latter two forms, I believe
> you have a UTF-16 encoded file (little endian or big endian,
> respectively), not UTF-8. If it is the 'ef bb bf' sequence, then it
> correctly starts with the UTF-8 encoding of the BOM code point U+FEFF.
> In all cases Xerces should be able to handle it. A problem may arise
> if it starts with 'ff fe' but the XML prolog says encoding="utf-8", as
> I believe that is a contradiction.
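The check described above can be sketched in a few lines. This is a hypothetical helper, not a Xerces API; it ignores UTF-32 (whose little-endian BOM also begins with ff fe) and uses a simplified regex for the encoding declaration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for checking BOM vs. declared encoding.
final class BomSniffer {
    // Simplified pattern for encoding="..." or encoding='...' in an XML declaration.
    private static final Pattern ENCODING =
            Pattern.compile("encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    // Classifies the leading bytes by BOM; returns null if none of the three is present.
    static String detectBom(byte[] head) {
        if (head.length >= 3 && (head[0] & 0xff) == 0xef
                && (head[1] & 0xff) == 0xbb && (head[2] & 0xff) == 0xbf) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xff) == 0xff && (head[1] & 0xff) == 0xfe) {
            return "UTF-16LE";
        }
        if (head.length >= 2 && (head[0] & 0xff) == 0xfe && (head[1] & 0xff) == 0xff) {
            return "UTF-16BE";
        }
        return null;
    }

    // Extracts the declared encoding from an already-decoded prolog, or null.
    static String declaredEncoding(String prolog) {
        Matcher m = ENCODING.matcher(prolog);
        return m.find() ? m.group(1) : null;
    }

    // The contradictory case: a UTF-16 BOM in front of a prolog that says UTF-8.
    static boolean bomContradictsDeclaration(String bom, String declared) {
        return bom != null && bom.startsWith("UTF-16")
                && declared != null && declared.equalsIgnoreCase("utf-8");
    }
}
```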
>
> I know this does not help directly, but it may help you check whether
> the problem is with the producer of the XML document or with your consumer.
>
> Manuel
>
> > What about the possibility of programmatically editing/cleaning the
> > response XML before it is given to the parser?
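A sketch of the cleaning step itself, assuming you can wrap the raw response stream before it reaches the parser (where to hook this into Axis, for example in a custom transport, is not shown and is an assumption). It drops a leading UTF-8 BOM and pushes any other bytes back untouched; the names are hypothetical. For a UTF-16 payload, removing the BOM alone is not enough, since the bytes still have to be decoded as UTF-16.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical stream wrapper; not an Axis API.
final class BomSkippingStream {
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 bytes or EOF.
        while (n < 3) {
            int r = pb.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xff) == 0xef
                && (head[1] & 0xff) == 0xbb
                && (head[2] & 0xff) == 0xbf;
        // Anything that was not a BOM goes back so the parser sees it unchanged.
        if (!isBom && n > 0) {
            pb.unread(head, 0, n);
        }
        return pb;
    }
}
```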
> >
> > Thanks
> > Matt
> >
> > -----Original Message-----
> > From: Matthew Brown [mailto:[EMAIL PROTECTED]]
> > Sent: Saturday, July 01, 2006 12:41 PM
> > To: axis-user@ws.apache.org
> > Subject: Two questions - BOM in UTF-8, and manually cleaning XML
> >
> >
> > 1. From searching the mailing list archives, I see several
> > references to people having problems with Byte Order Mark
> > characters appearing before the prolog in their UTF-8 messages.
> > However, I can't find a known resolution to these issues. Is
> > there a standard/common workaround for these BOM and UTF-8 issues?
> >
> > 2. If there is no answer to my #1, is there any way that Axis will
> > allow me to programmatically edit the response XML before it is
> > passed to the parser and deserialized? I've tried adding Handlers,
> > but I'm assuming that the Handler comes into the picture after the
> > message is parsed, because my Handler only ever sees the request
> > message, and not the response.
> >
> > Thanks
> > Matt Brown
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
