Did you see my response on setting the CHARACTER_SET_ENCODING? What is the exact stack trace you get on the client?

thanks,
dims

On 7/5/06, Matthew Brown <[EMAIL PROTECTED]> wrote:
text/xml and utf-8, which I suppose explains the attempt to parse the UTF-16 message as UTF-8. The customer has changed the format of the message to correctly be UTF-8 in actuality, although Xerces still isn't a fan of the UTF-8 BOM (ef bb bf).

-----Original Message-----
From: Simon Fell [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 05, 2006 2:46 PM
To: axis-user@ws.apache.org
Subject: RE: Two questions - BOM in UTF-8, and manually cleaning XML

What does the content-type header say the charset is? That takes precedence over the payload (at least for SOAP 1.1)

Cheers
Simon

-----Original Message-----
From: Rodrigo Ruiz [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 05, 2006 8:30 AM
To: axis-user@ws.apache.org
Subject: Re: Two questions - BOM in UTF-8, and manually cleaning XML

Maybe changing the xml prolog from "utf-8" to "utf-16" will be easier. It seems like a demo example for a servlet filter ;-)

Hope this helps,
Rodrigo

Manuel Mall wrote:
> On Wednesday 05 July 2006 23:12, Matthew Brown wrote:
>> Two bytes per char; Etherpeak is showing the second byte as 00.
>>
> Seems you are stuck between a "rock and a hard place" here. The byte
> stream appears to be correctly utf-16 encoded but the xml prolog says
> utf-8. Not sure what to recommend. Fix it at the source is obvious but
> not easily done. You may be able to write a handler that re-encodes
> the byte stream into utf-8 before giving it to the Axis stacks. But
> how to write such an Axis handler and how to hook it correctly into
> the Axis processing chain is outside my area of expertise.
>
> Maybe someone else can give advice on how to attempt such a thing.
>
> Manuel
>> -----Original Message-----
>> From: Manuel Mall [mailto:[EMAIL PROTECTED]
>> Sent: Wednesday, July 05, 2006 11:09 AM
>> To: axis-user@ws.apache.org
>> Subject: Re: Two questions - BOM in UTF-8, and manually cleaning XML
>>
>> On Wednesday 05 July 2006 23:04, Matthew Brown wrote:
>>> Manuel,
>>>
>>> I believe you hit the problem on the head - the response prolog says
>>> utf-8 but (according to Etherpeak) the BOM is ff fe.
>>> Coincidentally, by the time the response XML gets logged by axis,
>>> these initial characters are logged as ef bf bd ef bf bd.
>> Matt,
>>
>> what about the rest of the byte stream when you look at it in
>> Etherpeak. Is it UTF-16 encoded (2 bytes per char) or UTF-8 encoded
>> (1 byte per char for all typical ascii characters)?
>>
>> Manuel
>>
>>> Unfortunately we may be in a bit of a tough place with having the
>>> producer of the XML change it; the customer whose web services we
>>> are consuming doesn't seem to see any issue with this (as they are
>>> fine with their .NET tools).
>>>
>>> If it is the case where we are seeing a UTF-16 BOM but a prolog that
>>> declares UTF-8; is there any way to instruct Axis/Xerces to parse it
>>> as UTF-16? Sorry if this question doesn't make much sense, but I'm
>>> not too familiar with how Axis and/or Xerces decide which character
>>> encoding to use when reading the XML.
>>>
>>> Thanks again
>>> Matt
>>>
>>> -----Original Message-----
>>> From: Manuel Mall [mailto:[EMAIL PROTECTED]
>>> Sent: Wednesday, July 05, 2006 10:58 AM
>>> To: axis-user@ws.apache.org
>>> Subject: Re: Two questions - BOM in UTF-8, and manually cleaning XML
>>>
>>> On Wednesday 05 July 2006 22:16, Axel Bock wrote:
>>>> Yes, there is a work-around. It works if you encode the file with
>>>> UTF-8 (for example), and do not include the BOM at the beginning.
>>>> I use notepad++ for that task, where you can save in "UTF-8 without
>>>> BOM".
>>>>
>>>> The process for that is easy:
>>>> 1. open the file in notepad++
>>>> 2. mark everything via CTRL-A
>>>> 3. cut (not copy!)
>>>> 4. in the format menu, choose "ANSI" formatting and select "UTF
>>>> without BOM" at the bottom
>>>> 5. paste
>>>> 6. save.
>>>>
>>>> that is a crap workaround, but works for me. for automatically
>>>> generated files ..... I dunno :-)
>>>>
>>>>
>>>> Greetings,
>>>> Axel.
>>>>
>>>>
>>>> On 7/5/06, Matthew Brown <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I hate to do this, but can anyone please help me with either of
>>>> these issues? I've tried to upgrade Xerces to 2.8.0 but to no
>>>> avail.
>>>>
>>>> Is there anything else I could be doing?
>>> Just wondering if your file in question starts with hex 'ef bb bf'
>>> or 'ff fe' or 'fe ff'. If it is one of the latter two forms I
>>> believe you have an utf-16 encoded file (little endian or big
>>> endian) not utf-8. If it is the 'ef bb bf' sequence then it starts
>>> correctly with the utf-8 encoded unicode code point for BOM U+FEFF.
>>> In all cases xerces should be able to handle it. A problem may arise
>>> if it starts with 'ff fe' but the XML prolog says encoding="utf-8"
>>> as that is a contradiction I believe.
>>>
>>> I know this does not help directly but may help to check if the
>>> problem is with the producer of the XML document or your consumer.
>>>
>>> Manuel
>>>
>>>> What about the possibility of programmatically editing/cleaning the
>>>> response XML before it is given to the parser?
>>>>
>>>> Thanks
>>>> Matt
>>>>
>>>> -----Original Message-----
>>>> From: Matthew Brown [mailto:[EMAIL PROTECTED]
>>>> Sent: Saturday, July 01, 2006 12:41 PM
>>>> To: axis-user@ws.apache.org
>>>> Subject: Two questions - BOM in UTF-8, and manually cleaning XML
>>>>
>>>>
>>>> 1. From searching the mailing list archives, I see several
>>>> references to people having problems with Byte Order Mark
>>>> characters appearing before the prolog in their UTF-8 messages.
>>>> However I can't seem to find much of a known resolution to these
>>>> issues. Is there a standard/common workaround for these BOM and
>>>> UTF-8 issues?
>>>>
>>>> 2. If there is no answer to my #1, is there any way that Axis will
>>>> allow me to programmatically edit the response XML before it is
>>>> passed to the parser and de-serialized? I've tried adding Handlers,
>>>> but I'm assuming that the Handler comes into the picture after the
>>>> message is parsed, because my Handler is only ever seeing the
>>>> request message, and not the response.
>>>>
>>>> Thanks
>>>> Matt Brown
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]
>
>

--
-------------------------------------------------------------------
GRIDSYSTEMS                       Rodrigo Ruiz Aguayo
Parc Bit - Son Espanyol
07120 Palma de Mallorca           mailto:[EMAIL PROTECTED]
Baleares - España                 Tel:+34-971435085 Fax:+34-971435082
http://www.gridsystems.com
-------------------------------------------------------------------
--
Davanum Srinivas : http://www.wso2.net (Oxygen for Web Service Developers)
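
On the question raised in the thread about cleaning the response before it reaches the parser, here is a minimal sketch of the byte-level step only: look at the leading bytes, and if a BOM is present, drop it and re-encode the payload as BOM-less UTF-8. BomCleaner, detectBom and toBomlessUtf8 are hypothetical names, not part of Axis or Xerces, and how to hook this into the Axis transport or a servlet filter is left open, as it is in the thread. It assumes the prolog declares utf-8 (as in Matt's case), so the result is consistent after transcoding; UTF-32 is not handled.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not an Axis or Xerces class): strips a BOM and re-encodes as UTF-8. */
final class BomCleaner {

    /** Returns the charset implied by a leading BOM, or null if no known BOM is present. */
    static Charset detectBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;      // ef bb bf
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;   // ff fe (UTF-32LE also starts this way; ignored here)
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;   // fe ff
        }
        return null;
    }

    /**
     * If the payload starts with a BOM, decodes it with the charset the BOM implies
     * and returns the same text as UTF-8 bytes without a BOM. Otherwise returns the
     * input unchanged. Assumption: the XML prolog already says encoding="utf-8".
     */
    static byte[] toBomlessUtf8(byte[] raw) {
        Charset cs = detectBom(raw);
        if (cs == null) {
            return raw;
        }
        int bomLen = cs.equals(StandardCharsets.UTF_8) ? 3 : 2;
        String text = new String(raw, bomLen, raw.length - bomLen, cs);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "<a/>" as UTF-16LE with a leading ff fe BOM.
        byte[] utf16 = {(byte) 0xFF, (byte) 0xFE, '<', 0, 'a', 0, '/', 0, '>', 0};
        byte[] cleaned = toBomlessUtf8(utf16);
        System.out.println(new String(cleaned, StandardCharsets.UTF_8));
    }
}
```

Because the decision is made from the bytes themselves, this sidesteps the mismatch between the utf-8 prolog/content-type and a UTF-16 body, at the cost of buffering the whole response in memory.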