Did you see my response on setting the CHARACTER_SET_ENCODING? What is
the exact stack trace you get on the client?

thanks,
dims

On 7/5/06, Matthew Brown <[EMAIL PROTECTED]> wrote:
text/xml and utf-8, which I suppose explains the attempt to parse the UTF-16 
message as UTF-8. The customer has since changed the message so that it
really is UTF-8, although Xerces still isn't a fan of the UTF-8
BOM (ef bb bf).
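
For the remaining UTF-8 BOM, one option is to drop the three bytes before the stream reaches the parser. This is a minimal sketch, assuming the response body is available as a plain InputStream. The class and method names are hypothetical and are not part of Axis or Xerces.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

// Hypothetical helper, not an Axis or Xerces class.
final class Utf8BomSkipper {

    // Returns a stream positioned after a leading UTF-8 BOM (ef bb bf), if present.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = 0;
        // A single read() may return fewer bytes than requested, so loop.
        while (n < 3) {
            int r = pin.read(head, n, 3 - n);
            if (r < 0) {
                break; // end of stream before three bytes were read
            }
            n += r;
        }
        boolean bom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!bom && n > 0) {
            pin.unread(head, 0, n); // not a BOM: put the bytes back
        }
        return pin;
    }
}
```

The returned stream can then be handed to whatever does the parsing. How to hook this into the Axis transport is a separate question that this sketch does not answer.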



-----Original Message-----
From: Simon Fell [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 05, 2006 2:46 PM
To: axis-user@ws.apache.org
Subject: RE: Two questions - BOM in UTF-8, and manually cleaning XML


What does the content-type header say the charset is? That takes precedence 
over the payload (at least for SOAP 1.1)
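
To check what the header actually declares, the charset parameter can be pulled out of the Content-Type value. This is a minimal sketch, assuming the header value has already been read as a string. The class and method names are hypothetical, and the parsing is deliberately simple (it does not handle every quoting rule for MIME parameters).

```java
import java.util.Locale;

// Hypothetical helper for inspecting a Content-Type header value.
final class ContentTypeCharset {

    // Returns the charset parameter of a Content-Type value, or fallback if none is given.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself (e.g. text/xml); parameters follow.
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i].trim();
            int eq = p.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = p.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("charset")) {
                String value = p.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }
}
```

For example, charsetOf("text/xml; charset=utf-8", "us-ascii") yields "utf-8", which can then be compared with what the bytes on the wire look like.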

Cheers
Simon

-----Original Message-----
From: Rodrigo Ruiz [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 05, 2006 8:30 AM
To: axis-user@ws.apache.org
Subject: Re: Two questions - BOM in UTF-8, and manually cleaning XML

Maybe changing the xml prolog from "utf-8" to "utf-16" will be easier.
It seems like a demo example for a servlet filter ;-)
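
Leaving the servlet API aside, the core of such a filter could look roughly like this: sniff the BOM, then rewrite the encoding declaration in the prolog so that it agrees with the bytes. This is only a sketch under assumptions. The class and method names are hypothetical, the whole message is assumed to fit in a byte array, and only the UTF-8 and UTF-16 BOMs are recognised.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: makes the XML declaration agree with the BOM.
final class PrologAligner {

    // Optional BOM character, then the XML declaration up to the encoding value.
    private static final Pattern ENCODING_DECL =
            Pattern.compile("^(\uFEFF?<\\?xml[^>]*?encoding\\s*=\\s*[\"'])[^\"']*([\"'])");

    // Returns the charset implied by a leading BOM, or null if there is none.
    static Charset sniffBom(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        return null;
    }

    // Rewrites the declared encoding to match the BOM; other input is returned unchanged.
    static byte[] alignProlog(byte[] raw) {
        Charset cs = sniffBom(raw);
        if (cs == null) {
            return raw; // no BOM: nothing to go on
        }
        String name = cs.equals(StandardCharsets.UTF_8) ? "UTF-8" : "UTF-16";
        // These decoders keep the BOM as a leading U+FEFF character,
        // so re-encoding with the same charset writes the BOM back out.
        String text = new String(raw, cs);
        Matcher m = ENCODING_DECL.matcher(text);
        if (!m.find()) {
            return raw; // no encoding declaration to fix
        }
        return m.replaceFirst("$1" + name + "$2").getBytes(cs);
    }
}
```

Note that this only fixes the prolog. If the Content-Type header still says utf-8, that would presumably need to be adjusted as well.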


Hope this helps,
Rodrigo



Manuel Mall wrote:
> On Wednesday 05 July 2006 23:12, Matthew Brown wrote:
>> Two bytes per char; Etherpeak is showing the second byte as 00.
>>
> Seems you are stuck between a "rock and a hard place" here. The byte
> stream appears to be correctly utf-16 encoded but the xml prolog says
> utf-8. Not sure what to recommend. Fixing it at the source is the
> obvious answer but not easily done. You may be able to write a handler that re-encodes
> the byte stream into utf-8 before giving it to the Axis stacks. But
> how to write such an Axis handler and how to hook it correctly into
> the Axis processing chain is outside my area of expertise.
>
> Maybe someone else can give advice on how to attempt such a thing.
>
> Manuel
>> -----Original Message-----
>> From: Manuel Mall [mailto:[EMAIL PROTECTED]
>> Sent: Wednesday, July 05, 2006 11:09 AM
>> To: axis-user@ws.apache.org
>> Subject: Re: Two questions - BOM in UTF-8, and manually cleaning XML
>>
>> On Wednesday 05 July 2006 23:04, Matthew Brown wrote:
>>> Manuel,
>>>
>>> I believe you hit the nail on the head - the response prolog says
>>> utf-8 but (according to Etherpeak) the BOM is ff fe.
>>> Incidentally, by the time the response XML gets logged by Axis,
>>> these initial bytes are logged as ef bf bd ef bf bd.
>> Matt,
>>
>> what about the rest of the byte stream when you look at it in
>> Etherpeak. Is it UTF-16 encoded (2 bytes per char) or UTF-8 encoded
>> (1 byte per char for all typical ascii characters)?
>>
>> Manuel
>>
>>> Unfortunately we may be in a bit of a tough place with having the
>>> producer of the XML change it; the customer whose web services we
>>> are consuming doesn't seem to see any issue with this (as they are
>>> fine with their .NET tools).
>>>
>>> If it is the case where we are seeing a UTF-16 BOM but a prolog that
>>> declares UTF-8; is there any way to instruct Axis/Xerces to parse it
>>> as UTF-16? Sorry if this question doesn't make much sense, but I'm
>>> not too familiar with how Axis and/or Xerces decide which character
>>> encoding to use when reading the XML.
>>>
>>> Thanks again
>>> Matt
>>>
>>> -----Original Message-----
>>> From: Manuel Mall [mailto:[EMAIL PROTECTED]
>>> Sent: Wednesday, July 05, 2006 10:58 AM
>>> To: axis-user@ws.apache.org
>>> Subject: Re: Two questions - BOM in UTF-8, and manually cleaning XML
>>>
>>> On Wednesday 05 July 2006 22:16, Axel Bock wrote:
>>>> Yes, there is a work-around. It works if you encode the file with
>>>> UTF-8 (for example), and do not include the BOM at the beginning.
>>>> I use notepad++ for that task, where you can save in "UTF-8 without
>>>> BOM".
>>>>
>>>> The process for that is easy:
>>>> 1. open the file in notepad++
>>>> 2. mark everything via CTRL-A
>>>> 3. cut (not copy!)
>>>> 4. in the format menu, choose "ANSI" formatting and select "UTF
>>>>    without BOM" at the bottom
>>>> 5. paste
>>>> 6. save.
>>>>
>>>> that is a crap workaround, but works for me. for automatically
>>>> generated files ..... I dunno :-)
>>>>
>>>>
>>>> Greetings,
>>>> Axel.
>>>>
>>>>
>>>> On 7/5/06, Matthew Brown <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I hate to do this, but can anyone please help me with either of
>>>> these issues? I've tried to upgrade Xerces to 2.8.0 but to no
>>>> avail.
>>>>
>>>> Is there anything else I could be doing?
>>> Just wondering if your file in question starts with hex 'ef bb bf'
>>> or 'ff fe' or 'fe ff'. If it is one of the latter two forms I
>>> believe you have a UTF-16 encoded file (little endian or big
>>> endian), not UTF-8. If it is the 'ef bb bf' sequence then it starts
>>> correctly with the UTF-8 encoded Unicode code point for the BOM, U+FEFF.
>>> In all cases Xerces should be able to handle it. A problem may arise
>>> if it starts with 'ff fe' but the XML prolog says encoding="utf-8",
>>> as that is a contradiction, I believe.
>>>
>>> I know this does not help directly but may help to check if the
>>> problem is with the producer of the XML document or your consumer.
>>>
>>> Manuel
>>>
>>>> What about the possibility of programmatically editing/cleaning the
>>>> response XML before it is given to the parser?
>>>>
>>>> Thanks
>>>> Matt
>>>>
>>>> -----Original Message-----
>>>> From: Matthew Brown [mailto:[EMAIL PROTECTED]
>>>> Sent: Saturday, July 01, 2006 12:41 PM
>>>> To: axis-user@ws.apache.org
>>>> Subject: Two questions - BOM in UTF-8, and manually cleaning XML
>>>>
>>>>
>>>> 1. From searching the mailing list archives, I see several
>>>> references to people having problems with Byte Order Mark
>>>> characters appearing before the prolog in their UTF-8 messages.
>>>> However I can't seem to find much of a known resolution to these
>>>> issues. Is there a standard/common workaround for these BOM and
>>>> UTF-8 issues?
>>>>
>>>> 2. If there is no answer to my #1, is there any way that Axis will
>>>> allow me to programmatically edit the response XML before it is passed
>>>> to the parser and de-serialized? I've tried adding Handlers, but
>>>> I'm assuming that the Handler comes into the picture after the
>>>> message is parsed, because my Handler is only ever seeing the
>>>> request message, and not the response.
>>>>
>>>> Thanks
>>>> Matt Brown
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>>> For additional commands, e-mail: [EMAIL PROTECTED]

--
-------------------------------------------------------------------
GRIDSYSTEMS                    Rodrigo Ruiz Aguayo
Parc Bit - Son Espanyol
07120 Palma de Mallorca        mailto:[EMAIL PROTECTED]
Baleares - España              Tel:+34-971435085 Fax:+34-971435082
http://www.gridsystems.com
-------------------------------------------------------------------



--
Davanum Srinivas : http://www.wso2.net (Oxygen for Web Service Developers)
