Re: XmlBeans does not support character encodings other than UTF-8

Raman Gupta Mon, 25 Sep 2006 06:53:01 -0700

Mark Swanson wrote:
> Thanks for your comments Raman, response inline...
> 
> Raman Gupta wrote:
>> Mark Swanson wrote:
>>> Raman Gupta wrote:
>>>> Mark Swanson wrote:
>>>>> At this stage, when I serialize the response Document, I set its
>>>>> encoding with responseDoc.documentProperties().setEncoding("UNICODEBIG").
>>>>> I do this because this is what the client sent me. I can only assume
>>>>> clients will send me IANA character encodings, so I must respond in
>>>>> kind: a C# client would be confused if it saw a Java-specific character
>>>>> encoding name...
>>>>
>>>> According to
>>>>
>>>> http://www.iana.org/assignments/character-sets
>>>>
>>>> UNICODEBIG is NOT a valid IANA character set name. UnicodeBig DOES
>>>> seem to be a java alias for UTF-16BE -- so that explains why it works
>>>> on the incoming side.
>>>
>>> Interesting. I picked 'UNICODEBIG' from the source of Save.java - or
>>> was it EntryMapping.java - I'm off site atm and don't remember the
>>> filename exactly. The method was something like convertIANATojava()
>>> so I thought it actually contained valid IANA character encoding names.
>>>
>>> FYI UnicodeBig is NOT the Java alias for UTF-16BE, it's
>>> UnicodeBigUnmarked:
>>
>> I didn't say UnicodeBig was THE alias for UTF-16BE, I said it was AN
>> alias for UTF-16BE.  The IANA page does not (always) list java aliases
>> for every character set.
> 
> It's not an alias for UTF-16BE. I pasted the valid Java aliases for
> UTF-16BE and 'UnicodeBig' is not one of them.


Where are you getting your list from?  On 1.5.0_07, at least on Linux,
UnicodeBig certainly does seem to be defined:

BeanShell 2.0-0.b1.7jpp - by Pat Niemeyer ([EMAIL PROTECTED])
bsh % print(System.getProperty("java.version"));
1.5.0_07
bsh % b = "TEST".getBytes("UnicodeBig");
bsh % print(b.length);
10
bsh % print(new String(b, "UnicodeBig"));
TEST
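The BeanShell check can also be run as a plain Java program, which makes it easy to compare labels side by side on whichever JVM is in question. This is a minimal sketch; the class and method names (EncodingProbe, hexBytes) are made up for the example. One reading of the 10-byte result above is a 2-byte byte-order mark plus 8 bytes of data. Java's standard UTF-16 charset writes a big-endian BOM (FE FF) when encoding, whereas UTF-16BE writes no BOM and gives 8 bytes for "TEST". Printing the bytes, rather than just the length, shows which of the two a given label behaves like.

```java
import java.io.UnsupportedEncodingException;

// Illustrative stand-alone version of the BeanShell check: encode "TEST"
// under several labels and show the raw bytes in hex.
class EncodingProbe {

    // Hex string of the bytes produced for the given label, or null if
    // this JVM does not recognise the label.
    static String hexBytes(String text, String label) {
        try {
            StringBuilder sb = new StringBuilder();
            for (byte b : text.getBytes(label)) {
                sb.append(String.format("%02X", b & 0xFF));
            }
            return sb.toString();
        } catch (UnsupportedEncodingException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String[] labels = {"UnicodeBig", "UTF-16", "UTF-16BE", "UnicodeBigUnmarked"};
        for (String label : labels) {
            String hex = hexBytes("TEST", label);
            // Two hex digits per byte, so length / 2 is the byte count.
            System.out.println(label + ": " + (hex == null ? "unsupported"
                    : hex + " (" + hex.length() / 2 + " bytes)"));
        }
    }
}
```

Run it on the JVM the service actually uses. Which labels are accepted has changed between JDK releases, as the 1.4.1, 1.5 and 1.6 alias lists in this thread show.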

>>> charset:UTF-16BE
>>>   alias:X-UTF-16BE
>>>   alias:UnicodeBigUnmarked
>>>   alias:UTF_16BE
>>>   alias:ISO-10646-UCS-2
>>>
>>> XmlBeans _only_ defines UNICODEBIG to correspond to UTF-16BE. So it
>>> still seems impossible to support UTF-16BE (IANA ISO-10646-UCS-2).
>>
>> As I said above, UNICODEBIG is a java alias for UTF-16BE, so why
>> shouldn't XmlBeans define this mapping?
> 
> It's not (see above).
> 
>>> NOTE: I initially tried (and would prefer) to use ISO-10646-UCS-2 as
>>> this is identical in IANA and Java. It does not work. No IANA/Java
>>> translation is required and XmlBeans still gets it wrong.
>>
>> Why not just use UTF-16BE, which is the canonical IANA name for this
>> character set?  Does XmlBeans still have the wrong behavior, for
>> either incoming or outgoing documents, if you specify UTF-16BE as the
>> charset?  If so, then I agree we have a bug.
> 
> Yes, I tried this and it didn't work.

Ok, then I agree, there is a problem. Was it the incoming or outgoing
or both that did not work? What was the failure mode?

>>> This is a bug.
>>
>> I'm not an XmlBeans developer, but I'm not sure I agree (unless, as I
>> said, there is a problem with specifying UTF-16BE), though there may
>> well be, and probably are, some mappings missing for certain IANA
>> aliases.
> 
> Since 'UNICODEBIG' is not a Java or IANA character set name, perhaps the
> bug is a simple typo in the character set name. I can try fixing this
> and testing it on Tuesday.

bsh % b = "TEST".getBytes("UNICODEBIG");
// Error: // Uncaught Exception: target exception : at Line: 2 : in file: <unknown file> : .getBytes ( "UNICODEBIG" )

Interesting, so UNICODEBIG is not valid, but UnicodeBig is (as shown
above).
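If the goal is the one Mark describes at the start of the thread, answering with a standard name rather than a Java-specific one, a small normalising step on the outgoing side keeps labels like UnicodeBig from being echoed back. The sketch below is an assumption about how one might do this in plain Java; it is not how XmlBeans handles encodings, and the class and method names (ResponseEncoding, canonicalFor) are hypothetical. It relies on the documented java.nio.charset.Charset rule that a charset listed in the IANA registry has that registry name (the MIME-preferred one, if there are several) as its canonical name, and that charset-name lookups are not case-sensitive.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper: turn whatever encoding label a client sent into the
// charset's canonical name, which for IANA-registered charsets is the
// registry name (e.g. "UnicodeBigUnmarked" -> "UTF-16BE").
class ResponseEncoding {

    static String canonicalFor(String clientLabel) {
        try {
            return Charset.forName(clientLabel).name();
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return null; // unknown or malformed label: caller picks a fallback
        }
    }

    public static void main(String[] args) {
        String[] labels = {"UTF-16BE", "utf-16be", "UnicodeBigUnmarked", "x-bogus"};
        for (String label : labels) {
            System.out.println(label + " -> " + canonicalFor(label));
        }
    }
}
```

For charsets that are not in the IANA registry the canonical name begins with "x-" or "X-", so a caller that must emit only registered names would still need a fallback for those.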

> FYI the Java charset/aliases I posted were for Java 1.5. For Java 1.4.1
> they are:
> 
> charset:UTF-16BE
>   alias:X-UTF-16BE
>   alias:UTF_16BE
>   alias:ISO-10646-UCS-2
> 
> For Java 1.6 they are:
> 
> charset:UTF-16BE
>   alias:X-UTF-16BE
>   alias:UTF_16BE
>   alias:ISO-10646-UCS-2
>   alias:UnicodeBigUnmarked

Can I ask where you are getting this list from?
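Mark doesn't say how he produced those lists. One way to get output in the same "charset:" / "alias:" shape is to ask java.nio.charset.Charset directly. The sketch below (class and method names are hypothetical) does that for a name given on the command line, defaulting to UTF-16BE. The set returned by aliases() has no specified order, which would be consistent with the alias order differing between the 1.5 and 1.6 listings.

```java
import java.nio.charset.Charset;

// Hypothetical helper: print a charset's canonical name and its aliases
// in the same "charset:/alias:" layout used in the thread.
class CharsetAliases {

    static String describe(String label) {
        Charset cs = Charset.forName(label); // throws if the label is unknown
        StringBuilder sb = new StringBuilder("charset:").append(cs.name());
        for (String alias : cs.aliases()) {  // iteration order is unspecified
            sb.append("\n  alias:").append(alias);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String label = args.length > 0 ? args[0] : "UTF-16BE";
        System.out.println(describe(label));
    }
}
```

Passing UnicodeBig as the argument shows which charset, if any, that name belongs to on the local JVM.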

Cheers,
Raman

