Bad news: the ICU team declined to support UnicodeBig because it is not available in NIO.
The good news is that I have found a smooth way to support these charsets. I tried implementing an SPI provider that accepts the name "UnicodeBig", and it worked. With this approach we could support any other charset as well, and fix this bug, which the ICU team was hesitant to do on their side. I think it also gives us extensibility. Does anyone have concerns about implementing a Harmony SPI? I'll go ahead if no one objects.

On 10/19/06, Andrew Zhang <[EMAIL PROTECTED]> wrote:
On 10/19/06, Tony Wu <[EMAIL PROTECTED]> wrote:
>
> I think supporting UnicodeBig in nio is not a bug but a feature. And
> the key point is: how can I get UnicodeBig supported in IO/Lang?

If ICU/NIO supports "UnicodeBig", wouldn't IO/LANG support "UnicodeBig" as well?

On 10/19/06, Andrew Zhang <[EMAIL PROTECTED]> wrote:
> > On 10/19/06, Tony Wu <[EMAIL PROTECTED]> wrote:
> > >
> > > The implementation is from ICU, so I think we'd better not wrap it
> > > ourselves. I'll post to the ICU mailing list and ask if they can help
> > > supply these legacy charsets.
> >
> > Hey Tony, please keep in mind that the following code[1] should print false
> > and throw an UnsupportedCharsetException. If ICU provides "UnicodeBig"
> > support, does that mean Harmony nio also supports "UnicodeBig"?
> >
> > [1]
> > System.out.println(Charset.isSupported("UnicodeBig"));
> > Charset.forName("UnicodeBig");
> >
> > On 10/19/06, Andrew Zhang <[EMAIL PROTECTED]> wrote:
> > > > On 10/19/06, Tony Wu <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > Thank you all,
> > > > > It is not just an issue about the name.
> > > > > The precondition of mapping is that ICU really supports this
> > > > > charset. AFAIK UnicodeBig is not implemented by ICU, refer to [1].
> > > > > Shall we map UnicodeBig and UnicodeLittle to UTF-16 as a workaround[2]?
> > > >
> > > > No, I don't think so. The only difference between "UnicodeBig" and
> > > > "UTF-16BE" is with/without the byte-order mark. So it should be easy to wrap
> > > > "UTF-16BE" as "UnicodeBig" for java.io/java.lang. Just put 0xFE 0xFF at the
> > > > beginning of the bytes and then encode the buffer as "UTF-16BE". Do I miss
> > > > something?
> > > > >
> > > > > [1] http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/source/data/mappings/convrtrs.txt?view=co
> > > > >
> > > > > [2]
> > > > > UTF-16
> > > > > Sixteen-bit UCS Transformation Format, byte order identified by an
> > > > > optional byte-order mark
> > > > > UnicodeBig
> > > > > Sixteen-bit Unicode Transformation Format, big-endian byte order,
> > > > > with byte-order mark
> > > > > UnicodeLittle
> > > > > Sixteen-bit Unicode Transformation Format, little-endian byte order,
> > > > > with byte-order mark
> > > > >
> > > > > On 10/17/06, Paulex Yang <[EMAIL PROTECTED]> wrote:
> > > > > > Tony Wu wrote:
> > > > > > > Thank you Andrew,
> > > > > > > I think I got the point. The j.l.String of RI uses the encoding of IO
> > > > > > > whereas Charset.forName uses that of NIO.
> > > > > > >
> > > > > > > And the new problem is: shall we follow the spec[1] to support the two
> > > > > > > suites of charset implementations? I just had a look and found we do
> > > > > > > not support some canonical names for the java.io and java.lang APIs, such as
> > > > > > > UnicodeBigUnmarked, UnicodeLittleUnmarked, UnicodeBig, UnicodeLittle, etc.
> > > > > > There is such a charset name mapping in InputStreamReader. I think we
> > > > > > have no choice but to support these legacy charset names; you may need
> > > > > > some refactoring work to make these classes use the same mapping data.
> > > > > > >
> > > > > > > [1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
> > > > > > >
> > > > > > > On 10/17/06, Andrew Zhang <[EMAIL PROTECTED]> wrote:
> > > > > > >> On 10/17/06, Andrew Zhang <[EMAIL PROTECTED]> wrote:
> > > > > > >> >
> > > > > > >> > On 10/17/06, Leo Li <[EMAIL PROTECTED]> wrote:
> > > > > > >> > >
> > > > > > >> > > I think Harmony is more reasonable.
> > > > > > >> > >
> > > > > > >> > > As the spec says, if Charset.forName("UnicodeBig") throws
> > > > > > >> > > UnsupportedCharsetException, then no support for the named charset is
> > > > > > >> > > available in this instance of the Java virtual machine. Then how can we
> > > > > > >> > > get new String(b, "UnicodeBig") without throwing UnsupportedCharsetException
> > > > > > >> > > on the same JVM? The spec for String(byte[] bytes, String charsetName) also
> > > > > > >> > > says that if the named charset is not supported, UnsupportedEncodingException
> > > > > > >> > > should be thrown.
> > > > > > >> >
> > > > > > >> > UNICODEBIG is a Java alias for UTF-16BE. I think we'd better support such
> > > > > > >> > a mapping in String and follow RI.
> > > > > > >>
> > > > > > >> You can find the encoding set in the spec. [1]
> > > > > > >>
> > > > > > >> [1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
> > > > > > >>
> > > > > > >> On 10/17/06, Tony Wu <[EMAIL PROTECTED]> wrote:
> > > > > > >> > > >
> > > > > > >> > > > Hi all,
> > > > > > >> > > > I found this when I tried to debug the failing tests of ant on
> > > > > > >> > > > Harmony. Note the output of the testcases below.
> > > > > > >> > > >
> > > > > > >> > > > import java.io.UnsupportedEncodingException;
> > > > > > >> > > > import java.nio.charset.Charset;
> > > > > > >> > > > import junit.framework.TestCase;
> > > > > > >> > > >
> > > > > > >> > > > public class TestCharset extends TestCase {
> > > > > > >> > > >     public void test1() throws UnsupportedEncodingException {
> > > > > > >> > > >         byte[] b = new byte[] { 'a', 'b', 'c' };
> > > > > > >> > > >         String s = new String(b, "UnicodeBig");
> > > > > > >> > > >         assertEquals("abc", s);
> > > > > > >> > > >     }
> > > > > > >> > > >
> > > > > > >> > > >     public void test2() {
> > > > > > >> > > >         Charset.forName("UnicodeBig");
> > > > > > >> > > >     }
> > > > > > >> > > > }
> > > > > > >> > > >
> > > > > > >> > > > RI:
> > > > > > >> > > > test1: junit.framework.ComparisonFailure: expected:<abc> but was:<>
> > > > > > >> > > > test2: java.nio.charset.UnsupportedCharsetException: UnicodeBig
> > > > > > >> > > >
> > > > > > >> > > > Harmony:
> > > > > > >> > > > test1: java.nio.charset.UnsupportedCharsetException: UnicodeBig
> > > > > > >> > > > test2:
> > > > > > >> > > > java.nio.charset.UnsupportedCharsetException: The unsupported charset
> > > > > > >> > > > name is "UnicodeBig"
> > > > > > >> > > >
> > > > > > >> > > > seems RI can recognize the *UnicodeBig* in Constructor of j.l.String,
> > > > > > >> > > > whereas Harmony does not support this alias at all.
> > > > > > >> > > >
> > > > > > >> > > > Do you have any concern about that?
> > > > > > >> > > > --
> > > > > > >> > > > Tony Wu
> > > > > > >> > > > China Software Development Lab, IBM
> > > > > > >> > > >
> > > > > > >> > > > ---------------------------------------------------------------------
> > > > > > >> > > > Terms of use : http://incubator.apache.org/harmony/mailing.html
> > > > > > >> > > > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > > > > > >> > > > For additional commands, e-mail: [EMAIL PROTECTED]
> > > > > > >> > >
> > > > > > >> > > --
> > > > > > >> > > Leo Li
> > > > > > >> > > China Software Development Lab, IBM
> > > > > > >> >
> > > > > > >> > --
> > > > > > >> > Best regards,
> > > > > > >> > Andrew Zhang
> > > > > >
> > > > > > --
> > > > > > Paulex Yang
> > > > > > China Software Development Lab
> > > > > > IBM

--
Best regards,
Andrew Zhang
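
For illustration, here is a minimal sketch of what an SPI-based approach like the one described at the top of this message might look like. It is not the code that was actually tried. The class name LegacyCharsetProvider is hypothetical, the mapping of "UnicodeBig" onto the platform's UTF-16 charset (big-endian, byte-order mark written on encode) is an assumed approximation, and the sketch uses Java 7+ APIs (StandardCharsets, diamond syntax) for brevity.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical provider name; not taken from the Harmony code base.
// Package-private to keep the snippet in one file; a provider that is
// registered as a service would typically need to be a public class.
final class LegacyCharsetProvider extends CharsetProvider {

    // Legacy java.io/java.lang name (lower-cased) -> charset that does the work.
    private static final Map<String, Charset> LEGACY = new LinkedHashMap<>();
    static {
        // Assumption: UTF-16 (big-endian, BOM written when encoding)
        // is close enough to UnicodeBig for this sketch.
        LEGACY.put("unicodebig", StandardCharsets.UTF_16);
        LEGACY.put("unicodebigunmarked", StandardCharsets.UTF_16BE);
        LEGACY.put("unicodelittleunmarked", StandardCharsets.UTF_16LE);
    }

    @Override
    public Charset charsetForName(String charsetName) {
        // Returning null signals that this provider does not know the name.
        // The returned Charset keeps its own canonical name (e.g. "UTF-16").
        return LEGACY.get(charsetName.toLowerCase(Locale.ROOT));
    }

    @Override
    public Iterator<Charset> charsets() {
        // The delegates are charsets the platform already lists,
        // so this provider contributes no additional Charset objects.
        return Collections.emptyIterator();
    }
}
```

To be picked up by Charset.forName, such a provider would be listed by its fully qualified class name in a file named META-INF/services/java.nio.charset.spi.CharsetProvider on the class path. Delegating to existing charsets keeps the sketch short; a faithful UnicodeBig would need its own Charset subclass if the decode side must behave differently from UTF-16.
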
--
Tony Wu
China Software Development Lab, IBM