On 4/9/07, Tony Wu <[EMAIL PROTECTED]> wrote:
> I wonder if it is possible to make it the built-in charset provider and
> make ICU an extension?

Attached are a test bundle, instructions, and a patch that combines the
new implementation with ICU in the current code. With it I have the same
228 charsets available: about 90 charsets come from the new bundle, and
the charsets that are not implemented yet still come from ICU.

The full list of charsets supported with the current bundle + ICU:
Adobe-Standard-Encoding  class=com.ibm.icu4jni.charset.CharsetICU
Big5     class=org.apache.harmony.niochar.charset.additional.Big5
Big5-HKSCS       class=org.apache.harmony.niochar.charset.additional.Big5_HKSCS
BOCU-1   class=com.ibm.icu4jni.charset.CharsetICU
CESU-8   class=com.ibm.icu4jni.charset.CharsetICU
cp850    class=org.apache.harmony.niochar.charset.additional.IBM850
cp851    class=com.ibm.icu4jni.charset.CharsetICU
cp856    class=org.apache.harmony.niochar.charset.additional.x_IBM856
cp857    class=org.apache.harmony.niochar.charset.additional.IBM857
cp858    class=org.apache.harmony.niochar.charset.additional.IBM00858
cp860    class=org.apache.harmony.niochar.charset.additional.IBM860
cp861    class=org.apache.harmony.niochar.charset.additional.IBM861
cp862    class=org.apache.harmony.niochar.charset.additional.IBM862
cp863    class=org.apache.harmony.niochar.charset.additional.IBM863
cp864    class=org.apache.harmony.niochar.charset.additional.IBM864
cp865    class=org.apache.harmony.niochar.charset.additional.IBM865
cp866    class=org.apache.harmony.niochar.charset.IBM866
cp868    class=org.apache.harmony.niochar.charset.additional.IBM868
cp869    class=org.apache.harmony.niochar.charset.additional.IBM869
cp922    class=org.apache.harmony.niochar.charset.additional.x_IBM922
EUC-JP   class=com.ibm.icu4jni.charset.CharsetICU
EUC-KR   class=org.apache.harmony.niochar.charset.additional.EUC_KR
GB18030  class=org.apache.harmony.niochar.charset.additional.GB18030
GB2312   class=org.apache.harmony.niochar.charset.additional.GB2312
GB_2312-80       class=com.ibm.icu4jni.charset.CharsetICU
GBK      class=org.apache.harmony.niochar.charset.additional.GBK
hp-roman8        class=com.ibm.icu4jni.charset.CharsetICU
HZ-GB-2312       class=com.ibm.icu4jni.charset.CharsetICU
IBM-Thai         class=org.apache.harmony.niochar.charset.additional.IBM_Thai
IBM01140         class=org.apache.harmony.niochar.charset.additional.IBM01140
IBM01141         class=org.apache.harmony.niochar.charset.additional.IBM01141
IBM01142         class=org.apache.harmony.niochar.charset.additional.IBM01142
IBM01143         class=org.apache.harmony.niochar.charset.additional.IBM01143
IBM01144         class=org.apache.harmony.niochar.charset.additional.IBM01144
IBM01145         class=org.apache.harmony.niochar.charset.additional.IBM01145
IBM01146         class=org.apache.harmony.niochar.charset.additional.IBM01146
IBM01147         class=org.apache.harmony.niochar.charset.additional.IBM01147
IBM01148         class=org.apache.harmony.niochar.charset.additional.IBM01148
IBM01149         class=org.apache.harmony.niochar.charset.additional.IBM01149
IBM037   class=org.apache.harmony.niochar.charset.additional.IBM037
IBM1026  class=org.apache.harmony.niochar.charset.additional.IBM1026
IBM1047  class=org.apache.harmony.niochar.charset.additional.IBM1047
IBM273   class=org.apache.harmony.niochar.charset.additional.IBM273
IBM277   class=org.apache.harmony.niochar.charset.additional.IBM277
IBM278   class=org.apache.harmony.niochar.charset.additional.IBM278
IBM280   class=org.apache.harmony.niochar.charset.additional.IBM280
IBM284   class=org.apache.harmony.niochar.charset.additional.IBM284
IBM285   class=org.apache.harmony.niochar.charset.additional.IBM285
IBM290   class=com.ibm.icu4jni.charset.CharsetICU
IBM297   class=org.apache.harmony.niochar.charset.additional.IBM297
IBM367   class=org.apache.harmony.niochar.charset.US_ASCII
IBM420   class=org.apache.harmony.niochar.charset.additional.IBM420
IBM424   class=org.apache.harmony.niochar.charset.additional.IBM424
IBM437   class=org.apache.harmony.niochar.charset.additional.IBM437
IBM500   class=org.apache.harmony.niochar.charset.additional.IBM500
IBM775   class=org.apache.harmony.niochar.charset.additional.IBM775
IBM852   class=org.apache.harmony.niochar.charset.additional.IBM852
IBM855   class=org.apache.harmony.niochar.charset.additional.IBM855
IBM870   class=org.apache.harmony.niochar.charset.additional.IBM870
IBM871   class=org.apache.harmony.niochar.charset.additional.IBM871
IBM918   class=org.apache.harmony.niochar.charset.additional.IBM918
ISO-2022-CN      class=com.ibm.icu4jni.charset.CharsetICU
ISO-2022-CN-EXT  class=com.ibm.icu4jni.charset.CharsetICU
ISO-2022-JP      class=com.ibm.icu4jni.charset.CharsetICU
ISO-2022-JP-2    class=com.ibm.icu4jni.charset.CharsetICU
ISO-2022-KR      class=com.ibm.icu4jni.charset.CharsetICU
ISO-8859-1       class=org.apache.harmony.niochar.charset.ISO_8859_1
ISO-8859-13      class=org.apache.harmony.niochar.charset.ISO_8859_13
ISO-8859-15      class=org.apache.harmony.niochar.charset.ISO_8859_15
ISO-8859-2       class=org.apache.harmony.niochar.charset.ISO_8859_2
ISO-8859-3       class=org.apache.harmony.niochar.charset.additional.ISO_8859_3
ISO-8859-4       class=org.apache.harmony.niochar.charset.ISO_8859_4
ISO-8859-5       class=org.apache.harmony.niochar.charset.ISO_8859_5
ISO-8859-6       class=org.apache.harmony.niochar.charset.additional.ISO_8859_6
ISO-8859-7       class=org.apache.harmony.niochar.charset.ISO_8859_7
ISO-8859-8       class=org.apache.harmony.niochar.charset.additional.ISO_8859_8
ISO-8859-9       class=org.apache.harmony.niochar.charset.ISO_8859_9
JIS_Encoding     class=com.ibm.icu4jni.charset.CharsetICU
JIS_X0201        class=com.ibm.icu4jni.charset.CharsetICU
KOI8-R   class=org.apache.harmony.niochar.charset.KOI8_R
KOI8-U   class=com.ibm.icu4jni.charset.CharsetICU
KSC_5601         class=org.apache.harmony.niochar.charset.additional.x_windows_949
macintosh        class=com.ibm.icu4jni.charset.CharsetICU
SCSU     class=com.ibm.icu4jni.charset.CharsetICU
Shift_JIS        class=org.apache.harmony.niochar.charset.additional.windows_31j
TIS-620  class=org.apache.harmony.niochar.charset.additional.x_IBM874
US-ASCII         class=org.apache.harmony.niochar.charset.US_ASCII
UTF-16   class=org.apache.harmony.niochar.charset.UTF_16
UTF-16BE         class=org.apache.harmony.niochar.charset.UTF_16BE
UTF-16LE         class=org.apache.harmony.niochar.charset.UTF_16LE
UTF-32   class=com.ibm.icu4jni.charset.CharsetICU
UTF-32BE         class=com.ibm.icu4jni.charset.CharsetICU
UTF-32LE         class=com.ibm.icu4jni.charset.CharsetICU
UTF-7    class=com.ibm.icu4jni.charset.CharsetICU
UTF-8    class=org.apache.harmony.niochar.charset.UTF_8
windows-1250     class=org.apache.harmony.niochar.charset.CP_1250
windows-1251     class=org.apache.harmony.niochar.charset.CP_1251
windows-1252     class=org.apache.harmony.niochar.charset.CP_1252
windows-1253     class=org.apache.harmony.niochar.charset.CP_1253
windows-1254     class=org.apache.harmony.niochar.charset.CP_1254
windows-1255     class=org.apache.harmony.niochar.charset.additional.windows_1255
windows-1256     class=org.apache.harmony.niochar.charset.additional.windows_1256
windows-1257     class=org.apache.harmony.niochar.charset.CP_1257
windows-1258     class=com.ibm.icu4jni.charset.CharsetICU
x-ebcdic-xml-us  class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1006_P100-1995     class=org.apache.harmony.niochar.charset.additional.x_IBM1006
x-ibm-1025_P100-1995     class=org.apache.harmony.niochar.charset.additional.x_IBM1025
x-ibm-1047-s390  class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1097_P100-1995     class=org.apache.harmony.niochar.charset.additional.x_IBM1097
x-ibm-1098_P100-1995     class=org.apache.harmony.niochar.charset.additional.x_IBM1098
x-ibm-1112_P100-1995     class=org.apache.harmony.niochar.charset.additional.x_IBM1112
x-ibm-1122_P100-1999     class=org.apache.harmony.niochar.charset.additional.x_IBM1122
x-ibm-1123_P100-1995     class=org.apache.harmony.niochar.charset.additional.x_IBM1123
x-ibm-1124_P100-1996     class=org.apache.harmony.niochar.charset.additional.x_IBM1124
x-ibm-1125_P100-1997     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1129_P100-1997     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1130_P100-1997     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1131_P100-1997     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1132_P100-1998     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1133_P100-1997     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1137_P100-1999     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1140-s390  class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1142-s390  class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1143-s390  class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1144-s390  class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1145-s390  class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1146-s390  class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1147-s390  class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1148-s390  class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1149-s390  class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1153-s390  class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1153_P100-1999     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1154_P100-1999     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1155_P100-1999     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1156_P100-1999     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1157_P100-1999     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1158_P100-1999     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1160_P100-1999     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1162_P100-1999     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1164_P100-1999     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1250_P100-1995     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1251_P100-1995     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1252_P100-2000     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1253_P100-1995     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1254_P100-1995     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1255_P100-1995     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1256_P110-1997     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1257_P100-1995     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1258_P100-1997     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-12712-s390         class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-12712_P100-1998    class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1363_P110-1997     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1364_P110-1997     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1371_P100-1999     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1373_P100-2002     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1375_P100-2003     class=org.apache.harmony.niochar.charset.additional.x_MS950_HKSCS
x-ibm-1386_P100-2002     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1388_P103-2001     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1390_P110-2003     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1399_P110-2003     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-16684_P110-2003    class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-16804-s390         class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-16804_X110-1999    class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-25546      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-33722_P120-1999    class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-37-s390    class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-4899_P100-1998     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-4909_P100-1999     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-4971_P100-1999     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-5123_P100-1999     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-5351_P100-1998     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-5352_P100-1998     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-5353_P100-1998     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-737_P100-1997      class=org.apache.harmony.niochar.charset.additional.x_IBM737
x-ibm-803_P100-1999      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-813_P100-1995      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-8482_P100-1999     class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-867_P100-1998      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-875_P100-1995      class=org.apache.harmony.niochar.charset.additional.x_IBM875
x-ibm-901_P100-1999      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-902_P100-1999      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-930_P120-1999      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-933_P110-1995      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-935_P110-1999      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-937_P110-1999      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-939_P120-1999      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-942_P12A-1999      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-943_P130-1999      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-949_P110-1999      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-949_P11A-1999      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-950_P110-1999      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-954_P101-2000      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-964_P110-1999      class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-971_P100-1995      class=com.ibm.icu4jni.charset.CharsetICU
x-IMAP-mailbox-name      class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-be       class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-de       class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-gu       class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-ka       class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-ma       class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-or       class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-pa       class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-ta       class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-te       class=com.ibm.icu4jni.charset.CharsetICU
x-JIS7   class=com.ibm.icu4jni.charset.CharsetICU
x-JIS8   class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-1        class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-11       class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-16       class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-17       class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-18       class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-19       class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-2        class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-3        class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-4        class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-5        class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-6        class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-8        class=com.ibm.icu4jni.charset.CharsetICU
x-mac-centraleurroman    class=com.ibm.icu4jni.charset.CharsetICU
x-mac-cyrillic   class=org.apache.harmony.niochar.charset.additional.x_MacCyrillic
x-mac-greek      class=org.apache.harmony.niochar.charset.additional.x_MacGreek
x-mac-turkish    class=org.apache.harmony.niochar.charset.additional.x_MacTurkish
x-UTF16_OppositeEndian   class=com.ibm.icu4jni.charset.CharsetICU
x-UTF16_PlatformEndian   class=com.ibm.icu4jni.charset.CharsetICU
x-UTF32_OppositeEndian   class=com.ibm.icu4jni.charset.CharsetICU
x-UTF32_PlatformEndian   class=com.ibm.icu4jni.charset.CharsetICU
x-windows-874-2000       class=com.ibm.icu4jni.charset.CharsetICU
x-windows-949-2000       class=com.ibm.icu4jni.charset.CharsetICU
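
For reference, a listing in this name/class format could be produced with
a short Java program along the following lines. This is only a sketch and
not necessarily the tool used to generate the list above; the class name
ListCharsets and the method describeCharsets are made up for illustration.
It relies only on the standard Charset.availableCharsets() call, so the
output depends on which charset providers the running VM has installed.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: prints every charset known to the running VM
// together with the class that implements it.
class ListCharsets {

    // Builds one "name<TAB>class=<implementing class>" line per charset.
    // availableCharsets() returns a map sorted by canonical charset name.
    static List<String> describeCharsets() {
        List<String> lines = new ArrayList<String>();
        for (Map.Entry<String, Charset> e : Charset.availableCharsets().entrySet()) {
            String implClass = e.getValue().getClass().getName();
            lines.add(e.getKey() + "\tclass=" + implClass);
        }
        return lines;
    }

    public static void main(String[] args) {
        List<String> lines = describeCharsets();
        for (String line : lines) {
            System.out.println(line);
        }
        System.out.println("Total: " + lines.size() + " charsets");
    }
}
```

Running it before and after applying the patch would show which charsets
have moved from the ICU-backed class to the new niochar classes.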


Thanks.
Vladimir.


On 4/9/07, Tony Wu <[EMAIL PROTECTED]> wrote:
> amazing work.
> generating the charsets...
>
> On 4/9/07, Vladimir Strigun <[EMAIL PROTECTED]> wrote:
> > On 4/9/07, Andrew Zhang <[EMAIL PROTECTED]> wrote:
> > > On 4/9/07, Vladimir Strigun <[EMAIL PROTECTED]> wrote:
> > > >
> > > > On 4/9/07, Andrew Zhang <[EMAIL PROTECTED]> wrote:
> > > > > Super cool!!!
> > > > > Does it mean we're not dependent on ICU any more?
> > > >
> > > > Unfortunately, not all charsets are supported by the attached bundle.
> > > > You can find the list of supported charsets in the README file.
> > >
> > >
> > > Hi Vladimir, not unfortunately at all. :)
> > >
> > > We're on the way to be independent of ICU, right? ;)
> >
> > Yes, you're right, we're on the way :)
> >
> >
> > > > On 4/9/07, Vladimir Strigun <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > Hi all!
> > > > > >
> > > > > > > I'm happy to announce one more contribution to Harmony on behalf of
> > > > > > > Intel. The provided implementation of charset encoders/decoders is
> > > > > > > intended to replace the ICU-based charset encoding/decoding
> > > > > > > operations. The code was developed in a clean-room environment inside
> > > > > > > Intel, and I'd like you to play with it and include it in the current
> > > > > > > Harmony tree.
> > > > > >
> > > > > > > The package can be found here:
> > > > > > HARMONY-3593
> > > > > >
> > > > > > > The algorithms for charset encoding/decoding differ from those of
> > > > > > > ICU; all charsets are generated from current Harmony or any other
> > > > > > > implementation of Java and can be properly integrated into the current
> > > > > > > nio_char module. The archive contains source files for 6 charsets:
> > > > > > > GB18030, US-ASCII, ISO-8859-1, UTF-8, UTF-16, UTF-16BE, UTF-16LE;
> > > > > > > an implementation of CharsetProvider; a generator for other charsets;
> > > > > > > and the native part. I've tested the package with more than 90
> > > > > > > charsets, and all benchmarks and tests passed with the new bundle.
> > > > > > > Additionally, I see a significant boost for the Dacapo.antlr and
> > > > > > > Dacapo.xalan benchmarks with the current Harmony tree on DRLVM and
> > > > > > > IBM VM. On DRLVM I get a 2.5x boost for antlr and ~5-8x for xalan.
> > > > > >
> > > > > > > The main advantages of the package are the following:
> > > > > > >   - Code for every charset is generated by CharsetGenerator; thus, if
> > > > > > > some modification is necessary, we just need to correct the generator
> > > > > > > and re-generate all sources.
> > > > > > >   - We use 2 different encoders and decoders for Java and direct
> > > > > > > buffers. Since most applications use Java heap buffers, unlike the
> > > > > > > existing implementation it doesn't produce lots of native calls to
> > > > > > > perform encoding/decoding operations on Java buffers, thus
> > > > > > > significantly improving performance. This is the main reason why we
> > > > > > > have such a significant boost for Dacapo.
> > > > > > >   - Charset tables for encoding/decoding are stored in the appropriate
> > > > > > > classes.
> > > > > >
> > > > > > > Since the package contains implementations for only 6 charsets, you
> > > > > > > can find documentation on how to generate and build additional
> > > > > > > charsets in the README file of the contributed package.
> > > > > >
> > > > > > Please do not hesitate to contact me for more details.
> > > > > >
> > > > > > Thanks,
> > > > > > Vladimir.
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Best regards,
> > > > > Andrew Zhang
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Best regards,
> > > Andrew Zhang
> > >
> >
>
>
> --
> Tony Wu
> China Software Development Lab, IBM
>


--
Tony Wu
China Software Development Lab, IBM
