Hi Vladimir, thanks for the explanation. I'm testing it on Windows XP.
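For anyone else testing, a quick way to see which provider serves each charset is to dump the installed charsets in the same "name class=..." form as the list quoted below. This is only a minimal sketch: the class name ListCharsets and the helper describe() are made up for illustration, it uses only the standard java.nio.charset API, and it assumes that getClass() reports the implementing class we care about.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, not part of the contributed bundle.
class ListCharsets {

    // Formats one charset as "<canonical name> class=<implementing class>".
    static String describe(Charset cs) {
        return cs.name() + " class=" + cs.getClass().getName();
    }

    public static void main(String[] args) {
        // availableCharsets() returns a sorted map keyed by canonical name,
        // covering built-in charsets and those from installed providers.
        for (Charset cs : Charset.availableCharsets().values()) {
            System.out.println(describe(cs));
        }
    }
}
```

Running it once with only ICU and once with the new bundle installed, then comparing the two outputs, should show which charsets moved to the new implementation.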
I ran into a minor problem when trying to build the native code: the build asked for ....\trunk\modules\nio_char\src\main\native/niochar/windows instead of .../src/native/niochar/shared, which is the path given in the README.

On 4/9/07, Vladimir Strigun <[EMAIL PROTECTED]> wrote:
On 4/9/07, Tony Wu <[EMAIL PROTECTED]> wrote:
> I wonder if it is possible to make it as built-in charset provider and
> make icu as an extension?

Attached test bundle, instruction and patch for current code combined
new implementation with ICU. So, I have the same 228 charsets
available - about 90 charsets used from the new bundle, and not
implemented charsets used from ICU.

The full list of charsets supported with the current bundle + ICU:

Adobe-Standard-Encoding class=com.ibm.icu4jni.charset.CharsetICU
Big5 class=org.apache.harmony.niochar.charset.additional.Big5
Big5-HKSCS class=org.apache.harmony.niochar.charset.additional.Big5_HKSCS
BOCU-1 class=com.ibm.icu4jni.charset.CharsetICU
CESU-8 class=com.ibm.icu4jni.charset.CharsetICU
cp850 class=org.apache.harmony.niochar.charset.additional.IBM850
cp851 class=com.ibm.icu4jni.charset.CharsetICU
cp856 class=org.apache.harmony.niochar.charset.additional.x_IBM856
cp857 class=org.apache.harmony.niochar.charset.additional.IBM857
cp858 class=org.apache.harmony.niochar.charset.additional.IBM00858
cp860 class=org.apache.harmony.niochar.charset.additional.IBM860
cp861 class=org.apache.harmony.niochar.charset.additional.IBM861
cp862 class=org.apache.harmony.niochar.charset.additional.IBM862
cp863 class=org.apache.harmony.niochar.charset.additional.IBM863
cp864 class=org.apache.harmony.niochar.charset.additional.IBM864
cp865 class=org.apache.harmony.niochar.charset.additional.IBM865
cp866 class=org.apache.harmony.niochar.charset.IBM866
cp868 class=org.apache.harmony.niochar.charset.additional.IBM868
cp869 class=org.apache.harmony.niochar.charset.additional.IBM869
cp922 class=org.apache.harmony.niochar.charset.additional.x_IBM922
EUC-JP class=com.ibm.icu4jni.charset.CharsetICU
EUC-KR class=org.apache.harmony.niochar.charset.additional.EUC_KR
GB18030 class=org.apache.harmony.niochar.charset.additional.GB18030
GB2312 class=org.apache.harmony.niochar.charset.additional.GB2312
GB_2312-80 class=com.ibm.icu4jni.charset.CharsetICU
GBK class=org.apache.harmony.niochar.charset.additional.GBK
hp-roman8 class=com.ibm.icu4jni.charset.CharsetICU
HZ-GB-2312 class=com.ibm.icu4jni.charset.CharsetICU
IBM-Thai class=org.apache.harmony.niochar.charset.additional.IBM_Thai
IBM01140 class=org.apache.harmony.niochar.charset.additional.IBM01140
IBM01141 class=org.apache.harmony.niochar.charset.additional.IBM01141
IBM01142 class=org.apache.harmony.niochar.charset.additional.IBM01142
IBM01143 class=org.apache.harmony.niochar.charset.additional.IBM01143
IBM01144 class=org.apache.harmony.niochar.charset.additional.IBM01144
IBM01145 class=org.apache.harmony.niochar.charset.additional.IBM01145
IBM01146 class=org.apache.harmony.niochar.charset.additional.IBM01146
IBM01147 class=org.apache.harmony.niochar.charset.additional.IBM01147
IBM01148 class=org.apache.harmony.niochar.charset.additional.IBM01148
IBM01149 class=org.apache.harmony.niochar.charset.additional.IBM01149
IBM037 class=org.apache.harmony.niochar.charset.additional.IBM037
IBM1026 class=org.apache.harmony.niochar.charset.additional.IBM1026
IBM1047 class=org.apache.harmony.niochar.charset.additional.IBM1047
IBM273 class=org.apache.harmony.niochar.charset.additional.IBM273
IBM277 class=org.apache.harmony.niochar.charset.additional.IBM277
IBM278 class=org.apache.harmony.niochar.charset.additional.IBM278
IBM280 class=org.apache.harmony.niochar.charset.additional.IBM280
IBM284 class=org.apache.harmony.niochar.charset.additional.IBM284
IBM285 class=org.apache.harmony.niochar.charset.additional.IBM285
IBM290 class=com.ibm.icu4jni.charset.CharsetICU
IBM297 class=org.apache.harmony.niochar.charset.additional.IBM297
IBM367 class=org.apache.harmony.niochar.charset.US_ASCII
IBM420 class=org.apache.harmony.niochar.charset.additional.IBM420
IBM424 class=org.apache.harmony.niochar.charset.additional.IBM424
IBM437 class=org.apache.harmony.niochar.charset.additional.IBM437
IBM500 class=org.apache.harmony.niochar.charset.additional.IBM500
IBM775 class=org.apache.harmony.niochar.charset.additional.IBM775
IBM852 class=org.apache.harmony.niochar.charset.additional.IBM852
IBM855 class=org.apache.harmony.niochar.charset.additional.IBM855
IBM870 class=org.apache.harmony.niochar.charset.additional.IBM870
IBM871 class=org.apache.harmony.niochar.charset.additional.IBM871
IBM918 class=org.apache.harmony.niochar.charset.additional.IBM918
ISO-2022-CN class=com.ibm.icu4jni.charset.CharsetICU
ISO-2022-CN-EXT class=com.ibm.icu4jni.charset.CharsetICU
ISO-2022-JP class=com.ibm.icu4jni.charset.CharsetICU
ISO-2022-JP-2 class=com.ibm.icu4jni.charset.CharsetICU
ISO-2022-KR class=com.ibm.icu4jni.charset.CharsetICU
ISO-8859-1 class=org.apache.harmony.niochar.charset.ISO_8859_1
ISO-8859-13 class=org.apache.harmony.niochar.charset.ISO_8859_13
ISO-8859-15 class=org.apache.harmony.niochar.charset.ISO_8859_15
ISO-8859-2 class=org.apache.harmony.niochar.charset.ISO_8859_2
ISO-8859-3 class=org.apache.harmony.niochar.charset.additional.ISO_8859_3
ISO-8859-4 class=org.apache.harmony.niochar.charset.ISO_8859_4
ISO-8859-5 class=org.apache.harmony.niochar.charset.ISO_8859_5
ISO-8859-6 class=org.apache.harmony.niochar.charset.additional.ISO_8859_6
ISO-8859-7 class=org.apache.harmony.niochar.charset.ISO_8859_7
ISO-8859-8 class=org.apache.harmony.niochar.charset.additional.ISO_8859_8
ISO-8859-9 class=org.apache.harmony.niochar.charset.ISO_8859_9
JIS_Encoding class=com.ibm.icu4jni.charset.CharsetICU
JIS_X0201 class=com.ibm.icu4jni.charset.CharsetICU
KOI8-R class=org.apache.harmony.niochar.charset.KOI8_R
KOI8-U class=com.ibm.icu4jni.charset.CharsetICU
KSC_5601 class=org.apache.harmony.niochar.charset.additional.x_windows_949
macintosh class=com.ibm.icu4jni.charset.CharsetICU
SCSU class=com.ibm.icu4jni.charset.CharsetICU
Shift_JIS class=org.apache.harmony.niochar.charset.additional.windows_31j
TIS-620 class=org.apache.harmony.niochar.charset.additional.x_IBM874
US-ASCII class=org.apache.harmony.niochar.charset.US_ASCII
UTF-16 class=org.apache.harmony.niochar.charset.UTF_16
UTF-16BE class=org.apache.harmony.niochar.charset.UTF_16BE
UTF-16LE class=org.apache.harmony.niochar.charset.UTF_16LE
UTF-32 class=com.ibm.icu4jni.charset.CharsetICU
UTF-32BE class=com.ibm.icu4jni.charset.CharsetICU
UTF-32LE class=com.ibm.icu4jni.charset.CharsetICU
UTF-7 class=com.ibm.icu4jni.charset.CharsetICU
UTF-8 class=org.apache.harmony.niochar.charset.UTF_8
windows-1250 class=org.apache.harmony.niochar.charset.CP_1250
windows-1251 class=org.apache.harmony.niochar.charset.CP_1251
windows-1252 class=org.apache.harmony.niochar.charset.CP_1252
windows-1253 class=org.apache.harmony.niochar.charset.CP_1253
windows-1254 class=org.apache.harmony.niochar.charset.CP_1254
windows-1255 class=org.apache.harmony.niochar.charset.additional.windows_1255
windows-1256 class=org.apache.harmony.niochar.charset.additional.windows_1256
windows-1257 class=org.apache.harmony.niochar.charset.CP_1257
windows-1258 class=com.ibm.icu4jni.charset.CharsetICU
x-ebcdic-xml-us class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1006_P100-1995 class=org.apache.harmony.niochar.charset.additional.x_IBM1006
x-ibm-1025_P100-1995 class=org.apache.harmony.niochar.charset.additional.x_IBM1025
x-ibm-1047-s390 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1097_P100-1995 class=org.apache.harmony.niochar.charset.additional.x_IBM1097
x-ibm-1098_P100-1995 class=org.apache.harmony.niochar.charset.additional.x_IBM1098
x-ibm-1112_P100-1995 class=org.apache.harmony.niochar.charset.additional.x_IBM1112
x-ibm-1122_P100-1999 class=org.apache.harmony.niochar.charset.additional.x_IBM1122
x-ibm-1123_P100-1995 class=org.apache.harmony.niochar.charset.additional.x_IBM1123
x-ibm-1124_P100-1996 class=org.apache.harmony.niochar.charset.additional.x_IBM1124
x-ibm-1125_P100-1997 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1129_P100-1997 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1130_P100-1997 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1131_P100-1997 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1132_P100-1998 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1133_P100-1997 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1137_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1140-s390 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1142-s390 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1143-s390 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1144-s390 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1145-s390 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1146-s390 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1147-s390 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1148-s390 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1149-s390 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1153-s390 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1153_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1154_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1155_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1156_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1157_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1158_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1160_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1162_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1164_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1250_P100-1995 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1251_P100-1995 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1252_P100-2000 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1253_P100-1995 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1254_P100-1995 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1255_P100-1995 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1256_P110-1997 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1257_P100-1995 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1258_P100-1997 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-12712-s390 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-12712_P100-1998 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1363_P110-1997 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1364_P110-1997 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1371_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1373_P100-2002 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1375_P100-2003 class=org.apache.harmony.niochar.charset.additional.x_MS950_HKSCS
x-ibm-1386_P100-2002 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1388_P103-2001 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1390_P110-2003 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-1399_P110-2003 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-16684_P110-2003 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-16804-s390 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-16804_X110-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-25546 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-33722_P120-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-37-s390 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-4899_P100-1998 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-4909_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-4971_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-5123_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-5351_P100-1998 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-5352_P100-1998 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-5353_P100-1998 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-737_P100-1997 class=org.apache.harmony.niochar.charset.additional.x_IBM737
x-ibm-803_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-813_P100-1995 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-8482_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-867_P100-1998 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-875_P100-1995 class=org.apache.harmony.niochar.charset.additional.x_IBM875
x-ibm-901_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-902_P100-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-930_P120-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-933_P110-1995 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-935_P110-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-937_P110-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-939_P120-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-942_P12A-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-943_P130-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-949_P110-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-949_P11A-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-950_P110-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-954_P101-2000 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-964_P110-1999 class=com.ibm.icu4jni.charset.CharsetICU
x-ibm-971_P100-1995 class=com.ibm.icu4jni.charset.CharsetICU
x-IMAP-mailbox-name class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-be class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-de class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-gu class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-ka class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-ma class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-or class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-pa class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-ta class=com.ibm.icu4jni.charset.CharsetICU
x-iscii-te class=com.ibm.icu4jni.charset.CharsetICU
x-JIS7 class=com.ibm.icu4jni.charset.CharsetICU
x-JIS8 class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-1 class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-11 class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-16 class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-17 class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-18 class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-19 class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-2 class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-3 class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-4 class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-5 class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-6 class=com.ibm.icu4jni.charset.CharsetICU
x-LMBCS-8 class=com.ibm.icu4jni.charset.CharsetICU
x-mac-centraleurroman class=com.ibm.icu4jni.charset.CharsetICU
x-mac-cyrillic class=org.apache.harmony.niochar.charset.additional.x_MacCyrillic
x-mac-greek class=org.apache.harmony.niochar.charset.additional.x_MacGreek
x-mac-turkish class=org.apache.harmony.niochar.charset.additional.x_MacTurkish
x-UTF16_OppositeEndian class=com.ibm.icu4jni.charset.CharsetICU
x-UTF16_PlatformEndian class=com.ibm.icu4jni.charset.CharsetICU
x-UTF32_OppositeEndian class=com.ibm.icu4jni.charset.CharsetICU
x-UTF32_PlatformEndian class=com.ibm.icu4jni.charset.CharsetICU
x-windows-874-2000 class=com.ibm.icu4jni.charset.CharsetICU
x-windows-949-2000 class=com.ibm.icu4jni.charset.CharsetICU

Thanks.
Vladimir.

> On 4/9/07, Tony Wu <[EMAIL PROTECTED]> wrote:
> > amazing work.
> > generating the charsets...
> >
> > On 4/9/07, Vladimir Strigun <[EMAIL PROTECTED]> wrote:
> > > On 4/9/07, Andrew Zhang <[EMAIL PROTECTED]> wrote:
> > > > On 4/9/07, Vladimir Strigun <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > On 4/9/07, Andrew Zhang <[EMAIL PROTECTED]> wrote:
> > > > > > Super cool!!!
> > > > > > Does it mean we're not dependent on ICU any more?
> > > > >
> > > > > Unfortunately not all charsets supported with attached bundle. The
> > > > > list of supported charsets you could find in README file.
> > > >
> > > >
> > > > Hi Vladimir, not unfortunately at all. :)
> > > >
> > > > We're on the way to be independent of ICU, right? ;)
> > >
> > > Yes, you right, we're on the way :)
> > >
> > > > >
> > > > > > On 4/9/07, Vladimir Strigun <[EMAIL PROTECTED]> wrote:
> > > > > > >
> > > > > > > Hi all!
> > > > > > >
> > > > > > > I'm happy to announce one more contribution to harmony on behalf of
> > > > > > > Intel. Provided implementation of charset encoders/decoders is
> > > > > > > intended to replace the ICU-based charsets encoding/decoding
> > > > > > > operations. The code was developed in clean-room environment inside
> > > > > > > Intel and I'd like you to play with it and include to current Harmony
> > > > > > > tree.
> > > > > > >
> > > > > > > The package could be found there:
> > > > > > > HARMONY-3593
> > > > > > >
> > > > > > > The algorithms for charsets encoding/decoding differs from that of
> > > > > > > ICU, all charsets are generated from current Harmony or any other
> > > > > > > implementation of Java and could be properly integrated into current
> > > > > > > nio_char module. The archive contains source files for 6 charsets:
> > > > > > > GB18030, US-ASCII, ISO-8859-1, UTF-8, UTF-16, UTF-16BE, UTF-16LE;
> > > > > > > implementation of CharsetProvider; generator for other Charsets and
> > > > > > > native part. I've tested the package with more that 90 charsets, and
> > > > > > > all benchmarks and tests passed with new bundle. Additionally I have
> > > > > > > significant boost for Dacapo.antlr and Dacapo.xalan benchmarks with
> > > > > > > current Harmony tree on DRLVM and IBM VM. On DRLVM I have 2.5x boost
> > > > > > > for antlr and ~5-8x for xalan.
> > > > > > >
> > > > > > > The main advantages of the package are the following:
> > > > > > > - Code for every charset is generated by CharsetGenerator, thus, if
> > > > > > > some modification would be necessary we need just correct generator
> > > > > > > and re-generate all sources.
> > > > > > > - We use 2 different encoders and decoders for java and direct
> > > > > > > buffers. Since most applications use java heap buffers, unlike
> > > > > > > existing implementation it doesn't produce lots of native calls to
> > > > > > > perform encoding/decoding operations on the java buffers those
> > > > > > > significantly improving performance. This is the main reason why we
> > > > > > > have such a significant boost for Dacapo.
> > > > > > > - Charset tables for encoding/decoding are stored in appropriate
> > > > > > > classes.
> > > > > > >
> > > > > > > Since the package contains implementation for 6 charsets only,
> > > > > > > documentations how to generate and build additional charsets you could
> > > > > > > find in README file from contributed package.
> > > > > > >
> > > > > > > Please do not hesitate to contact me for more details.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Vladimir.
> > > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > > Andrew Zhang
> > > > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrew Zhang
> > > >
> >
> > --
> > Tony Wu
> > China Software Development Lab, IBM
> >
>
> --
> Tony Wu
> China Software Development Lab, IBM
>
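The heap-versus-direct buffer point in the quoted announcement can be illustrated with a small sketch. This is not the contributed Harmony code: the class Latin1DecodeSketch and its methods are made up for illustration, ISO-8859-1 is chosen only because each byte maps to one char, and the fallback path here stays in plain Java, whereas the contribution uses a native part. The idea shown is just the dispatch: when both buffers are backed by Java arrays, work on the arrays directly in a tight loop; otherwise go through the generic buffer accessors.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;

// Hypothetical illustration of choosing a decode path by buffer kind.
class Latin1DecodeSketch {

    // Decodes ISO-8859-1 bytes from 'in' into 'out' and reports the path used.
    static String decode(ByteBuffer in, CharBuffer out) {
        if (in.hasArray() && out.hasArray()) {
            decodeHeap(in, out);
            return "heap";
        }
        decodeGeneric(in, out);
        return "generic";
    }

    // Fast path: both buffers expose their backing arrays.
    private static void decodeHeap(ByteBuffer in, CharBuffer out) {
        byte[] src = in.array();
        char[] dst = out.array();
        int n = Math.min(in.remaining(), out.remaining());
        int sp = in.arrayOffset() + in.position();
        int dp = out.arrayOffset() + out.position();
        for (int i = 0; i < n; i++) {
            dst[dp + i] = (char) (src[sp + i] & 0xFF); // byte value is the code point
        }
        in.position(in.position() + n);
        out.position(out.position() + n);
    }

    // Fallback for direct or read-only buffers: one accessor call per element.
    private static void decodeGeneric(ByteBuffer in, CharBuffer out) {
        while (in.hasRemaining() && out.hasRemaining()) {
            out.put((char) (in.get() & 0xFF));
        }
    }

    public static void main(String[] args) {
        byte[] data = {'H', 'i', (byte) 0xE9};

        CharBuffer out = CharBuffer.allocate(8);
        System.out.println(decode(ByteBuffer.wrap(data), out));

        ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
        direct.put(data);
        direct.flip();
        out.clear();
        System.out.println(decode(direct, out));
    }
}
```

Both paths produce the same characters; only the way the bytes are reached differs, which is where the saving on heap buffers would come from.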
--
Tony Wu
China Software Development Lab, IBM
