Hi all!

I'm happy to announce one more contribution to Harmony on behalf of Intel. The provided implementation of charset encoders/decoders is intended to replace the ICU-based charset encoding/decoding operations. The code was developed in a clean-room environment inside Intel, and I'd like you to play with it and include it in the current Harmony tree.
The package can be found in HARMONY-3593.

The encoding/decoding algorithms differ from those of ICU. All charsets are generated from the current Harmony (or any other Java implementation) and can be properly integrated into the current nio_char module. The archive contains source files for the following charsets: GB18030, US-ASCII, ISO-8859-1, UTF-8, UTF-16, UTF-16BE and UTF-16LE; an implementation of CharsetProvider; a generator for other charsets; and the native part.

I've tested the package with more than 90 charsets, and all benchmarks and tests passed with the new bundle. Additionally, I see a significant boost on the Dacapo.antlr and Dacapo.xalan benchmarks with the current Harmony tree on DRLVM and IBM VM. On DRLVM I get a 2.5x boost for antlr and ~5-8x for xalan.

The main advantages of the package are the following:
- The code for every charset is generated by CharsetGenerator, so if some modification is necessary, we only need to fix the generator and regenerate all sources.
- We use two different encoders and decoders: one for Java heap buffers and one for direct buffers. Since most applications use Java heap buffers, unlike the existing implementation this one doesn't make lots of native calls to encode/decode Java heap buffers, thus significantly improving performance. This is the main reason for such a significant Dacapo boost.
- Charset tables for encoding/decoding are stored in the appropriate classes.

Since the package contains implementations only for these charsets, documentation on how to generate and build additional charsets can be found in the README file of the contributed package.

Please do not hesitate to contact me for more details.

Thanks,
Vladimir.
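To illustrate the heap/direct split and the in-class tables mentioned above, here is a minimal sketch. It is not code from HARMONY-3593; the class name `TableDecoder` and the field `TABLE` are hypothetical, and the table is assumed to be a simple ISO-8859-1 identity mapping built in a static initializer (a generator would more likely emit it as a literal). The direct-buffer path here uses relative `get()`/`put()` calls rather than native code, so it only shows where the dispatch happens, not how the native part works.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical table-driven single-byte decoder (ISO-8859-1 style) with
 * separate code paths for array-backed (heap) and other (e.g. direct) buffers.
 */
public class TableDecoder extends CharsetDecoder {

    // Decoding table kept in the class itself (assumption: identity mapping,
    // since ISO-8859-1 maps byte value i to U+0000 + i).
    private static final char[] TABLE = new char[256];
    static {
        for (int i = 0; i < TABLE.length; i++) {
            TABLE[i] = (char) i;
        }
    }

    public TableDecoder() {
        super(StandardCharsets.ISO_8859_1, 1.0f, 1.0f);
    }

    @Override
    protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
        // Writable heap buffers expose their backing arrays; use the fast path.
        if (in.hasArray() && out.hasArray()) {
            return decodeArrays(in, out);
        }
        return decodeBuffers(in, out);
    }

    // Fast path: index the backing arrays directly, no per-byte buffer calls.
    private static CoderResult decodeArrays(ByteBuffer in, CharBuffer out) {
        byte[] src = in.array();
        char[] dst = out.array();
        int sp = in.arrayOffset() + in.position();
        int sl = in.arrayOffset() + in.limit();
        int dp = out.arrayOffset() + out.position();
        int dl = out.arrayOffset() + out.limit();
        try {
            while (sp < sl) {
                if (dp >= dl) {
                    return CoderResult.OVERFLOW; // output full, caller must drain it
                }
                dst[dp++] = TABLE[src[sp++] & 0xFF];
            }
            return CoderResult.UNDERFLOW; // all input consumed
        } finally {
            // Write the consumed/produced counts back into the buffers.
            in.position(sp - in.arrayOffset());
            out.position(dp - out.arrayOffset());
        }
    }

    // General path for direct or read-only buffers (a real implementation
    // might call native code here instead).
    private static CoderResult decodeBuffers(ByteBuffer in, CharBuffer out) {
        while (in.hasRemaining()) {
            if (!out.hasRemaining()) {
                return CoderResult.OVERFLOW;
            }
            out.put(TABLE[in.get() & 0xFF]);
        }
        return CoderResult.UNDERFLOW;
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = {0x48, 0x69, (byte) 0xE9}; // "Hi" + U+00E9
        String fromHeap = new TableDecoder().decode(ByteBuffer.wrap(bytes)).toString();

        ByteBuffer direct = ByteBuffer.allocateDirect(bytes.length);
        direct.put(bytes).flip();
        String fromDirect = new TableDecoder().decode(direct).toString();

        System.out.println(fromHeap.equals(fromDirect));
    }
}
```

Both paths produce the same characters; the point of the split is that the common heap-buffer case never leaves plain array indexing, which is the kind of saving the announcement attributes the Dacapo speedups to.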
