On 4/9/07, Yang Paulex <[EMAIL PROTECTED]> wrote:
2007/4/9, Vladimir Strigun <[EMAIL PROTECTED]>:
>
> Hi all!
>
> I'm happy to announce one more contribution to Harmony on behalf of
> Intel. The provided implementation of charset encoders/decoders is
> intended to replace the ICU-based charset encoding/decoding
> operations. The code was developed in a clean-room environment inside
> Intel, and I'd like you to play with it and include it in the current
> Harmony tree.
>
> The package can be found here:
> HARMONY-3593
>
> The algorithms for charset encoding/decoding differ from those of
> ICU. All charsets are generated from the current Harmony or any other
> implementation of Java and can be properly integrated into the current
> nio_char module. The archive contains source files for 6 charsets:
> GB18030, US-ASCII, ISO-8859-1, UTF-8, UTF-16, UTF-16BE, UTF-16LE;
> an implementation of CharsetProvider; a generator for other charsets;
> and a native part. I've tested the package with more than 90 charsets,
> and all benchmarks and tests passed with the new bundle. Additionally,
> I see a significant boost for the Dacapo.antlr and Dacapo.xalan
> benchmarks with the current Harmony tree on DRLVM and IBM VM. On DRLVM
> I see a 2.5x boost for antlr and ~5-8x for xalan.
>
> The main advantages of the package are the following:
> - Code for every charset is generated by CharsetGenerator; thus, if
> some modification is necessary, we just need to correct the generator
> and regenerate all sources.
> - We use 2 different encoders and decoders, for Java heap buffers and
> direct buffers. Since most applications use Java heap buffers, unlike
> the existing implementation it doesn't produce lots of native calls to
> perform encoding/decoding operations on the Java buffers, thus
> significantly improving performance. This is the main reason why we
> have such a significant boost for Dacapo.
> - Charset tables for encoding/decoding are stored in appropriate
> classes.
>
> Since the package contains implementations for 6 charsets only,
> documentation on how to generate and build additional charsets can be
> found in the README file of the contributed package.
>
> Please do not hesitate to contact me for more details.
>
> Thanks,
> Vladimir.
>
Good work, Vladimir and the team at Intel!
I'm also interested in a pure Java charset conversion provider for Harmony,
because the frequent JNI invocations in ICU4JNI (the current Harmony charset
provider) may impair performance when dealing with small chunks of bytes.
But I noticed that, in this contribution, US_ASCII, ISO_8859_1 and GB18030
are implemented in native C. Just out of interest, is there any special
reason not to implement them in Java?
As I wrote earlier, 2 branches of code are generated for every
encoder/decoder: a Java one and a native one. The native branch is used
only for processing native byte buffers. It could easily be removed
by a small modification of the generators, but performance measurements
show that it's better to use native decoders/encoders
in the case of native buffers.
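
To illustrate the two-branch idea, here is a minimal sketch only. It is
not the contributed code: the class and method names are hypothetical,
ISO-8859-1 is chosen just because its mapping is trivial, and the
direct-buffer branch is plain Java standing in for what would be a
native routine in the actual package.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;

// Hypothetical illustration of "one branch per buffer kind";
// not the generated Harmony sources.
public class DualPathDecoderSketch {

    /** Decodes ISO-8859-1 bytes, picking a branch by buffer kind. */
    public static String decodeLatin1(ByteBuffer in) {
        CharBuffer out = CharBuffer.allocate(in.remaining());
        if (in.hasArray()) {
            decodeHeap(in, out);    // heap buffer: stay in Java
        } else {
            decodeDirect(in, out);  // direct buffer: native branch in the real package
        }
        out.flip();
        return out.toString();
    }

    // "Java branch": reads the backing array directly, so no JNI call is needed.
    private static void decodeHeap(ByteBuffer in, CharBuffer out) {
        byte[] src = in.array();
        int start = in.arrayOffset() + in.position();
        int end = in.arrayOffset() + in.limit();
        for (int i = start; i < end; i++) {
            out.put((char) (src[i] & 0xFF)); // ISO-8859-1 maps byte value to code point
        }
        in.position(in.limit()); // mark all input as consumed
    }

    // Assumption: stand-in for the native branch, written with relative get()
    // so that the sketch runs without any native library.
    private static void decodeDirect(ByteBuffer in, CharBuffer out) {
        while (in.hasRemaining()) {
            out.put((char) (in.get() & 0xFF));
        }
    }

    public static void main(String[] args) {
        byte[] data = {0x48, 0x69, (byte) 0xE9};
        ByteBuffer heap = ByteBuffer.wrap(data);
        ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
        direct.put(data);
        direct.flip();
        System.out.println(decodeLatin1(heap).equals(decodeLatin1(direct)));
    }
}
```

Both calls in main() take different branches but should produce the
same string; only the way the bytes are reached differs.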
Thanks.
Vladimir.
--
Paulex Yang
China Software Development laboratory
IBM