Hi,

> I am interested in using the H2 LZW implementation in another project

For others, we are talking about
http://h2database.googlecode.com/svn/trunk/h2/src/main/org/h2/compress/CompressLZF.java
(LZF is part of the LZW family).

> (as is Hadoop, according to 
> https://issues.apache.org/jira/browse/HADOOP-4874),
> and found the decompression to be about 2x slower than Hadoop's LZO.

As far as I understand, Hadoop's LZO is written in C / assembler. Is
there a pure Java port of LZO?

The current LZF implementation is not that great if the data contains
many repeated bytes or long repeated sequences. It is optimized for
XML, HTML, and text data (relatively short repeated sequences). I don't
know what typical Hadoop data looks like; maybe it makes sense to
create a special implementation optimized for Hadoop.
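To illustrate why long runs are a weak spot: LZF-style formats bound the
match length of a single back-reference, so a very long run of identical
bytes has to be split into many short references. Here is a rough sketch of
that overhead; the 264-byte maximum match and the ~3-byte cost per
back-reference are assumptions taken from libLZF, not constants read out of
CompressLZF:

```java
class LzfRunEstimate {
    // Sketch only: estimates the compressed size of a run of n identical
    // bytes under assumed LZF-style limits. maxMatch (e.g. 264) and the
    // per-reference cost in bytes (e.g. 3) are parameters because the
    // real constants in CompressLZF may differ.
    static int estimateRunCompressedSize(int n, int maxMatch, int refCost) {
        if (n <= 0) {
            return 0;
        }
        // one literal byte plus its control byte to seed the run,
        // then one back-reference per maxMatch bytes of the remainder
        int refs = (n - 1 + maxMatch - 1) / maxMatch;
        return 2 + refs * refCost;
    }
}
```

Under these assumptions, a 1 MB run of zeros still costs on the order of
10 KB of back-references, where a format with unbounded match lengths (or
a run-length special case) could encode it in a handful of bytes.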

> When I looked at the code, I found replacing the sequence of copy
> operations with System.arraycopy() improved performance by
> approximately 15-20%.

Unfortunately, this doesn't always produce the same result.
System.arraycopy doesn't necessarily copy from start to end when the
source and destination regions overlap: within the same array, it
behaves as if the data were first copied to a temporary array. That
means an additional if-condition is required, which could slow down
decompression for other types of data.
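To make the overlap problem concrete, here is a hypothetical sketch (not
the actual H2 code) of expanding an LZF-style back-reference. When the
match length exceeds the back-reference distance, the decoder must re-read
bytes it has just written so that the repeated pattern propagates forward;
System.arraycopy's temporary-array semantics would not do that, hence the
extra if-condition:

```java
class LzfCopy {
    // Expand a back-reference of len bytes starting distance bytes
    // behind the current output position.
    static void copyMatch(byte[] out, int outPos, int distance, int len) {
        int src = outPos - distance;
        if (distance >= len) {
            // source and destination don't overlap: arraycopy is safe
            // and usually faster for larger copies
            System.arraycopy(out, src, out, outPos, len);
        } else {
            // overlapping run (e.g. a repeated short pattern): must
            // copy forward byte by byte so written bytes are re-read
            for (int i = 0; i < len; i++) {
                out[outPos + i] = out[src + i];
            }
        }
    }
}
```

For example, with output "ab" already written, a back-reference of
distance 2 and length 6 must produce "abababab"; arraycopy alone would
copy stale (zero) bytes instead.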

> I'm concerned that this is incorrect in some cases, though it is
> passing my correctness tests and that it may be slower in some
> circumstances I'm not hitting?

For small blocks, it is probably a bit slower (depending on the virtual
machine, Java version, and operating system).

Regards,
Thomas
