lzop compatible CompressionCodec
--------------------------------
Key: HADOOP-2424
URL: https://issues.apache.org/jira/browse/HADOOP-2424
Project: Hadoop
Issue Type: Improvement
Components: io, native
Reporter: Chris Douglas
LzoCodec currently writes at most {{io.compression.codec.lzo.buffersize}} bytes
(default 64k), less the compression overhead, per write (HADOOP-2402), in
the following format:
{noformat}
[compressed block length(32)]
[compressed block]
{noformat}
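As a rough illustration of that framing, here is a minimal sketch in Java. The class and method names are hypothetical and not part of Hadoop. The big-endian byte order of the 32-bit length is an assumption, since the layout above only gives the field width.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Hadoop: frames one already-compressed
// block as [compressed block length(32)][compressed block].
final class LzoCodecFrameSketch {
    static byte[] frame(byte[] compressedBlock) {
        // 4 bytes for the length prefix, then the block itself.
        ByteBuffer buf = ByteBuffer.allocate(4 + compressedBlock.length);
        // ByteBuffer is big-endian by default; the byte order is an assumption.
        buf.putInt(compressedBlock.length);
        buf.put(compressedBlock);
        return buf.array();
    }
}
```

A reader of this format would read the 32-bit length first, then exactly that many bytes of compressed data.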
lzop (the lzo-backed command-line utility) writes each block in the following format:
{noformat}
[uncompressed block length(32)]
[compressed block length (32)]
[Adler-32|CRC-32 checksum of uncompressed block (32)]
[Adler-32|CRC-32 checksum of compressed block (32)]
[compressed block]
{noformat}
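For comparison, the lzop block layout above could be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, Adler-32 is chosen for both checksums (the layout allows either Adler-32 or CRC-32), both checksums are assumed to be present, and the 32-bit fields are assumed to be big-endian. The lzop source remains the authority on all of these details.

```java
import java.nio.ByteBuffer;
import java.util.zip.Adler32;

// Hypothetical helper, not part of Hadoop or lzop: frames one block as
// [uncompressed length][compressed length][checksum of uncompressed]
// [checksum of compressed][compressed block], each header field 32 bits.
final class LzopBlockFrameSketch {
    // Adler-32 of the whole array, truncated to its 32-bit value.
    static int adler32(byte[] data) {
        Adler32 sum = new Adler32();
        sum.update(data, 0, data.length);
        return (int) sum.getValue();
    }

    static byte[] frame(byte[] uncompressed, byte[] compressed) {
        // Four 32-bit header fields (16 bytes) followed by the compressed data.
        ByteBuffer buf = ByteBuffer.allocate(16 + compressed.length);
        buf.putInt(uncompressed.length);   // uncompressed block length
        buf.putInt(compressed.length);     // compressed block length
        buf.putInt(adler32(uncompressed)); // checksum of uncompressed block (Adler-32 assumed)
        buf.putInt(adler32(compressed));   // checksum of compressed block (Adler-32 assumed)
        buf.put(compressed);
        return buf.array();
    }
}
```

Under these assumptions, each block carries 16 bytes of header, versus 4 bytes in the existing LzoCodec format.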
An lzop file also begins with an additional header of roughly 32 bytes. I don't
know of a published standard for the format, but the lzop source should suffice
as a reference.
Since we use ".lzo" as the default extension, we should consider making the
output compatible with lzop, though not necessarily for every lzo-compressed
block. For example, SequenceFiles should keep using the existing LzoCodec format.