[ https://issues.apache.org/jira/browse/COMPRESS-111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651683#comment-13651683 ]

Damjan Jovanovic edited comment on COMPRESS-111 at 5/8/13 7:42 AM:
-------------------------------------------------------------------

Compared to the normal way of extracting a file from an archive 
(read->decompress->write), the temp-file solution requires 
read->decompress->write-temp->read-temp->write, increasing I/O time 
proportionally to the size of the decompressed file (i.e., at least doubling it), 
which is why I didn't even consider it.
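
To make the I/O argument concrete, here is a minimal streaming-extraction sketch in Java. GZIP stands in for LZMA (the JDK ships no LZMA codec), and the class and method names are illustrative only; the point is that in the one-pass pattern each decompressed byte is written exactly once, whereas a temp-file approach writes the whole decompressed stream to disk and then reads it back, roughly doubling the I/O.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Sketch of one-pass extraction: read -> decompress -> write.
// GZIP is used as a stand-in codec; the I/O pattern is what matters.
public class StreamingExtract {

    // Helper to produce compressed test data.
    static byte[] compress(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // One-pass copy: each decompressed byte is written to the destination
    // exactly once. A temp-file variant would add write-temp + read-temp,
    // an extra full pass over the decompressed data.
    static void extract(InputStream compressed, OutputStream dest) throws IOException {
        try (InputStream in = new GZIPInputStream(compressed)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                dest.write(buf, 0, n);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "hello streaming extraction".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        extract(new ByteArrayInputStream(compress(original)), out);
        System.out.println(out.size() + " bytes extracted in one pass");
    }
}
```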

It seems like LZMA2 breaks up the stream to be compressed into blocks, and can 
(de)compress the blocks independently of each other (which has the benefit of 
allowing fast, multi-threaded decompression). In Lasse's code, LZMA2InputStream 
uses O(n) memory per block in the method RangeDecoder.prepareInputBuffer() 
called from LZMA2InputStream.decodeChunkHeader(). For LZMA however, the "block" 
is the entire file. Luckily it seems pretty easy to patch RangeDecoder to read 
incrementally. LZMA2InputStream probably also has to be modified, as I don't 
think LZMA has a chunk header. I don't know what else may be necessary.
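
For reference, the chunk framing that makes LZMA2 streamable can be sketched as a standalone header parser. This mirrors, but is not, the logic in XZ for Java's LZMA2InputStream.decodeChunkHeader(); the class and field names here are hypothetical. Because every LZMA2 chunk announces its sizes up front (21-bit uncompressed size, 16-bit compressed size), a decoder only ever needs to buffer one chunk at a time. Plain LZMA has no such framing: the "chunk" is the whole stream.

```java
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical parser for an LZMA2 chunk header. Each chunk starts with a
// control byte followed by big-endian 16-bit size fields, per the LZMA2
// format used inside .xz containers.
public class Lzma2ChunkHeader {
    boolean endOfStream;
    boolean compressed;
    boolean newProps;      // a fresh LZMA properties byte follows the header
    int uncompressedSize;
    int compressedSize;    // 0 for uncompressed chunks

    static Lzma2ChunkHeader parse(DataInputStream in) throws IOException {
        Lzma2ChunkHeader h = new Lzma2ChunkHeader();
        int control = in.readUnsignedByte();
        if (control == 0x00) {
            h.endOfStream = true;          // end-of-stream marker
        } else if (control >= 0x80) {      // LZMA-compressed chunk
            h.compressed = true;
            // Low 5 bits of control are bits 16-20 of (uncompressedSize - 1);
            // the next 2 bytes are bits 0-15.
            h.uncompressedSize = ((control & 0x1F) << 16)
                    + in.readUnsignedShort() + 1;
            h.compressedSize = in.readUnsignedShort() + 1;
            // State-reset code 2 or 3 means a new properties byte follows.
            h.newProps = ((control >>> 5) & 0x03) >= 2;
        } else if (control == 0x01 || control == 0x02) {
            // Uncompressed chunk (0x01 also resets the dictionary):
            // 2-byte (size - 1) follows.
            h.uncompressedSize = in.readUnsignedShort() + 1;
        } else {
            throw new IOException("Corrupt LZMA2 control byte: " + control);
        }
        return h;
    }
}
```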

Oh, and even if LZMA is a legacy format, we still need it for reading .7z files, 
which always use LZMA for header compression (enabled by default).

                
> support for lzma files
> ----------------------
>
>                 Key: COMPRESS-111
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-111
>             Project: Commons Compress
>          Issue Type: New Feature
>          Components: Compressors
>    Affects Versions: 1.0
>            Reporter: maurel jean francois
>         Attachments: compress-trunk-lzmaRev0.patch, 
> compress-trunk-lzmaRev1.patch
>
>
> adding support for compressing and decompressing files with the LZMA algorithm 
> (Lempel-Ziv-Markov chain algorithm)
> (see 
> http://markmail.org/search/?q=list%3Aorg.apache.commons.users/#query:list%3Aorg.apache.commons.users%2F+page:1+mid:syn4uuvbzusevtko+state:results)
