[issue14366] Supporting bzip2 and lzma compression in zip files

2012-03-18 Thread Éric Araujo

Changes by Éric Araujo:


--
nosy: +eric.araujo
title: Suporting bzip2 and lzma compression in zip files -> Supporting bzip2 
and lzma compression in zip files

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue14366] Supporting bzip2 and lzma compression in zip files

2012-03-18 Thread Martin v. Löwis

Martin v. Löwis added the comment:

ISTM that the LZMA support differs from the specification, see

http://www.pkware.com/documents/casestudies/APPNOTE.TXT

In particular, there appears to be no support for the EOS marker, which should 
be emitted when compressing.

Changing the LZMA module is fine as long as it
a) happens before the release of 3.3, and
b) is truly justified by the ZIP spec

I also recommend splitting this issue into two: bzip support and lzma support.
Adding bzip support might be easier.

--
nosy: +loewis




[issue14366] Supporting bzip2 and lzma compression in zip files

2012-03-18 Thread Martin v. Löwis

Martin v. Löwis added the comment:

I also think that create_version and extract_version need to be adjusted. 

Since LZMA is version 6.3, we need to check for any features that might be in a
zip file of extract version 6.3 or lower that we do not support (such as PPMd+
compression, strong encryption, etc.). In general, if we claim to support
version x.y, we need to recognize when a file uses a feature introduced in
version x1.y1 (x1.y1 <= x.y), even if we don't support that feature.
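
A minimal sketch of such a check, not the zipfile implementation. The method
ids and the strong-encryption flag bit follow my reading of APPNOTE.TXT; the
names MAX_EXTRACT_VERSION, SUPPORTED_METHODS, UNSUPPORTED_METHODS and
check_member are hypothetical and exist only for this illustration.

```python
# Versions are stored as 10 * major + minor, so 6.3 is 63.
MAX_EXTRACT_VERSION = 63          # assumption: we claim support up to 6.3

# Compression method id -> name, for the methods this sketch can decode.
SUPPORTED_METHODS = {0: "stored", 8: "deflate", 12: "bzip2", 14: "lzma"}

# Features that exist at or below 6.3 but are NOT handled here.
UNSUPPORTED_METHODS = {98: "PPMd"}
STRONG_ENCRYPTION_FLAG = 0x40     # general purpose bit 6


def check_member(method, extract_version, flag_bits):
    """Raise NotImplementedError if the member needs something we lack."""
    if extract_version > MAX_EXTRACT_VERSION:
        raise NotImplementedError(
            "zip file version %.1f" % (extract_version / 10))
    if flag_bits & STRONG_ENCRYPTION_FLAG:
        raise NotImplementedError("strong encryption")
    if method not in SUPPORTED_METHODS:
        name = UNSUPPORTED_METHODS.get(method, "method %d" % method)
        raise NotImplementedError("compression: " + name)
```

The point of the sketch is that a known-but-unsupported feature is reported
explicitly instead of being misread as corrupt data.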

--




[issue14366] Supporting bzip2 and lzma compression in zip files

2012-03-19 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Thank you, Martin, for the review and advice.

LZMA in the zip format: a 2-byte version (the LZMA SDK version, which is
unrelated to the version of XZ Utils used by the lzma module), a 2-byte
properties size (I have not seen a value other than 5), N bytes (N=5) of
property data, and the raw compressed data (LZMA_RAW).

The .lzma file format (LZMA_ALONE): 5 bytes of property data, an 8-byte
uncompressed size (~0 if unknown), and the raw compressed data (LZMA_RAW).
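
The two layouts can be sketched with struct as follows. This is an
illustration only: the helper names split_zip_lzma and alone_header are
hypothetical, and little-endian byte order and the major/minor split of the
version field are assumptions based on my reading of the two specifications.

```python
import struct

UNKNOWN_SIZE = 0xFFFFFFFFFFFFFFFF  # "~0": uncompressed size not recorded


def split_zip_lzma(data):
    """Split a zip LZMA payload into ((major, minor), props, raw_stream)."""
    # 2-byte SDK version (major, minor) followed by a 2-byte properties size.
    major, minor, props_size = struct.unpack_from("<BBH", data, 0)
    props = data[4:4 + props_size]
    if len(props) != props_size:
        raise ValueError("truncated LZMA properties")
    return (major, minor), props, data[4 + props_size:]


def alone_header(props, uncompressed_size=UNKNOWN_SIZE):
    """Build the 13-byte LZMA_ALONE header from 5 property bytes."""
    if len(props) != 5:
        raise ValueError("LZMA_ALONE expects exactly 5 property bytes")
    return props + struct.pack("<Q", uncompressed_size)
```

For example, split_zip_lzma(struct.pack("<BBH", 9, 4, 5) + props + raw)
returns the version pair, the 5 property bytes, and the raw stream unchanged.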

7-Zip ignores the version and supports only 5-byte property data. Because the
LZMA1 codec is declared obsolete, new versions with a properties size != 5 are
highly unlikely. Nevertheless, it would be wise to add functions to the lzma
module for parsing bytes into codec properties and for dumping codec properties
to bytes (these are the functions lzma_lzma_props_decode() and
lzma_lzma_props_encode() in liblzma). This is not necessary, but desirable. I
see no reasonable choice other than to hardcode some arbitrary version when
compressing and to ignore it when decompressing.
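
A pure-Python sketch of what such a pair of functions might look like for the
5-byte LZMA1 properties. The names encode_lzma1_props and decode_lzma1_props
are hypothetical (they are not part of the lzma module), and the layout used
here, one byte packing lc/lp/pb followed by a 4-byte little-endian dictionary
size, is the one I understand the LZMA format to define.

```python
import struct


def encode_lzma1_props(lc, lp, pb, dict_size):
    """Dump LZMA1 codec properties to their 5-byte form."""
    if not (0 <= lc <= 8 and 0 <= lp <= 4 and 0 <= pb <= 4):
        raise ValueError("lc, lp or pb out of range")
    # First byte packs the three small parameters, then the dictionary size.
    return struct.pack("<BI", (pb * 5 + lp) * 9 + lc, dict_size)


def decode_lzma1_props(props):
    """Parse 5 property bytes into (lc, lp, pb, dict_size)."""
    if len(props) != 5:
        raise ValueError("expected exactly 5 property bytes")
    packed, dict_size = struct.unpack("<BI", props)
    if packed >= 9 * 5 * 5:
        raise ValueError("invalid lc/lp/pb byte")
    rest, lc = divmod(packed, 9)
    pb, lp = divmod(rest, 5)
    return lc, lp, pb, dict_size
```

With the common settings lc=3, lp=0, pb=2 the first byte is 0x5D, which is
the value usually seen at the start of LZMA1 property data.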

The EOS marker is only helpful for streamed zip files, where the size of the
compressed data is not known beforehand and so cannot be recorded ahead of the
data (see lzma-file-format.txt in the liblzma docs). But that must be another
issue: the current implementation of the zipfile module does not work with
non-seekable files (I hope to work on that later).

> I also recommend splitting this issue into two: bzip support and lzma support.

Assuredly. I will create a new issue for bzip2, but what do I do with lzma? Do
I need to rename this issue or create a new one? Should the lzma patch include
the bzip2 patch, since the latter will contain the code necessary to support
all codecs? Or should I defer any work on lzma until the bzip2 support is
committed?

I think we should add the ability to register new codecs. Support for PPMd,
JPEG and WavPack is unlikely to appear in Python in the foreseeable future,
but users of third-party libraries (such as PIL) could then use the new codecs
as needed.
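
A rough sketch of what a codec registry could look like. Nothing like this
exists in zipfile; register_codec, new_compressor and new_decompressor are
hypothetical names, and the two registrations only reuse the standard zlib
and bz2 objects under their zip method ids (8 and 12).

```python
import bz2
import zlib

# method id -> (compressor factory, decompressor factory)
_codecs = {}


def register_codec(method, compressor_factory, decompressor_factory):
    """Associate a zip compression method id with a pair of factories."""
    _codecs[method] = (compressor_factory, decompressor_factory)


def new_compressor(method):
    try:
        return _codecs[method][0]()
    except KeyError:
        raise NotImplementedError("compression method %d" % method)


def new_decompressor(method):
    try:
        return _codecs[method][1]()
    except KeyError:
        raise NotImplementedError("compression method %d" % method)


# Deflate in zip files is a raw stream, hence the negative window size.
register_codec(8,
               lambda: zlib.compressobj(6, zlib.DEFLATED, -15),
               lambda: zlib.decompressobj(-15))
register_codec(12, bz2.BZ2Compressor, bz2.BZ2Decompressor)
```

A third-party package would call register_codec() with its own method id and
factories that return objects with compress()/flush() and decompress().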

> I also think that create_version and extract_version need to be adjusted.

Agreed. Should we raise an exception when a new compressor is used with
allowZip64 == False? Or set allowZip64 = True when a new compressor is
explicitly used?

--




[issue14366] Supporting bzip2 and lzma compression in zip files

2012-03-19 Thread Martin v. Löwis

Martin v. Löwis added the comment:

For EOS, please re-read the specification. If you then still think it is not 
needed, read it again :-) The documentation in liblzma is irrelevant, only the 
PKWARE specification matters. Take particular notice of the phrase 
"implementers should include the EOS marker whenever possible"

For bzip: propose a patch that does just the bzip stuff, and any infrastructure 
changes needed for it. Having the LZMA patch depend on this is fine.

Re: extensible compressors. I don't think that's needed. There is only a finite 
set, and if somebody wants to support some compression method, they should 
submit a patch.

Re: allowZip64. This depends on whether you create or extract. Not using a 
feature on creation is fine - we don't *have* to use all supported features. On 
extraction, if a feature is used and we support it, it should get used 
regardless of any configuration (note: I didn't check what allowZip64 currently 
does).

Re: 7zip. What it does is irrelevant. The ZIP format is defined by PKWARE, so 
if you want to look at a reference implementation, use theirs. Else use the 
spec.

--




[issue14366] Supporting bzip2 and lzma compression in zip files

2012-03-20 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

Issue #14371: Add support for bzip2 compression to the zipfile module.

--




[issue14366] Supporting bzip2 and lzma compression in zip files

2012-03-23 Thread Nadeem Vawda

Nadeem Vawda added the comment:

I plan on doing a review of the patch, but it might be a week or two
before I have time to do it.

Regarding changes to lzma; exactly what is being proposed? If it's just
additional functions for encoding and decoding of filter specs, then I'm
fine with that (and it's not *necessary* to get it into 3.3, though it
would still be nice).

--




[issue14366] Supporting bzip2 and lzma compression in zip files

2012-03-23 Thread Serhiy Storchaka

Changes by Serhiy Storchaka:


Added file: http://bugs.python.org/file25007/bzip2_and_lzma_in_zip_3.patch




[issue14366] Supporting bzip2 and lzma compression in zip files

2012-03-23 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> For EOS, please re-read the specification.

Well, nothing prevents us from setting this bit. The LZMA raw compressor
already appends EOS.
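
One way to inspect the bit in question, assuming a Python version whose
zipfile module provides ZIP_LZMA (3.3 or later) and a build with the lzma
module available. The member name and payload are made up for the example;
0x02 is general purpose bit 1, which for LZMA members indicates that an EOS
marker is used.

```python
import io
import zipfile

payload = b"hello world " * 100

# Write a single LZMA-compressed member into an in-memory archive.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_LZMA) as zf:
    zf.writestr("hello.txt", payload)

# Read it back and look at what was recorded for the member.
with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("hello.txt")
    data = zf.read("hello.txt")

eos_flag_set = bool(info.flag_bits & 0x02)   # general purpose bit 1
print(info.compress_type == zipfile.ZIP_LZMA,
      eos_flag_set,
      info.extract_version)
```

The same ZipInfo object also exposes create_version and extract_version, so
the version fields discussed earlier in this thread can be checked the same
way.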

--




[issue14366] Supporting bzip2 and lzma compression in zip files

2012-03-23 Thread Serhiy Storchaka

Serhiy Storchaka added the comment:

> Regarding changes to lzma; exactly what is being proposed?

Yes, it's just additional functions for encoding and decoding of filter specs.
But this is not necessary: the current implementation uses its own functions
(it only needs to support the LZMA1 format, which is unlikely to change).

--
