Matthew Gamble added the comment:

Hi,

I've recently been working on a Python module for the Adobe Universal Container
Format (UCF), which extends the zip specification. As part of this I wanted to
be able to remove and rename files in an archive.

I came across this issue while writing the module and realised there wasn't
currently a solution, so I went down the rabbit hole.

I've attached a patch that supports removing and renaming files in a zip
archive. The same code is also available in a git repository, separated out
into a class that extends ZipFile:
https://github.com/gambl/zipextended.

The patch adds four main new "public" methods to the zipfile library:

- remove(self, zinfo_or_arcname)
- rename(self, zinfo_or_arcname, filename)
- commit(self)
- clone(self, file, filenames_or_infolist=None, ignore_hidden_files=False)

The patch is partly modelled on the rubyzip solution. Remove and rename
initially only update the ZipFile's infolist. The changes are then persisted
by a commit method, which can be called manually or is called automatically
on close. Commit clones the zip file, with the changes applied, to a temporary
file and replaces the original file only once that operation has completed
successfully.
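The deferred-modification idea above (record removals and renames, then rebuild the archive in a temporary file and swap it in) can be sketched with only the existing zipfile API. This is a minimal sketch under stated assumptions, not the patch's code: `rewrite_archive` and its `remove`/`rename` parameters are hypothetical, it decompresses and recompresses every member via `ZipFile.read()` and `ZipFile.writestr()` rather than copying compressed data as the patch does, and it does not preserve hidden data between members.

```python
import os
import tempfile
import zipfile


def rewrite_archive(path, remove=(), rename=None):
    """Persist pending removals/renames by cloning into a temporary file.

    Hypothetical helper (not part of zipfile or of the patch): members named
    in `remove` are dropped, members in `rename` (old name -> new name) are
    written under their new name. Data is decompressed and recompressed.
    """
    rename = rename or {}
    # Create the temporary file next to the original so os.replace() is a
    # rename within a single filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp_file:
            with zipfile.ZipFile(path) as src, \
                    zipfile.ZipFile(tmp_file, "w") as dst:
                for info in src.infolist():
                    if info.filename in remove:
                        continue
                    new_name = rename.get(info.filename, info.filename)
                    # Carry over timestamp, compression method and attributes.
                    new_info = zipfile.ZipInfo(new_name, date_time=info.date_time)
                    new_info.compress_type = info.compress_type
                    new_info.external_attr = info.external_attr
                    dst.writestr(new_info, src.read(info))
        # Swap files only after the new archive has been fully written.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Because the original is replaced only after the new archive is closed, a failure part-way through leaves it untouched. Note that `tempfile.mkstemp()` creates the file with owner-only permissions, which this sketch does not copy from the original.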

Alternatively, files can be removed without modifying the original by calling
the clone method directly. This is in the spirit of Serhiy's suggestion of
filtering the content rather than modifying the original: you pass a list of
filenames or ZipInfo objects for the members to include in the clone.
So that clone can be performed without decompressing and then recompressing
the files in the archive, I have added two functions, write_compressed and
read_compressed.
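The idea behind read_compressed, getting at a member's stored bytes without decompressing them, can be illustrated by reading the member's local file header directly. This is a sketch assuming an unencrypted member in a seekable archive file; `read_raw_member` and `LOCAL_HEADER` are hypothetical names, not the patch's read_compressed and not part of the zipfile module.

```python
import io
import struct
import zipfile
import zlib

# Fixed 30-byte part of a zip local file header: signature, version needed,
# flags, compression method, mod time, mod date, CRC-32, compressed size,
# uncompressed size, file name length, extra field length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def read_raw_member(fp, zinfo):
    """Return a member's data exactly as stored in the archive.

    Hypothetical helper: no decompression, no decryption, and `fp` must be
    the seekable file object the archive lives in.
    """
    fp.seek(zinfo.header_offset)
    header = LOCAL_HEADER.unpack(fp.read(LOCAL_HEADER.size))
    if header[0] != b"PK\x03\x04":
        raise zipfile.BadZipFile("bad local header for %r" % zinfo.filename)
    # The local name and extra field may differ in length from the central
    # directory's copies, so skip them using the local header's own lengths.
    name_len, extra_len = header[-2], header[-1]
    fp.seek(name_len + extra_len, io.SEEK_CUR)
    # Take the size from the central directory: the local header stores 0
    # when the sizes were written to a trailing data descriptor instead.
    return fp.read(zinfo.compress_size)


# Usage: the raw bytes of a ZIP_DEFLATED member are a bare deflate stream.
payload = b"hello zip " * 100
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", payload)
with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("a.txt")
raw = read_raw_member(buf, info)
assert zlib.decompress(raw, -zlib.MAX_WBITS) == payload
```

Bytes obtained this way could be copied into another archive together with the original CRC and sizes, which is what makes a clone without recompression possible.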

I have also attempted to address Serhiy's concern about tricky.zip: "hidden
files" in between members of the archive. By default, the clone method retains
any hidden files and keeps them in the same relative order in the archive. You
can also choose to ignore the hidden files and clone only the files listed in
the central directory.
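One way to locate such hidden data is to walk the members in offset order and compare where each local header plus its compressed data ends with where the next member begins. This sketch assumes no data descriptors and no ZIP64 records, and `find_hidden_ranges` is a hypothetical helper rather than a description of how the patch's clone works.

```python
import io
import struct
import zipfile

LOCAL_HEADER = struct.Struct("<4s5H3L2H")  # fixed part of a local file header


def find_hidden_ranges(fp):
    """Return (start, end) byte ranges before or between members that no
    central-directory entry accounts for.

    Sketch only: assumes no data descriptors (flag bit 3) and no ZIP64, and
    does not look at the gap between the last member and the central
    directory.
    """
    with zipfile.ZipFile(fp) as zf:  # a passed-in file object is left open
        members = sorted(zf.infolist(), key=lambda zi: zi.header_offset)
    gaps = []
    pos = 0
    for zi in members:
        if zi.header_offset > pos:
            gaps.append((pos, zi.header_offset))
        fp.seek(zi.header_offset)
        header = LOCAL_HEADER.unpack(fp.read(LOCAL_HEADER.size))
        name_len, extra_len = header[-2], header[-1]
        end = (zi.header_offset + LOCAL_HEADER.size + name_len + extra_len
               + zi.compress_size)
        # max() handles a member nested inside another member's data (as in
        # tricky.zip) without moving the position backwards.
        pos = max(pos, end)
    return gaps


# Appending a zip to non-zip bytes leaves those bytes outside every member.
prefix = b"not part of any member"
buf = io.BytesIO(prefix)
with zipfile.ZipFile(buf, "a") as zf:
    zf.writestr("c.txt", "x")
print(find_hidden_ranges(buf))  # [(0, 22)]
```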

I did have to modify the tricky.zip attached to this issue by hand, as the CRC
of file two (with file three embedded) was incorrect and would therefore fail
testzip(). I'm not sure how one would create such an archive, but I think it's
valid according to the zip spec. I've included the modified version in the
patch for a few of the tests.

I appreciate that this is a large-ish patch and may take some time to review,
but as suggested in the comments, this wasn't as straightforward as it seems!

I look forward to your comments.

The signatures of the main functions are described below:

remove(self, zinfo_or_arcname):

    Remove a member from the archive.

    Args:
      zinfo_or_arcname (ZipInfo, str): ZipInfo object or filename of the
        member.

    Raises:
      RuntimeError: If attempting to modify a zip archive that is closed.
---

rename(self, zinfo_or_arcname, filename):

    Rename a member in the archive.

    Args:
      zinfo_or_arcname (ZipInfo, str): ZipInfo object or filename of the
        member.
      filename (str): the new name for the member.

    Raises:
      RuntimeError: If attempting to modify a zip archive that is closed.


clone(self, file, filenames_or_infolist=None, ignore_hidden_files=False):

    Clone the zip file to the given file (filename or file object).

    Args:
      file (File, str): file-like object or filename of file to write the
        new zip file to.
      filenames_or_infolist (list(str), list(ZipInfo), optional): list of
        members from this zip file to include in the new zip file.
      ignore_hidden_files (boolean): flag indicating whether hidden files
        (data in between managed members of the archive) should be ignored.

    Returns:
        A new ZipFile object for the cloned zip file, open in append mode.

        If hidden files are copied, clone attempts to maintain the relative
        order between them and the members in the archive.

commit(self):
    Commit any inline modifications (removal and rename) to the zip archive.

    This uses a temporary file to create a new zip archive with the
    required modifications and then replaces the original.

    It therefore requires write access to either the directory where the
    original zip file lives or Python's default tempfile location.

----------
nosy: +gambl
Added file: http://bugs.python.org/file38878/zipfile.remove_rename.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6818>
_______________________________________