[issue30661] Support tarfile.PAX_FORMAT in shutil.make_archive
Lars Gustäbel added the comment: tarfile does not use the `format` argument for reading, it will be detected. You can even mix different formats in one archive and tarfile will be fine with it. -- nosy: +lars.gustaebel ___ Python tracker <https://bugs.python.org/issue30661> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
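A minimal sketch of the point above, not code from the tracker: the `format` keyword is only given when writing, and the archive is read back without naming any format. The helper name `roundtrip_names` and the member name `hello.txt` are made up for illustration.

```python
import io
import tarfile


def roundtrip_names(fmt):
    """Write a one-member archive in the given format, then read it back
    without telling tarfile which format was used."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        data = b"hello"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

    buf.seek(0)
    # No format argument here: the header type is detected while reading.
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()


for fmt in (tarfile.USTAR_FORMAT, tarfile.GNU_FORMAT, tarfile.PAX_FORMAT):
    print(fmt, roundtrip_names(fmt))
```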
[issue30438] tarfile would fail to extract tarballs with files under R/O directories (twice)
Lars Gustäbel added the comment: Actually, it is not prohibited to add the same file to the same archive more than once. -- nosy: +lars.gustaebel
[issue27590] tarfile module next() method hides exceptions
Lars Gustäbel added the comment: After all these years, it is not that easy to say why the decision to swallow this exception was made. One part surely was a lack of experience with the tar format itself and all of its implementations. The other part I guess was that it was supposed to avoid problems in case users did not use TarFile as an iterator. tarfile was developed on Python 2.2 which was the first release to feature iterators. The problem if you do random access on a tarfile or call TarFile.getmembers() is that all the headers must be collected first. If this fails somewhere in the middle, there is no way to resume the current operation and you get nothing out of the archive.
[issue27590] tarfile module next() method hides exceptions
Lars Gustäbel added the comment: The question is what you're trying to accomplish. If you just want to prevent tarfile from stopping at the first invalid header in order to extract everything following it, you may use the ignore_zeros=True keyword argument.
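A small sketch of what `ignore_zeros=True` changes, using the documented case of two archives concatenated into one stream. The helpers `make_tar` and `list_names` and the member names are hypothetical and exist only for this illustration.

```python
import io
import tarfile


def make_tar(name, data):
    """Return the bytes of an uncompressed archive with a single member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def list_names(raw, **options):
    """List member names; extra keyword arguments go to tarfile.open()."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r:", **options) as tar:
        return tar.getnames()


# The end-of-archive blocks of the first archive sit in front of the second.
combined = make_tar("a.txt", b"first") + make_tar("b.txt", b"second")

print(list_names(combined))                     # stops at the first zero block
print(list_names(combined, ignore_zeros=True))  # keeps scanning past it
```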
[issue23228] The tarfile module crashes when tarfile contains a symlink and unpack directory contain it too
Lars Gustäbel added the comment: I suck :-) It is hg revision bb94f6222fef.
[issue23228] The tarfile module crashes when tarfile contains a symlink and unpack directory contain it too
Lars Gustäbel added the comment: TarFile.makelink() has a fallback mode in case the platform does not support links. Instead of a symlink or a hardlink it extracts the file it points to as long as it exists in the current archive. More precisely, makelink() calls os.symlink() and if one of the exceptions in the symlink_exception tuple is raised, it goes into fallback mode. r80944 introduced a regression because it replaced the WindowsError in symlink_exception with an OSError which is much less specific than a WindowsError. Since that change, the fallback is used every time an OSError occurs; in Michael's case it is a FileExistsError, because the symlink is already there. The attached patch restores the old behavior. This might not be what you wanted, Michael, but at least tarfile no longer crashes. -- Added file: http://bugs.python.org/file42780/windowserror.diff
[issue26877] tarfile use wrong code when read from fileobj
Lars Gustäbel added the comment: Please give us some example test code that shows us what goes wrong exactly.
[issue8978] "tarfile.ReadError: file could not be opened successfully" if compiled without zlib
Lars Gustäbel added the comment: Closed after years of inactivity. -- resolution: -> works for me stage: -> resolved status: open -> closed
[issue24838] tarfile.py: fix GNU and USTAR formats to properly handle paths with special characters that are encoded with more than one byte each
Lars Gustäbel added the comment: Sorry for the glitch, I suppose everything works fine now. -- status: open -> closed
[issue10261] tarfile iterator without members caching
Lars Gustäbel added the comment: Closing after six years of inactivity. -- resolution: -> wont fix stage: -> resolved status: open -> closed
[issue24838] tarfile.py: fix GNU and USTAR formats to properly handle paths with special characters that are encoded with more than one byte each
Changes by Lars Gustäbel: -- resolution: -> fixed stage: test needed -> resolved status: open -> closed versions: -Python 3.2, Python 3.3, Python 3.4
[issue24838] tarfile.py: fix GNU and USTAR formats to properly handle paths with special characters that are encoded with more than one byte each
Lars Gustäbel added the comment: Thanks for the detailed report and the patch. I haven't checked yet, but I suppose that the entire 3.x branch is affected. The first thing I have to do now is to come up with a comprehensive testcase. -- assignee: -> lars.gustaebel components: +Library (Lib) nosy: +lars.gustaebel stage: -> test needed versions: +Python 3.2, Python 3.3, Python 3.4, Python 3.6
[issue24259] tar.extractall() does not recognize unexpected EOF
Changes by Lars Gustäbel: -- resolution: -> fixed stage: patch review -> resolved status: open -> closed
[issue24514] tarfile fails to extract archive (handled fine by gnu tar and bsdtar)
Changes by Lars Gustäbel: -- resolution: -> fixed stage: patch review -> resolved status: open -> closed
[issue24259] tar.extractall() does not recognize unexpected EOF
Lars Gustäbel added the comment: Martin, I followed your suggestion to raise ReadError. This needed an additional change in copyfileobj() because it is used both for adding file data to an archive and extracting file data from an archive. But I think the patch is in good shape now. -- Added file: http://bugs.python.org/file39837/issue24259-3.x-3.diff
[issue24514] tarfile fails to extract archive (handled fine by gnu tar and bsdtar)
Lars Gustäbel added the comment: I think a simple addition to the existing unittest for nti() will be enough. itn() seems well-tested, and nts() and stn() are not affected, because they don't operate on numbers. -- Added file: http://bugs.python.org/file39832/issue24514.diff
[issue24514] tarfile fails to extract archive (handled fine by gnu tar and bsdtar)
Lars Gustäbel added the comment: Yes, Python 2.7 still gets bugfixes. However, there's still some work to do on the patch (maybe clean the code, write a test, add a NEWS entry).
[issue24514] tarfile fails to extract archive (handled fine by gnu tar and bsdtar)
Lars Gustäbel added the comment: You're welcome :-D -- assignee: -> lars.gustaebel priority: normal -> low stage: -> patch review type: -> behavior versions: +Python 3.5, Python 3.6
[issue24514] tarfile fails to extract archive (handled fine by gnu tar and bsdtar)
Lars Gustäbel added the comment: The problem is that the tar archive has empty uid and gid fields, i.e. 7 spaces terminated with a null-byte. I attached a patch that solves the problem. -- keywords: +patch Added file: http://bugs.python.org/file39815/issue24514.diff
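To illustrate the kind of field described above, here is a rough sketch of a tolerant parser for an octal numeric header field. `parse_octal_field` is a hypothetical helper, not tarfile's own nti(), and it leaves out the GNU base-256 encoding that real headers may use for large numbers.

```python
def parse_octal_field(field: bytes) -> int:
    """Parse a NUL-terminated octal tar header field.

    A field made up only of spaces (as in the archive from this issue)
    is treated as zero instead of being rejected.
    """
    # Keep everything before the first NUL byte, then drop padding spaces.
    text = field.split(b"\0", 1)[0].decode("ascii").strip()
    return int(text or "0", 8)


print(parse_octal_field(b"0000644\0"))   # a normal mode field
print(parse_octal_field(b"       \0"))   # the empty uid/gid case
```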
[issue24465] Make tarfile have deterministic sorting
Lars Gustäbel added the comment: The patch would change behaviour for all tarfile users by the back door, that's why I am a little reluctant. And if the same can be achieved by a reasonably simple change to shutil I think it's just as well.
[issue24465] Make tarfile have deterministic sorting
Lars Gustäbel added the comment: You don't need to patch the tarfile module. You could use os.walk() in shutil._make_tarball() and add each file with TarFile.add(recursive=False). -- nosy: +lars.gustaebel
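The suggestion above could look roughly like the following sketch. It is not the change that went into shutil; `add_tree_sorted` is a made-up helper that walks a directory in sorted order and adds every entry non-recursively, so the member order no longer depends on the order the filesystem returns.

```python
import os
import tarfile


def add_tree_sorted(tar, top):
    """Add everything below `top` to `tar` in a deterministic order."""
    for dirpath, dirnames, filenames in os.walk(top):
        # Sorting in place also fixes the order in which os.walk() descends.
        dirnames.sort()
        for name in dirnames + sorted(filenames):
            path = os.path.join(dirpath, name)
            # recursive=False: directories are added as plain entries, their
            # contents follow when os.walk() reaches them.
            tar.add(path, arcname=os.path.relpath(path, top), recursive=False)
```

A caller would open the archive as usual, for example `with tarfile.open("out.tar", "w") as tar: add_tree_sorted(tar, "some_dir")`, where both names are placeholders.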
[issue24259] tar.extractall() does not recognize unexpected EOF
Changes by Lars Gustäbel: Added file: http://bugs.python.org/file39580/issue24259-2.x-2.diff
[issue24259] tar.extractall() does not recognize unexpected EOF
Lars Gustäbel added the comment: @Martin: This is actually a nice idea that I hadn't thought of. I updated the Python 3 patch to use a seek() that moves to one byte before the next header block, reads the remaining byte and raises an error if it hits eof. The code looks rather clean compared to the previous patch, and it should perform like it always did. I am not quite sure about which exception type to use, ReadError is used in tarfile's header parsing code, but OSError is already used in tarfile.copyfileobj() and might be more like what the user expects. -- Added file: http://bugs.python.org/file39579/issue24259-3.x-2.diff
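The seek-then-read-one-byte idea can be sketched in isolation as follows. This is an illustration of the technique only, not the code from the attached patch; `advance_to` and `UnexpectedEOF` are hypothetical names.

```python
import io


class UnexpectedEOF(Exception):
    """Hypothetical error type for this sketch."""


def advance_to(fileobj, next_offset):
    """Move to `next_offset` while checking that the data before it exists.

    Seeking past the end of a file normally succeeds silently, so we seek
    to one byte before the target and try to read that last byte.
    """
    fileobj.seek(next_offset - 1)
    if not fileobj.read(1):
        raise UnexpectedEOF("unexpected end of data")


complete = io.BytesIO(b"x" * 1024)
advance_to(complete, 1024)          # fine, position is now 1024

truncated = io.BytesIO(b"x" * 1000)
try:
    advance_to(truncated, 1024)
except UnexpectedEOF as exc:
    print("detected:", exc)
```

Only a single byte is read per member, which is why this approach should cost about the same as a plain seek().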
[issue24259] tar.extractall() does not recognize unexpected EOF
Lars Gustäbel added the comment: @Thomas: I think your proposal adds a little too much complexity. Also, ExFileObject is not used during iteration, and we would like to detect broken archives without unpacking all the data segments first. I have written patches for Python 2 and 3. -- stage: -> patch review versions: +Python 3.4, Python 3.5, Python 3.6 Added file: http://bugs.python.org/file39543/issue24259-3.x.diff
[issue24259] tar.extractall() does not recognize unexpected EOF
Changes by Lars Gustäbel: Added file: http://bugs.python.org/file39544/issue24259-2.x.diff
[issue24259] tar.extractall() does not recognize unexpected EOF
Lars Gustäbel added the comment: @Martin: Yes, that's right, but only for cases where the TarFile.fileobj attribute is an actual file object. But, most of the time it is something special, e.g. GzipFile or sys.stdin, which makes random seeking either impossible or perform very badly. But thanks for your objection, I have to withdraw the statement I made under option 2.: compressed archives are much more common than uncompressed ones. We probably wouldn't lose too much if we no longer use seek() but read() in TarFile.next(). Reading in an uncompressed file is fast anyway. I have to think about this.
[issue24259] tar.extractall() does not recognize unexpected EOF
Lars Gustäbel added the comment: I have written a test for the issue, so that we have a basis for discussion. There are four different scenarios where an unexpected EOF can occur: inside a metadata block, directly after a metadata block, inside a data segment or directly after a data segment (i.e. missing end of archive marker). Case #1 is taken care of (TruncatedHeaderError). Case #4 is merely a violation of the standard, which is negligible. Cases #2 and #3 are essentially the same. If a data segment is empty or incomplete, this means data was lost when the archive was created, which should not go unnoticed when reading it (see _FileInFile.read() for the code in question). The problem is that, even after we have fixed cases #2 and #3, we have no reliable way to detect an incomplete data segment unless we read it and count the bytes. If we simply iterate over the TarFile (e.g. do a TarFile.list()) the archive will appear intact. That is because in the TarFile.next() method we seek from one metadata block to the next, but we cannot simply detect if we seek beyond the end of the archive - except if we insist on the premise that each tar that we read is standards-compliant and comes with an end of archive marker (see case #4), which we probably should not. Three possible options come to my mind:
1. Add a warning to the documentation that in order to test the integrity of an archive the user has to read through all the data segments.
2. Instead of using seek() in TarFile.next() use read() to advance the file pointer. This has a negative impact on the performance in most cases.
3. Insist on an end of archive marker. This has the disadvantage that users may get an exception although everything is fine.
-- assignee: -> lars.gustaebel keywords: +patch Added file: http://bugs.python.org/file39528/01-issue24259-test.diff
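Option 1 above (read through every data segment and count the bytes) can be sketched like this. The helper names `build_sample_archive` and `archive_is_intact` are invented for the example. On Python versions that include the fix from this issue, a truncated data segment is expected to surface as tarfile.ReadError, which the sketch also treats as "not intact".

```python
import io
import tarfile


def build_sample_archive(size=2000):
    """Return the bytes of an archive holding one member of `size` bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("data.bin")
        info.size = size
        tar.addfile(info, io.BytesIO(b"\x01" * size))
    return buf.getvalue()


def archive_is_intact(raw):
    """Read every regular member completely and compare with its size."""
    try:
        with tarfile.open(fileobj=io.BytesIO(raw), mode="r:") as tar:
            for member in tar:
                if not member.isfile():
                    continue
                stream = tar.extractfile(member)
                remaining = member.size
                while remaining:
                    chunk = stream.read(min(remaining, 64 * 1024))
                    if not chunk:          # data segment ends too early
                        return False
                    remaining -= len(chunk)
    except (tarfile.TarError, OSError):
        return False
    return True


intact = build_sample_archive()
cut_in_data = intact[:512 + 1000]   # header block plus half of the data
print(archive_is_intact(intact), archive_is_intact(cut_in_data))
```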
[issue23649] tarfile not re-entrant for multi-threading
Lars Gustäbel added the comment: I agree with David that there is no need for tarfile to be thread-safe. There is nothing to be gained from distributing one TarFile object among multiple threads because it operates on a single resource which has to be accessed sequentially anyway. So, it seems best to me if we leave it like it is and let the user add locks around it as she/he sees fit.
[issue23193] Please support "numeric_owner" in tarfile
Lars Gustäbel added the comment: I would argue that a serious alternative to this patch is to simply override the TarFile.chown() method in a subclass. However, I'm not sure if this expects too much of the user.
[issue22208] tarfile can't add in memory files (reopened)
Lars Gustäbel added the comment: Please provide a patch which allows easy addition of file-like objects (not only io.BytesIO) and directories, preferably hard and symbolic links, too. It would be nice to still be able to change attributes of a TarInfo before addition. Please also add tests. -- stage: -> needs patch type: behavior -> enhancement
[issue22208] tarfile can't add in memory files (reopened)
Lars Gustäbel added the comment: I don't have an idea how to make it easier and still meet all/most requirements and without cluttering up the api. The way it currently works allows the programmer to control every tiny aspect of a tar member. Maybe it's best to simply add a new entry to the Examples section of the tarfile documentation.

    import tarfile, io

    with tarfile.open("sample.tar", mode="w") as tar:
        t = tarfile.TarInfo("foo")
        t.type = tarfile.DIRTYPE
        tar.addfile(t)

        b = "Hello world!".encode("ascii")

        t = tarfile.TarInfo("foo/bar")
        t.size = len(b)
        tar.addfile(t, io.BytesIO(b))
[issue22208] tarfile can't add in memory files (reopened)
Lars Gustäbel added the comment: tarfile needs to know the size of a file object beforehand because the tar header is written first followed by the file object's data. If the file object is not based on a real file descriptor, tarfile cannot simply use os.fstat() but the user has to pass the size somehow. And I doubt that it's a good idea to add size arguments to TarFile.add() and .addfile() because it might lead to confusion. I think tarfile is rather good at exposing the important parts of its low-level api to the programmer, in a way that still leaves some work for him to do but without getting in his way. I don't see why manually creating TarInfo objects is such a big deal. It is the far superior way because it offers the maximum freedom for the programmer - admittedly at the cost of a slightly steeper learning curve. And we have to account for many different use cases that people have. For example, you don't mention what you think creating directories from scratch should be like in your opinion. With regard to the usage of the size attribute the documentation for TarFile.addfile() says clearly: """Add the TarInfo object tarinfo to the archive. If fileobj is given, tarinfo.size bytes are read from it and added to the archive. You can create TarInfo objects using gettarinfo()."""
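Because the header has to be written before the data, the size must be known up front. For a seekable file-like object without a file descriptor, one way to obtain it is sketched below. `add_seekable_stream` is a hypothetical helper, not part of tarfile, and it assumes the object supports seek() and tell().

```python
import io
import tarfile


def add_seekable_stream(tar, name, fileobj):
    """Add the rest of a seekable file-like object as member `name`."""
    start = fileobj.tell()
    # seek() returns the new absolute position, i.e. the end of the stream.
    size = fileobj.seek(0, io.SEEK_END) - start
    fileobj.seek(start)

    info = tarfile.TarInfo(name)
    info.size = size            # tarfile reads exactly this many bytes
    tar.addfile(info, fileobj)


buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    add_seekable_stream(tar, "hello.txt", io.BytesIO(b"Hello world!"))
```

For streams that cannot seek, the size would have to come from somewhere else, for example by buffering the data first.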
[issue22208] tarfile can't add in memory files (reopened)
Lars Gustäbel added the comment: Why overcomplicate things?

    import io, tarfile

    with tarfile.open("foo.tar", mode="w") as tar:
        b = "hello world!".encode("utf-8")
        t = tarfile.TarInfo("helloworld.txt")
        t.size = len(b) # this is crucial
        tar.addfile(t, io.BytesIO(b))

My answer to issue10369 was never supposed to be used as a reference on how to add file-like objects to a TarFile. I posted it as a simpler but equivalent version of the code of the original poster, which is why it looks "hackish". I think the documentation on TarFile.gettarinfo() is rather clear on how to use it (i.e. that it needs a file object with a valid file descriptor). Also, I think that the code above is intuitive and simple. -- assignee: -> lars.gustaebel priority: normal -> low type: -> behavior
[issue21987] TarFile.getmember on directory requires trailing slash iff over 100 chars
Lars Gustäbel added the comment: Apparently, the problem is located in TarInfo._proc_gnulong(). I attached a patch. When tarfile reads an archive, it strips trailing slashes from all filenames, except GNUTYPE_LONGNAME headers, which is a bug. tarfile creates GNU_FORMAT tar files by default, hence it uses an additional GNUTYPE_LONGNAME header for filenames >100 chars. That's why tarfile_issue.py does not fail if used with PAX_FORMAT: PAX_FORMAT doesn't have this bug. -- keywords: +patch Added file: http://bugs.python.org/file36045/issue21987.diff
[issue16859] tarfile.TarInfo.fromtarfile does not check read() return value
Lars Gustäbel added the comment: The size of the buffer returned by TarInfo.fromtarfile() is checked by TarInfo.frombuf() which raises either an EmptyHeaderError or TruncatedHeaderError respectively. -- assignee: -> lars.gustaebel resolution: -> not a bug stage: -> resolved status: open -> closed
[issue17153] tarfile extract fails when Unicode in pathname
Lars Gustäbel added the comment: IIRC, tarfile under 2.7 has never been explicitly unicode-safe, support for unicode objects is heterogeneous at best. The obvious work-around is to work exclusively with str objects. What we can't do is to decode the utf-8 pathname from the archive to a unicode object, because we have no way to detect an archive's encoding. We can either emit a warning if the user passes a unicode object to extract() or we implicitly encode the passed unicode object using TarFile.encoding, so that the os.path.join() succeeds. Unfortunately, I am not entirely sure if there was possibly a rationale behind the current behaviour of extract(). This needs more inspection.
[issue21404] Compression level for tarfile/zipfile
Lars Gustäbel added the comment: That's right. But it is there.
[issue21404] Compression level for tarfile/zipfile
Lars Gustäbel added the comment: tarfile.open() actually supports a compresslevel argument for gzip and bzip2 and a preset argument for lzma compression. -- nosy: +lars.gustaebel
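A short sketch of passing the compression level through tarfile.open(). The helper `pack` and the member name are made up for the example; only the gzip variant is executed, and the bzip2 and lzma calls are shown as comments because those modules may be missing from some builds.

```python
import io
import tarfile


def pack(data, mode, **options):
    """Write `data` as a single member and return the archive bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode, **options) as tar:
        info = tarfile.TarInfo("payload.bin")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


payload = b"tarfile " * 20000

fast = pack(payload, "w:gz", compresslevel=1)   # quicker, usually larger
small = pack(payload, "w:gz", compresslevel=9)  # slower, usually smaller

# The same keyword works for bzip2; lzma takes `preset` instead:
#   pack(payload, "w:bz2", compresslevel=9)
#   pack(payload, "w:xz", preset=6)

print(len(payload), len(fast), len(small))
```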
[issue21109] tarfile: Traversal attack vulnerability
Lars Gustäbel added the comment: Let me present for discussion a proposal (and a patch with documentation) with an approach that is a little different, but in my opinion the most effective. I hope that it will appeal to all involved. My proposal consists of a new class SafeTarFile, that is a subclass and drop-in replacement for the TarFile class and can be employed whenever the user feels the necessity. It can be used the same way as TarFile, with the difference that SafeTarFile is equipped with a wide range of tests and as soon as it detects anything bad it interrupts the current operation with a SecurityError exception. This way damage is effectively averted, and it is up to the developer to decide whether he rejects the archive altogether (which is the obvious and recommended measure) or he wants to continue to process it in a subsequent step (on his own responsibility). To simplify a few common operations, SafeTarFile has three more methods: analyze(), filter() and is_safe(). These methods will allow access to the archive without SecurityError exceptions being raised. The analyze() method is a kind of low-level iterator that produces each TarInfo object together with a list of warnings (if the member is bad) as a tuple. This gives a developer access to all the information he needs to implement his own more differentiated way of handling bad archives. The filter() method is a convenience method that provides an iterator over all the "good" members of an archive leaving out all the "bad" ones. It can be used as an argument to SafeTarFile.extractall() for example. is_safe() is a high-level shortcut method that reduces the result of the analysis to a simple True or False. SafeTarFile has a variety of checks that test e.g. for bad pathnames, bad permissions and duplicate files. Also, to prevent denial-of-service scenarios, it enforces user-defined limits upon the archive, such as a maximum number of files or a maximum size of unpacked data.
The main advantage of this approach is the higher degree of security. The practice of rewriting paths (e.g. like in Daniel.Garcia's patch) is error-prone, has side-effects and is hard to maintain because of its tendency towards regression. It just adds another layer of complexity to an already complex and delicate problem. SafeTarFile (or whatever it will be called) is backward compatible and easy to maintain, because it is an isolated addition to the tarfile module. It is easily subclassable to add more tests. It can be used as a standalone tool to check for bad archives and possible denial-of-service scenarios. Its analyze() method tells the user exactly what's wrong with the archive instead of keeping it away from him. Instead of silently extracting files to locations they weren't expected to be stored (i.e. after "fixing" their pathnames), SafeTarFile simply refuses to extract them at all. This way it is far more transparent and understandable to the user what happens. Feedback is welcome. -- assignee: -> lars.gustaebel priority: release blocker -> normal versions: -Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4 Added file: http://bugs.python.org/file35127/safetarfile-1.diff
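To make the analyze() idea more concrete, here is a rough, standalone sketch that yields each member together with a list of warnings. It is not the SafeTarFile class from the attached patch; the function `analyze_members`, its warning strings and its checks are assumptions chosen for illustration, and they cover only a few pathname cases, far fewer than the proposal describes.

```python
import io
import tarfile


def analyze_members(tar):
    """Yield (TarInfo, warnings) pairs; an empty list means nothing was found."""
    for member in tar:
        warnings = []
        if member.name.startswith("/"):
            warnings.append("absolute path")
        if ".." in member.name.split("/"):
            warnings.append("parent directory reference")
        if member.issym() or member.islnk():
            target = member.linkname
            # Deliberately crude: any absolute or ".." target is flagged.
            if target.startswith("/") or ".." in target.split("/"):
                warnings.append("suspicious link target")
        yield member, warnings


def build_demo_archive():
    """Create an in-memory archive with one good and two bad members."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in ("ok.txt", "../evil.txt"):
            info = tarfile.TarInfo(name)
            info.size = 4
            tar.addfile(info, io.BytesIO(b"data"))
        link = tarfile.TarInfo("foo")
        link.type = tarfile.SYMTYPE
        link.linkname = "/etc/passwd"
        tar.addfile(link)
    buf.seek(0)
    return buf


with tarfile.open(fileobj=build_demo_archive()) as tar:
    for member, warnings in analyze_members(tar):
        print(member.name, warnings)
```

A caller could reject the archive as soon as any warning list is non-empty, which mirrors the "refuse rather than rewrite" stance of the proposal.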
[issue21369] Extended modes for tarfile.TarFile()
Lars Gustäbel added the comment: Jup. That's it. -- priority: normal -> low resolution: -> not a bug stage: -> resolved status: open -> closed
[issue21369] Extended modes for tarfile.TarFile()
Lars Gustäbel added the comment: You can pass keyword arguments to tarfile.open(), which will be passed to the TarFile constructor. You can also pass fileobj arguments to tarfile.open().
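As a brief illustration of both points, the sketch below opens a gzip-compressed archive on top of an in-memory file object and hands two TarFile constructor options through tarfile.open(). The chosen options (`format`, `encoding`) and the variable names are just examples.

```python
import io
import tarfile

buf = io.BytesIO()

# mode selects the compression, fileobj the underlying stream; the remaining
# keyword arguments are forwarded to the TarFile constructor.
with tarfile.open(fileobj=buf, mode="w:gz",
                  format=tarfile.PAX_FORMAT, encoding="utf-8") as tar:
    chosen_format = tar.format
    chosen_encoding = tar.encoding
    info = tarfile.TarInfo("note.txt")
    info.size = 2
    tar.addfile(info, io.BytesIO(b"hi"))

print(chosen_format == tarfile.PAX_FORMAT, chosen_encoding)
```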
[issue21369] Extended modes for tarfile.TarFile()
Lars Gustäbel added the comment: That was a design decision. What would be the advantage of having the TarFile class offer the compression itself? -- assignee: -> lars.gustaebel
[issue18321] Multivolume support in tarfile module
Lars Gustäbel added the comment: Okay, let me tell you why I reject your contribution at this point. The patch you submitted may be well-suited for your purposes but it does not meet the requirements of a standard library implementation because it is not generic and comprehensive enough. It contains duplicate code, spelling mistakes and needless code changes e.g. in test_tarfile.py. It does not expose one set of volumes as one tar archive to the user. It is not possible to iterate over all members of all volumes in one go. It does not allow random-access. Actually, it does not implement complete multivolume support but only the "easy" parts. For example, it fails to read GNU tar archives that are split in the middle of a pax header block sequence. The other way around, when writing it makes a split only when it is inside the data part of a member. Hence, it is possible that a volume turns out smaller than max_volume_size which is not only inaccurate but also bad on a tape device. If you decide that you still want multivolume support in tarfile, feel free to reopen this issue with a new and significantly better patch. I gave you a number of clues on what I think is required. -- assignee: -> lars.gustaebel resolution: -> rejected status: open -> closed
[issue18321] Multivolume support in tarfile module
Lars Gustäbel added the comment:

> [...] but remember, we split a volume only in the middle of a big file, not
> in any other case (AFAIK). Hopefully you don't get huge pax headers or
> anything strange. [...]

Hopefully? Sorry, but have you tested this? I did. I let GNU tar create a two volume archive that is split exactly between the two blocks of an XHDTYPE pax header. The result is terrifying. At the beginning of the second volume GNU tar creates an XGLTYPE header as the pax replacement for a GNUTYPE_MULTIVOL header, followed by an XHDTYPE header ("GNUFileParts") that somehow decorates the following REGTYPE(!) tar header that contains the continuation of the split XHDTYPE header data from the previous volume. After that comes the REGTYPE file that the split XHDTYPE header was actually meant for as decoration. I attached the archive to this issue. What happens if a GNUTYPE_LONGNAME header is split in two? I don't wanna know...

> write() will need to take into account blocks (BLOCKSIZE), just to be able to
> split the volumes correctly.

It is mandatory to do the split on a block boundary (a multiple of 512).

> * multivolume logic in write() needs read/write access to the current tarinfo
> being written [...]. How do you propose this object should be accessed from
> write()?

I don't know and this problem seems to be quite hard to address with my approach. That's too bad.

> > BTW, my version of GNU tar refuses to create compressed multiple-volume
> > archives which is why I doubt the usefulness of this feature overall.
> But it has multivolume support right? Which is what I am proposing here.
> Also, you can gzip (or encrypt or anything) the volumes after creating the
> volumes..

Yeah, it has multivolume support, but a very limited one that is not only weird but isn't even usable together with compression. And sure, I can compress and encrypt the volumes afterwards, but I can also create a compressed archive and pipe it through split(1) to split it into parts. Both ways create tar archives that are not readable by GNU tar because they're non-standard. So what? Please tell me, what is your actual personal use-case for this feature?

-- Added file: http://bugs.python.org/file34798/split-xhdtype.tar.gz
[issue21109] tarfile: Traversal attack vulnerability
Lars Gustäbel added the comment:

In the past, our answer to these kinds of bug reports has always been that you must not extract an archive from an untrusted source without making sure that it has no malicious contents, and that tarfile conforms to the POSIX specifications with respect to extraction of files and pathname resolution. That's why we put this prominent warning in the documentation, and I think its advice still holds.

I don't think that this issue should be marked as a release blocker, because the way tarfile currently works was a conscious decision, not an accident. tarfile does what it is designed to do: it processes a sequence of instructions to store a number of files in the filesystem. So the attack that is described by Daniel Garcia exploits neither a bug in tarfile nor a loophole in the tar archive format. A necessary condition for this attack to work is that the attacker has to trick the user into extracting the malicious archive first. After that, tarfile interprets the contained instructions word-for-word but still only within the boundaries defined by the user's privileges.

I think it is obvious that it is potentially dangerous to extract tar archives we didn't create ourselves, because we actually give another person direct access to our filesystem. tarfile could mitigate some of the adverse effects, but this will not change the fact that it remains unsafe to use tarfile to a certain degree unless you use it with your own data or take reasonable precautions.

Anyway, if we come to the conclusion that we want to eliminate this kind of attack, we must be aware that there is a lot more to do than that. tarfile as it is today is vulnerable to all known attacks against tar programs, and maybe even a few more that rely on its specific implementation.

1. Path traversal: The archive contains file names such as /etc/passwd or ../etc/passwd.

2. Symlink file attack: foo links to /etc/passwd. Another member named foo follows, its data overwrites the target file's data.

3. Symlink directory attack: foo links to /etc. The following member foo/passwd overwrites /etc/passwd.

4. Hardlink attack: Hardlink member foo links to /etc/passwd. tarfile creates the hardlink to /etc/passwd because it cannot find it inside the archive and falls back to the one in the filesystem. Another file named foo follows, its data overwrites /etc/passwd's data.

5. Permission manipulation: The archive contains an executable that is placed somewhere in PATH with its setuid flag set, so that an unprivileged user is able to gain root privileges.

6. Device file attacks: The archive contains a device node foo with the same major and minor numbers as an attached device. Another member named foo follows, its data is written to the device.

7. Huge zero file attacks: Bzip2 and lzma make it possible to store huge blobs of repetitive data in tiny archives. When unpacked this data may fill up an entire filesystem.

8. Excessive memory usage: tarfile saves one TarInfo object per member it finds in an archive. If the archive contains several millions of members, this may fill up the memory.

9. Saving a huge sparse file: tarfile is unable to detect holes in sparse files and thus cannot store them efficiently. Archiving a huge sparse file can take very long and may lead to a very big archive that fills up the filesystem.

Additionally, there are more issues mentioned in the GNU tar manual: https://www.gnu.org/software/tar/manual/html_node/Security.html

In conclusion, I would like to emphasize that tarfile is a library, it is no replacement for GNU tar. And as a library it has a different focus, it is merely a building block for an application, and has to be used with a little bit of responsibility. And even if we start to implement all possible checks, I'm afraid we never can do without a warning in the documentation that reminds everyone to keep an eye on what they're doing.

--
Python tracker <http://bugs.python.org/issue21109>
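As an illustration of the "reasonable precautions" mentioned in this comment, the following is a minimal sketch of a check against item 1 (path traversal) only. The helper name `safe_members` and the destination directory name are assumptions made for this example; nothing like it is described in the comment, and it does not address the link, device, or resource exhaustion attacks from the list.

```python
import io
import os
import tarfile


def safe_members(tar, dest):
    """Yield only members whose resolved path stays inside dest.

    Hypothetical helper, not part of tarfile. It covers path traversal
    through member names only; anything that is not a regular file or a
    directory (links, device nodes, fifos) is skipped outright.
    """
    root = os.path.realpath(dest)
    for member in tar:
        if not (member.isfile() or member.isdir()):
            continue
        target = os.path.realpath(os.path.join(root, member.name))
        # The member is accepted only if its target lies below root.
        if os.path.commonpath([root, target]) == root:
            yield member


def build_demo_archive():
    """Build an in-memory archive with one harmless and two hostile names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in ("ok.txt", "../escape.txt", "/etc/passwd"):
            data = b"demo"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


with tarfile.open(fileobj=build_demo_archive(), mode="r") as tar:
    accepted = [m.name for m in safe_members(tar, "unpack-here")]
```

The generator could be passed as the `members` argument of `TarFile.extractall()`, but as the comment stresses, a check like this is only one building block and does not make extraction of untrusted archives safe.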
[issue18321] Multivolume support in tarfile module
Lars Gustäbel added the comment:

> It's also consistent with how the tar command works afaik, just listing the
> contents of the current volume.

No, GNU tar operates on the entirety of the archive and asks for the filename of the subsequent volume every time it hits eof in the current volume.

> You don't want to directly do a plain open in there, because you want to be
> able to deal with read/write modes, with gzip/bzip/Stream class.

The example I gave is based on the idea that there is a TarVolumeSet class in the tarfile module that implements all the required file-object methods (e.g. read(), write(), seek(), etc.) and acts as if the sequence of volumes is actually one big file. It is passed to tarfile.open() as the fileobj argument. This TarVolumeSet class is supposed to be subclassable to let the user implement her/his own mode of operation. This way the open_volume() method can do whatever the user wants it to do. The TarVolumeSet class might as well have a new_volume() method for writing multivol archives, the example only covered the case of reading a multivol archive.

BTW, my version of GNU tar refuses to create compressed multiple-volume archives which is why I doubt the usefulness of this feature overall.

> [...] because a multivol tarfile is not exactly the same as a normal tarfile
> chopped up.

No, I think it is exactly that. The only purpose of the GNUTYPE_MULTIVOL header that is at the start of each subsequent volume is to give GNU tar the ability to detect if it is reading the correct volume. It is not essential and could as well be left out.

--
Python tracker <http://bugs.python.org/issue18321>
[issue18321] Multivolume support in tarfile module
Lars Gustäbel added the comment:

I had the following idea: What about a separate class, let's call it TarVolumeSet for now, that maps a set of (virtual) volumes onto one big file-like object. This TarVolumeSet will be passed to a TarFile constructor as the fileobj argument. It is subclassable for implementing custom behavior.

    class MyTarVolumeSet(tarfile.TarVolumeSet):

        def __init__(self, template):
            self.template = template

        def open_volume(self, volume_number):
            return open(self.template % volume_number, "rb")

    volumes = MyTarVolumeSet("test.tar.%03d")
    with tarfile.open(fileobj=volumes, mode="r:") as tar:
        for t in tar:
            print(t.name)

In my opinion, this approach has a number of advantages: Most importantly, it separates the multi-volume code from the TarFile class, which reduces the invasiveness, complexity and maintenance burden of the original approach. The TarFile class would be totally agnostic about the concept of multiple volumes, TarVolumeSet looks just like another file-object to TarFile. Looks like the cleanest solution to me so far.

--
Python tracker <http://bugs.python.org/issue18321>
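TarVolumeSet is only a proposal in this comment and does not exist in the tarfile module. To illustrate the read-only half of the idea with what tarfile offers today, here is a rough sketch of a file-like object that presents a sequence of volumes as one stream. The class name `VolumeReader`, the helper `make_volumes`, and the in-memory volumes are assumptions for the sake of the example; stream mode "r|" is used because it only requires a read() method.

```python
import io
import tarfile


class VolumeReader:
    """Read-only file-like object that chains several volumes together.

    Hypothetical stand-in for the proposed TarVolumeSet; only read() is
    implemented, which is enough for tarfile's stream mode "r|".
    """

    def __init__(self, volumes):
        self._volumes = iter(volumes)
        self._current = next(self._volumes, None)

    def read(self, size=-1):
        chunks = []
        while self._current is not None and size != 0:
            data = self._current.read(size)
            if data:
                chunks.append(data)
                if size > 0:
                    size -= len(data)
            else:
                # Current volume is exhausted, move on to the next one.
                self._current = next(self._volumes, None)
        return b"".join(chunks)


def make_volumes(names, volume_size=1024):
    """Create a plain tar archive in memory and cut it on block boundaries."""
    assert volume_size % tarfile.BLOCKSIZE == 0
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in names:
            data = name.encode("ascii") * 100
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    raw = buf.getvalue()
    return [io.BytesIO(raw[i:i + volume_size])
            for i in range(0, len(raw), volume_size)]


volumes = make_volumes(["a.txt", "b.txt", "c.txt"])
with tarfile.open(fileobj=VolumeReader(volumes), mode="r|") as tar:
    names = [t.name for t in tar]
```

This treats the volumes as a normal archive chopped up on block boundaries. It does not handle the GNU multi-volume headers, writing, or seeking, all of which the proposed class would have to cover.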
[issue18321] Multivolume support in tarfile module
Lars Gustäbel added the comment:

First of all, I'd like to take back my comment on this patch being too complex for too little benefit. That is no real argument.

Okay, I gave it a shot and I have a few more remarks:

The patch does not support iterating over a multi-volume tar archive, e.g. for TarFile.list(). You must implement this.

In my opinion, a tar archive is one logical unit even if it spans multiple volumes. Thus, it is vital to have .getmembers() and .getnames() reflect the entirety of the archive, e.g. to support "if filename in .getnames()". I think it could be a good idea to store the volume number along with each TarInfo object for random access.

By the way, which standard are you referring to? The only one I know of is POSIX pax which doesn't say anything about multiple volumes.

--
Python tracker <http://bugs.python.org/issue18321>
[issue18321] Multivolume support in tarfile module
Lars Gustäbel added the comment: I cannot yet go into the details, because I have not tested the patch. The comments, docstrings and quoting are not very consistent with the rest of the module. There are a few spelling mistakes. The open_volume() method is more or less a copy of the open() method which is not optimal. The patch adds a lot of complexity to the tarfile module for a use case that only a few connoisseurs benefit from. It seems to alter some inherent TarFile mechanics that people might rely on, e.g. the members attribute contains only the members stored in the current volume not the overall entirety of members. Does this patch reliably allow random-access? Would it be possible/easier to add the same functionality using a separate class MultiVolumeTarFile instead? -- ___ Python tracker <http://bugs.python.org/issue18321> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13477] tarfile module should have a command line
Lars Gustäbel added the comment: I'd like to re-emphasize that it is best to keep the whole thing as simple and straight-forward as possible. Offer some basic operations and that's it. Although I am pretty accustomed to the original tar command line, I think we should copy zipfile's interface. It makes more sense to offer some kind of unified "Python" command line approach for archive access than keeping to old traditions. I agree with Victor that we don't really need support for stdin/stdout. It only complicates matters. If everybody still votes for stdin/stdout, I'd like to point out that tarfile supports compression detection for streams. It would be best to use mode="r|*" throughout because it works for both normal files and stdin. Use mode="w|(compression)" for writing to files and stdout accordingly. If we do not support stdin/stdout we no longer need all these compression options because for reading we do autodetection and for writing we could deduce the compression from the file extension (which is just some kind of autodetection too). Another side note: We should be aware of the effects discussed in issue17102 and issue1044. In my opinion tarfile as a library is obligated to behave like that, but maybe that's not acceptable for a command line tool. -- ___ Python tracker <http://bugs.python.org/issue13477> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
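The stream modes mentioned in this comment can be tried without stdin or stdout by using in-memory buffers. This is a small sketch, not the proposed command line; the function names `list_stream` and `make_archive` are made up for the example.

```python
import io
import tarfile


def list_stream(fileobj):
    """List member names from a possibly compressed tar stream.

    Mode "r|*" reads sequentially and detects the compression itself,
    so the same call works for regular files and for pipes such as stdin.
    """
    with tarfile.open(fileobj=fileobj, mode="r|*") as tar:
        return [member.name for member in tar]


def make_archive(mode):
    """Write a one-member archive to memory using the given stream mode."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode) as tar:
        data = b"hello"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


plain = list_stream(make_archive("w|"))
gzipped = list_stream(make_archive("w|gz"))
```

A command line tool could pass `sys.stdin.buffer` to `list_stream()` in the same way, since stream mode never seeks.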
[issue15950] open() should not accept bool argument
New submission from Lars Gustäbel:

Today I accidentally did this:

    open(True).read()

Passing True as a file argument to open() does not fail, because a bool value is treated like an integer file descriptor (stdout in this case). Even worse, the read() call hangs in an endless loop on my Linux box. On Windows I get an EBADF at least.

Wouldn't it be better if open() checked explicitly for a bool argument and raised a TypeError?

--
components: IO
messages: 170550
nosy: lars.gustaebel
priority: normal
severity: normal
status: open
title: open() should not accept bool argument
type: behavior
versions: Python 3.2, Python 3.3

Python tracker <http://bugs.python.org/issue15950>
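The check suggested in this report can be sketched as a wrapper around the builtin. The name `strict_open` is hypothetical and this is not how open() itself behaves; it only shows the explicit bool test the report asks for.

```python
import builtins


def strict_open(file, *args, **kwargs):
    """Like open(), but refuse bool arguments (hypothetical wrapper)."""
    # bool is a subclass of int, so True would otherwise be taken as
    # file descriptor 1 and False as file descriptor 0.
    if isinstance(file, bool):
        raise TypeError("file must not be a bool, got %r" % (file,))
    return builtins.open(file, *args, **kwargs)


try:
    strict_open(True)
except TypeError as exc:
    rejected = str(exc)
else:
    rejected = None
```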
[issue15875] tarfile may not make @LongLink for non-ascii character
Lars Gustäbel added the comment:

I prepared a patch that fixes this issue and adds a few tests. Please check whether it works for you.

--
keywords: +patch
stage: -> patch review
Added file: http://bugs.python.org/file27152/issue15875.diff
Python tracker <http://bugs.python.org/issue15875>
[issue15875] tarfile may not make @LongLink for non-ascii character
Changes by Lars Gustäbel : -- assignee: -> lars.gustaebel nosy: +lars.gustaebel versions: +Python 3.3 ___ Python tracker <http://bugs.python.org/issue15875> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue15858] tarfile missing entries due to omitted uid/gid fields
Lars Gustäbel added the comment: Could you provide some sample data and code? I see the problem, but I cannot quite reproduce the behaviour you describe. In all of my testcases tarfile either throws an exception or successfully reads the archive, but never silently stops. -- assignee: -> lars.gustaebel ___ Python tracker <http://bugs.python.org/issue15858> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14810] Bug in tarfile
Changes by Lars Gustäbel : -- assignee: -> lars.gustaebel ___ Python tracker <http://bugs.python.org/issue14810> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14810] Bug in tarfile
Lars Gustäbel added the comment: This issue is related to issue13158 which deals with a GNU tar specific extension to the original tar format. In that issue a negative number in the uid/gid fields caused problems. In your case the problem is a negative mtime field. Reading these particular number fields was fixed in Python 3.2. You might be able to read the archive in question with that version. You should definitely try that. Besides that, I was unable to reproduce the error you report. I just did some tests and could not even open my test archive, because it was not recognized as a tar file. I didn't come as far as the os.utime() call. -- nosy: +lars.gustaebel ___ Python tracker <http://bugs.python.org/issue14810> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
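The number fields discussed here are normally stored as octal text; the GNU extension stores values that do not fit, including negative ones, in a binary base-256 form. The following is a rough sketch of decoding such a field. It assumes that the first byte acts purely as a marker (0x80 for positive, 0xFF for negative values), and `decode_number_field` is an illustrative name, not a tarfile function.

```python
def decode_number_field(field):
    """Decode a tar header number field (octal text or GNU base-256)."""
    if field[0] in (0x80, 0xFF):
        # GNU extension: the remaining bytes are a big-endian integer.
        value = int.from_bytes(field[1:], "big")
        if field[0] == 0xFF:
            # Negative numbers are stored in two's complement form.
            value -= 256 ** (len(field) - 1)
        return value
    # Classic format: octal digits, padded with NULs or spaces.
    text = field.split(b"\0", 1)[0].decode("ascii").strip()
    return int(text or "0", 8)


mode = decode_number_field(b"0000644\0")
negative_mtime = decode_number_field(b"\xff" * 12)
large_uid = decode_number_field(b"\x80" + (2 ** 40).to_bytes(7, "big"))
```

A reader that only understands the octal form fails on the two binary fields above, which is the kind of failure described for the uid/gid and mtime fields.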
[issue14807] Move tarfile.filemode() into stat module
Changes by Lars Gustäbel : -- nosy: +lars.gustaebel ___ Python tracker <http://bugs.python.org/issue14807> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13815] tarfile.ExFileObject can't be wrapped using io.TextIOWrapper
Lars Gustäbel added the comment: Okay, I close this issue now, as I think the problems are now resolved. -- status: open -> closed ___ Python tracker <http://bugs.python.org/issue13815> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13815] tarfile.ExFileObject can't be wrapped using io.TextIOWrapper
Lars Gustäbel added the comment: Okay, I attached a patch that I hope we can all agree upon. It restores the ExFileObject class as a small subclass of BufferedReader as Amaury suggested. Does the documentation have to be changed, too? It states that an io.BufferedReader object is returned by extractfile() not a subclass thereof. -- Added file: http://bugs.python.org/file25516/tarfile-exfileobj.diff ___ Python tracker <http://bugs.python.org/issue13815> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13815] tarfile.ExFileObject can't be wrapped using io.TextIOWrapper
Lars Gustäbel added the comment: In an earlier draft of my patch, I had kept ExFileObject as a subclass of BufferedReader, but I later decided against it. To use BufferedReader directly is in my opinion the cleaner solution. I admit that the change is not fully backward compatible. But a user can still write code that works for both 3.3 and the versions before. If he didn't subclass ExFileObject his code doesn't even need a change. If he subclassed ExFileObject, he might have a problem in either case: either the ExFileObject class is missing, or he may be unable to use it the way he did before, because all that's left of it is a stub subclass of BufferedReader. I am well aware that backward compatibility is most important, but I think it must still be allowed to change internal (and undocumented) APIs every now and then to clean things up a little. And of course, I did a code search before too, and found no code using ExFileObject. This actually doesn't surprise me, as there is really not much you can do with it. -- ___ Python tracker <http://bugs.python.org/issue13815> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13815] tarfile.ExFileObject can't be wrapped using io.TextIOWrapper
Lars Gustäbel added the comment: I did some tarfile spring cleaning: I removed the ExFileObject class completely as it was more or less a leftover from the old days. io.BufferedReader now does the job. So, as a side-effect, I close this issue as fixed. (BTW, this makes tarfile.py smaller by about 100 lines.) -- resolution: -> fixed stage: patch review -> committed/rejected status: open -> closed ___ Python tracker <http://bugs.python.org/issue13815> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14160] TarFile.extractfile fails to extract targets of top-level relative symlinks
Lars Gustäbel added the comment: Fixed. Thanks for the report. -- resolution: -> fixed status: open -> closed ___ Python tracker <http://bugs.python.org/issue14160> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10369] tarfile requires an actual file on disc; a file-like object is insufficient
Changes by Lars Gustäbel : -- resolution: -> invalid stage: -> committed/rejected status: open -> closed ___ Python tracker <http://bugs.python.org/issue10369> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14160] TarFile.extractfile fails to extract targets of top-level relative symlinks
Lars Gustäbel added the comment: Thanks for the report. Attached is a patch (against 3.2) that is supposed to fix the problem. -- keywords: +patch stage: -> patch review Added file: http://bugs.python.org/file24735/issue14160.diff ___ Python tracker <http://bugs.python.org/issue14160> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14160] TarFile.extractfile fails to extract targets of top-level relative symlinks
Changes by Lars Gustäbel : -- assignee: -> lars.gustaebel ___ Python tracker <http://bugs.python.org/issue14160> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14012] Misc tarfile fixes
Lars Gustäbel added the comment:

I updated your patch:

- I removed the "import as" bit completely and changed all occurrences of _open() to builtins.open() which is more readable and explanatory.

- I object to changing the error messages in the 3.2 branch due to backwards compatibility, although I left them in the patch for now. (I changed the style of %-formatting with a single item tuple in order to match the coding style of the rest of the module.)

- I inlined the shutil.copyfileobj() function to remove the shutil import.

--
Added file: http://bugs.python.org/file24601/tarfile-misc-bugs-3.2-2.diff
Python tracker <http://bugs.python.org/issue14012>
[issue14056] Misc doc changes for tarfile
Lars Gustäbel added the comment:

a) Good point, a case of sloppy naming.

b) IMO a table is a tad too much. The number of different compression methods is still quite small. My patch proposes a simpler approach.

c) A link to shutil is very useful.

BTW, thanks for the effort.

--
Added file: http://bugs.python.org/file24598/lars-comment.diff
Python tracker <http://bugs.python.org/issue14056>
[issue14013] tarfile should expose supported formats
Lars Gustäbel added the comment: I think this is a reasonable proposal. I think it is good style to let tarfile figure out which supported compression methods are available instead of shutil or the user. So far I have no objections. Following 3.3's crypt module, I think the name `methods' is superior to `formats' (maybe `compression_methods' is even better). Also, crypt's concept of a sorted list from stronger to weaker could also make sense here: ["xz", "bz2", "gz"]. Why not? -- ___ Python tracker <http://bugs.python.org/issue14013> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
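The idea of letting the library report what it supports, sorted from stronger to weaker, could look roughly like this. It is only a sketch: `compression_methods` is a hypothetical name, tarfile does not provide such a function in this discussion, and the mapping from method name to implementing module is an assumption.

```python
import importlib


def compression_methods():
    """Return the usable compression methods, strongest first.

    Hypothetical helper. A method counts as available when the module
    assumed to implement it can be imported.
    """
    candidates = [("xz", "lzma"), ("bz2", "bz2"), ("gz", "zlib")]
    available = []
    for method, module in candidates:
        try:
            importlib.import_module(module)
        except ImportError:
            continue  # module was not built for this interpreter
        available.append(method)
    return available


methods = compression_methods()
```

Callers such as shutil could then pick `methods[0]` as a default instead of probing for the modules themselves.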
[issue14012] Misc tarfile fixes
Changes by Lars Gustäbel : -- assignee: -> lars.gustaebel ___ Python tracker <http://bugs.python.org/issue14012> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13935] Tarfile - Fixed GNU tar header base-256 handling
Lars Gustäbel added the comment: This has been fixed (issue13158, http://hg.python.org/cpython/rev/341008eab87d). Thanks anyway for the report. -- resolution: -> duplicate stage: -> committed/rejected status: open -> closed ___ Python tracker <http://bugs.python.org/issue13935> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13935] Tarfile - Fixed GNU tar header base-256 handling
Changes by Lars Gustäbel : -- assignee: -> lars.gustaebel nosy: +lars.gustaebel ___ Python tracker <http://bugs.python.org/issue13935> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13815] tarfile.ExFileObject can't be wrapped using io.TextIOWrapper
Changes by Lars Gustäbel : -- assignee: -> lars.gustaebel nosy: +lars.gustaebel ___ Python tracker <http://bugs.python.org/issue13815> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12926] tarfile tarinfo.extract*() broken with symlinks
Lars Gustäbel added the comment: This should be fixed now, thanks. -- resolution: -> fixed stage: -> committed/rejected status: open -> closed versions: +Python 3.3 ___ Python tracker <http://bugs.python.org/issue12926> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13702] relative symlinks in tarfile.extract broken (windows)
Lars Gustäbel added the comment: The dereference option is only used for archive creation, so the contents of the file a symbolic link is pointing to is added instead of the symbolic link itself. -- ___ Python tracker <http://bugs.python.org/issue13702> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13702] relative symlinks in tarfile.extract broken (windows)
Lars Gustäbel added the comment:

You actually hit two bugs at the same time here: The target of the created symlink was not translated from unix to windows path delimiters and is therefore broken. The second bug is issue12926 which leads to the error in TarFile.makefile().

Brian, AFAIK all file-specific functions on windows accept forward slashes in pathnames, right? Has this been discussed in the course of the windows implementation of os.symlink()? I could certainly fix the slash translation in tarfile.py, but maybe it's os.symlink() that should be fixed.

--
dependencies: +tarfile tarinfo.extract*() broken with symlinks
Python tracker <http://bugs.python.org/issue13702>
[issue13702] relative symlinks in tarfile.extract broken (windows)
Changes by Lars Gustäbel : -- assignee: -> lars.gustaebel nosy: +lars.gustaebel versions: +Python 3.3 ___ Python tracker <http://bugs.python.org/issue13702> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13639] UnicodeDecodeError when creating tar.gz with unicode name
Lars Gustäbel added the comment:

I think we should wrap this up as soon as possible, because it has already absorbed too much of our time. The issue we discuss here is a tiny glitch triggered by a corner-case. My original idea was to fix it in a minimal sort of way that is backwards-compatible.

There are at least 4 different solutions now:

1. Keep the patch.

2. Revert the patch, leave everything as it was as wontfix.

3. Don't write an FNAME field at all if the filename that is passed is a unicode string.

4. Rewrite the FNAME code the way Terry suggests. This seems to me like the most complex solution, because we have to fix gzip.py as well, because the code in question was originally taken from the gzip module. (BTW, both the tarfile and gzip module discard the FNAME field when a file is opened for reading.)

My favorites are 1 and 3 ;-)

--
assignee: -> lars.gustaebel
priority: normal -> low
Python tracker <http://bugs.python.org/issue13639>
[issue13639] UnicodeDecodeError when creating tar.gz with unicode name
Lars Gustäbel added the comment: I thought about that myself, too. It is clearly no new feature, it is really more some kind of a fix. Unicode pathnames given to tarfile.open() are just passed through to the open() function, which is why this always has been working, except for this particular case. There are 6 different possible write modes: "w:", "w:gz", "w:bz2", "w|", "w|gz" and "w|bz2". And the only one not working with a unicode pathname is "w|gz". Although admittedly tarfile.open() is not supposed to be used with a unicode path, people do it anyway, because they don't care, and because it works. The patch does not add a new broad functionality, it merely harmonises the way the six write modes work. Neither can we retroactively enforce using string pathnames at this point, nor should we let a user run into this strange error. The patch is very small and minimally invasive. The error message you get without the patch is completely incomprehensible. -- ___ Python tracker <http://bugs.python.org/issue13639> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue5689] Support xz compression in tarfile module
Lars Gustäbel added the comment: Yes, that's much better. Thanks for the tip. -- Added file: http://bugs.python.org/file24086/lzma-preset.diff ___ Python tracker <http://bugs.python.org/issue5689> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue5689] Support xz compression in tarfile module
Changes by Lars Gustäbel : Removed file: http://bugs.python.org/file24084/lzma-preset.diff ___ Python tracker <http://bugs.python.org/issue5689> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue5689] Support xz compression in tarfile module
Lars Gustäbel added the comment: Wouldn't it be better then to use a default compresslevel of 6 in tarfile? I used level 9 in my patch without a particular reason, just because I thought 9 must be better than 6 ;-) -- Added file: http://bugs.python.org/file24084/lzma-preset.diff ___ Python tracker <http://bugs.python.org/issue5689> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13639] UnicodeDecodeError when creating tar.gz with unicode name
Lars Gustäbel added the comment: See http://bugs.python.org/issue11638#msg150029 -- ___ Python tracker <http://bugs.python.org/issue13639> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue11638] python setup.py sdist --formats tar* crashes if version is unicode
Lars Gustäbel added the comment:

Just for the record: The gzip format (defined in RFC 1952) allows storing the original filename (without the .gz suffix) in an additional field in the header (the FNAME field). Latin-1 (iso-8859-1) is required. It is ironic that this causes so much trouble, because it is never used. A gzip file without that field is perfectly valid. The gzip program for example stores the original filename by default but does not use it when decompressing unless it is explicitly told to do so with the -N/--name option. If no FNAME field is present in a gzipped file the gzip program just falls back on stripping the .gz suffix.

--
Python tracker <http://bugs.python.org/issue11638>
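To make the FNAME field concrete, here is a short sketch that reads it back from the header of a gzip member, following the header layout in RFC 1952. The function names `read_fname` and `compress` are chosen for this example only.

```python
import gzip
import io

FEXTRA = 0x04  # header flag: optional extra field present
FNAME = 0x08   # header flag: original file name present


def read_fname(data):
    """Return the FNAME field of a gzip member, or None if it is absent."""
    if data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = data[3]
    pos = 10  # fixed part of the header
    if flags & FEXTRA:
        # Skip the optional extra field: 2-byte little-endian length + data.
        pos += 2 + int.from_bytes(data[pos:pos + 2], "little")
    if not flags & FNAME:
        return None
    end = data.index(b"\0", pos)  # the name is NUL-terminated
    return data[pos:end].decode("latin-1")


def compress(payload, filename):
    """Compress payload in memory, passing filename on to GzipFile."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=filename, mode="wb", fileobj=buf) as f:
        f.write(payload)
    return buf.getvalue()


with_name = read_fname(compress(b"data", "report.txt"))
without_name = read_fname(compress(b"data", ""))
```

Both outputs decompress to the same payload, which matches the point made in this comment that a gzip file without the field is perfectly valid.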
[issue13639] UnicodeDecodeError when creating tar.gz with unicode name
Lars Gustäbel added the comment: tarfile under Python 2.x is not particularly designed to support unicode filenames (the gzip module does not support them either), but that should not be too hard to fix. -- keywords: +patch Added file: http://bugs.python.org/file24066/tarfile-stream-gzip-unicode-fix.diff ___ Python tracker <http://bugs.python.org/issue13639> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue11638] python setup.py sdist --formats tar* crashes if version is unicode
Lars Gustäbel added the comment: Is there a good reason why the tarfile mode that is used is "w|gz"? It seems to me that this is not necessary, "w:gz" should be enough. "w|gz" is for special operations only (see the tarfile docs). -- nosy: +lars.gustaebel Added file: http://bugs.python.org/file24065/distutils_tarfile_fix.diff ___ Python tracker <http://bugs.python.org/issue11638> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue5689] Support xz compression in tarfile module
Lars Gustäbel added the comment: Please, go ahead! -- ___ Python tracker <http://bugs.python.org/issue5689> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue5689] Support xz compression in tarfile module
Lars Gustäbel added the comment: Thanks for the review, guys! I can't close this issue yet because it depends on #6715. -- resolution: -> fixed stage: needs patch -> committed/rejected ___ Python tracker <http://bugs.python.org/issue5689> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue5689] Support xz compression in tarfile module
Lars Gustäbel added the comment: For those who want to test it first, I am posting the current state of the patch here. It is ready for commit; there are no failing tests. If nobody objects, I will apply it this weekend. -- Added file: http://bugs.python.org/file23880/2011-12-08-tarfile-lzma.diff ___ Python tracker <http://bugs.python.org/issue5689> ___
[issue5689] Support xz compression in tarfile module
Lars Gustäbel added the comment: I will be happy to, but my spare time is limited right now, so this could take about a week. If this is a problem, please go ahead. -- ___ Python tracker <http://bugs.python.org/issue5689> ___
[issue13477] tarfile module should have a command line
Lars Gustäbel added the comment: This is not a bad idea. I recommend keeping it as simple as possible. I would definitely not be supportive of a full tar clone. List, extract, create: that should be enough. There are two possible command line choices: do what the zipfile module does or emulate tar. I am in favor of the latter. -- assignee: -> lars.gustaebel priority: normal -> low stage: test needed -> needs patch ___ Python tracker <http://bugs.python.org/issue13477> ___
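A deliberately small command line limited to list, extract and create could look roughly like the sketch below. It is an illustration for Python 3 only, not the patch discussed in this issue; the function name main and the option letters are assumptions chosen for the example, and the extraction filter is passed only where the running tarfile module provides one.

```python
import argparse
import tarfile


def main(argv=None):
    """Tiny tar-like front end: list, extract, create and nothing else."""
    parser = argparse.ArgumentParser(description="minimal tarfile command line (sketch)")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-l", "--list", help="archive to list")
    group.add_argument("-e", "--extract", nargs="+", help="archive [destination]")
    group.add_argument("-c", "--create", nargs="+", help="archive source [source ...]")
    args = parser.parse_args(argv)

    if args.list:
        with tarfile.open(args.list) as tar:
            names = tar.getnames()
        print("\n".join(names))
        return names

    if args.extract:
        archive, *rest = args.extract
        dest = rest[0] if rest else "."
        # Use the safer "data" filter where this Python version supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with tarfile.open(archive) as tar:
            tar.extractall(dest, **kwargs)
        return dest

    archive, *sources = args.create
    with tarfile.open(archive, "w") as tar:
        for source in sources:
            tar.add(source)
    return archive
```

Calling main(["-c", "backup.tar", "somedir"]) followed by main(["-l", "backup.tar"]) would create and then list an archive; the file names here are placeholders.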
[issue13407] tarfile.getnames misses members again
Lars Gustäbel added the comment: Some testing reveals that the bz2 module in Python versions before 3.3 cannot fully decompress the file in question. Only the first 900k are decompressed. Thus, this issue is not related to issue13158 or the tarfile module. -- nosy: +lars.gustaebel ___ Python tracker <http://bugs.python.org/issue13407> ___
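The comment does not name the cause of the truncated output. One plausible explanation, assumed here and not confirmed by the tracker message, is a file made of several concatenated bz2 streams, of which only the first is decoded. The sketch below (Python 3.3 or later) shows that effect with a single bz2.BZ2Decompressor, which stops at the end of the first stream, next to bz2.decompress(), which reads all streams.

```python
import bz2

# Two independent bz2 streams glued together, as some parallel
# compressors produce (assumption: this mimics the problematic file).
first = bz2.compress(b"A" * 1000)
second = bz2.compress(b"B" * 1000)
multi_stream = first + second

# A single decompressor object handles exactly one stream.
single = bz2.BZ2Decompressor()
head = single.decompress(multi_stream)
leftover = single.unused_data      # the untouched second stream

# bz2.decompress() in Python 3.3+ continues with the following streams.
everything = bz2.decompress(multi_stream)

print(len(head), len(everything))
```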
[issue13158] tarfile.TarFile.getmembers misses some entries
Lars Gustäbel added the comment: Thanks for the report. There was a problem decoding a special and rare kind of header field in the archive. The format of the archive is of very bad quality BTW ;-) -- resolution: -> fixed stage: -> committed/rejected status: open -> closed ___ Python tracker <http://bugs.python.org/issue13158> ___
[issue13158] tarfile.TarFile.getmembers misses some entries
Changes by Lars Gustäbel : -- assignee: -> lars.gustaebel nosy: +lars.gustaebel versions: +Python 3.3 ___ Python tracker <http://bugs.python.org/issue13158> ___
[issue13031] [PATCH] small speed-up for tarfile.py when unzipping tarballs
Changes by Lars Gustäbel : -- assignee: -> lars.gustaebel nosy: +lars.gustaebel priority: normal -> low versions: +Python 3.3 -Python 2.7, Python 3.2 ___ Python tracker <http://bugs.python.org/issue13031> ___
[issue6715] xz compressor support
Lars Gustäbel added the comment: Today I played around with lzma support for tarfile based on your last patch (see issue5689). There are a few minor issues that I just wanted to mention, as they break the tarfile testsuite:
- LZMAFile does not expose a name attribute. BZ2File doesn't either (not in 3.x anyway), but GzipFile does.
- LZMAFile does not allow a 'b' in the mode argument, unlike GzipFile and BZ2File.
- The bz2 module exposes many error conditions as standard Python exceptions, e.g. IOError, EOFError. The lzma module uses LZMAError for all errors without distinction.
-- ___ Python tracker <http://bugs.python.org/issue6715> ___
[issue5689] Support xz compression in tarfile module
Lars Gustäbel added the comment: Attached is a patch with the current state of my work on lzma integration into tarfile (17 test errors). -- assignee: -> lars.gustaebel keywords: +patch Added file: http://bugs.python.org/file23162/2011-09-15-tarfile-lzma.diff ___ Python tracker <http://bugs.python.org/issue5689> ___
[issue12800] 'tarfile.StreamError: seeking backwards is not allowed' when extract symlink
Changes by Lars Gustäbel : -- assignee: -> lars.gustaebel ___ Python tracker <http://bugs.python.org/issue12800> ___
[issue12926] tarfile tarinfo.extract*() broken with symlinks
Changes by Lars Gustäbel : -- assignee: -> lars.gustaebel ___ Python tracker <http://bugs.python.org/issue12926> ___
[issue12841] Incorrect tarfile.py extraction
Lars Gustäbel added the comment: It's the low-level operating system aspects of tarfile that are very difficult to test, e.g. filesystem- and operating-system-dependent features such as symbolic links, hard links, file permissions and ownership. It is not even possible to reliably determine the filesystem the testsuite currently runs on. Also, superuser privileges are needed for some operations to work, e.g. chown(). A testsuite is normally not run as root, so a test that depends on this will never get enough coverage. -- ___ Python tracker <http://bugs.python.org/issue12841> ___
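The usual workaround is to probe the environment and skip what cannot be exercised, which is exactly why such tests rarely run. The sketch below is an illustration for Python 3, not code from the tarfile testsuite; can_symlink, is_superuser and OwnershipTest are hypothetical names, and uid/gid 1 is an arbitrary target chosen for the example.

```python
import os
import tempfile
import unittest


def can_symlink():
    """Probe whether the current platform and temp filesystem allow symlinks."""
    if not hasattr(os, "symlink"):
        return False
    with tempfile.TemporaryDirectory() as tmp:
        try:
            os.symlink(os.path.join(tmp, "target"), os.path.join(tmp, "link"))
        except (OSError, NotImplementedError):
            return False
    return True


def is_superuser():
    """True only where geteuid() exists and reports uid 0."""
    return hasattr(os, "geteuid") and os.geteuid() == 0


class OwnershipTest(unittest.TestCase):
    @unittest.skipUnless(is_superuser(), "chown() to another user needs root")
    def test_chown(self):
        # Skipped in a normal run, so this path gets little coverage.
        with tempfile.NamedTemporaryFile() as f:
            os.chown(f.name, 1, 1)
            st = os.stat(f.name)
            self.assertEqual((st.st_uid, st.st_gid), (1, 1))

    @unittest.skipUnless(can_symlink(), "symlinks not supported here")
    def test_symlink(self):
        with tempfile.TemporaryDirectory() as tmp:
            link = os.path.join(tmp, "link")
            os.symlink("target", link)
            self.assertTrue(os.path.islink(link))
```

Note that the probe only says something about the temporary directory it ran in, which reflects the point above that the filesystem under test cannot be determined reliably.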
[issue12841] Incorrect tarfile.py extraction
Lars Gustäbel added the comment: Closing as fixed. Thanks all! -- resolution: -> fixed stage: -> committed/rejected status: open -> closed ___ Python tracker <http://bugs.python.org/issue12841> ___