from:"Lars Gustäbel"

[issue30661] Support tarfile.PAX_FORMAT in shutil.make_archive

2019-04-01 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

tarfile does not use the `format` argument for reading; the format is detected 
automatically. You can even mix different formats in one archive and tarfile 
will be fine with it.
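
A minimal sketch of this, assuming an in-memory archive. The helper name 
`roundtrip_names` is made up for illustration; `format` is only passed on the 
writing side, and the reading call does not mention it.

```python
import io
import tarfile


def roundtrip_names(fmt):
    """Write a one-member archive in the given format, then read it back.

    `roundtrip_names` is an illustrative helper, not part of tarfile.
    """
    payload = b"hello"
    buf = io.BytesIO()
    # `format` only matters when writing.
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

    buf.seek(0)
    # No `format` argument here: the reader works out the format itself.
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return tar.getnames()
```

Calling it with tarfile.USTAR_FORMAT, tarfile.GNU_FORMAT or tarfile.PAX_FORMAT 
goes through the same reading code each time.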

--
nosy: +lars.gustaebel

___
Python tracker 
<https://bugs.python.org/issue30661>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue30438] tarfile would fail to extract tarballs with files under R/O directories (twice)

2017-05-25 Thread Lars Gustäbel


Lars Gustäbel added the comment:

Actually, it is not prohibited to add the same file to the same archive more 
than once.

--
nosy: +lars.gustaebel

___
Python tracker 
<http://bugs.python.org/issue30438>
___

[issue27590] tarfile module next() method hides exceptions

2016-07-27 Thread Lars Gustäbel


Lars Gustäbel added the comment:

After all these years, it is not that easy to say why the decision to swallow 
this exception was made. One part surely was a lack of experience with the tar 
format itself and all of its implementations. The other part, I guess, was that 
it was supposed to avoid problems in case users did not use TarFile as an 
iterator. tarfile was developed on Python 2.2, which was the first release to 
feature iterators. The problem with random access on a tarfile, or with calling 
TarFile.getmembers(), is that all the headers must be collected first. If 
this fails somewhere in the middle, there is no way to resume the current 
operation and you get nothing out of the archive.

--

___
Python tracker 
<http://bugs.python.org/issue27590>
___

[issue27590] tarfile module next() method hides exceptions

2016-07-24 Thread Lars Gustäbel


Lars Gustäbel added the comment:

The question is what you're trying to accomplish. If you just want to prevent 
tarfile from stopping at the first invalid header in order to extract 
everything following it, you may use the ignore_zeros=True keyword argument.
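
A small sketch of the effect, assuming two archives glued together so that the 
zero blocks ending the first one sit in front of the second one's header. The 
helper names are made up for illustration.

```python
import io
import tarfile


def single_member_archive(name, payload):
    """Return the bytes of a complete tar archive holding one file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_names(data, ignore_zeros):
    """List member names, optionally skipping empty and invalid blocks."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:",
                      ignore_zeros=ignore_zeros) as tar:
        return tar.getnames()


# Two archives glued together: the end-of-archive marker of the first one
# (a run of zero blocks) comes before the header of the second one.
glued = (single_member_archive("a.txt", b"A")
         + single_member_archive("b.txt", b"B"))
```

With ignore_zeros=False the listing stops after the first archive; with 
ignore_zeros=True tarfile keeps going and also finds the second member.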

--

___
Python tracker 
<http://bugs.python.org/issue27590>
___

[issue23228] The tarfile module crashes when tarfile contains a symlink and unpack directory contain it too

2016-05-08 Thread Lars Gustäbel


Lars Gustäbel added the comment:

I suck :-) It is hg revision bb94f6222fef.

--

___
Python tracker 
<http://bugs.python.org/issue23228>
___

[issue23228] The tarfile module crashes when tarfile contains a symlink and unpack directory contain it too

2016-05-08 Thread Lars Gustäbel


Lars Gustäbel added the comment:

TarFile.makelink() has a fallback mode in case the platform does not support 
links. Instead of a symlink or a hardlink it extracts the file it points to as 
long as it exists in the current archive.

More precisely, makelink() calls os.symlink() and if one of the exceptions in 
the symlink_exception tuple is raised, it goes into fallback mode. r80944 
introduced a regression because it replaced the WindowsError in 
symlink_exception with an OSError, which is much less specific than a 
WindowsError. Since that change, the fallback is used every time an OSError 
occurs. In Michael's case it is a FileExistsError, because the symlink is 
already there.

The attached patch restores the old behavior. This might not be what you 
wanted, Michael, but at least, tarfile no longer crashes.
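
To illustrate why the width of the exception tuple matters, here is a 
simplified sketch. It is not tarfile's code: `create_link`, 
`symlink_already_exists` and `outcome` are hypothetical names, and 
NotImplementedError merely stands in for a narrow, platform-specific tuple.

```python
def create_link(make_symlink, fallback_exceptions):
    """Hypothetical stand-in for the fallback logic described above.

    This is not tarfile's implementation; it only shows how the width of
    the exception tuple decides when the fallback is taken.
    """
    try:
        make_symlink()
    except fallback_exceptions:
        return "fallback"
    return "symlink"


def symlink_already_exists():
    # Simulates os.symlink() failing because the link is already there.
    raise FileExistsError("symlink is already there")


def outcome(fallback_exceptions):
    """Report what happens for a given exception tuple."""
    try:
        return create_link(symlink_already_exists, fallback_exceptions)
    except FileExistsError:
        return "error propagates"
```

Because FileExistsError is a subclass of OSError, a tuple containing OSError 
sends this case into the fallback, while a narrow tuple lets the error through.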

--
Added file: http://bugs.python.org/file42780/windowserror.diff

___
Python tracker 
<http://bugs.python.org/issue23228>
___

[issue26877] tarfile use wrong code when read from fileobj

2016-04-30 Thread Lars Gustäbel


Lars Gustäbel added the comment:

Please give us some example test code that shows us what goes wrong exactly.

--

___
Python tracker 
<http://bugs.python.org/issue26877>
___

[issue8978] "tarfile.ReadError: file could not be opened successfully" if compiled without zlib

2016-04-20 Thread Lars Gustäbel


Lars Gustäbel added the comment:

Closed after years of inactivity.

--
resolution:  -> works for me
stage:  -> resolved
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue8978>
___

[issue24838] tarfile.py: fix GNU and USTAR formats to properly handle paths with special characters that are encoded with more than one byte each

2016-04-19 Thread Lars Gustäbel


Lars Gustäbel added the comment:

Sorry for the glitch, I suppose everything works fine now.

--
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue24838>
___

[issue10261] tarfile iterator without members caching

2016-04-19 Thread Lars Gustäbel


Lars Gustäbel added the comment:

Closing after six years of inactivity.

--
resolution:  -> wont fix
stage:  -> resolved
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue10261>
___

[issue24838] tarfile.py: fix GNU and USTAR formats to properly handle paths with special characters that are encoded with more than one byte each

2016-04-18 Thread Lars Gustäbel


Changes by Lars Gustäbel :


--
resolution:  -> fixed
stage: test needed -> resolved
status: open -> closed
versions:  -Python 3.2, Python 3.3, Python 3.4

___
Python tracker 
<http://bugs.python.org/issue24838>
___

[issue24838] tarfile.py: fix GNU and USTAR formats to properly handle paths with special characters that are encoded with more than one byte each

2015-08-14 Thread Lars Gustäbel


Lars Gustäbel added the comment:

Thanks for the detailed report and the patch. I haven't checked yet, but I 
suppose that the entire 3.x branch is affected. The first thing I have to do 
now is to come up with a comprehensive testcase.

--
assignee:  -> lars.gustaebel
components: +Library (Lib)
nosy: +lars.gustaebel
stage:  -> test needed
versions: +Python 3.2, Python 3.3, Python 3.4, Python 3.6

___
Python tracker 
<http://bugs.python.org/issue24838>
___

[issue24259] tar.extractall() does not recognize unexpected EOF

2015-07-06 Thread Lars Gustäbel


Changes by Lars Gustäbel :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue24259>
___

[issue24514] tarfile fails to extract archive (handled fine by gnu tar and bsdtar)

2015-07-02 Thread Lars Gustäbel


Changes by Lars Gustäbel :


--
resolution:  -> fixed
stage: patch review -> resolved
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue24514>
___

[issue24259] tar.extractall() does not recognize unexpected EOF

2015-06-29 Thread Lars Gustäbel


Lars Gustäbel added the comment:

Martin, I followed your suggestion to raise ReadError. This needed an 
additional change in copyfileobj() because it is used both for adding file data 
to an archive and extracting file data from an archive.

But I think the patch is in good shape now.

--
Added file: http://bugs.python.org/file39837/issue24259-3.x-3.diff

___
Python tracker 
<http://bugs.python.org/issue24259>
___

[issue24514] tarfile fails to extract archive (handled fine by gnu tar and bsdtar)

2015-06-29 Thread Lars Gustäbel


Lars Gustäbel added the comment:

I think a simple addition to the existing unittest for nti() will be enough. 
itn() seems well-tested, and nts() and stn() are not affected, because they 
don't operate on numbers.

--
Added file: http://bugs.python.org/file39832/issue24514.diff

___
Python tracker 
<http://bugs.python.org/issue24514>
___

[issue24514] tarfile fails to extract archive (handled fine by gnu tar and bsdtar)

2015-06-26 Thread Lars Gustäbel


Lars Gustäbel added the comment:

Yes, Python 2.7 still gets bugfixes.

However, there's still some work to do on the patch (maybe clean the code, 
write a test, add a NEWS entry).

--

___
Python tracker 
<http://bugs.python.org/issue24514>
___

[issue24514] tarfile fails to extract archive (handled fine by gnu tar and bsdtar)

2015-06-26 Thread Lars Gustäbel


Lars Gustäbel added the comment:

You're welcome :-D

--
assignee:  -> lars.gustaebel
priority: normal -> low
stage:  -> patch review
type:  -> behavior
versions: +Python 3.5, Python 3.6

___
Python tracker 
<http://bugs.python.org/issue24514>
___

[issue24514] tarfile fails to extract archive (handled fine by gnu tar and bsdtar)

2015-06-26 Thread Lars Gustäbel


Lars Gustäbel added the comment:

The problem is that the tar archive has empty uid and gid fields, i.e. 7 spaces 
terminated with a null-byte.

I attached a patch that solves the problem.
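
For illustration only, this is roughly what such a field looks like and how a 
tolerant parser could treat it. `parse_octal_field` is a hypothetical helper, 
not the attached patch, and mapping a blank field to 0 is an assumption.

```python
def parse_octal_field(field: bytes) -> int:
    """Parse a NUL-terminated octal header field, treating blank as 0.

    Hypothetical helper for illustration; it is not the attached patch.
    """
    # Cut at the first NUL byte, then drop the padding spaces.
    text = field.split(b"\0", 1)[0].decode("ascii").strip()
    return int(text or "0", 8)


# The empty field described above: 7 spaces terminated with a null-byte.
blank_uid = b" " * 7 + b"\0"
```

A plain int(text, 8) on the seven spaces raises ValueError, which is why the 
blank case needs special handling.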

--
keywords: +patch
Added file: http://bugs.python.org/file39815/issue24514.diff

___
Python tracker 
<http://bugs.python.org/issue24514>
___

[issue24465] Make tarfile have deterministic sorting

2015-06-19 Thread Lars Gustäbel


Lars Gustäbel added the comment:

The patch would change behaviour for all tarfile users by the back door, that's 
why I am a little reluctant. And if the same can be achieved by a reasonably 
simple change to shutil I think it's just as well.

--

___
Python tracker 
<http://bugs.python.org/issue24465>
___

[issue24465] Make tarfile have deterministic sorting

2015-06-18 Thread Lars Gustäbel


Lars Gustäbel added the comment:

You don't need to patch the tarfile module. You could use os.walk() in 
shutil._make_tarball() and add each file with TarFile.add(recursive=False).
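
A rough sketch of that idea as a standalone function. `make_sorted_tarball` and 
`demo` are made-up names, and this is not how shutil is implemented; it only 
shows os.walk() combined with TarFile.add(recursive=False).

```python
import os
import tarfile
import tempfile


def make_sorted_tarball(archive_path, root_dir):
    """Archive root_dir with its entries added in sorted order."""
    with tarfile.open(archive_path, "w") as tar:
        for dirpath, dirnames, filenames in os.walk(root_dir):
            # Sorting in place also fixes the order in which os.walk()
            # descends into the subdirectories.
            dirnames.sort()
            for name in sorted(dirnames + filenames):
                full = os.path.join(dirpath, name)
                arcname = os.path.relpath(full, root_dir)
                # recursive=False: directories are added as single
                # entries, their contents follow on a later iteration.
                tar.add(full, arcname=arcname, recursive=False)


def demo():
    """Build a small tree, archive it, and return the member order."""
    with tempfile.TemporaryDirectory() as src, \
            tempfile.TemporaryDirectory() as out:
        os.mkdir(os.path.join(src, "sub"))
        for rel in ("b.txt", "a.txt", os.path.join("sub", "c.txt")):
            with open(os.path.join(src, rel), "w") as fh:
                fh.write(rel)
        archive = os.path.join(out, "sorted.tar")
        make_sorted_tarball(archive, src)
        with tarfile.open(archive) as tar:
            return tar.getnames()
```

The archive is written outside root_dir so that it does not end up inside 
itself.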

--
nosy: +lars.gustaebel

___
Python tracker 
<http://bugs.python.org/issue24465>
___

[issue24259] tar.extractall() does not recognize unexpected EOF

2015-05-31 Thread Lars Gustäbel


Changes by Lars Gustäbel :


Added file: http://bugs.python.org/file39580/issue24259-2.x-2.diff

___
Python tracker 
<http://bugs.python.org/issue24259>
___

[issue24259] tar.extractall() does not recognize unexpected EOF

2015-05-31 Thread Lars Gustäbel


Lars Gustäbel added the comment:

@Martin:

This is actually a nice idea that I hadn't thought of. I updated the Python 3 
patch to use a seek() that moves to one byte before the next header block, 
reads the remaining byte and raises an error if it hits eof. The code looks 
rather clean compared to the previous patch, and it should perform like it 
always did.

I am not quite sure about which exception type to use, ReadError is used in 
tarfile's header parsing code, but OSError is already used in 
tarfile.copyfileobj() and might be more like what the user expects.
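
The seek-and-read-one-byte idea can be sketched in isolation like this. 
`skip_data_checked` and `is_complete` are hypothetical helpers, not the patch, 
and the choice of ReadError here is just one of the two candidates mentioned 
above.

```python
import io
import tarfile


def skip_data_checked(fileobj, data_offset, padded_size):
    """Jump over a data segment, but verify that its last byte exists.

    Hypothetical helper, not the patch itself. `padded_size` is the
    segment size rounded up to a multiple of 512.
    """
    if padded_size == 0:
        return
    # Seek to one byte before the next header block ...
    fileobj.seek(data_offset + padded_size - 1)
    # ... and read that byte; an empty result means the file ends early.
    if not fileobj.read(1):
        raise tarfile.ReadError("unexpected end of data")


def is_complete(total_bytes):
    """Check a fake file holding one header block plus one data block."""
    try:
        skip_data_checked(io.BytesIO(b"\0" * total_bytes), 512, 512)
    except tarfile.ReadError:
        return False
    return True
```

Only a single byte is read per member, so the cost stays close to that of a 
plain seek().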

--
Added file: http://bugs.python.org/file39579/issue24259-3.x-2.diff

___
Python tracker 
<http://bugs.python.org/issue24259>
___

[issue24259] tar.extractall() does not recognize unexpected EOF

2015-05-29 Thread Lars Gustäbel


Lars Gustäbel added the comment:

@Thomas:

I think your proposal adds a little too much complexity. Also, ExFileObject is 
not used during iteration, and we would like to detect broken archives without 
unpacking all the data segments first.

I have written patches for Python 2 and 3.

--
stage:  -> patch review
versions: +Python 3.4, Python 3.5, Python 3.6
Added file: http://bugs.python.org/file39543/issue24259-3.x.diff

___
Python tracker 
<http://bugs.python.org/issue24259>
___

[issue24259] tar.extractall() does not recognize unexpected EOF

2015-05-29 Thread Lars Gustäbel


Changes by Lars Gustäbel :


Added file: http://bugs.python.org/file39544/issue24259-2.x.diff

___
Python tracker 
<http://bugs.python.org/issue24259>
___

[issue24259] tar.extractall() does not recognize unexpected EOF

2015-05-28 Thread Lars Gustäbel


Lars Gustäbel added the comment:

@Martin:

Yes, that's right, but only for cases where the TarFile.fileobj attribute is an 
actual file object. But, most of the time it is something special, e.g. 
GzipFile or sys.stdin, which makes random seeking either impossible or perform 
very badly.

But thanks for your objection, I have to withdraw the statement I made under 
option 2.: compressed archives are much more common than uncompressed ones. We 
probably wouldn't lose too much if we no longer use seek() but read() in 
TarFile.next(). Reading in an uncompressed file is fast anyway. I have to think 
about this.

--

___
Python tracker 
<http://bugs.python.org/issue24259>
___

[issue24259] tar.extractall() does not recognize unexpected EOF

2015-05-28 Thread Lars Gustäbel


Lars Gustäbel added the comment:

I have written a test for the issue, so that we have a basis for discussion.

There are four different scenarios where an unexpected eof can occur: inside a 
metadata block, directly after a metadata block, inside a data segment or 
directly after a data segment (i.e. missing end of archive marker).

Case #1 is taken care of (TruncatedHeaderError).

Case #4 is merely a violation of the standard, which is negligible.

Case #2 and #3 are essentially the same. If a data segment is empty or 
incomplete this means data was lost when the archive was created which should 
not go unnoticed when reading it. (see _FileInFile.read() for the code in 
question)

The problem is that, even after we have fixed case #2 and #4, we have no 
reliable way to detect an incomplete data segment unless we read it and count 
the bytes. If we simply iterate over the TarFile (e.g. do a TarFile.list()) the 
archive will appear intact. That is because in the TarFile.next() method we 
seek from one metadata block to the next, but we cannot simply detect if we 
seek beyond the end of the archive - except if we insist on the premise that 
each tar that we read is standards-compliant and comes with an end of archive 
marker (see case #4), which we probably should not.

Three possible options come to my mind:

1. Add a warning to the documentation that in order to test the integrity of an 
archive the user has to read through all the data segments.
2. Instead of using seek() in TarFile.next() use read() to advance the file 
pointer. This has a negative impact on the performance in most cases.
3. Insist on an end of archive marker. This has the disadvantage that users may 
get an exception although everything is fine.
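
What option 1 asks of the user could look roughly like the following. The 
function names are made up for this sketch; depending on the Python version, 
the ReadError may come from tarfile itself or from the explicit check below.

```python
import io
import tarfile


def read_all_data(fileobj):
    """Read every regular file's data and return the total byte count.

    Illustrative helper. Raises tarfile.ReadError if a segment is short.
    """
    total = 0
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        for member in tar:
            if not member.isreg():
                continue
            data = tar.extractfile(member)
            remaining = member.size
            while remaining:
                chunk = data.read(min(64 * 1024, remaining))
                if not chunk:
                    raise tarfile.ReadError(
                        "unexpected end of data in %r" % member.name)
                remaining -= len(chunk)
                total += len(chunk)
    return total


def sample_archive(payload):
    """Return the bytes of an archive with one member, data.bin."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("data.bin")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def is_intact(archive_bytes):
    """True if all data segments can be read completely."""
    try:
        read_all_data(io.BytesIO(archive_bytes))
    except tarfile.ReadError:
        return False
    return True
```

Cutting the sample archive in the middle of the data segment, for example 
`sample_archive(b"x" * 2000)[:512 + 1000]`, makes is_intact() return False.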

--
assignee:  -> lars.gustaebel
keywords: +patch
Added file: http://bugs.python.org/file39528/01-issue24259-test.diff

___
Python tracker 
<http://bugs.python.org/issue24259>
___

[issue23649] tarfile not re-entrant for multi-threading

2015-03-16 Thread Lars Gustäbel


Lars Gustäbel added the comment:

I agree with David that there is no need for tarfile to be thread-safe. There 
is nothing to be gained from distributing one TarFile object among multiple 
threads because it operates on a single resource which has to be accessed 
sequentially anyway. So, it seems best to me if we leave it like it is and let 
the user add locks around it as she/he sees fit.
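
A minimal sketch of what such user-side locking might look like, assuming 
several threads append to one archive. The function names are made up; the 
point is only that each addfile() call runs under the same lock.

```python
import io
import tarfile
import threading


def add_from_threads(tar, lock, names):
    """Add one small member per name, each from its own thread."""
    def worker(name):
        payload = name.encode("ascii")
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        # Header and data are written by one addfile() call; the lock
        # keeps the writes of different threads from interleaving.
        with lock:
            tar.addfile(info, io.BytesIO(payload))

    threads = [threading.Thread(target=worker, args=(n,)) for n in names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def demo():
    """Return True if every member written by the threads is readable."""
    names = ["member%d.txt" % i for i in range(8)]
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        add_from_threads(tar, threading.Lock(), names)
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r:") as tar:
        return sorted(tar.getnames()) == names
```

The order of the members depends on thread scheduling, which is why the 
comparison sorts the names first.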

--

___
Python tracker 
<http://bugs.python.org/issue23649>
___

[issue23193] Please support "numeric_owner" in tarfile

2015-01-13 Thread Lars Gustäbel


Lars Gustäbel added the comment:

I would argue that a serious alternative to this patch is to simply override 
the TarFile.chown() method in a subclass. However, I'm not sure if this expects 
too much of the user.

--

___
Python tracker 
<http://bugs.python.org/issue23193>
___

[issue22208] tarfile can't add in memory files (reopened)

2014-08-21 Thread Lars Gustäbel


Lars Gustäbel added the comment:

Please provide a patch which allows easy addition of file-like objects (not 
only io.BytesIO) and directories, preferably hard and symbolic links, too. It 
would be nice to still be able to change attributes of a TarInfo before 
addition. Please also add tests.

--
stage:  -> needs patch
type: behavior -> enhancement

___
Python tracker 
<http://bugs.python.org/issue22208>
___

[issue22208] tarfile can't add in memory files (reopened)

2014-08-20 Thread Lars Gustäbel


Lars Gustäbel added the comment:

I don't have an idea how to make it easier and still meet all/most requirements 
and without cluttering up the api. The way it currently works allows the 
programmer to control every tiny aspect of a tar member. Maybe it's best to 
simply add a new entry to the Examples section of the tarfile documentation.


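import io
import tarfile

with tarfile.open("sample.tar", mode="w") as tar:
    # A directory member carries no data, so no file object is needed.
    t = tarfile.TarInfo("foo")
    t.type = tarfile.DIRTYPE
    tar.addfile(t)

    b = "Hello world!".encode("ascii")

    # A regular file member: the size must be set before adding it.
    t = tarfile.TarInfo("foo/bar")
    t.size = len(b)
    tar.addfile(t, io.BytesIO(b))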

--

___
Python tracker 
<http://bugs.python.org/issue22208>
___

[issue22208] tarfile can't add in memory files (reopened)

2014-08-19 Thread Lars Gustäbel


Lars Gustäbel added the comment:

tarfile needs to know the size of a file object beforehand because the tar 
header is written first followed by the file object's data. If the file object 
is not based on a real file descriptor, tarfile cannot simply use os.fstat() 
but the user has to pass the size somehow. And I doubt that it's a good idea to 
add size arguments to TarFile.add() and .addfile() because it might lead to 
confusion.

I think tarfile is rather good at exposing the important parts of its low-level 
api to the programmer, in a way that still leaves some work for him to do but 
without getting in his way.  I don't see why manually creating TarInfo objects 
is such a big deal. It is the far superior way because it offers the maximum 
freedom for the programmer - admittedly at the cost of a slightly steeper 
learning curve. And we have to account for many different use cases that people 
have. For example, you don't mention what you think creating directories from 
scratch should be like in your opinion.

With regard to the usage of the size attribute the documentation for 
TarFile.addfile() says clearly:

"""Add the TarInfo object tarinfo to the archive. If fileobj is given, 
tarinfo.size bytes are read from it and added to the archive. You can create 
TarInfo objects using gettarinfo()."""
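
For comparison, this is a sketch of the case where the file object is backed 
by a real file descriptor, so that gettarinfo() can determine the size itself. 
`add_open_file` and `demo` are hypothetical names used only here.

```python
import io
import tarfile
import tempfile


def add_open_file(tar, fileobj, arcname):
    """Add an open, real file; gettarinfo() looks up its size itself."""
    info = tar.gettarinfo(arcname=arcname, fileobj=fileobj)
    tar.addfile(info, fileobj)
    return info


def demo():
    """Return the size that gettarinfo() recorded for a temporary file."""
    with tempfile.NamedTemporaryFile() as real_file:
        real_file.write(b"twelve bytes")
        real_file.flush()
        real_file.seek(0)
        with tarfile.open(fileobj=io.BytesIO(), mode="w") as tar:
            return add_open_file(tar, real_file, "data.bin").size
```

With an io.BytesIO object there is no file descriptor to ask, which is why the 
size has to be set by hand on the TarInfo in that case.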

--

___
Python tracker 
<http://bugs.python.org/issue22208>
___

[issue22208] tarfile can't add in memory files (reopened)

2014-08-16 Thread Lars Gustäbel


Lars Gustäbel added the comment:

Why overcomplicate things?

import io
import tarfile

with tarfile.open("foo.tar", mode="w") as tar:
    b = "hello world!".encode("utf-8")

    t = tarfile.TarInfo("helloworld.txt")
    t.size = len(b)  # this is crucial
    tar.addfile(t, io.BytesIO(b))


My answer to issue10369 was never supposed to be used as a reference on how to 
add file-like objects to a TarFile. I posted it as a simpler but equivalent 
version of the code of the original poster, which is why it looks "hackish".

I think the documentation on TarFile.gettarinfo() is rather clear on how to use 
it (i.e. that it needs a file object with a valid file descriptor). Also, I 
think that the code above is intuitive and simple.

--
assignee:  -> lars.gustaebel
priority: normal -> low
type:  -> behavior

___
Python tracker 
<http://bugs.python.org/issue22208>
___

[issue21987] TarFile.getmember on directory requires trailing slash iff over 100 chars

2014-07-23 Thread Lars Gustäbel


Lars Gustäbel added the comment:

Apparently, the problem is located in TarInfo._proc_gnulong(). I attached a 
patch.

When tarfile reads an archive, it strips trailing slashes from all filenames, 
except GNUTYPE_LONGNAME headers, which is a bug. tarfile creates GNU_FORMAT tar 
files by default, hence it uses an additional GNUTYPE_LONGNAME header for 
filenames >100 chars. That's why tarfile_issue.py fails if used with 
PAX_FORMAT, because PAX_FORMAT doesn't have this bug.

--
keywords: +patch
Added file: http://bugs.python.org/file36045/issue21987.diff

___
Python tracker 
<http://bugs.python.org/issue21987>
___

[issue16859] tarfile.TarInfo.fromtarfile does not check read() return value

2014-07-18 Thread Lars Gustäbel


Lars Gustäbel added the comment:

The size of the buffer returned by TarInfo.fromtarfile() is checked by 
TarInfo.frombuf() which raises either an EmptyHeaderError or 
TruncatedHeaderError respectively.
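
This can be observed directly on TarInfo.frombuf(). The sketch below uses a 
made-up helper `classify_header`; the two exception classes are subclasses of 
tarfile.HeaderError and are treated here as internal details of the module.

```python
import tarfile


def classify_header(buf):
    """Name the error that TarInfo.frombuf() raises for a bad buffer."""
    try:
        tarfile.TarInfo.frombuf(buf, "utf-8", "surrogateescape")
    except tarfile.HeaderError as exc:
        return type(exc).__name__
    return "ok"
```

An empty buffer is reported as EmptyHeaderError, and a buffer that is shorter 
than one 512-byte block as TruncatedHeaderError.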

--
assignee:  -> lars.gustaebel
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue16859>
___

[issue17153] tarfile extract fails when Unicode in pathname

2014-07-08 Thread Lars Gustäbel


Lars Gustäbel added the comment:

IIRC, tarfile under 2.7 has never been explicitly unicode-safe; support for 
unicode objects is heterogeneous at best. The obvious work-around is to work 
exclusively with str objects.

What we can't do is to decode the utf-8 pathname from the archive to a unicode 
object, because we have no way to detect an archive's encoding. We can either 
emit a warning if the user passes a unicode object to extract() or we 
implicitly encode the passed unicode object using TarFile.encoding, so that the 
os.path.join() succeeds.

Unfortunately, I am not entirely sure if there was possibly a rationale behind 
the current behaviour of extract(). This needs more inspection.

--

___
Python tracker 
<http://bugs.python.org/issue17153>
___

[issue21404] Compression level for tarfile/zipfile

2014-05-01 Thread Lars Gustäbel


Lars Gustäbel added the comment:

That's right. But it is there.

--

___
Python tracker 
<http://bugs.python.org/issue21404>
___

[issue21404] Compression level for tarfile/zipfile

2014-05-01 Thread Lars Gustäbel


Lars Gustäbel added the comment:

tarfile.open() actually supports a compresslevel argument for gzip and bzip2 
and a preset argument for lzma compression.
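
A short sketch of how these arguments are passed. The keyword is spelled 
`compresslevel` in the tarfile documentation; `write_compressed` is a made-up 
helper, and the example assumes the zlib, bz2 and lzma modules are available.

```python
import io
import tarfile


def write_compressed(mode, payload, **options):
    """Write a one-member archive and return the compressed bytes.

    `options` is forwarded to tarfile.open(), e.g. compresslevel=1 for
    "w:gz" and "w:bz2", or preset=0 for "w:xz".
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode=mode, **options) as tar:
        info = tarfile.TarInfo("data.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()
```

For example, `write_compressed("w:gz", data, compresslevel=1)` trades 
compression ratio for speed, and `write_compressed("w:xz", data, preset=0)` 
does the same for lzma.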

--
nosy: +lars.gustaebel

___
Python tracker 
<http://bugs.python.org/issue21404>
___

[issue21109] tarfile: Traversal attack vulnerability

2014-05-01 Thread Lars Gustäbel


Lars Gustäbel added the comment:

Let me present for discussion a proposal (and a patch with documentation) with 
an approach that is a little different, but in my opinion the most effective. I 
hope that it will appeal to all involved.

My proposal consists of a new class SafeTarFile, that is a subclass and drop-in 
replacement for the TarFile class and can be employed whenever the user feels 
the necessity.  It can be used the same way as TarFile, with the difference 
that SafeTarFile is equipped with a wide range of tests and as soon as it 
detects anything bad it interrupts the current operation with a SecurityError 
exception. This way damage is effectively averted, and it is up to the 
developer to decide whether he rejects the archive altogether (which is the 
obvious and recommended measure) or he wants to continue to process it in a 
subsequent step (on his own responsibility).

To simplify a few common operations, SafeTarFile has three more methods: 
analyze(), filter() and is_safe(). These methods will allow access to the 
archive without SecurityError exceptions being raised. The analyze() method is 
a kind of low-level iterator that produces each TarInfo object together with a 
list of warnings (if the member is bad) as a tuple. This gives a developer 
access to all the information he needs to implement his own more differentiated 
way of handling bad archives. The filter() method is a convenience method that 
provides an iterator over all the "good" members of an archive leaving out all 
the "bad" ones. It can be used as an argument to SafeTarFile.extractall() for 
example. is_safe() is a high-level shortcut method that reduces the result of 
the analysis to a simple True or False.

SafeTarFile has a variety of checks that test e.g. for bad pathnames, bad 
permissions and duplicate files. Also, to prevent denial-of-service scenarios, 
it enforces user-defined limits upon the archive, such as a maximum number of 
files or a maximum size of unpacked data.

The main advantage of this approach is the higher degree of security. The 
practice of rewriting paths (e.g. like in Daniel.Garcia's patch) is 
error-prone, has side-effects and is hard to maintain because of its tendency 
towards regression. It just adds another layer of complexity to an already 
complex and delicate problem.

SafeTarFile (or whatever it will be called) is backward compatible and easy to 
maintain, because it is an isolated addition to the tarfile module. It is 
easily subclassable to add more tests. It can be used as a standalone tool to 
check for bad archives and possible denial-of-service scenarios. Its analyze() 
method tells the user exactly what's wrong with the archive instead of keeping 
it away from him. Instead of silently extracting files to locations they 
weren't expected to be stored (i.e. after "fixing" their pathnames), 
SafeTarFile simply refuses to extract them at all. This way it is far more 
transparent and understandable to the user what happens.
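
To give a rough idea of the kind of pathname test meant here, the following is 
a simplified, hypothetical sketch. It is not taken from the attached patch, it 
covers only absolute paths and ".." components, and it ignores symlinks 
entirely. The function names and warning texts are made up.

```python
import os
import tarfile


def path_warnings(member, dest_dir):
    """Return a list of pathname warnings for one archive member.

    Hypothetical check for illustration only; it ignores symlinks and is
    not the SafeTarFile patch.
    """
    warnings = []
    if os.path.isabs(member.name):
        warnings.append("absolute path")
    dest = os.path.abspath(dest_dir)
    target = os.path.abspath(os.path.join(dest, member.name))
    # If the common prefix is not the destination itself, the member
    # would be written somewhere outside of it.
    if os.path.commonpath([dest, target]) != dest:
        warnings.append("escapes destination directory")
    return warnings


def analyze(members, dest_dir):
    """Yield (member, warnings) pairs, in the spirit of analyze() above."""
    for member in members:
        yield member, path_warnings(member, dest_dir)
```

A member is only reported, never renamed or rewritten, so the caller still 
decides whether to reject the archive.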

Feedback is welcome.

--
assignee:  -> lars.gustaebel
priority: release blocker -> normal
versions:  -Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4
Added file: http://bugs.python.org/file35127/safetarfile-1.diff

___
Python tracker 
<http://bugs.python.org/issue21109>
___

[issue21369] Extended modes for tarfile.TarFile()

2014-04-28 Thread Lars Gustäbel


Lars Gustäbel added the comment:

Jup. That's it.

--
priority: normal -> low
resolution:  -> not a bug
stage:  -> resolved
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue21369>
___

[issue21369] Extended modes for tarfile.TarFile()

2014-04-28 Thread Lars Gustäbel


Lars Gustäbel added the comment:

You can pass keyword arguments to tarfile.open(), which will be passed to the 
TarFile constructor. You can also pass fileobj arguments to tarfile.open().
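
For example, a sketch that combines a compressed mode, a fileobj and two 
TarFile constructor arguments in one call. `open_with_options` is a made-up 
name.

```python
import io
import tarfile


def open_with_options():
    """Open a gzip-compressed archive on a file object with extra options."""
    buf = io.BytesIO()
    # `format` and `dereference` are TarFile constructor arguments;
    # tarfile.open() forwards them while also setting up the compression.
    with tarfile.open(fileobj=buf, mode="w:gz",
                      format=tarfile.PAX_FORMAT, dereference=True) as tar:
        return tar.format, tar.dereference
```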

--

___
Python tracker 
<http://bugs.python.org/issue21369>
___

[issue21369] Extended modes for tarfile.TarFile()

2014-04-27 Thread Lars Gustäbel


Lars Gustäbel added the comment:

That was a design decision. What would be the advantage of having the TarFile 
class offer the compression itself?

--
assignee:  -> lars.gustaebel

___
Python tracker 
<http://bugs.python.org/issue21369>
___

[issue18321] Multivolume support in tarfile module

2014-04-14 Thread Lars Gustäbel


Lars Gustäbel added the comment:

Okay, let me tell you why I reject your contribution at this point.

The patch you submitted may be well-suited for your purposes but it does not 
meet the requirements of a standard library implementation because it is not 
generic and comprehensive enough.

It contains duplicate code, spelling mistakes and needless code changes e.g.  
in test_tarfile.py.

It does not expose one set of volumes as one tar archive to the user. It is not 
possible to iterate over all members of all volumes in one go. It does not 
allow random-access.

Actually, it does not implement complete multivolume support but only the 
"easy" parts.  For example, it fails to read GNU tar archives that are split in 
the middle of a pax header block sequence. The other way around, when writing 
it makes a split only when it is inside the data part of a member. Hence, it is 
possible that a volume turns out smaller than max_volume_size which is not only 
inaccurate but also bad on a tape device.

If you decide that you still want multivolume support in tarfile, feel free to 
reopen this issue with a new and significantly better patch. I gave you a 
number of clues on what I think is required.

--
assignee:  -> lars.gustaebel
resolution:  -> rejected
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue18321>
___

[issue18321] Multivolume support in tarfile module

2014-04-13 Thread Lars Gustäbel


Lars Gustäbel added the comment:

> [...] but remember, we split a volume only in the middle of a big file, not 
> in any other case (AFAIK). Hopefully you don't get huge pax headers or 
> anything strange. [...] 

Hopefully? Sorry, but have you tested this? I did. I let GNU tar create a two 
volume archive that is split exactly between the two blocks of an XHDTYPE pax 
header.

The result is terrifying. At the beginning of the second volume GNU tar creates 
an XGLTYPE header as the pax replacement for a GNUTYPE_MULTIVOL header, 
followed by an XHDTYPE header ("GNUFileParts") that somehow decorates the 
following REGTYPE(!) tar header that contains the continuation of the split 
XHDTYPE header data from the previous volume. After that comes the REGTYPE file 
that the split XHDTYPE header was actually meant for as decoration.

I attached the archive to this issue.

What happens if a GNUTYPE_LONGNAME header is split in two? I don't wanna know...


> write() will need to take into account blocks (BLOCKSIZE), just to be able to 
> split the volumes correctly.

It is mandatory to do the split on a block boundary (a multiple of 512).
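
As a small sketch of that rule: tarfile.BLOCKSIZE is the module's 512-byte block constant, while last_block_boundary() is a name invented for this example and not part of any patch discussed here.

```python
import tarfile


def last_block_boundary(max_volume_size, blocksize=tarfile.BLOCKSIZE):
    """Return the largest multiple of blocksize that fits into max_volume_size."""
    if max_volume_size < blocksize:
        raise ValueError("a volume must hold at least one block")
    # Integer division drops the partial block at the end of the volume.
    return (max_volume_size // blocksize) * blocksize
```

A volume limit of 10000 bytes would therefore be cut back to 19 blocks.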


> * multivolume logic in write() needs read/write access to the current tarinfo 
> being written [...]. How do you propose this object should be accessed from 
> write()?

I don't know and this problem seems to be quite hard to address with my 
approach. That's too bad.


> > BTW, my version of GNU tar refuses to create compressed multiple-volume 
> > archives which is why I doubt the usefulness of this feature overall.
> But it has multivolume support right? Which is what I am proposing here. 
> Also, you can gzip (or encrypt or anything) the volumes after creating the 
> volumes..

Yeah, it has multivolume support, but a very limited one that is not only weird 
but isn't even usable together with compression. And sure, I can compress and 
encrypt the volumes afterwards, but I can also create a compressed archive and 
pipe it through split(1) to split it into parts. Both ways create tar archives 
that are not readable by GNU tar because they're non-standard. So what?

Please tell me, what is your actual personal use-case for this feature?

--
Added file: http://bugs.python.org/file34798/split-xhdtype.tar.gz

___
Python tracker 
<http://bugs.python.org/issue18321>
___

[issue21109] tarfile: Traversal attack vulnerability

2014-04-06 Thread Lars Gustäbel


Lars Gustäbel added the comment:

In the past, our answer to these kinds of bug reports has always been that you 
must not extract an archive from an untrusted source without making sure that 
it has no malicious contents, and that tarfile conforms to the POSIX 
specifications with respect to extraction of files and pathname resolution. 
That's why we put this prominent warning in the documentation, and I think its 
advice still holds.

I don't think that this issue should be marked as a release blocker, because 
the way tarfile currently works was a conscious decision, not an accident.  
tarfile does what it is designed to do: it processes a sequence of instructions 
to store a number of files in the filesystem. So the attack that is described 
by Daniel Garcia exploits neither a bug in tarfile nor a loophole in the tar 
archive format. A necessary condition for this attack to work is that the 
attacker has to trick the user into extracting the malicious archive first. 
After that, tarfile interprets the contained instructions word-for-word but 
still only within the boundaries defined by the user's privileges.

I think it is obvious that it is potentially dangerous to extract tar archives 
we didn't create ourselves, because we actually give another person direct 
access to our filesystem. tarfile could mitigate some of the adverse effects, 
but this will not change the fact that it remains unsafe to use tarfile to a 
certain degree unless you use it with your own data or take reasonable 
precautions.

Anyway, if we come to the conclusion that we want to eliminate this kind of 
attack, we must be aware that there is a lot more to do than that. tarfile as 
it is today is vulnerable to all known attacks against tar programs, and maybe 
even a few more that rely on its specific implementation.


1. Path traversal:

The archive contains file names such as /etc/passwd or ../etc/passwd.

2. Symlink file attack:

foo links to /etc/passwd.
Another member named foo follows, its data overwrites the target file's 
data.

3. Symlink directory attack:

foo links to /etc.
The following member foo/passwd overwrites /etc/passwd.

4. Hardlink attack:

Hardlink member foo links to /etc/passwd.
tarfile creates the hardlink to /etc/passwd because it cannot find it 
inside the archive and falls back to the one in the filesystem.
Another file named foo follows, its data overwrites /etc/passwd's data.

5. Permission manipulation:

The archive contains an executable that is placed somewhere in PATH with 
its setuid flag set, so that an unprivileged user is able to gain root 
privileges.

6. Device file attacks:

The archive contains a device node foo with the same major and minor 
numbers as an attached device.
Another member named foo follows, its data is written to the device.

7. Huge zero file attacks:

Bzip2 and lzma make it possible to store huge blobs of repetitive data in tiny 
archives. When unpacked this data may fill up an entire filesystem.

8. Excessive memory usage:

tarfile saves one TarInfo object per member it finds in an archive. If the 
archive contains several millions of members, this may fill up the memory.

9. Saving a huge sparse file:

tarfile is unable to detect holes in sparse files and thus cannot store 
them efficiently. Archiving a huge sparse file can take very long and may lead 
to a very big archive that fills up the filesystem.
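
To illustrate the kind of precaution an application could take against items 1 to 4 and 6, here is a rough sketch. It is not complete protection and not something tarfile does for you; suspicious_members() and build_demo_archive() are hypothetical helpers, and the check is deliberately conservative in that it reports every link member, harmless or not.

```python
import io
import os
import tarfile


def suspicious_members(tar, dest="."):
    """Yield (member, reason) for entries resembling attacks 1-4 and 6."""
    root = os.path.realpath(dest)
    for member in tar:
        # Absolute names and ".." components both end up outside root.
        target = os.path.realpath(os.path.join(root, member.name))
        if os.path.commonpath([root, target]) != root:
            yield member, "path escapes the destination directory"
        elif member.issym() or member.islnk():
            yield member, "link member"
        elif member.isdev():
            yield member, "device node"


def build_demo_archive():
    """Create a small in-memory archive with harmless and hostile names."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in ("ok.txt", "../escape.txt"):
            tar.addfile(tarfile.TarInfo(name))  # empty regular files
        link = tarfile.TarInfo("foo")
        link.type = tarfile.SYMTYPE
        link.linkname = "/etc"
        tar.addfile(link)
    buf.seek(0)
    return buf
```

The check has to run before anything is extracted, e.g. by refusing the whole archive if the generator yields anything at all.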


Additionally, there are more issues mentioned in the GNU tar manual:

  https://www.gnu.org/software/tar/manual/html_node/Security.html


In conclusion, I'd like to emphasize that tarfile is a library, not a 
replacement for GNU tar. As a library it has a different focus: it is 
merely a building block for an application, and has to be used with a little 
bit of responsibility. And even if we start to implement all possible checks, 
I'm afraid we can never do without a warning in the documentation that reminds 
everyone to keep an eye on what they're doing.

--

___
Python tracker 
<http://bugs.python.org/issue21109>
___

[issue18321] Multivolume support in tarfile module

2014-03-12 Thread Lars Gustäbel


Lars Gustäbel added the comment:

> It's also consistent with how the tar command works afaik, just listing the 
> contents of the current volume.

No, GNU tar operates on the entirety of the archive and asks for the filename 
of the subsequent volume every time it hits eof in the current volume.

> You don't want to directly do a plain open in there, because you want to be 
> able to deal with read/write modes, with gzip/bzip/Stream class.

The example I gave is based on the idea that there is a TarVolumeSet class in 
the tarfile module that implements all the required file-object methods (e.g.  
read(), write(), seek(), etc.) and acts as if the sequence of volumes is 
actually one big file. It is passed to tarfile.open() as the fileobj argument. 
This TarVolumeSet class is supposed to be subclassable to let the user 
implement her/his own mode of operation. This way the open_volume() method can 
do whatever the user wants it to do. The TarVolumeSet class might as well have 
a new_volume() method for writing multivol archives, the example only covered 
the case of reading a multivol archive.

BTW, my version of GNU tar refuses to create compressed multiple-volume 
archives which is why I doubt the usefulness of this feature overall.

> [...] because a multivol tarfile is not exactly the same as a normal tarfile 
> chopped up.

No, I think it is exactly that. The only purpose of the GNUTYPE_MULTIVOL header 
that is at the start of each subsequent volume is to give GNU tar the ability 
to detect if it is reading the correct volume. It is not essential and could as 
well be left out.

--

___
Python tracker 
<http://bugs.python.org/issue18321>
___

[issue18321] Multivolume support in tarfile module

2014-02-02 Thread Lars Gustäbel


Lars Gustäbel added the comment:

I had the following idea: What about a separate class, let's call it 
TarVolumeSet for now, that maps a set of (virtual) volumes onto one big 
file-like object. This TarVolumeSet will be passed to a TarFile constructor as 
the fileobj argument. It is subclassable for implementing custom behavior.


    # TarVolumeSet is the proposed class; it does not exist in tarfile.
    class MyTarVolumeSet(tarfile.TarVolumeSet):

        def __init__(self, template):
            self.template = template

        def open_volume(self, volume_number):
            return open(self.template % volume_number, "rb")

    volumes = MyTarVolumeSet("test.tar.%03d")
    with tarfile.open(fileobj=volumes, mode="r:") as tar:
        for t in tar:
            print(t.name)


In my opinion, this approach has a number of advantages: Most importantly, it 
separates the multi-volume code from the TarFile class, which reduces the 
invasiveness, complexity and maintenance burden of the original approach. The 
TarFile class would be totally agnostic about the concept of multiple volumes, 
TarVolumeSet looks just like another file-object to TarFile. Looks like the 
cleanest solution to me so far.
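
The file-object idea can be tried out today with a much smaller stand-in. The sketch below is not the proposed TarVolumeSet: ConcatenatedReader is a made-up, read-only class that only implements read(), which is all tarfile needs in stream mode ("r|"), and the "volumes" are assumed to be plain byte slices of an ordinary archive rather than real GNU multi-volume files.

```python
import io
import tarfile


class ConcatenatedReader:
    """Present several binary file objects as one continuous stream."""

    def __init__(self, volumes):
        self._volumes = iter(volumes)
        self._current = next(self._volumes, None)

    def read(self, size=-1):
        chunks = []
        remaining = size
        while self._current is not None and (size < 0 or remaining > 0):
            data = self._current.read(remaining if size >= 0 else -1)
            if data:
                chunks.append(data)
                if size >= 0:
                    remaining -= len(data)
            else:
                # Current volume is exhausted, move on to the next one.
                self._current = next(self._volumes, None)
        return b"".join(chunks)


def demo_archive():
    """Build a two-member uncompressed archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name in ("a.txt", "b.txt"):
            payload = name.encode() * 200
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def list_split_archive(data, split_at):
    """Cut data into two volumes and list the members across both."""
    volumes = [io.BytesIO(data[:split_at]), io.BytesIO(data[split_at:])]
    with tarfile.open(fileobj=ConcatenatedReader(volumes), mode="r|") as tar:
        return [member.name for member in tar]
```

Splitting at offset 1024 cuts the first member in the middle of its data, yet TarFile never notices. Random access would additionally need seek() and tell(), which this sketch leaves out.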

--

___
Python tracker 
<http://bugs.python.org/issue18321>
___

[issue18321] Multivolume support in tarfile module

2014-01-30 Thread Lars Gustäbel


Lars Gustäbel added the comment:

First of all, I'd like to take back my comment about this patch being too 
complex for too little benefit. That is no real argument.

Okay, I gave it a shot and I have a few more remarks:

The patch does not support iterating over a multi-volume tar archive, e.g. for 
TarFile.list(). You must implement this.

In my opinion, a tar archive is one logical unit even if it spans 
multiple volumes. Thus, it is vital to have .getmembers() and .getnames() 
reflect the entirety of the archive, e.g. to support "if filename in 
.getnames()". I think it could be a good idea to store the volume number along 
with each TarInfo object for random-access.

By the way, which standard are you referring to? The only one I know of is 
POSIX pax which doesn't say anything about multiple volumes.

--

___
Python tracker 
<http://bugs.python.org/issue18321>
___

[issue18321] Multivolume support in tarfile module

2014-01-29 Thread Lars Gustäbel


Lars Gustäbel added the comment:

I cannot yet go into the details, because I have not tested the patch.

The comments, docstrings and quoting are not very consistent with the rest of 
the module. There are a few spelling mistakes. The open_volume() method is more 
or less a copy of the open() method which is not optimal.

The patch adds a lot of complexity to the tarfile module for a use case that 
only a few connoisseurs benefit from. It seems to alter some inherent TarFile 
mechanics that people might rely on, e.g. the members attribute contains only 
the members stored in the current volume, not all members of the archive. 
Does this patch reliably allow random-access?

Would it be possible/easier to add the same functionality using a separate 
class MultiVolumeTarFile instead?

--

___
Python tracker 
<http://bugs.python.org/issue18321>
___

[issue13477] tarfile module should have a command line

2013-03-20 Thread Lars Gustäbel


Lars Gustäbel added the comment:

I'd like to re-emphasize that it is best to keep the whole thing as simple and 
straight-forward as possible. Offer some basic operations and that's it.

Although I am pretty accustomed to the original tar command line, I think we 
should copy zipfile's interface. It makes more sense to offer some kind of 
unified "Python" command line approach for archive access than keeping to old 
traditions.

I agree with Victor that we don't really need support for stdin/stdout. It only 
complicates matters. 

If everybody still votes for stdin/stdout, I'd like to point out that tarfile 
supports compression detection for streams. It would be best to use mode="r|*" 
throughout because it works for both normal files and stdin. Use 
mode="w|(compression)" for writing to files and stdout accordingly.
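
A short sketch of the stream modes mentioned above, with an io.BytesIO buffer standing in for stdin/stdout. roundtrip() is a name made up for this example.

```python
import io
import tarfile


def roundtrip(compression):
    """Write a one-member archive as a stream, then read it back with r|*."""
    raw = io.BytesIO()
    # "w|gz", "w|bz2" or plain "w|" never seek, so they work on pipes too.
    with tarfile.open(fileobj=raw, mode="w|" + compression) as tar:
        payload = b"hello\n"
        info = tarfile.TarInfo("hello.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    raw.seek(0)
    # "r|*" detects the compression from the first bytes of the stream.
    with tarfile.open(fileobj=raw, mode="r|*") as tar:
        return [member.name for member in tar]
```

The reader does not have to be told which compression the writer chose.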

If we do not support stdin/stdout we no longer need all these compression 
options because for reading we do autodetection and for writing we could deduce 
the compression from the file extension (which is just some kind of 
autodetection too).

Another side note: We should be aware of the effects discussed in issue17102 
and issue1044. In my opinion tarfile as a library is obligated to behave like 
that, but maybe that's not acceptable for a command line tool.

--

___
Python tracker 
<http://bugs.python.org/issue13477>
___

[issue15950] open() should not accept bool argument

2012-09-16 Thread Lars Gustäbel


New submission from Lars Gustäbel:

Today I accidentally did this:

open(True).read()

Passing True as a file argument to open() does not fail, because a bool value 
is treated like an integer file descriptor (stdout in this case). Even worse is 
that the read() call hangs in an endless loop on my Linux box. On Windows I get 
an EBADF at least.

Wouldn't it be better if open() checked explicitly for a bool argument and 
raised a TypeError?
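
Roughly what I have in mind, written as a wrapper rather than a change to the builtin. strict_open() is a hypothetical name; the explicit check is needed because bool is a subclass of int, so True passes as file descriptor 1.

```python
import builtins


def strict_open(file, *args, **kwargs):
    """Like open(), but reject bool instead of treating it as a descriptor."""
    if isinstance(file, bool):
        raise TypeError("open() argument 'file' must not be a bool")
    return builtins.open(file, *args, **kwargs)
```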

--
components: IO
messages: 170550
nosy: lars.gustaebel
priority: normal
severity: normal
status: open
title: open() should not accept bool argument
type: behavior
versions: Python 3.2, Python 3.3

___
Python tracker 
<http://bugs.python.org/issue15950>
___

[issue15875] tarfile may not make @LongLink for non-ascii character

2012-09-09 Thread Lars Gustäbel


Lars Gustäbel added the comment:

I prepared a patch that fixes this issue and adds a few tests. Please check 
whether it works for you.

--
keywords: +patch
stage:  -> patch review
Added file: http://bugs.python.org/file27152/issue15875.diff

___
Python tracker 
<http://bugs.python.org/issue15875>
___

[issue15875] tarfile may not make @LongLink for non-ascii character

2012-09-08 Thread Lars Gustäbel


Changes by Lars Gustäbel :


--
assignee:  -> lars.gustaebel
nosy: +lars.gustaebel
versions: +Python 3.3

___
Python tracker 
<http://bugs.python.org/issue15875>
___

[issue15858] tarfile missing entries due to omitted uid/gid fields

2012-09-05 Thread Lars Gustäbel


Lars Gustäbel added the comment:

Could you provide some sample data and code? I see the problem, but I cannot 
quite reproduce the behaviour you describe. In all of my testcases tarfile 
either throws an exception or successfully reads the archive, but never 
silently stops.

--
assignee:  -> lars.gustaebel

___
Python tracker 
<http://bugs.python.org/issue15858>
___

[issue14810] Bug in tarfile

2012-05-16 Thread Lars Gustäbel


Changes by Lars Gustäbel :


--
assignee:  -> lars.gustaebel

___
Python tracker 
<http://bugs.python.org/issue14810>
___

[issue14810] Bug in tarfile

2012-05-15 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

This issue is related to issue13158 which deals with a GNU tar specific 
extension to the original tar format. In that issue a negative number in the 
uid/gid fields caused problems. In your case the problem is a negative mtime 
field.

Reading these particular number fields was fixed in Python 3.2. You might be 
able to read the archive in question with that version. You should definitely 
try that.

Besides that, I was unable to reproduce the error you report. I just did some 
tests and could not even open my test archive, because it was not recognized as 
a tar file. I didn't come as far as the os.utime() call.

--
nosy: +lars.gustaebel

___
Python tracker 
<http://bugs.python.org/issue14810>
___

[issue14807] Move tarfile.filemode() into stat module

2012-05-14 Thread Lars Gustäbel


Changes by Lars Gustäbel :


--
nosy: +lars.gustaebel

___
Python tracker 
<http://bugs.python.org/issue14807>
___

[issue13815] tarfile.ExFileObject can't be wrapped using io.TextIOWrapper

2012-05-14 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

Okay, I close this issue now, as I think the problems are now resolved.

--
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue13815>
___

[issue13815] tarfile.ExFileObject can't be wrapped using io.TextIOWrapper

2012-05-10 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

Okay, I attached a patch that I hope we can all agree upon. It restores the 
ExFileObject class as a small subclass of BufferedReader as Amaury suggested. 
Does the documentation have to be changed, too? It states that an 
io.BufferedReader object is returned by extractfile() not a subclass thereof.
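
For reference, this is the use case from the issue title as a minimal sketch. make_archive() and read_member_text() are helper names invented here; the only claims about tarfile are that extractfile() returns a binary reader that is an instance of io.BufferedReader and that io.TextIOWrapper accepts it.

```python
import io
import tarfile


def make_archive():
    """Build an in-memory archive with one UTF-8 encoded text member."""
    payload = "héllo\n".encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("greeting.txt")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_member_text(archive_bytes, name, encoding="utf-8"):
    """Decode a member by wrapping extractfile()'s result in TextIOWrapper."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes)) as tar:
        binary = tar.extractfile(name)
        with io.TextIOWrapper(binary, encoding=encoding) as text:
            return text.read()
```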

--
Added file: http://bugs.python.org/file25516/tarfile-exfileobj.diff

___
Python tracker 
<http://bugs.python.org/issue13815>
___

[issue13815] tarfile.ExFileObject can't be wrapped using io.TextIOWrapper

2012-05-09 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

In an earlier draft of my patch, I had kept ExFileObject as a subclass of 
BufferedReader, but I later decided against it. To use BufferedReader directly 
is in my opinion the cleaner solution.

I admit that the change is not fully backward compatible. But a user can still 
write code that works for both 3.3 and the versions before. If he didn't 
subclass ExFileObject his code doesn't even need a change. If he subclassed 
ExFileObject, he might have a problem in either case: either the ExFileObject 
class is missing, or he may be unable to use it the way he did before, because 
all that's left of it is a stub subclass of BufferedReader.

I am well aware that backward compatibility is most important, but I think it 
must still be allowed to change internal (and undocumented) APIs every now and 
then to clean things up a little.
And of course, I did a code search before too, and found no code using 
ExFileObject. This actually doesn't surprise me, as there is really not much 
you can do with it.

--

___
Python tracker 
<http://bugs.python.org/issue13815>
___

[issue13815] tarfile.ExFileObject can't be wrapped using io.TextIOWrapper

2012-05-05 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

I did some tarfile spring cleaning: I removed the ExFileObject class completely 
as it was more or less a leftover from the old days. io.BufferedReader now does 
the job. So, as a side-effect, I close this issue as fixed.

(BTW, this makes tarfile.py smaller by about 100 lines.)

--
resolution:  -> fixed
stage: patch review -> committed/rejected
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue13815>
___

[issue14160] TarFile.extractfile fails to extract targets of top-level relative symlinks

2012-04-24 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

Fixed. Thanks for the report.

--
resolution:  -> fixed
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue14160>
___

[issue10369] tarfile requires an actual file on disc; a file-like object is insufficient

2012-03-06 Thread Lars Gustäbel


Changes by Lars Gustäbel :


--
resolution:  -> invalid
stage:  -> committed/rejected
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue10369>
___

[issue14160] TarFile.extractfile fails to extract targets of top-level relative symlinks

2012-03-05 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

Thanks for the report. Attached is a patch (against 3.2) that is supposed to 
fix the problem.

--
keywords: +patch
stage:  -> patch review
Added file: http://bugs.python.org/file24735/issue14160.diff

___
Python tracker 
<http://bugs.python.org/issue14160>
___

[issue14160] TarFile.extractfile fails to extract targets of top-level relative symlinks

2012-03-05 Thread Lars Gustäbel


Changes by Lars Gustäbel :


--
assignee:  -> lars.gustaebel

___
Python tracker 
<http://bugs.python.org/issue14160>
___

[issue14012] Misc tarfile fixes

2012-02-22 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

I updated your patch:

- I removed the "import as" bit completely and changed all occurrences of 
_open() to builtins.open() which is more readable and explanatory.

- I object to changing the error messages in the 3.2 branch due to backwards 
compatibility, although I left them in the patch for now. (I changed the style 
of %-formatting with a single item tuple in order to match the coding style of 
the rest of the module.)

- I inlined the shutil.copyfileobj() method to remove the shutil import.

--
Added file: http://bugs.python.org/file24601/tarfile-misc-bugs-3.2-2.diff

___
Python tracker 
<http://bugs.python.org/issue14012>
___

[issue14056] Misc doc changes for tarfile

2012-02-22 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

a) Good point, a case of sloppy naming.

b) IMO a table is a tad too much. The number of different compression methods 
is still quite small. My patch proposes a simpler approach.

c) A link to shutil is very useful.

BTW, thanks for the effort.

--
Added file: http://bugs.python.org/file24598/lars-comment.diff

___
Python tracker 
<http://bugs.python.org/issue14056>
___

[issue14013] tarfile should expose supported formats

2012-02-18 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

I think this is a reasonable proposal. I think it is good style to let tarfile 
figure out which supported compression methods are available instead of shutil 
or the user. So far I have no objections.
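
A rough sketch of what such a query could look like. available_compression_methods() is a hypothetical name, not an existing tarfile attribute, and the candidate order (xz, bz2, gz) is an assumption of this example.

```python
import importlib

# (method name as used in tarfile modes, module that must be importable)
_CANDIDATES = [("xz", "lzma"), ("bz2", "bz2"), ("gz", "zlib")]


def available_compression_methods():
    """Return the compression methods whose backing module can be imported."""
    methods = []
    for method, module in _CANDIDATES:
        try:
            importlib.import_module(module)
        except ImportError:
            continue  # Python was built without this compression library
        methods.append(method)
    return methods
```

shutil or a user could then pick the first entry instead of probing the modules themselves.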

Following 3.3's crypt module, I think the name `methods' is superior to 
`formats' (maybe `compression_methods' is even better). Also, crypt's concept 
of a sorted list from stronger to weaker could also make sense here: ["xz", 
"bz2", "gz"]. Why not?

--

___
Python tracker 
<http://bugs.python.org/issue14013>
___

[issue14012] Misc tarfile fixes

2012-02-18 Thread Lars Gustäbel


Changes by Lars Gustäbel :


--
assignee:  -> lars.gustaebel

___
Python tracker 
<http://bugs.python.org/issue14012>
___

[issue13935] Tarfile - Fixed GNU tar header base-256 handling

2012-02-04 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

This has been fixed (issue13158, 
http://hg.python.org/cpython/rev/341008eab87d). Thanks anyway for the report.

--
resolution:  -> duplicate
stage:  -> committed/rejected
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue13935>
___

[issue13935] Tarfile - Fixed GNU tar header base-256 handling

2012-02-03 Thread Lars Gustäbel


Changes by Lars Gustäbel :


--
assignee:  -> lars.gustaebel
nosy: +lars.gustaebel

___
Python tracker 
<http://bugs.python.org/issue13935>
___

[issue13815] tarfile.ExFileObject can't be wrapped using io.TextIOWrapper

2012-01-18 Thread Lars Gustäbel


Changes by Lars Gustäbel :


--
assignee:  -> lars.gustaebel
nosy: +lars.gustaebel

___
Python tracker 
<http://bugs.python.org/issue13815>
___

[issue12926] tarfile tarinfo.extract*() broken with symlinks

2012-01-05 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

This should be fixed now, thanks.

--
resolution:  -> fixed
stage:  -> committed/rejected
status: open -> closed
versions: +Python 3.3

___
Python tracker 
<http://bugs.python.org/issue12926>
___

[issue13702] relative symlinks in tarfile.extract broken (windows)

2012-01-05 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

The dereference option is only used for archive creation: the contents of 
the file a symbolic link points to are added instead of the symbolic link 
itself.
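
A small sketch that shows the effect when writing. It assumes a platform where os.symlink() works without special privileges; member_type() is a name made up for this example.

```python
import io
import os
import tarfile
import tempfile


def member_type(dereference):
    """Archive a symlink and return the type the member was stored with."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        with open(target, "w") as f:
            f.write("hello")
        link = os.path.join(tmp, "link.txt")
        os.symlink("target.txt", link)

        buf = io.BytesIO()
        # dereference is a TarFile argument, forwarded by tarfile.open().
        with tarfile.open(fileobj=buf, mode="w", dereference=dereference) as tar:
            tar.add(link, arcname="link.txt")

        buf.seek(0)
        with tarfile.open(fileobj=buf) as tar:
            return tar.getmember("link.txt").type
```

With dereference=False the member is stored as SYMTYPE, with dereference=True as a regular file carrying the target's data. Extraction is not affected by the option.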

--

___
Python tracker 
<http://bugs.python.org/issue13702>
___

[issue13702] relative symlinks in tarfile.extract broken (windows)

2012-01-05 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

You actually hit two bugs at the same time here: The target of the created 
symlink was not translated from Unix to Windows path delimiters and is 
therefore broken. The second bug is issue12926 which leads to the error in 
TarFile.makefile(). 

Brian, AFAIK all file-specific functions on Windows accept forward slashes in 
pathnames, right? Has this been discussed in the course of the Windows 
implementation of os.symlink()? I could certainly fix the slash translation in 
tarfile.py, but maybe it's os.symlink() that should be fixed.

--
dependencies: +tarfile tarinfo.extract*() broken with symlinks

___
Python tracker 
<http://bugs.python.org/issue13702>
___

[issue13702] relative symlinks in tarfile.extract broken (windows)

2012-01-03 Thread Lars Gustäbel


Changes by Lars Gustäbel :


--
assignee:  -> lars.gustaebel
nosy: +lars.gustaebel
versions: +Python 3.3

___
Python tracker 
<http://bugs.python.org/issue13702>
___

[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-25 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

I think we should wrap this up as soon as possible, because it has already 
absorbed too much of our time. The issue we discuss here is a tiny glitch 
triggered by a corner-case. My original idea was to fix it in a minimal sort of 
way that is backwards-compatible.

There are at least 4 different solutions now:

1. Keep the patch.
2. Revert the patch, leave everything as it was as wontfix.
3. Don't write an FNAME field at all if the filename that is passed is a 
unicode string.
4. Rewrite the FNAME code the way Terry suggests. This seems to me like the 
most complex solution, because we have to fix gzip.py as well, because the code 
in question was originally taken from the gzip module. (BTW, both the tarfile 
and gzip module discard the FNAME field when a file is opened for reading.)

My favorites are 1 and 3 ;-)

--
assignee:  -> lars.gustaebel
priority: normal -> low

___
Python tracker 
<http://bugs.python.org/issue13639>
___

[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-24 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

I thought about that myself, too. It is clearly no new feature, it is really 
more some kind of a fix.

Unicode pathnames given to tarfile.open() are just passed through to the open() 
function, which is why this always has been working, except for this particular 
case. There are 6 different possible write modes: "w:", "w:gz", "w:bz2", "w|", 
"w|gz" and "w|bz2". And the only one not working with a unicode pathname is 
"w|gz". Although admittedly tarfile.open() is not supposed to be used with a 
unicode path, people do it anyway, because they don't care, and because it 
works. The patch does not add a new broad functionality, it merely harmonises 
the way the six write modes work.

Neither can we retroactively enforce using string pathnames at this point, nor 
should we let a user run into this strange error. The patch is very small and 
minimally invasive. The error message you get without the patch is completely 
incomprehensible.

--

___
Python tracker 
<http://bugs.python.org/issue13639>
___

[issue5689] Support xz compression in tarfile module

2011-12-23 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

Yes, that's much better. Thanks for the tip.

--
Added file: http://bugs.python.org/file24086/lzma-preset.diff

___
Python tracker 
<http://bugs.python.org/issue5689>
___

[issue5689] Support xz compression in tarfile module

2011-12-23 Thread Lars Gustäbel


Changes by Lars Gustäbel :


Removed file: http://bugs.python.org/file24084/lzma-preset.diff

___
Python tracker 
<http://bugs.python.org/issue5689>
___

[issue5689] Support xz compression in tarfile module

2011-12-23 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

Wouldn't it be better then to use a default compresslevel of 6 in tarfile? I 
used level 9 in my patch without a particular reason, just because I thought 9 
must be better than 6 ;-)
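
For anyone who wants to compare the two levels, a small sketch follows. It
assumes the keyword ends up being called preset, as it is in the tarfile module
of Python 3.3 and later; the helper names and the sample data are made up.

```python
import io
import tarfile


def build_xz_archive(data, preset):
    """Return the bytes of a tar.xz archive holding a single member."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:xz", preset=preset) as tar:
        info = tarfile.TarInfo("data.bin")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def read_member(archive_bytes):
    """Read the single member back out of the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:xz") as tar:
        return tar.extractfile("data.bin").read()


sample = b"tarfile " * 10000
archive_6 = build_xz_archive(sample, preset=6)
archive_9 = build_xz_archive(sample, preset=9)
```

Both archives decompress to the same data; how much level 9 gains over level 6
depends on the input and is worth measuring with len() on real files.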

--
Added file: http://bugs.python.org/file24084/lzma-preset.diff

___
Python tracker 
<http://bugs.python.org/issue5689>
___

[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-21 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

See http://bugs.python.org/issue11638#msg150029

--

___
Python tracker 
<http://bugs.python.org/issue13639>
___

[issue11638] python setup.py sdist --formats tar* crashes if version is unicode

2011-12-21 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

Just for the record:

The gzip format (defined in RFC 1952) allows storing the original filename
(without the .gz suffix) in an additional field in the header (the FNAME
field). The name must be encoded in Latin-1 (ISO 8859-1). It is ironic that
this causes so much trouble, because the field is never used. A gzip file
without it is perfectly valid. The gzip program, for example, stores the
original filename by default but does not use it when decompressing unless it
is explicitly told to do so with the -N/--name option. If no FNAME field is
present in a gzipped file, the gzip program just falls back on stripping the
.gz suffix.
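
The point can be checked with a few lines of Python 3. This is only a sketch:
it relies on GzipFile omitting the field when it is given an empty filename,
and the helper names gzip_bytes and has_fname are made up for illustration.

```python
import gzip
import io

FNAME_FLAG = 0x08  # bit 3 of the FLG byte, as defined in RFC 1952


def gzip_bytes(payload, filename):
    """Compress payload in memory, passing filename on to GzipFile."""
    buffer = io.BytesIO()
    with gzip.GzipFile(filename=filename, mode="wb", fileobj=buffer) as gz:
        gz.write(payload)
    return buffer.getvalue()


def has_fname(data):
    """Check the FLG byte (offset 3) of a gzip header for the FNAME bit."""
    return bool(data[3] & FNAME_FLAG)


payload = b"just some data\n"
with_name = gzip_bytes(payload, "notes.txt.gz")
without_name = gzip_bytes(payload, "")
```

Both byte strings decompress to the same payload; only the first one carries an
FNAME field.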

--

___
Python tracker 
<http://bugs.python.org/issue11638>
___

[issue13639] UnicodeDecodeError when creating tar.gz with unicode name

2011-12-21 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

tarfile under Python 2.x is not particularly designed to support unicode 
filenames (the gzip module does not support them either), but that should not 
be too hard to fix.

--
keywords: +patch
Added file: 
http://bugs.python.org/file24066/tarfile-stream-gzip-unicode-fix.diff

___
Python tracker 
<http://bugs.python.org/issue13639>
___

[issue11638] python setup.py sdist --formats tar* crashes if version is unicode

2011-12-21 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

Is there a good reason why the tarfile mode that is used is "w|gz"? It seems to 
me that this is not necessary, "w:gz" should be enough. "w|gz" is for special 
operations only (see the tarfile docs).
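
To illustrate what "special operations" means here: "w|gz" writes a pure
stream and never seeks, so it also works on targets that only offer write(),
such as a pipe or a socket. A minimal Python 3 sketch, with a made-up
WriteOnlySink class standing in for such a target:

```python
import io
import tarfile


class WriteOnlySink:
    """Hypothetical non-seekable target that only supports write()."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)


def stream_archive(sink, name, data):
    """Write one member to sink using the stream mode "w|gz"."""
    with tarfile.open(fileobj=sink, mode="w|gz") as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))


sink = WriteOnlySink()
stream_archive(sink, "hello.txt", b"hello world\n")
streamed = b"".join(sink.chunks)

# The result is an ordinary tar.gz and can be read back with "r:gz".
with tarfile.open(fileobj=io.BytesIO(streamed), mode="r:gz") as tar:
    recovered = tar.extractfile("hello.txt").read()
```

When the target is a regular file on disk, as in sdist, none of this is needed
and "w:gz" does the job.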

--
nosy: +lars.gustaebel
Added file: http://bugs.python.org/file24065/distutils_tarfile_fix.diff

___
Python tracker 
<http://bugs.python.org/issue11638>
___

[issue5689] Support xz compression in tarfile module

2011-12-12 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

Please, go ahead!

--

___
Python tracker 
<http://bugs.python.org/issue5689>
___

[issue5689] Support xz compression in tarfile module

2011-12-10 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

Thanks for the review, guys! I can't close this issue yet because it depends on 
#6715.

--
resolution:  -> fixed
stage: needs patch -> committed/rejected

___
Python tracker 
<http://bugs.python.org/issue5689>
___

[issue5689] Support xz compression in tarfile module

2011-12-08 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

For those who want to test it first, I am posting the current state of the
patch here. It is ready for commit; there are no failing tests. If nobody
objects, I will apply it this weekend.

--
Added file: http://bugs.python.org/file23880/2011-12-08-tarfile-lzma.diff

___
Python tracker 
<http://bugs.python.org/issue5689>
___

[issue5689] Support xz compression in tarfile module

2011-12-01 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

I will be happy to, but my spare time is limited right now, so this could take 
about a week. If this is a problem, please go ahead.

--

___
Python tracker 
<http://bugs.python.org/issue5689>
___

[issue13477] tarfile module should have a command line

2011-11-26 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

This is not a bad idea. I recommend keeping it as simple as possible. I would
definitely not be supportive of a full tar clone. List, extract, create - that 
should be enough. There are two possible command line choices: do what the 
zipfile module does or emulate tar. I am in favor of the latter.
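
A rough sketch of how small such a front end could be, covering only list,
create and extract. The option names (-l, -c, -e) and the main() function are
assumptions made up for this illustration, not a proposal for the final
interface.

```python
import argparse
import os
import tarfile
import tempfile


def main(argv=None):
    parser = argparse.ArgumentParser(description="Tiny tar front end (sketch)")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-l", "--list", metavar="ARCHIVE")
    group.add_argument("-c", "--create", nargs="+", metavar="ARCHIVE_OR_FILE")
    group.add_argument("-e", "--extract", nargs=2, metavar=("ARCHIVE", "DIR"))
    args = parser.parse_args(argv)

    if args.list:
        with tarfile.open(args.list) as tar:
            return tar.getnames()
    if args.create:
        # The first value is the archive, the remaining ones are the files.
        archive, files = args.create[0], args.create[1:]
        with tarfile.open(archive, "w") as tar:
            for path in files:
                tar.add(path, arcname=os.path.basename(path))
        return [os.path.basename(path) for path in files]
    archive, target = args.extract
    with tarfile.open(archive) as tar:
        tar.extractall(target)
        return tar.getnames()


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "a.txt")
    with open(source, "w") as handle:
        handle.write("content\n")
    archive_path = os.path.join(tmp, "demo.tar")
    out_dir = os.path.join(tmp, "out")

    created = main(["-c", archive_path, source])
    listed = main(["-l", archive_path])
    extracted = main(["-e", archive_path, out_dir])
    extracted_exists = os.path.exists(os.path.join(out_dir, "a.txt"))
```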

--
assignee:  -> lars.gustaebel
priority: normal -> low
stage: test needed -> needs patch

___
Python tracker 
<http://bugs.python.org/issue13477>
___

[issue13407] tarfile.getnames misses members again

2011-11-15 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

Some testing reveals that the bz2 module in Python versions before 3.3 cannot
fully decompress the file in question. Only the first 900k are decompressed.
Thus, this issue is not related to issue13158 or the tarfile module.

--
nosy: +lars.gustaebel

___
Python tracker 
<http://bugs.python.org/issue13407>
___

[issue13158] tarfile.TarFile.getmembers misses some entries

2011-10-14 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

Thanks for the report. There was a problem decoding a special and rare kind of 
header field in the archive. The format of the archive is of very bad quality 
BTW ;-)

--
resolution:  -> fixed
stage:  -> committed/rejected
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue13158>
___

[issue13158] tarfile.TarFile.getmembers misses some entries

2011-10-13 Thread Lars Gustäbel


Changes by Lars Gustäbel :


--
assignee:  -> lars.gustaebel
nosy: +lars.gustaebel
versions: +Python 3.3

___
Python tracker 
<http://bugs.python.org/issue13158>
___

[issue13031] [PATCH] small speed-up for tarfile.py when unzipping tarballs

2011-09-22 Thread Lars Gustäbel


Changes by Lars Gustäbel :


--
assignee:  -> lars.gustaebel
nosy: +lars.gustaebel
priority: normal -> low
versions: +Python 3.3 -Python 2.7, Python 3.2

___
Python tracker 
<http://bugs.python.org/issue13031>
___

[issue6715] xz compressor support

2011-09-15 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

Today I played around with lzma support for tarfile based on your last patch 
(see issue5689). There are a few minor issues that I just wanted to mention, as 
they break the tarfile testsuite:

- LZMAFile does not expose a name attribute. BZ2File doesn't either (not in 3.x 
anyway), but GzipFile does.
- LZMAFile does not allow a 'b' in the mode argument, unlike GzipFile and 
BZ2File.
- The bz2 module exposes many error conditions as standard Python exceptions, 
e.g. IOError, EOFError. The lzma module uses LZMAError for all errors without 
distinction.
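
To show what the last point means for calling code, here is a small sketch for
Python 3. It assumes the behaviour as released in 3.3, where bz2 reports bad
data as OSError (the former IOError) or EOFError and lzma reports it as
LZMAError. The helper safe_decompress is made up and maps both onto ValueError.

```python
import bz2
import lzma


def safe_decompress(kind, data):
    """Decompress data, turning module-specific errors into ValueError."""
    try:
        if kind == "xz":
            # Ask for the xz container explicitly instead of auto-detection.
            return lzma.decompress(data, format=lzma.FORMAT_XZ)
        if kind == "bz2":
            return bz2.decompress(data)
    except lzma.LZMAError as exc:
        # lzma uses one exception class for all of its error conditions.
        raise ValueError("invalid xz data: %s" % exc)
    except (OSError, EOFError) as exc:
        # bz2 uses standard Python exceptions instead.
        raise ValueError("invalid bz2 data: %s" % exc)
    raise ValueError("unknown compression kind: %r" % kind)


original = b"some payload " * 100
xz_roundtrip = safe_decompress("xz", lzma.compress(original))
bz2_roundtrip = safe_decompress("bz2", bz2.compress(original))

errors = {}
for kind in ("xz", "bz2"):
    try:
        safe_decompress(kind, b"this is not compressed data")
    except ValueError as exc:
        errors[kind] = str(exc)
```

Code that supports both formats therefore needs two different except clauses.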

--

___
Python tracker 
<http://bugs.python.org/issue6715>
___

[issue5689] Support xz compression in tarfile module

2011-09-15 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

Attached is a patch with the current state of my work on lzma integration into 
tarfile (17 test errors).

--
assignee:  -> lars.gustaebel
keywords: +patch
Added file: http://bugs.python.org/file23162/2011-09-15-tarfile-lzma.diff

___
Python tracker 
<http://bugs.python.org/issue5689>
___

[issue12800] 'tarfile.StreamError: seeking backwards is not allowed' when extract symlink

2011-09-12 Thread Lars Gustäbel


Changes by Lars Gustäbel :


--
assignee:  -> lars.gustaebel

___
Python tracker 
<http://bugs.python.org/issue12800>
___

[issue12926] tarfile tarinfo.extract*() broken with symlinks

2011-09-10 Thread Lars Gustäbel


Changes by Lars Gustäbel :


--
assignee:  -> lars.gustaebel

___
Python tracker 
<http://bugs.python.org/issue12926>
___

[issue12841] Incorrect tarfile.py extraction

2011-09-09 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

It's the low-level operating system aspects of tarfile that are very difficult 
to test, e.g. filesystem and operating system dependent features such as 
symbolic links, hard links, file permissions, ownership. It is not even 
possible to reliably determine the filesystem the testsuite currently runs on. 
Also, superuser privileges are needed for some operations to work, e.g. 
chown(). A testsuite is normally not run as root, so a test that depends on 
this will never get enough coverage.

--

___
Python tracker 
<http://bugs.python.org/issue12841>
___

[issue12841] Incorrect tarfile.py extraction

2011-09-05 Thread Lars Gustäbel


Lars Gustäbel  added the comment:

Closing as fixed. Thanks all!

--
resolution:  -> fixed
stage:  -> committed/rejected
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue12841>
___
