[issue7693] tarfile.extractall can't have unicode extraction path
Peter Bienstman added the comment:

> Lars Gustäbel added the comment:
>
> So, use the pax format. It stores the filenames as utf-8 and this way you
> will be on the safe side.
>
> I hope we both agree that the solution to your particular problem is
> nothing tarfile.py can provide.

If I want to extract a pax archive to a unicode path with non-latin
characters, how should I encode the path before passing it to 'extractall'?
Would utf-8 be OK?

Peter

--
___
Python tracker <http://bugs.python.org/issue7693>
___
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7693] tarfile.extractall can't have unicode extraction path
Peter Bienstman added the comment:

On Friday 15 January 2010 02:14:30 pm Lars Gustäbel wrote:
> Lars Gustäbel added the comment:
>
> I suppose you do not have a real problem here. I thought your problem was
> that you want to use unicode pathnames as input and output to tarfile. You
> don't need that.
>
> You want to transfer an archive from one system to another. You can do that
> with tarfile already. Python 3.x's tarfile does the same as Python 2.x's
> tarfile, except that in 3.x *all* strings are unicode strings.
>
> If you have different encodings on these systems, that should not be a
> problem unless these encodings are not compatible with each other. If you
> want to use a tar archive created on a utf-8 system on a iso-8859-1 system
> that is no problem, as long as you use the pax format and all the utf-8
> characters used are also valid iso-8859-1 characters.

I think I *do* have a problem. I want to create a tar archive on one system,
where the filenames could contain non-latin characters. I'm sending this tar
file over a socket to a different system (with potentially a different
encoding), where I want to extract it to a directory whose name could contain
non-latin characters.
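The scenario described above (non-latin file names, extraction into a directory with a non-latin name) can be sketched for Python 3.x, where, per the quoted reply, all strings are unicode strings. This is only a minimal sketch under stated assumptions, not the resolution of the issue: the function name `roundtrip`, the file name, and the directory name are made up for illustration, and it assumes the local filesystem can represent these characters.

```python
import os
import shutil
import tarfile
import tempfile


def roundtrip(name="\ua000a.ogg", dest_dirname="\ua000out"):
    """Archive one file with a non-latin name in pax format, extract it
    into a directory with a non-latin name, and return its contents.

    `name` and `dest_dirname` are hypothetical example values.
    """
    work = tempfile.mkdtemp()
    try:
        src = os.path.join(work, name)
        with open(src, "w") as f:
            f.write("A")

        archive = os.path.join(work, "test.tar")
        # The pax format stores member names as UTF-8.
        with tarfile.open(archive, mode="w", format=tarfile.PAX_FORMAT) as tar:
            tar.add(src, arcname=name)

        dest = os.path.join(work, dest_dirname)
        os.mkdir(dest)
        with tarfile.open(archive) as tar:
            # In Python 3 both the path and the member names are str,
            # so no bytes/unicode mixing occurs when they are joined.
            tar.extractall(dest)

        with open(os.path.join(dest, name)) as f:
            return f.read()
    finally:
        shutil.rmtree(work, ignore_errors=True)
```

Calling `roundtrip()` should return the single character that was written to the source file. Whether the same archive extracts cleanly on a system with a different filesystem encoding depends on that system being able to represent the names, as the quoted reply points out.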
[issue7693] tarfile.extractall can't have unicode extraction path
Peter Bienstman added the comment:

On Friday 15 January 2010 11:51:24 am Lars Gustäbel wrote:
> Lars Gustäbel added the comment:
>
> First, use a string pathname for extractall(). Most likely, your script is
> going to work. Convert all pathnames to strings using
> sys.getfilesystemencoding() before you add() them. Ensure that all systems
> you are going to use the archives on have the same filesystem encoding,
> e.g. utf-8.

Unfortunately, that is beyond my control. Am I then totally out of luck? Would
the implementation of tarfile in 3.0 be usable on 2.6 (perhaps with small
modifications)?

> Pax archives are probably the best choice if you plan to keep
> the archives for several years. If you simply want to transfer data from
> one system to the other throwing the archives away afterwards, the format
> is rather irrelevant.

The archives are throw-away, transfer only, but they could be used on any
system.
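The conversion suggested in the quoted reply (turn unicode pathnames into strings with sys.getfilesystemencoding() before passing them to add() or extractall()) might look roughly like the helper below. The name `to_fs_str` is hypothetical, and the fallback to "utf-8" when no filesystem encoding is reported is an assumption of this sketch, not something stated in the thread.

```python
import sys


def to_fs_str(path, encoding=None):
    """Return `path` as a byte string in the filesystem encoding.

    Hypothetical helper: byte strings are passed through unchanged,
    unicode strings are encoded. On Python 2 the result is a plain `str`.
    """
    if encoding is None:
        # Assumption: fall back to UTF-8 if no encoding is reported.
        encoding = sys.getfilesystemencoding() or "utf-8"
    if isinstance(path, bytes):
        return path
    return path.encode(encoding)
```

On Python 2.6 the intended use would be along the lines of `tar_pipe.extractall(to_fs_str(u"."))`, so that tarfile only ever joins byte strings with byte strings. As the reply notes, this only helps if every system involved uses a compatible filesystem encoding.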
[issue7693] tarfile.extractall can't have unicode extraction path
Peter Bienstman added the comment:

So what do you suggest then as the best approach if I want to use unicode
paths in tar files in Python 2.x in a way that is portable across different
systems?
[issue7693] tarfile.extractall can't have unicode extraction path
New submission from Peter Bienstman:

import tarfile

fname = unichr(40960) + u"a.ogg"
f = file(fname, "w")
f.write("A")
f.close()

tar_pipe = tarfile.open("test.tar", mode="w|", format=tarfile.PAX_FORMAT)
tar_pipe.add(fname)
tar_pipe.close()

tar_pipe = tarfile.open("test.tar")
tar_pipe.extractall(u".") # Just "." as string works fine.

This gives:

Traceback (most recent call last):
  File "a.py", line 15, in <module>
    tar_pipe.extractall(u".") # Just "." as string works fine.
  File "/usr/lib/python2.6/tarfile.py", line 2031, in extractall
    self.extract(tarinfo, path)
  File "/usr/lib/python2.6/tarfile.py", line 2068, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
  File "/usr/lib/python2.6/posixpath.py", line 70, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xea in position 1: ordinal not in range(128)

--
components: Extension Modules
messages: 97717
nosy: pbienst
severity: normal
status: open
title: tarfile.extractall can't have unicode extraction path
type: crash
versions: Python 2.6
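The last frame of the traceback (`path += '/' + b` in posixpath.join) suggests where the error comes from: in Python 2, adding a byte string to a unicode string makes the interpreter decode the byte string with the ASCII codec first. The sketch below makes that implicit decode explicit so it can be run on either Python 2 or 3. It assumes the member name reaches os.path.join as a UTF-8 encoded byte string, which the 0xea byte in the traceback is consistent with; the variable names are illustrative.

```python
# Member name from the report, as a UTF-8 byte string (assumption, see above).
member_name = u"\ua000a.ogg".encode("utf-8")

# What posixpath.join builds on the right-hand side: '/' + b
joined_tail = b"/" + member_name

try:
    # Python 2 performs this decode implicitly when the left-hand side
    # of the += is a unicode string such as u".".
    joined_tail.decode("ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
```

The reported position fits this reading: byte 0 of the right-hand side is the "/" separator, and byte 1 is the first byte of the encoded file name. With a plain "." as the path, both operands are byte strings and no decode is attempted, which would explain why that variant works.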