[issue7693] tarfile.extractall can't have unicode extraction path

2010-01-17 Thread Peter Bienstman

Peter Bienstman  added the comment:

> Lars Gustäbel  added the comment:
> 
> So, use the pax format. It stores the filenames as utf-8 and this way you
>  will be on the safe side.
> 
> I hope we both agree that the solution to your particular problem is
>  nothing tarfile.py can provide.

If I want to extract a pax archive to a unicode path with non-Latin 
characters, how should I encode the path before passing it to 'extractall'? 
Would UTF-8 be OK?

Peter
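
One possible answer, following the advice quoted elsewhere in this thread to
use sys.getfilesystemencoding(), is sketched below. The helper name
native_path and the "utf-8" fallback are assumptions made for illustration;
nothing like it is part of tarfile. On Python 2 it encodes a unicode path to
a byte string, and on Python 3 it returns the path unchanged.

```python
import sys


def native_path(path, encoding=None):
    """Return `path` as the interpreter's native str type (hypothetical helper).

    On Python 2, unicode objects are encoded to a byte string.
    On Python 3, str is already text and is returned unchanged.
    """
    if sys.version_info[0] == 2 and not isinstance(path, str):
        # Python 2 only: `path` is a unicode object here.
        # The "utf-8" fallback is an assumption, used only when the
        # interpreter reports no filesystem encoding.
        encoding = encoding or sys.getfilesystemencoding() or "utf-8"
        return path.encode(encoding)
    return path
```

It would be used as tar_pipe.extractall(native_path(u".")), which matches the
observation in the original report that a plain string path works.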

--

___
Python tracker 
<http://bugs.python.org/issue7693>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue7693] tarfile.extractall can't have unicode extraction path

2010-01-15 Thread Peter Bienstman

Peter Bienstman  added the comment:

On Friday 15 January 2010 02:14:30 pm Lars Gustäbel wrote:
> Lars Gustäbel  added the comment:
> 
> I suppose you do not have a real problem here. I thought your problem was
>  that you want to use unicode pathnames as input and output to tarfile. You
>  don't need that.
> 
> You want to transfer an archive from one system to another. You can do that
>  with tarfile already. Python 3.x's tarfile does the same as Python 2.x's
>  tarfile, except that in 3.x *all* strings are unicode strings.
> 
> If you have different encodings on these systems, that should not be a
>  problem unless these encodings are not compatible with each other. If you
>  want to use a tar archive created on a utf-8 system on an iso-8859-1 system
>  that is no problem, as long as you use the pax format and all the utf-8
>  characters used are also valid iso-8859-1 characters.

I think I *do* have a problem. I want to create a tar archive on one system, 
where the filenames could contain non-Latin characters. I'm sending this tar 
file over a socket to a different system (with potentially a different 
encoding), where I want to extract it to a directory whose name could contain 
non-Latin characters.
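
The transfer described here can be sketched in memory, assuming a Python 3
interpreter. A bytes object stands in for the data sent over the socket, and
the function names pack and unpack are made up for illustration. The sketch
only shows that a pax archive carries a non-Latin member name from writer to
reader; it does not touch the filesystem, so it says nothing about the
extraction path itself.

```python
import io
import tarfile


def pack(name, payload):
    """Build a pax archive in memory and return its raw bytes.

    The returned bytes stand in for what would be written to the socket.
    """
    buf = io.BytesIO()
    tar = tarfile.open(fileobj=buf, mode="w|", format=tarfile.PAX_FORMAT)
    info = tarfile.TarInfo(name=name)
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
    tar.close()  # writes the end-of-archive blocks; buf itself stays open
    return buf.getvalue()


def unpack(raw):
    """Read the archive back as a stream and return {member name: content}."""
    members = {}
    tar = tarfile.open(fileobj=io.BytesIO(raw), mode="r|")
    for member in tar:
        # In stream mode each member must be read before moving to the next.
        members[member.name] = tar.extractfile(member).read()
    tar.close()
    return members


name = u"\ua000a.ogg"  # same non-Latin character as unichr(40960)
raw = pack(name, b"A")
received = unpack(raw)
```

The member name is stored in the pax header as UTF-8, so the receiving side
gets the same text back regardless of its own filesystem encoding.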




[issue7693] tarfile.extractall can't have unicode extraction path

2010-01-15 Thread Peter Bienstman

Peter Bienstman  added the comment:

On Friday 15 January 2010 11:51:24 am Lars Gustäbel wrote:
> Lars Gustäbel  added the comment:
> 
> First, use a string pathname for extractall(). Most likely, your script is
>  going to work. Convert all pathnames to strings using
>  sys.getfilesystemencoding() before you add() them. Ensure that all systems
>  you are going to use the archives on have the same filesystem encoding,
>  e.g. utf-8. 

Unfortunately, that is beyond my control. Am I then totally out of luck? Would 
the implementation of tarfile in 3.0 be usable on 2.6 (perhaps with small 
modifications)?

>  Pax archives are probably the best choice if you plan to keep
>  the archives for several years. If you simply want to transfer data from
>  one system to the other throwing the archives away afterwards, the format
>  is rather irrelevant.

The archives are throw-away, transfer only, but they could be used on any 
system.




[issue7693] tarfile.extractall can't have unicode extraction path

2010-01-15 Thread Peter Bienstman

Peter Bienstman  added the comment:

So what do you suggest then as the best approach if I want to use unicode paths 
in tar files in Python 2.x in a way that is portable across different systems?




[issue7693] tarfile.extractall can't have unicode extraction path

2010-01-13 Thread Peter Bienstman

New submission from Peter Bienstman:

import tarfile

fname = unichr(40960) + u"a.ogg"

f = file(fname, "w")
f.write("A")
f.close()

tar_pipe = tarfile.open("test.tar", mode="w|",
                        format=tarfile.PAX_FORMAT)
tar_pipe.add(fname)
tar_pipe.close()

tar_pipe = tarfile.open("test.tar")
tar_pipe.extractall(u".") # Just "." as string works fine.

This gives:

Traceback (most recent call last):
  File "a.py", line 15, in <module>
    tar_pipe.extractall(u".") # Just "." as string works fine.
  File "/usr/lib/python2.6/tarfile.py", line 2031, in extractall
    self.extract(tarinfo, path)
  File "/usr/lib/python2.6/tarfile.py", line 2068, in extract
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name))
  File "/usr/lib/python2.6/posixpath.py", line 70, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xea in position 1: ordinal not in range(128)
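
The error message suggests that the member name reaches os.path.join as a
UTF-8 byte string, and that joining it to a unicode path makes Python 2
decode it implicitly as ASCII. The sketch below imitates that step
explicitly so that it also runs on Python 3; the function name
join_like_python2 is made up for illustration and is not how posixpath is
written.

```python
def join_like_python2(path_text, name_bytes):
    """Imitate Python 2 joining a unicode path with a byte-string name.

    Python 2 performed the ASCII decode implicitly; here it is explicit.
    """
    return path_text + (b"/" + name_bytes).decode("ascii")


# unichr(40960) is U+A000, which is the three bytes EA 80 80 in UTF-8.
name_bytes = u"\ua000a.ogg".encode("utf-8")

try:
    join_like_python2(u".", name_bytes)
    message = None
except UnicodeDecodeError as err:
    message = str(err)
```

Byte 0xea sits at position 1 because it follows the "/" separator, which is
consistent with the position reported in the traceback.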

--
components: Extension Modules
messages: 97717
nosy: pbienst
severity: normal
status: open
title: tarfile.extractall can't have unicode extraction path
type: crash
versions: Python 2.6
