On Thursday, 6 June 2013 1:24:16 PM UTC+3, Cameron Simpson wrote:
> On 05Jun2013 11:43, Νίκος Γκρ33κ <nikos.gr...@gmail.com> wrote:
> | On Wednesday, 5 June 2013 9:32:15 PM UTC+3, MRAB wrote:
> | > Using Python, I think you could get the filenames using os.listdir,
> | > passing the directory name as a bytestring so that it'll return the
> | > names as bytestrings.
> |
> | > Then, for each name, you could decode from its current encoding and
> | > encode to UTF-8 and rename the file, passing the old and new paths to
> | > os.rename as bytestrings.
> |
> | I am not sure I follow:
> |
> | Change this:
> |
> | # Compute a set of current fullpaths
> | fullpaths = set()
> | path = "/home/nikos/public_html/data/apps/"
> |
> | for root, dirs, files in os.walk(path):
> [...]
>
> Have a read of this:
>
>   http://docs.python.org/3/library/os.html#os.listdir
>
> The UNIX API accepts bytes for filenames and paths.
>
> Python 3 strs are sequences of Unicode code points. If you try to
> open a file or directory on a UNIX system using a Python str, that
> string must be converted to a sequence of bytes before being handed
> to the OS.
>
> This is done implicitly using your locale settings if you just use a str.
>
> However, if you pass a bytes to open or listdir, this conversion
> does not take place. You put bytes in and in the case of listdir
> you get bytes out.
>
> You can work on pathnames in bytes and never concern yourself with
> encode/decode at all.
>
> In this way you can write code that does not care about the translation
> between Unicode and some arbitrary byte encoding.
>
> Of course, the issue will still arise when accepting user input;
> your shell has done exactly this kind of thing when you renamed
> your MP3 file. But it is possible to write pure utility code that
> doesn't care about filenames as Unicode or str if you work purely
> in bytes.
>
> Regarding user filenames, the common policy these days is to use
> utf-8 throughout. Of course you need to get everything into that
> regime to start with.


So I need to use os.listdir() to get those filenames as bytes. Okay.
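
If I understand MRAB's suggestion correctly, the renaming step could be sketched roughly like this. It is only a sketch: the function name rename_to_utf8 and the helper is_utf8 are made up for illustration, and the source encoding has to be supplied by whoever knows what the old filenames were written in (I am assuming something like "iso-8859-7" for Greek names, which may be wrong).

```python
import os


def is_utf8(name):
    """Return True if the bytes `name` decode cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def rename_to_utf8(directory, source_encoding):
    """Rename entries of `directory` whose names are not valid UTF-8.

    `directory` must be a bytes path, so that os.listdir returns bytes.
    `source_encoding` is an assumption about how the old names were encoded.
    Returns a list of (old_name, new_name) pairs, both as bytes.
    """
    renamed = []
    for name in os.listdir(directory):  # bytes in, bytes out
        if is_utf8(name):
            continue  # already fine, leave it alone
        # Decode from the old encoding, then encode to UTF-8.
        new_name = name.decode(source_encoding).encode("utf-8")
        # Both paths are bytes, so no implicit locale conversion happens.
        # Note: on UNIX, os.rename silently replaces an existing file.
        os.rename(os.path.join(directory, name),
                  os.path.join(directory, new_name))
        renamed.append((name, new_name))
    return renamed
```

It would presumably be called as rename_to_utf8(b"/home/nikos/public_html/data/apps/", "iso-8859-7"), with the b prefix on the path being the important part.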

So I changed this:

fullpaths = set()
path = "/home/nikos/public_html/data/apps/"

for root, dirs, files in os.walk(path):
    for fullpath in files:
        fullpaths.add(os.path.join(root, fullpath))

to this:

# Compute a set of current fullpaths
fullpaths = os.listdir('/home/nikos/public_html/data/apps/')

# Load'em
for fullpath in fullpaths:
    try:
        # Check the presence of a file against the database and insert if it doesn't exist
        cur.execute('''SELECT url FROM files WHERE url = %s''', (fullpath,))
        data = cur.fetchone()  # URL is unique, so there should be only one

and now I get this error:


-----------------------------
[Thu Jun 06 14:15:38 2013] [error] [client 79.103.41.173] Original exception was:
[Thu Jun 06 14:15:38 2013] [error] [client 79.103.41.173] Traceback (most recent call last):
[Thu Jun 06 14:15:38 2013] [error] [client 79.103.41.173]   File "files.py", line 67, in <module>
[Thu Jun 06 14:15:38 2013] [error] [client 79.103.41.173]     cur.execute('''SELECT url FROM files WHERE url = %s''', (fullpath,) )
[Thu Jun 06 14:15:38 2013] [error] [client 79.103.41.173]   File "/usr/local/lib/python3.3/site-packages/PyMySQL3-0.5-py3.3.egg/pymysql/cursors.py", line 108, in execute
[Thu Jun 06 14:15:38 2013] [error] [client 79.103.41.173]     query = query.encode(charset)
[Thu Jun 06 14:15:38 2013] [error] [client 79.103.41.173] UnicodeEncodeError: 'utf-8' codec can't encode character '\\udcc5' in position 35: surrogates not allowed
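
As far as I can tell, the same error can be reproduced without the database at all. When os.listdir is given a str path on UNIX, Python decodes each name with the filesystem encoding and the "surrogateescape" error handler, so a byte such as 0xC5 that is not valid UTF-8 ends up as the lone surrogate '\udcc5' in the resulting str. The sketch below imitates that with an explicit decode; the filename is made up.

```python
# Hypothetical filename containing a byte that is not valid UTF-8 on its own.
raw_name = b"\xc5.mp3"

# This mirrors what happens to undecodable filename bytes when a str path
# is used: byte 0xC5 becomes the lone surrogate U+DCC5.
as_str = raw_name.decode("utf-8", "surrogateescape")

# A strict UTF-8 encode (what the traceback shows query.encode(charset)
# doing) refuses lone surrogates.
try:
    as_str.encode("utf-8")
    strict_error = None
except UnicodeEncodeError as exc:
    strict_error = exc

# Encoding with the same error handler gives back the original bytes.
round_tripped = as_str.encode("utf-8", "surrogateescape")
```

So the str that reaches cur.execute still carries the raw byte in escaped form, and the strict encode inside the driver rejects it.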
-- 
http://mail.python.org/mailman/listinfo/python-list