Hi,

I took a look at the unicode handling in Freevo; attached is a small patch that 
fixes a couple of things:

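1. urllib.quote() only handles byte strings (str), not unicode strings; given a 
unicode string containing non-ASCII characters, it fails.  The library.rpy 
change therefore keeps mydir as the byte string returned by util.getdirnames() 
instead of converting it with Unicode().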

2. Add sitecustomize.py to set Freevo's default encoding to 'utf-8', so that 
each "".encode() call does not need to specify it.  Note also that the encoding 
should not be hard-coded all over the place; there are lines like:

search_string = '%s %s' % (artist.encode('latin-1'), album.encode('latin-1'))

in many places.  Where a conversion is needed, the String() and Unicode() 
helper functions should be used instead.

3. I added another fallback to the Unicode() helper function: it uses 
'iso-8859-15' if decoding to unicode fails with the default (utf-8) and with 
config.LOCALE.  I did this to handle filenames.  The problem with filesystems 
is that they are usually not unicode-aware: a filename is just a sequence of 
bytes, and the user's locale defines how the characters are encoded.  So if my 
locale is iso-8859-15, a name like "tämä" will have different bytes on disk 
than if my locale is "utf-8".  This probably happens most often when moving 
files between machines with different locales, but you can simulate the effect 
with something like:

os.mkdir("Andr\xe9".decode("latin-1").encode("latin-1"))
os.mkdir("Andr\xe9".decode("latin-1").encode("utf-8"))

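That gives you two directories that are both meant to be "André", but whose 
names are stored as different bytes (Andr\xe9 versus Andr\xc3\xa9); in any 
given locale, one of them will display incorrectly.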
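
The effect, and the fallback idea behind the patched Unicode(), can be sketched 
in Python 3, where `bytes` plays the role of Python 2's raw str and `str` the 
role of unicode.  `to_text` is a hypothetical helper, not Freevo's code, and 
the candidate order (utf-8, then the locale's preferred encoding, then 
iso-8859-15) is an assumption loosely modelled on the patch:

```python
import locale

# The same name, "André", as written under a latin-1 and a utf-8 locale.
latin1_name = "Andr\xe9".encode("latin-1")  # b'Andr\xe9'
utf8_name = "Andr\xe9".encode("utf-8")      # b'Andr\xc3\xa9'


def to_text(raw, encodings=None):
    """Decode raw bytes, trying each candidate encoding in turn.

    Hypothetical helper for illustration; not Freevo's Unicode().
    """
    if encodings is None:
        # Assumed order, loosely modelled on the patch:
        # utf-8, then the user's locale, then iso-8859-15.
        encodings = ["utf-8", locale.getpreferredencoding(False), "iso-8859-15"]
    for enc in encodings:
        try:
            return raw.decode(enc)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong guess or unknown codec name: try the next one
    raise ValueError("could not decode %r with %r" % (raw, encodings))


print(latin1_name == utf8_name)  # False: different bytes on disk
print(to_text(utf8_name))        # André: utf-8 succeeds directly
print(to_text(latin1_name, ["utf-8", "iso-8859-15"]))  # André: via the fallback
```

Trying utf-8 first is a reasonable choice because accented latin-1 text rarely 
forms valid utf-8 by accident.  The last step behaves differently: iso-8859-15 
assigns a character to every byte value, so it never raises, which means a 
wrong guess shows up as wrong characters rather than as an error.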

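Items 1 and 2 can be sketched together in Python 3 terms (`bytes` for Python 
2's str, `str` for unicode; quote() now lives in urllib.parse and also accepts 
str, encoding it as utf-8 by default).  `to_bytes` and `DEFAULT_ENCODING` are 
hypothetical names for illustration, not Freevo's String() helper; the point is 
that the encoding is chosen in one place and bytes are produced before quoting:

```python
from urllib.parse import quote

# Hypothetical project-wide setting and helper (not Freevo's actual code).
DEFAULT_ENCODING = "utf-8"


def to_bytes(value, encoding=DEFAULT_ENCODING):
    """Return value as bytes, encoding text with the shared default."""
    if isinstance(value, bytes):
        return value  # already raw bytes: leave untouched
    return str(value).encode(encoding)


# Instead of artist.encode('latin-1') at every call site:
artist, album = "Björk", "Homogenic"
search_string = b"%s %s" % (to_bytes(artist), to_bytes(album))

# quote() works byte by byte, so the URL depends on the chosen encoding.
link = quote(to_bytes("Andr\xe9"))
print(search_string)                         # b'Bj\xc3\xb6rk Homogenic'
print(link)                                  # Andr%C3%A9
print(quote("Andr\xe9".encode("latin-1")))   # Andr%E9
```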

Dealing with unicode and different encodings can be sometimes confusing.
Personally I find it helpful to think of it as follows:

There is usually a pair: a unicode string and a raw string.  The unicode string 
knows which characters it contains; it is independent of any byte encoding.  
The raw Python string is just a bytestream, and nothing in it records which 
encoding produced it.  The convention is that the bytestream contains ascii, 
but it can contain anything.
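
In Python 3 the pair is explicit: `str` is the unicode string and `bytes` the 
raw string.  A small sketch of one piece of text and two of its raw forms:

```python
# Python 3: str is the "unicode string", bytes is the "raw string".
text = "Andr\xe9"                     # 5 characters: A n d r é
raw_utf8 = text.encode("utf-8")      # 6 bytes: é becomes two bytes
raw_latin1 = text.encode("latin-1")  # 5 bytes: é becomes one byte

print(len(text), len(raw_utf8), len(raw_latin1))  # 5 6 5

# The bytes carry no record of their encoding; decoding needs it supplied.
print(raw_utf8.decode("utf-8") == text)      # True
print(raw_latin1.decode("latin-1") == text)  # True
```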

So unicode("abc") will take the bytestream "abc" and turn it into a unicode 
string, and all is well, because the default encoding is ascii and "abc" is 
pure ascii.

Now unicode("äläpäs") will fail unless you have changed the default 
encoding.  It is equivalent to "äläpäs".decode().  Since the string is not 
pure ascii, it bails out with something like "UnicodeDecodeError: 'ascii' 
codec can't decode byte ..."
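
The same failure can be reproduced in Python 3 by asking for the ascii codec 
explicitly, since Python 3's default is utf-8 rather than ascii.  A minimal 
sketch:

```python
raw = "äläpäs".encode("utf-8")  # non-ASCII raw bytes

# Pure ASCII bytes decode fine with the ascii codec.
print(b"abc".decode("ascii"))  # abc

# Non-ASCII bytes do not: this is what unicode(raw) did in Python 2
# while the default encoding was still ascii.
try:
    raw.decode("ascii")
    error_message = None
except UnicodeDecodeError as err:
    error_message = str(err)
print(error_message)  # 'ascii' codec can't decode byte 0xc3 in position 0: ...
```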

So if you have a raw Python string containing anything more exotic than ascii 
and you want to convert it to unicode, you must explicitly give the encoding of 
the string.  You can also change the default encoding from ascii to something 
else, but only in site.py or sitecustomize.py, because site.py removes 
sys.setdefaultencoding() from the sys module after importing sitecustomize.
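
In Python 3 terms, a sketch of both halves of that advice: naming the encoding 
always works, while the default (fixed at utf-8 there, with no 
sys.setdefaultencoding() available) does not help with latin-1 bytes:

```python
import sys

raw = b"\xe4l\xe4p\xe4s"  # "äläpäs" encoded as latin-1

# Naming the encoding explicitly works regardless of the default.
text = raw.decode("latin-1")
print(text)  # äläpäs

# Relying on the default (utf-8 in Python 3) fails for latin-1 bytes.
try:
    raw.decode()
    utf8_default_failed = False
except UnicodeDecodeError:
    utf8_default_failed = True
print(utf8_default_failed)  # True

# Python 3 has no sys.setdefaultencoding(); the default is fixed.
print(sys.getdefaultencoding())            # utf-8
print(hasattr(sys, "setdefaultencoding"))  # False
```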

Another curve ball is the user's locale: what you type in the terminal can look 
the same to you, but without checking, it is impossible to say whether the 
representation on disk will be the same.  For instance, Mandriva 2007 seems 
to default to utf-8 encoding, so filenames with accents will be stored 
differently than on a latin-1 system.  Furthermore, if you write something in 
utf-8 in kwrite and give the resulting file to a friend who is using latin-1, 
the contents will not render properly; he will not see your accents until he 
changes his encoding.
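
The kwrite scenario can be reproduced in a few lines of Python 3: utf-8 bytes 
read back as latin-1 turn each accented character into two wrong ones.  The 
last line only reports this machine's locale default, whatever it happens to be:

```python
import locale

text = "André"
sent = text.encode("utf-8")              # written by a utf-8 editor
seen_by_friend = sent.decode("latin-1")  # read back as latin-1
print(seen_by_friend)  # AndrÃ©

# Re-reading the same bytes with the right encoding restores the text.
print(seen_by_friend.encode("latin-1").decode("utf-8"))  # André

# Which encoding applies by default depends on this machine's locale.
print(locale.getpreferredencoding(False))
```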


Hope this helps,

Harri
Index: src/www/htdocs/library.rpy
===================================================================
--- src/www/htdocs/library.rpy	(revision 8806)
+++ src/www/htdocs/library.rpy	(working copy)
@@ -344,7 +344,6 @@
             # get me the directories to output
             directorylist = util.getdirnames(String(action_dir))
             for mydir in directorylist:
-                mydir = Unicode(mydir)
                 fv.tableRowOpen('class="chanrow"')
                 mydispdir = os.path.basename(mydir)
                 mydirlink = '<a href="'+ action_script +'?media='+action_mediatype+'&dir='+urllib.quote(mydir)+'">'+mydispdir+'</a>'
Index: src/sitecustomize.py
===================================================================
--- src/sitecustomize.py	(revision 0)
+++ src/sitecustomize.py	(revision 0)
@@ -0,0 +1,12 @@
+# -*- coding: iso-8859-1 -*-
+# -----------------------------------------------------------------------
+# sitecustomize.py - Automatically imported if present
+# Set the default encoding for freevo to be more 
+# generic than the default ascii.
+# See http://docs.python.org/lib/module-site.html
+# -----------------------------------------------------------------------
+# $Id$
+
+import sys
+
+sys.setdefaultencoding('utf-8')
Index: src/util/__init__.py
===================================================================
--- src/util/__init__.py	(revision 8806)
+++ src/util/__init__.py	(working copy)
@@ -54,9 +54,12 @@
                 try:
                     return unicode(string, config.LOCALE)
                 except Exception, e:
-                    print 'Error: Could not convert %s to unicode' % repr(string)
-                    print 'tried encoding %s and %s' % (encoding, config.LOCALE)
-                    print e
+                    try:
+                        return unicode(string, "iso-8859-15")
+                    except Exception, e:
+                        print 'Error: Could not convert %s to unicode' % repr(string)
+                        print 'tried encodings %s, %s and iso-8859-15' % (encoding, config.LOCALE)
+                        print e
         elif string.__class__ != unicode:
             return unicode(str(string), config.LOCALE)
         
_______________________________________________
Freevo-users mailing list
Freevo-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/freevo-users
