
Re: [Tutor] UTF-8 filenames encountered in os.walk

2007-07-04 Thread William O'Higgins Witteman

On Tue, Jul 03, 2007 at 06:04:16PM -0700, Terry Carroll wrote:

>> Has anyone found a silver bullet for ensuring that all the filenames
>> encountered by os.walk are treated as UTF-8?  Thanks.
>
> What happens if you specify the starting directory as a Unicode string,
> rather than an ascii string, e.g., if you're walking the current
> directory:
>
>   for thing in os.walk(u'.'):
>
> instead of:
>
>   for thing in os.walk('.'):

This is a good thought, and the crux of the problem.  I pull the
starting directories from an XML file which is UTF-8, but by the time it
hits my program, because there are no extended characters in the
starting path, os.walk assumes ascii.  So, I recast the string as UTF-8,
and I get UTF-8 output.  The problem happens further down the line.

I get a list of paths from the results of os.walk, all in UTF-8, but not
identified as such.  If I just pass my list to other parts of the
program it seems to assume either ascii or UTF-8, based on the
individual list elements.  If I try to cast the whole list as UTF-8, I
get an exception because it is assuming ascii and receiving UTF-8 for
some list elements.

I suspect that my program will have to make sure to recast all
equivalent-to-ascii strings as UTF-8 while leaving the ones that are
already extended alone.
-- 

yours,

William
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] UTF-8 filenames encountered in os.walk

2007-07-04 Thread Kent Johnson

William O'Higgins Witteman wrote:
 for thing in os.walk(u'.'):

 instead of:

 for thing in os.walk('.'): 
 
 This is a good thought, and the crux of the problem.  I pull the
 starting directories from an XML file which is UTF-8, but by the time it
 hits my program, because there are no extended characters in the
 starting path, os.walk assumes ascii.  So, I recast the string as UTF-8,
 and I get UTF-8 output.  The problem happens further down the line.
 
 I get a list of paths from the results of os.walk, all in UTF-8, but not
 identified as such.  If I just pass my list to other parts of the
 program it seems to assume either ascii or UTF-8, based on the
 individual list elements.  If I try to cast the whole list as UTF-8, I
 get an exception because it is assuming ascii and receiving UTF-8 for
 some list elements.

FWIW, I'm pretty sure you are confusing Unicode strings and UTF-8
strings; they are not the same thing. A Unicode string stores each
character in a fixed-width unit (16 or 32 bits, depending on how Python
was built). It is a distinct data type from a 'regular' string. Regular
Python strings are byte strings with an implicit encoding. One possible
encoding is UTF-8, which uses one or more bytes to represent each
character.

Some good reading on Unicode and utf-8:
http://www.joelonsoftware.com/articles/Unicode.html
http://effbot.org/zone/unicode-objects.htm

If you pass a unicode string (not utf-8) to os.walk(), the resulting 
lists will also be unicode.
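
A minimal sketch of that behaviour, assuming Python 3 (where `str` plays
the role of the unicode string in this thread and `bytes` the role of the
byte string). The temporary directory and the file name `example.txt` are
made up for the illustration:

```python
import os
import tempfile

# Build a throwaway tree so the example does not touch any real directory.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "example.txt"), "w"):
        pass

    # Text path in -> text names out.
    text_names = [name for _, _, files in os.walk(root) for name in files]

    # Byte path in -> byte names out.
    byte_root = os.fsencode(root)
    byte_names = [name for _, _, files in os.walk(byte_root) for name in files]

print(type(text_names[0]).__name__, type(byte_names[0]).__name__)
```

The type of the starting path decides the type of every name that comes
back, so choosing a text path once at the top avoids mixing the two kinds
of string later on.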

Again, it would be helpful to see the code that is getting the error.

 I suspect that my program will have to make sure to recast all
 equivalent-to-ascii strings as UTF-8 while leaving the ones that are
 already extended alone.

It is nonsense to talk about 'recasting' an ascii string as UTF-8; an 
ascii string is *already* UTF-8 because the representation of the 
characters is identical. OTOH it makes sense to talk about converting an 
ascii string to a unicode string.
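
As a small illustration of this point, here is a sketch assuming Python 3
(`bytes` for byte strings, `str` for unicode strings). The variable names
are only for the example:

```python
text = "abc"

# For pure-ascii text the two encodings produce identical bytes,
# so there is nothing to "recast".
ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
same_bytes = ascii_bytes == utf8_bytes

# The meaningful operation is converting bytes to a text string.
as_text = ascii_bytes.decode("ascii")

print(same_bytes, repr(ascii_bytes), repr(as_text))
```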

Kent

Re: [Tutor] UTF-8 filenames encountered in os.walk

2007-07-04 Thread William O'Higgins Witteman

On Wed, Jul 04, 2007 at 11:28:53AM -0400, Kent Johnson wrote:

> FWIW, I'm pretty sure you are confusing Unicode strings and UTF-8
> strings, they are not the same thing. A Unicode string uses 16 bits to
> represent each character. It is a distinct data type from a 'regular'
> string. Regular Python strings are byte strings with an implicit
> encoding. One possible encoding is UTF-8 which uses one or more bytes to
> represent each character.
>
> Some good reading on Unicode and utf-8:
> http://www.joelonsoftware.com/articles/Unicode.html
> http://effbot.org/zone/unicode-objects.htm

The problem is that the Windows filesystem uses UTF-8 as the encoding
for filenames, but os doesn't seem to have a UTF-8 mode, just an ascii
mode and a Unicode mode.

> If you pass a unicode string (not utf-8) to os.walk(), the resulting
> lists will also be unicode.
>
> Again, it would be helpful to see the code that is getting the error.

The code is quite complex for not-relevant-to-this-problem reasons.  The
gist is that I walk the FS, get filenames, some of which get written to
an XML file.  If I leave the output alone I get errors on reading the
XML file.  If I try to change the output so that it is all Unicode, I
get errors because my UTF-8 data sometimes looks like ascii, and I don't
see a UTF-8-to-Unicode converter in the docs.

>> I suspect that my program will have to make sure to recast all
>> equivalent-to-ascii strings as UTF-8 while leaving the ones that are
>> already extended alone.
>
> It is nonsense to talk about 'recasting' an ascii string as UTF-8; an
> ascii string is *already* UTF-8 because the representation of the
> characters is identical. OTOH it makes sense to talk about converting an
> ascii string to a unicode string.

Then what does mystring.encode("UTF-8") do?
-- 

yours,

William

Re: [Tutor] UTF-8 filenames encountered in os.walk

2007-07-04 Thread Lloyd Kvam

On Wed, 2007-07-04 at 12:00 -0400, William O'Higgins Witteman wrote:
 On Wed, Jul 04, 2007 at 11:28:53AM -0400, Kent Johnson wrote:
 
 FWIW, I'm pretty sure you are confusing Unicode strings and UTF-8
 strings, they are not the same thing. A Unicode string uses 16 bits to
 represent each character. It is a distinct data type from a 'regular'
 string. Regular Python strings are byte strings with an implicit
 encoding. One possible encoding is UTF-8 which uses one or more bytes to
 represent each character.
 
 Some good reading on Unicode and utf-8:
 http://www.joelonsoftware.com/articles/Unicode.html
 http://effbot.org/zone/unicode-objects.htm
 
 The problem is that the Windows filesystem uses UTF-8 as the encoding
 for filenames, but os doesn't seem to have a UTF-8 mode, just an ascii
 mode and a Unicode mode.

Are you converting your utf-8 strings to unicode?

unicode_file_name = utf8_file_name.decode('UTF-8')

 If you pass a unicode string (not utf-8) to os.walk(), the resulting 
 lists will also be unicode.
 
 Again, it would be helpful to see the code that is getting the error.
 
 The code is quite complex for not-relevant-to-this-problem reasons.  The
 gist is that I walk the FS, get filenames, some of which get written to
 an XML file.  If I leave the output alone I get errors on reading the
 XML file.  If I try to change the output so that it is all Unicode, I
 get errors because my UTF-8 data sometimes looks like ascii, and I don't
 see a UTF-8-to-Unicode converter in the docs.
 

It is probably worth the effort to put together a simpler piece of code
that can illustrate the problem.

 I suspect that my program will have to make sure to recast all
 equivalent-to-ascii strings as UTF-8 while leaving the ones that are
 already extended alone.
 
 It is nonsense to talk about 'recasting' an ascii string as UTF-8; an 
 ascii string is *already* UTF-8 because the representation of the 
 characters is identical. OTOH it makes sense to talk about converting an 
 ascii string to a unicode string.
 
 Then what does mystring.encode("UTF-8") do?

It uses utf8 encoding rules to convert mystring FROM unicode to a
string.  If mystring is *NOT* unicode but simply a string, it appears to
do a round trip decode and encode of the string.  This allows you to
find encoding errors, but if there are no errors the result is the same
as what you started with.

The data in a file (streams of bytes) are encoded to represent unicode
characters.  The stream must be decoded to recover the underlying
unicode.  The unicode must be encoded when written to files.  utf-8 is
just one of many possible encoding schemes.
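
A sketch of that cycle, assuming Python 3 and a scratch file named
`names.dat` that exists only for this illustration:

```python
import os
import tempfile

name = "T\u00e9st.txt"  # text (unicode) value held in memory

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "names.dat")

    # Encode when writing: the file only ever holds bytes.
    with open(path, "wb") as out:
        out.write(name.encode("utf-8"))

    # Read the raw byte stream back.
    with open(path, "rb") as src:
        raw = src.read()

# Decode to recover the underlying text.
recovered = raw.decode("utf-8")
print(repr(raw), repr(recovered))
```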

-- 
Lloyd Kvam
Venix Corp


Re: [Tutor] UTF-8 filenames encountered in os.walk

2007-07-04 Thread Terry Carroll

On Wed, 4 Jul 2007, William O'Higgins Witteman wrote:

 It is nonsense to talk about 'recasting' an ascii string as UTF-8; an 
 ascii string is *already* UTF-8 because the representation of the 
 characters is identical. OTOH it makes sense to talk about converting an 
 ascii string to a unicode string.
 
 Then what does mystring.encode("UTF-8") do?

I'm pretty iffy on this stuff myself, but as I see it, you basically have 
three kinds of things here.

First, an ascii string:

  s = 'abc'

In hex, this is 616263; 61 for 'a'; 62 for 'b', 63 for 'c'.

Second, a unicode string:

  u = u'abc' 

I can't say what this is in hex because that's not meaningful.  A 
Unicode character is a code point, which can be represented in a variety 
of ways, depending on the encoding used.  So, moving on

Finally, you can have a sequence of bytes, which are stored in a string as 
a buffer, that shows the particular encoding of a particular string:

  e8 = s.encode("UTF-8")
  e16 = s.encode("UTF-16")

Now, e8 and e16 are each strings (of bytes), the content of which tells
you how the string of characters that was encoded is represented in that 
particular encoding.

In hex, these look like this.

  e8: 616263 (61 for 'a'; 62 for 'b', 63 for 'c')
  e16: FFFE6100 62006300
       (FFFE for the BOM, 6100 for 'a', 6200 for 'b', 6300 for 'c')

Now, superficially, s and e8 are equal, because for plain old ascii 
characters (which is all I've used in this example), UTF-8 is equivalent 
to ascii.  And they compare the same:

>>> s == e8
True

But that's not true of the UTF-16:

>>> s == e16
False
>>> e8 == e16
False

So (and I'm open to correction on this), I think of the encode() method as 
returning a string of bytes that represents the particular encoding of a 
string value -- and it can't be used as the string value itself.

But you can get that string value back (assuming all the characters map 
to ascii):

>>> s8 = e8.decode("UTF-8")
>>> s16 = e16.decode("UTF-16")
>>> s == s8 == s16
True
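
The hex values above can be checked with a short sketch. It assumes
Python 3, and it builds the UTF-16 bytes explicitly as little-endian with a
BOM so that the output does not depend on the machine's byte order:

```python
import codecs

text = "abc"

e8 = text.encode("utf-8")
# BOM followed by little-endian code units, spelled out explicitly.
e16 = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

print(e8.hex())   # 616263
print(e16.hex())  # fffe610062006300

# Both byte sequences decode back to the same text.
same_text = e8.decode("utf-8") == e16.decode("utf-16") == text
```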




Re: [Tutor] UTF-8 filenames encountered in os.walk

2007-07-04 Thread Kent Johnson

William O'Higgins Witteman wrote:

 The problem is that the Windows filesystem uses UTF-8 as the encoding
 for filenames,

That's not what I get. For example, I made a file called Tést.txt and 
looked at what os.listdir() gives me. (os.listdir() is what os.walk() 
uses to get the file and directory names.) If I pass a byte string as 
the directory name, I get byte strings back, not in utf-8, but 
apparently in cp1252 (or latin-1, but this is Windows so it's probably 
cp1252):
 >>> os.listdir('C:\Documents and Settings')
['Administrator', 'All Users', 'Default User', 'LocalService',
'NetworkService', 'T\xe9st.txt']

Note the \xe9 which is the cp1252 representation of é.

If I give the directory as a unicode string, the results are all unicode 
strings as well:
 >>> os.listdir(u'C:\Documents and Settings')
[u'Administrator', u'All Users', u'Default User', u'LocalService',
u'NetworkService', u'T\xe9st.txt']

In neither case does it give me utf-8.
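
To make the difference concrete, here is a sketch (assuming Python 3) that
encodes the same name both ways and shows what a mismatch looks like:

```python
name = "T\u00e9st.txt"

cp1252_bytes = name.encode("cp1252")  # one byte for the accented letter
utf8_bytes = name.encode("utf-8")     # two bytes for the accented letter

# Reading utf-8 bytes as if they were cp1252 does not fail; it silently
# yields the wrong characters.
misread = utf8_bytes.decode("cp1252")

print(repr(cp1252_bytes), repr(utf8_bytes), misread)
```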

  but os doesn't seem to have a UTF-8 mode, just an ascii
  mode and a Unicode mode.

It has a unicode string mode and a byte string mode.

 The code is quite complex for not-relevant-to-this-problem reasons.  The
 gist is that I walk the FS, get filenames, some of which get written to
 an XML file.  If I leave the output alone I get errors on reading the
 XML file.  

What kind of errors? Be specific! Show the code that generates the error.

I'll hazard a guess that you are writing the cp1252 characters to the 
XML file but not specifying the charset of the file, or specifying it as 
utf-8, and the reader croaks on the cp1252.
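
One way to avoid that mismatch is to let the XML library do the encoding
and write the declaration itself. This is only a sketch, assuming Python 3
and the standard `xml.etree.ElementTree` module; the element names `files`
and `file` are invented for the example:

```python
import io
import xml.etree.ElementTree as ET

root = ET.Element("files")
ET.SubElement(root, "file").text = "T\u00e9st.txt"  # text, not bytes

# The writer encodes the text and states the encoding in the declaration.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
xml_bytes = buffer.getvalue()

# A reader that honours the declaration gets the original text back.
parsed = ET.fromstring(xml_bytes)
round_tripped = parsed.find("file").text
print(xml_bytes)
```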

  If I try to change the output so that it is all Unicode, I
  get errors because my UTF-8 data sometimes looks like ascii,

How do you change the output? What do you mean, the utf-8 data looks 
like ascii? Ascii data *is* utf-8, they should look the same.

  I don't
  see a UTF-8-to-Unicode converter in the docs.

If s is a byte string containing utf-8, then s.decode('utf-8') is the 
equivalent unicode string.

 I suspect that my program will have to make sure to recast all
 equivalent-to-ascii strings as UTF-8 while leaving the ones that are
 already extended alone.
 It is nonsense to talk about 'recasting' an ascii string as UTF-8; an 
 ascii string is *already* UTF-8 because the representation of the 
 characters is identical. OTOH it makes sense to talk about converting an 
 ascii string to a unicode string.
 
 Then what does mystring.encode("UTF-8") do?

It depends on what mystring is. If it is a unicode string, it converts 
it to a plain (byte) string containing the utf-8 representation of 
mystring. For example,
In [8]: s=u'\xe9'  # Note the leading u - this is a unicode string
In [9]: s.encode('utf-8')
Out[9]: '\xc3\xa9'


If mystring is a string, it is converted to a unicode string using the 
default encoding (ascii unless you have changed it), then that string is 
converted to utf-8. This can work out two ways:
- if mystring originally contained only ascii characters, the result is 
identical to the original:
In [1]: s='abc'
In [2]: s.encode('utf-8')
Out[2]: 'abc'
In [4]: s.encode('utf-8') == s
Out[4]: True

- if mystring contains non-ascii characters, then the implicit *decode* 
using the ascii codec will fail with an exception:
In [5]: s = '\303\251'
In [6]: s.encode('utf-8')

Traceback (most recent call last):
   File "<ipython console>", line 1, in <module>
<type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't decode byte
0xc3 in position 0: ordinal not in range(128)

Note this is exactly the same error you would get if you explicitly 
tried to convert to unicode using the ascii codec, because that is what 
is happening under the hood:

In [11]: s.decode('ascii')

Traceback (most recent call last):
   File "<ipython console>", line 1, in <module>
<type 'exceptions.UnicodeDecodeError'>: 'ascii' codec can't decode byte
0xc3 in position 0: ordinal not in range(128)
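
In Python 3 the implicit step no longer exists (byte strings have no
encode() method), but the same failure can be reproduced by decoding
explicitly. A sketch under that assumption:

```python
raw = b"\xc3\xa9"  # the utf-8 bytes for the accented letter above

# Decoding with the wrong codec fails on the first non-ascii byte.
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the codec that matches the data succeeds.
text = raw.decode("utf-8")

print(error.encoding, error.start, repr(text))
```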

Again, it would really help if you would
- show some code
- show some data
- learn more about unicode, utf-8, character encodings and python strings.

Kent

Re: [Tutor] UTF-8 filenames encountered in os.walk

2007-07-04 Thread Kent Johnson

Terry Carroll wrote:
 I'm pretty iffy on this stuff myself, but as I see it, you basically have 
 three kinds of things here.
 
 First, an ascii string:
 
   s = 'abc'
 
 In hex, this is 616263; 61 for 'a'; 62 for 'b', 63 for 'c'.
 
 Second, a unicode string:
 
   u = u'abc' 
 
 I can't say what this is in hex because that's not meaningful.  A 
 Unicode character is a code point, which can be represented in a variety 
 of ways, depending on the encoding used.  So, moving on
 
 Finally, you can have a sequence of bytes, which are stored in a string as 
 a buffer, that shows the particular encoding of a particular string:
 
   e8 = s.encode("UTF-8")
   e16 = s.encode("UTF-16")
 
 Now, e8 and e16 are each strings (of bytes), the content of which tells
 you how the string of characters that was encoded is represented in that 
 particular encoding.

I would say that there are two kinds of strings, byte strings and 
unicode strings. Byte strings have an implicit encoding. If the contents 
of the byte string are all ascii characters, you can generally get away 
with ignoring that they are in an encoding, because most of the common 
8-bit character encodings include plain ascii as a subset (all the 
latin-x encodings, all the Windows cp12xx encodings, and utf-8 all have 
ascii as a subset), so an ascii string can be interpreted as any of 
those encodings without error. As soon as you get away from ascii, you 
have to be aware of the encoding of the string.

encode() really wants a unicode string not a byte string. If you call 
encode() on a byte string, the string is first converted to unicode 
using the default encoding (usually ascii), then converted with the 
given encoding.
 
 In hex, these look like this.
 
   e8: 616263 (61 for 'a'; 62 for 'b', 63 for 'c')
   e16: FFFE6100 62006300
        (FFFE for the BOM, 6100 for 'a', 6200 for 'b', 6300 for 'c')
 
 Now, superficially, s and e8 are equal, because for plain old ascii 
 characters (which is all I've used in this example), UTF-8 is equivalent 
 to ascii.  And they compare the same:
 
 s == e8
 True

They are equal in every sense, I don't know why you consider this 
superficial. And if your original string was not ascii the encode() 
would fail with a UnicodeDecodeError.
 
 But that's not true of the UTF-16:
 
 s == e16
 False
 e8 == e16
 False
 
 So (and I'm open to correction on this), I think of the encode() method as 
 returning a string of bytes that represents the particular encoding of a 
 string value -- and it can't be used as the string value itself.

The idea that there is somehow some kind of string value that doesn't 
have an encoding will bring you a world of hurt as soon as you venture 
out of the realm of pure ascii. Every string is a particular encoding of 
character values. It's not any different from the string value itself.
 
 But you can get that string value back (assuming all the characters map 
 to ascii):
 
 s8 = e8.decode("UTF-8")
 s16 = e16.decode("UTF-16")
 s == s8 == s16
 True

You can get back to the ascii-encoded representation of the string. 
Though here you are hiding something - s8 and s16 are unicode strings 
while s is a byte string.

In [13]: s = 'abc'
In [14]: e8 = s.encode(UTF-8)
In [15]: e16 = s.encode(UTF-16)
In [16]: s8 = e8.decode(UTF-8)
In [17]: s16 = e16.decode(UTF-16)
In [18]: s8
Out[18]: u'abc'
In [19]: s16
Out[19]: u'abc'
In [20]: s
Out[20]: 'abc'
In [21]: type(s8) == type(s)
Out[21]: False

The way I think of it is, unicode is the pure representation of the 
string. (This is nonsense, I know, but I find it a convenient mnemonic.) 
encode() converts from the pure representation to an encoded 
representation. The encoding can be ascii, latin-1, utf-8... decode() 
converts from the coded representation back to the pure one.
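
That mnemonic suggests a simple shape for a program: decode at the input
edge, work on text in the middle, encode at the output edge. A sketch
assuming Python 3; the function name `normalise_names` and the sample
names are hypothetical:

```python
def normalise_names(raw_names, source_encoding):
    """Decode incoming byte names, process them as text, emit utf-8."""
    # Input edge: bytes -> text, using the encoding the data really has.
    text_names = [raw.decode(source_encoding) for raw in raw_names]

    # Middle: any processing happens on text (here, a case-insensitive sort).
    text_names.sort(key=str.lower)

    # Output edge: text -> bytes in the encoding the consumer expects.
    return [name.encode("utf-8") for name in text_names]


result = normalise_names([b"T\xe9st.txt", b"abc.txt"], "cp1252")
print(result)
```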

Kent

Re: [Tutor] UTF-8 filenames encountered in os.walk

2007-07-04 Thread William O'Higgins Witteman

On Wed, Jul 04, 2007 at 02:47:45PM -0400, Kent Johnson wrote:

encode() really wants a unicode string not a byte string. If you call 
encode() on a byte string, the string is first converted to unicode 
using the default encoding (usually ascii), then converted with the 
given encoding.

Aha!  That helps.  Something else that helps is that my Python code is
generating output that is received by several other tools.  Interesting
facts:

Not all .NET XML parsers (nor IE6) accept valid UTF-8 XML.
I am indeed seeing filenames in cp1252, even though the Microsoft docs
say that filenames are in UTF-8.

Filenames in Arabic are in UTF-8.

What I have to do is to check the encoding of the filename as received
by os.walk (and thus os.listdir) and convert them to Unicode, continue
to process them, and then encode them as UTF-8 for output to XML.

In trying to work around bad 3rd party tools and inconsistent data I
introduced errors in my Python code.  The problem was in treating all
filenames the same way, when they were not being created the same way by
the filesystem.

Thanks for all the help and suggestions.
-- 

yours,

William

Re: [Tutor] UTF-8 filenames encountered in os.walk

2007-07-04 Thread Kent Johnson

William O'Higgins Witteman wrote:
 On Wed, Jul 04, 2007 at 02:47:45PM -0400, Kent Johnson wrote:
 
 encode() really wants a unicode string not a byte string. If you call 
 encode() on a byte string, the string is first converted to unicode 
 using the default encoding (usually ascii), then converted with the 
 given encoding.
 
 Aha!  That helps.  Something else that helps is that my Python code is
 generating output that is received by several other tools.  Interesting
 facts:
 
 Not all .NET XML parsers (nor IE6) accept valid UTF-8 XML.

Yikes! Are you sure it isn't a problem with your XML?

 I am indeed seeing filenames in cp1252, even though the Microsoft docs
 say that filenames are in UTF-8.
 
 Filenames in Arabic are in UTF-8.

Not on my computer (Win XP) in os.listdir(). With filenames of Tést.txt 
and ق.txt (that's \u0642, an Arabic character), os.listdir() gives me
 >>> os.listdir('.')
['Administrator', 'All Users', 'Default User', 'LocalService',
'NetworkService', 'T\xe9st.txt', '?.txt']
 >>> os.listdir(u'.')
[u'Administrator', u'All Users', u'Default User', u'LocalService',
u'NetworkService', u'T\xe9st.txt', u'\u0642.txt']

So with a byte string directory it fails, with a unicode directory it 
gives unicode, not utf-8.

 What I have to do is to check the encoding of the filename as received
 by os.walk (and thus os.listdir) and convert them to Unicode, continue
 to process them, and then encode them as UTF-8 for output to XML.

How do you do that? AFAIK there is no completely reliable way to 
determine the encoding of a byte string by looking at it; the most 
common approach is to try to find one that successfully decodes the 
string; more sophisticated variations look at the distribution of 
character codes.

Anyway if you use the Unicode file names you shouldn't have to worry 
about this.
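
For cases where only byte strings are available, the trial-decoding
approach mentioned above might look like the following sketch. It assumes
Python 3; the function name `guess_decode` and the candidate list are
illustrative, and the result is a guess, not a guarantee:

```python
def guess_decode(raw, candidates=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes raw.

    Order matters: strict codecs such as utf-8 should come first, because
    most single-byte codecs accept almost any byte sequence.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit")


print(guess_decode(b"T\xe9st.txt"))            # not valid utf-8
print(guess_decode("\u0642.txt".encode("utf-8")))
```

Pure-ascii input is reported as utf-8 here, which is harmless because the
decoded text is the same either way.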

Kent

Re: [Tutor] UTF-8 filenames encountered in os.walk

2007-07-04 Thread Terry Carroll

On Wed, 4 Jul 2007, Kent Johnson wrote:

 Terry Carroll wrote:
  Now, superficially, s and e8 are equal, because for plain old ascii 
  characters (which is all I've used in this example), UTF-8 is equivalent 
  to ascii.  And they compare the same:
  
  s == e8
  True
 
 They are equal in every sense, I don't know why you consider this 
 superficial. And if your original string was not ascii the encode() 
 would fail with a UnicodeDecodeError.

Superficial in the sense that I was using only characters in the ascii
character set, so they have the same byte encoding in UTF-8.

so: 

>>> 'abc'.decode("UTF-8")
u'abc'

works

But UTF-8 can hold other characters, too; for example

>>> '\xe4\xba\xba'.decode("UTF-8")
u'\u4eba'

(Chinese character for person)

I'm just saying that UTF-8 encodes ascii characters to themselves; but 
UTF-8 is not the same as ascii.

I think we're ultimately saying the same thing; to merge both our ways of
putting it, I think, is that ascii will map to UTF-8 identically; but
UTF-8 may map back to ascii, or it will raise UnicodeDecodeError.

I just didn't want to leave the impression "Yeah, UTF-8, ascii, they're
the same thing."


Re: [Tutor] UTF-8 filenames encountered in os.walk

2007-07-04 Thread Kent Johnson

Terry Carroll wrote:
 I'm just saying that UTF-8 encodes ascii characters to themselves; but 
 UTF-8 is not the same as ascii.
 
 I think we're ultimately saying the same thing; to merge both our ways of
 putting it, I think, is that ascii will map to UTF-8 identically; but
 UTF-8 may map back or it will raise UnicodeDecodeError.
 
 I just didn't want to leave the impression Yeah, UTF-8  ascii, they're
 the same thing.

I hope neither of us gave that impression! I think you are right, we 
just have different ways of thinking about it. Any ascii string is also 
a valid utf-8 string (and latin-1, and many other encodings), but the 
opposite is not true.
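
The one-way relationship can be checked directly. A sketch assuming
Python 3; the helper name `is_ascii` is made up for the example:

```python
def is_ascii(raw):
    """Report whether a byte string contains only ascii bytes."""
    try:
        raw.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


ascii_raw = b"abc"
utf8_raw = "\u4eba".encode("utf-8")  # the Chinese character quoted earlier

# An ascii byte string reads the same under all of these encodings.
readings = {enc: ascii_raw.decode(enc)
            for enc in ("ascii", "utf-8", "latin-1", "cp1252")}

print(is_ascii(ascii_raw), is_ascii(utf8_raw), readings)
```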

Kent

Re: [Tutor] UTF-8 filenames encountered in os.walk

2007-07-03 Thread Alan Gauld


"William O'Higgins Witteman" wrote:

> I have several programs which traverse a Windows filesystem with French
> characters in the filenames.

I suspect you need to set the Locale at the top of your file.

Do a search for locale in this lists archive where we had a
thread on this a few months ago.

HTH,

Alan G 



Re: [Tutor] UTF-8 filenames encountered in os.walk

2007-07-03 Thread Kent Johnson

Alan Gauld wrote:
 "William O'Higgins Witteman" wrote:
 
 I have several programs which traverse a Windows filesystem with 
 French
 characters in the filenames.
 
 I suspect you need to set the Locale at the top of your file.

Do you mean the
# -*- coding: encoding-name -*-
comment? That only affects the encoding of the source file itself.
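
The encodings that matter at run time are separate settings and can be
inspected directly. A sketch assuming Python 3 and only standard-library
calls; the values printed depend on the platform and its configuration:

```python
import locale
import sys

# Encoding Python uses to convert between text file names and the
# bytes the operating system works with.
fs_encoding = sys.getfilesystemencoding()

# Encoding the locale suggests for text data (False: do not change
# the process locale while asking).
preferred = locale.getpreferredencoding(False)

print(fs_encoding, preferred)
```

Neither value is influenced by the coding comment at the top of a source
file.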

Kent

Re: [Tutor] UTF-8 filenames encountered in os.walk

2007-07-03 Thread Alan Gauld


"Kent Johnson" wrote:

 I suspect you need to set the Locale at the top of your file.
 
 Do you mean the
 # -*- coding: encoding-name -*-
 comment? That only affects the encoding of the source file itself.

No, I meant the Locale but I got it mixed up with the encoding
in how it is set. Oops!

Alan G.
