I’m currently working on the Japanese version of Windows XP. Yesterday I tried to use Python’s (ver 2.5.1.1) regular expression module to batch-rename files in a directory, but ran into a problem: Python failed to match the file names.
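For context, a regex-driven batch rename of the kind described above might look like the following minimal sketch. It is written for Python 3 (3.4+ for `fullmatch`), where `str` is Unicode, so the pattern and the names returned by `os.listdir()` are the same string type. The helper name `rename_matching`, the sample pattern and the `_backup` suffix are hypothetical illustrations, not taken from the original post.

```python
import os
import re
import tempfile


def rename_matching(directory, pattern, template):
    """Hypothetical helper: rename every file in `directory` whose whole
    name matches `pattern`, building the new name from `template`
    (re-style group references such as \\1). Returns (old, new) pairs."""
    regex = re.compile(pattern)
    renamed = []
    # `directory` is a str, so os.listdir() returns str (Unicode) names.
    for old_name in sorted(os.listdir(directory)):
        m = regex.fullmatch(old_name)
        if m is None:
            continue
        new_name = m.expand(template)
        os.rename(os.path.join(directory, old_name),
                  os.path.join(directory, new_name))
        renamed.append((old_name, new_name))
    return renamed


# Usage example in a throwaway directory containing パイソン.txt.
# The Japanese text is spelled with \u escapes so the source encoding
# of this script does not matter.
work_dir = tempfile.mkdtemp()
open(os.path.join(work_dir, u'\u30d1\u30a4\u30bd\u30f3.txt'), 'w').close()
result = rename_matching(work_dir, u'(\u30d1\u30a4\u30bd\u30f3)\\.txt',
                         u'\\1_backup.txt')
print(len(result))  # 1
```

Using `fullmatch` plus `Match.expand` means a file is renamed only when its entire name fits the pattern, and the new name is built once from that single match.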
Then I wrote some small scripts and found that Python doesn’t seem to handle these strings very well (or maybe this is intended?).

The test environment:
1. An empty directory is created.
2. A file named “パイソン.txt” is created in the directory.
3. The Python scripts are placed and run in the directory.
4. All Python scripts are written in Notepad. Because Python cannot read Notepad’s “Unicode” (UTF-16) text files, I tested two file encodings: ANSI and UTF-8.

################ Script ############################################
# -*- encoding: shift_jis -*-
# script1.py (saved as ansi text file)
import os, re

def rename():
    pattern = 'パイソン\.txt' # ANSI
    print 'pattern: ', pattern
    myre = re.compile(pattern)
    for f in os.listdir('.'):
        m = myre.match(f)
        if m != None:
            print f, ': match!'
        else:
            print f, ': doesn\'t match!'

rename()
################# Output ###########################################
pattern: パイソン\.txt
パイソン.txt : doesn't match!

################ Script ############################################
# -*- encoding: shift_jis -*-
# script2.py (saved as ansi text file)
import os, re

def rename():
    pattern = u'パイソン\.txt' # Unicode
    print 'pattern: ', pattern
    myre = re.compile(pattern)
    for f in os.listdir('.'):
        m = myre.match(f)
        if m != None:
            print f, ': match!'
        else:
            print f, ': doesn\'t match!'

rename()
################# Output ###########################################
pattern: パイソン\.txt
パイソン.txt : doesn't match!

################ Script ############################################
# script3.py (saved as UTF-8 text file)
import os, re

def rename():
    pattern = 'パイソン\.txt' # ANSI
    print 'pattern: ', pattern
    myre = re.compile(pattern)
    for f in os.listdir('.'):
        m = myre.match(f)
        if m != None:
            print f, ': match!'
        else:
            print f, ': doesn\'t match!'

rename()
################# Output ###########################################
pattern: 繝代う繧ス繝ウ\.txt
パイソン.txt : doesn't match!
(The pattern is printed as garbled characters.)

################ Script ############################################
# script4.py (saved as UTF-8 text file)
import os, re

def rename():
    pattern = u'パイソン\.txt' # Unicode
    print 'pattern: ', pattern
    myre = re.compile(pattern)
    for f in os.listdir('.'):
        m = myre.match(f)
        if m != None:
            print f, ': match!'
        else:
            print f, ': doesn\'t match!'

rename()
################# Output ###########################################
pattern: パイソン\.txt
パイソン.txt : doesn't match!
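One way to see why none of the four scripts matches is to compare the bytes each pattern actually contains with the bytes of the file name. The sketch below is written for Python 3 rather than 2.5, and it assumes that the Japanese “ANSI” code page is cp932 (the Windows superset of Shift_JIS), so a byte-string name from `os.listdir('.')` would be cp932-encoded. The pattern is spelled with `\u` escapes so the script’s own file encoding plays no part. It reproduces the symptoms under those assumptions rather than inspecting the original machine.

```python
# Sketch (Python 3): reproduce the four test cases at the byte level.
import os
import re
import tempfile

NAME = u'\u30d1\u30a4\u30bd\u30f3.txt'       # パイソン.txt
PATTERN = u'\u30d1\u30a4\u30bd\u30f3\\.txt'  # パイソン\.txt

# Assumption: on Japanese Windows a byte-string file name is cp932.
name_ansi = NAME.encode('cp932')

# script1: ソ is 0x83 0x5C in Shift_JIS, and 0x5C is the ASCII backslash.
# The regex parser reads it as an escape for the following byte, so the
# compiled byte pattern no longer contains 0x5C and cannot match.
print(u'\u30bd'.encode('shift_jis').hex())              # 835c
print(re.match(PATTERN.encode('shift_jis'), name_ansi))  # None

# script3: the pattern bytes are UTF-8, but the file name bytes are cp932.
print(re.match(PATTERN.encode('utf-8'), name_ansi))      # None

# script2/script4: a Unicode pattern needs Unicode file names.
# os.listdir() returns str for a str path (and bytes for a bytes path).
work_dir = tempfile.mkdtemp()
open(os.path.join(work_dir, NAME), 'w').close()
myre = re.compile(PATTERN)
unicode_matches = [f for f in os.listdir(work_dir) if myre.match(f)]
print(unicode_matches == [NAME])                         # True
```

The Python 2.5 counterpart of the last step would be to keep the Unicode pattern from script2/script4 and call `os.listdir(u'.')` instead of `os.listdir('.')`: on Windows NT-family systems, Python 2 returns Unicode file names only when the path argument is Unicode.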
_______________________________________________
ActivePython mailing list
ActivePython@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
Other options: http://listserv.ActiveState.com/mailman/listinfo/ActivePython