I'm currently working on a Japanese version of Windows XP.

Yesterday I tried to use Python's (version 2.5.1.1) regular expression module
to batch-rename files in a directory, but ran into a problem: Python failed to
match the file names.

 

Then I wrote some small test scripts and found that Python does not handle
non-ASCII strings the way I expected (or maybe this is intended?).

 

The test environment:

1. An empty directory is created.

2. A file named “パイソン.txt” is created under the directory.

3. The Python scripts are placed in that directory and run from there.

4. All Python scripts were written in Notepad. Because Python could not read
files saved with Notepad's "Unicode" option, I tested two file encodings: ANSI
and UTF-8.
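
To make the difference between the two encodings concrete, here is a minimal sketch. It is written for Python 3 so that it runs on a current interpreter, not for the 2.5 used in this post. It assumes that "ANSI" on Japanese Windows means code page 932 (the Shift_JIS family); NAME and encoded_forms are names made up for this illustration.

```python
# Assumption: "ANSI" on Japanese Windows is code page 932 (cp932).
NAME = u"パイソン.txt"


def encoded_forms(name):
    """Return the byte sequences for `name` in the two encodings tested."""
    return {
        "cp932": name.encode("cp932"),
        "utf-8": name.encode("utf-8"),
    }


forms = encoded_forms(NAME)
for codec, raw in sorted(forms.items()):
    # Print the codec, the number of bytes, and the raw bytes themselves.
    print(codec, len(raw), repr(raw))
```

The same eight characters become two different byte sequences, so a byte-string pattern written in one encoding cannot be expected to equal a byte-string file name stored in the other.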

 

################ Script ############################################

# -*- encoding: shift_jis -*-
# script1.py (saved as an ANSI text file)

import os, re

def rename():
    pattern = 'パイソン\.txt'     # byte string (str)
    print 'pattern: ', pattern

    myre = re.compile(pattern)
    for f in os.listdir('.'):
        m = myre.match(f)
        if m is not None: print f, ': match!'
        else: print f, ': doesn\'t match!'

rename()

################# Output ###########################################

pattern:  パイソン\.txt

パイソン.txt : doesn't match!
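
One possible explanation for this result, offered only as a sketch: in code page 932 the second byte of ソ is 0x5C, the same value as an ASCII backslash, so a byte-string pattern may be parsed with an escape the author never wrote. The code below is Python 3 and checks this on bytes objects; it assumes that the file names seen by the script are cp932 byte strings, and the 2.5 behaviour is assumed to be similar but was not verified. All variable names are made up for this illustration.

```python
import re

NAME = u"パイソン.txt"
raw_name = NAME.encode("cp932")      # assumed form of a byte-string file name

# Built from text so the example does not depend on this file's own encoding.
stem = u"パイソン".encode("cp932")
raw_pattern = stem + b"\\.txt"       # same bytes as the literal in script1

# The cp932 encoding of the stem contains the byte 0x5C (a backslash).
has_backslash_byte = b"\\" in stem

# The regex parser consumes that 0x5C as an escape character.
naive = re.match(raw_pattern, raw_name)

# Escaping the stem first keeps every byte literal.
escaped = re.match(re.escape(stem) + b"\\.txt", raw_name)

print(has_backslash_byte, naive is None, escaped is not None)
```

If this is what happens in script1, escaping the non-ASCII part of the pattern, or matching Unicode text against Unicode file names, would avoid the stray escape.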

 

################ Script ############################################

# -*- encoding: shift_jis -*-
# script2.py (saved as an ANSI text file)

import os, re

def rename():
    pattern = u'パイソン\.txt'    # Unicode string
    print 'pattern: ', pattern

    myre = re.compile(pattern)
    for f in os.listdir('.'):
        m = myre.match(f)
        if m is not None: print f, ': match!'
        else: print f, ': doesn\'t match!'

rename()

################# Output ###########################################

pattern:  パイソン\.txt

パイソン.txt : doesn't match!

 

################ Script ############################################

# script3.py (saved as a UTF-8 text file)

import os, re

def rename():
    pattern = 'パイソン\.txt'     # byte string (str)
    print 'pattern: ', pattern

    myre = re.compile(pattern)
    for f in os.listdir('.'):
        m = myre.match(f)
        if m is not None: print f, ': match!'
        else: print f, ': doesn\'t match!'

rename()

################# Output ###########################################

pattern:  繝代う繧ス繝ウ\.txt

パイソン.txt : doesn't match!

(The pattern is displayed as garbled characters.)
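
The garbled pattern is consistent with UTF-8 bytes being displayed by a console that expects code page 932. A minimal Python 3 sketch of that effect follows; the console code page is an assumption, and the variable names are made up for this illustration.

```python
NAME = u"パイソン"

# The bytes a UTF-8 source file would hold for the pattern text.
utf8_bytes = NAME.encode("utf-8")

# What those same bytes look like when interpreted as cp932
# (assumed to be the console code page on Japanese Windows).
misread = utf8_bytes.decode("cp932")

print(misread)
```

The bytes are unchanged; only the interpretation differs, which is why the pattern prints as unrelated kanji and half-width kana.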

 

################ Script ############################################

# script4.py (saved as a UTF-8 text file)

import os, re

def rename():
    pattern = u'パイソン\.txt'    # Unicode string
    print 'pattern: ', pattern

    myre = re.compile(pattern)
    for f in os.listdir('.'):
        m = myre.match(f)
        if m is not None: print f, ': match!'
        else: print f, ': doesn\'t match!'

rename()

################# Output ###########################################

pattern:  パイソン\.txt

パイソン.txt : doesn't match!

 
