Thank you so much. The code worked perfectly. This is what I tried using Emile's code. The only time it picked the wrong name from the list was when the file was named like this:
Data Mark Stone.doc

How can I fix this? I hope I am not asking too much.

import os
from difflib import SequenceMatcher as SM

path = r'D:\Files '
txt_names = []

with open(r'D:/python/log1.txt') as f:
    for txt_name in f.readlines():
        txt_names.append(txt_name.strip())

def ignore(x):
    return x in ' ,.'

for filename in os.listdir(path):
    ratios = [SM(ignore,filename,txt_name).ratio() for txt_name in txt_names]
    best = max(ratios)
    owner = txt_names[ratios.index(best)]
    print filename,":",owner


On Sat, 27 Aug 2011 14:08:17 -0700, Emile van Sebille <em...@fenx.com>
wrote:

>On 8/27/2011 1:15 PM r...@rdo.python.org said...
>>
>> Hello Emile,
>>
>> Thank you for the code below as I have not encountered SequenceMatcher
>> before and would have to take a closer look at it.
>>
>> My question: would it work for a text file list of names of about 25k
>> lines and a directory with, say, 100 files inside?
>
>Sure.
>
>Emile
>
>
>>
>> Thank you once again.
>>
>>
>> On Sat, 27 Aug 2011 11:06:22 -0700, Emile van Sebille <em...@fenx.com>
>> wrote:
>>
>>> On 8/27/2011 10:03 AM r...@rdo.python.org said...
>>>> Hello,
>>>>
>>>> What would be the best way to accomplish this task?
>>>
>>> I'd do something like:
>>>
>>>
>>> usernames = """Adler, Jack
>>> Smith, John
>>> Smith, Sally
>>> Stone, Mark""".split('\n')
>>>
>>> filenames = """Smith, John - 02-15-75 - business files.doc
>>> Random Data - Adler Jack - expenses.xls
>>> More Data Mark Stone files list.doc""".split('\n')
>>>
>>> from difflib import SequenceMatcher as SM
>>>
>>>
>>> def ignore(x):
>>>     return x in ' ,.'
>>>
>>>
>>> for filename in filenames:
>>>     ratios = [SM(ignore,filename,username).ratio() for username in usernames]
>>>     best = max(ratios)
>>>     owner = usernames[ratios.index(best)]
>>>     print filename,":",owner
>>>
>>>
>>> Emile
>>>
>>>
>>>
>>>> I have many files in separate directories, each file name
>>>> contains a person's name but never in the same spot.
>>>> I need to find that name, which is listed in a large
>>>> text file in the following format: last name, comma
>>>> and first name. Last names can be duplicated.
>>>>
>>>> Adler, Jack
>>>> Smith, John
>>>> Smith, Sally
>>>> Stone, Mark
>>>> etc.
>>>>
>>>>
>>>> The file names don't necessarily follow any standard
>>>> format.
>>>>
>>>> Smith, John - 02-15-75 - business files.doc
>>>> Random Data - Adler Jack - expenses.xls
>>>> More Data Mark Stone files list.doc
>>>> etc.
>>>>
>>>> I need some way to pull the name from the file name, find it in the
>>>> text list and then create a directory based on the name on the list
>>>> ("Smith, John") and move all files named with the client's name into
>>>> that directory.
>>>
>
--
http://mail.python.org/mailman/listinfo/python-list
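
For the task described in the quoted question (pull the name from the file name, find it in the list, then create a directory per name and move the files there), here is a minimal sketch of one possible approach. It is not Emile's method: it first looks for a list entry whose words all appear in the file name, regardless of order, and only falls back to the SequenceMatcher ratio from the thread when no entry matches that way. One possible reason "Data Mark Stone.doc" went wrong is that the name is written first-name-first while the list stores "Last, First", so a character-level ratio may favour another entry; that is a guess, not something the thread confirms. The function names find_owner and move_to_owner_dirs, the min_ratio cutoff of 0.6 and the use of Python 3 (for exist_ok) are all assumptions made for illustration.

```python
import os
import re
import shutil
from difflib import SequenceMatcher


def tokens(text):
    """Return the set of lower-cased alphabetic words in text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def find_owner(filename, names, min_ratio=0.6):
    """Return the 'Last, First' entry that best matches filename, or None.

    min_ratio is an arbitrary, hypothetical cutoff for the fuzzy fallback.
    """
    stem = os.path.splitext(filename)[0]   # drop the extension (.doc, .xls)
    words = tokens(stem)

    # Step 1: entries whose every word occurs in the file name, in any order.
    exact = [n for n in names if tokens(n) and tokens(n) <= words]
    if exact:
        # If several entries qualify, keep the longest one.
        return max(exact, key=len)

    # Step 2: fall back to the character-level ratio used in the thread.
    if not names:
        return None

    def ratio(name):
        matcher = SequenceMatcher(lambda ch: ch in " ,.",
                                  stem.lower(), name.lower())
        return matcher.ratio()

    best = max(names, key=ratio)
    return best if ratio(best) >= min_ratio else None


def move_to_owner_dirs(src_dir, names, dest_root):
    """Move each file in src_dir into dest_root/<matched name>/."""
    for filename in os.listdir(src_dir):
        src = os.path.join(src_dir, filename)
        if not os.path.isfile(src):
            continue                        # skip sub-directories
        owner = find_owner(filename, names)
        if owner is None:
            continue                        # leave unmatched files in place
        target = os.path.join(dest_root, owner)
        os.makedirs(target, exist_ok=True)  # Python 3 only
        shutil.move(src, os.path.join(target, filename))
```

With the four sample names from the thread, find_owner("Data Mark Stone.doc", names) takes the word-set path and returns "Stone, Mark". Before calling move_to_owner_dirs on real data it would be sensible to print the (filename, owner) pairs first and check them by eye, since the fuzzy fallback can still pick a wrong entry.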