Re: Pattern Matching Given # of Characters and no String Input; use RegularExpressions?
Hello!

I was trying to create a program to search for the largest common
substring among filenames in a directory, then move the files into a
directory named after that substring. I have succeeded, with help, in
doing so, and here is the code.

Thanks for your help!

--- Code ---
#This program was created with feedback from: smeghead and sirup plus aum of I2P;
#and also tiissa and John Machin of comp.lang.python
#Thank you very much.
#I still get the odd error in this, but it was 1 out of 2500 files successfully sorted.
#Make sure you have a directory under c:/test/ called 'aa' and have your
#files in c:/test/
#I release this code into the public domain :o), send feedback to [EMAIL PROTECTED]

import pickle
import os
import shutil

os.chdir ( '/test')
=2
aa='aa'
x=0
y=20
# try prefix lengths from 20 characters down to 3
while y != 2:
    print(y)
    List = []
    for fileName in os.listdir ( '/test/' ):
        Directory = fileName
        List.append(Directory)
    List.append("A")
    List.sort()
    List.append("Z")
    ListLength = len(List) - 1
    x = 0
    while x < ListLength:
        ListLength = len(List) - 1
        # current name, next name and previous name in the sorted list
        b = List[x]
        c = List[x + 1]
        backward1 = List[x - 1]
        d = b[:y]
        e = c[:y]
        backward2 = backward1[:y]
        f = str(d)
        g = str(e)
        backward3 = str(backward2)
        if f==g:
            # same prefix as the next name: move into aa/<prefix>
            if os.path.isdir (aa+"/"+f) == True:
                shutil.move(b,aa+"/"+f)
            else:
                os.mkdir(aa+"/"+f)
                #os.mkdir(f)
                shutil.move(b,aa+"/"+f)
        else:
            if f==backward3:
                # same prefix as the previous name: last member of a group
                if os.path.isdir (aa+"/"+f) == True:
                    shutil.move(b,aa+"/"+f)
                else:
                    os.mkdir(aa+"/"+f)
                    #os.mkdir(f)
                    shutil.move(b,aa+"/"+f)
            else:
                =3
        x = x + 1
    y = y - 1
--- End Code ---

[EMAIL PROTECTED] (Synonymous) wrote in message news:<[EMAIL PROTECTED]>...
> Hello,
>
> Can regular expressions compare file names to one another? It seems RE
> can only compare with input I give it, while I want it to compare
> amongst itself and give me matches if the first x characters are
> similar.
>
> For example:
>
> cccat
> cccap
> cccan
> dddfa
> dddfg
> dddfz
>
> Would result in the 'ddd' and the 'ccc' being grouped together if I
> specified it to look for a match of the first 3 characters.
>
> What I am trying to do is build a script that will automatically
> create directories based on duplicates like this starting with say 10
> characters, and going down to 1. This way "Vacation1.jpg,
> Vacation2.jpg" would be sent to its own directory (if I specify the
> first 8 characters being similar) and "Cat1.jpg, Cat2.jpg" would
> (with 3) as well.
>
> Thanks for your help and interest!
>
> S M

--
http://mail.python.org/mailman/listinfo/python-list
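The approach in the script above (try a long prefix first, then shorter ones, and group only names that share a prefix with a neighbour) can be sketched without touching the filesystem. This is only an illustration under assumptions: `plan_moves`, `longest` and `shortest` are hypothetical names that do not appear in the thread, and the function returns a plan instead of moving anything.

```python
from itertools import groupby

def plan_moves(names, longest=20, shortest=3):
    """Map each file name to the prefix directory it would be moved into.

    Hypothetical helper, not from the thread: it only builds a plan.
    """
    remaining = sorted(set(names))
    plan = {}
    # Try the longest prefix first, then shorter and shorter ones.
    for size in range(longest, shortest - 1, -1):
        for prefix, members in groupby(remaining, key=lambda name: name[:size]):
            members = list(members)
            # A prefix only counts when at least two names share it.
            if len(members) > 1:
                for name in members:
                    plan[name] = prefix
        # Names that found a group are not considered again.
        remaining = [name for name in remaining if name not in plan]
    return plan

files = ['Vacation1.jpg', 'Vacation2.jpg', 'Cat1.jpg', 'Cat2.jpg', 'notes.txt']
for name, prefix in sorted(plan_moves(files, longest=10).items()):
    print(name, '->', prefix)
```

Building the plan first makes it possible to inspect the result before any file is moved. Applying it would then be a loop that calls `os.makedirs(target, exist_ok=True)` followed by `shutil.move(name, target)` for each entry.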
Re: Pattern Matching Given # of Characters and no String Input; use RegularExpressions?
tiissa <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...
> Synonymous wrote:
> > tiissa <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...
> >
> >>tiissa wrote:
> >>
> >>>If you know the number of characters to match can't you just compare
> >>>slices?
> >>
> >>If you don't, you can still do it by hand:
> >>
> >>In [7]: def cmp(s1,s2):
> >>   ...:     diff_map=[chr(s1[i]!=s2[i]) for i in range(min(len(s1), len(s2)))]
> >>   ...:     diff_index=''.join(diff_map).find(chr(True))
> >>   ...:     if -1==diff_index:
> >>   ...:         return min(len(s1), len(s2))
> >>   ...:     else:
> >>   ...:         return diff_index
> >>   ...:
> >
> > I will look at that, although if I have 300 images I don't want to type
> > all the comparisons (In [9]: cmp('ccc','cccap')) by hand, it would
> > just be easier to sort them then :).
>
> I didn't mean you had to type it by hand. I thought about writing a
> small script (as opposed to using something in the standard tools). It
> might look like:
>
> In [22]: def make_group(L):
>    ....:     root,res='',[]
>    ....:     for i in range(1,len(L)):
>    ....:         if ''==root:
>    ....:             root=L[i][:cmp(L[i-1],L[i])]
>    ....:             if ''==root:
>    ....:                 res.append((L[i-1],[L[i-1]]))
>    ....:             else:
>    ....:                 res.append((root,[L[i-1],L[i]]))
>    ....:         elif len(root)==cmp(root,L[i]):
>    ....:             res[-1][1].append(L[i])
>    ....:         else:
>    ....:             root=''
>    ....:     if ''==root:
>    ....:         res.append((L[-1],[L[-1]]))
>    ....:     return res
>    ....:
>
> In [23]: L=['cccat','cccap','cccan','dddfa','dddfg','dddfz']
>
> In [24]: L.sort()
>
> In [25]: make_group(L)
> Out[25]: [('ccca', ['cccan', 'cccap', 'cccat']), ('dddf', ['dddfa',
> 'dddfg', 'dddfz'])]
>
>
> However I guarantee no optimality in the number of classes (but, hey,
> that's when you don't specify the size of the prefix).
> (Actually, I guarantee nothing at all ;p)
> But in particular, you can have some file singled out:
>
> In [26]: make_group(['cccan','cccap','cccat','cccb'])
> Out[26]: [('ccca', ['cccan', 'cccap', 'cccat']), ('cccb', ['cccb'])]
>
>
> It is a matter of choice: either you want to specify by hand the size of
> the prefix and you'd rather look at itertools as pointed out by Kent, or
> you don't and a variation with the above code might do the job.

Thank you, that is very kool. I found out how to copy files finally with
shutil too, so I'm getting close to doing something. Going to be working
on an old computer, playing with files = dangerous lol.

Thanks for your help and taking the time to post! Bye :o)

S M
Re: Pattern Matching Given # of Characters and no String Input; use RegularExpressions?
John Machin <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...
> On 17 Apr 2005 18:12:19 -0700, [EMAIL PROTECTED] (Synonymous)
> wrote:
>
> >I will look for a Left$(str) function that looks at the first X
> >characters for python :)).
>
> Wild goose chase alert! AFAIK there isn't one. Python uses slice
> notation instead of left/mid/right/substr/whatever functions. I do
> suggest that instead of looking for such a beastie, you read this
> section of the Python Tutorial: 3.1.2 Strings.
>
> Then, if you think that that was a good use of your time, you might
> like to read the *whole* tutorial :))

Haha it always comes down to RTFM I guess, which is always the best
advice :o).

Thank you for your help. Now that I think about it, I guess string is
exactly what I am looking for, because even though I am using file names
I am treating them like strings when comparing them.

Byebye :o)

S M
Re: Pattern Matching Given # of Characters and no String Input; use RegularExpressions?
Synonymous wrote:
> tiissa <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...
>
>>tiissa wrote:
>>
>>>If you know the number of characters to match can't you just compare
>>>slices?
>>
>>If you don't, you can still do it by hand:
>>
>>In [7]: def cmp(s1,s2):
>>   ...:     diff_map=[chr(s1[i]!=s2[i]) for i in range(min(len(s1), len(s2)))]
>>   ...:     diff_index=''.join(diff_map).find(chr(True))
>>   ...:     if -1==diff_index:
>>   ...:         return min(len(s1), len(s2))
>>   ...:     else:
>>   ...:         return diff_index
>>   ...:
>
> I will look at that, although if I have 300 images I don't want to type
> all the comparisons (In [9]: cmp('ccc','cccap')) by hand, it would
> just be easier to sort them then :).

I didn't mean you had to type it by hand. I thought about writing a
small script (as opposed to using something in the standard tools). It
might look like:

In [22]: def make_group(L):
   ....:     root,res='',[]
   ....:     for i in range(1,len(L)):
   ....:         if ''==root:
   ....:             root=L[i][:cmp(L[i-1],L[i])]
   ....:             if ''==root:
   ....:                 res.append((L[i-1],[L[i-1]]))
   ....:             else:
   ....:                 res.append((root,[L[i-1],L[i]]))
   ....:         elif len(root)==cmp(root,L[i]):
   ....:             res[-1][1].append(L[i])
   ....:         else:
   ....:             root=''
   ....:     if ''==root:
   ....:         res.append((L[-1],[L[-1]]))
   ....:     return res
   ....:

In [23]: L=['cccat','cccap','cccan','dddfa','dddfg','dddfz']

In [24]: L.sort()

In [25]: make_group(L)
Out[25]: [('ccca', ['cccan', 'cccap', 'cccat']), ('dddf', ['dddfa',
'dddfg', 'dddfz'])]


However I guarantee no optimality in the number of classes (but, hey,
that's when you don't specify the size of the prefix).
(Actually, I guarantee nothing at all ;p)
But in particular, you can have some file singled out:

In [26]: make_group(['cccan','cccap','cccat','cccb'])
Out[26]: [('ccca', ['cccan', 'cccap', 'cccat']), ('cccb', ['cccb'])]


It is a matter of choice: either you want to specify by hand the size of
the prefix and you'd rather look at itertools as pointed out by Kent, or
you don't and a variation with the above code might do the job.
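For the first of those two choices (a prefix size given by hand), here is a rough sketch. Kent's post is not reproduced in this thread, so the use of `itertools.groupby` is an assumption about what was meant, and `group_by_prefix` and `size` are hypothetical names.

```python
from itertools import groupby

def group_by_prefix(names, size):
    """Group names by their first `size` characters (hypothetical helper)."""
    # groupby only merges adjacent items, so the names are sorted first;
    # sorting puts names with an equal prefix next to each other.
    ordered = sorted(names)
    return [(prefix, list(members))
            for prefix, members in groupby(ordered, key=lambda name: name[:size])]

names = ['cccan', 'cccap', 'cccat', 'cccb']
print(group_by_prefix(names, 4))
print(group_by_prefix(names, 3))
```

With a size of 4, 'cccb' ends up in a group of its own, just as in the `make_group` session; with a size of 3, all four names share the group 'ccc'.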
Re: Pattern Matching Given # of Characters and no String Input; use RegularExpressions?
On 17 Apr 2005 18:12:19 -0700, [EMAIL PROTECTED] (Synonymous)
wrote:

>I will look for a Left$(str) function that looks at the first X
>characters for python :)).

Wild goose chase alert! AFAIK there isn't one. Python uses slice
notation instead of left/mid/right/substr/whatever functions. I do
suggest that instead of looking for such a beastie, you read this
section of the Python Tutorial: 3.1.2 Strings.

Then, if you think that that was a good use of your time, you might
like to read the *whole* tutorial :))

HTH,

John
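As a small illustration of the slice notation mentioned above, here are rough equivalents of the BASIC-style functions. The variable names are made up for this example, and the mapping to Left$/Mid$/Right$ is only approximate (Mid$ counts from 1, Python slices count from 0).

```python
name = 'Vacation1.jpg'
count = 8

left = name[:count]    # like Left$(name, 8): the first 8 characters
right = name[-4:]      # like Right$(name, 4): the last 4 characters
middle = name[8:9]     # like Mid$(name, 9, 1): one character at 1-based position 9

print(left, middle, right)

# A slice that runs past the end of the string does not raise an error;
# it simply returns what is there.
short = 'Cat'[:count]
print(short)
```

Because slices never fail on short strings, comparing `a[:count] == b[:count]` is safe even when a file name is shorter than `count`.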
Re: Pattern Matching Given # of Characters and no String Input; use RegularExpressions?
tiissa <[EMAIL PROTECTED]> wrote in message news:<[EMAIL PROTECTED]>...
> tiissa wrote:
> > If you know the number of characters to match can't you just compare
> > slices?
> If you don't, you can still do it by hand:
>
> In [7]: def cmp(s1,s2):
>    ...:     diff_map=[chr(s1[i]!=s2[i]) for i in range(min(len(s1), len(s2)))]
>    ...:     diff_index=''.join(diff_map).find(chr(True))
>    ...:     if -1==diff_index:
>    ...:         return min(len(s1), len(s2))
>    ...:     else:
>    ...:         return diff_index
>    ...:
>
> In [8]: cmp('cccat','cccap')
> Out[8]: 4
>
> In [9]: cmp('ccc','cccap')
> Out[9]: 3
>
> In [10]: cmp('cccat','dddfa')
> Out[10]: 0

I will look at that, although if I have 300 images I don't want to type
all the comparisons (In [9]: cmp('ccc','cccap')) by hand, it would
just be easier to sort them then :).

I got it somewhat close to working in Visual Basic:

    If Left$(Cells(iRow, 1).Value, Count) = Left$(Cells(iRow - 1, 1).Value, Count) Then

What it says is: when going through a list, it takes the leftmost 'Count'
characters of the cell, compares them to the leftmost 'Count' characters
of the cell in the row above, and then does the task (i.e. makes a
directory, moves the files) if they are equal.

I will look for a Left$(str) function that looks at the first X
characters for python :)).

Thank you for your help!

Synonymous
Re: Pattern Matching Given # of Characters and no String Input; use RegularExpressions?
tiissa wrote:
> If you know the number of characters to match can't you just compare
> slices?

If you don't, you can still do it by hand:

In [7]: def cmp(s1,s2):
   ...:     diff_map=[chr(s1[i]!=s2[i]) for i in range(min(len(s1), len(s2)))]
   ...:     diff_index=''.join(diff_map).find(chr(True))
   ...:     if -1==diff_index:
   ...:         return min(len(s1), len(s2))
   ...:     else:
   ...:         return diff_index
   ...:

In [8]: cmp('cccat','cccap')
Out[8]: 4

In [9]: cmp('ccc','cccap')
Out[9]: 3

In [10]: cmp('cccat','dddfa')
Out[10]: 0
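The same idea (count how many leading characters two names share) can also be written with a plain loop, which avoids reusing the name `cmp`. This is just an alternative sketch; `common_prefix_len` is a hypothetical name, and the standard library's `os.path.commonprefix` is shown next to it for comparison.

```python
import os.path

def common_prefix_len(s1, s2):
    """Return how many leading characters s1 and s2 have in common."""
    count = 0
    # zip stops at the end of the shorter string.
    for a, b in zip(s1, s2):
        if a != b:
            break
        count += 1
    return count

print(common_prefix_len('cccat', 'cccap'))
print(common_prefix_len('ccc', 'cccap'))
print(common_prefix_len('cccat', 'dddfa'))

# os.path.commonprefix compares character by character and returns the
# shared prefix itself for a list of strings.
print(os.path.commonprefix(['cccat', 'cccap']))
```

The three calls mirror the In [8] to In [10] session above.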
Re: Pattern Matching Given # of Characters and no String Input; use RegularExpressions?
Synonymous wrote:
> Can regular expressions compare file names to one another? It seems RE
> can only compare with input I give it, while I want it to compare
> amongst itself and give me matches if the first x characters are
> similar.

Do you have to use regular expressions?
If you know the number of characters to match can't you just compare
slices?

In [1]: f1,f2='cccat','cccap'

In [2]: f1[:3]
Out[2]: 'ccc'

In [3]: f1[:3]==f2[:3]
Out[3]: True

It seems to me you just have to compare each file to the next one (after
having sorted your list).
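The suggestion above (sort the list, then compare each name's slice with its neighbour's) could look roughly like this. It is a minimal sketch, and `group_adjacent` and `n` are hypothetical names chosen for the example.

```python
def group_adjacent(names, n):
    """Group sorted names whose first n characters match the previous name's."""
    groups = []
    for name in sorted(names):
        # Same slice as the group opened by an earlier name: join that group.
        if groups and groups[-1][0] == name[:n]:
            groups[-1][1].append(name)
        else:
            # Otherwise this name starts a new group.
            groups.append((name[:n], [name]))
    return groups

files = ['cccat', 'cccap', 'cccan', 'dddfa', 'dddfg', 'dddfz']
print(group_adjacent(files, 3))
```

Sorting first is what makes a single pass sufficient: names with an equal prefix end up next to each other, so each name only has to be checked against the group before it.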
Pattern Matching Given # of Characters and no String Input; use RegularExpressions?
Hello,

Can regular expressions compare file names to one another? It seems RE
can only compare with input I give it, while I want it to compare
amongst itself and give me matches if the first x characters are
similar.

For example:

cccat
cccap
cccan
dddfa
dddfg
dddfz

Would result in the 'ddd' and the 'ccc' being grouped together if I
specified it to look for a match of the first 3 characters.

What I am trying to do is build a script that will automatically
create directories based on duplicates like this starting with say 10
characters, and going down to 1. This way "Vacation1.jpg,
Vacation2.jpg" would be sent to its own directory (if I specify the
first 8 characters being similar) and "Cat1.jpg, Cat2.jpg" would
(with 3) as well.

Thanks for your help and interest!

S M