Re: [Tutor] longest common substring
On Mon, Nov 14, 2011 at 11:56 AM, lina wrote: > On Mon, Nov 14, 2011 at 6:28 AM, Andreas Perstinger > wrote: >> On 2011-11-11 14:44, lina wrote: >>> >>> You are right, I did not think of this parts before. and actually the >>> initiative wish was to find possible paths, I mean, possible >>> substrings, all possible substrings. not the longest one, but at >>> least bigger than 3. >> >> I had some time today and since you have changed your initial task (from >> finding the longest common path to finding all common paths with a minimum >> length) I've modified the code and came up with the following solution: >> >> def AllCommonPaths(list1, list2, minimum=3): >> """ finds all common paths with a minimum length (default = 3)""" >> >> # First we have to initialize the necessary variables: >> # M is an empty table where we will store all found matches >> # (regardless of their length) >> >> M = [[0] * (len(list2)) for i in range(len(list1))] >> >> # length is a dictionary where we store the length of each common >> # path. The keys are the starting positions ot the paths in list1. >> >> length = {} >> >> # result will be a list of of all found paths >> >> result =[] >> >> # Now the hard work begins: >> # Each element of list1 is compared to each element in list2 >> # (x is the index for list1, y is the index for list2). >> # If we find a match, we store the distance to the starting point >> # of the matching block. If we are in the left-most column (x == 0) >> # or in the upper-most row (y == 0) we have to set the starting >> # point ourself because we would get negative indexes if we look >> # for the predecessor cell (M[x - 1][y - 1]). Else, we are one >> # element farther away as the element before, so we add 1 to its >> # value. >> >> for x in range(len(list1)): >> for y in range(len(list2)): >> if list1[x] == list2[y]: >> if (x == 0) or (y == 0): >> M[x][y] = 1 >> else: >> M[x][y] = M[x - 1][y - 1] + 1 >> >> # To get everything done in one pass, we update the length of >> # the found path in our dictionary if it is longer than the minimum >> # length. Thus we don't have to get through the whole table a >> # second time to get all found paths with the minimum length (we >> # don't know yet if we are already at the end of the matching >> # block). >> >> if M[x][y] >= minimum: >> length[x + 1 - M[x][y]] = M[x][y] >> >> >> # We now have for all matching blocks their starting >> # position in list1 and their length. Now we cut out this parts >> # and create our resulting list How silly I was, it's nothing to do with x,y, since I used i and j, it's crystal clear. Thanks again for your time, Best regards, lina > > This is a very smart way to store their starting position as a key. My > mind was choked about how to save the list as a key before. > >> >> for pos in length: >> result.append(list1[pos:pos + length[pos]]) >> >> return result >> >> I've tried to explain what I have done, but I'm sure you will still have >> questions :-). > > I am confused myself with this matrix/array, about how to define > x-axis, y-axis. > > I must understand some parts wrong, for the following: >> >> Is this close to what you want? >> >> Bye, Andreas >> >> PS: Here's the function again without comments: >> >> def AllCommonPaths(list1, list2, minimum=3): >> """ finds all common paths with a minimum length (default = 3)""" >> >> M = [[0] * (len(list2)) for i in range(len(list1))] > > is it correct that the list2 as the x-axis, the list1 as y-axis:? > >> length = {} >> result =[] >> >> for x in range(len(list1)): > > Here for each row , > >> for y in range(len(list2)): > > This loop go through each column of certain row then, > >> if list1[x] == list2[y]: >> if (x == 0) or (y == 0): >> M[x][y] = 1 > > Here M[x][y] actually means the x-row? y-column, seems conflicts with > the x-axis and y-axis. they took y-axis as x row, x-axis as y column. > >> else: >> M[x][y] = M[x - 1][y - 1] + 1 >> if M[x][y] >= minimum: >> length[x + 1 - M[x][y]] = M[x][y] >> >> for pos in length: >> result.append(list1[pos:pos + length[pos]]) >> >> return result > > I have no problem understanding the other parts, except the array and > axis entangled in my mind. > >> ___ >> Tutor maillist - Tutor@python.org >> To unsubscribe or change subscription options: >> http://mail.python.org/mailman/listinfo/tutor >> > ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] longest common substring
On Mon, Nov 14, 2011 at 6:28 AM, Andreas Perstinger wrote: > On 2011-11-11 14:44, lina wrote: >> >> You are right, I did not think of this parts before. and actually the >> initiative wish was to find possible paths, I mean, possible >> substrings, all possible substrings. not the longest one, but at >> least bigger than 3. > > I had some time today and since you have changed your initial task (from > finding the longest common path to finding all common paths with a minimum > length) I've modified the code and came up with the following solution: > > def AllCommonPaths(list1, list2, minimum=3): > """ finds all common paths with a minimum length (default = 3)""" > > # First we have to initialize the necessary variables: > # M is an empty table where we will store all found matches > # (regardless of their length) > > M = [[0] * (len(list2)) for i in range(len(list1))] > > # length is a dictionary where we store the length of each common > # path. The keys are the starting positions ot the paths in list1. > > length = {} > > # result will be a list of of all found paths > > result =[] > > # Now the hard work begins: > # Each element of list1 is compared to each element in list2 > # (x is the index for list1, y is the index for list2). > # If we find a match, we store the distance to the starting point > # of the matching block. If we are in the left-most column (x == 0) > # or in the upper-most row (y == 0) we have to set the starting > # point ourself because we would get negative indexes if we look > # for the predecessor cell (M[x - 1][y - 1]). Else, we are one > # element farther away as the element before, so we add 1 to its > # value. > > for x in range(len(list1)): > for y in range(len(list2)): > if list1[x] == list2[y]: > if (x == 0) or (y == 0): > M[x][y] = 1 > else: > M[x][y] = M[x - 1][y - 1] + 1 > > # To get everything done in one pass, we update the length of > # the found path in our dictionary if it is longer than the minimum > # length. Thus we don't have to get through the whole table a > # second time to get all found paths with the minimum length (we > # don't know yet if we are already at the end of the matching > # block). > > if M[x][y] >= minimum: > length[x + 1 - M[x][y]] = M[x][y] > > > # We now have for all matching blocks their starting > # position in list1 and their length. Now we cut out this parts > # and create our resulting list This is a very smart way to store their starting position as a key. My mind was choked about how to save the list as a key before. > > for pos in length: > result.append(list1[pos:pos + length[pos]]) > > return result > > I've tried to explain what I have done, but I'm sure you will still have > questions :-). I am confused myself with this matrix/array, about how to define x-axis, y-axis. I must understand some parts wrong, for the following: > > Is this close to what you want? > > Bye, Andreas > > PS: Here's the function again without comments: > > def AllCommonPaths(list1, list2, minimum=3): > """ finds all common paths with a minimum length (default = 3)""" > > M = [[0] * (len(list2)) for i in range(len(list1))] is it correct that the list2 as the x-axis, the list1 as y-axis:? > length = {} > result =[] > > for x in range(len(list1)): Here for each row , > for y in range(len(list2)): This loop go through each column of certain row then, > if list1[x] == list2[y]: > if (x == 0) or (y == 0): > M[x][y] = 1 Here M[x][y] actually means the x-row? y-column, seems conflicts with the x-axis and y-axis. they took y-axis as x row, x-axis as y column. > else: > M[x][y] = M[x - 1][y - 1] + 1 > if M[x][y] >= minimum: > length[x + 1 - M[x][y]] = M[x][y] > > for pos in length: > result.append(list1[pos:pos + length[pos]]) > > return result I have no problem understanding the other parts, except the array and axis entangled in my mind. > ___ > Tutor maillist - Tutor@python.org > To unsubscribe or change subscription options: > http://mail.python.org/mailman/listinfo/tutor > ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] longest common substring
On 2011-11-11 16:53, Jerry Hill wrote: There's nothing wrong with writing your own code to find the longest common substring, but are you aware that python has a module in the standard library that already does this? In the difflib module, the SequenceMatcher class can compare two sequences and extract the longest common sequence of elements from it, like this: Thanks for the tip. I've played around with it, but I think it doesn't help in the OP's situation. "SequenceMatcher.find_longest_match()" just finds the first common block: Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24) [GCC 4.5.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import difflib >>> first = [0, 1, 2, 3, 0, 4, 5, 6, 0] >>> second = [1, 2, 3, 4, 5, 6] >>> match = difflib.SequenceMatcher(None, first, second) >>> match.find_longest_match(0, len(first), 0, len(second)) Match(a=1, b=0, size=3) Here it returns just [1, 2, 3] but misses [4, 5, 6]. So you would have to adjust the lower limits to get it. "SequenceMatcher.get_matching_blocks()" seems to be a better choice: >>> match.get_matching_blocks() [Match(a=1, b=0, size=3), Match(a=5, b=3, size=3), Match(a=9, b=6, size=0)] Now you get [1, 2, 3] and [4, 5, 6]. But if the two blocks are in the reversed order, there is no longest common subsequence [1, 2, 3, 4, 5, 6] any more and "SequenceMatcher" only finds one part (apparently it chooses the first it comes across in the first list if both have the same length): >>> first = [0, 1, 2, 3, 0, 4, 5, 6, 0] >>> second = [4, 5, 6, 1, 2, 3] >>> match = difflib.SequenceMatcher(None, first, second) >>> match.find_longest_match(0, len(first), 0, len(second)) Match(a=1, b=3, size=3) >>> match.get_matching_blocks() [Match(a=1, b=3, size=3), Match(a=9, b=6, size=0)] From both methods you get [1, 2, 3]. As I've learnt during this tests, there is a difference between subsequences and substrings: http://en.wikipedia.org/wiki/Subsequence#Substring_vs._subsequence If I've understood the OP right, he/she wants to find all common substrings with a minimum length regardless of their order in the strings. Bye, Andreas ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] longest common substring
On 2011-11-11 14:44, lina wrote: You are right, I did not think of this parts before. and actually the initiative wish was to find possible paths, I mean, possible substrings, all possible substrings. not the longest one, but at least bigger than 3. I had some time today and since you have changed your initial task (from finding the longest common path to finding all common paths with a minimum length) I've modified the code and came up with the following solution: def AllCommonPaths(list1, list2, minimum=3): """ finds all common paths with a minimum length (default = 3)""" # First we have to initialize the necessary variables: # M is an empty table where we will store all found matches # (regardless of their length) M = [[0] * (len(list2)) for i in range(len(list1))] # length is a dictionary where we store the length of each common # path. The keys are the starting positions ot the paths in list1. length = {} # result will be a list of of all found paths result =[] # Now the hard work begins: # Each element of list1 is compared to each element in list2 # (x is the index for list1, y is the index for list2). # If we find a match, we store the distance to the starting point # of the matching block. If we are in the left-most column (x == 0) # or in the upper-most row (y == 0) we have to set the starting # point ourself because we would get negative indexes if we look # for the predecessor cell (M[x - 1][y - 1]). Else, we are one # element farther away as the element before, so we add 1 to its # value. for x in range(len(list1)): for y in range(len(list2)): if list1[x] == list2[y]: if (x == 0) or (y == 0): M[x][y] = 1 else: M[x][y] = M[x - 1][y - 1] + 1 # To get everything done in one pass, we update the length of # the found path in our dictionary if it is longer than the minimum # length. Thus we don't have to get through the whole table a # second time to get all found paths with the minimum length (we # don't know yet if we are already at the end of the matching # block). if M[x][y] >= minimum: length[x + 1 - M[x][y]] = M[x][y] # We now have for all matching blocks their starting # position in list1 and their length. Now we cut out this parts # and create our resulting list for pos in length: result.append(list1[pos:pos + length[pos]]) return result I've tried to explain what I have done, but I'm sure you will still have questions :-). Is this close to what you want? Bye, Andreas PS: Here's the function again without comments: def AllCommonPaths(list1, list2, minimum=3): """ finds all common paths with a minimum length (default = 3)""" M = [[0] * (len(list2)) for i in range(len(list1))] length = {} result =[] for x in range(len(list1)): for y in range(len(list2)): if list1[x] == list2[y]: if (x == 0) or (y == 0): M[x][y] = 1 else: M[x][y] = M[x - 1][y - 1] + 1 if M[x][y] >= minimum: length[x + 1 - M[x][y]] = M[x][y] for pos in length: result.append(list1[pos:pos + length[pos]]) return result ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] longest common substring
On 11/13/2011 08:06 AM, lina wrote: Finally, if I am not wrong again, I feel I am kinda of starting figuring out what's going on. Why it's None. The main mistake here I use result = result.append(something) the "=" I checked the print(id(result)) and print(id(result.append()), For the NoneType they shared the same id 8823392 in my laptop. is it temporary address? None is a unique object, deliberately. No matter how many times people create None, it'll always be the same object. So a= None b = x.append(y) a is b #(true) id(a) == id(b)#(true) Similarly True and False are unique objects. Other objects which are equal to each other may or may not have the same ID; you should not count on it. For example, x = 45+3 y = 6*8 Checking (x==y) is true, of course. But checking (x is y) is indeterminate. It may be true for the first ten tests you do, and false next time -- DaveA ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] longest common substring
On Sun, Nov 13, 2011 at 12:40 AM, Andreas Perstinger wrote: > On 2011-11-12 16:24, lina wrote: >> >> Thanks, ^_^, now better. > > No, I'm afraid you are still not understanding. > >> I checked, the sublist (list) here can't be as a key of the results >> (dict). > > "result" isn't a dictionary. It started as an empty list and later becomes a > null object ("NoneType"). > > You must not forget that you are inside a for-loop. Simplified your > situation is like this: > result = [] for i in range(1,10): > ... print("Iteration {0}, result = {1}".format(i, result)) > ... result = result.append(i) > ... > Iteration 1, result = [] > Iteration 2, result = None > Traceback (most recent call last): > File "", line 3, in > AttributeError: 'NoneType' object has no attribute 'append' > > As you see the error happens in the *second* iteration, because result is no > list any more. > Dave gave you already the explanation: functions and method always return a > value in Python. If the don't have a return statement they return "None". > > Another simple example: > a = print("Test") > Test > > "print" is a function which prints out the text you passed to it and you > usually aren't interested in its return value. But every function/method in > Python returns something. You save this value in "a" > print(a) > None > > As you see the return value of "print" is "None". > a.append(x) > Traceback (most recent call last): > File "", line 1, in > AttributeError: 'NoneType' object has no attribute 'append' > > Same error as above, because "NoneType" objects (null objects) don't have a > method "append". > > I also think you mix two different ways to add an element to a list: > > result.append(x) > > is equivalent to > > result = result + [x] (that's what you will use in other languages) Finally, if I am not wrong again, I feel I am kinda of starting figuring out what's going on. Why it's None. The main mistake here I use result = result.append(something) the "=" I checked the print(id(result)) and print(id(result.append()), For the NoneType they shared the same id 8823392 in my laptop. is it temporary address? > > HTH, Andreas Really helps, Thanks again. haha ...for the emails from the list I used to read several times, again and again to understand. Best regards, > ___ > Tutor maillist - Tutor@python.org > To unsubscribe or change subscription options: > http://mail.python.org/mailman/listinfo/tutor > ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] [off-topic] Any thread for linux troubleshooting
On 13/11/11 03:29, Ken G. wrote: I checked the above website and I was linked to a foreign political blog. The correct link is: tldp.org/ for The Linux Documentation Project. Oops, sorry about that. Amazing the difference a single letter makes! :-) -- Alan G Author of the Learn to Program web site http://www.alan-g.me.uk/ ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor