Re: [Tutor] longest common substring

2011-11-13 Thread lina
On Mon, Nov 14, 2011 at 11:56 AM, lina  wrote:
> On Mon, Nov 14, 2011 at 6:28 AM, Andreas Perstinger
>  wrote:
>> On 2011-11-11 14:44, lina wrote:
>>>
>>> You are right, I did not think of this parts before. and actually the
>>> initiative wish was to find possible paths, I mean, possible
>>> substrings, all possible substrings. not the longest one, but at
>>> least bigger than 3.
>>
>> I had some time today and since you have changed your initial task (from
>> finding the longest common path to finding all common paths with a minimum
>> length) I've modified the code and came up with the following solution:
>>
>> def AllCommonPaths(list1, list2, minimum=3):
>> """ finds all common paths with a minimum length (default = 3)"""
>>
>>    # First we have to initialize the necessary variables:
>>    # M is an empty table where we will store all found matches
>>    # (regardless of their length)
>>
>>    M = [[0] * (len(list2)) for i in range(len(list1))]
>>
>>    # length is a dictionary where we store the length of each common
>>    # path. The keys are the starting positions ot the paths in list1.
>>
>>    length = {}
>>
>>    # result will be a list of of all found paths
>>
>>    result =[]
>>
>>    # Now the hard work begins:
>>    # Each element of list1 is compared to each element in list2
>>    # (x is the index for list1, y is the index for list2).
>>    # If we find a match, we store the distance to the starting point
>>    # of the matching block. If we are in the left-most column (x == 0)
>>    # or in the upper-most row (y == 0) we have to set the starting
>>    # point ourself because we would get negative indexes if we look
>>    # for the predecessor cell (M[x - 1][y - 1]). Else, we are one
>>    # element farther away as the element before, so we add 1 to its
>>    # value.
>>
>>    for x in range(len(list1)):
>>        for y in range(len(list2)):
>>            if list1[x] == list2[y]:
>>                if (x == 0) or (y == 0):
>>                    M[x][y] = 1
>>                else:
>>                    M[x][y] = M[x - 1][y - 1] + 1
>>
>>    # To get everything done in one pass, we update the length of
>>    # the found path in our dictionary if it is longer than the minimum
>>    # length. Thus we don't have to get through the whole table a
>>    # second time to get all found paths with the minimum length (we
>>    # don't know yet if we are already at the end of the matching
>>    # block).
>>
>>                if M[x][y] >= minimum:
>>                    length[x + 1 - M[x][y]] = M[x][y]
>>
>>
>>    # We now have for all matching blocks their starting
>>    # position in list1 and their length. Now we cut out this parts
>>    # and create our resulting list

How silly I was, it's nothing to do with x,y, since I used i and j,
it's crystal clear.

Thanks again for your time,

Best regards,

lina
>
> This is a very smart way to store their starting position as a key. My
> mind was choked about how to save the list as a key before.
>
>>
>>    for pos in length:
>>        result.append(list1[pos:pos + length[pos]])
>>
>>    return result
>>
>> I've tried to explain what I have done, but I'm sure you will still have
>> questions :-).
>
> I am confused myself with this matrix/array, about how to define
> x-axis, y-axis.
>
> I must understand some parts wrong, for the following:
>>
>> Is this close to what you want?
>>
>> Bye, Andreas
>>
>> PS: Here's the function again without comments:
>>
>> def AllCommonPaths(list1, list2, minimum=3):
>>    """ finds all common paths with a minimum length (default = 3)"""
>>
>>    M = [[0] * (len(list2)) for i in range(len(list1))]
>
> is it correct that the list2 as the x-axis, the list1 as y-axis:?
>
>>    length = {}
>>    result =[]
>>
>>    for x in range(len(list1)):
>
> Here for each row ,
>
>>        for y in range(len(list2)):
>
> This loop go through each column of certain row then,
>
>>            if list1[x] == list2[y]:
>>                if (x == 0) or (y == 0):
>>                    M[x][y] = 1
>
> Here M[x][y] actually means the x-row? y-column, seems conflicts with
> the x-axis and y-axis. they took y-axis as x row, x-axis as y column.
>
>>                else:
>>                    M[x][y] = M[x - 1][y - 1] + 1
>>                if M[x][y] >= minimum:
>>                    length[x + 1 - M[x][y]] = M[x][y]
>>
>>    for pos in length:
>>        result.append(list1[pos:pos + length[pos]])
>>
>>    return result
>
> I have no problem understanding the other parts, except the array and
> axis entangled in my mind.
>
>> ___
>> Tutor maillist  -  Tutor@python.org
>> To unsubscribe or change subscription options:
>> http://mail.python.org/mailman/listinfo/tutor
>>
>
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] longest common substring

2011-11-13 Thread lina
On Mon, Nov 14, 2011 at 6:28 AM, Andreas Perstinger
 wrote:
> On 2011-11-11 14:44, lina wrote:
>>
>> You are right, I did not think of this parts before. and actually the
>> initiative wish was to find possible paths, I mean, possible
>> substrings, all possible substrings. not the longest one, but at
>> least bigger than 3.
>
> I had some time today and since you have changed your initial task (from
> finding the longest common path to finding all common paths with a minimum
> length) I've modified the code and came up with the following solution:
>
> def AllCommonPaths(list1, list2, minimum=3):
> """ finds all common paths with a minimum length (default = 3)"""
>
>    # First we have to initialize the necessary variables:
>    # M is an empty table where we will store all found matches
>    # (regardless of their length)
>
>    M = [[0] * (len(list2)) for i in range(len(list1))]
>
>    # length is a dictionary where we store the length of each common
>    # path. The keys are the starting positions ot the paths in list1.
>
>    length = {}
>
>    # result will be a list of of all found paths
>
>    result =[]
>
>    # Now the hard work begins:
>    # Each element of list1 is compared to each element in list2
>    # (x is the index for list1, y is the index for list2).
>    # If we find a match, we store the distance to the starting point
>    # of the matching block. If we are in the left-most column (x == 0)
>    # or in the upper-most row (y == 0) we have to set the starting
>    # point ourself because we would get negative indexes if we look
>    # for the predecessor cell (M[x - 1][y - 1]). Else, we are one
>    # element farther away as the element before, so we add 1 to its
>    # value.
>
>    for x in range(len(list1)):
>        for y in range(len(list2)):
>            if list1[x] == list2[y]:
>                if (x == 0) or (y == 0):
>                    M[x][y] = 1
>                else:
>                    M[x][y] = M[x - 1][y - 1] + 1
>
>    # To get everything done in one pass, we update the length of
>    # the found path in our dictionary if it is longer than the minimum
>    # length. Thus we don't have to get through the whole table a
>    # second time to get all found paths with the minimum length (we
>    # don't know yet if we are already at the end of the matching
>    # block).
>
>                if M[x][y] >= minimum:
>                    length[x + 1 - M[x][y]] = M[x][y]
>
>
>    # We now have for all matching blocks their starting
>    # position in list1 and their length. Now we cut out this parts
>    # and create our resulting list

This is a very smart way to store their starting position as a key. My
mind was choked about how to save the list as a key before.

>
>    for pos in length:
>        result.append(list1[pos:pos + length[pos]])
>
>    return result
>
> I've tried to explain what I have done, but I'm sure you will still have
> questions :-).

I am confused myself with this matrix/array, about how to define
x-axis, y-axis.

I must understand some parts wrong, for the following:
>
> Is this close to what you want?
>
> Bye, Andreas
>
> PS: Here's the function again without comments:
>
> def AllCommonPaths(list1, list2, minimum=3):
>    """ finds all common paths with a minimum length (default = 3)"""
>
>    M = [[0] * (len(list2)) for i in range(len(list1))]

is it correct that the list2 as the x-axis, the list1 as y-axis:?

>    length = {}
>    result =[]
>
>    for x in range(len(list1)):

Here for each row ,

>        for y in range(len(list2)):

This loop go through each column of certain row then,

>            if list1[x] == list2[y]:
>                if (x == 0) or (y == 0):
>                    M[x][y] = 1

Here M[x][y] actually means the x-row? y-column, seems conflicts with
the x-axis and y-axis. they took y-axis as x row, x-axis as y column.

>                else:
>                    M[x][y] = M[x - 1][y - 1] + 1
>                if M[x][y] >= minimum:
>                    length[x + 1 - M[x][y]] = M[x][y]
>
>    for pos in length:
>        result.append(list1[pos:pos + length[pos]])
>
>    return result

I have no problem understanding the other parts, except the array and
axis entangled in my mind.

> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] longest common substring

2011-11-13 Thread Andreas Perstinger

On 2011-11-11 16:53, Jerry Hill wrote:

There's nothing wrong with writing your own code to find the longest common
substring, but are you aware that python has a module in the standard
library that already does this?  In the difflib module, the SequenceMatcher
class can compare two sequences and extract the longest common sequence of
elements from it, like this:


Thanks for the tip. I've played around with it, but I think it doesn't 
help in the OP's situation. "SequenceMatcher.find_longest_match()" just 
finds the first common block:


Python 2.7.1+ (r271:86832, Apr 11 2011, 18:05:24)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import difflib
>>> first = [0, 1, 2, 3, 0, 4, 5, 6, 0]
>>> second = [1, 2, 3, 4, 5, 6]
>>> match = difflib.SequenceMatcher(None, first, second)
>>> match.find_longest_match(0, len(first), 0, len(second))
Match(a=1, b=0, size=3)

Here it returns just [1, 2, 3] but misses [4, 5, 6]. So you would have 
to adjust the lower limits to get it. 
"SequenceMatcher.get_matching_blocks()" seems to be a better choice:


>>> match.get_matching_blocks()
[Match(a=1, b=0, size=3), Match(a=5, b=3, size=3), Match(a=9, b=6, size=0)]

Now you get [1, 2, 3] and [4, 5, 6]. But if the two blocks are in the 
reversed order, there is no longest common subsequence [1, 2, 3, 4, 5, 
6] any more and "SequenceMatcher" only finds one part (apparently it 
chooses the first it comes across in the first list if both have the 
same length):


>>> first = [0, 1, 2, 3, 0, 4, 5, 6, 0]
>>> second = [4, 5, 6, 1, 2, 3]
>>> match = difflib.SequenceMatcher(None, first, second)
>>> match.find_longest_match(0, len(first), 0, len(second))
Match(a=1, b=3, size=3)
>>> match.get_matching_blocks()
[Match(a=1, b=3, size=3), Match(a=9, b=6, size=0)]

From both methods you get [1, 2, 3].

As I've learnt during this tests, there is a difference between 
subsequences and substrings:

http://en.wikipedia.org/wiki/Subsequence#Substring_vs._subsequence

If I've understood the OP right, he/she wants to find all common 
substrings with a minimum length regardless of their order in the strings.


Bye, Andreas
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] longest common substring

2011-11-13 Thread Andreas Perstinger

On 2011-11-11 14:44, lina wrote:

You are right, I did not think of this parts before. and actually the
initiative wish was to find possible paths, I mean, possible
substrings, all possible substrings. not the longest one, but at
least bigger than 3.


I had some time today and since you have changed your initial task (from 
finding the longest common path to finding all common paths with a 
minimum length) I've modified the code and came up with the following 
solution:


def AllCommonPaths(list1, list2, minimum=3):
""" finds all common paths with a minimum length (default = 3)"""

# First we have to initialize the necessary variables:
# M is an empty table where we will store all found matches
# (regardless of their length)

M = [[0] * (len(list2)) for i in range(len(list1))]

# length is a dictionary where we store the length of each common
# path. The keys are the starting positions ot the paths in list1.

length = {}

# result will be a list of of all found paths

result =[]

# Now the hard work begins:
# Each element of list1 is compared to each element in list2
# (x is the index for list1, y is the index for list2).
# If we find a match, we store the distance to the starting point
# of the matching block. If we are in the left-most column (x == 0)
# or in the upper-most row (y == 0) we have to set the starting
# point ourself because we would get negative indexes if we look
# for the predecessor cell (M[x - 1][y - 1]). Else, we are one
# element farther away as the element before, so we add 1 to its
# value.

for x in range(len(list1)):
for y in range(len(list2)):
if list1[x] == list2[y]:
if (x == 0) or (y == 0):
M[x][y] = 1
else:
M[x][y] = M[x - 1][y - 1] + 1

# To get everything done in one pass, we update the length of
# the found path in our dictionary if it is longer than the minimum
# length. Thus we don't have to get through the whole table a
# second time to get all found paths with the minimum length (we
# don't know yet if we are already at the end of the matching
# block).

if M[x][y] >= minimum:
length[x + 1 - M[x][y]] = M[x][y]


# We now have for all matching blocks their starting
# position in list1 and their length. Now we cut out this parts
# and create our resulting list

for pos in length:
result.append(list1[pos:pos + length[pos]])

return result

I've tried to explain what I have done, but I'm sure you will still have 
questions :-).


Is this close to what you want?

Bye, Andreas

PS: Here's the function again without comments:

def AllCommonPaths(list1, list2, minimum=3):
""" finds all common paths with a minimum length (default = 3)"""

M = [[0] * (len(list2)) for i in range(len(list1))]
length = {}
result =[]

for x in range(len(list1)):
for y in range(len(list2)):
if list1[x] == list2[y]:
if (x == 0) or (y == 0):
M[x][y] = 1
else:
M[x][y] = M[x - 1][y - 1] + 1
if M[x][y] >= minimum:
length[x + 1 - M[x][y]] = M[x][y]

for pos in length:
result.append(list1[pos:pos + length[pos]])

return result
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] longest common substring

2011-11-13 Thread Dave Angel

On 11/13/2011 08:06 AM, lina wrote:


Finally, if I am not wrong again, I feel I am kinda of starting
figuring out what's going on. Why it's None.

The main mistake here I use result = result.append(something)
the "="

I checked the print(id(result)) and print(id(result.append()),

For the NoneType they shared the same id 8823392 in my laptop. is it
temporary address?

None is a unique object, deliberately.  No matter how many times people 
create None, it'll always be the same object.  So

a= None
b = x.append(y)
a is b   #(true)
id(a) == id(b)#(true)

Similarly  True and False are unique objects.

Other objects which are equal to each other may or may not have the same 
ID;  you should not count on it.  For example,

x = 45+3
y = 6*8

Checking  (x==y) is true, of course.  But checking (x is y) is 
indeterminate.  It may be true for the first ten tests you do, and false 
next time


--

DaveA

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] longest common substring

2011-11-13 Thread lina
On Sun, Nov 13, 2011 at 12:40 AM, Andreas Perstinger
 wrote:
> On 2011-11-12 16:24, lina wrote:
>>
>> Thanks, ^_^, now better.
>
> No, I'm afraid you are still not understanding.
>
>> I checked, the sublist (list) here can't be as a key of the results
>> (dict).
>
> "result" isn't a dictionary. It started as an empty list and later becomes a
> null object ("NoneType").
>
> You must not forget that you are inside a for-loop. Simplified your
> situation is like this:
>
 result = []
 for i in range(1,10):
> ...     print("Iteration {0}, result = {1}".format(i, result))
> ...     result = result.append(i)
> ...
> Iteration 1, result = []
> Iteration 2, result = None
> Traceback (most recent call last):
>  File "", line 3, in 
> AttributeError: 'NoneType' object has no attribute 'append'
>
> As you see the error happens in the *second* iteration, because result is no
> list any more.
> Dave gave you already the explanation: functions and method always return a
> value in Python. If the don't have a return statement they return "None".
>
> Another simple example:
>
 a = print("Test")
> Test
>
> "print" is a function which prints out the text you passed to it and you
> usually aren't interested in its return value. But every function/method in
> Python returns something. You save this value in "a"
>
 print(a)
> None
>
> As you see the return value of "print" is "None".
>
 a.append(x)
> Traceback (most recent call last):
>  File "", line 1, in 
> AttributeError: 'NoneType' object has no attribute 'append'
>
> Same error as above, because "NoneType" objects (null objects) don't have a
> method "append".
>
> I also think you mix two different ways to add an element to a list:
>
> result.append(x)
>
> is equivalent to
>
> result = result + [x] (that's what you will use in other languages)

Finally, if I am not wrong again, I feel I am kinda of starting
figuring out what's going on. Why it's None.

The main mistake here I use result = result.append(something)
the "="

I checked the print(id(result)) and print(id(result.append()),

For the NoneType they shared the same id 8823392 in my laptop. is it
temporary address?

>
> HTH, Andreas

Really helps, Thanks again.

 haha ...for the emails from the list I used to read several times,
again and again to understand.

Best regards,

> ___
> Tutor maillist  -  Tutor@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] [off-topic] Any thread for linux troubleshooting

2011-11-13 Thread Alan Gauld

On 13/11/11 03:29, Ken G. wrote:


I checked the above website and I was linked to a foreign political
blog. The correct link is: tldp.org/ for The Linux Documentation Project.


Oops, sorry about that.
Amazing the difference a single letter makes! :-)

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/

___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor