how to find difference in number of characters
Hi,

I am trying to write a compare method which takes two strings and finds how many characters have changed.

    def compare_strings(s1, s2):
        pass

    text1 = "goat milk"
    text2 = "cow milk"
    print compare_strings(text1, text2)

This must give 3, since 3 characters are changed between the strings. I was advised to use the Levenshtein algorithm, but the matrix operations take a long time for nontrivial strings of, say, 2 characters. Can this comparison be implemented using the difflib module? I am at a loss as to how to implement this using difflib. Is there some way I can get the difference as a number? Can somebody help?

Thanks,
harry
--
http://mail.python.org/mailman/listinfo/python-list
Re: how to find difference in number of characters
harryos wrote:
> I am trying to write a compare method which takes two strings and finds how many characters have changed.
>
>     def compare_strings(s1, s2):
>         pass
>
>     text1 = "goat milk"
>     text2 = "cow milk"
>     print compare_strings(text1, text2)
>
> This must give 3, since 3 characters are changed between the strings. I was advised to use the Levenshtein algorithm, but the matrix operations take a long time for nontrivial strings of, say, 2 characters. Can this

I tried it with a string of 2 chars and the python-levenshtein package that comes with Ubuntu. It took about one second to calculate the distance:

    import functools
    import random
    import string
    import time

    from Levenshtein import distance

    def make_strings(n, delete, insert, swap, replace,
                     charset=string.ascii_letters):
        # Build a random string s of length n, then derive t from it by
        # applying the given numbers of random edits.
        def gen_chars():
            while True:
                yield random.choice(charset)
        chars = gen_chars()
        a = [next(chars) for i in xrange(n)]
        s = "".join(a)
        for i in range(delete):
            del a[random.randrange(len(a))]
        for i in range(insert):
            a.insert(random.randrange(len(a)+1), next(chars))
        for i in range(swap):
            p = random.randrange(len(a)-1)
            a[p], a[p+1] = a[p+1], a[p]
        for i in range(replace):
            a[random.randrange(len(a))] = next(chars)
        t = "".join(a)
        return s, t

    N = 2
    M = 100
    ms = functools.partial(make_strings, N, M, M, M, M)

    def measure(f, *args):
        # Print the elapsed wall-clock time, then return f's result.
        start = time.time()
        try:
            return f(*args)
        finally:
            print time.time() - start

    if __name__ == "__main__":
        import sys
        args = sys.argv[1:]
        if args:
            N, M = map(int, args)

        s, t = make_strings(N, M, M, M, M)
        print measure(distance, s, t)

    $ python levenshtein_demo.py 1 1000
    0.225363969803
    3644
    $ python levenshtein_demo.py 2 1000
    1.05217313766
    4197
    $ python levenshtein_demo.py 3 1000
    2.38736391068
    4390
    $ python levenshtein_demo.py 4 1000
    4.1686527729
    4558

What would be an acceptable time?

Peter
Re: how to find difference in number of characters
On Oct 9, 2:45 pm, Peter Otten __pete...@web.de wrote:
> What would be an acceptable time?

Thanks for the reply, Peter. I was using Python functions I came across on the net, not implementations written in C. My low-spec machine is probably also to blame (I am no expert at judging algorithm performance either). But is there a way I can use the difflib module to do this job? Even though I went through the docs, I couldn't make out how.

regards
harry
Re: how to find difference in number of characters
harryos wrote:
> but is there a way I can use difflib module to do this job?

I'm afraid I can't help you with that. You might get more/better answers if you tell us more about the context of the problem and add some details that may be relevant.

Peter
Re: how to find difference in number of characters
On Oct 9, 4:52 pm, Peter Otten __pete...@web.de wrote:
> You might get more/better answers if you tell us more about the context of the problem and add some details that may be relevant.
>
> Peter

I am trying to determine if a web page is updated by x number of characters. The Mozilla Firefox plugin 'Update Scanner' has similar functionality: a user can specify the x. I think this would be done by reading from the same URL at two different times and finding the change in the body text. I was wondering if difflib could offer something in the way of determining the size of the delta.

Thanks again for the reply.
harry
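For the difflib question, one possibility is difflib.SequenceMatcher: its get_opcodes() method describes how to turn one sequence into the other, and the lengths of the spans not tagged "equal" can be added up into a single number. The sketch below is written for Python 3. The name count_changed_chars is made up for illustration, and counting a replaced span as the longer of its two sides is an assumption about what "changed" should mean. difflib does not promise a minimal edit script, so on some inputs the result may be larger than the Levenshtein distance.

```python
import difflib


def count_changed_chars(old, new):
    """Count the characters covered by non-matching difflib opcodes."""
    # autojunk=False switches off the popularity heuristic that difflib
    # otherwise applies to sequences of 200 items or more.
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Assumption: a replaced span counts as the longer of its sides.
            changed += max(i2 - i1, j2 - j1)
    return changed


print(count_changed_chars("goat milk", "cow milk"))  # prints 3
```

For a threshold check like the one described, the returned number would be compared with the user's x.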
Testing for changes on a web page (was: how to find difference in number of characters)
harryos, 09.10.2010 14:24:
> I am trying to determine if a web page is updated by x number of characters. The Mozilla Firefox plugin 'Update Scanner' has similar functionality: a user can specify the x. I think this would be done by reading from the same URL at two different times and finding the change in the body text.

Number of characters sounds like a rather useless measure here. I'd rather apply an XPath, CSS selector or PyQuery expression to the parsed page and check whether the interesting subtree of it has changed at all, potentially disregarding any structural changes by stripping all tags and normalising the resulting text to ignore whitespace and case differences.

Stefan
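The "strip all tags and normalise" part of this suggestion could look roughly like the following Python 3 sketch. It uses the standard library's html.parser as a stand-in; it does not select a subtree, which is where an XPath or CSS selector library such as lxml or PyQuery would come in. The names TextExtractor, normalised_text and page_changed are invented for this example, and skipping script and style contents is an added assumption.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring tags and script/style contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def normalised_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join text nodes with a space, collapse whitespace runs, ignore case.
    return " ".join(" ".join(parser.parts).split()).lower()


def page_changed(old_html, new_html):
    """True if the visible text differs after normalisation."""
    return normalised_text(old_html) != normalised_text(new_html)
```

With this, two fetches of the same URL that differ only in markup, whitespace or letter case compare as unchanged.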
Re: Testing for changes on a web page (was: how to find difference in number of characters)
On Oct 9, 5:41 pm, Stefan Behnel stefan...@behnel.de wrote:
> Number of characters sounds like a rather useless measure here.

What I meant by number of characters was the number of edits that happened between the two versions. Levenshtein distance may be one way to do this, but I was wondering if difflib could do it.

regards
harry
Re: Testing for changes on a web page (was: how to find difference in number of characters)
> On Oct 9, 5:41 pm, Stefan Behnel stefan...@behnel.de wrote:
>> Number of characters sounds like a rather useless measure here.
>
> What I meant by number of characters was the number of edits that happened between the two versions. Levenshtein distance may be one way to do this, but I was wondering if difflib could do it.
>
> regards

As pointed out above, you also need to consider how the structure of the web page has changed. If you are only looking at plain text, the Levenshtein distance measures the number of edit operations (insertion, deletion or substitution) necessary to transform string A into string B.

Cheers,
Emm
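To make that definition concrete, here is a minimal pure-Python sketch of the standard dynamic-programming computation, written for Python 3. It keeps only two rows of the matrix at a time. The function name levenshtein is chosen for this example and is not the API of the python-levenshtein package mentioned earlier. The running time grows with len(a) * len(b), which fits the observation that pure-Python versions get slow on long strings.

```python
def levenshtein(a, b):
    """Return the Levenshtein distance between sequences a and b."""
    if len(a) < len(b):
        a, b = b, a  # keep the rows as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(levenshtein("goat milk", "cow milk"))  # prints 3
```

The three edits in the example are: substitute g with c, substitute a with w, and delete t.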
Re: Testing for changes on a web page (was: how to find difference in number of characters)
On 2010-10-09, harryos oswald.ha...@gmail.com wrote:
> What I meant by number of characters was the number of edits that happened between the two versions.

Consider two strings:

    Hello, world!
    Yo, there.

What is the "number of edits that happened" between the two versions? It could be:

* Zero. I just typed them both from scratch, no editing occurred between them.
* Two. Two words are different.
* Ten or so -- counting changed characters.
* Three. Two words and a punctuation mark are different.

In other words, your problem here is that you haven't actually described what you want. Slow down. Think! Describe what you want clearly enough that any other person who reads your description can always come up with the same answer you would for a given set of inputs.

-s
--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet-nos...@seebs.net
http://www.seebs.net/log/ -- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) -- get educated!
I am not speaking for my employer, although they do rent some of my opinions.
Testing for changes on a web page (was: how to find difference in number of characters)
On Sat, Oct 9, 2010 at 10:47 AM, Seebs usenet-nos...@seebs.net wrote:
> On 2010-10-09, harryos oswald.ha...@gmail.com wrote:
>> What I meant by number of characters was the number of edits that happened between the two versions.
>
> Consider two strings:
>
>     Hello, world!
>     Yo, there.
>
> What is the "number of edits that happened" between the two versions? It could be:
>
> * Zero. I just typed them both from scratch, no editing occurred between them.
> * Two. Two words are different.
> * Ten or so -- counting changed characters.
> * Three. Two words and a punctuation mark are different.
>
> In other words, your problem here is that you haven't actually described what you want. Slow down. Think! Describe what you want clearly enough that any other person who reads your description can always come up with the same answer you would for a given set of inputs.
>
> -s

He mentioned L distance earlier; I'm sure he means 'number of edits' in that context.

Geremy Condra
Re: how to find difference in number of characters
harryos oswald.ha...@gmail.com writes:
> On Oct 9, 4:52 pm, Peter Otten __pete...@web.de wrote:
>> You might get more/better answers if you tell us more about the context of the problem and add some details that may be relevant.
>>
>> Peter
>
> I am trying to determine if a web page is updated by x number of characters. The Mozilla Firefox plugin 'Update Scanner' has similar functionality: a user can specify the x. I think this would be done by reading from the same URL at two different times and finding the change in the body text. I was wondering if difflib could offer something in the way of determining the size of the delta.

If you normalize the data, this might be worth trying. Make each tag appear on one single line, possibly re-ordering attributes so that they are in alphabetical order. Each text child also gets normalized, by replacing all whitespace with a single space.

Then run difflib over these, and count the number of differences.

Diez
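One way the normalise-then-diff idea might be sketched in Python 3, using only the standard library, is shown below. The class and function names (Normaliser, normalise, count_differences) are invented for this example, and the canonical form chosen here (one line per tag or text node, attributes sorted by name, whitespace collapsed) is just one reading of the description above.

```python
import difflib
from html.parser import HTMLParser


class Normaliser(HTMLParser):
    """Emit one line per tag or text node, in a canonical form."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Sort attributes by name so that re-ordering is not a change.
        rendered = "".join(
            ' %s="%s"' % (name, value if value is not None else "")
            for name, value in sorted(attrs, key=lambda pair: pair[0])
        )
        self.lines.append("<%s%s>" % (tag, rendered))

    def handle_endtag(self, tag):
        self.lines.append("</%s>" % tag)

    def handle_data(self, data):
        # Replace every run of whitespace with a single space.
        text = " ".join(data.split())
        if text:
            self.lines.append(text)


def normalise(html):
    parser = Normaliser()
    parser.feed(html)
    parser.close()
    return parser.lines


def count_differences(old_html, new_html):
    diff = difflib.ndiff(normalise(old_html), normalise(new_html))
    # ndiff prefixes removed lines with "- " and added lines with "+ ".
    return sum(1 for line in diff if line.startswith(("- ", "+ ")))
```

Note that with this way of counting, a modified line shows up as one removal plus one addition, so it contributes 2 to the total.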
Re: Testing for changes on a web page (was: how to find difference in number of characters)
On 09 Oct 2010 17:47:56 GMT Seebs usenet-nos...@seebs.net wrote:
> In other words, your problem here is that you haven't actually described what you want. Slow down. Think! Describe what you want clearly enough that any other person who reads your description can always come up with the same answer you would for a given set of inputs.

Better yet, write the unit tests for us.

--
D'Arcy J.M. Cain da...@druid.net         |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.