Re: [Python-ideas] string method count()
Regular expressions are not just "an order of magnitude better"; they're asymptotically faster. The pure-Python loop proposed in this thread copies the tail of the string (string[c:]) on every iteration, so its cost grows quadratically with the length of the string.

See https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm for a non-regular-expression algorithm that runs in linear time.

On Thursday, April 26, 2018 at 5:45:20 AM UTC-4, Jacco van Dorp wrote:
> regular expressions would probably be at least an order of magnitude
> better in speed, if it's a bottleneck to you. But pure python
> implementation for this is a lot easier than it would be for the
> current string.count().

___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
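For readers who want to see what the Knuth-Morris-Pratt approach linked above looks like for this problem, here is a minimal sketch. The function name kmp_count_overlapping is made up for illustration and is not part of Python or of any library mentioned in this thread; the empty-pattern check is also just an assumption of the sketch.

```python
def kmp_count_overlapping(text, pattern):
    """Count occurrences of pattern in text, overlapping ones included."""
    if not pattern:
        # Assumption of this sketch: an empty pattern is rejected.
        raise ValueError("pattern must be non-empty")

    # failure[i] is the length of the longest proper prefix of
    # pattern[:i + 1] that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    count = 0
    k = 0  # number of pattern characters currently matched
    for ch in text:
        while k and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            count += 1
            # Fall back instead of resetting to 0, so overlaps are kept.
            k = failure[k - 1]
    return count
```

Each character of the text is examined once and nothing is copied, which is where the linear running time comes from. With the example from this thread, kmp_count_overlapping("AAA", "AA") gives 2.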
Re: [Python-ideas] string method count()
If this were for a school assignment, I'd probably go to edit distance and fuzzy string matching next:
https://en.wikipedia.org/wiki/Edit_distance
https://en.wikipedia.org/wiki/String-to-string_correction_problem
- https://pypi.org/search/?q=Levenshtein
- https://pypi.org/project/textdistance/

As a bioinformatics program, this is a bit like CRISPR:
https://en.wikipedia.org/wiki/CRISPR

Biopython's Seq has a count_overlap method, and Biopython is under a BSD 3-Clause license:
https://github.com/biopython/biopython/blob/master/LICENSE.rst

Can it be made faster with e.g. itertools.count and a generator expression?
- Bio.Seq.Seq.count_overlap()
  http://biopython.org/DIST/docs/api/Bio.Seq.Seq-class.html#count_overlap

Are there any changes or features necessary in core Python in order to finish this application? If not, the python-tutor mailing list or r/learnpython are set up to handle this sort of thing.

It may or may not be appropriate for core Python to support all of these string algorithms:
http://rosalind.info/problems/topics/string-algorithms/

On Thursday, April 26, 2018, Julia Kim wrote:
> I wrote a program which edits a part of a text. If the part to be edited
> occurs more than once, it presents the positions and asks the user to
> choose which one to be edited.
>
> I tried with different algorithms. Best one so far would be using just
> find() and collecting the results in a list.
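As a rough illustration of the edit-distance pointer above, here is a small pure-Python sketch of the Levenshtein distance. The function name levenshtein is only illustrative; it is not the API of the PyPI packages linked in the message, which should be preferred for real work.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    # previous[j] is the distance between the prefix of a handled so far
    # and b[:j]; it starts as the distance from the empty string.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]
```

With the strings used in this thread, levenshtein("AAA", "BANANA") is 3 (three insertions) and levenshtein("AAA", "APPLE") is 4.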
Re: [Python-ideas] string method count()
There are two 'AA' in 'AAA', one starting at index 0 and the other starting at index 1.

If the 'AA' starting at 0 is replaced with 'BANAN', 'AAA' becomes 'BANANA'.

If the 'AA' starting at 1 is replaced with 'PPLE', 'AAA' becomes 'APPLE'.

Depending on which one is chosen, 'AAA' can be edited to 'BANANA' or 'APPLE', two different results.

I wrote a program which edits a part of a text. If the part to be edited occurs more than once, it presents the positions and asks the user to choose which one to edit.

I tried different algorithms. The best one so far uses just find() and collects the results in a list.

On Apr 25, 2018, at 11:57 PM, Wes Turner wrote:
> On Wednesday, April 25, 2018, Steven D'Aprano wrote:
>> Are you proposing that there ought to be a version of count that looks
>> for *overlapping* substrings?
>>
>> When will this be useful?
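The find()-and-collect approach described above might look roughly like the following. This is a guess at the idea, not the poster's actual program; the helper names find_all and replace_at are hypothetical.

```python
def find_all(text, sub):
    """Return the start index of every occurrence of sub, overlaps included."""
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume one character later (not len(sub) later) to keep overlaps.
        start = text.find(sub, start + 1)
    return positions


def replace_at(text, sub, position, replacement):
    """Replace the occurrence of sub that starts at position."""
    if not text.startswith(sub, position):
        raise ValueError("sub does not occur at that position")
    return text[:position] + replacement + text[position + len(sub):]
```

Here find_all("AAA", "AA") returns [0, 1], which can be shown to the user; replace_at("AAA", "AA", 0, "BANAN") then gives "BANANA" and replace_at("AAA", "AA", 1, "PPLE") gives "APPLE".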
Re: [Python-ideas] string method count()
or build it yourself...

    def str_count(string, sub):
        # Count occurrences of sub in string, overlapping ones included.
        count = 0
        # + 1 so that a match ending at the very end of string is checked too
        for i in range(len(string) - len(sub) + 1):
            if string[i:].startswith(sub):
                count += 1
        return count

(probably some optimizations possible...)

Or in one line with a generator expression:

    def str_count(string, sub):
        return sum(string[i:].startswith(sub)
                   for i in range(len(string) - len(sub) + 1))

Regular expressions would probably be at least an order of magnitude faster, if this is a bottleneck for you. But a pure-Python implementation of this is a lot easier than one would be for the current str.count().

2018-04-26 8:57 GMT+02:00 Wes Turner:
> "Finding a motif in DNA"
> http://rosalind.info/problems/subs/
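For comparison, the regular-expression route mentioned above could be sketched as follows, using only the standard re module. The function name regex_count_overlapping is made up for this example, and no speed figures are claimed here; timing it against the loop above is left to the reader.

```python
import re


def regex_count_overlapping(string, sub):
    """Count occurrences of sub in string, overlapping ones included."""
    # The lookahead (?=...) matches without consuming characters, so the
    # scan advances one position at a time and overlapping hits are found.
    # re.escape keeps characters such as "." in sub from acting as regex syntax.
    pattern = re.compile("(?=" + re.escape(sub) + ")")
    return sum(1 for _ in pattern.finditer(string))
```

For the example in this thread, regex_count_overlapping("AAA", "AA") gives 2.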
Re: [Python-ideas] string method count()
On Wednesday, April 25, 2018, Steven D'Aprano wrote:
> Are you proposing that there ought to be a version of count that looks
> for *overlapping* substrings?
>
> When will this be useful?

"Finding a motif in DNA"
http://rosalind.info/problems/subs/

This is possible with re.finditer or re.findall (using a lookahead pattern), with regex.findall(..., overlapped=True) from the third-party regex module, or with a sliding window:
https://stackoverflow.com/questions/2970520/string-count-with-overlapping-occurrences

n-grams can be collected by index or by value; with a list of indices, count = len(indices).
https://en.wikipedia.org/wiki/N-gram#Examples

https://en.wikipedia.org/wiki/String_(computer_science)#String_processing_algorithms

https://en.wikipedia.org/wiki/Sequential_pattern_mining
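One way to read the "n-grams by indices" remark above is as a sliding window that records where each substring of a given length starts. A minimal sketch under that reading follows; ngram_indices is a hypothetical helper, not a function from any library named in this thread.

```python
from collections import defaultdict


def ngram_indices(text, n):
    """Map each length-n substring of text to the list of its start indices."""
    index = defaultdict(list)
    # Slide a window of width n across text, one character at a time.
    for start in range(len(text) - n + 1):
        index[text[start:start + n]].append(start)
    return dict(index)
```

With this, indices = ngram_indices("AAA", 2)["AA"] is [0, 1], and count = len(indices) is 2. Building the whole index is only worth it when several motifs of the same length are looked up in the same text.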
Re: [Python-ideas] string method count()
On Wed, Apr 25, 2018 at 11:22:24AM -0700, Julia Kim wrote:
> There's an error with the string method count().
>
> x = 'AAA'
> y = 'AA'
> print(x.count(y))
>
> The output is 1, instead of 2.

Are you proposing that there ought to be a version of count that looks for *overlapping* substrings?

When will this be useful?

--
Steve
Re: [Python-ideas] string method count()
str.count counts non-overlapping instances of the substring. After counting the first 'AA', the search resumes after that match, where only one 'A' is left, so there is no second instance of 'AA'.

On 2018-04-25 02:22 PM, Julia Kim wrote:
> x = 'AAA'
> y = 'AA'
> print(x.count(y))
>
> The output is 1, instead of 2.
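To make the behaviour described above concrete, here is a small sketch that mimics non-overlapping counting with find(). It is only an illustration of the rule, not how CPython implements str.count; the name count_non_overlapping and the empty-substring check are assumptions of the sketch.

```python
def count_non_overlapping(text, sub):
    """Count occurrences of sub in text without reusing matched characters."""
    if not sub:
        # Assumption of this sketch: an empty substring is rejected.
        raise ValueError("sub must be non-empty in this sketch")
    count = 0
    start = text.find(sub)
    while start != -1:
        count += 1
        # Skip past the whole match, so its characters are not reused.
        start = text.find(sub, start + len(sub))
    return count
```

For 'AAA' and 'AA' the first match covers indices 0 and 1, the search resumes at index 2, and the result is 1, the same as 'AAA'.count('AA').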
Re: [Python-ideas] string method count()
Hi,

From https://docs.python.org/3/library/stdtypes.html#str.count:

    str.count(sub[, start[, end]])

    Return the number of *non-overlapping* occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.

Best regards,
João Santos

On Wed, 25 Apr 2018 at 20:22 Julia Kim wrote:
> x = 'AAA'
> y = 'AA'
> print(x.count(y))
>
> The output is 1, instead of 2.
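A short illustration of the optional start and end arguments from the documentation quoted above, using the string from this thread. The variable names are only for the example.

```python
text = "AAA"

# Default range: the whole string, non-overlapping matches only.
whole = text.count("AA")

# start=1 restricts the search to text[1:], i.e. "AA".
from_one = text.count("AA", 1)

# start=0, end=2 restricts the search to text[0:2], i.e. "AA".
first_two = text.count("AA", 0, 2)

# end=1 leaves only text[0:1] == "A", too short to contain "AA".
too_short = text.count("AA", 0, 1)

print(whole, from_one, first_two, too_short)
```

Each restricted range holds at most one 'AA', so start and end narrow where str.count looks but do not turn it into an overlapping count.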
[Python-ideas] string method count()
Hi,

There's an error with the string method count().

x = 'AAA'
y = 'AA'
print(x.count(y))

The output is 1, instead of 2.

I write programs on the SoloLearn mobile app.

Warm regards,
Julia Kim