Re: [Python-ideas] string method count()

2018-05-07 Thread Neil Girdhar
Regular expressions are not just "an order of magnitude better"—they're 
asymptotically faster.  
See https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm 
for a non-regular-expression algorithm.
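
For illustration, here is a minimal sketch of how the Knuth-Morris-Pratt idea could be used to count overlapping occurrences in time linear in the combined length of the two strings. The name kmp_count_overlapping is made up for this example; it is not from the thread or the standard library, and rejecting an empty pattern is an assumption of the sketch.

```python
def kmp_count_overlapping(text, pattern):
    """Count occurrences of pattern in text, overlaps included."""
    if not pattern:
        raise ValueError("pattern must be non-empty")

    # failure[i] is the length of the longest proper prefix of
    # pattern[:i + 1] that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    count = 0
    k = 0  # number of pattern characters currently matched
    for ch in text:
        while k and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            count += 1
            # Fall back instead of resetting to 0, so that a match
            # overlapping the one just found is still detected.
            k = failure[k - 1]
    return count


print(kmp_count_overlapping("AAA", "AA"))  # 2
```

Each character of the text is examined once and the fallback steps are bounded by the number of characters matched so far, which is where the linear bound comes from.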

On Thursday, April 26, 2018 at 5:45:20 AM UTC-4, Jacco van Dorp wrote:
>
> or build it yourself...
>
> def str_count(string, sub):
>     # Count occurrences of sub in string, including overlapping ones.
>     count = 0
>     for i in range(len(string) - len(sub) + 1):
>         if string.startswith(sub, i):
>             count += 1
>     return count
>
> (probably some optimizations possible...)
>
> Or in one line with a generator expression:
> def str_count(string, sub):
>     return sum(string.startswith(sub, c)
>                for c in range(len(string) - len(sub) + 1))
>
> regular expressions would probably be at least an order of magnitude
> better in speed, if it's a bottleneck to you. But pure python
> implementation for this is a lot easier than it would be for the
> current string.count().
>
> 2018-04-26 8:57 GMT+02:00 Wes Turner :
> >
> >
> > On Wednesday, April 25, 2018, Steven D'Aprano wrote:
> >>
> >> On Wed, Apr 25, 2018 at 11:22:24AM -0700, Julia Kim wrote:
> >> > Hi,
> >> >
> >> > There’s an error with the string method count().
> >> >
> >> > x = 'AAA'
> >> > y = 'AA'
> >> > print(x.count(y))
> >> >
> >> > The output is 1, instead of 2.
> >>
> >> Are you proposing that there ought to be a version of count that looks
> >> for *overlapping* substrings?
> >>
> >> When will this be useful?
> >
> >
> > "Finding a motif in DNA"
> > http://rosalind.info/problems/subs/
> >
> > This is possible with re.search, re.finditer, re.findall,
> > regex.findall(..., overlapped=True), or a sliding window:
> > https://stackoverflow.com/questions/2970520/string-count-with-overlapping-occurrences
> >
> > n-grams can be by indices or by value.
> > count = len(indices)
> > https://en.wikipedia.org/wiki/N-gram#Examples
> >
> > https://en.wikipedia.org/wiki/String_(computer_science)#String_processing_algorithms
> >
> > https://en.wikipedia.org/wiki/Sequential_pattern_mining
> >
> >>
> >>
> >> --
> >> Steve
___
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/


Re: [Python-ideas] string method count()

2018-04-26 Thread Wes Turner
If this was for a school assignment, I'd probably go to edit distance and
fuzzy string match next:
https://en.wikipedia.org/wiki/Edit_distance
https://en.wikipedia.org/wiki/String-to-string_correction_problem

- https://pypi.org/search/?q=Levenshtein
  - https://pypi.org/project/textdistance/

As a bioinformatics program, this is a bit like CRISPR:
https://en.wikipedia.org/wiki/CRISPR

BioPython Seq has a count_overlap method with a BSD 3-Clause LICENSE:
https://github.com/biopython/biopython/blob/master/LICENSE.rst

Can it be made faster with e.g. itertools.count and a generator
expression?

- Bio.Seq.Seq.count_overlap()
  http://biopython.org/DIST/docs/api/Bio.Seq.Seq-class.html#count_overlap

Are there any changes or features necessary in core Python in order to
finish this application?
If not, the python-tutor mailing list or r/learnpython are set up to handle
this sort of thing.

It may or may not be appropriate for core Python to support all of these
string algorithms:
http://rosalind.info/problems/topics/string-algorithms/

On Thursday, April 26, 2018, Julia Kim  wrote:

> There are two ‘AA’ in ‘AAA’, one starting at index 0 and the other starting
> at index 1.
>
> If the ‘AA’ starting at index 0 is replaced with ‘BANAN’, ‘AAA’ becomes
> ‘BANANA’.
>
> If the ‘AA’ starting at index 1 is replaced with ‘PPLE’, ‘AAA’ becomes
> ‘APPLE’.
>
> Depending on which one is chosen, ‘AAA’ can be edited to ‘BANANA’ or
> ‘APPLE’, two different results.
>
>
> I wrote a program which edits a part of a text. If the part to be edited
> occurs more than once, it presents the positions and asks the user to
> choose which one to edit.
>
> I tried different algorithms. The best one so far uses just find() and
> collects the results in a list.
>
>
>
> On Apr 25, 2018, at 11:57 PM, Wes Turner  wrote:
>
>
>
> On Wednesday, April 25, 2018, Steven D'Aprano  wrote:
>
>> On Wed, Apr 25, 2018 at 11:22:24AM -0700, Julia Kim wrote:
>> > Hi,
>> >
>> > There’s an error with the string method count().
>> >
>> > x = 'AAA'
>> > y = 'AA'
>> > print(x.count(y))
>> >
>> > The output is 1, instead of 2.
>>
>> Are you proposing that there ought to be a version of count that looks
>> for *overlapping* substrings?
>>
>> When will this be useful?
>
>
> "Finding a motif in DNA"
> http://rosalind.info/problems/subs/
>
> This is possible with re.search, re.finditer, re.findall,
> regex.findall(..., overlapped=True), or a sliding window:
> https://stackoverflow.com/questions/2970520/string-count-with-overlapping-occurrences
>
> n-grams can be by indices or by value.
> count = len(indices)
> https://en.wikipedia.org/wiki/N-gram#Examples
>
> https://en.wikipedia.org/wiki/String_(computer_science)#String_processing_algorithms
>
> https://en.wikipedia.org/wiki/Sequential_pattern_mining
>
>
>>
>> --
>> Steve


Re: [Python-ideas] string method count()

2018-04-26 Thread Julia Kim
There are two ‘AA’ in ‘AAA’, one starting at index 0 and the other starting at
index 1.

If the ‘AA’ starting at index 0 is replaced with ‘BANAN’, ‘AAA’ becomes
‘BANANA’.

If the ‘AA’ starting at index 1 is replaced with ‘PPLE’, ‘AAA’ becomes
‘APPLE’.

Depending on which one is chosen, ‘AAA’ can be edited to ‘BANANA’ or ‘APPLE’,
two different results.


I wrote a program which edits a part of a text. If the part to be edited occurs
more than once, it presents the positions and asks the user to choose which one
to edit.

I tried different algorithms. The best one so far uses just find() and collects
the results in a list.
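
The find()-based approach described above could look roughly like the sketch below. The helper names find_all and replace_at are hypothetical (they are not from the original program, which was not posted), and refusing an empty substring is an assumption made to keep the loop simple.

```python
def find_all(text, sub):
    """Return the start index of every occurrence of sub, overlaps included."""
    if not sub:
        raise ValueError("sub must be non-empty")
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Resume one character later (not len(sub) later) to keep overlaps.
        start = text.find(sub, start + 1)
    return positions


def replace_at(text, sub, position, replacement):
    """Replace the occurrence of sub that starts at the given position."""
    if not text.startswith(sub, position):
        raise ValueError("sub does not occur at that position")
    return text[:position] + replacement + text[position + len(sub):]


print(find_all("AAA", "AA"))                 # [0, 1]
print(replace_at("AAA", "AA", 0, "BANAN"))   # BANANA
print(replace_at("AAA", "AA", 1, "PPLE"))    # APPLE
```

The overlapping count is then simply len(find_all(text, sub)).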



> On Apr 25, 2018, at 11:57 PM, Wes Turner  wrote:
> 
> 
> 
>> On Wednesday, April 25, 2018, Steven D'Aprano  wrote:
>> On Wed, Apr 25, 2018 at 11:22:24AM -0700, Julia Kim wrote:
>> > Hi,
>> > 
>> > There’s an error with the string method count().
>> > 
>> > x = 'AAA'
>> > y = 'AA'
>> > print(x.count(y))
>> > 
>> > The output is 1, instead of 2.
>> 
>> Are you proposing that there ought to be a version of count that looks 
>> for *overlapping* substrings?
>> 
>> When will this be useful?
> 
> "Finding a motif in DNA"
> http://rosalind.info/problems/subs/
>  
> This is possible with re.search, re.finditer, re.findall,
> regex.findall(..., overlapped=True), or a sliding window:
> https://stackoverflow.com/questions/2970520/string-count-with-overlapping-occurrences
> 
> n-grams can be by indices or by value.
> count = len(indices)
> https://en.wikipedia.org/wiki/N-gram#Examples
> 
> https://en.wikipedia.org/wiki/String_(computer_science)#String_processing_algorithms
> 
> https://en.wikipedia.org/wiki/Sequential_pattern_mining
> 
>> 
>> 
>> -- 
>> Steve


Re: [Python-ideas] string method count()

2018-04-26 Thread Jacco van Dorp
or build it yourself...

def str_count(string, sub):
    # Count occurrences of sub in string, including overlapping ones.
    count = 0
    for i in range(len(string) - len(sub) + 1):
        if string.startswith(sub, i):
            count += 1
    return count

(probably some optimizations possible...)

Or in one line with a generator expression:
def str_count(string, sub):
    return sum(string.startswith(sub, c)
               for c in range(len(string) - len(sub) + 1))

regular expressions would probably be at least an order of magnitude
better in speed, if it's a bottleneck to you. But pure python
implementation for this is a lot easier than it would be for the
current string.count().
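
A regular-expression version can be sketched with a zero-width lookahead, which is the usual way to get overlapping matches out of the standard re module. The function name regex_count_overlapping is made up for this example, and whether it is actually faster than the loop above depends on the inputs, so it is worth measuring (for example with timeit) before relying on it.

```python
import re


def regex_count_overlapping(string, sub):
    # A lookahead matches at every position where sub starts without
    # consuming any characters, so overlapping occurrences are all found.
    # re.escape keeps characters such as "." from being treated as
    # regular-expression syntax.
    pattern = re.compile("(?=" + re.escape(sub) + ")")
    return sum(1 for _ in pattern.finditer(string))


print(regex_count_overlapping("AAA", "AA"))  # 2
```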

2018-04-26 8:57 GMT+02:00 Wes Turner :
>
>
> On Wednesday, April 25, 2018, Steven D'Aprano  wrote:
>>
>> On Wed, Apr 25, 2018 at 11:22:24AM -0700, Julia Kim wrote:
>> > Hi,
>> >
>> > There’s an error with the string method count().
>> >
>> > x = 'AAA'
>> > y = 'AA'
>> > print(x.count(y))
>> >
>> > The output is 1, instead of 2.
>>
>> Are you proposing that there ought to be a version of count that looks
>> for *overlapping* substrings?
>>
>> When will this be useful?
>
>
> "Finding a motif in DNA"
> http://rosalind.info/problems/subs/
>
> This is possible with re.search, re.finditer, re.findall,
> regex.findall(..., overlapped=True), or a sliding window:
> https://stackoverflow.com/questions/2970520/string-count-with-overlapping-occurrences
>
> n-grams can be by indices or by value.
> count = len(indices)
> https://en.wikipedia.org/wiki/N-gram#Examples
>
> https://en.wikipedia.org/wiki/String_(computer_science)#String_processing_algorithms
>
> https://en.wikipedia.org/wiki/Sequential_pattern_mining
>
>>
>>
>> --
>> Steve


Re: [Python-ideas] string method count()

2018-04-26 Thread Wes Turner
On Wednesday, April 25, 2018, Steven D'Aprano  wrote:

> On Wed, Apr 25, 2018 at 11:22:24AM -0700, Julia Kim wrote:
> > Hi,
> >
> > There’s an error with the string method count().
> >
> > x = 'AAA'
> > y = 'AA'
> > print(x.count(y))
> >
> > The output is 1, instead of 2.
>
> Are you proposing that there ought to be a version of count that looks
> for *overlapping* substrings?
>
> When will this be useful?


"Finding a motif in DNA"
http://rosalind.info/problems/subs/

This is possible with re.search, re.finditer, re.findall,
regex.findall(..., overlapped=True), or a sliding window:
https://stackoverflow.com/questions/2970520/string-count-with-overlapping-occurrences

n-grams can be by indices or by value.
count = len(indices)
https://en.wikipedia.org/wiki/N-gram#Examples
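
One way to read the "n-grams by indices" remark is the sliding-window sketch below, which records where every length-n window starts so that count = len(indices) for any given window. The function name ngram_indices is hypothetical, and the sample string is the one from the Rosalind "Finding a motif in DNA" problem.

```python
def ngram_indices(text, n):
    """Map each n-gram of text to the list of indices where it starts."""
    indices = {}
    for i in range(len(text) - n + 1):
        indices.setdefault(text[i:i + n], []).append(i)
    return indices


indices = ngram_indices("GATATATGCATATACTT", 4)
print(indices["ATAT"])       # [1, 3, 9]
print(len(indices["ATAT"]))  # 3
```

The indices are 0-based; Rosalind reports the same matches as 1-based positions 2, 4 and 10.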

https://en.wikipedia.org/wiki/String_(computer_science)#String_processing_algorithms

https://en.wikipedia.org/wiki/Sequential_pattern_mining


>
> --
> Steve


Re: [Python-ideas] string method count()

2018-04-25 Thread Steven D'Aprano
On Wed, Apr 25, 2018 at 11:22:24AM -0700, Julia Kim wrote:
> Hi,
> 
> There’s an error with the string method count().
> 
> x = 'AAA'
> y = 'AA'
> print(x.count(y))
> 
> The output is 1, instead of 2.

Are you proposing that there ought to be a version of count that looks 
for *overlapping* substrings?

When will this be useful?


-- 
Steve


Re: [Python-ideas] string method count()

2018-04-25 Thread Alexandre Brault
str.count counts non-overlapping instances of the substring. After
counting the first 'AA', there is only one A left, so that isn't a
second instance of 'AA'
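
A short demonstration of that behaviour, using only the built-in str.count and its optional start argument:

```python
x = "AAA"
y = "AA"

print(x.count(y))        # 1: the match at index 0 uses up both characters
print("AAAA".count(y))   # 2: matches at indices 0 and 2
print(x.count(y, 1))     # 1: searching from index 1 finds the other 'AA'
```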


On 2018-04-25 02:22 PM, Julia Kim wrote:
> Hi,
>
> There’s an error with the string method count().
>
> x = 'AAA'
> y = 'AA'
> print(x.count(y))
>
> The output is 1, instead of 2.
>
>
> I write programs on SoloLearn mobile app.
>
>
>
> Warm regards,
> Julia Kim
>


Re: [Python-ideas] string method count()

2018-04-25 Thread João Santos
Hi,

From https://docs.python.org/3/library/stdtypes.html#str.count:

str.count(*sub*[, *start*[, *end*]])

Return the number of *non-overlapping* occurrences of substring *sub* in
the range [*start*, *end*]. Optional arguments *start* and *end* are
interpreted as in slice notation.

Best regards,
João Santos

On Wed, 25 Apr 2018 at 20:22 Julia Kim  wrote:

> Hi,
>
> There’s an error with the string method count().
>
> x = 'AAA'
> y = 'AA'
> print(x.count(y))
>
> The output is 1, instead of 2.
>
>
> I write programs on SoloLearn mobile app.
>
>
>
> Warm regards,
> Julia Kim
>


[Python-ideas] string method count()

2018-04-25 Thread Julia Kim
Hi,

There’s an error with the string method count().

x = 'AAA'
y = 'AA'
print(x.count(y))

The output is 1, instead of 2.


I write programs on SoloLearn mobile app.



Warm regards,
Julia Kim
