Re: #elements of seq A in seq B

2009-08-21 Thread Raymond Hettinger
On Aug 19, 4:19 pm, Neal Becker ndbeck...@gmail.com wrote:
 What would be a time efficient way to count the number of occurrences of
 elements of sequence A in sequence B?  (in this particular case, these
 sequences are strings, if that matters).

Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit
(Intel)] on win32
Type copyright, credits or license() for more information.
 import collections
 A = 'abc'
 B = 'abracadabra'
 collections.Counter(filter(A.__contains__, B))
Counter({'a': 5, 'b': 2, 'c': 1})


Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: #elements of seq A in seq B

2009-08-21 Thread Raymond Hettinger
On Aug 19, 4:19 pm, Neal Becker ndbeck...@gmail.com wrote:
 What would be a time efficient way to count the number of occurrences of
 elements of sequence A in sequence B?  (in this particular case, these
 sequences are strings, if that matters).

Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit
(Intel)] on win32
Type copyright, credits or license() for more information.
 import collections
 A = 'abc'
 B = 'abracadabra'
 collections.Counter(filter(set(A).__contains__, B))
Counter({'a': 5, 'b': 2, 'c': 1})

Raymond
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: #elements of seq A in seq B

2009-08-21 Thread Peter Otten
Jan Kaliszewski wrote:

 20-08-2009 o 13:01:29 Neal Becker ndbeck...@gmail.com wrote:
 
 I meant #occurrences of characters from the set A in string B
 
 But:
 
 1) separately for each element of A? (see Simon's sollution with
 defaultdict)
 
 2) or total number of all occurrences of elements of A? (see below)
 
 
 20-08-2009 o 14:05:12 Peter Otten __pete...@web.de wrote:
 
 identity = .join(map(chr, range(256)))
 n = len(b) - len(b.translate(identity, a))
 
 Nice, though I'd prefer Simon's sollution:
 
  a = set(a)
  n = sum(item in a for item in b)

Just to give you an idea why I posted the former:

$ python -m timeit -sa = set('abc'); b = 'abcdefg'*10**5 'sum(item in a 
for item in b)'
10 loops, best of 3: 195 msec per loop

$ python -m timeit -sa = 'abc'; b = 'abcdefg'*10**5; id=''.join(map(chr, 
range(256))) 'len(b) - len(b.translate(id, a))'
100 loops, best of 3: 4.97 msec per loop

Peter

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: #elements of seq A in seq B

2009-08-20 Thread Neal Becker
Jan Kaliszewski wrote:

 20-08-2009 o 01:19:24 Neal Becker ndbeck...@gmail.com wrote:
 
 What would be a time efficient way to count the number of occurrences of
 elements of sequence A in sequence B?  (in this particular case, these
 sequences are strings, if that matters).
 
 If you mean: to count occurences of each element of A (separately) in B...
 
 Hm, maybe something like this:
 
# result as a dict {element of A: how many occurences in B, ...}
dict((element, B.count(element)) for element in A)
 
 
 If you mean: to count non overlaping occurences of string A in B
 -- simply:
 
B.count(A)
 
 
 Regards,
 *j
 

I meant #occurrences of characters from the set A in string B

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: #elements of seq A in seq B

2009-08-20 Thread Peter Otten
Neal Becker wrote:

 I meant #occurrences of characters from the set A in string B

If a contains few characters:

n = sum(b.count(c) for c in a)

If a contains many characters:

identity = .join(map(chr, range(256)))
n = len(b) - len(b.translate(identity, a))

Peter


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: #elements of seq A in seq B

2009-08-20 Thread Jan Kaliszewski

 a = set(a)
 n = sum(item in a for item in b)



Why set?  Does it matter if I say that items in A are already unique?


Sets are hash-based, so it's (most probably) far more efficient for
sets than for sequences (especially if we say about big/long ones).

Regards,
*j

--
Jan Kaliszewski (zuo) z...@chopin.edu.pl
--
http://mail.python.org/mailman/listinfo/python-list


#elements of seq A in seq B

2009-08-19 Thread Neal Becker
What would be a time efficient way to count the number of occurrences of 
elements of sequence A in sequence B?  (in this particular case, these 
sequences are strings, if that matters).

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: #elements of seq A in seq B

2009-08-19 Thread Jan Kaliszewski

20-08-2009 o 01:19:24 Neal Becker ndbeck...@gmail.com wrote:


What would be a time efficient way to count the number of occurrences of
elements of sequence A in sequence B?  (in this particular case, these
sequences are strings, if that matters).


If you mean: to count occurences of each element of A (separately) in B...

Hm, maybe something like this:

  # result as a dict {element of A: how many occurences in B, ...}
  dict((element, B.count(element)) for element in A)


If you mean: to count non overlaping occurences of string A in B
-- simply:

  B.count(A)


Regards,
*j

--
Jan Kaliszewski (zuo) z...@chopin.edu.pl
--
http://mail.python.org/mailman/listinfo/python-list


Re: #elements of seq A in seq B

2009-08-19 Thread Simon Forman
On Aug 19, 8:17 pm, Jan Kaliszewski z...@chopin.edu.pl wrote:
 20-08-2009 o 01:19:24 Neal Becker ndbeck...@gmail.com wrote:

  What would be a time efficient way to count the number of occurrences of
  elements of sequence A in sequence B?  (in this particular case, these
  sequences are strings, if that matters).

 If you mean: to count occurences of each element of A (separately) in B...

 Hm, maybe something like this:

    # result as a dict {element of A: how many occurences in B, ...}
    dict((element, B.count(element)) for element in A)

 If you mean: to count non overlaping occurences of string A in B
 -- simply:

    B.count(A)

 Regards,
 *j

 --
 Jan Kaliszewski (zuo) z...@chopin.edu.pl

You don't want to use count() in a case like this because it iterates
through B len(A) times, i.e. this algorithm is O(A x B).

If all you want is the total you can do it like this:

def f(a, b):
a = set(a)
return sum(item in a for item in b)


A = 'some string'
B = 'another string'

print f(A, B) # prints 12


If you want to know the count for each element you can use this:

from collections import defaultdict

def g(a, b):
a = set(a)
d = defaultdict(int)
for item in b:
if item in a:
d[item] += 1
return d

print g(A, B)

# prints defaultdict(type 'int', {' ': 1, 'e': 1, 'g': 1, 'i': 1,
'o': 1, 'n': 2, 's': 1, 'r': 2, 't': 2})

HTH,
~Simon
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: #elements of seq A in seq B

2009-08-19 Thread Tomasz Rola
On Wed, 19 Aug 2009, Neal Becker wrote:

 What would be a time efficient way to count the number of occurrences of 
 elements of sequence A in sequence B?  (in this particular case, these 
 sequences are strings, if that matters).

If A and B are rather lengthy, then maybe build a tree from elements of A 
and use it to count elements of B. I.e., tree nodes would be counters. 
Building trees in Python, however, I guess it is a little bit tricky. Of 
course it is possible. But maybe you would rather have this tree 
maintained in C.

Regards
Tomasz Rola

--
** A C programmer asked whether computer had Buddha's nature.  **
** As the answer, master did rm -rif on the programmer's home**
** directory. And then the C programmer became enlightened...  **
** **
** Tomasz Rola  mailto:tomasz_r...@bigfoot.com **
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: #elements of seq A in seq B

2009-08-19 Thread John Machin
On Aug 20, 12:12 pm, Simon Forman sajmik...@gmail.com wrote:
 On Aug 19, 8:17 pm, Jan Kaliszewski z...@chopin.edu.pl wrote:
  If you mean: to count non overlaping occurences of string A in B
  -- simply:

     B.count(A)

 You don't want to use count() in a case like this because it iterates
 through B len(A) times, i.e. this algorithm is O(A x B).

There's seems to be multiple confusion here. Jan was talking about
counting non-overlapping occurrences of string A in [string] B. In
that case B.count(A) is nowhere near O(A*B)... in fact in reasonable
cases it is sub-linear i.e. O(B/A)

See 
http://svn.python.org/view/python/trunk/Objects/stringlib/fastsearch.h?revision=68811view=markup

And even a naive implementation of str.count() would not iterate
through B len(A) times.

and list.count() can't be used on its own to solve Neal's problem; its
arg is a single element so it would have to be wrapped in a loop: sum
(B.count(a) for a in A) which would of course be O(A*B)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: #elements of seq A in seq B

2009-08-19 Thread Simon Forman
On Aug 19, 11:34 pm, John Machin sjmac...@lexicon.net wrote:
 On Aug 20, 12:12 pm, Simon Forman sajmik...@gmail.com wrote:

  On Aug 19, 8:17 pm, Jan Kaliszewski z...@chopin.edu.pl wrote:
   If you mean: to count non overlaping occurences of string A in B
   -- simply:

      B.count(A)

  You don't want to use count() in a case like this because it iterates
  through B len(A) times, i.e. this algorithm is O(A x B).

 There's seems to be multiple confusion here. Jan was talking about
 counting non-overlapping occurrences of string A in [string] B. In
 that case B.count(A) is nowhere near O(A*B)... in fact in reasonable
 cases it is sub-linear i.e. O(B/A)

 Seehttp://svn.python.org/view/python/trunk/Objects/stringlib/fastsearch

 And even a naive implementation of str.count() would not iterate
 through B len(A) times.

 and list.count() can't be used on its own to solve Neal's problem; its
 arg is a single element so it would have to be wrapped in a loop: sum
 (B.count(a) for a in A) which would of course be O(A*B)

First, thanks for pointing out str.count().  I didn't know of it and I
didn't notice that that was what was being used in Jan's code. (It's
blindingly obvious-- now that you've pointed it out. :] )


Second, I was referring to this:

   dict((element, B.count(element)) for element in A)

not B.count(A). Sorry for not interspersing my comment better.

Regards,
~Simon
-- 
http://mail.python.org/mailman/listinfo/python-list