Re: #elements of seq A in seq B
On Aug 19, 4:19 pm, Neal Becker ndbeck...@gmail.com wrote: What would be a time efficient way to count the number of occurrences of elements of sequence A in sequence B? (in this particular case, these sequences are strings, if that matters). Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (Intel)] on win32 Type copyright, credits or license() for more information. import collections A = 'abc' B = 'abracadabra' collections.Counter(filter(A.__contains__, B)) Counter({'a': 5, 'b': 2, 'c': 1}) Raymond -- http://mail.python.org/mailman/listinfo/python-list
Re: #elements of seq A in seq B
On Aug 19, 4:19 pm, Neal Becker ndbeck...@gmail.com wrote: What would be a time efficient way to count the number of occurrences of elements of sequence A in sequence B? (in this particular case, these sequences are strings, if that matters). Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (Intel)] on win32 Type copyright, credits or license() for more information. import collections A = 'abc' B = 'abracadabra' collections.Counter(filter(set(A).__contains__, B)) Counter({'a': 5, 'b': 2, 'c': 1}) Raymond -- http://mail.python.org/mailman/listinfo/python-list
Re: #elements of seq A in seq B
Jan Kaliszewski wrote: 20-08-2009 o 13:01:29 Neal Becker ndbeck...@gmail.com wrote: I meant #occurrences of characters from the set A in string B But: 1) separately for each element of A? (see Simon's sollution with defaultdict) 2) or total number of all occurrences of elements of A? (see below) 20-08-2009 o 14:05:12 Peter Otten __pete...@web.de wrote: identity = .join(map(chr, range(256))) n = len(b) - len(b.translate(identity, a)) Nice, though I'd prefer Simon's sollution: a = set(a) n = sum(item in a for item in b) Just to give you an idea why I posted the former: $ python -m timeit -sa = set('abc'); b = 'abcdefg'*10**5 'sum(item in a for item in b)' 10 loops, best of 3: 195 msec per loop $ python -m timeit -sa = 'abc'; b = 'abcdefg'*10**5; id=''.join(map(chr, range(256))) 'len(b) - len(b.translate(id, a))' 100 loops, best of 3: 4.97 msec per loop Peter -- http://mail.python.org/mailman/listinfo/python-list
Re: #elements of seq A in seq B
Jan Kaliszewski wrote: 20-08-2009 o 01:19:24 Neal Becker ndbeck...@gmail.com wrote: What would be a time efficient way to count the number of occurrences of elements of sequence A in sequence B? (in this particular case, these sequences are strings, if that matters). If you mean: to count occurences of each element of A (separately) in B... Hm, maybe something like this: # result as a dict {element of A: how many occurences in B, ...} dict((element, B.count(element)) for element in A) If you mean: to count non overlaping occurences of string A in B -- simply: B.count(A) Regards, *j I meant #occurrences of characters from the set A in string B -- http://mail.python.org/mailman/listinfo/python-list
Re: #elements of seq A in seq B
Neal Becker wrote: I meant #occurrences of characters from the set A in string B If a contains few characters: n = sum(b.count(c) for c in a) If a contains many characters: identity = .join(map(chr, range(256))) n = len(b) - len(b.translate(identity, a)) Peter -- http://mail.python.org/mailman/listinfo/python-list
Re: #elements of seq A in seq B
a = set(a) n = sum(item in a for item in b) Why set? Does it matter if I say that items in A are already unique? Sets are hash-based, so it's (most probably) far more efficient for sets than for sequences (especially if we say about big/long ones). Regards, *j -- Jan Kaliszewski (zuo) z...@chopin.edu.pl -- http://mail.python.org/mailman/listinfo/python-list
#elements of seq A in seq B
What would be a time efficient way to count the number of occurrences of elements of sequence A in sequence B? (in this particular case, these sequences are strings, if that matters). -- http://mail.python.org/mailman/listinfo/python-list
Re: #elements of seq A in seq B
20-08-2009 o 01:19:24 Neal Becker ndbeck...@gmail.com wrote: What would be a time efficient way to count the number of occurrences of elements of sequence A in sequence B? (in this particular case, these sequences are strings, if that matters). If you mean: to count occurences of each element of A (separately) in B... Hm, maybe something like this: # result as a dict {element of A: how many occurences in B, ...} dict((element, B.count(element)) for element in A) If you mean: to count non overlaping occurences of string A in B -- simply: B.count(A) Regards, *j -- Jan Kaliszewski (zuo) z...@chopin.edu.pl -- http://mail.python.org/mailman/listinfo/python-list
Re: #elements of seq A in seq B
On Aug 19, 8:17 pm, Jan Kaliszewski z...@chopin.edu.pl wrote: 20-08-2009 o 01:19:24 Neal Becker ndbeck...@gmail.com wrote: What would be a time efficient way to count the number of occurrences of elements of sequence A in sequence B? (in this particular case, these sequences are strings, if that matters). If you mean: to count occurences of each element of A (separately) in B... Hm, maybe something like this: # result as a dict {element of A: how many occurences in B, ...} dict((element, B.count(element)) for element in A) If you mean: to count non overlaping occurences of string A in B -- simply: B.count(A) Regards, *j -- Jan Kaliszewski (zuo) z...@chopin.edu.pl You don't want to use count() in a case like this because it iterates through B len(A) times, i.e. this algorithm is O(A x B). If all you want is the total you can do it like this: def f(a, b): a = set(a) return sum(item in a for item in b) A = 'some string' B = 'another string' print f(A, B) # prints 12 If you want to know the count for each element you can use this: from collections import defaultdict def g(a, b): a = set(a) d = defaultdict(int) for item in b: if item in a: d[item] += 1 return d print g(A, B) # prints defaultdict(type 'int', {' ': 1, 'e': 1, 'g': 1, 'i': 1, 'o': 1, 'n': 2, 's': 1, 'r': 2, 't': 2}) HTH, ~Simon -- http://mail.python.org/mailman/listinfo/python-list
Re: #elements of seq A in seq B
On Wed, 19 Aug 2009, Neal Becker wrote: What would be a time efficient way to count the number of occurrences of elements of sequence A in sequence B? (in this particular case, these sequences are strings, if that matters). If A and B are rather lengthy, then maybe build a tree from elements of A and use it to count elements of B. I.e., tree nodes would be counters. Building trees in Python, however, I guess it is a little bit tricky. Of course it is possible. But maybe you would rather have this tree maintained in C. Regards Tomasz Rola -- ** A C programmer asked whether computer had Buddha's nature. ** ** As the answer, master did rm -rif on the programmer's home** ** directory. And then the C programmer became enlightened... ** ** ** ** Tomasz Rola mailto:tomasz_r...@bigfoot.com ** -- http://mail.python.org/mailman/listinfo/python-list
Re: #elements of seq A in seq B
On Aug 20, 12:12 pm, Simon Forman sajmik...@gmail.com wrote: On Aug 19, 8:17 pm, Jan Kaliszewski z...@chopin.edu.pl wrote: If you mean: to count non overlaping occurences of string A in B -- simply: B.count(A) You don't want to use count() in a case like this because it iterates through B len(A) times, i.e. this algorithm is O(A x B). There's seems to be multiple confusion here. Jan was talking about counting non-overlapping occurrences of string A in [string] B. In that case B.count(A) is nowhere near O(A*B)... in fact in reasonable cases it is sub-linear i.e. O(B/A) See http://svn.python.org/view/python/trunk/Objects/stringlib/fastsearch.h?revision=68811view=markup And even a naive implementation of str.count() would not iterate through B len(A) times. and list.count() can't be used on its own to solve Neal's problem; its arg is a single element so it would have to be wrapped in a loop: sum (B.count(a) for a in A) which would of course be O(A*B) -- http://mail.python.org/mailman/listinfo/python-list
Re: #elements of seq A in seq B
On Aug 19, 11:34 pm, John Machin sjmac...@lexicon.net wrote: On Aug 20, 12:12 pm, Simon Forman sajmik...@gmail.com wrote: On Aug 19, 8:17 pm, Jan Kaliszewski z...@chopin.edu.pl wrote: If you mean: to count non overlaping occurences of string A in B -- simply: B.count(A) You don't want to use count() in a case like this because it iterates through B len(A) times, i.e. this algorithm is O(A x B). There's seems to be multiple confusion here. Jan was talking about counting non-overlapping occurrences of string A in [string] B. In that case B.count(A) is nowhere near O(A*B)... in fact in reasonable cases it is sub-linear i.e. O(B/A) Seehttp://svn.python.org/view/python/trunk/Objects/stringlib/fastsearch And even a naive implementation of str.count() would not iterate through B len(A) times. and list.count() can't be used on its own to solve Neal's problem; its arg is a single element so it would have to be wrapped in a loop: sum (B.count(a) for a in A) which would of course be O(A*B) First, thanks for pointing out str.count(). I didn't know of it and I didn't notice that that was what was being used in Jan's code. (It's blindingly obvious-- now that you've pointed it out. :] ) Second, I was referring to this: dict((element, B.count(element)) for element in A) not B.count(A). Sorry for not interspersing my comment better. Regards, ~Simon -- http://mail.python.org/mailman/listinfo/python-list