On 10/04/2022 21:20, Antoon Pardon wrote:
Op 9/04/2022 om 02:01 schreef duncan smith:
On 08/04/2022 22:08, Antoon Pardon wrote:
Well my first thought is that a bitset makes it less obvious to calulate
the size of the set or to iterate over its elements. But it is an idea
worth exploring.
def popcount(n):
"""
Returns the number of set bits in n
"""
cnt = 0
while n:
n &= n - 1
cnt += 1
return cnt
and not tested,
def iterinds(n):
"""
Returns a generator of the indices of the set bits of n
"""
i = 0
while n:
if n & 1:
yield i
n = n >> 1
i += 1
Sure but these seem rather naive implementation with a time complexity of
O(n) where n is the maximum number of possible elements. Using these would
turn my O(n) algorithm in a O(n^2) algorithm.
I thought your main concern was memory. Of course, dependent on various
factors, you might be able to do much better than the above. But I don't
know what your O(n) algorithm is, how using a bitset would make it
O(n^2), or if the O(n^2) algorithm would actually be slower for typical
n. The overall thing sounds broadly like some of the blocking and
clustering methods I've come across in record linkage.
Duncan
--
https://mail.python.org/mailman/listinfo/python-list