On 11/04/2022 at 02:01, duncan smith wrote:
On 10/04/2022 21:20, Antoon Pardon wrote:
On 9/04/2022 at 02:01, duncan smith wrote:
On 08/04/2022 22:08, Antoon Pardon wrote:
Well, my first thought is that a bitset makes it less obvious to
calculate the size of the set or to iterate over its elements. But it
is an idea worth exploring.
def popcount(n):
    """
    Returns the number of set bits in n
    """
    cnt = 0
    while n:
        n &= n - 1
        cnt += 1
    return cnt
and not tested,
def iterinds(n):
    """
    Returns a generator of the indices of the set bits of n
    """
    i = 0
    while n:
        if n & 1:
            yield i
        n = n >> 1
        i += 1
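In case it helps, here is a self-contained sanity check of the two helpers above (repeated so the snippet runs on its own). Note that on Python 3.10+ the first is built in as int.bit_count():

```python
def popcount(n):
    """Return the number of set bits in n.
    Kernighan's trick: n & (n - 1) clears the lowest set bit,
    so the loop runs once per set bit."""
    cnt = 0
    while n:
        n &= n - 1
        cnt += 1
    return cnt

def iterinds(n):
    """Yield the indices of the set bits of n, lowest first."""
    i = 0
    while n:
        if n & 1:
            yield i
        n = n >> 1
        i += 1

# A bitset with bits 0, 3 and 5 set:
s = 0b101001
print(popcount(s))        # 3
print(list(iterinds(s)))  # [0, 3, 5]
```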
Sure, but these seem rather naive implementations, with a time
complexity of O(n) where n is the maximum number of possible elements.
Using these would turn my O(n) algorithm into an O(n^2) algorithm.
I thought your main concern was memory. Of course, dependent on
various factors, you might be able to do much better than the above.
But I don't know what your O(n) algorithm is, how using a bitset would
make it O(n^2), or if the O(n^2) algorithm would actually be slower
for typical n. The overall thing sounds broadly like some of the
blocking and clustering methods I've come across in record linkage.
Using bitsets would change my algorithm from O(n) into O(n^2) because
the bitset operations are O(n) instead of O(1). If I have 20000 people,
a bitset will take a vector of about 300 64-bit words. With 40000 people
the bitset will take a vector of about 600. So doubling the population
also doubles the time for each bitset operation, meaning doubling the
population will increase execution time four times.
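To make the arithmetic concrete, a minimal sketch (assuming a plain array-of-words bitset with one bit per person, as described above):

```python
def words_needed(population, word_bits=64):
    """Number of machine words a bitset needs to hold one bit
    per member of the population (ceiling division)."""
    return -(-population // word_bits)

# Doubling the population doubles the words every per-set
# operation (union, intersection, popcount) must touch:
print(words_needed(20000))  # 313 words (~300, as above)
print(words_needed(40000))  # 625 words (~600)
```

Since each bitset operation scans every word, its cost grows linearly with the population; an outer loop that already performs O(n) such operations therefore does O(n^2) total work.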
--
https://mail.python.org/mailman/listinfo/python-list