On 11/04/2022 at 02:01, duncan smith wrote:
On 10/04/2022 21:20, Antoon Pardon wrote:


On 9/04/2022 at 02:01, duncan smith wrote:
On 08/04/2022 22:08, Antoon Pardon wrote:

Well, my first thought is that a bitset makes it less obvious to calculate
the size of the set or to iterate over its elements. But it is an idea
worth exploring.




def popcount(n):
    """
    Returns the number of set bits in n
    """
    cnt = 0
    while n:
        n &= n - 1   # clear the lowest set bit (Kernighan's trick)
        cnt += 1
    return cnt

and not tested,

def iterinds(n):
    """
    Returns a generator of the indices of the set bits of n
    """
    i = 0
    while n:
        if n & 1:
            yield i
        n >>= 1   # drop the bit just examined
        i += 1
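
A quick usage sketch (my addition, not in the original mail) of the two helpers above:

members = (1 << 3) | (1 << 7) | (1 << 11)   # bitset containing indices 3, 7 and 11
print(popcount(members))                    # 3
print(list(iterinds(members)))              # [3, 7, 11]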

Sure, but these seem rather naive implementations, with a time complexity of O(n) where n is the maximum number of possible elements. Using these would
turn my O(n) algorithm into an O(n^2) algorithm.


I thought your main concern was memory. Of course, depending on various factors, you might be able to do much better than the above. But I don't know what your O(n) algorithm is, how using a bitset would make it O(n^2), or whether the O(n^2) algorithm would actually be slower for typical n. The overall thing sounds broadly like some of the blocking and clustering methods I've come across in record linkage.
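
As an aside (mine, not part of the original exchange): on CPython 3.10+ the pure-Python popcount above can be replaced by the built-in int.bit_count(), which runs in C and is typically far faster on large bitsets:

n = (1 << 19999) | (1 << 5) | 1   # a large bitset with three members
print(n.bit_count())              # 3, counted in C rather than bit by bit in Python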

Using bitsets would change my algorithm from O(n) into O(n^2), because
the bitset operations are O(n) instead of O(1). With 20000 people
a bitset takes a vector of about 300 64-bit words; with 40000 people
it takes a vector of about 600. So doubling the population
also doubles the time for each bitset operation, meaning that doubling
the population increases the execution time roughly four times.
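
To make that cost model concrete, here is a minimal sketch (my illustration, not code from the thread) of the usual int-as-bitset operations; each bitwise operator walks all of the underlying words, which is where the O(n) cost per operation comes from:

def make_bitset(indices):
    """Build an int bitset from an iterable of member indices."""
    bs = 0
    for i in indices:
        bs |= 1 << i
    return bs

# Hypothetical population of 20000 people; a and b are two subsets.
a = make_bitset([0, 5, 19999])
b = make_bitset([5, 42])

union = a | b      # each of these operations touches all of the
inter = a & b      # roughly 300 64-bit words of the operands, so its
diff  = a & ~b     # cost grows linearly with the population size

print((union >> 42) & 1)       # 1: index 42 is in the union
print(bin(inter).count("1"))   # 1: only index 5 is in both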
