On 10/04/2022 21:20, Antoon Pardon wrote:


Op 9/04/2022 om 02:01 schreef duncan smith:
On 08/04/2022 22:08, Antoon Pardon wrote:

Well my first thought is that a bitset makes it less obvious to calulate
the size of the set or to iterate over its elements. But it is an idea
worth exploring.




def popcount(n):
    """
    Returns the number of set bits in n
    """
    cnt = 0
    while n:
        n &= n - 1
        cnt += 1
    return cnt

and not tested,

def iterinds(n):
    """
    Returns a generator of the indices of the set bits of n
    """
    i = 0
    while n:
        if n & 1:
            yield i
        n = n >> 1
        i += 1

Sure but these seem rather naive implementation with a time complexity of
O(n) where n is the maximum number of possible elements. Using these would
turn my O(n) algorithm in a O(n^2) algorithm.


I thought your main concern was memory. Of course, dependent on various factors, you might be able to do much better than the above. But I don't know what your O(n) algorithm is, how using a bitset would make it O(n^2), or if the O(n^2) algorithm would actually be slower for typical n. The overall thing sounds broadly like some of the blocking and clustering methods I've come across in record linkage.

Duncan
--
https://mail.python.org/mailman/listinfo/python-list

Reply via email to