On 11/04/2022 at 02:01, duncan smith wrote:
On 10/04/2022 21:20, Antoon Pardon wrote:


On 9/04/2022 at 02:01, duncan smith wrote:
On 08/04/2022 22:08, Antoon Pardon wrote:

Well, my first thought is that a bitset makes it less obvious to calculate
the size of the set or to iterate over its elements. But it is an idea
worth exploring.




def popcount(n):
    """
    Returns the number of set bits in n
    """
    cnt = 0
    while n:
        n &= n - 1   # clear the lowest set bit (Kernighan's trick)
        cnt += 1
    return cnt

and not tested,

def iterinds(n):
    """
    Returns a generator of the indices of the set bits of n
    """
    i = 0
    while n:
        if n & 1:
            yield i
        n >>= 1   # drop the bit just examined
        i += 1
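
A quick usage sketch (my addition, not in the original mail) of the two helpers above:

members = (1 << 3) | (1 << 7) | (1 << 11)   # bitset containing indices 3, 7 and 11
print(popcount(members))                    # 3
print(list(iterinds(members)))              # [3, 7, 11]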

Sure, but these seem rather naive implementations, with a time complexity of O(n) where n is the maximum number of possible elements. Using these would
turn my O(n) algorithm into an O(n^2) algorithm.


I thought your main concern was memory. Of course, depending on various factors, you might be able to do much better than the above. But I don't know what your O(n) algorithm is, how using a bitset would make it O(n^2), or whether the O(n^2) algorithm would actually be slower for typical n. The overall thing sounds broadly like some of the blocking and clustering methods I've come across in record linkage.
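
As an aside (mine, not part of the original exchange): on CPython 3.10+ the pure-Python popcount above can be replaced by the built-in int.bit_count(), which runs in C and is typically far faster on large bitsets:

n = (1 << 19999) | (1 << 5) | 1   # a large bitset with three members
print(n.bit_count())              # 3, counted in C rather than bit by bit in Python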

Using bitsets would change my algorithm from O(n) into O(n^2), because
the bitset operations are O(n) instead of O(1). With 20000 people
a bitset takes a vector of about 300 64-bit words; with 40000 people
it takes a vector of about 600. So doubling the population
also doubles the time for each bitset operation, meaning that doubling
the population increases the execution time roughly four times.
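
To make that cost model concrete, here is a minimal sketch (my illustration, not code from the thread) of the usual int-as-bitset operations; each bitwise operator walks all of the underlying words, which is where the O(n) cost per operation comes from:

def make_bitset(indices):
    """Build an int bitset from an iterable of member indices."""
    bs = 0
    for i in indices:
        bs |= 1 << i
    return bs

# Hypothetical population of 20000 people; a and b are two subsets.
a = make_bitset([0, 5, 19999])
b = make_bitset([5, 42])

union = a | b      # each of these operations touches all of the
inter = a & b      # roughly 300 64-bit words of the operands, so its
diff  = a & ~b     # cost grows linearly with the population size

print((union >> 42) & 1)       # 1: index 42 is in the union
print(bin(inter).count("1"))   # 1: only index 5 is in both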
