I have a million-line text file with 100 characters per line,
and simply need to determine how many of the lines are distinct.

On my PC, this little program just goes to never-never land:

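def number_distinct(fn):
    # Count the distinct lines in the file named fn.
    f = open(fn)
    x = f.readline().strip()
    L = []  # distinct lines seen so far
    while x != '':  # '' means end of file (or a blank line)
        if x not in L:
            L = L + [x]
        x = f.readline().strip()
    f.close()
    return len(L)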

Would anyone care to point out improvements? 
Is there a better algorithm for doing this?
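
One direction that might help, offered as a minimal sketch rather than a definitive answer: keep the lines seen so far in a set instead of a list. A membership test on a list scans every element, and L = L + [x] copies the whole list each time, so the loop above does work roughly proportional to the square of the number of distinct lines. A set lookup takes roughly constant time on average. The name count_distinct_lines is made up for illustration. The sketch assumes two lines count as the same only when they match exactly once the trailing newline is removed, and that all distinct lines fit in memory.

```python
def count_distinct_lines(fn):
    """Return the number of distinct lines in the text file named fn."""
    seen = set()  # each distinct line is stored once
    with open(fn) as f:
        for line in f:  # iterate line by line without loading the whole file
            # Drop only the trailing newline; other whitespace is kept,
            # so lines differing in spacing count as different.
            seen.add(line.rstrip('\n'))
    return len(seen)
```

Unlike the loop above, this version does not stop at the first blank line; a blank line simply counts as one more distinct value.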
-- 
http://mail.python.org/mailman/listinfo/python-list
