r.e.s. wrote:
> I have a million-line text file with 100 characters per line,
> and simply need to determine how many of the lines are distinct.
>
> On my PC, this little program just goes to never-never land:
>
> def number_distinct(fn):
>     f = file(fn)
>     x = f.readline().strip()
>     L = []
>     while x<>'':
>         if x not in L:
>             L = L + [x]
>         x = f.readline().strip()
>     return len(L)
>
> Would anyone care to point out improvements?
> Is there a better algorithm for doing this?

Take a look at http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560
It is a Python approach to the uniq command on *nix.
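
For what it's worth, the quoted loop is slow because `x not in L` scans the whole list for every line and `L = L + [x]` copies the list each time a new line is found, so the work grows roughly with the square of the number of distinct lines. Below is a minimal sketch of one alternative, assuming the distinct lines fit in memory. It is not the code from the recipe linked above; it simply keeps the lines in a set, where a membership check takes roughly constant time on average. It keeps the `strip()` from the original, but unlike the original it does not stop at the first blank line; a blank line just counts as one more distinct value.

```python
def number_distinct(fn):
    """Return the number of distinct lines in the text file named fn."""
    seen = set()
    with open(fn) as f:
        # Iterating over the file object reads one line at a time,
        # so the whole file is never held in memory as a list.
        for line in f:
            # strip() mirrors the original post: surrounding whitespace
            # and the trailing newline are ignored when comparing lines.
            seen.add(line.strip())
    return len(seen)
```

Usage would be along the lines of `print(number_distinct("data.txt"))`, where `data.txt` is a placeholder file name.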