[Martin MOKREJŠ]
> ...
>
> I gave up the theoretical approach. Practically, I might need up
> to store maybe those 1E15 keys.

We should work on our multiplication skills here <wink>. You don't have
enough disk space to store 1E15 keys. Even if your keys were just one byte
each, you would need 4 thousand disks of 250GB each to store 1E15 keys.
How much disk space do you actually have? I'm betting you have no more
than one 250GB disk.

...

[Istvan Albert]
>> On my system storing 1 million words of length 15
>> as keys of a python dictionary is around 75MB.

> Fine, that's what I wanted to hear. How do you improve the algorithm?
> Do you delay indexing to the very latest moment or do you let your
> computer index 999 999 times just for fun?

It remains wholly unclear to me what "the algorithm" you want might be.
As I mentioned before, if you store keys in sorted text files, you can do
intersection and difference very efficiently just by using the Unix `comm`
utility.
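
For concreteness, here is a rough Python sketch of the two points above: the
disk arithmetic, and the kind of single-pass merge that `comm` performs on
sorted input. It is an illustration only, not `comm` itself. The names
`disks_needed` and `comm_sorted` are made up for this example, 1GB is assumed
to mean 10**9 bytes, and the inputs are assumed to be sorted in Python's own
string order with no duplicate keys.

```python
def disks_needed(n_keys, bytes_per_key, disk_bytes):
    """Whole disks needed to hold n_keys keys of bytes_per_key bytes each."""
    total = n_keys * bytes_per_key
    # Integer ceiling division, so a partly filled disk counts as one disk.
    return (total + disk_bytes - 1) // disk_bytes


def comm_sorted(a, b):
    """Merge two sorted iterables of unique keys in a single pass.

    Returns (only_in_a, only_in_b, in_both), analogous to the three
    columns that `comm` prints. Assumes no key is None.
    """
    only_a, only_b, both = [], [], []
    it_a, it_b = iter(a), iter(b)
    x = next(it_a, None)
    y = next(it_b, None)
    while x is not None and y is not None:
        if x == y:
            both.append(x)
            x = next(it_a, None)
            y = next(it_b, None)
        elif x < y:
            only_a.append(x)
            x = next(it_a, None)
        else:
            only_b.append(y)
            y = next(it_b, None)
    # At most one input still has keys left; they are unique to it.
    while x is not None:
        only_a.append(x)
        x = next(it_a, None)
    while y is not None:
        only_b.append(y)
        y = next(it_b, None)
    return only_a, only_b, both


if __name__ == "__main__":
    # 1E15 one-byte keys on 250GB disks (assuming 1GB = 10**9 bytes).
    print(disks_needed(10**15, 1, 250 * 10**9))
    print(comm_sorted(["a", "b", "d"], ["b", "c", "d", "e"]))
```

Since both inputs are consumed strictly in order, the same function could be
fed two open sorted text files, one key per line, without holding either file
in memory; only the result lists grow. The third list is the intersection and
the first two are the differences.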