Prateek wrote:
>> For the above example, it's worth sorting lists_of_sets by the
>> length of the sets, and doing the short ones first.
>
> Thanks. I thought so - I'm doing just that using a simple Decorate-
> Sort-Undecorate idiom.
>
>> How big are the sets? If they're small, but you have a lot of
>> them, you may be better off with a bit-set representation, then
>> using AND operations for intersection. If they're huge (tens of
>> millions of entries), you might be better off doing sorts and
>> merges on the sets.
>
> I have either 2 or 3 sets (never more), which can be arbitrarily
> large. Most of them are small (between 0 and a few elements - say
> less than 5). A few will be megamonstrous (> 100,000 items).
>
>> When you ask questions like this, it's helpful to give some
>> background. We don't know whether this is a homework assignment,
>> or some massive application that's slow and you need to fix it,
>> even if it requires heavy implementation effort.
>
> It's definitely not a homework assignment - it's part of a
> commercial database query engine. Heavy implementation effort is
> no problem.
>
> Prateek
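A minimal sketch of that sort-then-intersect advice, assuming each
element of lists_of_sets is a Python set (the name intersect_all is
illustrative, and sorted() with key=len stands in for the explicit
Decorate-Sort-Undecorate):

    from functools import reduce

    def intersect_all(list_of_sets):
        # Intersect smallest-first so the running result stays as
        # small as possible; each later intersection then costs at
        # most one hash lookup per surviving element.
        ordered = sorted(list_of_sets, key=len)
        return reduce(set.intersection, ordered)

Note that CPython's set.intersection already iterates over the smaller
of its two operands, so the explicit ordering mostly pays off when
chaining three or more sets.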
If you're intersecting a set of 5 against a set of 100,000, the
intersection time won't be the problem. That's just five lookups.
It's building a set of 100,000 items that may be time-consuming.

Does the big set stay around for a while, or do you have to pay that
cost on each query? Those really aren't big data sets on modern
machines.

				John Nagle
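To make the five-lookups point concrete, here is a hypothetical sketch
(the cache, cached_big_set, and load_items names are all invented for
illustration, not anything from Prateek's engine): the per-query cost
is proportional to the small set, while the expensive part is
materializing the big set, so it pays to build that once and keep it
around between queries.

    # Build the expensive 100,000-item set once and reuse it.
    _big_set_cache = {}

    def cached_big_set(key, load_items):
        # load_items(key) must return an iterable; it runs once per key.
        if key not in _big_set_cache:
            _big_set_cache[key] = set(load_items(key))  # O(n), paid once
        return _big_set_cache[key]

    def query(small_items, key, load_items):
        big = cached_big_set(key, load_items)
        # Five elements in small_items means five hash lookups,
        # regardless of how large big is.
        return {x for x in small_items if x in big}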