On Mar 31, 2:56 am, Arnaud Delobelle <arno...@googlemail.com> wrote: > Arnaud Delobelle wrote: > > prueba...@latinmail.com writes: > > [...] > > > I myself asked about how to write a library to efficiently do union > > > and intersection of sets containing time intervals some time ago on > > > this list and got little to no answers. It is a tricky problem. Since > > > I was getting paid I got an O(n*n) solution working. People on this > > > list on the other hand do not get paid and answer whatever strikes > > > their fancy. Sometimes the question is hard or confusing and nobody is > > > motivated enough to answer. > > > I wasn't around when you posted this I guess. Do you mean intervals sets > > on the (real) number line such as: > > > 1-3, 6-10 meaning all numbers between 1 and 3 and all numbers > > between 6 and 10. > > > In this case I think you can achieve union and intersection in O(nlogn) > > where n is the total number of intervals in the interval sets to unify > > or intersect. There is an implementation below. I have chosen a very > > simple data structure for interval sets: an interval set is the list of > > its endpoints. E.g. > > > 1-3, 6-10 is the list [1, 3, 6, 10] > > > This means that I can't specify whether an interval is closed or open. > > So in the implementation below all intervals are assumed to be open. > > The method could be made to work for any kind of intervals with the same > > complexity, there would just be a few more LOC. I'm focusing on the > > principle - here it is: > > > -------------------------------------------------- > > # Implementation of union and intersection of interval sets. > > > from itertools import * > > > def combine(threshold, intsets): > > endpoints = sorted(chain(*imap(izip, intsets, repeat(cycle([1,-1]))))) > > height = 0 > > compound = [] > > for x, step in endpoints: > > old_height = height > > height += step > > if max(height, old_height) == threshold: > > compound.append(x) > > return compound > > > def union(*intsets): > > return combine(1, intsets) > > > def intersection(*intsets): > > return combine(len(intsets), intsets) > > > # tests > > > def pretty(a): > > a = iter(a) > > return ', '.join("%s-%s" % (a, b) for a, b in izip(a, a)) > > > tests = [ > > ([1, 5, 10, 15], [3, 11, 13, 20]), > > ([2, 4, 6, 8], [4, 7, 10, 11]), > > ([0, 11], [5, 10, 15, 25], [7, 12, 13, 15]), > > ] > > > for intsets in tests: > > print "sets: ", "; ".join(imap(pretty, intsets)) > > print "union: ", pretty(union(*intsets)) > > print "intersection: ", pretty(intersection(*intsets)) > > print "-"*20 > > -------------------------------------------------- > > > Is this what you were looking for? > > > -- > > Arnaud > > I realised after posting last night that I must be > > (1) solving the wrong problem > (2) solving it badly > > - My implementation of the combine() function above is O(nlogn) > (because of the sorted() call) whereas it could be O(n) by iterating > over the interval in the parallel manner, hence (2). This would make > union() and intersection() O(n). > > - As the problem was solved by the OP in O(n^2) I must be solving the > wrong problem (1). > > I apologise for this. > > However it was a nice and compact implementation IMHO :) > > -- > Arnaud
I am pretty sure the problem can be solved in O(n log n). I just wasn't feeling overly smart when I was writing the algorithm. N is on average 4 and it had eventually to be implemented inside a framework using C++ anyway, so it is pretty fast. I can’t believe that no programmer has come over the same kind of problem before, yet my Google fu didn’t do anything for me. Well since I attracted a couple people's attention I will describe the problem in more detail. Describing the problem properly is probably as hard as solving it, so excuse me if I struggle a bit. The problem is for a health insurance company and involves the period of time a person is covered. Most insurance companies allow not only for the main member to be insured but his family: the spouse and the dependents (children). This additional coverage costs extra but less than a full new insurance. So for example if Alice buys an insurance worth at 100 dollars a month, she can insure her husband Bob for an additional 50 dollars. Under certain circumstances Alice may go off the insurance and only Bob stays. In that case the price goes back to 100 dollars or maybe there is a deal for 80 or something like that. In other words the cost of the insurance is dependent on the combination of family members that participate in it. Additionally not only do we have different family compositions but also different insurance products. So you can get medical, dental and vision insurance. All that data is stored in a database that is not very tidy and looks something like this: First Day of Coverage, Last Day of Coverage, Relationship, Product 5/3/2005, 5/3/2005, D, M 9/10/2005, 10/10/2005, S, V 3/15/2005, 7/15/2005, M, M 3/1/2005, 6/1/2005, S, M 5/15/2005, 7/20/2005, D, D 9/10/2005, 1/1/2140, M, V 2/1/2005, 5/3/2005, M, M Where Relationship: M=Main Member, S=Spouse, D=Dependent Product: M=Medical, D=Dental, V=Vision As far as I know at the present time there are no deals based on combination of products purchased so we will handle each product independently. What I want is a simple algorithm that allows me to calculate something like this out of it (hopefully I didn’t make a mistake): Medical: 2/1/2005, 2/28/2005, M 3/1/2005, 5/2/2005, M&S 5/3/2005, 5/3/2005, M&S&D 5/4/2005, 6/1/2005, M&S 6/2/2005, 7/15/2005, M Dental: 5/15/2005, 7/20/2005, D Vision: 9/10/2005, 10/10/2005, M&S 10/11/2005, 1/1/2140, M -- http://mail.python.org/mailman/listinfo/python-list