hi brent, thanks very much for your informative reply -- didn't realize this about the size of the interval.
thanks for the bx-python link. could you (or someone else) explain why the size of the interval makes such a big difference? i don't understand why it affects efficiency so much... thanks. On Jan 13, 12:24 am, brent <bpede...@gmail.com> wrote: > On Jan 12, 8:55 pm, Per Freem <perfr...@yahoo.com> wrote: > > > > > On Jan 12, 10:58 pm, Steven D'Aprano > > > <ste...@remove.this.cybersource.com.au> wrote: > > > On Mon, 12 Jan 2009 14:49:43 -0800, Per Freem wrote: > > > > thanks for your replies -- a few clarifications and questions. the > > > > is_within operation is containment, i.e. (a,b) is within (c,d) iff a > > > >>= c and b <= d. Note that I am not looking for intervals that > > > > overlap... this is why interval trees seem to me to not be relevant, as > > > > the overlapping interval problem is way harder than what I am trying to > > > > do. Please correct me if I'm wrong on this... > > > > To test for contained intervals: > > > a >= c and b <= d > > > > To test for overlapping intervals: > > > > not (b < c or a > d) > > > > Not exactly what I would call "way harder". > > > > -- > > > Steven > > > hi Steven, > > > i found an implementation (which is exactly how i'd write it based on > > the description) > > here:http://hackmap.blogspot.com/2008/11/python-interval-tree.html > > > when i use this however, it comes out either significantly slower or > > equal to a naive search. my naive search just iterates through a > > smallish list of intervals and for each one says whether they overlap > > with each of a large set of intervals. > > > here is the exact code i used to make the comparison, plus the code at > > the link i have above: > > > class Interval(): > > def __init__(self, start, stop): > > self.start = start > > self.stop = stop > > > import random > > import time > > num_ints = 30000 > > init_intervals = [] > > for n in range(0, > > num_ints): > > start = int(round(random.random() > > *1000)) > > end = start + int(round(random.random()*500+1)) > > init_intervals.append(Interval(start, end)) > > num_ranges = 900 > > ranges = [] > > for n in range(0, num_ranges): > > start = int(round(random.random() > > *1000)) > > end = start + int(round(random.random()*500+1)) > > ranges.append((start, end)) > > #print init_intervals > > tree = IntervalTree(init_intervals) > > t1 = time.time() > > for r in ranges: > > tree.find(r[0], r[1]) > > t2 = time.time() > > print "interval tree: %.3f" %((t2-t1)*1000.0) > > t1 = time.time() > > for r in ranges: > > naive_find(init_intervals, r[0], r[1]) > > t2 = time.time() > > print "brute force: %.3f" %((t2-t1)*1000.0) > > > on one run, i get: > > interval tree: 8584.682 > > brute force: 8201.644 > > > is there anything wrong with this implementation? it seems very right > > to me but i am no expert. any help on this would be relly helpful. > > hi, the tree is inefficient when the interval is large. as the size of > the interval shrinks to much less than the expanse of the tree, the > tree will be faster. changing 500 to 50 in both cases in your script, > i get: > interval tree: 3233.404 > brute force: 9807.787 > > so the tree will work for limited cases. but it's quite simple. check > the tree in > bx-python:http://bx-python.trac.bx.psu.edu/browser/trunk/lib/bx/intervals/opera... > for a more robust implementation. > -brentp -- http://mail.python.org/mailman/listinfo/python-list