Re: Overlap in python
On Aug 4, 3:21 pm, Jay Bird wrote: > Hi everyone, > > I wanted to thank you all for your help and *excellent* discussion. I > was able to utilize and embed the script by Grigor Lingl in the 6th > post of this discussion to get my program to work very quickly (I had > to do about 20 comparisons per data bin, with over 40K bins in > total). I am involved in genomic analysis research and this problem > comes up a lot and I was surprised to not have been able to find a > clear way to solve it. I will also look through all the tips in this > thread, I have a feeling they may come in handy for future use! > > Thank you again, > Jay Hi Jay, I know this is a bit off-topic, but how does this pertain to genomic analysis? Are you counting the lengths of microsatellite repeats or something? -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
On 8/5/2009 10:56 AM, nn wrote: On Aug 5, 7:13 am, Marcus Wanner wrote: On 8/4/2009 6:09 PM, MRAB wrote: >>> parts = [(5, 9, "a"), (7, 10, "b"), (3, 6, "c"), (15, 20, "d"), (18, 23, "e")] >>> parts.sort() >>> parts [(3, 6, 'c'), (5, 9, 'a'), (7, 10, 'b'), (15, 20, 'd'), (18, 23, 'e')] >>> # Merge overlapping intervals. >>> pos = 1 >>> while pos < len(parts): # Merge the pair in parts[pos - 1 : pos + 1] if they overlap. p, q = parts[pos - 1 : pos + 1] if p[1] >= q[0]: parts[pos - 1 : pos + 1] = [(p[0], max(p[1], q[1]), p[2] + "." + q[2])] else: # They don't overlap, so try the next pair. pos += 1 >>> parts [(3, 10, 'c.a.b'), (15, 23, 'd.e')] That's the best solution I've seen so far. It even has input/output formatted as close as is reasonably possible to the format specified. As we would say in googlecode, +1. Marcus How does it compare to this one? http://groups.google.com/group/comp.lang.python/browse_frm/thread/1a1d2ed9d05d11d0/56684b795fc527cc#56684b795fc527cc That is a different problem, and the solution is more complex. I am not going to try to judge which is better. Marcus -- print ''.join([chr(((ord(z)+(ord("I'M/THE"[3])+sum( [ord(x)for x in 'CRYPTOR'])))%(4*ord('8')+ord( ' ' for z in ''.join(([(('\xca\x10\x03\t'+ '\x01\xff\xe6\xbe\x0c\r\x06\x12\x17\xee\xbe'+ '\x10\x03\x06\x12\r\x0c\xdf\xbe\x12\x11\x13'+ '\xe8')[13*2-y]) for y in range(int(6.5*4)+1)] ))]) -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
Jay Bird wrote: Hi everyone, I wanted to thank you all for your help and *excellent* discussion. I was able to utilize and embed the script by Grigor Lingl in the 6th post of this discussion to get my program to work very quickly (I had to do about 20 comparisons per data bin, with over 40K bins in total). I am involved in genomic analysis research and this problem comes up a lot and I was surprised to not have been able to find a clear way to solve it. I will also look through all the tips in this thread, I have a feeling they may come in handy for future use! Thank you again, Jay I don't know if this is relevant, but http://planet.python.org/ has an entry dated this morning which points here http://www.logarithmic.net/pfh/blog/01249470842. HTH. -- Kindest regards. Mark Lawrence. -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
Jay Bird wrote: Hi everyone, I've been trying to figure out a simple algorithm on how to combine a list of parts that have 1D locations that overlap into a non- overlapping list. For example, here would be my input: part name location a 5-9 b 7-10 c 3-6 d 15-20 e 18-23 And here is what I need for an output: part name location c.a.b3-10 d.e 15-23 I've tried various methods, which all fail. Does anyone have an idea how to do this? Thank you very much! Jay I once had to do this for finding nested block structure. The key for me was a sort order: start, -final. Having not seen it here (though I looked a bit), here's one: class Entry(object): '''An entry is a name and range''' def __init__(self, line): self.name, startstop = line.split() start, stop = startstop.split('-') self.start, self.stop = int(start), int(stop) def combined_ranges(lines): '''Create Entries in "magic order", and produce ranges. The "magic order" makes least element with longest range first, so overlaps show up in head order, with final tail first among equals. ''' # Fill in our table (ignoring blank lines), then sort by magic order elements = [Entry(line) for line in lines if line.strip()] elements.sort(key=lambda e: (e.start, -e.stop)) # Now produce resolved ranges. Grab the start gen = iter(elements) first = gen.next() # For the remainder, combine or produce for v in gen: if v.start <= first.stop: # on overlap, merge in new element (may update stop) first.name += '.' + v.name if first.stop < v.stop: first.stop = v.stop else: yield first first = v # And now produce the last element we covered yield first # Demo: sample = '''part name location a 5-9 b 7-10 c 3-6 d 15-20 e 18-23 ''' source = iter(sample.split('\n')) # source of lines, opened file? ignored = source.next() # discard heading for interval in combined_range(source): print '%s %s-%s' % (interval.name, interval.start, interval.stop) Prints: c.a.b 3-10 d.e 15-23 --Scott David Daniels scott.dani...@acm.org -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
On Aug 5, 7:13 am, Marcus Wanner wrote: > On 8/4/2009 6:09 PM, MRAB wrote: > > > >>> parts = [(5, 9, "a"), (7, 10, "b"), (3, 6, "c"), (15, 20, "d"), > > (18, 23, "e")] > > >>> parts.sort() > > >>> parts > > [(3, 6, 'c'), (5, 9, 'a'), (7, 10, 'b'), (15, 20, 'd'), (18, 23, 'e')] > > >>> # Merge overlapping intervals. > > >>> pos = 1 > > >>> while pos < len(parts): > > # Merge the pair in parts[pos - 1 : pos + 1] if they overlap. > > p, q = parts[pos - 1 : pos + 1] > > if p[1] >= q[0]: > > parts[pos - 1 : pos + 1] = [(p[0], max(p[1], q[1]), p[2] > > + "." + q[2])] > > else: > > # They don't overlap, so try the next pair. > > pos += 1 > > > >>> parts > > [(3, 10, 'c.a.b'), (15, 23, 'd.e')] > > That's the best solution I've seen so far. It even has input/output > formatted as close as is reasonably possible to the format specified. > > As we would say in googlecode, +1. > > Marcus How does it compare to this one? http://groups.google.com/group/comp.lang.python/browse_frm/thread/1a1d2ed9d05d11d0/56684b795fc527cc#56684b795fc527cc -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
On 8/4/2009 6:09 PM, MRAB wrote: >>> parts = [(5, 9, "a"), (7, 10, "b"), (3, 6, "c"), (15, 20, "d"), (18, 23, "e")] >>> parts.sort() >>> parts [(3, 6, 'c'), (5, 9, 'a'), (7, 10, 'b'), (15, 20, 'd'), (18, 23, 'e')] >>> # Merge overlapping intervals. >>> pos = 1 >>> while pos < len(parts): # Merge the pair in parts[pos - 1 : pos + 1] if they overlap. p, q = parts[pos - 1 : pos + 1] if p[1] >= q[0]: parts[pos - 1 : pos + 1] = [(p[0], max(p[1], q[1]), p[2] + "." + q[2])] else: # They don't overlap, so try the next pair. pos += 1 >>> parts [(3, 10, 'c.a.b'), (15, 23, 'd.e')] That's the best solution I've seen so far. It even has input/output formatted as close as is reasonably possible to the format specified. As we would say in googlecode, +1. Marcus -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
Albert van der Horst: >That is an algorithmic question and has little to do with Python.< Yes, but comp.lang.python isn't comp.lang.c, that kind of questions are perfectly fine here. They help keep this place from becoming boring. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
In article , Jay Bird wrote: >Hi everyone, > >I've been trying to figure out a simple algorithm on how to combine a >list of parts that have 1D locations that overlap into a non- >overlapping list. For example, here would be my input: > >part name location >a 5-9 >b 7-10 >c 3-6 >d 15-20 >e 18-23 > >And here is what I need for an output: >part name location >c.a.b3-10 >d.e 15-23 > >I've tried various methods, which all fail. Does anyone have an idea >how to do this? That is an algorithmic question and has little to do with Python. You could proceed as follows: - Order your data by the lower limit - Combine everything with the same lower limit. - Then combine every pair of consecutive entries that overlap. (In you case the second step is not needed.) > >Thank you very much! >Jay -- -- Albert van der Horst, UTRECHT,THE NETHERLANDS Economic growth -- being exponential -- ultimately falters. alb...@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
On Aug 4, 2:57 pm, Gregor Lingl wrote: > Jay Bird schrieb: > > > > > > > Hi everyone, > > > I've been trying to figure out a simple algorithm on how to combine a > > list of parts that have 1D locations that overlap into a non- > > overlapping list. For example, here would be my input: > > > part name location > > a 5-9 > > b 7-10 > > c 3-6 > > d 15-20 > > e 18-23 > > > And here is what I need for an output: > > part name location > > c.a.b 3-10 > > d.e 15-23 > > Suppose you have your data given as a dictionary: > > data = {'a':(5,9), > 'b':(7,10), > 'c':(3,6), > 'd':(15,20), > 'e':(18,23)} > > Then the following not particularly elegant > but very simple and straight forward > algorithm will solve your problem: > > def values(key): > return set(range(data[key][0], data[key][1]+1)) > > keys = list(data.keys()) > result = [] > while keys: > k = [keys[0]] > keys = keys[1:] > v = values(k[0]) > for l in keys[:]: > if v.intersection(values(l)): > v.update(values(l)) > k.append(l) > keys.remove(l) > result.append((k,v)) > > # print(result) ## if you like > > for k, v in result: > print(".".join(sorted(k)), "%d-%d" % (min(v), max(v))) > > Regards, > Gregor > > > > > I've tried various methods, which all fail. Does anyone have an idea > > how to do this? > > > Thank you very much! > > Jay I was thinking sets, too. import random def overlap(): merge_count = 0 datam.append(data.pop(0)) while len(data)>0: d0 = data.pop(0) dmi = 0 dmtemp = datam[:] while d0: dmtest = d0[1].intersection(dmtemp[dmi][1]) #print(d0,dmtemp[dmi],dmtest) if len(dmtest)>0: # overlap dmtest = d0[1].union(dmtemp[dmi][1]) datam[dmi] = (d0[0]+dmtemp[dmi][0],dmtest) d0 = None dmi = 0 merge_count += 1 else: dmi += 1 if dmi == len(dmtemp): datam.append(d0) d0 = None return merge_count # repeat until 0 returned ## OP data ##data=[] ##data.append(('a',set([i for i in range(5,9+1)]))) ##data.append(('b',set([i for i in range(7,10+1)]))) ##data.append(('c',set([i for i in range(3,6+1)]))) ##data.append(('d',set([i for i in range(15,20+1)]))) ##data.append(('e',set([i for i in range(18,23+1)]))) ##datam = [] ## ##print() ##for j in data: print(j) ##print() ## ## ##overlap() ## ##for i in datam: ## print(i[0],min(i[1]),max(i[1])) ## ## ('a', {8, 9, 5, 6, 7}) ## ('b', {8, 9, 10, 7}) ## ('c', {3, 4, 5, 6}) ## ('d', {15, 16, 17, 18, 19, 20}) ## ('e', {18, 19, 20, 21, 22, 23}) ## ## cba 3 10 ## ed 15 23 ## special case - repeat overlap test until no merges data = [('A', {79, 80, 81, 82, 83, 84, 85, 86}), ('B', {96, 89, 90, 91, 92, 93, 94, 95}), ('C', {21, 22, 23, 24, 25, 26, 27, 28}), ('D', {64, 65, 66, 67, 68, 69, 62, 63}), ('E', {72, 73, 74, 75, 76, 77, 78, 79}), ('F', {65, 66, 67, 68, 69, 70, 71, 72}), ('G', {11, 12, 13, 14, 15, 16, 17, 18}), ('H', {24, 25, 26, 27, 28, 29, 30, 31}), ('I', {32, 33, 34, 35, 36, 37, 38, 31}), ('J', {81, 82, 83, 84, 85, 86, 87, 88})] datam = [] ## ('A', {55, 56, 57, 58, 59, 60, 61, 62}) ## ('B', {64, 57, 58, 59, 60, 61, 62, 63}) ## ('C', {10, 11, 12, 13, 14, 15, 16, 17}) ## ('D', {32, 25, 26, 27, 28, 29, 30, 31}) ## ('E', {54, 55, 56, 57, 58, 59, 60, 61}) ## ('F', {64, 65, 66, 59, 60, 61, 62, 63}) ## ('G', {41, 42, 43, 44, 45, 46, 47, 48}) ## ('H', {67, 68, 69, 70, 71, 72, 73, 74}) ## ('I', {55, 56, 57, 58, 59, 60, 61, 62}) ## ('J', {64, 65, 66, 67, 68, 69, 62, 63}) ## ## JIFEBA 54 69 ## C 10 17 ## D 25 32 ## G 41 48 ## H 67 74 <--- NFG overlaps JIFEBA print() for j in data: print(j) print() merges = 1 # corrects above while merges > 0: merges = overlap() print(merges) if merges>0: data = datam[:] datam = [] for i in datam: print(i[0],min(i[1]),max(i[1])) ## corrected ## ## ('A', {79, 80, 81, 82, 83, 84, 85, 86}) ## ('B', {96, 89, 90, 91, 92, 93, 94, 95}) ## ('C', {21, 22, 23, 24, 25, 26, 27, 28}) ## ('D', {64, 65, 66, 67, 68, 69, 62, 63}) ## ('E', {72, 73, 74, 75, 76, 77, 78, 79}) ## ('F', {65, 66, 67, 68, 69, 70, 71, 72}) ## ('G', {11, 12, 13, 14, 15, 16, 17, 18}) ## ('H', {24, 25, 26, 27, 28, 29, 30, 31}) ## ('I', {32, 33, 34, 35, 36, 37, 38, 31}) ## ('J', {81, 82, 83, 84, 85, 86, 87, 88}) ## ## 5 ## 1 ## 0 ## DJFEA 62 88 ## B 89 96 ## IHC 21 38 ## G 11 18 # random sequences data = [] for j in range(12): q = random.randint(0,90) r = q+12 data.append((chr(j+65),set([x for x in range(q,r)]))) datam = [] print() for j in data: print(j) print() merges = 1 # corrects above while merges > 0: merges = overlap() print(merges) if merges>0: data = datam[:] datam = [] for i in datam: print(i[0],min(i[1]),max(i[1])) ## ('A', {3, 4, 5, 6, 7, 8, 9, 10}) ## ('
Re: Overlap in python
kj: > # connect the nodes > for i in range(len(parts) - 1): > part_i = parts[i] > for j in range(i + 1, len(parts)): Note that's O(N^2), see the O(sort) standard solution by Mark Dickinson. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
In <78d86d92-d373-4163-a418-600a3eb36...@o15g2000yqm.googlegroups.com> Mark Dickinson writes: >On Aug 4, 7:15=A0pm, Jay Bird wrote: >> Hi everyone, >> >> I've been trying to figure out a simple algorithm on how to combine a >> list of parts that have 1D locations that overlap into a non- >> overlapping list. =A0For example, here would be my input: >> >> part name =A0 location >> a =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A05-9 >> b =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A07-10 >> c =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A03-6 >> d =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A015-20 >> e =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A018-23 >> >> And here is what I need for an output: >> part name =A0 location >> c.a.b =A0 =A0 =A0 =A0 =A0 =A03-10 >> d.e =A0 =A0 =A0 =A0 =A0 =A0 =A0 15-23 >> >> I've tried various methods, which all fail. =A0Does anyone have an idea >> how to do this? >Here's an approach that might work. Turning it into a sensible >function and parsing input/massaging output properly are left >as an exercise. Blocks that just touch (e.g. (3, 6) then (6, 9)) >are amalgamated; if this isn't what you want, and they should >be treated as separate instead, then replace 'Stop' with 'Finish' >(or any other string that sorts before 'Start'). >input =3D [ >('a', (5, 9)), >('b', (7, 10)), >('c', (3, 6)), >('d', (15, 20)), >('e', (18, 23)), >] ># create sorted list of points where an interval is entered or left >transitions =3D [] >for name, (start, stop) in input: >transitions.extend([(start, 'Start', name), (stop, 'Stop', name)]) >transitions.sort() ># scan list, accumulating blocks along the way. >count =3D 0 >for i, (pt, startend, name) in enumerate(transitions): >if startend =3D=3D 'Start': >if not count: >start =3D pt >names =3D [] >count +=3D 1 >names.append(name) >else: >count -=3D 1 >if not count: >print '.'.join(names), "%s--%s" % (start, pt) Very cool. kynn -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
Jay Bird wrote: Hi everyone, I've been trying to figure out a simple algorithm on how to combine a list of parts that have 1D locations that overlap into a non- overlapping list. For example, here would be my input: part name location a 5-9 b 7-10 c 3-6 d 15-20 e 18-23 And here is what I need for an output: part name location c.a.b3-10 d.e 15-23 I've tried various methods, which all fail. Does anyone have an idea how to do this? >>> parts = [(5, 9, "a"), (7, 10, "b"), (3, 6, "c"), (15, 20, "d"), (18, 23, "e")] >>> parts.sort() >>> parts [(3, 6, 'c'), (5, 9, 'a'), (7, 10, 'b'), (15, 20, 'd'), (18, 23, 'e')] >>> # Merge overlapping intervals. >>> pos = 1 >>> while pos < len(parts): # Merge the pair in parts[pos - 1 : pos + 1] if they overlap. p, q = parts[pos - 1 : pos + 1] if p[1] >= q[0]: parts[pos - 1 : pos + 1] = [(p[0], max(p[1], q[1]), p[2] + "." + q[2])] else: # They don't overlap, so try the next pair. pos += 1 >>> parts [(3, 10, 'c.a.b'), (15, 23, 'd.e')] -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
In Jay Bird writes: >Hi everyone, >I've been trying to figure out a simple algorithm on how to combine a >list of parts that have 1D locations that overlap into a non- >overlapping list. For example, here would be my input: >part name location >a 5-9 >b 7-10 >c 3-6 >d 15-20 >e 18-23 >And here is what I need for an output: >part name location >c.a.b3-10 >d.e 15-23 >I've tried various methods, which all fail. Does anyone have an idea >how to do this? This problem is basically isomorphic to the problem of finding connected components in an undirected graph. If your parts are nodes in a graph, there's an edge between them if they overlap. The following is probably overkill, but it works. It uses the module pygraph, which you can download from http://code.google.com/p/python-graph/ The real work is done by its connected_components function. import pygraph from pygraph.algorithms.accessibility import connected_components class Part(object): def __init__(self, name, start, finish): self.name = name if start >= finish: raise ValueError("start must be strictly less than finish") self.start = start self.finish = finish def __str__(self): return self.name def combine_parts(parts): gr = pygraph.graph() gr.add_nodes(parts) # connect the nodes for i in range(len(parts) - 1): part_i = parts[i] for j in range(i + 1, len(parts)): part_j = parts[j] if (part_j.start <= part_i.finish <= part_j.finish or part_i.start <= part_j.finish <= part_i.finish): gr.add_edge(part_i, part_j) cc = connected_components(gr) c2n = {} # build connected components as lists for k, v in cc.items(): c2n.setdefault(v, []).append(k) ret = [] for nodes in c2n.values(): nodes.sort(key=lambda x: x.start) name = '.'.join(x.name for x in nodes) start = min(x.start for x in nodes) finish = max(x.finish for x in nodes) ret.append((name, start, finish)) return ret if __name__ == '__main__': data = [Part('a', 5, 9), Part('b', 7, 10), Part('c', 3, 6), Part('d', 15, 20), Part('e', 18, 23)] print combine_parts(data) When you run the script above, the output is [('c.a.b', 3, 10), ('d.e', 15, 23)] HTH, kynn -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
On Aug 4, 9:03 pm, Mark Dickinson wrote: > for i, (pt, startend, name) in enumerate(transitions): Whoops. That line should just be: for pt, startend, name in transitions: The enumerate was accidentally left-over from a more complicated version. -- Mark -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
Mark Dickinson: > # create sorted list of points where an interval is entered or left > transitions = [] > for name, (start, stop) in input: > transitions.extend([(start, 'Start', name), (stop, 'Stop', name)]) > transitions.sort() > > # scan list, accumulating blocks along the way. Oh, right, that's the usual simple solution. Silly me for showing the dumb quadratic solution :-) The "pixelmap" approach may be faster if you have tons of small intervals in a not too much large range. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
Gregor Lingl schrieb: As my proposed solution shows this approach can be done with on board means of Python (namely the set type). This would be quite different though, if you had floating point boundaries of the intervals. ... or if the intgers involved were very big :-( Regards, Gregor Duane -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
DuaneKaufman schrieb: On Aug 4, 1:15 pm, Jay Bird wrote: ... For instance the interval module found at: http://members.cox.net/apoco/interval/ can be put to use: Given your data above: part name location a 5-9 b 7-10 c 3-6 from interval import Interval a_int=Interval(5,9) b_int=Interval(7,10) c_int=Interval(3,6) a_int.overlaps(b_int) True a_int.overlaps(c_int) True b_int.overlaps(c_int) False As my proposed solution shows this approach can be done with on board means of Python (namely the set type). This would be quite different though, if you had floating point boundaries of the intervals. Regards, Gregor Duane -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
Jay Bird: > I've been trying to figure out a simple algorithm on how to combine a > list of parts that have 1D locations that overlap into a non- > overlapping list. For example, here would be my input: > > part name location > a 5-9 > b 7-10 > c 3-6 > d 15-20 > e 18-23 > > And here is what I need for an output: > part name location > c.a.b3-10 > d.e 15-23 That's sometimes known as "The Set Union Problem on Intervals". Knowing the name of a problem is sometimes half the work :-) People have written papers on this. This is a O(N^2) force-brute algorithm, you can try it on your data: data = """part name location a 5-9 b 7-10 c 3-6 d 15-20 e 18-23""" pairs = map(str.split, data.splitlines()[1:]) triples = [map(int, i.split("-")) + [name] for (name, i) in pairs] overlap = lambda x1,y1, x2,y2: y1 >= x2 and x1 <= y2 results = [] for (x1,y1,name) in triples: for i, res in enumerate(results): if overlap(x1,y1, res[0],res[1]): results[i].append(name) results[i][0] = min(x1, res[0]) results[i][1] = max(y1, res[1]) break else: results.append([x1, y1, name]) print results # Output: [[3, 10, 'a', 'b', 'c'], [15, 23, 'd', 'e']] If you have have a lot of data then it will be too much slow, and you have to use a more complex algorithm. If your intervals are many, they are small, and they are spread in a not too much large range of possible values, you can create an integer array of indexes, and you can "paint" intervals on it. Bye, bearophile -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
On Aug 4, 1:15 pm, Jay Bird wrote: > Hi everyone, > > I've been trying to figure out a simple algorithm on how to combine a > list of parts that have 1D locations that overlap into a non- > overlapping list. For example, here would be my input: > > part name location > a 5-9 > b 7-10 > c 3-6 > d 15-20 > e 18-23 > > And here is what I need for an output: > part name location > c.a.b 3-10 > d.e 15-23 > > I've tried various methods, which all fail. Does anyone have an idea > how to do this? > > Thank you very much! > Jay Hi, You have gotten some excellent feedback (all without installing anything extra to your Python install), but might I suggest the use of an interval arithmetic package to aid in this? For instance the interval module found at: http://members.cox.net/apoco/interval/ can be put to use: Given your data above: > part name location > a 5-9 > b 7-10 > c 3-6 from interval import Interval >>> a_int=Interval(5,9) >>> b_int=Interval(7,10) >>> c_int=Interval(3,6) >>> a_int.overlaps(b_int) True >>> a_int.overlaps(c_int) True >>> b_int.overlaps(c_int) False Duane -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
On Aug 4, 7:15 pm, Jay Bird wrote: > Hi everyone, > > I've been trying to figure out a simple algorithm on how to combine a > list of parts that have 1D locations that overlap into a non- > overlapping list. For example, here would be my input: > > part name location > a 5-9 > b 7-10 > c 3-6 > d 15-20 > e 18-23 > > And here is what I need for an output: > part name location > c.a.b 3-10 > d.e 15-23 > > I've tried various methods, which all fail. Does anyone have an idea > how to do this? Here's an approach that might work. Turning it into a sensible function and parsing input/massaging output properly are left as an exercise. Blocks that just touch (e.g. (3, 6) then (6, 9)) are amalgamated; if this isn't what you want, and they should be treated as separate instead, then replace 'Stop' with 'Finish' (or any other string that sorts before 'Start'). input = [ ('a', (5, 9)), ('b', (7, 10)), ('c', (3, 6)), ('d', (15, 20)), ('e', (18, 23)), ] # create sorted list of points where an interval is entered or left transitions = [] for name, (start, stop) in input: transitions.extend([(start, 'Start', name), (stop, 'Stop', name)]) transitions.sort() # scan list, accumulating blocks along the way. count = 0 for i, (pt, startend, name) in enumerate(transitions): if startend == 'Start': if not count: start = pt names = [] count += 1 names.append(name) else: count -= 1 if not count: print '.'.join(names), "%s--%s" % (start, pt) The output that I get from this under Python 2.6 is: c.a.b 3--10 d.e 15--23 -- Mark -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
Jay Bird schrieb: Hi everyone, I've been trying to figure out a simple algorithm on how to combine a list of parts that have 1D locations that overlap into a non- overlapping list. For example, here would be my input: part name location a 5-9 b 7-10 c 3-6 d 15-20 e 18-23 And here is what I need for an output: part name location c.a.b3-10 d.e 15-23 Suppose you have your data given as a dictionary: data = {'a':(5,9), 'b':(7,10), 'c':(3,6), 'd':(15,20), 'e':(18,23)} Then the following not particularly elegant but very simple and straight forward algorithm will solve your problem: def values(key): return set(range(data[key][0], data[key][1]+1)) keys = list(data.keys()) result = [] while keys: k = [keys[0]] keys = keys[1:] v = values(k[0]) for l in keys[:]: if v.intersection(values(l)): v.update(values(l)) k.append(l) keys.remove(l) result.append((k,v)) # print(result) ## if you like for k, v in result: print(".".join(sorted(k)), "%d-%d" % (min(v), max(v))) Regards, Gregor I've tried various methods, which all fail. Does anyone have an idea how to do this? Thank you very much! Jay -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
Jay Bird wrote: Hi everyone, I've been trying to figure out a simple algorithm on how to combine a list of parts that have 1D locations that overlap into a non- overlapping list. For example, here would be my input: part name location a 5-9 b 7-10 c 3-6 d 15-20 e 18-23 And here is what I need for an output: part name location c.a.b3-10 d.e 15-23 I've tried various methods, which all fail. Does anyone have an idea how to do this? Thank you very much! Jay I haven't tried but could you adapt this:- data = [ 1, 4,5,6, 10, 15,16,17,18, 22, 25,26,27,28] for k, g in groupby(enumerate(data), lambda (i,x):i-x): print map(itemgetter(1), g) found here http://www.python.org/doc/2.6.2/library/itertools.html#module-itertools -- Kindest regards. Mark Lawrence. -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
On 8/4/2009 2:46 PM, Ann wrote: On Aug 4, 11:31 am, Marcus Wanner wrote: On Aug 4, 2:15 pm, Jay Bird wrote: Hi everyone, I've been trying to figure out a simple algorithm on how to combine a list of parts that have 1D locations that overlap into a non- overlapping list. For example, here would be my input: part name location a 5-9 b 7-10 c 3-6 d 15-20 e 18-23 And here is what I need for an output: part name location c.a.b3-10 d.e 15-23 I've tried various methods, which all fail. Does anyone have an idea how to do this? Thank you very much! Jay Just take all the values, put them in a list, and use min() and max(). For example: import string def makelist(values): values = string.replace(values, ' ', '') listofvaluepairs = string.split(values, ',') listofvalues = [] for pair in listofvaluepairs: twovalues = string.split(pair, '-') listofvalues.append(int(twovalues[0])) listofvalues.append(int(twovalues[1])) return listofvalues values = '5-9, 7-10, 3-6' values = makelist(values) print('Values: %d-%d' %(min(values), max(values)) ) Marcus Thank you for your help, this is a very nice program but it does not address the issue that I have values that do not overlap, for example 'c' and 'd' do not overlap in the above example and thus I would not want to output 3-20 but the individual positions instead. In addition, my inputs are not ordered from smallest to largest, thus I could have a situation where a=5-9, b=15-20, c=7-10, etc., and I would still want an output of: a.c = 5-10, b=15-20. I apologize for the complication. Thank you again! Jay That would be a bit more complicated...you might try using tuples of values in a list, or writing your own class for the data handling. Unfortunately that's not something I can just sit down and code in 5 minutes, it would take some careful thought and planning. Marcus -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
On Aug 4, 11:31 am, Marcus Wanner wrote: > On Aug 4, 2:15 pm, Jay Bird wrote: > > > > > Hi everyone, > > > I've been trying to figure out a simple algorithm on how to combine a > > list of parts that have 1D locations that overlap into a non- > > overlapping list. For example, here would be my input: > > > part name location > > a 5-9 > > b 7-10 > > c 3-6 > > d 15-20 > > e 18-23 > > > And here is what I need for an output: > > part name location > > c.a.b 3-10 > > d.e 15-23 > > > I've tried various methods, which all fail. Does anyone have an idea > > how to do this? > > > Thank you very much! > > Jay > > Just take all the values, put them in a list, and use min() and max(). > For example: > > import string > > def makelist(values): > values = string.replace(values, ' ', '') > listofvaluepairs = string.split(values, ',') > listofvalues = [] > for pair in listofvaluepairs: > twovalues = string.split(pair, '-') > listofvalues.append(int(twovalues[0])) > listofvalues.append(int(twovalues[1])) > return listofvalues > > values = '5-9, 7-10, 3-6' > values = makelist(values) > print('Values: %d-%d' %(min(values), max(values)) ) > > Marcus Thank you for your help, this is a very nice program but it does not address the issue that I have values that do not overlap, for example 'c' and 'd' do not overlap in the above example and thus I would not want to output 3-20 but the individual positions instead. In addition, my inputs are not ordered from smallest to largest, thus I could have a situation where a=5-9, b=15-20, c=7-10, etc., and I would still want an output of: a.c = 5-10, b=15-20. I apologize for the complication. Thank you again! Jay -- http://mail.python.org/mailman/listinfo/python-list
Re: Overlap in python
On Aug 4, 2:15 pm, Jay Bird wrote: > Hi everyone, > > I've been trying to figure out a simple algorithm on how to combine a > list of parts that have 1D locations that overlap into a non- > overlapping list. For example, here would be my input: > > part name location > a 5-9 > b 7-10 > c 3-6 > d 15-20 > e 18-23 > > And here is what I need for an output: > part name location > c.a.b 3-10 > d.e 15-23 > > I've tried various methods, which all fail. Does anyone have an idea > how to do this? > > Thank you very much! > Jay Just take all the values, put them in a list, and use min() and max(). For example: import string def makelist(values): values = string.replace(values, ' ', '') listofvaluepairs = string.split(values, ',') listofvalues = [] for pair in listofvaluepairs: twovalues = string.split(pair, '-') listofvalues.append(int(twovalues[0])) listofvalues.append(int(twovalues[1])) return listofvalues values = '5-9, 7-10, 3-6' values = makelist(values) print('Values: %d-%d' %(min(values), max(values)) ) Marcus -- http://mail.python.org/mailman/listinfo/python-list
Overlap in python
Hi everyone, I've been trying to figure out a simple algorithm on how to combine a list of parts that have 1D locations that overlap into a non- overlapping list. For example, here would be my input: part name location a 5-9 b 7-10 c 3-6 d 15-20 e 18-23 And here is what I need for an output: part name location c.a.b3-10 d.e 15-23 I've tried various methods, which all fail. Does anyone have an idea how to do this? Thank you very much! Jay -- http://mail.python.org/mailman/listinfo/python-list