Jay Bird wrote:
Hi everyone,
I've been trying to figure out a simple algorithm on how to combine a
list of parts that have 1D locations that overlap into a non-
overlapping list. For example, here would be my input:
part name location
a 5-9
b 7-10
c 3-6
d 15-20
e 18-23
And here is what I need for an output:
part name location
c.a.b 3-10
d.e 15-23
I've tried various methods, which all fail. Does anyone have an idea
how to do this?
Thank you very much!
Jay
I once had to do this for finding nested block structure.
The key for me was a sort order: start, -final.
Having not seen it here (though I looked a bit), here's one:
class Entry(object):
'''An entry is a name and range'''
def __init__(self, line):
self.name, startstop = line.split()
start, stop = startstop.split('-')
self.start, self.stop = int(start), int(stop)
def combined_ranges(lines):
'''Create Entries in "magic order", and produce ranges.
The "magic order" makes least element with longest range first, so
overlaps show up in head order, with final tail first among equals.
'''
# Fill in our table (ignoring blank lines), then sort by magic order
elements = [Entry(line) for line in lines if line.strip()]
elements.sort(key=lambda e: (e.start, -e.stop))
# Now produce resolved ranges. Grab the start
gen = iter(elements)
first = gen.next()
# For the remainder, combine or produce
for v in gen:
if v.start <= first.stop:
# on overlap, merge in new element (may update stop)
first.name += '.' + v.name
if first.stop < v.stop:
first.stop = v.stop
else:
yield first
first = v
# And now produce the last element we covered
yield first
# Demo:
sample = '''part name location
a 5-9
b 7-10
c 3-6
d 15-20
e 18-23
'''
source = iter(sample.split('\n')) # source of lines, opened file?
ignored = source.next() # discard heading
for interval in combined_range(source):
print '%s %s-%s' % (interval.name, interval.start, interval.stop)
Prints:
c.a.b 3-10
d.e 15-23
--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list