Jay Bird wrote:
Hi everyone,

I've been trying to figure out a simple algorithm on how to combine a
list of parts that have 1D locations that overlap into a non-
overlapping list.  For example, here would be my input:

part name   location
a                  5-9
b                  7-10
c                  3-6
d                  15-20
e                  18-23

And here is what I need for an output:
part name   location
c.a.b            3-10
d.e               15-23

I've tried various methods, which all fail.  Does anyone have an idea
how to do this?

Thank you very much!
Jay

I once had to do this for finding nested block structure.
The key for me was a sort order:  start, -final.

Having not seen it here (though I looked a bit), here's one:

class Entry(object):
    '''An entry is a name and range'''
    def __init__(self, line):
        self.name, startstop = line.split()
        start, stop = startstop.split('-')
        self.start, self.stop = int(start), int(stop)


def combined_ranges(lines):
    '''Create Entries in "magic order", and produce ranges.

    The "magic order" makes least element with longest range first, so
    overlaps show up in head order, with final tail first among equals.
    '''
    # Fill in our table (ignoring blank lines), then sort by magic order
    elements = [Entry(line) for line in lines if line.strip()]
    elements.sort(key=lambda e: (e.start, -e.stop))

    # Now produce resolved ranges.  Grab the start
    gen = iter(elements)
    first = gen.next()

    # For the remainder, combine or produce
    for v in gen:
        if v.start <= first.stop:
            # on overlap, merge in new element (may update stop)
            first.name += '.' + v.name
            if first.stop < v.stop:
                first.stop = v.stop
        else:
            yield first
            first = v
    # And now produce the last element we covered
    yield first

# Demo:
sample = '''part name   location
    a                  5-9
    b                  7-10
    c                  3-6
    d                  15-20
    e                  18-23
    '''
source = iter(sample.split('\n')) # source of lines, opened file?
ignored = source.next() # discard heading
for interval in combined_range(source):
    print '%s  %s-%s' % (interval.name, interval.start, interval.stop)

Prints:
c.a.b  3-10
d.e  15-23


--Scott David Daniels
scott.dani...@acm.org
--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to