If the lines really are sorted, all you need is a merge: read one line from each source, and if they are equal, read another from each. If one line is less, output the lesser line with an appropriate tag and refresh that line from its source. Stop when either source runs out, then flush the rest of the other source to the output, again with the appropriate tag.

Time is linear in the input size, and memory use is negligible (you only ever hold one line from each source).
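
Something like this minimal sketch, for instance (untested; the merge_diff name and the '<'/'>' tags are just placeholders I picked, and it assumes the last line of each file ends with a newline):

import sys


def merge_diff(fname_a, fname_b, out=sys.stdout):
    # Single-pass merge of two sorted files: common lines are dropped,
    # lines only in A are tagged '<', lines only in B are tagged '>'.
    with open(fname_a) as fa, open(fname_b) as fb:
        a = fa.readline()
        b = fb.readline()
        while a and b:
            if a == b:            # in both: advance both sources
                a = fa.readline()
                b = fb.readline()
            elif a < b:           # only in A: emit it, refresh A
                out.write('< ' + a)
                a = fa.readline()
            else:                 # only in B: emit it, refresh B
                out.write('> ' + b)
                b = fb.readline()
        # One source has run out: flush the rest of the other.
        while a:
            out.write('< ' + a)
            a = fa.readline()
        while b:
            out.write('> ' + b)
            b = fb.readline()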

Marco Mariani wrote:


You can adapt and use this, provided the files are already sorted. Memory usage scales linearly with the size of the difference between the files, and time scales linearly with the file sizes.


#!/usr/bin/env python3

import sys


def run(fname_a, fname_b):
    filea = open(fname_a)
    fileb = open(fname_b)
    a_lines = set()   # lines seen so far only in file A
    b_lines = set()   # lines seen so far only in file B

    while True:
        a = filea.readline()
        b = fileb.readline()
        if not (a or b):
            # both files exhausted
            break

        if a == b:
            # identical lines at the same position: nothing to record
            continue

        # if A's line was seen earlier in B, it is common after all
        if a in b_lines:
            b_lines.remove(a)
        elif a:
            a_lines.add(a)

        # and likewise for B's line
        if b in a_lines:
            a_lines.remove(b)
        elif b:
            b_lines.add(b)

    # lines unique to A, a separator, then lines unique to B
    # (the lines keep their trailing newlines, so print with end='')
    for line in a_lines:
        print(line, end='')

    if a_lines or b_lines:
        print()
        print('***************')
        print()

    for line in b_lines:
        print(line, end='')


if __name__ == '__main__':
    run(sys.argv[1], sys.argv[2])
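
For what it's worth, a sample invocation (the script and file names here are just placeholders; the inputs must already be sorted, e.g. with the Unix sort command):

    sort a.txt > a_sorted.txt
    sort b.txt > b_sorted.txt
    python3 compare_sorted.py a_sorted.txt b_sorted.txt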

