On Sat, May 5, 2012 at 10:12 PM, J. Mwebaze <jmweb...@gmail.com> wrote: > This is out of curiosity, i know this can be done with python diffllib > module, but been figuring out how to compute the delta, Consider two lists > below. > > s1 = ['e', 'f', 'g', 'A', 'B', 'C', 'D', 'C'] > s2 =['e', 'A', 'B', 'f', 'g', 'C', 'D', 'z'] > > This is the result should be > > [' e', '+ A', '+ B', ' f', ' g', '- A', '- B', ' C', ' D', '- C', '+ > z'] > > ideas on how to approach this.. ?
Here's a simple algorithm that produces sorta-mostly-reasonable results most of the time: Set your current-position-index to the beginning of each lists. (Call them 'pos1' and 'pos2'.) If s1[pos1] and s2[pos2] are identical, match - increment and iterate. Otherwise, increment pos2 until either it matches pos1 or you reach the end of s2. If you find a match, report the insertion into s2, increment both pointers past the match, and carry on. If you hit the end of s2, increment pos1 once and report an insertion into s1, then go back to looking for a match. It helps to append a sentinel to each list; that way, you don't need to separately check for additional content at the end of either list (as the sentinel won't match any of the strings). This is the algorithm I used for writing a simple file diff tool ages ago. It's not as good as diff(1), but it was fun to do the exercise. It's quite inefficient at handling large insertions into s1, and needs to be modified for most file handling (for instance, requiring a 2-line or 3-line rematch after a difference, to avoid matching on blank lines), but it's a basis. Producing beautiful or minimal diffs is a more complex task. :) ChrisA -- http://mail.python.org/mailman/listinfo/python-list