Gael Varoquaux <gael.varoquaux <at> normalesup.org> writes:
>
> wdiff.
Thanks for the suggestions! Unfortunately, one thing I forgot to mention is that the concatenation should not span different paragraphs. Thus:

Hello!

World!

is not the same as:

Hello!
World!

since the first represents 2 paragraphs, but the second only 1.

Instead, I propose the following Python script, which diffs the docutils doc-trees instead of the original text files. I don't know how it could tell whether the 2 inputs are reStructuredText documents or regular text documents, and perform the doc-tree step only for rst. Suggestions for improvements are welcome, but so far this does a good job of what I am trying to achieve.

Such a tool could be handy to rst documenters when a document has, through years of editing, accumulated lines that go beyond 80 columns, and the file is re-wrapped to bring them back in line. That produces massive standard diffs even though the result really should be more or less the same document. This script can be used to confirm that the two versions of a document are more or less the same.
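To illustrate the paragraph constraint in isolation, here is a minimal sketch that needs only the standard library. The pattern is copied from `trimwhite` in the script below, with the flags reduced to `re.M | re.U` so that it compiles on Python 3. The three sample strings are hand-written approximations of what the docutils pseudo-XML writer emits, not captured output, and `normalize` is a name made up for this example.

```python
import re

# Same pattern as `trimwhite` in the script: replace a line break and the
# indentation after it, unless the break follows '>' or the next line
# starts a new node with '<' (both mark a node boundary).
trimwhite = re.compile(r'(?<!>)\n\s*(?![< ])', re.M | re.U)


def normalize(tree):
    """Join wrapped lines inside the text nodes of a pseudo-XML dump."""
    return trimwhite.sub(' ', tree)


# Hand-written approximations of docutils pseudo-XML (assumed shape).
two_paragraphs = (
    '<document source="<string>">\n'
    '    <paragraph>\n'
    '        Hello!\n'
    '    <paragraph>\n'
    '        World!\n'
)
one_paragraph_wrapped = (
    '<document source="<string>">\n'
    '    <paragraph>\n'
    '        Hello!\n'
    '        World!\n'
)
one_paragraph_flat = (
    '<document source="<string>">\n'
    '    <paragraph>\n'
    '        Hello! World!\n'
)

# Re-wrapping inside one paragraph disappears after normalization ...
print(normalize(one_paragraph_wrapped) == normalize(one_paragraph_flat))  # True
# ... but a paragraph break survives, so the trees still differ.
print(normalize(two_paragraphs) == normalize(one_paragraph_flat))  # False
```

Because the line break after "Hello!" in the two-paragraph case is followed by a `<paragraph>` node, the lookahead rejects it and the paragraph boundary is kept. This is the behaviour a plain word-level diff on the raw text cannot give.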
----------
#!/usr/bin/python
import os
import re
import subprocess
import sys
import tempfile

import docutils.core

# Regexps for removing inconsequential characters
trimwhite = re.compile(r'(?<!>)\n\s*(?![< ])', re.M | re.U)
webspace = re.compile(r'(?<=[.?!):])\s{2,}(?=[\w\d(])', re.M | re.U)
repl = r' '

if __name__ == '__main__':
    # To do: verify that document 1 and document 2 are both
    # reStructuredText documents

    # Last 2 parameters are the left-hand side and right-hand side files
    lhs, rhs = sys.argv[-2:]

    # Parse the left and right files into docutils tree strings
    # (pseudo-XML is the default writer; 'unicode' requests text, not bytes)
    overrides = {'output_encoding': 'unicode'}
    with open(lhs) as f:
        lhss1 = docutils.core.publish_string(
            f.read(), settings_overrides=overrides)
    with open(rhs) as f:
        rhss2 = docutils.core.publish_string(
            f.read(), settings_overrides=overrides)

    # Concatenate multi-line text that lies within a node
    lhss1, lhsr1 = trimwhite.subn(repl, lhss1)
    rhss2, rhsr2 = trimwhite.subn(repl, rhss2)
    #sys.stdout.write('Removed returns (left, right): %d, %d\n' %
    #                 (lhsr1, rhsr2))

    # Trim multiple white spaces between a full stop (.?!) and the next phrase
    lhss1, lhsr1 = webspace.subn(repl, lhss1)
    rhss2, rhsr2 = webspace.subn(repl, rhss2)
    #sys.stdout.write('Removed double space (left, right): %d, %d\n' %
    #                 (lhsr1, rhsr2))

    # Make sure the last line is properly terminated
    lhss1 += '\n'
    rhss2 += '\n'

    # Allocate temporary files to hold the left and right doc-trees
    lhsh1, lhst1 = tempfile.mkstemp(text=True)
    rhsh2, rhst2 = tempfile.mkstemp(text=True)

    # Open the left and right temp files for writing
    lhso1 = os.fdopen(lhsh1, 'w')
    rhso2 = os.fdopen(rhsh2, 'w')

    # Write the doc-trees to the temp files
    lhso1.write(lhss1)
    rhso2.write(rhss2)

    # Close the temp files
    lhso1.close()
    rhso2.close()

    # Spawn [UNIX] diff and wait for it to complete
    # Stdout and stderr are passed directly to this application
    sp = subprocess.Popen(['diff'] + sys.argv[1:-2] + [lhst1, rhst2])
    sp.wait()

    # Delete the temp files
    os.remove(lhst1)
    os.remove(rhst2)

_______________________________________________
Doc-SIG maillist - Doc-SIG@python.org
http://mail.python.org/mailman/listinfo/doc-sig