Re: Any way to adjust difflib algorithm?
In article fmqdncomnpg-jrjxnz2dnuvz_tti4...@posted.visi, Grant Edwards inva...@invalid wrote:
> On 2009-08-14, Grant Edwards inva...@invalid wrote:
>> In my particular usage, no lines have ever been inserted/deleted, so
>> perhaps I should be running diffs on individual lines instead? If I
>> do that, I can't figure out how to generate HTML output.
>
> I ended up using the SequenceMatcher on individual pairs of lines and
> generating my own HTML based on the results of get_matching_blocks().
> That produced the desired results.

Good work! Note that IME most diff software shows changed lines as a delete-and-add. For example, diff -u

--
Aahz (a...@pythoncraft.com) * http://www.pythoncraft.com/

Given that C++ has pointers and typecasts, it's really hard to have a serious conversation about type safety with a C++ programmer and keep a straight face. It's kind of like having a guy who juggles chainsaws wearing body armor arguing with a guy who juggles rubber chickens wearing a T-shirt about who's in more danger. --Roy Smith
--
http://mail.python.org/mailman/listinfo/python-list
Re: Any way to adjust difflib algorithm?
On 2009-08-19, Aahz a...@pythoncraft.com wrote:
> In article fmqdncomnpg-jrjxnz2dnuvz_tti4...@posted.visi, Grant Edwards inva...@invalid wrote:
>> On 2009-08-14, Grant Edwards inva...@invalid wrote:
>>> In my particular usage, no lines have ever been inserted/deleted, so
>>> perhaps I should be running diffs on individual lines instead? If I
>>> do that, I can't figure out how to generate HTML output.
>>
>> I ended up using the SequenceMatcher on individual pairs of lines and
>> generating my own HTML based on the results of get_matching_blocks().
>> That produced the desired results.
>
> Good work! Note that IME most diff software shows changed lines as a
> delete-and-add. For example, diff -u

Right -- though difflib did show _some_ lines as changed rather than deleted/added, it wasn't obvious how it decided between the two. I suspect it used some sort of percentage-changed threshold.

For this application both files had all the same lines (by definition), so what I was interested in was what parts of each line changed.

--
Grant Edwards   grante at visi.com
Yow! I just heard the SEVENTIES were over!! And I was just getting in touch with my LEISURE SUIT!!
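The suspicion about a threshold can be checked with a small experiment. The sketch below assumes that HtmlDiff is built on the same line-pairing logic as difflib.ndiff, and that ndiff only emits its "? " guide lines when it considers two lines similar enough to count as a changed pair. CPython's Differ source has used a similarity cutoff of about 0.75, but that is an implementation detail rather than documented API, so treat the number as an assumption. The helper names similarity and has_intraline_hints are made up for this illustration.

```python
import difflib


def similarity(a, b):
    """Character-level similarity of two strings, from 0.0 to 1.0."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def has_intraline_hints(a, b):
    """True if ndiff emits '? ' guide lines for this pair of lines,
    i.e. it reports them as one changed line rather than a plain
    delete followed by an add."""
    return any(line.startswith("? ") for line in difflib.ndiff([a], [b]))


if __name__ == "__main__":
    pairs = [
        ("timeout = 30 seconds", "timeout = 45 seconds"),  # nearly identical
        ("alpha", "zzzzz"),                                 # nothing in common
    ]
    for old, new in pairs:
        print("%-22r %-22r ratio=%.2f changed-line=%s"
              % (old, new, similarity(old, new), has_intraline_hints(old, new)))
```

Running pairs of your own lines through this should show roughly where difflib stops treating a pair as "changed" and falls back to delete/add.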
Any way to adjust difflib algorithm?
I'm trying to use difflib to compare two files, and it's not producing very useful results. When comparing two lines where only a few characters have changed, it usually seems to decide that a line was deleted/inserted/replaced rather than changed.

Here's how I'm using it:

    #!/usr/bin/python
    import sys, difflib

    # Read both files into lists of lines without trailing newlines.
    fromlines = [l.rstrip('\n') for l in open(sys.argv[1]).readlines()]
    tolines = [l.rstrip('\n') for l in open(sys.argv[2]).readlines()]

    # Write a complete side-by-side HTML diff to stdout.
    print(difflib.HtmlDiff().make_file(fromlines, tolines))

In my particular usage, no lines have ever been inserted/deleted, so perhaps I should be running diffs on individual lines instead? If I do that, I can't figure out how to generate HTML output.

Is there a way to tell the differ to try harder to match lines?

--
Grant Edwards   grante at visi.com
Yow! I hope something GOOD came in the mail today so I have a REASON to live!!
Re: Any way to adjust difflib algorithm?
On Fri, Aug 14, 2009 at 2:38 PM, Grant Edwards inva...@invalid wrote:
> I'm trying to use difflib to compare two files, and it's not
> producing very useful results. When comparing two lines where
> only a few characters have changed, it usually seems to decide
> that a line was deleted/inserted/replaced rather than changed.
<snip>
> Is there a way to tell the differ to try harder to match lines?

You could use a wordwise diff instead: http://www.gnu.org/software/wdiff/

Obviously that's not a pure Python solution though.

Cheers,
Chris
--
http://blog.rebertia.com
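For anyone who wants the wordwise idea without leaving Python, something similar can be approximated with difflib itself by feeding SequenceMatcher a list of words instead of a string. This is only a rough sketch and not a reimplementation of wdiff: the function name word_diff is made up here, and the [-...-] / {+...+} markers simply imitate wdiff's default output style.

```python
import difflib
import re


def word_diff(old, new):
    """Mark word-level changes between two lines, wdiff-style:
    [-deleted-] and {+inserted+}."""
    # Split into alternating runs of whitespace and non-whitespace so the
    # original spacing can be reproduced exactly when the pieces are joined.
    a = re.findall(r"\s+|\S+", old)
    b = re.findall(r"\s+|\S+", new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append("".join(a[i1:i2]))
            continue
        # A "replace" is shown as a deletion followed by an insertion.
        if tag in ("delete", "replace"):
            out.append("[-%s-]" % "".join(a[i1:i2]))
        if tag in ("insert", "replace"):
            out.append("{+%s+}" % "".join(b[j1:j2]))
    return "".join(out)


if __name__ == "__main__":
    print(word_diff("the quick fox", "the slow fox"))
```

Because the comparison unit is a whole word, a one-character change marks the entire word as replaced, which is usually easier to read than character-level noise.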
Re: Any way to adjust difflib algorithm?
On 2009-08-14, Grant Edwards inva...@invalid wrote:
> I'm trying to use difflib to compare two files, and it's not
> producing very useful results. When comparing two lines where
> only a few characters have changed, it usually seems to decide
> that a line was deleted/inserted/replaced rather than changed.
[...]
> In my particular usage, no lines have ever been inserted/deleted, so
> perhaps I should be running diffs on individual lines instead? If I
> do that, I can't figure out how to generate HTML output.

I ended up using the SequenceMatcher on individual pairs of lines and generating my own HTML based on the results of get_matching_blocks(). That produced the desired results.

--
Grant Edwards   grante at visi.com
Yow! I have a very good DENTAL PLAN. Thank you.
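The code for that approach was not posted, so the following is only a guess at what it might look like, written for current Python 3. It assumes both files have the same number of lines, pairs them up with zip(), and wraps the stretches that fall between matching blocks in a span. The names highlight_pair and html_rows, the "chg" CSS class, and the table layout are all invented for this illustration.

```python
import difflib
from html import escape


def highlight_pair(old, new):
    """Return (old_html, new_html) with the non-matching runs of each
    line wrapped in <span class="chg">...</span>."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    old_out, new_out = [], []
    old_pos = new_pos = 0
    # get_matching_blocks() yields (a, b, size) triples in increasing order
    # and ends with a zero-length sentinel at (len(old), len(new), 0), so
    # any trailing difference is picked up by the last iteration.
    for a, b, size in matcher.get_matching_blocks():
        if a > old_pos:
            old_out.append('<span class="chg">%s</span>' % escape(old[old_pos:a]))
        if b > new_pos:
            new_out.append('<span class="chg">%s</span>' % escape(new[new_pos:b]))
        old_out.append(escape(old[a:a + size]))
        new_out.append(escape(new[b:b + size]))
        old_pos, new_pos = a + size, b + size
    return "".join(old_out), "".join(new_out)


def html_rows(fromlines, tolines):
    """Build a two-column HTML table, one row per pair of lines."""
    rows = []
    for old, new in zip(fromlines, tolines):
        left, right = highlight_pair(old, new)
        rows.append("<tr><td><pre>%s</pre></td><td><pre>%s</pre></td></tr>"
                    % (left, right))
    return "<table>\n%s\n</table>" % "\n".join(rows)


if __name__ == "__main__":
    print(html_rows(["x = 1", "same"], ["x = 2", "same"]))
```

Using get_opcodes() instead of get_matching_blocks() would work just as well and would also distinguish inserted from deleted text, at the cost of a slightly longer loop.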