Re: Any way to adjust difflib algorithm?

2009-08-19 Thread Aahz
In article fmqdncomnpg-jrjxnz2dnuvz_tti4...@posted.visi,
Grant Edwards  inva...@invalid wrote:
> On 2009-08-14, Grant Edwards inva...@invalid wrote:
>
>> In my particular usage, no lines have ever been
>> inserted/deleted, so perhaps I should be running diffs on
>> individual lines instead?  If I do that, I can't figure out
>> how to generate HTML output.
>
> I ended up using the SequenceMatcher on individual pairs of
> lines and generating my own HTML based on the results of
> get_matching_blocks().
>
> That produced the desired results.

Good work!  Note that IME most diff software shows changed lines as a
delete-and-add.  For example, diff -u
-- 
Aahz (a...@pythoncraft.com)   * http://www.pythoncraft.com/

Given that C++ has pointers and typecasts, it's really hard to have a
serious conversation about type safety with a C++ programmer and keep a
straight face.  It's kind of like having a guy who juggles chainsaws
wearing body armor arguing with a guy who juggles rubber chickens wearing
a T-shirt about who's in more danger.  --Roy Smith
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Any way to adjust difflib algorithm?

2009-08-19 Thread Grant Edwards
On 2009-08-19, Aahz a...@pythoncraft.com wrote:
> In article fmqdncomnpg-jrjxnz2dnuvz_tti4...@posted.visi,
> Grant Edwards  inva...@invalid wrote:
>> On 2009-08-14, Grant Edwards inva...@invalid wrote:
>>>
>>> In my particular usage, no lines have ever been
>>> inserted/deleted, so perhaps I should be running diffs on
>>> individual lines instead?  If I do that, I can't figure out
>>> how to generate HTML output.
>>
>> I ended up using the SequenceMatcher on individual pairs of
>> lines and generating my own HTML based on the results of
>> get_matching_blocks().
>>
>> That produced the desired results.
>
> Good work!  Note that IME most diff software shows changed
> lines as a delete-and-add.  For example, diff -u

Right -- though difflib did show _some_ lines as changed rather
than deleted/added, it wasn't obvious how it decided between
the two.  I suspect it used some sort of percentage-changed
threshold.
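
A rough way to probe that suspicion is to compute the similarity score for a pair of lines directly. The sketch below uses SequenceMatcher.ratio(), which is documented as 2*M/T (M matched elements, T total elements in both sequences). The 0.75 cutoff and the helper names are assumptions for illustration: the value mirrors what CPython's difflib.Differ appears to use internally, but it is an implementation detail, not a documented or adjustable setting, and Differ also applies junk heuristics that this sketch ignores.

```python
import difflib

# Assumption: 0.75 mirrors the cutoff used inside CPython's difflib.Differ.
# It is an implementation detail, not a documented or tunable parameter.
ASSUMED_CUTOFF = 0.75


def similarity(old, new):
    """Return SequenceMatcher's ratio for two strings (0.0 to 1.0)."""
    return difflib.SequenceMatcher(None, old, new).ratio()


def likely_shown_as_changed(old, new, cutoff=ASSUMED_CUTOFF):
    """Guess whether a line pair is similar enough to be paired as a change."""
    return similarity(old, new) >= cutoff


if __name__ == "__main__":
    print(likely_shown_as_changed("value = 10", "value = 20"))
    print(likely_shown_as_changed("abcd", "wxyz"))
```

Pairs that score below the cutoff would be expected to appear as a separate delete and add rather than as one changed line.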

For this application both files had all the same lines (by
definition), so what I was interested in was what parts of each
line changed.

-- 
Grant Edwards                   grante             Yow! I just heard the
                                  at               SEVENTIES were over!!  And
                               visi.com            I was just getting in touch
                                                   with my LEISURE SUIT!!


Any way to adjust difflib algorithm?

2009-08-14 Thread Grant Edwards
I'm trying to use difflib to compare two files, and it's not
producing very useful results.  When comparing two lines where
only a few characters have changed, it usually seems to decide
that a line was deleted/inserted/replaced rather than changed.

Here's how I'm using it:

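   #!/usr/bin/env python3
   import difflib
   import sys

   def read_lines(path):
       # Read a file into a list of lines with trailing newlines stripped.
       with open(path) as f:
           return [line.rstrip('\n') for line in f]

   fromlines = read_lines(sys.argv[1])
   tolines = read_lines(sys.argv[2])

   # Write a complete side-by-side HTML diff page to stdout.
   print(difflib.HtmlDiff().make_file(fromlines, tolines))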
   
In my particular usage, no lines have ever been
inserted/deleted, so perhaps I should be running diffs on
individual lines instead?  If I do that, I can't figure out how
to generate HTML output.

Is there a way to tell the differ to try harder to match lines?

-- 
Grant Edwards                   grante             Yow! I hope something GOOD
                                  at               came in the mail today so
                               visi.com            I have a REASON to live!!


Re: Any way to adjust difflib algorithm?

2009-08-14 Thread Chris Rebert
On Fri, Aug 14, 2009 at 2:38 PM, Grant Edwards inva...@invalid wrote:
> I'm trying to use difflib to compare two files, and it's not
> producing very useful results.  When comparing two lines where
> only a few characters have changed, it usually seems to decide
> that a line was deleted/inserted/replaced rather than changed.
<snip>
> Is there a way to tell the differ to try harder to match lines?

You could use a wordwise diff instead: http://www.gnu.org/software/wdiff/
Obviously that's not a pure Python solution though.
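
For comparison, a word-wise diff can be approximated in pure Python by feeding lists of words, rather than strings, to difflib.SequenceMatcher. This is only a sketch: the function name word_diff is hypothetical, the [-...-] and {+...+} markers imitate wdiff's default output style, and splitting on whitespace discards the original spacing.

```python
import difflib


def word_diff(old, new):
    """Mark up word-level changes between two lines, loosely in wdiff's style."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    out = []
    # get_opcodes() describes how to turn old_words into new_words.
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(old_words[i1:i2])
            continue
        if tag in ("delete", "replace"):
            out.append("[-%s-]" % " ".join(old_words[i1:i2]))
        if tag in ("insert", "replace"):
            out.append("{+%s+}" % " ".join(new_words[j1:j2]))
    return " ".join(out)


if __name__ == "__main__":
    print(word_diff("the quick brown fox", "the slow brown fox"))
```

Because the comparison works on whole words, a one-character change marks the entire word as replaced, which may or may not be the granularity wanted here.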

Cheers,
Chris
-- 
http://blog.rebertia.com


Re: Any way to adjust difflib algorithm?

2009-08-14 Thread Grant Edwards
On 2009-08-14, Grant Edwards inva...@invalid wrote:

> I'm trying to use difflib to compare two files, and it's not
> producing very useful results.  When comparing two lines where
> only a few characters have changed, it usually seems to decide
> that a line was deleted/inserted/replaced rather than changed.

[...]

> In my particular usage, no lines have ever been
> inserted/deleted, so perhaps I should be running diffs on
> individual lines instead?  If I do that, I can't figure out
> how to generate HTML output.

I ended up using the SequenceMatcher on individual pairs of
lines and generating my own HTML based on the results of
get_matching_blocks().

That produced the desired results.
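
The code for this was not posted, so the following is only a minimal sketch of the approach described, not the actual implementation. It assumes both files have the same number of lines, and the function names, the "chg" CSS class, and the table layout are all hypothetical choices. Each pair of lines goes through SequenceMatcher, and whatever falls between the blocks reported by get_matching_blocks() is wrapped in a span.

```python
import difflib
from html import escape


def highlight_pair(old, new):
    """Return (old_html, new_html) with non-matching spans wrapped in <span>."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    old_out, new_out = [], []
    old_pos = new_pos = 0
    # get_matching_blocks() yields (a, b, size) triples; the last one is a
    # zero-length sentinel at (len(old), len(new)), which flushes the tails.
    for a, b, size in matcher.get_matching_blocks():
        if old_pos < a:
            old_out.append('<span class="chg">%s</span>' % escape(old[old_pos:a]))
        if new_pos < b:
            new_out.append('<span class="chg">%s</span>' % escape(new[new_pos:b]))
        old_out.append(escape(old[a:a + size]))
        new_out.append(escape(new[b:b + size]))
        old_pos, new_pos = a + size, b + size
    return "".join(old_out), "".join(new_out)


def diff_rows(fromlines, tolines):
    """Build one table row per line pair; assumes equal-length inputs."""
    rows = []
    for old, new in zip(fromlines, tolines):
        old_html, new_html = highlight_pair(old, new)
        rows.append("<tr><td>%s</td><td>%s</td></tr>" % (old_html, new_html))
    return "<table>\n%s\n</table>" % "\n".join(rows)
```

Calling diff_rows(fromlines, tolines) on the two line lists returns a bare table; a stylesheet rule for the assumed "chg" class is still needed to make the changed spans visible.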

-- 
Grant Edwards                   grante             Yow! I have a very good
                                  at               DENTAL PLAN.  Thank you.
                               visi.com