Re: Any way to adjust difflib algorithm?

2009-08-19 Thread Aahz
In article fmqdncomnpg-jrjxnz2dnuvz_tti4...@posted.visi,
Grant Edwards  inva...@invalid wrote:
>On 2009-08-14, Grant Edwards inva...@invalid wrote:
>
>> In my particular usage, no lines have ever been
>> inserted/deleted, so perhaps I should be running diffs on
>> individual lines instead?  If I do that, I can't figure out
>> how to generate HTML output.
>
>I ended up using the SequenceMatcher on individual pairs of
>lines and generating my own HTML based on the results of
>get_matching_blocks().
>
>That produced the desired results.

Good work!  Note that IME most diff software shows changed lines as a
delete-and-add.  For example, diff -u
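
To illustrate the delete-and-add presentation without leaving Python, here is a minimal sketch using difflib.unified_diff, the standard library's counterpart to diff -u. The file names and sample lines are made up for the example.

```python
import difflib

# Two versions of a small "file"; only the middle line differs.
old = ["alpha\n", "beta\n", "gamma\n"]
new = ["alpha\n", "BETA\n", "gamma\n"]

# unified_diff yields header lines, a hunk marker, then the body.
# The changed line appears as a "-" line followed by a "+" line.
diff = list(difflib.unified_diff(old, new,
                                 fromfile="old.txt", tofile="new.txt"))

if __name__ == "__main__":
    print("".join(diff), end="")
```

The changed line shows up as one removed line and one added line, with no indication of which characters within it differ.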
-- 
Aahz (a...@pythoncraft.com)   * http://www.pythoncraft.com/

Given that C++ has pointers and typecasts, it's really hard to have a
serious conversation about type safety with a C++ programmer and keep a
straight face.  It's kind of like having a guy who juggles chainsaws
wearing body armor arguing with a guy who juggles rubber chickens wearing
a T-shirt about who's in more danger.  --Roy Smith
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Any way to adjust difflib algorithm?

2009-08-19 Thread Grant Edwards
On 2009-08-19, Aahz a...@pythoncraft.com wrote:
> In article fmqdncomnpg-jrjxnz2dnuvz_tti4...@posted.visi,
> Grant Edwards  inva...@invalid wrote:
>>On 2009-08-14, Grant Edwards inva...@invalid wrote:
>>
>>> In my particular usage, no lines have ever been
>>> inserted/deleted, so perhaps I should be running diffs on
>>> individual lines instead?  If I do that, I can't figure out
>>> how to generate HTML output.
>>
>>I ended up using the SequenceMatcher on individual pairs of
>>lines and generating my own HTML based on the results of
>>get_matching_blocks().
>>
>>That produced the desired results.
>
> Good work!  Note that IME most diff software shows changed
> lines as a delete-and-add.  For example, diff -u

Right -- though difflib did show _some_ lines as changed rather
than deleted/added, it wasn't obvious how it decided between
the two.  I suspect it used some sort of percentage-changed
threshold.
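
A rough way to see such a threshold in action is sketched below. As far as I can tell from the CPython source, difflib.Differ only pairs two lines as "changed" (and emits the "? " hint lines) when their SequenceMatcher ratio exceeds a cutoff of about 0.75. That figure is an implementation detail, not documented behaviour, so treat it as an assumption. The sample lines and the helper name are made up for the example.

```python
import difflib

def compare_pair(old, new):
    """Return (similarity ratio, Differ output) for one pair of lines."""
    ratio = difflib.SequenceMatcher(None, old, new).ratio()
    lines = list(difflib.Differ().compare([old], [new]))
    return ratio, lines

# Nearly identical lines: Differ reports an intraline change ("? " hints).
near_ratio, near_lines = compare_pair("timeout = 30 seconds\n",
                                      "timeout = 45 seconds\n")

# Very different lines: Differ falls back to a plain delete-and-add.
far_ratio, far_lines = compare_pair("timeout = 30 seconds\n",
                                    "retries: unlimited\n")

if __name__ == "__main__":
    for ratio, lines in ((near_ratio, near_lines), (far_ratio, far_lines)):
        print("ratio = %.2f" % ratio)
        print("".join(lines), end="")
```

Lines whose ratio falls under the cutoff come out as a bare "- " / "+ " pair, which would explain why only some lines were shown as changed.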

For this application both files had all the same lines (by
definition), so what I was interested in was what parts of each
line changed.

-- 
Grant Edwards                   grante             Yow! I just heard the
                                  at               SEVENTIES were over!!  And
                               visi.com            I was just getting in touch
                                                   with my LEISURE SUIT!!


Re: Any way to adjust difflib algorithm?

2009-08-14 Thread Chris Rebert
On Fri, Aug 14, 2009 at 2:38 PM, Grant Edwards inva...@invalid wrote:
> I'm trying to use difflib to compare two files, and it's not
> producing very useful results.  When comparing two lines where
> only a few characters have changed, it usually seems to decide
> that a line was deleted/inserted/replaced rather than changed.
<snip>
> Is there a way to tell the differ to try harder to match lines?

You could use a wordwise diff instead: http://www.gnu.org/software/wdiff/
Obviously that's not a pure Python solution though.
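
If a pure Python approach is preferred, a word-wise comparison can be approximated by handing SequenceMatcher lists of words instead of strings. The sketch below is not wdiff itself. It only borrows wdiff's default [-deleted-] / {+inserted+} markers, and the function name is made up for the example.

```python
import difflib

def word_diff(old, new):
    """Word-level diff of two lines, marked up in the style of wdiff."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, a, b)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a[i1:i2])
            continue
        # "replace", "delete" and "insert" all reduce to these two cases.
        if i1 != i2:
            out.append("[-" + " ".join(a[i1:i2]) + "-]")
        if j1 != j2:
            out.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(out)

if __name__ == "__main__":
    print(word_diff("the quick brown fox", "the quick red fox"))
```

Splitting on whitespace discards the original spacing, so this is only suitable when exact whitespace does not matter.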

Cheers,
Chris
-- 
http://blog.rebertia.com


Re: Any way to adjust difflib algorithm?

2009-08-14 Thread Grant Edwards
On 2009-08-14, Grant Edwards inva...@invalid wrote:

> I'm trying to use difflib to compare two files, and it's not
> producing very useful results.  When comparing two lines where
> only a few characters have changed, it usually seems to decide
> that a line was deleted/inserted/replaced rather than changed.

[...]

> In my particular usage, no lines have ever been
> inserted/deleted, so perhaps I should be running diffs on
> individual lines instead?  If I do that, I can't figure out
> how to generate HTML output.

I ended up using the SequenceMatcher on individual pairs of
lines and generating my own HTML based on the results of
get_matching_blocks().

That produced the desired results.
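
The actual code was not posted, so the following is only a minimal sketch of that approach, written for Python 3. It runs SequenceMatcher on one pair of lines and wraps whatever falls between the blocks from get_matching_blocks() in a span. The function name and the "chg" CSS class are invented for the example.

```python
import difflib
from html import escape

def highlight_pair(old, new):
    """Return (old_html, new_html) with the non-matching runs marked up."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    old_parts, new_parts = [], []
    i = j = 0  # end of the previous matching block in old / new
    # The final block is always a zero-length sentinel at
    # (len(old), len(new)), so trailing differences are handled too.
    for a, b, size in matcher.get_matching_blocks():
        if i < a:  # text only in the old line
            old_parts.append('<span class="chg">%s</span>' % escape(old[i:a]))
        if j < b:  # text only in the new line
            new_parts.append('<span class="chg">%s</span>' % escape(new[j:b]))
        old_parts.append(escape(old[a:a + size]))
        new_parts.append(escape(new[b:b + size]))
        i, j = a + size, b + size
    return "".join(old_parts), "".join(new_parts)

if __name__ == "__main__":
    for html_line in highlight_pair("speed = 100", "speed = 257"):
        print(html_line)
```

Since both files have the same number of lines in this use case, the pairs can simply come from zip(old_lines, new_lines), with each result written into a table row.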

-- 
Grant Edwards                   grante             Yow! I have a very good
                                  at               DENTAL PLAN.  Thank you.
                               visi.com