Hi Lie, On Jun 5, 2:53 am, Lie Ryan <lie.1...@gmail.com> wrote: > On 06/05/10 15:43, GZ wrote: > > > > > > > On Jun 4, 8:37 pm, Lie Ryan <lie.1...@gmail.com> wrote: > >> On06/05/10 07:51, GZ wrote: > >>> No, rsync does not solve my problem. > > >>> I want a library that does unix 'diff' like function, i.e. compare two > >>> strings line by line and output the difference. Python's difflib does > >>> not work perfectly for me, because the resulting differences are > >>> pretty big. I would like an algorithm that generates the smallest > >>> differences. > > >> is n=0 not short enough? > > >> pprint.pprint(list(difflib.context_diff(s, t, n=0))) > > > This still does not do what I want it to do. It only displays the diff > > results in a different format. I want a different algorithm to > > generate a smaller diff -- in other words less differences > > No, I meant I was confirming that you already have turned off context > lines (i.e. the n=0 part), right? > > Also, what's the nature of the changes? You might be able to minimize > difflib's output by using word-based or character-based diff-ing instead > of traditional line-based diff-ing. > > diff output is fairly compressable, so you might want to look at zipping > the output.- Hide quoted text - > > - Show quoted text -
Thanks for your response. The verboseness of the format is not really my problem and I only care about line by line comparison for now. Let me think of a better way to express what I mean by a "smaller diff." After I diff the two strings, I will have something like this: AAA - BBB + CCC + DDD - EEE It means the first line does not change, the second line is replaced by the third line, the forth line is new, and the fifth line is deleted. I define the "smallness" of the diff algorithm as "the sum of the total number of minuses and pluses". In my above example, it is 4 (two minuses and 2 pluses). Note that no matter what format we use to represent the diff, this number is the same. Python's difflib does not really minimize this number. It tries to make this number small, but also tries to yield matches that “look right” to people at the cost of increasing this number. (http:// docs.python.org/library/difflib.html). What I am looking for is an algo that can really minimize this number. -- http://mail.python.org/mailman/listinfo/python-list