Re: Which Diff tool could I use for visually comparing two text files where Word Wrap is possible?

davidson Wed, 05 Apr 2023 08:53:16 -0700

On Wed, 5 Apr 2023 Susmita/Rajib wrote:

On 04/04/2023, davidson <david...@freevolt.org> wrote:

[trimmed email headers]
[trimmed preliminary negotiation of what would constitute a solution]

That is, you'd like to be shown as many characters on one screen as
possible, without a lot of wastefully empty margins.

(I expect I have overstated your intent here. Do correct me.)


There can absolutely be no need for correcting you. You have extracted
what I really meant from my poor choice of words. Yes, more "column
width" would reflect in "longer lines" without wasting empty margins.
PERFECT. Thank you indeed.

It looks to me like icdiff tries to remain faithful to the source
comparands, to the files you request it to compare.


By "faithful", I mean two things:

 1. icdiff will *wrap* lines so that all characters (relevant to the
    diff context) make it onto the screen in whichever half of the
    screen they belong. This entails the insertion of line breaks that
    are not in the source files. This is nonetheless faithful because
    if icdiff did not do this, then it would be unable to display all
    characters in the sources (relevant to the diff context).

 2. icdiff will not *remove* line breaks that are present in the
    source files. This would not be faithful. Line breaks are
    characters too.

Its job is to accurately display the distinctions between two files
for you. If it *removed* certain characters to make things prettier
for you, it would sabotage its ability to accomplish its task in full
generality.

In other words, it would make itself less useful.

Your wish to fill the margins with text requires removal of line
breaks. Because icdiff is faithful to its input, you must arrange to
remove those line breaks from the input you provide to icdiff. It will
not do it for you.

[trimmed definition of naive flow-text function]

So the final steps should look like this:
Define a unique function:
icdiff-flowed () { icdiff <( tr '\n' ' ' <"$1" ) <( tr '\n' ' ' <"$2" ) ; }

Then use that function:
icdiff-flowed file1 file2 | less -R

Perfect.


Well, not quite, as you discovered. We removed ALL the newlines, so
icdiff had to process a pair of lines many thousands of characters
long.

But I received an error when the lxterminal screen was the default
size:
RecursionError: maximum recursion depth exceeded while calling a
Python object


I don't think icdiff was designed for lines that long.

When the screen was maximised, I received an output, but all
line-breaks, paragraph breaks,

               ^^^^^^^^^^^^^^^^

If you want to preserve paragraph breaks, you need to arrange for
that. If we remove all newlines, we remove all paragraph breaks.

So you want to remove *some* newlines, but preserve others.

distinctions, separate colours, et al, were made into two colours,
one for the new file and one for the old.


Yeah, it looked like garbage.

Maybe the translation of '\n' into a blank space ' ' is creating the
problem, by removing all formatting.


We literally removed all the newlines. Replaced them with
spaces. That's what "tr" did, and that's *all* it did. The formatting
that you perceive to be removed consisted of nothing but newlines.
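
To see this on a small scale, here is a minimal sketch. The helper name
"flatten" is made up for this illustration; it only wraps the same tr
call used in icdiff-flowed above.

```shell
# Hypothetical helper: flatten stdin exactly as the tr call above does.
flatten () { tr '\n' ' '; }

# Three lines of text and one blank line go in ...
printf 'one\ntwo\n\nthree\n' | flatten

# ... and no newline characters come out, so the paragraph break is
# gone too. wc -l counts newline characters.
printf 'one\ntwo\n\nthree\n' | flatten | wc -l
```

The blank line between "two" and "three" survives only as a second
space, which is why icdiff ends up with one enormous line per file.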

Could the creators/maintainers be contacted to amend the program to
adjust column width? Is there a way to set icdiff's column width?


If you read the man page, you will see that there is. But I expect
that you will be disappointed to discover that setting the column
width does not do what you want.

I expect, in fact, that you will find that icdiff *by default* already
sets its width optimally, for whatever dimensions your terminal has at
the time you invoke it.

What you probably want is flowed text.

If you remove paragraph-internal newlines (and *only* those newlines)
from the input you provide to icdiff, then icdiff will wrap the
paragraphs naturally (i.e., insert newlines to keep the paragraphs
displayed within the available columns), as needed.
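
As a rough illustration of "paragraph-internal newlines only", here is
a sketch that assumes paragraphs are separated by blank lines. The
function name "flow_paragraphs" is hypothetical, and the sketch uses
awk's paragraph mode (RS set to the empty string).

```shell
# Hypothetical helper. ASSUMPTION: paragraphs are separated by one or
# more blank lines, and every other newline is paragraph-internal.
flow_paragraphs () {
    awk 'BEGIN { RS = ""; ORS = "\n\n" }   # read and write whole paragraphs
         { gsub(/\n/, " "); print }        # join the lines of one paragraph
        ' "$@"
}

# Two paragraphs of two lines each become two long lines,
# still separated by a blank line.
printf 'a\nb\n\nc\nd\n' | flow_paragraphs
```

Note that this sketch collapses runs of blank lines into one and does
not treat an indented line as the start of a new paragraph, so it may
or may not suit your documents.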

And since they are *your* paragraphs, *you* are the one who knows how
to remove "paragraph-internal" newlines. I do not recommend trying to
harass the author of a decent general-purpose tool into flowing them
for you.

Attached (unless the listserv software has nuked it) is a sed script
"flow" (with verbose comments) which might serve your needs. (Since
you have not exhibited here any of the text you are working with, I
can only play the role of speculative optimist.)

For trial purposes make a new, empty directory. Here we'll pretend
that directory is called "testing". Put "flow" in that directory. Then
do

 $ cd testing # Make testing your current directory
 $ chmod u+x flow # Make flow executable
 $ PATH="$PATH:$PWD" # Now "flow" means something, for this session
 $ icdiff-flow () { icdiff <( flow <"$1" ) <( flow <"$2" ) ; }

and then you should be able to test it out in that same shell session:

 $ flow document # see if flow works as intended with a single document
 $ icdiff-flow document1 document2 # see if it works well with icdiff

Could you please change the html files you had used for
experimentation into text files and then run the experiment again?

[trimmed signoff]

That would serve no purpose. Html files are already text files.

--
Hackers are free people. They are like artists. If they are in a good
mood, they get up in the morning and begin painting their pictures.
-- Vladimir Putin

#!/usr/bin/env -S sed -f

# Flow text. (Remove intra-paragraph newlines.)

# First line of document.
# Initialise storage with first line.
1 {
    h       # Copy first line to storage
    $!d     # Unless it is also the last line, delete original and begin new cycle
    # Single-line document: skip to the end of the script, printing the line.
    b
}

# Non-empty lines that begin with non-whitespace.
# We assume these belong to a paragraph accumulating in storage.
/^[^[:blank:]]/ {
    # Copy the current line to storage (after appending a newline to
    # whatever is already stored there)
    H
    $!d     # Unless this is the last line, delete original and begin new cycle
    # Final line of document: retrieve the last paragraph from storage,
    # flow it, and skip to the end of the script so that it gets printed.
    g
    s/\(.\)\n\(.\)/\1 \2/g    # Replace every interstitial newline with a space
    b
}

# Lines that either
#   1. begin with whitespace
#   -OR-
#   2. are empty
# We assume these begin a new paragraph.
# We further assume that whatever is in storage we may now format (and
# print) as if it were a paragraph.
/^\([[:blank:]]\|$\)/ {
    # SWAP: What was in storage is now on the workbench, and vice versa
    x
    s/\(.\)\n\(.\)/\1 \2/g    # Replace every interstitial newline with a space
    # Final line of document: the line just swapped into storage would
    # otherwise never be printed, so append it to the workbench.
    $ G
}
