On Wed, 5 Apr 2023 davidson wrote:
On Wed, 5 Apr 2023 Susmita/Rajib wrote:
On 04/04/2023, davidson <david...@freevolt.org> wrote:
[trim]

Attached (unless the listserv software has nuked it) is a sed script
"flow" (with verbose comments) which might serve your needs. (Since
you have not exhibited here any of the text you are working with, I
can only play the role of speculative optimist.)

For trial purposes make a new, empty directory. Here we'll pretend
that directory is called "testing". Put "flow" in that directory. Then
do

$ cd testing # Make testing your current directory
$ chmod u+x flow # Make flow executable
$ PATH="$PATH:$PWD" # Now "flow" means something, for this session
$ icdiff-flow () { icdiff <( flow <"$1" ) <( flow <"$2" ) ; }

and then you should be able to test it out in that same shell session:

$ flow document # see if flow works as intended with a single document
$ icdiff-flow document1 document2 # see if it works well with icdiff
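
If you have no suitable text at hand, two small test documents can be made as shown below. The file names match the commands above. The contents are invented for illustration: the same two flush paragraphs, hard-wrapped at different points in each file.

```shell
#!/bin/sh
# Invented sample text (not from the thread): identical paragraphs,
# wrapped at different places in each file. An empty string gives the
# blank line that separates the two flush paragraphs.
printf '%s\n' 'Alpha beta gamma' 'delta epsilon.' '' \
              'Zeta eta' 'theta iota.' > document1
printf '%s\n' 'Alpha beta' 'gamma delta epsilon.' '' \
              'Zeta eta theta' 'iota.' > document2
```

With the version 2.0 script attached below, "flow document1" and "flow document2" should each print

    Alpha beta gamma delta epsilon.

    Zeta eta theta iota.

so icdiff-flow should show the two files as identical. Plain icdiff compares the raw lines and would flag the differently wrapped ones.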

Attached is a more adequate version of "flow", for converting plain
text paragraphs, in flush or plain style*, to single lines. Unlike the
previous version, version 2.0 does not fumble on the last line of the
document and fail to print material before quitting.

* A "plain" paragraph begins with its first line indented, whereas a
  "flush" paragraph is distinguished from its neighbors by blank
  lines.
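
In terms of the script's address patterns, the two styles come out as follows. A line that begins with a non-blank character continues the current paragraph. An empty line, or one that begins with a space or tab, starts a new one. The classify_lines helper below is a made-up illustration that labels lines instead of joining them. It assumes GNU sed, because \| inside a basic regular expression is a GNU extension. The script also handles the document's first line separately, whatever that line begins with.

```shell
#!/bin/sh
# Illustration only (not part of "flow"): label each input line the way
# the two address patterns in "flow" classify it.
#   join | ...  line would be appended to the paragraph in storage (H)
#   new  | ...  line would flush storage and start a new paragraph (x)
classify_lines() {
  sed -e '/^[^[:blank:]]/s/^/join | /' \
      -e '/^\([[:blank:]]\|$\)/s/^/new  | /'
}

# A flush paragraph, a separating blank line, then a plain paragraph.
printf '%s\n' \
  'A flush paragraph is' \
  'wrapped like this.' \
  '' \
  '    A plain paragraph' \
  'starts with an indent.' | classify_lines
```

This prints:

    join | A flush paragraph is
    join | wrapped like this.
    new  | 
    new  |     A plain paragraph
    join | starts with an indent.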

--
Sometimes it pays to have squirrels in your head running around making
you question everything. -- Clive Robinson
#!/usr/bin/env -S sed -f
# Flow text. (Remove intra-paragraph newlines.)
# Version 2.0

# First line of document initialises storage.
1 {
    h       # 1. A copy goes to storage.
    d       # 2. The original (still on the workbench) is discarded and a new cycle begins.
}
# When a line starts with a non-whitespace character,
# we assume it belongs to a paragraph accumulating in storage.
/^[^[:blank:]]/ { 
    H       # 1. A copy goes to storage.
    $ {                           # In case this line terminates the document...
        g                         # ...Get everything out of storage.
        s/\(.\)\n\(.\)/\1 \2/g    # ...Replace every interstitial newline with a space.
        q                         # ...Print and quit NOW.
    }
    d       # 2. Toss out the original (the one still on the workbench) and begin a new cycle.
}
# When a line does not start with a non-whitespace character (i.e., it is empty or begins with whitespace),
# we assume it begins a new paragraph.
# We further assume that whatever is in storage we may now format (and print) as if it were a paragraph.
/^\([[:blank:]]\|$\)/ {
    x                         # 1. Swap: A copy goes to storage, and what was in storage lands on the workbench.
    s/\(.\)\n\(.\)/\1 \2/g    # 2. Format: Replace every interstitial newline with a space. (Then print it.)
    $ {                       # In case this line terminates the document...
        p                     # ...The stuff we just formatted gets printed,
        g                     # ...and then retrieve the line we just stored, and print it too before we quit.
    }
}
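
Both formatting steps depend on the substitution s/\(.\)\n\(.\)/\1 \2/g. The sketch below runs that substitution by itself over sample "storage" contents to show what the \(.\) guards do: a newline with nothing in front of it (the blank line that separates flush paragraphs) is left alone. It also shows an edge case. Because the g flag resumes searching after the end of the previous match, a one-character line inside a paragraph (a lone "I", say) is not joined to the line after it. The join_with helper and the alternative pattern are hypothetical additions for illustration, not part of the attached script, and GNU sed is assumed.

```shell
#!/bin/sh
# Illustration only: apply one sed substitution to a whole multi-line text,
# the way "flow" applies it to the contents of storage after g or x.
# The :a / N / $!ba loop first gathers every input line into the pattern
# space, so the substitution sees the embedded newlines.
join_with() {
  sed -e ':a' -e 'N' -e '$!ba' -e "$1"
}

# Pattern used in "flow": join only where a character sits on both sides.
original='s/\(.\)\n\(.\)/\1 \2/g'
# Hypothetical alternative (not from the script): join after any character.
alternative='s/\(.\)\n/\1 /g'

# Storage after a blank separator line plus a flush paragraph: the leading
# newline has nothing before it, so the blank line survives.
printf '\nZeta eta\ntheta iota.\n' | join_with "$original"

# One-character middle line: the first match already consumed the "I",
# so the search restarts at the next newline and "saw" is not joined.
printf 'and\nI\nsaw\n' | join_with "$original"      # and I / saw
printf 'and\nI\nsaw\n' | join_with "$alternative"   # and I saw
```

The alternative pattern only asks for a character before the newline. Within "flow" that looks safe, because H only ever appends lines that begin with a non-blank character, so an empty line can appear in storage only as its first line. It has not, however, been tried on real documents.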
