On Wed, 5 Apr 2023 davidson wrote:
On Wed, 5 Apr 2023 Susmita/Rajib wrote:
On 04/04/2023, davidson <david...@freevolt.org> wrote:
[trim]
Attached (unless the listserv software has nuked it) is a sed script "flow" (with verbose comments) which might serve your needs. (Since you have not exhibited here any of the text you are working with, I can only play the role of speculative optimist.)

For trial purposes make a new, empty directory. Here we'll pretend that directory is called "testing". Put "flow" in that directory. Then do

  $ cd testing          # Make testing your current directory
  $ chmod u+x flow      # Make flow executable
  $ PATH="$PATH:$PWD"   # Now "flow" means something, for this session
  $ icdiff-flow () { icdiff <( flow <"$1" ) <( flow <"$2" ) ; }

and then you should be able to test it out in that same shell session:

  $ flow document                     # see if flow works as intended with a single document
  $ icdiff-flow document1 document2   # see if it works well with icdiff
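The `<( ... )` process substitution used by icdiff-flow is available in bash, zsh and ksh, but not in a plain POSIX sh. What follows is a minimal sketch of the same comparison using temporary files instead. The names `diff_filtered` and `join_lines` and the demo file names are hypothetical; plain `diff` stands in for icdiff, and `join_lines` stands in for flow so that the example runs without the attachment.

```shell
#!/bin/sh
# Sketch only: compare two documents after passing each through a filter,
# without relying on process substitution.
# $1 = filter command (e.g. flow), $2 and $3 = the documents to compare.
diff_filtered () {
    filter=$1
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 2
    "$filter" <"$2" >"$tmp1"
    "$filter" <"$3" >"$tmp2"
    diff "$tmp1" "$tmp2"
    status=$?
    rm -f "$tmp1" "$tmp2"
    return "$status"
}

# Stand-in filter for this demo only: joins all input lines with spaces.
# With the real script on PATH, pass "flow" instead.
join_lines () { paste -s -d ' ' - ; }

# Two demo documents with the same words but different line breaks.
# (The demo directory is left in place; remove it when done.)
demo_dir=$(mktemp -d)
printf 'one two\nthree\n' >"$demo_dir/document1"
printf 'one\ntwo three\n' >"$demo_dir/document2"

if diff_filtered join_lines "$demo_dir/document1" "$demo_dir/document2"; then
    echo "no differences after filtering"
fi
```

Like diff itself, the function returns 0 when the filtered documents match and 1 when they differ, so it can be used directly in an `if`.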
Attached is a more adequate version of "flow", for converting plain text paragraphs, in flush or plain style*, to single lines.

Unlike the previous version, version 2.0 does not fumble on the last line of the document and fail to print material before quitting.

* A "plain" paragraph begins with its first line indented, whereas a "flush" paragraph is distinguished from its neighbors by blank lines.

-- 
Sometimes it pays to have squirrels in your head running around making you question everything. -- Clive Robinson
#!/usr/bin/env -S sed -f
# Flow text. (Remove intra-paragraph newlines.)
# Version 2.0

# First line of document initialises storage.
1 {
  h   # 1. A copy goes to storage.
  d   # 2. The original (still on the workbench) is discarded and a new cycle begins.
}

# When a line starts with a non-whitespace character,
# we assume it belongs to a paragraph accumulating in storage.
/^[^[:blank:]]/ {
  H   # 1. A copy goes to storage.
  $ { # In case this line terminates the document...
    g   # ...Get everything out of storage.
    s/\(.\)\n\(.\)/\1 \2/g   # ...Replace every interstitial newline with a space.
    q   # ...Print and quit NOW.
  }
  d   # 2. Toss out the original (the one still on the workbench) and begin new cycle
}

# When a line does not start with a non-whitespace character
# (i.e., it is empty or begins with whitespace),
# we assume it begins a new paragraph.
# We further assume that whatever is in storage we may now format
# (and print) as if it were a paragraph.
/^\([[:blank:]]\|$\)/ {
  x   # 1. Swap: A copy goes to storage, and what was in storage lands on the workbench.
  s/\(.\)\n\(.\)/\1 \2/g   # 2. Format: Replace every interstitial newline with a space. (Then print it.)
  $ { # In case this line terminates the document...
    p   # ...The stuff we just formatted gets printed,
    g   # ...and then retrieve the line we just stored, and print it too before we quit.
  }
}
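A quick way to try the logic above on both paragraph styles is sketched here. It is a trial harness, not part of the attachment: the sample texts and the names `work`, `flush_out` and `plain_out` are made up for illustration, the sed commands are a comment-stripped copy of version 2.0, and GNU sed is assumed because the script uses the `\|` alternation extension.

```shell
#!/bin/sh
# Trial run of the flow logic on two tiny documents.
# Assumption: "sed" is GNU sed (needed for \| in the last address).

work=$(mktemp -d)

# Comment-stripped copy of the version 2.0 commands, one per line.
cat >"$work/flow.sed" <<'EOF'
1 {
h
d
}
/^[^[:blank:]]/ {
H
$ {
g
s/\(.\)\n\(.\)/\1 \2/g
q
}
d
}
/^\([[:blank:]]\|$\)/ {
x
s/\(.\)\n\(.\)/\1 \2/g
$ {
p
g
}
}
EOF

# "Flush" sample: two paragraphs separated by a blank line.
flush_out=$(printf 'alpha\nbeta\n\ngamma\ndelta\n' | sed -f "$work/flow.sed")

# "Plain" sample: each paragraph begins with an indented line.
plain_out=$(printf '  one\ntwo\n  three\nfour\n' | sed -f "$work/flow.sed")

printf '%s\n' "$flush_out"
printf '%s\n' "$plain_out"

rm -rf "$work"
```

With these inputs the flush sample should come out as "alpha beta", a blank line, then "gamma delta", and the plain sample as "  one two" followed by "  three four", each paragraph on a single line.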