Richard Owlett wrote:
> On 06/28/2024 03:53 PM, Michael Kjörling wrote:
> > On 28 Jun 2024 14:04 -0500, from rowl...@access.net (Richard Owlett):
> > > I need to replace ANY occurrence of
> > > <span class="verse" id="V1">
> > > thru [at most]
> > > <span class="verse" id="V119">
> > > by
> > > <sup>
> > >
> > > I'm reformatting a Bible stored in HTML format for a particular set of
> > > vision impaired seniors (myself included). Each chapter is in its own
> > > file.
> > >
> > > How do I open a file.
> > > Do the above replacement.
> > > Save and close the file.
> >
> > Ignoring the question about Emacs
>
> Emacs *CAN NOT* be ignored.
> It is the _available_ editor known to be capable of handling regular
> expressions.

If your machine doesn't have sed, it is not a working Debian system.
Every Debian machine comes with sed by default. Even the rescue image
has sed. The installer environment, before Debian is actually
installed, has sed. sed is a basic tool that everyone has access to.
emacs needs to be installed, and often is not.

I know from past experience that it's useless to offer you any
solution that deviates from the vision you have for the way the world
ought to work, but this is a sufficiently common kind of problem that
a full answer will be useful to other people.

> > and focusing on the goal (your
> > question otherwise is an excellent example of a XY question), this is
> > not something regular expressions are very good at.
>
> HUH ??????????

An XY question is when someone asks "How can I do specific thing X?"
but what they want to do is task Y, which is more easily accomplished
in a different way that doesn't involve X at all. Usually this means
that they have read something that tells them about X in a different
context, and they think that X is an essential part of solving their
Y problem.

If we're lucky, they tell us what Y is. Frequently, XY questions just
show up as "How do I do X?" without context. It happens a lot on this
mailing list.

Or, maybe your expression of disbelief was about regular expressions?

A regular expression (regexp) is a pattern, written in a specific
kind of formal language, that describes sequences of characters --
what we often call "strings". If the regexp describes a candidate
string, we call that a "match". A common editing task is to find all
the matches for a regexp and replace them with some other string. The
program "grep" takes its name from a sequence of editor commands,
g/re/p: global regular expression print.

Michael says that regexps aren't great at this particular task
because there's a variable component in the pattern which is hard to
describe. He comes up with a clever solution based on the fact that
the variable component is going to be an integer sequence.

> > However, since
> > it's presumably a once-only operation, I assume that you can live with
> > it being done in a suboptimal way in terms of performance.
> >
> > In that case, assuming for simplicity that all the files are in a
> > single directory, you could try something similar to:
> >
> > $ for v in $(seq 1 119); do sed -i 's,<span class="verse"
> > id="V'$v'">,<sup>,g' ./*.html; done

This sets up a loop which will execute 119 times, stepping the
variable $v from 1 to 119.

Inside the loop, it calls `sed` to edit in place (-i), which means it
will change the files it is given rather than writing the modified
text to standard output.

The command passed to sed is

    s,<span class="verse" id="V'$v'">,<sup>,g

s means string substitution. It takes a pattern, a replacement, and
options, separated by whatever character follows the s, which in this
case is a comma.

    <span class="verse" id="V$v">

is the pattern. Because of the loop, $v is going to be replaced by
its value by the shell before sed sees this, so on successive runs
through the loop sed will see:

    <span class="verse" id="V1">
    <span class="verse" id="V2">
    ...
    <span class="verse" id="V118">
    <span class="verse" id="V119">

You'll probably need to adjust this for other books.

Anyway, whenever sed sees the pattern above, it will replace it with:

    <sup>

which is what you said you wanted.

The option "g" means that sed should replace every match on a line
(globally, like grep) instead of the default behavior, which is to
change only the first match on each line. Either way, sed processes
every line of the file.

./*.html tells sed to operate on all the files in the current
directory ending in .html -- yes, shells implement their own, simpler
pattern language for matching file names (globbing), a cousin of
regexps.

And that's the end of the loop.

> I'll have to investigate sed further.
> My project is not yet to the point of automatically editing ALL chapters. I
> need to first establish how to edit all VERSES of an individual chapter.
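
For a single chapter, a minimal sketch of the same loop might look like
the following. The scratch directory, the file name chapter.html, and
its contents are invented here for illustration; substitute a copy of a
real chapter file. It assumes GNU sed (the Debian default), since -i is
not part of POSIX sed.

```shell
#!/bin/sh
# Hypothetical scratch copy standing in for one chapter file;
# the real files are not shown in this thread.
workdir=$(mktemp -d)
chapter="$workdir/chapter.html"
cat > "$chapter" <<'EOF'
<p><span class="verse" id="V1">1</span> first <span class="verse" id="V2">2</span> second</p>
<p><span class="verse" id="V119">119</span> last</p>
EOF

# Same loop as in the quoted command, but naming one file instead of ./*.html.
# "$v" is quoted outside the single quotes so the shell expands it.
for v in $(seq 1 119); do
    sed -i 's,<span class="verse" id="V'"$v"'">,<sup>,g' "$chapter"
done

# Only the opening <span ...> tags are rewritten; the rest is left alone.
cat "$chapter"
```

Working on a copy first makes it easy to compare the result against the
untouched original before running the loop on the real file.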

The solution Michael presented can be run on just one file instead of
all the .html files in the current directory.

> ROFL ;} No one would define me as a "programmer". I took an introduction to
> computers course as a E.E. student in the 60's. Most of my jobs required
> background in component level analog electronics. Got one assignment because
> I was not "afraid" of 8080 ;}

The true UNIX philosophy is that at any moment, any user can stop
being "just a user" and use the tools present to do some programming
to solve their problems.

-dsr-