Richard Owlett wrote: 
> On 06/28/2024 03:53 PM, Michael Kjörling wrote:
> > On 28 Jun 2024 14:04 -0500, from rowl...@access.net (Richard Owlett):
> > > I need to replace ANY occurrence of
> > >      <span class="verse" id="V1">
> > >        thru [at most]
> > >      <span class="verse" id="V119">
> > > by
> > >      <sup>
> > > 
> > > I'm reformatting a Bible stored in HTML format for a particular set of
> > > vision impaired seniors (myself included). Each chapter is in its own 
> > > file.
> > > 
> > > How do I open a file.
> > > Do the above replacement.
> > > Save and close the file.
> > 
> > Ignoring the question about Emacs
> 
> Emacs *CAN NOT* be ignored.
> It is the _available_ editor known to be capable of handling regular
> expressions.

If your machine doesn't have sed, it is not a working Debian
system. 

Every Debian machine comes with sed by default.  Even the
rescue image has sed. The installer environment, before Debian
is actually installed, has sed. sed is a basic tool that
everyone has access to. emacs needs to be installed, and often
is not.

I know from past experience that it's useless to offer you any
solution that deviates from the vision you have for the way the
world ought to work, but this is a sufficiently common kind of
problem that a full answer will be useful to other people.

> > and focusing on the goal (your
> > question otherwise is an excellent example of a XY question), this is
> > not something regular expressions are very good at.
> 
> HUH ??????????

An XY question is when someone asks "How can I do specific thing
X?" but what they want to do is task Y, which is more easily
accomplished in a different way that doesn't involve X at all.
Usually this means that they have read something that tells them
about X in a different context, and they think that is an
essential part of solving their Y problem.

If we're lucky, they tell us what Y is. Frequently, XY questions
just show up as "How do I do X?" without context.

It happens a lot on this mailing list.

Or, maybe your expression of disbelief was about regular
expressions? A regular expression (regexp) is a compact notation
for describing a pattern of characters. If a piece of text (what
we often call a "string") fits the pattern, we call that a
"match". A common editing task is to find all the matches for a
regexp and replace them with some other string.
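
As a small illustration, using two sample lines I made up rather
than anything from your files, grep can print only the lines
that a regexp matches:

```shell
# Two invented sample lines; only the first contains a verse marker.
sample='<span class="verse" id="V12">text
<p>no verse marker here</p>'

# -E selects extended regexps; [0-9]+ means "one or more digits".
matches=$(printf '%s\n' "$sample" | grep -E 'id="V[0-9]+"')
printf '%s\n' "$matches"
```

Only the first sample line is printed, because it is the only
one where V is followed by digits inside id="...".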

The program "grep" takes its name from the ed editor command
g/re/p: globally search for a regular expression and print the
matching lines.

Michael says that regexps aren't great at this particular task
because there's a variable component in the pattern which is
hard to describe. He comes up with a clever solution based on
the fact that the variable component is going to be an integer
sequence.


> > However, since
> > it's presumably a once-only operation, I assume that you can live with
> > it being done in a suboptimal way in terms of performance.
> > 
> > In that case, assuming for simplicity that all the files are in a
> > single directory, you could try something similar to:
> > 
> > $ for v in $(seq 1 119); do sed -i 's,<span class="verse" id="V'$v'">,<sup>,g' ./*.html; done
 
This sets up a loop which will execute 119 times, with the
variable $v taking each value from 1 to 119 in turn. Inside the
loop, it calls `sed` with -i (edit in place), which means sed
changes the files it is given rather than writing the result to
standard output.
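
If you want to watch the loop work without risking any files,
here is a sketch of the same loop shape with a much smaller
range and echo in place of sed (both substitutions are mine,
purely for demonstration):

```shell
# Same for/seq structure, but it only prints; nothing on disk changes.
out=$(for v in $(seq 1 3); do echo "pass $v"; done)
printf '%s\n' "$out"
```

Each pass prints the current value of $v, which is exactly the
number that ends up inside the sed pattern in the real command.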

The command passed to sed is

s,<span class="verse" id="V'$v'">,<sup>,g

s is sed's substitute command. It takes a pattern, a
replacement, and optional flags, separated by whatever character
follows the s, which in this case is a comma.
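
Why a comma rather than the more usual slash? HTML is full of
slashes, and each one would need escaping. A sketch with an
invented one-line input shows both spellings doing the same job:

```shell
# With "/" as the delimiter, the slash inside </b> and </i> must be escaped.
with_slash=$(printf '%s\n' 'a</b>' | sed 's/<\/b>/<\/i>/')

# With "," as the delimiter, the same tags can be written as-is.
with_comma=$(printf '%s\n' 'a</b>' | sed 's,</b>,</i>,')

printf '%s\n%s\n' "$with_slash" "$with_comma"
```

Both commands produce the same result; the comma version is just
easier to read.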

<span class="verse" id="V$v">

is the pattern. Because of the loop, the value $v is going to be
replaced by the shell before sed sees this, so on various runs
through the loop sed will see:

<span class="verse" id="V1">
<span class="verse" id="V2">
...
<span class="verse" id="V118">
<span class="verse" id="V119">


You'll probably need to adjust the range (1 to 119) for other
books.
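
To see exactly what string the shell hands to sed on a given
pass, you can build the same argument by hand and print it. The
value 7 is an arbitrary choice for the demonstration:

```shell
v=7
# The single-quoted pieces are literal; the bare $v between them is
# expanded by the shell, and the pieces are joined into one argument.
cmd='s,<span class="verse" id="V'$v'">,<sup>,g'
printf '%s\n' "$cmd"
```

This prints the sed command with V7 filled in, which shows that
sed itself never sees "$v" at all.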

Anyway, whenever sed sees the pattern above, it will replace it
with:

<sup>

which is what you said you wanted.

The option "g" means that sed should do this every time the
pattern occurs on a line (globally, like grep) instead of the
default behavior, which is to change only the first match on
each line.
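
A quick way to see what g changes, using an invented sample
with the same tag twice on one line:

```shell
line='<span class="verse" id="V1">a <span class="verse" id="V1">b'

# Without g: only the first match on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's,<span class="verse" id="V1">,<sup>,')

# With g: every match on the line is replaced.
every=$(printf '%s\n' "$line" | sed 's,<span class="verse" id="V1">,<sup>,g')

printf '%s\n%s\n' "$first_only" "$every"
```

The first result still contains one untouched span tag; the
second has none left.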

./*.html

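tells sed to operate on all the files in the current directory
ending in .html -- the shell expands this before sed ever runs.
Shells have their own, simpler pattern language for file names
(called globbing); it is similar in spirit to regexps but is not
the same syntax. And that's the end of the loop.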
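
You can check what a file pattern expands to by giving it to
echo first. This sketch uses a scratch directory and made-up
file names:

```shell
tmp=$(mktemp -d)
touch "$tmp/gen01.html" "$tmp/gen02.html" "$tmp/notes.txt"

# The shell replaces ./*.html with the matching names before echo runs.
listed=$(cd "$tmp" && echo ./*.html)
echo "$listed"
rm -r "$tmp"
```

notes.txt is not listed, so a sed command given the same pattern
would never be handed that file.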


> I'll have to investigate sed further.
> My project is not yet to the point of automatically editing ALL chapters. I
> need to first establish how to edit all VERSES of an individual chapter.

The solution Michael presented can be run on just one file
instead of all the .html files in the current directory.
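
A sketch of that, with a copy made first so the original is
still around if something goes wrong. The name gen01.html and
its contents are placeholders of mine, and everything happens in
a scratch directory:

```shell
tmp=$(mktemp -d)
printf '%s\n' '<span class="verse" id="V2">And the earth' > "$tmp/gen01.html"

# Keep an untouched copy before editing in place.
cp "$tmp/gen01.html" "$tmp/gen01.html.orig"

# Same loop as before, but naming one file instead of ./*.html
for v in $(seq 1 119); do
    sed -i 's,<span class="verse" id="V'$v'">,<sup>,g' "$tmp/gen01.html"
done

edited=$(cat "$tmp/gen01.html")
saved=$(cat "$tmp/gen01.html.orig")
printf '%s\n' "$edited"
rm -r "$tmp"
```

The copy is made once, before the loop, rather than with sed's
-i.bak form, because inside a loop each pass would overwrite the
backup with the previous pass's output.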


> ROFL ;} No one would define me as a "programmer". I took an introduction to
> computers course as a E.E. student in the 60's. Most of my jobs required
> background in component level analog electronics. Got one assignment because
> I was not "afraid" of 8080 ;}

The true UNIX philosophy is that at any moment, any user can
stop being "just a user" and use the tools present to do some
programming to solve their problems. 


-dsr-
