Re: Looping Shell Scripts and System Load

2020-06-25 Thread David
On Thu, 25 Jun 2020 at 19:23, Tixy  wrote:
> On Wed, 2020-06-24 at 13:43 -0400, Greg Wooledge wrote:

> [Lots of good shell scripting advice snipped]

> Thanks Greg for posting these code reviews of people's scripts; it's
> not just the script authors who might learn something, but also some
> of us list subscribers. :-)

Another excellent resource for that:
https://www.shellcheck.net/
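
The same checks can be run locally with Debian's shellcheck
package; a minimal invocation, with myscript.sh standing in for
whatever script is being checked:

    shellcheck myscript.sh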



Re: Looping Shell Scripts and System Load

2020-06-25 Thread Tixy
On Wed, 2020-06-24 at 13:43 -0400, Greg Wooledge wrote:
[Lots of good shell scripting advice snipped]

Thanks Greg for posting these code reviews of people's scripts; it's
not just the script authors who might learn something, but also some
of us list subscribers. :-)

-- 
Tixy




Re: Looping Shell Scripts and System Load

2020-06-24 Thread Martin McCormick
Greg Wooledge  writes:
> All-caps names are reserved for environment variables (HOME, PATH),
> and internal shell variables (IFS, PWD, HISTFILE).
> 
> Avoiding all-caps names allows you to avoid collisions with a variable
> name that might be used for something else.  Most of the time.  This
> being the Unix shell, there are *always* stupid exceptions (http_proxy
> and friends on the environment side, and auto_resume and histchars in
> bash).

Thank you.  I learned something today that I didn't
expect to learn, which is why I posted the question in the
first place.  I just wasn't thinking, I guess, and used the
all-caps names to indicate that they stood for files.  If one
collided with an environment variable name, it could make the
script fail in strange ways that would be totally unpredictable,
depending on which variable was preempted.

Martin



Re: Looping Shell Scripts and System Load

2020-06-24 Thread David Christensen

On 2020-06-24 10:19, Martin McCormick wrote:

I wrote a shell script that unzips documents.  I originally
wrote it so that it gets document #1, unzips it, then gets
document #2, and so on, and it does that just fine, so I
wondered whether I could make it run faster by starting several
processes at once, each one unzipping a file.  It's certainly
still running and will eventually finish, but I created a
monster, because it starts as many processes as there are items
to unzip.

#!/bin/sh
unarchive ()  {
  unzip $1
return 0
}
MEDIADIR=`pwd`
mountpoint /mags >/dev/null  ||mount /mags
mountpoint /mags >/dev/null || exit 1
cd /mags
#rm -r -f *
  for MEDIAFILE in `ls $MEDIADIR/*`; do
dirname=`basename $MEDIAFILE`
mkdir $dirname
cd $dirname
unarchive $MEDIAFILE &
cd ../
done
wait
cd ~
umount /mags
exit 0

If there are 3 zipped files, it's probably going to be ok
and start 3 unzip processes.  This directory had 13 zip files and
the first 2 or 3 roared to life and then things slowed down as
they all tried to run.

I expected this, and I've been writing Unix shell scripts
for 31 years as of this summer, so it is no mystery: each new
job spawns a whole new set of processes to unzip the file it is
working on while all the others are still grinding on.

Miscreants have been known to deliberately create
loops that keep starting processes until the system crashes.

Fortunately, this is one of my own systems, but it made me
wonder, before I reinvent the wheel, whether there is a way to
make a shell script throttle itself based on current load, so
that it keeps consuming resources only until starting the next
iteration would bog things down.  When some of the earlier or
shorter processes finish, the loop could resume and start more
unzips until all are done.

Right now, uptime looks like:

  11:48:07 up 26 days, 23:10,  7 users,  load average: 16.15, 15.60, 10.65

That's pretty loaded, so ideally one could start the
looping script and it would fire up processes until things got
really busy, then refuse to start any new ones until some had
stopped, so that cron and other system utilities don't stop
running, which is what happens when systems get too busy.

Thanks for any constructive suggestions.

Martin McCormick   WB5AGZ


GNU Parallel looks like a possibility:

https://packages.debian.org/stretch/parallel

https://www.gnu.org/software/parallel/
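
As a rough sketch of how that might look here (a hypothetical
command, assuming the archives end in .zip; not taken from the
thread): parallel defaults to one job per CPU core, and its
--load option can additionally hold off new jobs while the load
average is above a limit, which is close to the throttle Martin
asked for.

    find "$mediadir" -maxdepth 1 -name '*.zip' -print0 |
        parallel -0 --load 8 'mkdir -p {/.} && cd {/.} && unzip {}'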


David



Re: Looping Shell Scripts and System Load

2020-06-24 Thread Kamil Jońca
Greg Wooledge  writes:

> On Wed, Jun 24, 2020 at 08:23:18PM +0200, Roger Price wrote:
>> On Wed, 24 Jun 2020, Greg Wooledge wrote:
>> 
>> > > MEDIADIR=`pwd`
>> > 
>> > Don't use all caps variable names.
>> 
>> Without getting into syntax-religious wars, what is the reasoning behind
>> this recommendation?  Roger
>
> All-caps names are reserved for environment variables (HOME, PATH),
> and internal shell variables (IFS, PWD, HISTFILE).
Is that described somewhere in the docs?
>
> Avoiding all-caps names allows you to avoid collisions with a variable
> name that might be used for something else.  Most of the time.  This
Well, I think a reasonable prefix ("KJ_" :) ) prevents this :)

KJ

-- 
http://stopstopnop.pl/stop_stopnop.pl_o_nas.html



Re: Looping Shell Scripts and System Load

2020-06-24 Thread Greg Wooledge
On Wed, Jun 24, 2020 at 08:23:18PM +0200, Roger Price wrote:
> On Wed, 24 Jun 2020, Greg Wooledge wrote:
> 
> > > MEDIADIR=`pwd`
> > 
> > Don't use all caps variable names.
> 
> Without getting into syntax-religious wars, what is the reasoning behind
> this recommendation?  Roger

All-caps names are reserved for environment variables (HOME, PATH),
and internal shell variables (IFS, PWD, HISTFILE).

Avoiding all-caps names allows you to avoid collisions with a variable
name that might be used for something else.  Most of the time.  This
being the Unix shell, there are *always* stupid exceptions (http_proxy
and friends on the environment side, and auto_resume and histchars in
bash).
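
As a two-line illustration of the failure mode in a script
(hypothetical, not from Greg's mail):

    PATH=/media/archive   # meant as "the path to my archives"...
    ls                    # ...but command lookup uses PATH, so this
                          # now fails with "ls: command not found"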



Re: Looping Shell Scripts and System Load

2020-06-24 Thread Roger Price

On Wed, 24 Jun 2020, Greg Wooledge wrote:


> > MEDIADIR=`pwd`
>
> Don't use all caps variable names.

Without getting into syntax-religious wars, what is the reasoning behind this 
recommendation?  Roger




Re: Looping Shell Scripts and System Load

2020-06-24 Thread D. R. Evans
Martin McCormick wrote on 6/24/20 11:19 AM:

> 
>   Right now, uptime looks like:
> 
>  11:48:07 up 26 days, 23:10,  7 users,  load average: 16.15, 15.60, 10.65
> 
> That's pretty loaded, so ideally one could start the
> looping script and it would fire up processes until things got
> really busy, then refuse to start any new ones until some had
> stopped, so that cron and other system utilities don't stop
> running, which is what happens when systems get too busy.
> 
>   Thanks for any constructive suggestions.
> 

My general approach is to use sem to create N-1 parallel jobs, where N is the
number of CPUs on the machine.

Not /exactly/ what you're asking for, but something along those lines would
probably help the situation. At least, it works for me :-)
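
For reference, sem ships with GNU parallel; a sketch of that
approach, with the .zip pattern assumed rather than taken from
Doc's setup:

    for f in "$mediadir"/*.zip; do
        sem -j-1 unzip "$f"   # -j-1 = one job fewer than the core count
    done
    sem --wait                # block until every queued job finishes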

  Doc

-- 
Web:  http://enginehousebooks.com/drevans





Re: Looping Shell Scripts and System Load

2020-06-24 Thread Greg Wooledge
On Wed, Jun 24, 2020 at 01:24:23PM -0400, Roberto C. Sánchez wrote:
> I recommend you look at the parallel package.  It is specifically geared
> toward parallelization of constructed shell command lines.  Think
> something along the lines of "find ... -exec ..." but with the ability
> to parallelize (in a way that considers the available CPU cores on your
> system).

In all honesty, many of us have *tried* to find a use for GNU parallel,
but it really seems to be a hammer looking for a nail.  The aggressive
self-promotion, the huge blob of nagging that it does... all of that, and
it's not even better than GNU xargs -P for 99% of the jobs that need a
tiny bit of parallelization.
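
For the unzip case, the xargs -P version would be something along
these lines (a sketch, not quoted from anyone's mail; note it
extracts everything into the current directory):

    find "$mediadir" -maxdepth 1 -name '*.zip' -print0 |
        xargs -0 -n1 -P3 unzip   # at most 3 unzips at a time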



Re: Looping Shell Scripts and System Load

2020-06-24 Thread Greg Wooledge
On Wed, Jun 24, 2020 at 12:19:30PM -0500, Martin McCormick wrote:
> #!/bin/sh

Why?  Use bash.

> unarchive ()  {
>  unzip $1

Quotes.  

> MEDIADIR=`pwd`

Don't use all caps variable names.

Don't use backticks.  Use $() for command substitution.

Don't use $(pwd) to get the current directory.  It's in the PWD variable
already.
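
In other words, the assignment can simply be:

    mediadir=$PWD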

> mountpoint /mags >/dev/null  ||mount /mags
> mountpoint /mags >/dev/null || exit 1
> cd /mags

Check the result of cd.  Exit if it fails.  cd /mags || exit 1

> #rm -r -f *
>  for MEDIAFILE in `ls $MEDIADIR/*`; do

Do not use ls.  

Quotes again.  

What you want is:   for mediafile in "$mediadir"/*; do

> dirname=`basename $MEDIAFILE`
> mkdir $dirname
> cd $dirname

Quotes, quotes, quotes.  

Always check the result of a cd.  cd "$dirname" || exit 1

> unarchive $MEDIAFILE &

Quotes!  

>   If there are 3 zipped files, it's probably going to be ok
> and start 3 unzip processes.  This directory had 13 zip files and
> the first 2 or 3 roared to life and then things slowed down as
> they all tried to run.

https://mywiki.wooledge.org/ProcessManagement has some examples
for writing "run n jobs at a time".  We found some newer ways as well,
and those haven't all made it to the wiki yet.

One of the better ones is:

13:40 =greybot> Run N processes in parallel (bash 4.3): i=0 n=5; for elem in "${array[@]}"; do if (( i++ >= n )); then wait -n; fi; my_job "$elem" & done; wait

In your script, that would be something like:

#!/bin/bash
# Requires bash 4.3 or higher.

# cd and mount and stuff

i=0 n=3
for f; do   # "for f" with no list iterates over the script's arguments
  # Once n jobs are running, wait for any one of them to finish
  # before launching the next; wait -n is the bash 4.3 feature.
  if ((i++ >= n)); then wait -n; fi
  unarchive "$f" &
done
wait


If you have to target older versions of bash, see the ProcessManagement
page on the wiki for alternatives.
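
One such alternative (sketched here from memory, not copied from
the wiki) polls the job table until a slot frees up:

    n=3
    for f in "$mediadir"/*; do
        # jobs -pr lists the PIDs of running background jobs.
        while [ "$(jobs -pr | wc -l)" -ge "$n" ]; do
            sleep 1
        done
        unarchive "$f" &
    done
    wait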



Re: Looping Shell Scripts and System Load

2020-06-24 Thread Roberto C. Sánchez
On Wed, Jun 24, 2020 at 12:19:30PM -0500, Martin McCormick wrote:
> I wrote a shell script that unzips documents.  I originally
> wrote it so that it gets document #1, unzips it, then gets
> document #2, and so on, and it does that just fine, so I
> wondered whether I could make it run faster by starting several
> processes at once, each one unzipping a file.  It's certainly
> still running and will eventually finish, but I created a
> monster, because it starts as many processes as there are items
> to unzip.
> 

I recommend you look at the parallel package.  It is specifically geared
toward parallelization of constructed shell command lines.  Think
something along the lines of "find ... -exec ..." but with the ability
to parallelize (in a way that considers the available CPU cores on your
system).

Regards,

-Roberto

-- 
Roberto C. Sánchez



Looping Shell Scripts and System Load

2020-06-24 Thread Martin McCormick
I wrote a shell script that unzips documents.  I originally
wrote it so that it gets document #1, unzips it, then gets
document #2, and so on, and it does that just fine, so I
wondered whether I could make it run faster by starting several
processes at once, each one unzipping a file.  It's certainly
still running and will eventually finish, but I created a
monster, because it starts as many processes as there are items
to unzip.

#!/bin/sh
unarchive ()  {
 unzip $1
return 0
}
MEDIADIR=`pwd`
mountpoint /mags >/dev/null  ||mount /mags
mountpoint /mags >/dev/null || exit 1
cd /mags
#rm -r -f *
 for MEDIAFILE in `ls $MEDIADIR/*`; do
dirname=`basename $MEDIAFILE`
mkdir $dirname
cd $dirname
unarchive $MEDIAFILE &
cd ../
done
wait
cd ~
umount /mags
exit 0

If there are 3 zipped files, it's probably going to be ok
and start 3 unzip processes.  This directory had 13 zip files and
the first 2 or 3 roared to life and then things slowed down as
they all tried to run.

I expected this, and I've been writing Unix shell scripts
for 31 years as of this summer, so it is no mystery: each new
job spawns a whole new set of processes to unzip the file it is
working on while all the others are still grinding on.

Miscreants have been known to deliberately create
loops that keep starting processes until the system crashes.

Fortunately, this is one of my own systems, but it made me
wonder, before I reinvent the wheel, whether there is a way to
make a shell script throttle itself based on current load, so
that it keeps consuming resources only until starting the next
iteration would bog things down.  When some of the earlier or
shorter processes finish, the loop could resume and start more
unzips until all are done.

Right now, uptime looks like:

 11:48:07 up 26 days, 23:10,  7 users,  load average: 16.15, 15.60, 10.65

That's pretty loaded, so ideally one could start the
looping script and it would fire up processes until things got
really busy, then refuse to start any new ones until some had
stopped, so that cron and other system utilities don't stop
running, which is what happens when systems get too busy.
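
A gate along these lines, checked before launching each new
unzip, would be one untested sketch of that idea:

    max_load=8
    throttle () {
        # Sleep while the 1-minute load average is at or above max_load.
        local load
        read -r load _ < /proc/loadavg
        while awk -v l="$load" -v m="$max_load" 'BEGIN { exit !(l >= m) }'
        do
            sleep 5
            read -r load _ < /proc/loadavg
        done
    }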

Thanks for any constructive suggestions.

Martin McCormick   WB5AGZ