Re: need help making shell script use two CPUs/cores

2011-01-25 Thread Stan Hoeppner
Carl Johnson put forth on 1/24/2011 5:07 PM:
 Stan Hoeppner s...@hardwarefreak.com writes:
 
 Now we have 4 CPUs on two memory channels.  If not for caches, you'd see
 no speedup past 2 Imagemagick processes.  Which is pretty much the behavior
 identified by another OP with an Athlon II x4 system--almost zero speedup
 from 2 to 4 processes.
 
 I think you are referring to the data that I posted for my Athlon II x4
 system, but that is *NOT* what the data showed.  I thought that the data
 clearly showed pretty good scaling up to 4 processors, so I don't know
 what you are seeing that everybody else is missing.  I will copy some of
 the data below, but basically it showed that total time almost cut in
 half when it went from 1 to 2 processors, and again when it went from 2
 to 4 processors.
 
 Processors  Time (seconds)
 P1  66
 P2  36
 P4  20

Perfect scaling here would be a run time of 16.5 seconds with 4 processes/cores
with this particular sample set of photos.  20 is 1/4th of 80, and closer to
1/3rd of 66.  This isn't close to linear scaling, although it is a little better
than I expected from this particular CPU.  One can clearly see the effects of
memory contention at only 2 processes, and that trend continues out to 4
processes, getting progressively worse, as expected.  Past four processes,
likely at 5, and on from there, you'll see little, and then no scaling at all.
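Spelling that arithmetic out, for anyone who wants to redo it on their own
numbers: a quick sketch (using the three times quoted above; awk just computes
speedup = T1/TN and per-process efficiency = T1/(N*TN)):

printf '1 66\n2 36\n4 20\n' | awk '{ if (NR==1) base=$2;
  printf "P%s  speedup %.2fx  efficiency %.0f%%\n", $1, base/$2, 100*base/($2*$1) }'

By that measure the per-process efficiency drops from 100% at one process to
roughly 92% at two and 82% at four.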

I must admit I am a bit surprised that a quad core AMD with only 512KB L2 per
core, and no L3, scales as well as it does to 4 processes.  The images in this
sample test are relatively small though, so cache size probably isn't that much
in play.  With larger image sizes I'm guessing we'd see less scaling than we do
here, as many more reads to main memory will be required to fetch the pixel
data, and thus memory contention among the 4 cores will be much higher.

-- 
Stan





Re: need help making shell script use two CPUs/cores

2011-01-24 Thread Bob Proulx
Stan Hoeppner wrote:
 Bob Proulx put forth:
  Here is some raw data from another test using GraphicsMagick from Debian
  Sid on an Intel Core2 Quad CPU Q9400 @ 2.66GHz.
  
  #CPUs  real   user  sys
  1 ... 32.17 100.15 2.29
  2 ... 28.02 102.09 2.25
  3 ... 26.96 101.41 2.02
  4 ... 26.18  99.85 2.10
  5 ... 26.03  98.58 2.27
  6 ... 27.07  97.32 2.17
  7 ... 27.74 100.09 2.03
  8 ... 26.76  97.83 1.99
  9 ... 27.24  97.31 2.88
 10 ... 26.27  99.05 2.76
 11 ... 26.35  99.30 1.84
 12 ... 25.91  97.63 2.08
 
 So, I'm not understanding how we have a quad core CPU with 12 CPUs.
 Is #CPUs here your xargs -P argument in the script you posted in
 response to my question that started this thread?

Sorry.  Yes I made a mistake in posting those headings.  Yes it was
the xargs -P parallelization argument listed where I said #CPUs.
Running a different number of conversion processes in parallel.

 Why bother going up to 12 processes with a quad core chip?  Anything
 over 4 processes/threads won't gain you anything, as your results
 above demonstrate.

I went to 12 because it would demonstrate the behavior three times
past the number of cores.  If I had only a dual core I would have only
chosen to go to 6.  But I would have gone to 6 for one core too since
three doesn't generate a smooth enough scatter plot for me.  But I
didn't want to spend too much time analyzing the problem to set up a
statistically designed experiment.  I just wanted to quickly perform
the test.  So I plugged in 12 there and moved on.  Surely that would be
enough.  I didn't think I would need to rigorously defend that quick
choice against a panel.

At some point by doing more parallelism things will actually be slowed
down by it.  I didn't reach that point.
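For what it's worth, the sweep itself needs nothing fancy.  A rough sketch of
that kind of loop (the originals/ directory, the *.JPG glob and the 1024 resize
target are placeholders, and each pass works on fresh copies because convert
overwrites the files in place):

for p in $(seq 1 12); do
    rm -rf work && mkdir work && cp originals/*.JPG work/
    /usr/bin/time -f "P$p  real %es  user %Us  sys %Ss" \
        sh -c "cd work && ls *.JPG | xargs -P $p -I{} convert {} -resize 1024 {}"
done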

  And the same thing using ImageMagick on the same system.
  
  #CPUs  real  user  sys
  1 ... 24.69 62.60 2.87
  2 ... 19.28 63.17 2.50
  3 ... 17.82 60.34 2.65
  4 ... 17.48 58.86 2.55
  5 ... 16.60 58.11 2.34
  6 ... 15.85 58.03 2.38
  7 ... 15.61 58.09 2.44
  8 ... 15.36 57.68 2.48
  9 ... 15.48 57.76 2.38
 10 ... 15.38 57.76 2.28
 11 ... 15.36 57.97 2.27
 12 ... 15.73 58.76 2.17
  
  Watching the individual cpu load I observe that while the 1 cpu case
  did consume one cpu fully that the other three were also showing quite
  a bit of activity too.  
 
 Imagemagick will use threads on larger images.  To keep it from threading, in
 order for your testing to make more sense, use smaller images.

I couldn't find anything in the ImageMagick documentation that
described its threading behavior.  Where did I miss that useful
information?

For images I used your set of benchmark photos that we have been
discussing in this thread.

  three running all four cpus were looking pretty much 100% consumed.  I
  was timing all of the shell's for loop, the xargs and the convert
  processes all together.
 
 If you are converting images large enough that the threading kicks
 in, there's little reason to use multiple processes at that point.
 We'd already discussed this.  Were you simply trying to confirm that
 with these tests?

I expected that on this machine the memory backplane wouldn't
have enough memory bandwidth to support all four processors.  I expect
it to brown out before getting to four.  Having a quad-core sounds
great but just having four cores doesn't mean all of them can be used
at the same time to advantage.  I expect that the extra cores will
get starved.  And so the curve will drop off sooner than four.

  I also tried running this same test on some slower hardware.  I have
  gotten spoiled by the faster machine.  The benchmark is still running
  on my slower machines. :-)  I am not going to wait for it to finish.
 
 What are the CPU specs of this older machine?

I tested this on an Intel Celeron 2.4GHz machine with 2.5G ram.
Unfortunately I see now that I have lost the saved data from that
test.  (Drat!  I know what I did but I would need to run the test
again to regenerate.)  But an entire run to six parallel conversions
there as I recall took over thirty minutes of total time to complete
and as I recall worked out to being twenty times slower.  Don't hold
me to those numbers as I would need to capture the actual data again
to be sure and I don't want to spend the time to do that.  But it was
slower, much slower.  (This is actually my main web server and
normally does image conversions when I upload photos.  This
information is probably going to motivate me to set up a task queue to
speed up my image conversions there.)

Bob




Re: need help making shell script use two CPUs/cores

2011-01-24 Thread Stan Hoeppner
Bob Proulx put forth on 1/24/2011 12:21 PM:
 Stan Hoeppner wrote:

 Why bother going up to 12 processes with a quad core chip?  Anything
 over 4 processes/threads won't gain you anything, as your results
 above demonstrate.
 
 I went to 12 because it would demonstrate the behavior three times
 past the number of cores.  If I had only a dual core I would have only
 chosen to go to 6.  But I would have gone to 6 for one core too since
 three doesn't generate a smooth enough scatter plot for me.  But I
 didn't want to spend too much time analyzing the problem to set up a
 statistically designed experiment.  I just wanted to quickly perform
 the test.  So I plugged in 12 there and moved on.  Surely that would be
 enough.  I didn't think I would need to rigorously defend that quick
 choice against a panel.

But you'll run out of memory bandwidth before you hit 4 processes, especially if
your 4-way chip has no L3 cache, such as the Athlon II x4 chips.  Going all the
way out to 12 processes seems a bit silly. Even with something like one of
Intel's Core i7s with a monster L3 cache, you'll exhaust your memory and cache
b/w well before you have (#cores*1.5) processes.

 At some point by doing more parallelism things will actually be slowed
 down by it.  I didn't reach that point.

This will probably only occur if you run out of memory and have to swap.  The
overhead of the Linux task scheduler is tiny--we're talking microseconds per
task switch.  And as I mentioned, you're already thrashing your caches at 4
processes, so beyond that point everything is purely memory b/w constrained.
This bandwidth is finite, static.  So no matter how many processes you run
(unless you run more processes than you have images) you probably won't see any
slowdown past 4 processes.
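Whether the extra processes ever push the box into swap is easy enough to
check while a run is going; a small sketch (watch the Swap line, which should
stay flat if you're not swapping):

while sleep 1; do free -m | grep -E '^(Mem|Swap)'; done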

 Imagemagick will use threads on larger images.  To keep it from threading, in
 order for your testing to make more sense, use smaller images.

 I couldn't find anything in the ImageMagick documentation that
 described its threading behavior.  Where did I miss that useful
 information?

See:  http://www.imagemagick.org/script/architecture.php

 For images I used your set of benchmark photos that we have been
 discussing in this thread.

Hmmm.  If you were seeing threading with a single process with those images,
this would lead me to believe the Lenny Imagemagick version doesn't support
threads.  You're running the Squeeze package, correct?  I'm running:

$ identify -version
Version: ImageMagick 6.3.7 11/17/10 Q16 http://www.imagemagick.org
Copyright: Copyright (C) 1999-2008 ImageMagick Studio LLC

According to the docs I should see something like:

$ identify -version
Features: OpenMP OpenCL

but I don't.
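If a build does have OpenMP compiled in, it can also be told to stay
single-threaded for this kind of testing.  A rough sketch (OMP_NUM_THREADS is
the generic OpenMP knob and MAGICK_THREAD_LIMIT the ImageMagick-specific one,
as I understand it; photo.JPG is just a placeholder):

# does this build advertise OpenMP at all?
identify -version | grep -i features

# force a single-threaded conversion for comparison (harmless on builds
# without thread support)
OMP_NUM_THREADS=1 MAGICK_THREAD_LIMIT=1 convert photo.JPG -resize 1024 photo.JPG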

 I expected that on this machine the memory backplane wouldn't
 have enough memory bandwidth to support all four processors.  I expect

None of them do.  Recall when the first socket 939 AMD chips hit the market,
with all mobos having dual channel memory as the controller was on the CPU?  One
core with dual memory channels, and many applications saw huge performance
gains.  Now we have 4 CPUs on two memory channels.  If not for caches, you'd see
no speedup past 2 Imagemagick processes.  Which is pretty much the behavior
identified by another OP with an Athlon II x4 system--almost zero speedup from 2
to 4 processes.

 it to brown out before getting to four.  Having a quad-core sounds
 great but just having four cores doesn't mean all of them can be used
 at the same time to advantage.  I expect that the extra cores will
 get starved.  And so the curve will drop off sooner than four.

This is always the case.  No multicore CPU has enough memory channels to keep
all cores fed on a byte/OP basis.  This is no secret.  It's been well discussed
for many years now.

 I also tried running this same test on some slower hardware.  I have
 gotten spoiled by the faster machine.  The benchmark is still running
 on my slower machines. :-)  I am not going to wait for it to finish.

 What are the CPU specs of this older machine?
 
 I tested this on an Intel Celeron 2.4GHz machine with 2.5G ram.

My test server is a dual Celeron 550 with only 384MB and it doesn't take
anywhere near 30 minutes for that set of test images.  IIRC it only took a few
minutes.

-- 
Stan





Re: need help making shell script use two CPUs/cores

2011-01-24 Thread Carl Johnson
Stan Hoeppner s...@hardwarefreak.com writes:

 Now we have 4 CPUs on two memory channels.  If not for caches, you'd see
 no speedup past 2 Imagemagick processes.  Which is pretty much the behavior
 identified by another OP with an Athlon II x4 system--almost zero speedup
 from 2 to 4 processes.

I think you are referring to the data that I posted for my Athlon II x4
system, but that is *NOT* what the data showed.  I thought that the data
clearly showed pretty good scaling up to 4 processors, so I don't know
what you are seeing that everybody else is missing.  I will copy some of
the data below, but basically it showed that total time almost cut in
half when it went from 1 to 2 processors, and again when it went from 2
to 4 processors.

Processors  Time (seconds)
P1  66
P2  36
P4  20

-- 
Carl Johnson  ca...@peak.org





Re: need help making shell script use two CPUs/cores

2011-01-23 Thread Bob Proulx
Carl Johnson wrote:
 #CPUs  time  theoretical    time-theoretical       gain/CPU (theoretical)
 1      66
 2      36    66/2 = 33      36-33   = 3    (+9%)   1   - 1/2 = 1/2
 3      25    66/3 = 22      25-22   = 3    (+14%)  1/2 - 1/3 = 1/6
 4      20    66/4 = 16.5    20-16.5 = 3.5  (+21%)  1/3 - 1/4 = 1/12

I liked that analysis.

Here is some raw data from another test using GraphicsMagick from Debian
Sid on an Intel Core2 Quad CPU Q9400 @ 2.66GHz.

#CPUs  real   user  sys
1 ... 32.17 100.15 2.29
2 ... 28.02 102.09 2.25
3 ... 26.96 101.41 2.02
4 ... 26.18  99.85 2.10
5 ... 26.03  98.58 2.27
6 ... 27.07  97.32 2.17
7 ... 27.74 100.09 2.03
8 ... 26.76  97.83 1.99
9 ... 27.24  97.31 2.88
   10 ... 26.27  99.05 2.76
   11 ... 26.35  99.30 1.84
   12 ... 25.91  97.63 2.08

And the same thing using ImageMagick on the same system.

#CPUs  real  user  sys
1 ... 24.69 62.60 2.87
2 ... 19.28 63.17 2.50
3 ... 17.82 60.34 2.65
4 ... 17.48 58.86 2.55
5 ... 16.60 58.11 2.34
6 ... 15.85 58.03 2.38
7 ... 15.61 58.09 2.44
8 ... 15.36 57.68 2.48
9 ... 15.48 57.76 2.38
   10 ... 15.38 57.76 2.28
   11 ... 15.36 57.97 2.27
   12 ... 15.73 58.76 2.17

Watching the individual cpu load I observe that while the 1 cpu case
did consume one cpu fully that the other three were also showing quite
a bit of activity too.  There was already quite a bit of parallelism
happening before adding the second cpu, and third, and so forth.  With
three running all four cpus were looking pretty much 100% consumed.  I
was timing all of the shell's for loop, the xargs and the convert
processes all together.

I also tried running this same test on some slower hardware.  I have
gotten spoiled by the faster machine.  The benchmark is still running
on my slower machines. :-)  I am not going to wait for it to finish.

Bob




Re: need help making shell script use two CPUs/cores

2011-01-23 Thread Stan Hoeppner
Bob Proulx put forth on 1/23/2011 8:16 PM:

Apparently I've missed some of the thread since my earlier participation.

 Carl Johnson wrote:
 #CPUs  time  theoretical    time-theoretical       gain/CPU (theoretical)
 1      66
 2      36    66/2 = 33      36-33   = 3    (+9%)   1   - 1/2 = 1/2
 3      25    66/3 = 22      25-22   = 3    (+14%)  1/2 - 1/3 = 1/6
 4      20    66/4 = 16.5    20-16.5 = 3.5  (+21%)  1/3 - 1/4 = 1/12
 
 I liked that analysis.
 
 Here is some raw data from another test using GraphicsMagick from Debian
 Sid on an Intel Core2 Quad CPU Q9400 @ 2.66GHz.
 
 #CPUs  real   user  sys
 1 ... 32.17 100.15 2.29
 2 ... 28.02 102.09 2.25
 3 ... 26.96 101.41 2.02
 4 ... 26.18  99.85 2.10
 5 ... 26.03  98.58 2.27
 6 ... 27.07  97.32 2.17
 7 ... 27.74 100.09 2.03
 8 ... 26.76  97.83 1.99
 9 ... 27.24  97.31 2.88
10 ... 26.27  99.05 2.76
11 ... 26.35  99.30 1.84
12 ... 25.91  97.63 2.08

So, I'm not understanding how we have a quad core CPU with 12 CPUs.  Is #CPUs
here your xargs -P argument in the script you posted in response to my
question that started this thread?  Why bother going up to 12 processes with a
quad core chip?  Anything over 4 processes/threads won't gain you anything, as
your results above demonstrate.

 And the same thing using ImageMagick on the same system.
 
 #CPUs  real  user  sys
 1 ... 24.69 62.60 2.87
 2 ... 19.28 63.17 2.50
 3 ... 17.82 60.34 2.65
 4 ... 17.48 58.86 2.55
 5 ... 16.60 58.11 2.34
 6 ... 15.85 58.03 2.38
 7 ... 15.61 58.09 2.44
 8 ... 15.36 57.68 2.48
 9 ... 15.48 57.76 2.38
10 ... 15.38 57.76 2.28
11 ... 15.36 57.97 2.27
12 ... 15.73 58.76 2.17
 
 Watching the individual cpu load I observe that while the 1 cpu case
 did consume one cpu fully that the other three were also showing quite
 a bit of activity too.  

Imagemagick will use threads on larger images.  To keep it from threading, in
order for your testing to make more sense, use smaller images.

 There was already quite a bit of parallelism
 happening before adding the second cpu, and third, and so forth.  With

See above.  BTW, you weren't adding the 2nd CPU.  You were merely spawning
more _processes_, no?

 three running all four cpus were looking pretty much 100% consumed.  I
 was timing all of the shell's for loop, the xargs and the convert
 processes all together.

If you are converting images large enough that the threading kicks in, there's
little reason to use multiple processes at that point.  We'd already discussed
this.  Were you simply trying to confirm that with these tests?

 I also tried running this same test on some slower hardware.  I have
 gotten spoiled by the faster machine.  The benchmark is still running
 on my slower machines. :-)  I am not going to wait for it to finish.

What are the CPU specs of this older machine?

-- 
Stan





Re: need help making shell script use two CPUs/cores

2011-01-15 Thread Carl Johnson
Stan Hoeppner s...@hardwarefreak.com writes:

 Carl Johnson put forth on 1/13/2011 11:34 AM:

 Processors  Time (seconds)
 P1  66
 P2  36
 P3  25
 P4  20
 P5  20
 P6  20
 P7  20
 P8  20

 Your numbers bear out exactly what I predicted.  Look at the decrease in run
 time from 1 to 2, 2 to 3, and from 3 to 4 processes:

 #CPUs  Decremental run time   Fractional gain per CPU
 2      30s                    1/2
 3      11s                    1/6th
 4       5s                    1/13th

 You can clearly see the effects of serious memory contention when 3 cores are
 pegged.  Bringing the 4th core into the mix yields almost nothing compared to
 three cores, cutting only 5 seconds from a 66 second run time.

I seem to be looking at it in a different way, because the numbers don't
seem that much different from what I would expect.

#CPUs  time  theoretical    time-theoretical       gain/CPU (theoretical)
1      66
2      36    66/2 = 33      36-33   = 3    (+9%)   1   - 1/2 = 1/2
3      25    66/3 = 22      25-22   = 3    (+14%)  1/2 - 1/3 = 1/6
4      20    66/4 = 16.5    20-16.5 = 3.5  (+21%)  1/3 - 1/4 = 1/12

-- 
Carl Johnson  ca...@peak.org





Re: need help making shell script use two CPUs/cores

2011-01-14 Thread Osamu Aoki
On Sun, Jan 09, 2011 at 10:05:43AM -0600, Stan Hoeppner wrote:
 I'm not very skilled at writing shell scripts.
 
 #! /bin/sh
 for k in $(ls *.JPG); do convert $k -resize 1024 $k; done
 
 I use the above script to batch re-size digital camera photos after I
 dump them to my web server.  It takes a very long time with lots of new
 photos as the server is fairly old, even though it is a 2-way SMP,
 because the script only runs one convert process at a time serially,
 only taking advantage of one CPU.  The convert program is part of the
 imagemagick toolkit.
 
 How can I best modify this script so that it splits the overall job in
 half, running two simultaneous convert processes, one on each CPU?
 Having such a script should cut the total run time in half, or nearly
 so, which would really be great.

Not really 2 but ...

Either use make to run a controlled number of processes, or just do
something along the lines of

for k in $(ls *.JPG); do convert $k -resize 1024 $k >log.txt 2>log2.txt ; done
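For the make route, a minimal sketch could look like the following.  It writes
resized copies into out/ so that make has distinct targets instead of
converting in place; run it with make -j2 to keep two converts going at once:

SRC := $(wildcard *.JPG)
OUT := $(patsubst %.JPG,out/%.JPG,$(SRC))

all: $(OUT)

# recipe lines below must start with a tab
out/%.JPG: %.JPG
	mkdir -p out
	convert $< -resize 1024 $@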






Re: need help making shell script use two CPUs/cores

2011-01-14 Thread Stan Hoeppner
Carl Johnson put forth on 1/13/2011 11:34 AM:

 Processors  Time (seconds)
 P1  66
 P2  36
 P3  25
 P4  20
 P5  20
 P6  20
 P7  20
 P8  20
 
 I am sure the time would have increased if the system had run out of
 memory and had to start swapping.  The system is not completely idle
 since I am running a KDE 4.4.5 desktop and VirtualBox with two guest
 OSs (Debian and NetBSD).  I suspect it would have closer to linear
 scaling if the system had been completely idle.

Your numbers bear out exactly what I predicted.  Look at the decrease in run
time from 1 to 2, 2 to 3, and from 3 to 4 processes:

#CPUs   Decremental run time   Fractional gain per CPU
2       30s                    1/2
3       11s                    1/6th
4        5s                    1/13th

You can clearly see the effects of serious memory contention when 3 cores are
pegged.  Bringing the 4th core into the mix yields almost nothing compared to
three cores, cutting only 5 seconds from a 66 second run time.

I'm anxious to see someone's results for a Phenom II X2 with the 6MB L3 cache to
verify my prediction there.  That's a tougher prediction though as I haven't
modeled the cache behavior of Imagemagick's convert program.  And the data above
shows it seems to be very memory b/w heavy.  Such a test would definitely be
very revealing of the effectiveness of the Phenom II X2's L3 cache, given what
we've seen so far.

-- 
Stan





Re: need help making shell script use two CPUs/cores

2011-01-13 Thread Stan Hoeppner
Bob Proulx put forth on 1/12/2011 2:48 PM:

 That makes a lot of sense to me.  Also, since cpu time divides into 1/N
 shares, where N is the number of processes, if you have more convert
 processes running then that task will effectively get more total time
 than the other tasks will.  A little bit more here and a little bit
 less there on the other tasks running.  If you had two converts
 running and one mail task then the mail task would get 1/3rd and the
 two converts would get 2/3rds, as opposed to one convert and one mail
 task with 1/2 and 1/2.

The math isn't quite that simple as it's a 2-way SMP box, and Linux can't
perfectly schedule compute intensive and non compute intensive processes across
CPUs.

 And context switching on a 550 MHz CPU with only 128K L2 cache is
 going to be expensive when two compute intensive tasks are running.
 
 I commend you on keeping that machine running.  My main mail and web
 server was, until the motherboard died very recently, a 400 MHz P2.  I
 was sad to see it go since it had been such a good performer for so
 many years.

I'll be _very_ sad when this one dies.  The Abit BP6 is the only dual Celeron
motherboard ever made.  It is legendary among over clockers due to the SMP
nature, the fact that Celerons were 1/3rd the price of PIIs at the time, and
that 333s were easily bumped to 500, and 366s easily bumped to 550--a 50%
increase in clock speed, usually achievable with stock heat sinks.  No modern
chip will do that AFAIK.  Thus, you got _more_ performance than an equivalent
dual PII workstation which topped out at 450 MHz, for less than 1/3rd the price.
 This single board prompted Intel to disable the SMP circuitry on all future
Celerons.  They'd left it enabled assuming no one would actually build such a
board.  Abit did, and using the venerable Intel 440BX northbridge no less. :)

The BP6 also has a lot of features most other boards were lacking at that time
(1999), including jumperless BIOS configuration of CPU FSB and multiplier,
independent CPU voltage adjustment, a Winbond voltage, thermal, and fan speed
monitoring chip, a dual channel 4 port HighPoint UDMA/66 chip yielding a board
with 8 HDD capability with 4 at UDMA/66 which no other board offered, and an
SMBus header, which _NO_ other consumer board had at that time.

The BP6 was the most exotic high end board on the market for at least a couple
of years.  It used a low end chip, but was faster than anything else at the
time.  Too bad Abit is no more:
http://www.theinquirer.net/inquirer/news/1051283/the-abit-obit

-- 
Stan





Re: need help making shell script use two CPUs/cores

2011-01-13 Thread Carl Johnson
Stan Hoeppner s...@hardwarefreak.com writes:


 Depending on the size of the photos one is converting, if they're relatively
 small like my 8.3MP 1.8MB jpegs, I'd think something like a dual core Phenom 
 II
 X2 w/ 6MB L3 cache and 21.4 GB/s memory b/w would likely continue to scale 
 with
 reduced overall script run time up to 4 parallel convert processes, maybe 
 more,
 due to the excess of L3 cache and the 10.7 GB/s available to each core.

 Conversely, I'd think that a quad core Athlon II X4 with no L3 cache and only
 512KB L2 cache per core, with each core receiving effectively only 5.3 GB/s of
 b/w, would not scale effectively to core_count*2 parallel processes as the
 Phenom II X2 would.  In fact, due to 4 cores with little cache sharing the 
 same
 21.4 GB/s of memory b/w, the quad core Athlon II would probably start seeing a
 decline in reduced run time going from 2 processes to 4 as twice as many cores
 compete for memory access, and tailing off dramatically as the process count 
 is
 increased to 5 and up.

 Just a guess.  Anyone have such systems to test with? :)

I have an Athlon II X4 620 (2.6 GHz), so I ran your test.  It is
somewhat different since I am currently running FreeBSD and didn't want
to reboot to get back into debian, and I have GraphicsMagick instead of
ImageMagick, but that shouldn't change the basic results.  The results
were that the time decreased up to 4 processes, but remained unchanged
after that.

Processors  Time (seconds)
P1  66
P2  36
P3  25
P4  20
P5  20
P6  20
P7  20
P8  20

I am sure the time would have increased if the system had run out of
memory and had to start swapping.  The system is not completely idle
since I am running a KDE 4.4.5 desktop and VirtualBox with two guest
OSs (Debian and NetBSD).  I suspect it would have closer to linear
scaling if the system had been completely idle.

-- 
Carl Johnson  ca...@peak.org





Re: need help making shell script use two CPUs/cores

2011-01-12 Thread Camaleón
On Tue, 11 Jan 2011 15:58:45 -0600, Stan Hoeppner wrote:

 Camaleón put forth on 1/11/2011 9:38 AM:
 
 I supposed you wouldn't care much about getting a script to run faster
 with all the available cores occupied if you had a modern (4 years)
 cpu and plenty of speedy ram, because the routine you wanted to run
 should not take much time... unless you were going to process
 thousands of images :-)
 
 That's a bit ironic.  You're suggesting the solution is to upgrade to a
 new system with a faster processor and memory.  

Why did you get that impression? No, I said I thought you were running a 
resource-scarce machine so in order to simulate your environment I made 
the tests under my VM... nothing more.

 However, all the newer processors have 2, 4, 6, 8, or 12 cores.  So
 upgrading simply for single process throughput would waste all the
 other cores, which was the exact situation I found myself in.

But of course! I would not even think of upgrading the whole computer just 
to get one concrete task done a few more seconds faster.

 The ironic part is that parallelizing the script to maximize performance
 on my system will also do the same for the newer chips, but to an even
 greater degree on those with 4, 6, 8, or 12 cores.  Due to the fact that
 convert doesn't eat 100% of a core's time during its run, and the idle
 time in between one process finishing and xargs starting another, one
 could probably run 16-18 parallel convert processes on a 12 core Magny
 Cours with this script before run times stop decreasing.

I think the script should also work very well with single-core cpus.
 
 The script works.  It cut my run time by over 50%.  I'm happy.  As I
 said, this system's processing power is complete overkill 99% of the
 time.  It works beautifully with pretty much everything I've thrown at
 it, for 8 years now.  If I _really_ wanted to maximize the speed of this
 photo resizing task I'd install Win32 ImageMagick on my 2GHz Athlon XP
 workstation with dual channel memory nForce2 mobo, convert them on the
 workstation, and copy them to the server.
 
 However, absolute maximum performance of this task was not, and is not,
 my goal.  My goal was to make use of the second CPU, which was sitting idle
 in the server, to speed up the task completion.  That goal was accomplished. :)

Yeah, and tests are there to demonstrate the gain.

 Running more processes than real cores seems fine, did you try it?

 Define fine.
 
 Fine = system not hogging all resources.
 
 I had run 4 (2 core machine) and run time was a few seconds faster than
 2 processes, 3 seconds IIRC.  Running 8 processes pushed the system into
 swap and run time increased dramatically.  Given that 4 processes only
 ran a few seconds faster than two, yet consumed twice as much memory,
 the best overall number of processes to run on this system is two.

Maybe the best number of processes is system-dependent (old processors 
could work better with a conservative value, but newer ones can gain some 
extra seconds with a higher one without experiencing any significant 
penalty).
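One way to make that choice follow the machine instead of hard-coding it (a
sketch; nproc ships with newer coreutils, getconf is the older fallback):

# one convert per available core, falling back to 2 if detection fails
N=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)
ls *.JPG | xargs -P "$N" -I{} convert {} -resize 1024 {}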

Greetings,

-- 
Camaleón





Re: need help making shell script use two CPUs/cores

2011-01-12 Thread Bob Proulx
Stan Hoeppner wrote:
 Bob Proulx put forth:
  when otherwise it would be waiting for the disk.  I believe what you
  are seeing above is the result of being able to compute during that
  small block on I/O wait for the disk interval.
 
 That's gotta be a very small iowait interval.  So small, in fact, it
 doesn't show up in top at all.  I've watched top a few times during
 these runs and I never see iowait.

I would expect it to be very small.  So small that you won't see it by
eye when looking at it with top.  Motion pictures run at 24 frames per
second.  That is quite good enough for your eye to see it as
continuous motion.  But to a computer 1/24th of a second is a long
time.  I don't think you will be able to observe this by looking at it
with top and a one second update interval.

 I assumed the gain was simply because, watching top, each convert
 process doesn't actually fully peg the cpu during the entire process
 run life.  Running one or two more processes in parallel with the
 first two simply gives the kernel scheduler the opportunity to run
 another process during those idle ticks.

Uhm...  But that is pretty much exactly what I said!  :-) Doesn't
actually fully peg the cpu is because eventually it will need to
block on I/O from the disk.  The process will run until it either
blocks or is interrupted at the end of its timeslice.  Do you propose
other reasons for the process not to fully peg the cpu than for I/O
waits?

 There is also the time gap between a process exiting and xargs
 starting up the next one.

But what would be the cause of that gap?  Waiting on disk to load the
executable?  (Actually it should be cached into filesystem buffer
cache and and not have to wait for the disk.)  AFAIK there isn't any
gap there.  (Actually as long as there is another convert process in
memory then the next one will start very quickly by being able to
reuse the same memory code pages.)

 I have no idea how much time that takes.  But all the little bits
 add up in the total execution time of all 35 processes.

Yes.  All of the little bits add up and I believe accounts for the
decrease in total wall-clock time from start to finish.  A small but
measurable value.

And I think we were in agreement about everything else.  :-)

Bob




Re: need help making shell script use two CPUs/cores

2011-01-12 Thread Stan Hoeppner
Camaleón put forth on 1/12/2011 3:56 AM:
 On Tue, 11 Jan 2011 15:58:45 -0600, Stan Hoeppner wrote:
 
 Camaleón put forth on 1/11/2011 9:38 AM:

 I supposed you wouldn't care much about getting a script to run faster
 with all the available cores occupied if you had a modern (4 years)
 cpu and plenty of speedy ram, because the routine you wanted to run
 should not take much time... unless you were going to process
 thousands of images :-)

 That's a bit ironic.  You're suggesting the solution is to upgrade to a
 new system with a faster processor and memory.  
 
 Why did you get that impression? No, I said I thought you were running a 
 resource-scarce machine so in order to simulate your environment I made 
 the tests under my VM... nothing more.

My bad Camaleón.  I misunderstood what you said.  My apologies.

 However, all the newer processors have 2, 4, 6, 8, or 12 cores.  So
 upgrading simply for single process throughput would waste all the
 other cores, which was the exact situation I found myself in.
 
 But of course! I would not even think of upgrading the whole computer just 
 to get one concrete task done a few more seconds faster.

This depends on the task, of course.  In my case it just wouldn't make sense,
just as you say.  I've managed some systems that we'd upgrade every two years
because of a single application that never seemed to have enough horsepower
under the hood.  HPC compute centers seem to follow this trend.  There's never
enough cycles or enough nodes for many of them.

 The ironic part is that parallelizing the script to maximize performance
 on my system will also do the same for the newer chips, but to an even
 greater degree on those with 4, 6, 8, or 12 cores.  Due to the fact that
 convert doesn't eat 100% of a core's time during its run, and the idle
 time in between one process finishing and xargs starting another, one
 could probably run 16-18 parallel convert processes on a 12 core Magny
 Cours with this script before run times stop decreasing.
 
 I think the script should also work very well with single-core cpus.

This might depend on the hardware, but as I mentioned, it looks like the convert
program doesn't use 100% CPU during its run, so yes, using the xargs script to
fire up two concurrent convert processes with the kernel time slicing would
probably decrease overall run time to some degree.

 Yeah, and tests are there to demonstrate the gain.

Which is always a big plus.  No guess work. :)

 I had run 4 (2 core machine) and run time was a few seconds faster than
 2 processes, 3 seconds IIRC.  Running 8 processes pushed the system into
 swap and run time increased dramatically.  Given that 4 processes only
 ran a few seconds faster than two, yet consumed twice as much memory,
 the best overall number of processes to run on this system is two.
 
 Maybe the best number of processes is system-dependent (old processors 
 could work better with a conservative value, but newer ones can gain some 
 extra seconds with a higher one without experiencing any significant 
 penalty).

I don't have the machines here to confirm that hypothesis, but knowledge and
experience tell me you're exactly correct.  The reasons why you're correct are
tied mostly to available L2/L3 cache bandwidth, and memory size and bandwidth.
On my SUT, one convert process at its peak easily consumes more than half the
memory bandwidth, which is why I only see about a 50% speedup in run time using 2
processes, one running on each CPU, instead of a ~100% (2x) speedup.  Each 550 MHz
Celeron CPU only has 128KB of L2 cache.  System memory bandwidth of the 440BX
chipset is only 800 MB/s.

Depending on the size of the photos one is converting, if they're relatively
small like my 8.3MP 1.8MB jpegs, I'd think something like a dual core Phenom II
X2 w/ 6MB L3 cache and 21.4 GB/s memory b/w would likely continue to scale with
reduced overall script run time up to 4 parallel convert processes, maybe more,
due to the excess of L3 cache and the 10.7 GB/s available to each core.

Conversely, I'd think that a quad core Athlon II X4 with no L3 cache and only
512KB L2 cache per core, with each core receiving effectively only 5.3 GB/s of
b/w, would not scale effectively to core_count*2 parallel processes as the
Phenom II X2 would.  In fact, due to 4 cores with little cache sharing the same
21.4 GB/s of memory b/w, the quad core Athlon II would probably start seeing a
decline in reduced run time going from 2 processes to 4 as twice as many cores
compete for memory access, and tailing off dramatically as the process count is
increased to 5 and up.

Just a guess.  Anyone have such systems to test with? :)

-- 
Stan





Re: need help making shell script use two CPUs/cores

2011-01-12 Thread Stan Hoeppner
Bob Proulx put forth on 1/12/2011 1:11 PM:
 Stan Hoeppner wrote:
 Bob Proulx put forth:
 when otherwise it would be waiting for the disk.  I believe what you
 are seeing above is the result of being able to compute during that
 small block on I/O wait for the disk interval.

 That's gotta be a very small iowait interval.  So small, in fact, it
 doesn't show up in top at all.  I've watched top a few times during
 these runs and I never see iowait.
 
 I would expect it to be very small.  So small that you won't see it by
 eye when looking at it with top.  Motion pictures run at 24 frames per
 second.  That is quite good enough for your eye to see it as
 continuous motion.  But to a computer 1/24th of a second is a long
 time.  I don't think you will be able to observe this by looking at it
 with top and a one second update interval.

My point wasn't that not seeing it meant that it wasn't happening.  I'm sure I'd
have seen something had I run iostat.  But being that small, with a total script
run time of over a minute, how does the IO wait time come into play, to any
significant degree, if the total IO wait is maybe 2 seconds?
(apt analogy btw--good for others who may not have understood otherwise)

 I assumed the gain was simply because, watching top, each convert
 process doesn't actually fully peg the cpu during the entire process
 run life.  Running one or two more processes in parallel with the
 first two simply gives the kernel scheduler the opportunity to run
 another process during those idle ticks.
 
 Uhm...  But that is pretty much exactly what I said!  :-) Doesn't
 actually fully peg the cpu is because eventually it will need to
 block on I/O from the disk.  The process will run until it either
 blocks or is interrupted at the end of its timeslice.  Do you propose
 other reasons for the process not to fully peg the cpu than for I/O
 waits?

Yes, I do.  I've not looked at the code so I can't say for sure.  However,
watching top (yes, not that accurate) during the runs showed periods of multiple
seconds where each convert process was only running at 60% CPU. Then it would
bump back to 100%.  IIRC this happened multiple times.  Considering this is an
image processing program, I would _assume_ the entire image file is loaded into
memory upon startup.  After processing is complete the image file is written
out.  I don't see why the process would be accessing disk during its run,
especially with these small 1.8 MB jpg files.  Thus, I am guessing that there
are a couple of code routines in the conversion process that just don't peg the
CPU.  Or, is it possible that memory contention between the two CPUs causes this
less-than-100% CPU usage reported in top, and that when each is running at 100%
CPU most of the workload is actually fitting in that tiny 128KB L2 cache?  I'm not
a top expert.  If a process blocks on a memory wait, does the kernel still report
the process as 100% CPU, or lower?

Anyway, these are the two possible reasons I propose for the less than 100% CPU
usage of the convert processes.  I'm making educated guesses here, not stating 
fact.
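A finer-grained view than top's one-second refresh would settle it.  Something
like pidstat from the sysstat package, plus iostat for the system-wide iowait,
would do (a sketch, assuming sysstat is installed; exact columns vary by
version):

# per-second CPU usage of every running convert process
pidstat -u -C convert 1

# system-wide CPU breakdown, including %iowait, once a second
iostat -c 1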

 There is also the time gap between a process exiting and xargs
 starting up the next one.
 
 But what would be the cause of that gap?  Waiting on disk to load the
 executable?  (Actually it should be cached into filesystem buffer
 cache and and not have to wait for the disk.)  AFAIK there isn't any
 gap there.  (Actually as long as there is another convert process in
 memory then the next one will start very quickly by being able to
 reuse the same memory code pages.)

As you said, top's 1 second interval, and the manner in which it displays what
is happening, may be masking what's really going on.  What I've stated was
looking at the %CPU for each process, not the summary area %CPU.  Likely what I
described as a gap was merely one convert PID dying and another starting up at
another location further up the screen.  With each of these things occurring in
a different frame that would explain the appearance of a time gap.

So, I'd say I was wrong in describing that as a time gap.  I'd have to do some
testing with other tools to absolutely verify all of this.  Frankly I'd rather
not waste the time on it at this point.  You solved my original problem, Bob!
Thanks again.  That was the important takeaway here.  Now we're into minutiae
(which can be fun, but I'm spending way too much time on debian-user email the
last few days).

 I have no idea how much time that takes.  But all the little bits
 add up in the total execution time of all 35 processes.
 
 Yes.  All of the little bits add up and I believe accounts for the
 decrease in total wall-clock time from start to finish.  A small but
 measurable value.
 
 And I think we were in agreement about everything else.  :-)

Yep.  Chalk all this up to incorrect data due to insufficient frame rate. :)

Ahh, something else I just realized.  Feel free to slap me if you like. :)

Given this is a production mx mail and web server, it's 

Re: need help making shell script use two CPUs/cores

2011-01-12 Thread Bob Proulx
Stan Hoeppner wrote:
 Frankly I'd rather not waste the time on it at this point.  You
 solved my original problem, Bob!  Thanks again.  That was the
 important takeaway here.  Now we're into minutiae (which can be fun,
 but I'm spending way too much time on debian-user email the last few
 days).

Glad to have been able to help with your original problem!  And I
agree, I am spending way too much time here too.  Need to get other
work done. :-)

 Ahh, something else I just realized.  Feel free to slap me if you like. :)

I missed that too.

 Given this is a production mx mail and web server, it's very likely that 
 daemons
 awoke and ate some CPU without causing a highlight change in top.  Since I was
 intensely watching the convert processes, I may not have noticed, or simply
 ignored them.  That's a better explanation for the less than 100% CPU per
 convert process than anything else, and far more likely.  smtpd, imapd,
 lighttpdd, etc are frequently firing and eating little bits of CPU.  This is a
 personal server so the traffic is small, but nonetheless daemons are firing
 regularly.  Postfix alone fires 3 or 4 daemons when mail arrives.  None of 
 these
 eat much CPU time, but they all add up.

That makes a lot of sense to me.  Also, since cpu time divides into 1/N
shares, where N is the number of processes, if you have more convert
processes running then that task will effectively get more total time
than the other tasks will.  A little bit more here and a little bit
less there on the other tasks running.  If you had two converts
running and one mail task then the mail task would get 1/3rd and the
two converts would get 2/3rds, as opposed to one convert and one mail
task with 1/2 and 1/2.

 And context switching on a 550 MHz CPU with only 128K L2 cache is
 going to be expensive when two compute intensive tasks are running.

I commend you on keeping that machine running.  My main mail and web
server was, until the motherboard died very recently, a 400 MHz P2.  I
was sad to see it go since it had been such a good performer for so
many years.

Bob




Re: need help making shell script use two CPUs/cores

2011-01-11 Thread Stan Hoeppner
Camaleón put forth on 1/10/2011 2:11 PM:

 Didn't you run any tests? Okay... (now downloading the sample images)

Yes, of course.  I just didn't capture the results to a file.  And it's usually better
if people see their own results instead of someone else's copy/paste.

 2.  On your dual processor, or dual core system, execute:

 for k in *.JPG; do echo $k; done | xargs -I{} -P2 convert {} -resize
 3072 {} 
 
  I used a VM to get the closest environment to what you seem to have (a
  low-resource machine) and the above command (timed) gives:

I'm not sure what you mean by resources in this context.  My box has plenty of
resources for the task we're discussing.  Each convert process, IIRC, was using
80MB on my system.  Only two can run simultaneously.  So why queue up 4 or more
processes?  That just eats memory uselessly for zero decrease in total run time.
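That per-process memory figure is easy to re-check with GNU time if anyone
wants to (a sketch; the file name is a placeholder, and the interesting output
line is "Maximum resident set size"):

/usr/bin/time -v convert IMG_0001.JPG -resize 3072 IMG_0001.JPG 2>&1 | grep 'Maximum resident'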

 real  1m44.038s
 user  2m5.420s
 sys   1m17.561s
 
 It uses 2 convert processes so the files are being run in pairs.
 
 And you can even get the job done faster if using -P8:
 
 real  1m25.255s
 user  2m1.792s
 sys   0m43.563s

That's an unexpected result.  I would think running #cores*2^x with an
increasing x value would stop yielding lower total run times within a few
multiples of #cores.

 No need to have a quad core with HT. Nice :-)

Use some of the other convert options on large files and you'll want those extra
two real cores. ;)

 Now, to compare the xargs -P parallel process performance to standard
 serial performance, clear the temp dir and copy the original files over
 again.  Now execute:

 for k in *.JPG; do convert $k -resize 3072 $k; done 
 
 This gives:
 
 real  2m30.007s
 user  2m11.908s
 sys   1m42.634s
 
 Which is ~0.46s. of plus delay. Not that bad.

You mean 46s not 0.46s.  104s vs 150s is a 44% speedup in run time.  This _should_
be closer to a 90-100% speedup in a perfect world.  In this case there is
insufficient memory bandwidth to feed all the processors.

I just made two runs on the same set of photos but downsized them to 800x600 to
keep the run time down.  (I had you upscale them to 3072x2048 as your CPUs are
much newer)

$ time for k in *.JPG; do convert $k -resize 800 $k; done

real1m16.542s
user1m11.872s
sys 0m4.104s

$ time for k in *.JPG; do echo $k; done | xargs -I{} -P2 convert {} -resize 800 
{}

real0m41.188s
user1m14.837s
sys 0m4.812s

41s vs 77s is a ~47% decrease in run time.  In this case there is insufficient
memory bandwidth as well.  The Intel BX chipset supports a single channel of
PC100 memory for a raw bandwidth of 800MB/s.  Image manipulation programs will
eat all available memory b/w.  On my system, running two such processes allows
~400MB/s to each processor socket, starving the convert program of memory 
access.

To get close to _linear_ scaling in this scenario, one would need something like
an 8 core AMD Magny Cours system with quad memory channels, or whatever the
Intel platform is with quad channels.  One would run with xargs -P2, allowing
each process ~12GB/s of memory bandwidth.  This should yield a 90-100% speedup
in run time.

 Running more processes than real cores seems fine, did you try it?

Define fine.  Please post the specs of your SUT, both CPU/mem subsystem and OS
environment details (what hypervisor and guest).  (SUT is IBM speak for System
Under Test).

 Linux is pretty efficient at scheduling multiple processes among cores
 in multiprocessor and/or multi-core systems and achieving near linear
 performance scaling.  This is one reason why fork and forget is such a
 popular method used for parallel programming.  All you have to do is
 fork many children and the kernel takes care of scheduling the processes
 to run simultaneously.
 
 Yep. It handles the processes quite nicely.

Are you new to the concept of parallel processing and what CPU process
scheduling is?

-- 
Stan





Re: [OT]: Re: need help making shell script use two CPUs/cores

2011-01-11 Thread Stan Hoeppner
Dan Serban put forth on 1/10/2011 7:52 PM:
 On Mon, 10 Jan 2011 12:04:19 -0600
 Stan Hoeppner s...@hardwarefreak.com wrote:
 
 [snip]
 http://www.hardwarefreak.com/server-pics/
 
 Which gallery system are you using?  I quite like it.

That's the result of Curator:
http://furius.ca/curator/

I've been using it for 7+ years.  Debian dropped the package sometime back,
before Etch IIRC.  Last time I installed it I grabbed it from SourceForge.  It's
a python app so you need python and you'll need the imagemagick tools.

Unfortunately its functions are written in a manner that psyco can't optimize.
It's plenty fast though if you're doing a directory structure with only a couple
hundred pic files or less.  My server is pretty old, 550MHz, and I've got a
couple of dirs with thousands of image files.  It takes over 12 hours to process
them.  It processes all subdirs under a dir.  I've found no option to disable
this.  Thus, be mindful of the way you setup your directory structures.  Even if
nothing in a subdir has changed since the last run, curator will still process
all subdirs.  It's pretty fast at doing so, but if you have 100 subdirs with 100
files in each that's 10,000 image files to be looked at, and bumps up the run 
time.

With any modern 2-3GHz x86 AMD/Intel CPU you prolly don't need to worry about
the speed of curator.  I've never run it on a modern chip, just my lowly, but
uber cool, vintage Abit BP6 dual Celeron 3...@550 server, which is the server in
those photos.  I have a tendency to hang onto systems as long as they're still
useful.  At one time it was my workstation/gaming rig.  Those dual Celerons are
now idle 99% of the time, and the machine is usually plenty fast for any
interactive command line or batch work I need to do.

Of note, if you've been reading this thread, you'll notice I use this script and
ImageMagick's convert utility to resize my camera photos before running curator
on them, since I can now resize them almost twice as fast, running 2 parallel
convert processes.

-- 
Stan





Re: need help making shell script use two CPUs/cores

2011-01-11 Thread Camaleón
On Tue, 11 Jan 2011 07:13:47 -0600, Stan Hoeppner wrote:

 Camaleón put forth on 1/10/2011 2:11 PM:
 
  I used a VM to get the closest environment to what you seem to have (a
  low-resource machine) and the above command (timed) gives:
 
 I'm not sure what you mean by resources in this context.  My box has
 plenty of resources for the task we're discussing.  Each convert
 process, IIRC, was using 80MB on my system.  Only two can run
 simultaneously.  So why queue up 4 or more processes?  That just eats
 memory uselessly for zero decrease in total run time.

I supposed you wouldn't care much about getting a script to run faster with 
all the available cores occupied if you had a modern (4 years) cpu and 
plenty of speedy ram, because the routine you wanted to run should not 
take much time... unless you were going to process thousands of 
images :-)

(...)

 I just made two runs on the same set of photos but downsized them to
 800x600 to keep the run time down.  (I had you upscale them to 3072x2048
 as your CPUs are much newer)
 
 $ time for k in *.JPG; do convert $k -resize 800 $k; done
 
 real1m16.542s
 user1m11.872s
 sys 0m4.104s
 
 $ time for k in *.JPG; do echo $k; done | xargs -I{} -P2 convert {}
 -resize 800 {}
 
 real0m41.188s
 user1m14.837s
 sys 0m4.812s
 
 41s vs 77s is a ~47% decrease in run time.  In this case there is
 insufficient memory bandwidth as well.  The Intel BX chipset supports a
 single channel of PC100 memory for a raw bandwidth of 800MB/s.  Image
 manipulation programs will eat all available memory b/w.  On my system,
 running two such processes allows ~400MB/s to each processor socket,
 starving the convert program of memory access.
 
 To get close to _linear_ scaling in this scenario, one would need
 something like an 8 core AMD Magny Cours system with quad memory
 channels, or whatever the Intel platform is with quad channels.  One
 would run with xargs -P2, allowing each process ~12GB/s of memory
  bandwidth.  This should yield a 90-100% speedup in run time.
 
 Running more processes than real cores seems fine, did you try it?
 
 Define fine.  

Fine = system not hogging all resources.

 Please post the specs of your SUT, both CPU/mem
 subsystem and OS environment details (what hypervisor and guest).  (SUT
 is IBM speak for System Under Test).

I didn't know the meaning of that SUT term... The test was run in a 
laptop (Toshiba Tecra A7) with an Intel Core Duo T2400 (in brief, 2M 
Cache, 1.83 GHz, 667 MHz FSB, full specs¹) and 4 GiB of ram (DDR2).

VM is Virtualbox (4.0) with Windows XP Pro as host and Debian Squeeze as 
guest. The VM was set up to use the 2 cores and 1.5 GiB of system ram. Disk 
controller is emulated via ich6.

 Linux is pretty efficient at scheduling multiple processes among cores
 in multiprocessor and/or multi-core systems and achieving near linear
 performance scaling.  This is one reason why fork and forget is such
 a popular method used for parallel programming.  All you have to do is
 fork many children and the kernel takes care of scheduling the
 processes to run simultaneously.
 
  Yep. It handles the processes quite nicely.
 
 Are you new to the concept of parallel processing and what CPU process
 scheduling is?

No... I guess this is quite similar to the way most daemons behave 
when running in the background and launching several instances (like 
amavisd-new does), but I didn't think there was a direct relation between 
the number of running daemons/processes and the cores available in the 
CPU.  I mean, I thought the kernel would automatically handle all the 
available resources the best it can, regardless of the number of cores in use.

¹http://ark.intel.com/Product.aspx?id=27235
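For anyone curious to actually watch that scheduling happen, the psr column
shows which core each process last ran on, and taskset can pin one for
comparison (a sketch; taskset comes with util-linux, and photo.JPG is a
placeholder):

# which core is each convert process currently scheduled on?
ps -C convert -o pid,psr,pcpu,comm

# pin a single conversion to core 0, just to compare against free scheduling
taskset -c 0 convert photo.JPG -resize 1024 photo.JPG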

Greetings,

-- 
Camaleón





Re: need help making shell script use two CPUs/cores

2011-01-11 Thread Bob Proulx
Stan Hoeppner wrote:
 Camaleón put forth:
  real1m44.038s
  user2m5.420s
  sys 1m17.561s
  
  It uses 2 convert processes so the files are being run in pairs.
  
  And you can even get the job done faster if using -P8:
  
  real1m25.255s
  user2m1.792s
  sys 0m43.563s
 
 That's an unexpected result.  I would think running #cores*2^x with an
  increasing x value would stop yielding lower total run times within a few
 multiples of #cores.

If you have enough memory (which is critical) then increasing the
number of processes above the number of compute units *a little bit*
is okay and increases overall throughput.

You are processing image data.  That is a large amount of disk data
and won't ever be completely cached.  At some point the process will
block on I/O waiting for the disk.  Perhaps not often but enough.  At
that moment the cpu will be idle until the disk block becomes
available.  When you are running four processes on your two cpu machine
that means there will always be another process in the run queue ready
to go while waiting for the disk.  That allows processing to continue
when otherwise it would be waiting for the disk.  I believe what you
are seeing above is the result of being able to compute during that
small block on I/O wait for the disk interval.

On the negative side having more processes in the run queue does
consume a little more overhead for process scheduling.  And launching
a lot of processes consumes resources.  So it definitely doesn't make
sense to launch one process per image.  But being above the number of
cpus does help a small amount.

Another negative is that other tasks then suffer.  With excess compute
capacity you always have some cpu time for the desktop side of life.
Moving windows, rendering web pages, other user tasks, delivering
email.  Sometimes squeezing that last percentage point out of
something can really kill your interactive experience and end up
frustrating you more.  So as a hint I wouldn't push too hard on it.
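
If you want to find the sweet spot empirically rather than guess, a rough
sweep over -P values is easy to script (a sketch, assuming GNU xargs, the
original .JPG files kept in ./orig, and a ./scratch directory recreated for
each pass so every run resizes identical input):

  #!/bin/sh
  for n in 1 2 4 6 8; do
      rm -rf scratch; mkdir scratch; cp orig/*.JPG scratch/
      start=$(date +%s)
      ( cd scratch && for k in *.JPG; do echo "$k"; done | \
            xargs -I{} -P"$n" convert {} -resize 1024 {} )
      echo "P$n: $(( $(date +%s) - start )) seconds"
  done

Where the times stop dropping is where the extra processes stop paying for
themselves.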

  No need to have a quad core with HT. Nice :-)

My benchmarks show that hyperthreading (fake cpus) actually slows down
single-threaded processes such as image conversions.  HT seems like a
marketing breakthrough to me.  Although having the effective extra
registers available may benefit a highly threaded application.  I just
don't have any performance critical highly threaded applications.  I
am sure they exist somewhere along with unicorns and other good
sources of sparkles.

Bob




Re: need help making shell script use two CPUs/cores

2011-01-11 Thread Bob Proulx
Camaleón wrote:
 No... I guess this is quite similar to the way most of the daemons do 
 when running in background and launch several instances (like amavisd-
 new does)

That is an optimization to help with the latency overhead associated
with forking processes.  In order to reduce the response time to react
to an external event, such as the arrival of email or a web page request,
many such daemons pre-fork copies of themselves ahead of time so that they
will be ready and waiting.  Those processes don't consume cpu time
while waiting.  They do consume memory and cpu scheduling queue
resources.  But pre-forked, ready to go, and waiting, they just sit
there until there is something to do.  And when I/O arrives and they
have something to do, they can get going on it very quickly since they
are already loaded in memory.  This reduces response latency.

Bob




Re: need help making shell script use two CPUs/cores

2011-01-11 Thread Stan Hoeppner
Camaleón put forth on 1/11/2011 9:38 AM:

 I supposed you wouldn't care much in getting a script to run faster with 
 all the available core occupied if you had a modern (4 years) cpu and 
 plenty of speedy ram because the routine you wanted to run it should not 
 take many time... unless you were going to process thousand of 
 images :-)

That's a bit ironic.  You're suggesting the solution is to upgrade to a new
system with a faster processor and memory.  However, all the newer processors
have 2, 4, 6, 8, or 12 cores.  So upgrading simply for single process throughput
would waste all the other cores, which was the exact situation I found myself 
in.

The ironic part is that parallelizing the script to maximize performance on my
system will also do the same for the newer chips, but to an even greater degree
on those with 4, 6, 8, or 12 cores.  Because convert doesn't eat 100% of a
core's time during its run, and because of the idle time between one process
finishing and xargs starting the next, one could probably run 16-18 parallel
convert processes on a 12 core Magny Cours with this script before run times
stop decreasing.

The script works.  It cut my run time by over 50%.  I'm happy.  As I said, this
system's processing power is complete overkill 99% of the time.  It works
beautifully with pretty much everything I've thrown at it, for 8 years now.  If
I _really_ wanted to maximize the speed of this photo resizing task I'd install
Win32 ImageMagick on my 2GHz Athlon XP workstation with dual channel memory
nForce2 mobo, convert them on the workstation, and copy them to the server.

However, absolute maximum performance of this task was not, and is not my goal.
 My goal was to make use of the second CPU, which was sitting idle in the
server, to speed up the task completion.  That goal was accomplished. :)

 Running more processes than real cores seems fine, did you try it?

 Define fine.  
 
 Fine = system not hogging all resources.

I had run 4 processes (on this 2 core machine) and the run time was a few
seconds faster than with 2 processes, 3 seconds IIRC.  Running 8 processes
pushed the system into swap and run time increased dramatically.  Given that 4
processes were only a few seconds faster than two, yet consumed twice as much
memory, the best overall number of processes to run on this system is two.

 I didn't know the meaning of that SUT term... 

I like using it.  It's good shorthand.  I wish more people used it, or were
familiar with it, so I wouldn't have to define it every time I use it. :)

 The test was run in a 
 laptop (Toshiba Tecra A7) with an Intel Core Duo T2400 (in brief, 2M 
 Cache, 1.83 GHz, 667 MHz FSB, full specs¹) and 4 GiB of ram (DDR2)

 VM is Virtualbox (4.0) with Windows XP Pro as host and Debian Squeeze as 
 guest. VM was setup to use the 2 cores and 1.5 GiB of system ram. Disk 
 controller is emulated via ich6.

I wonder how much faster convert would run on bare metal on that laptop.

 Are you new to the concept of parallel processing and what CPU process
 scheduling is?
 
 No... I guess this is quite similar to the way most of the daemons do 
 when running in background and launch several instances (like amavisd-
 new does) but I didn't think there was a direct relation in the number 
 of the running daemons/processes and the cores available in the CPU, I 
 mean, I thought the kernel would automatically handle all the resources 
 available the best it can, regardless of the number of cores in use.

This is correct.  But the kernel can't take a single process and make it run
across all cores to maximize performance.  For this, the process must be written to
create threads, forks, or children.  The kernel will then run each of these on a
different processor core.  This is why Imagemagick convert needs to be
parallelized when batching many photos.  If you don't parallelize it, the kernel
can't schedule it across all cores.  The docs say it will use threads but only
with large files.  Apparently 8.2 megapixel JPGs aren't large, as the
threading has never kicked in for me.  By using xargs for parallelization, we
create x number of concurrent processes.  The kernel then schedules each one on
a different cpu core.
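
The same effect can be had without xargs by backgrounding the children
yourself.  A minimal sketch in plain sh (assuming a directory of .JPG files
and a hard cap of two children to match this 2-way box):

  #!/bin/sh
  i=0
  for k in *.JPG; do
      convert "$k" -resize 1024 "$k" &    # child process; the kernel picks the core
      i=$((i+1))
      [ $((i % 2)) -eq 0 ] && wait        # never more than two running at once
  done
  wait                                    # catch the last odd child, if any

It's cruder than xargs -P2, since both children in a pair must finish before
the next pair starts, which is one more reason the xargs form is preferable.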

-- 
Stan





Re: need help making shell script use two CPUs/cores

2011-01-11 Thread John Hasler
Bob writes:
 They do consume memory and cpu scheduling queue resources.

Very little, due to shared memory and copy-on-write.
-- 
John Hasler





Re: need help making shell script use two CPUs/cores

2011-01-11 Thread John Hasler
Bob writes:
 Another negative is that other tasks then suffer.

That's what group scheduling is for.
-- 
John Hasler





Re: [OT]: Re: need help making shell script use two CPUs/cores

2011-01-11 Thread Dan Serban
On Tue, 11 Jan 2011 09:18:48 -0600
Stan Hoeppner s...@hardwarefreak.com wrote:

 Dan Serban put forth on 1/10/2011 7:52 PM:
  On Mon, 10 Jan 2011 12:04:19 -0600
  Stan Hoeppner s...@hardwarefreak.com wrote:
  
  [snip]
  http://www.hardwarefreak.com/server-pics/
  
  Which gallery system are you using?  I quite like it.
 
 That's the result of Curator:
 http://furius.ca/curator/
 
 I've been using it for 7+ years.  Debian dropped the package sometime
 back, before Etch IIRC.  Last time I installed it I grabbed it from
 SourceForge.  It's a python app so you need python and you'll need
 the imagemagick tools.

It's a nice looking interface, simple is what I like.

 
 Unfortunately its functions are written in a manner that psyco can't
 optimize. It's plenty fast though if you're doing a directory
 structure with only a couple hundred pic files or less.  My server is
 pretty old, 550MHz, and I've got a couple of dirs with thousands of
 image files.  It takes over 12 hours to process them.  It processes
 all subdirs under a dir.  I've found no option to disable this.
 Thus, be mindful of the way you setup your directory structures.
 Even if nothing in a subdir has changed since the last run, curator
 will still process all subdirs.  It's pretty fast at doing so, but if
 you have 100 subdirs with 100 files in each that's 10,000 image files
 to be looked at, and bumps up the run time.
 

Indeed, I find that simple services always seem to end up eating a
lot more resources than originally thought.

 With any modern 2-3GHz x86 AMD/Intel CPU you prolly don't need to
 worry about the speed of curator.  I've never run it on a modern
 chip, just my lowly, but uber cool, vintage Abit BP6 dual Celeron
 3...@550 server, which is the server in those photos.  I have a
 tendency to hang onto systems as long as they're still useful.  At
 one time it was my workstation/gaming rig.  Those dual Celerons are
 now idle 99% of the time, and the machine is usually plenty fast for
 any interactive command line or batch work I need to do.
 

I commend your spirit.  I have collections of such hardware, but in my
incessant need for more power, and less power usage, half of this
stuff gets retired.  I wish I could find a good cause to give it to,
but the linux/debian zealot in me refuses to just give it away to
the dark side :/  If it'll run Windows, I want you to give me money
for it.  Heh.

I have a dual proc p3 1ghz motherboard. Pretty much
worthless now, though it did a hell of a job running internal email and
web/db services.

 Of note, if you've been reading this thread, you'll notice I use this
 script and ImageMagick's convert utility to resize my camera photos
 before running curator on them, since I can now resize them almost
 twice as fast, running 2 parallel convert processes.
 

I certainly have followed the thread and have learned that xargs allows
you to run commands in parallel, something my 20 years of Linux
adventures hadn't taught me until yesterday.





Re: need help making shell script use two CPUs/cores

2011-01-11 Thread Stan Hoeppner
Bob Proulx put forth on 1/11/2011 3:08 PM:
 Stan Hoeppner wrote:
 Camaleón put forth:
 real    1m44.038s
 user    2m5.420s
 sys     1m17.561s

 It uses 2 convert proccesses so the files are being run on pairs.

 And you can even get the job done faster if using -P8:

 real    1m25.255s
 user    2m1.792s
 sys     0m43.563s

 That's an unexpected result.  I would think running #cores*2^x with an
 increasing x value would start yielding lower total run times within a few
 multiples of #cores.
 
 If you have enough memory (which is critical) then increasing the
 number of processes above the number of compute units *a little bit*
 is okay and increases overall throughput.
 
 You are processing image data.  That is a large amount of disk data
 and won't ever be completely cached.  At some point the process will

Not really.  Each file, in my case, started as a 1.8MB jpeg.  The disk
throughput on my server is ~80MB/s.  Read latency is about 15-20ms on average.
In my recent example workload there were 35 such images.

 block on I/O waiting for the disk.  Perhaps not often but enough.  At
 that moment the cpu will be idle until the disk block becomes
 available.  When you are runing four processes on your two cpu machine
 that means there will always be another process in the run queue ready
 to go while waiting for the disk.  That allows processing to continue
 when otherwise it would be waiting for the disk.  I believe what you
 are seeing above is the result of being able to compute during that
 small block on I/O wait for the disk interval.

That's gotta be a very small iowait interval.  So small, in fact, it doesn't
show up in top at all.  I've watched top a few times during these runs and I
never see iowait.

I assumed the gain was simply because, watching top, each convert process
doesn't actually fully peg the cpu during its entire run.  Running
one or two more processes in parallel with the first two simply gives the kernel
scheduler the opportunity to run another process during those idle ticks.  There
is also the time gap between a process exiting and xargs starting up the next
one.  I have no idea how much time that takes.  But all the little bits add up
in the total execution time of all 35 processes.
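
One rough way to put a number on that is to compare CPU time against wall
clock time for the whole batch (a sketch, assuming bash so the time keyword
covers the entire pipeline):

  # If user+sys comes out well below 2x real on a -P2 run, the two CPUs had
  # idle ticks somewhere: inside convert itself, or in the gaps between one
  # process exiting and xargs launching the next.
  time ( for k in *.JPG; do echo "$k"; done | \
         xargs -I{} -P2 convert {} -resize 3072 {} )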

 On the negative side having more processes in the run queue does
 consume a little more overhead for process scheduling.  And launching
 a lot of processes consumes resources.  So it definitely doesn't make
 sense to launch one process per image.  But being above the number of
 cpus does help a small amount.

Totally agree.  That amount of decreased run time is small enough on my system
that I don't bother with 3 processes.  I only parallelize 2, as the extra ~80MB
of memory consumed by the 3rd is better consumed by smtpd, imapd, httpd than
saving me 5-10 seconds of execution time for the batch photo resize.  This is a
server after all. ;)

 Another negative is that other tasks then suffer.  With excess compute
 capacity you always have some cpu time for the desktop side of life.
 Moving windows, rendering web pages, other user tasks, delivering
 email.  Sometimes squeezing that last percentage point out of
 something can really kill your interactive experience and end up
 frustrating you more.  So as a hint I wouldn't push too hard on it.

In my case those other tasks aren't interactive, but they exist nonetheless, as
mentioned above.

 My benchmarks show that hyperthreading (fake cpus) actually slow down
 single thread processes such as image conversions.  HT seems like a
 marketing breakthrough to me.  Although having the effective extra
 registers available may benefit a highly threaded application.  I just
 don't have any performance critical highly threaded applications.  I
 am sure they exist somewhere along with unicorns and other good
 sources of sparkles.

This has been my experience as well.  SMT traditionally doesn't work well when
you oversubscribe more compute bound processes than a machine has physical
cores.  This was discovered relatively quickly after Intel's HT CPUs hit the
market.  Folks began running one s...@home process per virtual CPU on dual
socket Xeon boxen, 4 processes total, and their elapsed time per process
increased substantially vs running one process per socket.

-- 
Stan





Re: need help making shell script use two CPUs/cores

2011-01-11 Thread Stan Hoeppner
John Hasler put forth on 1/11/2011 4:12 PM:
 Bob writes:
 They do consume memory and cpu scheduling queue resources.
 
 Very little, due to shared memory and copy-on-write.

In this case I don't think all that much memory is shared.  Each process' data
portion is different as each processes a different picture file.

-- 
Stan





Re: need help making shell script use two CPUs/cores

2011-01-11 Thread John Hasler
Bob writes:
 They do consume memory and cpu scheduling queue resources.

I wrote:
 Very little, due to shared memory and copy-on-write.

Stan writes:
 In this case I don't think all that much memory is shared.  Each
 process' data portion is different as each processes a different
 picture file.

I was referring to pre-forking.  Pre-forked processes share text and
also share data while waiting for work.  Thus they consume little in the
way of resources until they have something to do.
-- 
John Hasler





Re: need help making shell script use two CPUs/cores

2011-01-10 Thread Stan Hoeppner
Karl Vogel put forth on 1/9/2011 6:04 PM:
 On Sun, 09 Jan 2011 10:05:43 -0600, 
 Stan Hoeppner s...@hardwarefreak.com said:
 
 S #! /bin/sh
 S for k in $(ls *.JPG); do convert $k -resize 1024 $k; done
 
Someone was ragging on you to let the shell do the file expansion.  I
like your way better because most scripting shells aren't smart enough
to realize that when there aren't any .JPG files, I don't want the
script to echo '*.JPG' as if that's actually useful.

This doesn't matter to me as I only use this script on a single temp directory
after I dump the camera files into it.  The camera, a Fujifilm FinePix A820
8.3MP, saves its files in all upper case.

 S I use the above script to batch re-size digital camera photos after I
 S dump them to my web server.  It takes a very long time with lots of new
 S photos as the server is fairly old, even though it is a 2-way SMP,
 S because the script only runs one convert process at a time serially,
 S only taking advantage of one CPU.
 
First things first: are you absolutely certain that running two parallel
jobs will exercise both CPUs?  I've seen SMP systems that don't exactly
live up to truth-in-advertising.  If you stuff two convert jobs in the
background and then run top (or the moral equivalent) do you SEE both
CPUs being worked?

See my response to Bob.  And see Bob's response to you. :)  The issue you
describe was resolved with a few patches many years ago, and only reared its
ugly head on processors with SMT (HT) enabled.  The kernel scheduler work lagged
behind the hardware releases of IBM's SMT and Intel's HT. The chips were on the
market a while before regular distro release cycles caught up.  So early
adopters of SMT chips saw the problem you describe.  As Bob noted, in most
situations, simply turning SMT off fixed the problem instantly.  For those who
don't know the acronyms, SMT stands for Simultaneous Multi-threading, which is
the textbook term for this technology.  Intel gave their SMT implementation a
catchy marketing name, HyperThreading, as they seem to do with every product, 
sadly.

Second: do you have taskset installed?  If the work isn't being
divided up the way you like, you can bind a process to a desired core:
http://planet.admon.org/how-to-bind-a-certain-process-to-specified-core/

cpusets (see also cpumemsets), the kernel feature that taskset manipulates, is
overkill for managing process scheduling on a 2-way box, and wouldn't yield
much, if any, benefit.  In fact, if I were to attempt using it with my piddly
workloads, I'd likely be far less efficient at manually scheduling tasks than
the kernel is.  I can guarantee you of that. :)
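
For completeness, manual pinning with taskset would look roughly like this
(just a sketch with two made-up file names; as noted, the kernel already does
a fine job of spreading the processes without it):

  # util-linux taskset: pin one convert to core 0 and another to core 1
  taskset -c 0 convert IMG_0001.JPG -resize 1024 IMG_0001.JPG &
  taskset -c 1 convert IMG_0002.JPG -resize 1024 IMG_0002.JPG &
  wait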

And last: if you're not using something like LVM, can you do anything to
make sure you're not hitting the same disk?  If all your new photos are
on the same drive, any CPU savings you get from parallel processing will
probably be erased by disk contention.  Better yet, do you have enough
memory to do the processing on a RAM-backed filesystem?

Apparently you've never used Imagemagick's convert utility, or any other image
manipulation tools, at least not on an older ~550MHz machine with a tiny L2
cache (by today's standards).  Image manipulation programs are always CPU
bound, rarely, if ever, IO bound.  I'd say never, but I'm sure there is a rare
corner case out there somewhere.

It's odd isn't it, that I have pretty intimate knowledge of the things above,
yet am handicapped WRT shell scripting?  Nobody knows everything, and I'm sure
glad lists such as debian-users exist to fill in the knowledge gaps.  :)

-- 
Stan





Re: need help making shell script use two CPUs/cores

2011-01-10 Thread Camaleón
On Sun, 09 Jan 2011 14:39:56 -0600, Stan Hoeppner wrote:

 Camaleón put forth on 1/9/2011 12:12 PM:
 
 Better if you check it, but I dunno how to get the compile options for
 the lenny package... where is this defined, in source or diff packages?
 
 You're taking this thread down the wrong path.  I asked for assistance
 writing a simple script to do what I want it to do.  Accomplishing that
 will fix all of my problems WRT Imagemagick.  I didn't ask for help in
 optimizing or fixing the Lenny i386 Imagemagick package. ;)

I read it as how to speed up the execution of a batch script that has to 
deal with resizing big images, and you usually get some gains if the 
program being run was compiled with threading in mind.

 Anyway, how are you going to take any advantadge of multi-threading
 capabilities if the program you are going to run was not compiled with
 this flag enabled?
 
 I think you're missing something.  Go back and read my original post. If
 you still don't understand, maybe refresh yourself on Linux process
 scheduling.

Good. It would be nice to see the results when you finally get it working 
the way you like ;-)

Greetings,

-- 
Camaleón





Re: need help making shell script use two CPUs/cores

2011-01-10 Thread Stan Hoeppner
Camaleón put forth on 1/10/2011 8:08 AM:
 On Sun, 09 Jan 2011 14:39:56 -0600, Stan Hoeppner wrote:
 
 Camaleón put forth on 1/9/2011 12:12 PM:

 Better if you check it, but I dunno how to get the compile options for
 the lenny package... where is this defined, in source or diff packages?

 You're taking this thread down the wrong path.  I asked for assistance
 writing a simple script to do what I want it to do.  Accomplishing that
 will fix all of my problems WRT Imagemagick.  I didn't ask for help in
 optimizing or fixing the Lenny i386 Imagemagick package. ;)
 
 I read it as how to speed up the execution of a batch script that has to 
 deal with resizing big images and usually you get some gains if the 
 program to run was compiled to work with threads in mind.

I said lots of small images, IIRC.  Regardless, threading isn't simply turned on
with a compile time argument.  A program must be written specifically to create
master and worker threads.  Implementation is somewhat similar to exec and fork,
compared to serial programming anyway, though the IPC semantics are different.
It's a safe bet that the programs in the Lenny i386 Imagemagick package do have
the threading support.  The following likely explains why _I_ wasn't seeing the
threading.  From:

http://www.imagemagick.org/Usage/api/#speed

For small images using the IM multi-thread capabilities will not give you any
advantage, though on a large busy server it could be detrimental. But for large
images the OpenMP multi-thread capabilities can produce a definate speed
advantage as it uses more CPU's to complete the individual image processing
operations.

It would be nice to know their definition of small images.


 Good. It would be nice to see the results when you finally go it working 
 the way you like ;-)

Bob's xargs suggestion got it working instantly many hours ago.  I'm not sure of
the results you refer to.  Are you looking for something like watch top output
for Cpu0 and Cpu1?  See for yourself.

1.  wget all the 35 .JPG files from this URL:
http://www.hardwarefreak.com/server-pics/
copy them all to a working temp dir

2.  On your dual processor, or dual core system, execute:

for k in *.JPG; do echo $k; done | xargs -I{} -P2 convert {} -resize 3072 {} 

For a quad core system, change -P2 to -P4.  You may want to wrap it with the
time command.

3.  Immediately execute top and watch Cpu0/1/2/3 in the summary area.  You'll
see pretty linear parallel scaling of the convert processes.  Also note memory
consumption doubles with each doubling of the process count.

Now, to compare the xargs -P parallel process performance to standard serial
performance, clear the temp dir and copy the original files over again.  Now
execute:

for k in *.JPG; do convert $k -resize 3072 $k; done 

and launch top.  You'll see only a single convert process running.  Again, you
can wrap this with the time command if you like to compare total run times.
What you'll find is nearly linear scaling as the number of convert processes is
doubled, up to the point #processes equals #cores.  Running more processes than
cores merely eats memory wastefully and increases total processing time.

Linux is pretty efficient at scheduling multiple processes among cores in
multiprocessor and/or multi-core systems and achieving near linear performance
scaling.  This is one reason why fork and forget is such a popular method used
for parallel programming.  All you have to do is fork many children and the
kernel takes care of scheduling the processes to run simultaneously.

-- 
Stan





Re: need help making shell script use two CPUs/cores

2011-01-10 Thread Camaleón
On Mon, 10 Jan 2011 12:04:19 -0600, Stan Hoeppner wrote:

 Camaleón put forth on 1/10/2011 8:08 AM:

 Good. It would be nice to see the results when you finally go it
 working the way you like ;-)
 
 Bob's xargs suggestion got it working instantly many hours ago.  I'm not
 sure of the results you refer to.  Are you looking for something like
 watch top output for Cpu0 and Cpu1?  See for yourself.

Didn't you run any tests? Okay... (now downloading the sample images)

 2.  On your dual processor, or dual core system, execute:
 
 for k in *.JPG; do echo $k; done | xargs -I{} -P2 convert {} -resize
 3072 {} 

I used a VM to get the closest environment to what you seem to have (a low 
resource machine), and the above command (timed) gives:

real    1m44.038s
user    2m5.420s
sys     1m17.561s

It uses 2 convert processes, so the files are being processed in pairs.

And you can even get the job done faster if using -P8:

real    1m25.255s
user    2m1.792s
sys     0m43.563s

No need to have a quad core with HT. Nice :-)

 Now, to compare the xargs -P parallel process performance to standard
 serial performance, clear the temp dir and copy the original files over
 again.  Now execute:
 
 for k in *.JPG; do convert $k -resize 3072 $k; done 

This gives:

real    2m30.007s
user    2m11.908s
sys     1m42.634s

Which is ~46s of extra run time. Not that bad.

 and launch top.  You'll see only a single convert process running. 
 Again, you can wrap this with the time command if you like to compare
 total run times. What you'll find is nearly linear scaling as the number
 of convert processes is doubled, up to the point #processes equals
 #cores.  Running more processes than cores merely eats memory wastefully
 and increases total processing time.

Running more processes than real cores seems fine, did you try it?
 
 Linux is pretty efficient at scheduling multiple processes among cores
 in multiprocessor and/or multi-core systems and achieving near linear
 performance scaling.  This is one reason why fork and forget is such a
 popular method used for parallel programming.  All you have to do is
 fork many children and the kernel takes care of scheduling the processes
 to run simultaneously.

Yep. It handles the processes quite nicely.

Greetings,

-- 
Camaleón





[OT]: Re: need help making shell script use two CPUs/cores

2011-01-10 Thread Dan Serban
On Mon, 10 Jan 2011 12:04:19 -0600
Stan Hoeppner s...@hardwarefreak.com wrote:

[snip]
 http://www.hardwarefreak.com/server-pics/

Which gallery system are you using?  I quite like it.





need help making shell script use two CPUs/cores

2011-01-09 Thread Stan Hoeppner
I'm not very skilled at writing shell scripts.

#! /bin/sh
for k in $(ls *.JPG); do convert $k -resize 1024 $k; done

I use the above script to batch re-size digital camera photos after I
dump them to my web server.  It takes a very long time with lots of new
photos as the server is fairly old, even though it is a 2-way SMP,
because the script only runs one convert process at a time serially,
only taking advantage of one CPU.  The convert program is part of the
imagemagick toolkit.

How can I best modify this script so that it splits the overall job in
half, running two simultaneous convert processes, one on each CPU?
Having such a script should cut the total run time in half, or nearly
so, which would really be great.

-- 
Stan





Re: need help making shell script use two CPUs/cores

2011-01-09 Thread David Sastre
On Sun, Jan 09, 2011 at 10:05:43AM -0600, Stan Hoeppner wrote:
 #! /bin/sh
 for k in $(ls *.JPG); do convert $k -resize 1024 $k; done
 
 I use the above script to batch re-size digital camera photos after I
 dump them to my web server.  It takes a very long time with lots of new
 photos as the server is fairly old, even though it is a 2-way SMP,
 because the script only runs one convert process at a time serially,
 only taking advantage of one CPU.  The convert program is part of the
 imagemagick toolkit.
 
 How can I best modify this script so that it splits the overall job in
 half, running two simultaneous convert processes, one on each CPU?
 Having such a script should cut the total run time in half, or nearly
 so, which would really be great.

You need parallel:

http://ftp.gnu.org/gnu/parallel/

From their home page (http://freshmeat.net/projects/parallel):

GNU parallel is a shell tool for executing jobs in parallel locally
or using remote computers. A job is typically a single command or
a small script that has to be run for each of the lines in the input.
The typical input is a list of files, a list of hosts, a list of
users, a list of URLs, or a list of tables. If you use xargs today you
will find GNU parallel very easy to use, as GNU parallel is written to
have the same options as xargs. If you write loops in shell, you will
find GNU parallel may be able to replace most of the loops and make
them run faster by running several jobs in parallel. If you use ppss
or pexec you will find GNU parallel will often make the command easier
to read. GNU parallel makes sure output from the commands is the same
output as you would get had you run the commands sequentially. This
makes it possible to use output from GNU parallel as input for other
programs.
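
For this particular batch, the invocation would be something along these
lines (an untested sketch; -j sets the number of simultaneous jobs and
::: supplies the arguments):

  parallel -j 2 convert {} -resize 1024 {} ::: *.JPG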

-- 
Primary key fingerprint: 0FDA C36F F110 54F4 D42B  D0EB 617D 396C 448B 31EB




Re: need help making shell script use two CPUs/cores

2011-01-09 Thread Camaleón
On Sun, 09 Jan 2011 10:05:43 -0600, Stan Hoeppner wrote:

 I'm not very skilled at writing shell scripts.
 
 #! /bin/sh
 for k in $(ls *.JPG); do convert $k -resize 1024 $k; done
 
 I use the above script to batch re-size digital camera photos after I
 dump them to my web server.  It takes a very long time with lots of new
 photos as the server is fairly old, even though it is a 2-way SMP,
 because the script only runs one convert process at a time serially,
 only taking advantage of one CPU.  The convert program is part of the
 imagemagick toolkit.
 
 How can I best modify this script so that it splits the overall job in
 half, running two simultaneous convert processes, one on each CPU?
 Having such a script should cut the total run time in half, or nearly
 so, which would really be great.

http://www.imagemagick.org/Usage/api/#speed

The above doc provides hints on how to speed up ImageMagick operations. 

Note that multi-threading should be used automatically where possible, 
as per this paragraph:

***
# IM by default uses multiple threads for image processing operations. 
That means you can have the computer do two or more separate threads of 
image processing, it will be faster than a single CPU machine. 
***

I'm afraid you will have to find out whether your IM package was compiled 
with multi-threading capabilities.
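
A quick way to check the installed package (assuming a reasonably recent 
ImageMagick; older builds may not print this line at all) is to look for 
OpenMP in the version output:

  convert -version | grep -i features
  # Features: OpenMP   <- multi-threading compiled in, if present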

Greetings,

-- 
Camaleón





Re: need help making shell script use two CPUs/cores

2011-01-09 Thread Stan Hoeppner
Camaleón put forth on 1/9/2011 10:59 AM:

 http://www.imagemagick.org/Usage/api/#speed
 
 The above doc provides hints on how to speed-up image magick operations. 
 
 Note that multi-threading should be automatically used whether possible, 
 as per this paragraph:
 
 ***
 # IM by default uses multiple threads for image processing operations. 
 That means you can have the computer do two or more separate threads of 
 image processing, it will be faster than a single CPU machine. 
 ***
 
 I'm afraid you will have to find out whether your IM package was compiled 
 with multi-threading capablities.

I'm using the i386 Lenny package.  Obviously it wasn't, or it would be
working, and it is not.

No script ideas Camaleón?  You're not a script kiddie?

-- 
Stan





Re: need help making shell script use two CPUs/cores

2011-01-09 Thread Camaleón
On Sun, 09 Jan 2011 11:17:04 -0600, Stan Hoeppner wrote:

 Camaleón put forth on 1/9/2011 10:59 AM:

 ***
 # IM by default uses multiple threads for image processing operations.
 That means you can have the computer do two or more separate threads of
 image processing, it will be faster than a single CPU machine. 
 ***
 
 I'm afraid you will have to find out whether your IM package was
 compiled with multi-threading capablities.
 
 I'm using the i386 Lenny package.  Obviously it wasn't, or it would be
 working, and it is not.

Better if you check it, but I dunno how to get the compile options for 
the lenny package... where is this defined, in source or diff packages? 
 
 No script ideas Camaleón?  You're not a script kiddie?

He, he.. not at all :-)

Anyway, how are you going to take any advantage of multi-threading 
capabilities if the program you are going to run was not compiled with 
this flag enabled? 

Greetings,

-- 
Camaleón





Re: need help making shell script use two CPUs/cores

2011-01-09 Thread Stan Hoeppner
Camaleón put forth on 1/9/2011 12:12 PM:

 Better if you check it, but I dunno how to get the compile options for 
 the lenny package... where is this defined, in source or diff packages? 

You're taking this thread down the wrong path.  I asked for assistance
writing a simple script to do what I want it to do.  Accomplishing that
will fix all of my problems WRT Imagemagick.  I didn't ask for help in
optimizing or fixing the Lenny i386 Imagemagick package. ;)

 No script ideas Camaleón?  You're not a script kiddie?
 
 He, he.. not at all :-)
 
 Anyway, how are you going to take any advantadge of multi-threading 
 capabilities if the program you are going to run was not compiled with 
 this flag enabled? 

I think you're missing something.  Go back and read my original post.
If you still don't understand, maybe refresh yourself on Linux process
scheduling.

-- 
Stan





Re: need help making shell script use two CPUs/cores

2011-01-09 Thread Bob Proulx
Stan Hoeppner wrote:
 I'm not very skilled at writing shell scripts.
 
 #! /bin/sh
 for k in $(ls *.JPG); do convert $k -resize 1024 $k; done

First off don't use ls to list files matching a pattern.  Instead let
the shell match the pattern.

  #! /bin/sh
  for k in *.JPG; do convert $k -resize 1024 $k; done

I never like to resize in place.  Because then if I mess things up I
can lose resolution.  So I recommend doing it to a named resolution
file.  Do anything you like but this would be the way I would go.

  for k in *.JPG; do
    convert $k -resize 1024 $(basename $k .JPG).1024.jpg
  done

And not wanting to do the same work again and again:

  for k in *.JPG; do
    base=$(basename $k .JPG)
    test -f $base.1024.jpg && continue  # skip if already done
    convert $k -resize 1024 $base.1024.jpg
  done

 I use the above script to batch re-size digital camera photos after I
 dump them to my web server.  It takes a very long time with lots of new
 photos as the server is fairly old, even though it is a 2-way SMP,
 because the script only runs one convert process at a time serially,
 only taking advantage of one CPU.  The convert program is part of the
 imagemagick toolkit.
 
 How can I best modify this script so that it splits the overall job in
 half, running two simultaneous convert processes, one on each CPU?
 Having such a script should cut the total run time in half, or nearly
 so, which would really be great.

GNU xargs has an extension to run jobs in parallel.  This is already
installed on your system.  (But won't work on other Unix systems.)

  for k in *.JPG; do echo $k; done | xargs -I{} -P4 echo convert {} -resize 1024 {}

Verify that does what you want and then remove the echo.

Unfortunately that simple approach is harder to do with my renaming
scheme.  So I would probably write a helper script that did the
options to convert and renamed the file and so forth.

  for k in *.JPG; do
    base=$(basename $k .JPG)
    test -f $base.1024.jpg && continue  # skip if already done
    echo $k
  done | xargs -L1 -P4 echo my-convert-helper

And my-convert-helper could take the argument and apply the options in
the order needed and so forth.
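
Such a helper might be shaped something like this (a guess at its contents,
not a tested script):

  #!/bin/sh
  # my-convert-helper FILE.JPG -- resize one file to a name carrying the
  # target resolution, skipping it if the output already exists.
  k=$1
  base=$(basename "$k" .JPG)
  test -f "$base.1024.jpg" && exit 0
  exec convert "$k" -resize 1024 "$base.1024.jpg"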

Adjust 4 in the above to be the number of jobs you want to run on your
multicore system.  Note that in Sid in the latest coreutils there is a
new command 'nproc' to print out the number of cores.  Or you could
get it from grep.

  grep -c ^processor /proc/cpuinfo
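
Putting those together so the -P value follows the core count (nproc where
it exists, the grep as a fallback) might look like:

  cores=$(nproc 2>/dev/null || grep -c ^processor /proc/cpuinfo)
  for k in *.JPG; do echo "$k"; done | xargs -I{} -P"$cores" convert {} -resize 1024 {}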

All of the above is off the top of my head and needs to be tested.
YMMV.  But hopefully it will give you some ideas.

HTH,
Bob




Re: need help making shell script use two CPUs/cores

2011-01-09 Thread Karl Vogel
 On Sun, 09 Jan 2011 10:05:43 -0600, 
 Stan Hoeppner s...@hardwarefreak.com said:

S #! /bin/sh
S for k in $(ls *.JPG); do convert $k -resize 1024 $k; done

   Someone was ragging on you to let the shell do the file expansion.  I
   like your way better because most scripting shells aren't smart enough
   to realize that when there aren't any .JPG files, I don't want the
   script to echo '*.JPG' as if that's actually useful.

S I use the above script to batch re-size digital camera photos after I
S dump them to my web server.  It takes a very long time with lots of new
S photos as the server is fairly old, even though it is a 2-way SMP,
S because the script only runs one convert process at a time serially,
S only taking advantage of one CPU.

   First things first: are you absolutely certain that running two parallel
   jobs will exercise both CPUs?  I've seen SMP systems that don't exactly
   live up to truth-in-advertising.  If you stuff two convert jobs in the
   background and then run top (or the moral equivalent) do you SEE both
   CPUs being worked?

   Second: do you have taskset installed?  If the work isn't being
   divided up the way you like, you can bind a process to a desired core:
   http://planet.admon.org/how-to-bind-a-certain-process-to-specified-core/

   And last: if you're not using something like LVM, can you do anything to
   make sure you're not hitting the same disk?  If all your new photos are
   on the same drive, any CPU savings you get from parallel processing will
   probably be erased by disk contention.  Better yet, do you have enough
   memory to do the processing on a RAM-backed filesystem?
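
   If memory allows, a tmpfs scratch area is quick to set up (a sketch; the
   size and mount point are arbitrary, and the mount needs root):

     mkdir -p /mnt/ram
     mount -t tmpfs -o size=512m tmpfs /mnt/ram
     cp *.JPG /mnt/ram/   # convert in /mnt/ram, then copy the results back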

-- 
Karl Vogel  I don't speak for the USAF or my company

If you're searching for the cause of a ghastly noise and find out that it's
not the cat, leave the area immediately.  --how to survive a horror movie





Re: need help making shell script use two CPUs/cores

2011-01-09 Thread Bob Proulx
Karl Vogel wrote:
  Stan Hoeppner said:
 S for k in $(ls *.JPG); do convert $k -resize 1024 $k; done
 
Someone was ragging on you to let the shell do the file expansion.  I
like your way better because most scripting shells aren't smart enough
to realize that when there aren't any .JPG files, I don't want the
script to echo '*.JPG' as if that's actually useful.

:-) I thought about saying something about .JPG instead of .jpg.  Unix
is all about lower case after all.  But I restrained myself.  :-) :-)

  $ for i in *.doesnotexist; do echo $i; done
  *.doesnotexist

As to your comment about shells passing the glob off when it doesn't
match, that is a good comment.  That wasn't really in my style of
coding, and so I had missed it.  Thanks for keeping me honest!
Just to push on it a little bit more I could do this:

  set -- *.doesnotexist
  [ -e "$1" ] || set --   # drop the literal pattern if nothing matched
  for i in "$@"; do
    echo "$i"
  done

That avoids the problem and still avoids spawning another process.
And depending upon what I was doing I might do something completely
different.

First things first: are you absolutely certain that running two parallel
jobs will exercise both CPUs?  I've seen SMP systems that don't exactly
live up to truth-in-advertising.  If you stuff two convert jobs in the
background and then run top (or the moral equivalent) do you SEE both
CPUs being worked?

When was that in terms of kernel versions?  Was Intel hyperthreading
also involved?  Because your description matches very closely running
a two core system with Intel hyperthreading on an older Linux kernel.

Here is the problem I know about.  On a dual cpu system with
hyperthreading the Linux kernel saw four logical cpus and numbered them
0, 1, 2, 3.  But of course zero and one were on one physical cpu and two
and three were on the other.  The first process would run on, say, zero.
The second process would run on the next logical cpu, say, one.  Linux of
that day thought it had allocated those processes onto different cpus.
But of course both were running on the same physical cpu, each getting
half of it and taking twice as long to run, and the other cpu sat idle.
A big problem.

I always disabled Intel hyperthreading to avoid that problem.  It was
more trouble than it was worth.  Also my benchmarks showed that HT
would slightly slow down single threaded simulation processes (mostly
Spice and other simulations) that we were running.

But as far as I know this has now been addressed and the Linux kernel
now knows about hyperthreaded cpus.  It seems that with recent kernels
that the cpu allocation works okay even in the presence of fake cpus
through hyperthreading.  So I think the problem you describe is now
behind us.
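
A quick way to see whether hyperthreading is even exposed to the kernel
(a one-liner sketch; siblings greater than cpu cores means HT is on):

  grep -E '^(siblings|cpu cores)' /proc/cpuinfo | sort -u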

Bob




Re: need help making shell script use two CPUs/cores

2011-01-09 Thread shawn wilson
On Jan 9, 2011 3:09 PM, Stan Hoeppner s...@hardwarefreak.com wrote:

 shawn wilson put forth on 1/9/2011 11:43 AM:
  On Jan 9, 2011 12:17 PM, Stan Hoeppner s...@hardwarefreak.com wrote:
 
  Camaleón put forth on 1/9/2011 10:59 AM:
 
  http://www.imagemagick.org/Usage/api/#speed
 
  The above doc provides hints on how to speed-up image magick
operations.
 
  Note that multi-threading should be automatically used whether
possible,
  as per this paragraph:
 
  ***
  # IM by default uses multiple threads for image processing operations.
  That means you can have the computer do two or more separate threads
of
  image processing, it will be faster than a single CPU machine.
  ***
 
  I'm afraid you will have to find out whether your IM package was
  compiled
  with multi-threading capablities.
 
  I'm using the i386 Lenny package.  Obviously it wasn't, or it would be
  working, and it is not.
 
  No script ideas Camaleón?  You're not a script kiddie?
 
 
  If parallel does actually have the same args as xargs than you should be
  able to convert this fairly easily:
 
  find -type f -iname '*.jpg' -print0 | xargs -0 -i{} convert {} -resize 1024 {}

 I don't quite follow this Shawn.  Will this command line simply
 simultaneously launch one convert process for each jpg file in the
 directory?  I.e. if I have 500 photos in the directory will this command
 line simply fire up 500 simultaneous convert processes?


I think your question has been answered. However what that does is find all
jpg files with a case insensitive match (iname vs name). The print0 is
pretty specific to xargs (though you could probably just -print and pipe it
through and do {} in xargs with the same effect). xargs takes that input
and knows about find's -print0 with the -0 switch, and -i{} tells it to use
{} as a placeholder for what it gets as input.

I prefer to use find for searching for files because it is fast, very
customizable and has a nice File::Find template builder with find2perl. Also
since Bob pointed out that xargs has the -P option, you might just use that
along with find's searching, and I don't think you'll get much better results
(using -type f with find will speed that up a bit too).
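
Something like this would combine the two (an untested sketch; -P2 matches
the two real CPUs in this case):

  find . -type f -iname '*.jpg' -print0 | xargs -0 -P2 -I{} convert {} -resize 1024 {}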


Re: need help making shell script use two CPUs/cores

2011-01-09 Thread Stan Hoeppner
Bob Proulx put forth on 1/9/2011 3:12 PM:

 GNU xargs has an extension to run jobs in parallel.  This is already
 installed on your system.  (But won't work on other Unix systems.)
 
   for k in *.JPG; do echo $k; done | xargs -I{} -P4 echo convert {} -resize 
 1024 {}
 
 Verify that does what you want and then remove the echo.

Thank you Bob.  This is EXACTLY what I was looking for.  It does exactly what I
want, in the precise way I want.  And it's such a simple modification of my
original script, not requiring a big rewrite.  Excellent. :)

I'm using -P2 as the target system is an old dual proc server, two single core
CPUs.  I made three timed runs against 11 camera photo files, first using -P4,
then -P2, then the original script.  The two-process run was 5 seconds faster
and consumed half as much memory as the 4-process run, and the -P2 overall run
time was almost exactly half that of the original script.

Very excellent indeed.  Thanks again Bob.  You rock.  :)

-- 
Stan





Re: need help making shell script use two CPUs/cores

2011-01-09 Thread Stefan Monnier
 unfortunately that simple approach is harder to do with my renaming
 scheme.  So I would probably write a helper script that did the
 options to convert and renamed the file and so forth.

   for k in *.JPG; do
 base=$(basename $k .JPG)
 test -f $base.1024.jpg && continue  # skip if already done
 echo $k;
   done | xargs -L1 -P4 echo my-convert-helper

 And my-convert-helper could take the argument and apply the options in
 the order needed and so forth.

If you want to use the renaming form of the command (which I also tend
to prefer), then I think that using a Makefile makes a lot of sense (and
GNU make's -j argument lets you specify parallel behavior).


Stefan





Re: need help making shell script use two CPUs/cores

2011-01-09 Thread Boyd Stephen Smith Jr.
In 2011011500.46f09b...@kev.msw.wpafb.af.mil, Karl Vogel wrote:
 On Sun, 09 Jan 2011 10:05:43 -0600,
 Stan Hoeppner s...@hardwarefreak.com said:
S #! /bin/sh
S for k in $(ls *.JPG); do convert $k -resize 1024 $k; done

   Someone was ragging on you to let the shell do the file expansion.  I
   like your way better because most scripting shells aren't smart enough
   to realize that when there aren't any .JPG files, I don't want the
   script to echo '*.JPG' as if that's actually useful.

$(ls *.ext) splits into arguments at each run of shell-whitespace in the ls 
output.
*.ext splits into arguments at the end of each filename.
If you want to do the right thing, independent of the characters in the 
filenames and the value of the IFS environment variable, use the latter.

TL;DR: *.ext works when filenames contain spaces; $(ls *.ext) doesn't.
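
A quick demonstration in an otherwise empty scratch directory (sketch):

  $ touch 'two words.JPG'
  $ for k in $(ls *.JPG); do echo "[$k]"; done
  [two]
  [words.JPG]
  $ for k in *.JPG; do echo "[$k]"; done
  [two words.JPG]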
-- 
Boyd Stephen Smith Jr.   ,= ,-_-. =.
b...@iguanasuicide.net   ((_/)o o(\_))
ICQ: 514984 YM/AIM: DaTwinkDaddy `-'(. .)`-'
http://iguanasuicide.net/\_/

