Re: lzo2 shows insane speed gap

2009-01-07 Thread Tim Kientzle

Kris Kennaway wrote:
> Christian Weisgerber wrote:
>> Bruce Cran <br...@cran.org.uk> wrote:
>>> I'm running 8.0-CURRENT amd64 here on a Turion64 X2 machine. Without
>>> malloc debugging (malloc.conf -> aj) 'make test' takes 25s; after
>>> removing malloc.conf thus turning on debugging, it takes over 10
>>> minutes.
>>
>> ...
>>
>> But still.  Two orders of magnitude?  That is a pathological case.
>
> Probably it means that lzo2 is doing pathological numbers of mallocs.


Rather, the lzo2 test suite.  Test suites do tend
to hammer malloc() pretty hard.  I see similar variations
for the libarchive test suite with malloc debugging.
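
A rough sketch of why allocation-heavy test suites suffer under junk filling. With option J enabled, FreeBSD's malloc(3) writes a fill byte (0xa5) over every newly allocated block, so the extra work grows with the number of allocations times their size. The 64 MiB buffer size, the allocation count, and the 4 KiB page size below are assumptions for illustration, not figures measured from the lzo2 test suite.

```python
# Back-of-the-envelope model of malloc junk filling (option J).
# All sizes and counts are illustrative assumptions, not measurements.

PAGE_SIZE = 4096  # assumed page size in bytes


def junk_fill_cost(alloc_size, n_allocs, junk_enabled):
    """Return (bytes_written, pages_touched) caused by junk filling alone."""
    if not junk_enabled:
        return 0, 0
    pages_per_alloc = -(-alloc_size // PAGE_SIZE)  # ceiling division
    return alloc_size * n_allocs, pages_per_alloc * n_allocs


if __name__ == "__main__":
    buf = 64 * 1024 * 1024   # hypothetical 64 MiB work buffer per test
    runs = 1000              # hypothetical number of allocations
    for enabled in (False, True):
        written, pages = junk_fill_cost(buf, runs, enabled)
        print(f"junk={enabled}: {written / 2**30:.1f} GiB written, "
              f"{pages} pages touched")
```

Every page of a fresh anonymous mapping that gets touched this way also costs a page fault. That would be consistent with the high system time and trap counts reported elsewhere in this thread, but it is an inference, not something the thread measured.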

Tim
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org


Re: lzo2 shows insane speed gap

2009-01-04 Thread Kris Kennaway

Christian Weisgerber wrote:
> Bruce Cran <br...@cran.org.uk> wrote:
>> I'm running 8.0-CURRENT amd64 here on a Turion64 X2 machine. Without
>> malloc debugging (malloc.conf -> aj) 'make test' takes 25s; after
>> removing malloc.conf thus turning on debugging, it takes over 10
>> minutes.
>
> Wow!  That.  Is.  It.
>
> Toggling malloc debugging option J makes the slow machines fast
> and vice versa.
>
>>> Athlon 64 X2 5200+ 2.6 GHz,  FreeBSD 8.0-CURRENT amd64   ~60 min
>
> 19 seconds.
>
> I guess that falls under the obvious configuration differences
> to check, but since it usually doesn't cause a significant slowdown
> I completely forgot about it.  Embarrassing.
>
> But still.  Two orders of magnitude?  That is a pathological case.

Probably it means that lzo2 is doing pathological numbers of mallocs.

Kris


Re: lzo2 shows insane speed gap

2008-12-30 Thread Christian Weisgerber
Bruce Cran <br...@cran.org.uk> wrote:

> I'm running 8.0-CURRENT amd64 here on a Turion64 X2 machine. Without
> malloc debugging (malloc.conf -> aj) 'make test' takes 25s; after
> removing malloc.conf thus turning on debugging, it takes over 10
> minutes.

Wow!  That.  Is.  It.

Toggling malloc debugging option J makes the slow machines fast
and vice versa.

>> Athlon 64 X2 5200+ 2.6 GHz,  FreeBSD 8.0-CURRENT amd64   ~60 min

19 seconds.

I guess that falls under the obvious configuration differences
to check, but since it usually doesn't cause a significant slowdown
I completely forgot about it.  Embarrassing.

But still.  Two orders of magnitude?  That is a pathological case.
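
For a sense of scale, the two before/after timings reported in this thread work out as follows. The inputs are the figures quoted in the messages; "~60 min" and "over 10 minutes" are approximate, so the results are rough.

```python
# Slowdown factors from the timings quoted in this thread.

def slowdown(slow_seconds, fast_seconds):
    """Return how many times longer the slow run took."""
    return slow_seconds / fast_seconds


athlon = slowdown(60 * 60, 19)   # ~60 min with junk filling vs 19 s without
turion = slowdown(10 * 60, 25)   # "over 10 minutes" vs 25 s, so a lower bound

print(f"Athlon 64 X2: about {athlon:.0f}x")
print(f"Turion64 X2: at least {turion:.0f}x")
```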

-- 
Christian naddy Weisgerber  na...@mips.inka.de



Re: lzo2 shows insane speed gap

2008-12-30 Thread Dag-Erling Smørgrav
Christian Weisgerber <na...@mips.inka.de> writes:
> Oh, and everybody is invited to run
>
> $ cd /usr/ports/archivers/lzo2 && make

I assume you meant "time make".

This is insane:

 3108.27 real  1215.69 user  1888.06 sys

on an E6600 with 4 GB RAM.

What surprises me most is the high sys time.
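
To put a number on that, here is a small sketch that splits the time(1) summary above into its fields and computes the share of CPU time spent in the kernel. The helper name is made up for this example.

```python
# Split a BSD time(1) summary line into its components and compute
# the share of CPU time spent in the kernel.

def parse_time_line(line):
    """Parse 'N real N user N sys' into a dict of floats."""
    fields = line.split()
    # Fields alternate: value, label, value, label, ...
    return {label: float(value)
            for value, label in zip(fields[0::2], fields[1::2])}


t = parse_time_line("3108.27 real  1215.69 user  1888.06 sys")
cpu = t["user"] + t["sys"]
sys_share = t["sys"] / cpu
print(f"sys share of CPU time: {sys_share:.1%}")
```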

DES
-- 
Dag-Erling Smørgrav - d...@des.no


lzo2 shows insane speed gap

2008-12-29 Thread Christian Weisgerber
The archivers/lzo2 port runs a series of regression tests after the
actual build.  These tests show extremely divergent behavior on
different machines.  There are two types of machines:

Type #1:
  Running the tests takes roughly the same time as configure and
  compile did, whether it's 30 seconds on a fast machine or 10
  minutes on an old slow one.

Type #2:
  Running the tests takes much, much, MUCH longer.

I've tried this across alpha, amd64, i386, and sparc64, partially
on FreeBSD, partially on OpenBSD.  The operating system doesn't
matter and there is no pattern related to endianness or 32/64 bits.

You can find machines that are the same architecture (e.g. amd64)
and are of similar overall speed (e.g. an Intel Xeon E5405 and
an AMD Phenom 9350e) and one of these machines will be type #1 and
the other will be #2 and take _a hundred_ times longer to run the
tests.  A hundred times.

I have never seen anything like this before.

On the slow machines, the tests also consume a lot of system time.
I've seen figures from 20 to 50%.  However, ktrace shows nothing
out of the ordinary.

My best guess at this time is that lzo2 somehow manages to induce
crazy cache thrashing on some CPU models.

Ideas and explanations welcome.

-- 
Christian naddy Weisgerber  na...@mips.inka.de



Re: lzo2 shows insane speed gap

2008-12-29 Thread Dimitry Andric
On 2008-12-29 22:25, Christian Weisgerber wrote:
> On the slow machines, the tests also consume a lot of system time.
> I've seen figures from 20 to 50%.  However, ktrace shows nothing
> out of the ordinary.

What's up with the memory on these machines?  Lzo tends to take insane 
amounts


Re: lzo2 shows insane speed gap

2008-12-29 Thread Dimitry Andric
On 2008-12-30 00:17, Dimitry Andric wrote:
> What's up with the memory on these machines?  Lzo tends to take insane
> amounts

Duh, never mind... I'm confusing this with lzma. :)  Sorry for the noise.


Re: lzo2 shows insane speed gap

2008-12-29 Thread Nate Eldredge

On Mon, 29 Dec 2008, Christian Weisgerber wrote:


> The archivers/lzo2 port runs a series of regression tests after the
> actual build.  These tests show extremely divergent behavior on
> different machines.  There are two types of machines:
>
> Type #1:
>  Running the tests takes roughly the same time as configure and
>  compile did, whether it's 30 seconds on a fast machine or 10
>  minutes on an old slow one.
>
> Type #2:
>  Running the tests takes much, much, MUCH longer.
>
> I've tried this across alpha, amd64, i386, and sparc64, partially
> on FreeBSD, partially on OpenBSD.  The operating system doesn't
> matter and there is no pattern related to endianness or 32/64 bits.
>
> You can find machines that are the same architecture (e.g. amd64)
> and are of similar overall speed (e.g. an Intel Xeon E5405 and
> an AMD Phenom 9350e) and one of these machines will be type #1 and
> the other will be #2 and take _a hundred_ times longer to run the
> tests.  A hundred times.
>
> I have never seen anything like this before.


It might be good first to rule out compiler / library differences.

First, can you isolate a single lzo command / input combination whose time 
differs dramatically?  This would simplify tests compared to running the 
whole test suite.  (It should be easy because it looks like the test suite 
prints the time for each test.)  It might also simplify things to work on 
one fast and one slow machine.
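
One way to time a single command with the user/system split, as suggested above, is sketched here. The lzotest arguments are not given in this thread, so the default command is only a placeholder; `time_command` is a name invented for this example.

```python
import resource
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (real, user, sys) seconds for the child."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    real = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (real,
            after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)


if __name__ == "__main__":
    # Placeholder command; substitute the lzotest invocation under study.
    cmd = sys.argv[1:] or [sys.executable, "-c", "pass"]
    real, user, system = time_command(cmd)
    print(f"{real:.2f} real  {user:.2f} user  {system:.2f} sys")
```

Running the same invocation on one fast and one slow machine and comparing the sys column should show whether the extra time is spent in the kernel.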


Then try copying the lzo binary from the fast machine to the slow 
machine (and vice versa) and see if the same test speeds up with the 
copied binary.  If not, try again with the binary statically linked.  If 
still not, it would be good to have a copy of the binary made available, 
along with more information about the fast and slow machines (CPU, 
amount of memory, load on the machine, kernel version, disk, etc).


If the copied binary isn't faster than the natively produced one, then it 
would be good to have information about the compiler options, versions, 
etc.


--

Nate Eldredge
neldre...@math.ucsd.edu


Re: lzo2 shows insane speed gap

2008-12-29 Thread Yuri

Christian Weisgerber wrote:

> [skipped]
>
> My best guess at this time is that lzo2 somehow manages to induce
> crazy cache thrashing on some CPU models.
>
> Ideas and explanations welcome.


Did you ask the author? He might be the best person to ask.

Yuri



Re: lzo2 shows insane speed gap

2008-12-29 Thread Mel
On Monday 29 December 2008 12:25:00 Christian Weisgerber wrote:
> On the slow machines, the tests also consume a lot of system time.
> I've seen figures from 20 to 50%.  However, ktrace shows nothing
> out of the ordinary.

If the program itself doesn't directly cause the system time, do interrupt 
rates give any hint as to what does?
And to rule out the obvious, you did check swapping?
-- 
Mel


Re: lzo2 shows insane speed gap

2008-12-29 Thread Yuri

Christian Weisgerber wrote:

> My best guess at this time is that lzo2 somehow manages to induce
> crazy cache thrashing on some CPU models.
>
> Ideas and explanations welcome

Try running a single command whose timing differs between the machines
under valgrind (callgrind) on each of them, and check that at least the
number of instructions executed is the same.

The lzo2 documentation says that a lot of algorithms are implemented.
It might be choosing the algorithm based on the CPU, and the choice it
makes might be a bad one.


Yuri


Re: lzo2 shows insane speed gap

2008-12-29 Thread Artem Belevich
I see this performance difference on my boxes.

First one has Core2Duo(E5-something), 4GB and  runs RELENG_7/i386.
lzotest is very fast.
Second box is Core2Quad (Q9450), 8GB RAM and runs -current as of about
a week ago. lzo2 binary built from ports is *slow*. However, 32-bit
binary from the first box runs very fast.

The only interesting difference I can see in ktrace is that read and
munmap take much much longer in case of 64-bit lzotest.
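
A small sketch for ranking entries in a kdump listing by timestamp, which makes outliers like these easier to spot. It assumes the "pid command timestamp TYPE ..." layout used in this message and relative timestamps as produced by kdump -R; the sample lines are hypothetical, not copied from the traces here.

```python
import re

# pid, command, relative timestamp, record type, remainder
ENTRY = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+\.\d+)\s+(\w+)\s*(.*)$")


def slowest_entries(lines, top=3):
    """Return the `top` entries with the largest timestamp, largest first."""
    entries = []
    for line in lines:
        m = ENTRY.match(line)
        if m:  # continuation lines of wrapped records do not match
            entries.append((float(m.group(3)), m.group(4), m.group(5)))
    return sorted(entries, reverse=True)[:top]


# Hypothetical sample input in the same layout as the excerpts.
sample = [
    "   100 prog  0.000010 RET   mmap 0",
    "   100 prog  0.074126 CALL  read(0x3,0x1000,0x1000)",
    "   100 prog  0.000054 GIO   fd 3 read 4096 bytes",
]
for delta, kind, rest in slowest_entries(sample, top=2):
    print(f"{delta:.6f} {kind} {rest}")
```

Note that with relative timestamps each value is the time since the previous entry, so a large value on a CALL line reflects time spent before the call was made, not time inside the system call itself.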

Here are two excerpts from ktrace on the second box:
### 32-bit app - runs fast on both boxes.

 59657 lzotest  0.10 CALL  open(0xd91b,O_RDONLY,unused0x1b6)
 59657 lzotest  0.07 NAMI  ./src/lzo1_d.ch
 59657 lzotest  0.12 RET   open 3
 59657 lzotest  0.05 CALL  fstat(0x3,0xd504)
 59657 lzotest  0.07 STRU  struct stat {dev=102, ino=544718,
mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=2169160,
atime=1230595144, stime=1209559909, ctime=1230588212,
birthtime=1209559909, size=4563, blksize=4096, blocks=12, flags=0x0 }
 59657 lzotest  0.05 RET   fstat 0
 59657 lzotest  0.06 CALL  lseek(0x3,0,SEEK_SET,0x1)
 59657 lzotest  0.05 RET   lseek 0
 59657 lzotest  0.05 CALL  lseek(0x3,0x400,SEEK_SET,0)
 59657 lzotest  0.05 RET   lseek 67108864/0x400
 59657 lzotest  0.06 CALL  lseek(0x3,0,SEEK_SET,0)
 59657 lzotest  0.05 RET   lseek 0
 59657 lzotest  0.05 CALL
mmap(0,0x400,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,0x,0,0)
 59657 lzotest  0.07 RET   mmap 673185792/0x2820
 59657 lzotest  0.06 CALL  read(0x3,0x28196000,0x1000)
 59657 lzotest  0.10 GIO   fd 3 read 4096 bytes
 59657 lzotest  0.29 RET   read 4096/0x1000
 59657 lzotest  0.28 CALL  read(0x3,0x28196000,0x1000)
 59657 lzotest  0.10 GIO   fd 3 read 467 bytes
 59657 lzotest  0.05 RET   read 467/0x1d3
 59657 lzotest  0.10 CALL  read(0x3,0x28196000,0x1000)
 59657 lzotest  0.07 GIO   fd 3 read 0 bytes
 59657 lzotest  0.06 RET   read 0
 59657 lzotest  0.05 CALL  close(0x3)
 59657 lzotest  0.10 RET   close 0
 59657 lzotest  0.25 CALL  getrusage(0,0xd60c)
 59657 lzotest  0.06 RET   getrusage 0
 59657 lzotest  0.05 CALL  getrusage(0,0xd628)
 59657 lzotest  0.06 RET   getrusage 0
 59657 lzotest  0.05 CALL  getrusage(0,0xd60c)
 59657 lzotest  0.06 RET   getrusage 0
 59657 lzotest  0.64 CALL  getrusage(0,0xd60c)
 59657 lzotest  0.06 RET   getrusage 0
 59657 lzotest  0.05 CALL  getrusage(0,0xd60c)
 59657 lzotest  0.06 RET   getrusage 0
 59657 lzotest  0.29 CALL  getrusage(0,0xd60c)
 59657 lzotest  0.06 RET   getrusage 0
 59657 lzotest  0.12 CALL  getrusage(0,0xd60c)
 59657 lzotest  0.36 RET   getrusage 0
 59657 lzotest  0.10 CALL  write(0x1,0x28194000,0x4f)
 59657 lzotest  0.10 GIO   fd 1 wrote 79 bytes
 59657 lzotest  0.06 RET   write 79/0x4f
 59657 lzotest  0.06 CALL  munmap(0x2820,0x400)
 59657 lzotest  0.17 RET   munmap 0

### same file. 64-bit app (slow). Look at read/munmap

 59158 lzotest  0.15 CALL  open(0x7fffe760,O_RDONLY,unused0x1b6)
 59158 lzotest  0.14 NAMI  ./src/lzo1_d.ch
 59158 lzotest  0.24 RET   open 3
 59158 lzotest  0.11 CALL  fstat(0x3,0x7fffe2d0)
 59158 lzotest  0.11 STRU  struct stat {dev=102, ino=544718,
mode=-rw-r--r-- , nlink=1, uid=0, gid=0, rdev=2169160,
atime=1230588427, stime=1209559909, ctime=1230588212,
birthtime=1209559909, size=4563, blksize=4096, blocks=12, flags=0x0 }
 59158 lzotest  0.07 RET   fstat 0
 59158 lzotest  0.15 CALL  lseek(0x3,0,SEEK_CUR)
 59158 lzotest  0.07 RET   lseek 0
 59158 lzotest  0.06 CALL  lseek(0x3,0x400,SEEK_SET)
 59158 lzotest  0.07 RET   lseek 67108864/0x400
 59158 lzotest  0.07 CALL  lseek(0x3,0,SEEK_SET)
 59158 lzotest  0.06 RET   lseek 0
 59158 lzotest  0.08 CALL
mmap(0,0x400,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON,0x,0)
 59158 lzotest  0.10 RET   mmap 11534336/0x800b0
 59158 lzotest  0.074126 CALL  read(0x3,0x800a9e000,0x1000)
 59158 lzotest  0.54 GIO   fd 3 read 4096 bytes
 59158 lzotest  0.10 RET   read 4096/0x1000
 59158 lzotest  0.07 CALL  read(0x3,0x800a9e000,0x1000)
 59158 lzotest  0.12 GIO   fd 3 read 467 bytes
 59158 lzotest  0.06 RET   read 467/0x1d3
 59158 lzotest  0.07 CALL  read(0x3,0x800a9e000,0x1000)
 59158 lzotest  0.09 GIO   fd 3 read 0 bytes
 59158 lzotest  0.06 RET   read 0
 59158 lzotest  0.08 CALL  close(0x3)
 59158 lzotest  0.20 RET   close 0
 59158 lzotest  0.29 CALL  getrusage(0,0x7fffe3d0)
 59158 lzotest  0.10 RET   getrusage 0
 59158 lzotest  0.07 CALL  getrusage(0,0x7fffe3e0)
 59158 lzotest  0.07 RET   getrusage 0
 59158 lzotest  0.07 CALL  getrusage(0,0x7fffe3d0)
 59158 lzotest  0.07 RET   getrusage 0
 59158 lzotest  0.69 CALL  getrusage(0,0x7fffe3d0)
 59158 lzotest  0.07 RET   getrusage 0
 59158 lzotest  0.06 CALL  

Re: lzo2 shows insane speed gap

2008-12-29 Thread Christian Weisgerber
Mel <fbsd.hack...@rachie.is-a-geek.net> wrote:

> If the program itself doesn't directly cause the system time, do interrupt
> rates give any hint as to what does?

systat -vmstat shows a conspicuously large number of traps, I think.
(I'm short on comparable FreeBSD machines.)

> And to rule out the obvious, you did check swapping?

No swapping.

-- 
Christian naddy Weisgerber  na...@mips.inka.de



Re: lzo2 shows insane speed gap

2008-12-29 Thread Christian Weisgerber
Nate Eldredge:

> It might be good first to rule out compiler / library differences.

Sure.  Let's cut this short:

Slow
Athlon 64 X2 5200+ 2.6 GHz,          FreeBSD 8.0-CURRENT amd64    ~60 min
Phenom 9350e 2.0 GHz,                OpenBSD 4.4-CURRENT amd64    ~80 min
UltraSPARC-IIe 500 MHz (Blade 100),  OpenBSD 4.4-CURRENT sparc64  10 h++

Fast
Pentium 4 3.0 GHz,                   FreeBSD 6.4-RELEASE i386     36 s
Xeon E5405 2.0 GHz (PowerEdge 1950), OpenBSD 4.4-CURRENT amd64    47 s
Alpha 21164A 500 MHz (AlphaPC164),   OpenBSD 4.4-CURRENT alpha    9 min

Let me draw your attention to the fact that the two amd64 systems
that run different operating systems are both slow, whereas the two
amd64 systems that run the same operating system (compiler, libraries)
diverge in speed.


Oh, and everybody is invited to run

$ cd /usr/ports/archivers/lzo2 && make

and check for themselves.


PS: The Blade 100 is still crunching as I write this...
-- 
Christian naddy Weisgerber  na...@mips.inka.de


Re: lzo2 shows insane speed gap

2008-12-29 Thread Bruce Cran
On Tue, 30 Dec 2008 01:47:47 +0100
Christian Weisgerber <na...@mips.inka.de> wrote:

> Nate Eldredge:
>
>> It might be good first to rule out compiler / library differences.
>
> Sure.  Let's cut this short:
>
> Slow
> Athlon 64 X2 5200+ 2.6 GHz,          FreeBSD 8.0-CURRENT amd64    ~60 min
> Phenom 9350e 2.0 GHz,                OpenBSD 4.4-CURRENT amd64    ~80 min
> UltraSPARC-IIe 500 MHz (Blade 100),  OpenBSD 4.4-CURRENT sparc64  10 h++
>
> Fast
> Pentium 4 3.0 GHz,                   FreeBSD 6.4-RELEASE i386     36 s
> Xeon E5405 2.0 GHz (PowerEdge 1950), OpenBSD 4.4-CURRENT amd64    47 s
> Alpha 21164A 500 MHz (AlphaPC164),   OpenBSD 4.4-CURRENT alpha    9 min
>
> Let me draw your attention to the fact that the two amd64 systems
> that run different operating systems are both slow, whereas the two
> amd64 systems that run the same operating system (compiler, libraries)
> diverge in speed.
>
> Oh, and everybody is invited to run
>
> $ cd /usr/ports/archivers/lzo2 && make

I'm running 8.0-CURRENT amd64 here on a Turion64 X2 machine. Without
malloc debugging (malloc.conf -> aj) 'make test' takes 25s; after
removing malloc.conf thus turning on debugging, it takes over 10
minutes.

-- 
Bruce Cran


Re: lzo2 shows insane speed gap

2008-12-29 Thread Kevin Day



> Oh, and everybody is invited to run
>
> $ cd /usr/ports/archivers/lzo2 && make
>
> and check for themselves.




I've used lzo2 quite a bit in the past and never saw this, so I  
thought I'd try this on a few boxes we have... Output is from make  
fetch ; time make


8-core Opteron 2350 2.0ghz, 64GB RAM, FreeBSD 7.1-PRERELEASE (just  
before RC1 was tagged), amd64

41.464u 20.671s 1:02.04 100.1%  2430+1556k 0+0io 377pf+0w

4-core Opteron 280 2.4ghz, 4GB RAM, FreeBSD 7.0-RELEASE-p6, amd64
40.907u 18.638s 1:03.08 94.3%   2339+603k 182+91io 681pf+0w

Dual Athlon MP 2100+ 1.73ghz, 1GB RAM, FreeBSD 6.3-RELEASE, i386
82.812u 44.963s 2:06.89 100.6%  959+37724k 32+82io 46pf+0w

Dual P3 850mhz, 1GB RAM, FreeBSD 7.0-RELEASE-p4, i386
208.494u 84.935s 8:07.23 60.2%  2270+990k 17+87io 60pf+0w

4-core Opteron 2218 2.6ghz, 16GB RAM, FreeBSD 7.0-RELEASE-p4, amd64
38.893u 16.623s 0:55.53 99.9%   2290+591k 96+99io 48pf+0w

Dual Xeon 3.06GHz, 4GB RAM, FreeBSD 7.0-RELEASE-p4, i386
60.910u 24.667s 1:22.54 103.6%  2143+988k 146+134io 105pf+0w

Dual P3 866mhz, 2GB RAM, FreeBSD 7.0-RELEASE-p4, i386
169.135u 58.198s 3:52.71 97.6%  2443+1002k 160+99io 368pf+0w

2-core Core 2 Duo 2.33ghz, 2GB RAM, Mac OS X 10.5.6, i386
48.155u 29.896s 1:25.14 91.6%   0+0k 30+222io 1845pf+0w

4-core Xeon 2.66ghz, 6GB RAM, Mac OS X 10.5.6, i386
real 1m17.024s  user 0m44.373s  sys 0m34.249s


None of these boxes were idle, so relative times are pretty useless,
but I'm not seeing anything on the order of tens of minutes or hours.
Is the source .tar.gz identical on all your systems?


-- Kevin



Re: lzo2 shows insane speed gap

2008-12-29 Thread Alex Kozlov
Christian Weisgerber <na...@mips.inka.de> wrote:

> Oh, and everybody is invited to run
>
> $ cd /usr/ports/archivers/lzo2 && make

$ cd /usr/ports/archivers/lzo2 && time sudo make
[...]
All tests passed. Now you are ready to install LZO.

real    1m1.041s
user    0m38.087s
sys     0m17.613s

This is Intel q6600.


--
Adios