On 9/30/12 2:21 PM, Liviu Nicoara wrote:
Forwarding with the attachment.
-------- Original Message --------
Subject: Re: STDCXX-1071 numpunct facet defect
Date: Sun, 30 Sep 2012 12:09:10 -0600
From: Martin Sebor <[email protected]>
To: Liviu Nicoara <[email protected]>
On 9/27/12 8:27 PM, Martin Sebor wrote:
Here are my timings for library-reduction.cpp when compiled with
GCC 4.5.3 on Solaris 10 (4 SPARCV9 CPUs). I had to make a small
number of trivial changes to get it to compile:
        With cache    No cache
real    1m38.332s     8m58.568s
user    6m30.244s     34m25.942s
sys     0m0.060s      0m3.922s
I also experimented with the program on Linux (CEL 4 with 16
CPUs). Initially, I saw no differences between the two versions.
So I modified it a bit to make it closer to the library (the
modified program is attached).
I see the difference -- your program has a virtual function it calls from the
inline grouping function.
With those changes the timings are below:
        With cache    No cache
real    0m 1.107s     0m 5.669s
user    0m17.204s     0m 5.669s
sys     0m 0.000s     0m22.347s
I also recompiled and re-ran the test on Solaris. To speed
things along, I set the number of threads and loops to 8 and
1000000, respectively. The numbers are as follows:
        With cache    No cache
real    0m3.341s      0m26.333s
user    0m13.052s     1m37.470s
sys     0m0.009s      0m0.132s
The numbers match my expectation. The overhead without the
"numpunct cache" is considerable.
I have done another (smaller) round of measurements, this time using the test
program you posted. Here are the results:
* iMac, 4x Intel, 12S:
16, 10000000:
        Cached        Not cached
real    0m9.300s      0m5.224s
user    0m36.441s     0m20.523s
sys     0m0.043s      0m0.068s
* iMac, 4x Intel, 12D:
        Cached        Not cached
real    0m9.012s      0m5.774s
user    0m35.343s     0m20.997s
sys     0m0.045s      0m0.183s
* Linux Slackware, 16x AMD Opteron, 12S:
16, 10000000:
        Cached        Not cached
real    0m29.798s     0m3.278s
user    0m48.662s     0m47.338s
sys     6m18.525s     0m3.298s
Somewhat unexpectedly, the test with the cache didn't crash.
On my iMac it did not crash for me either (gcc 4.5.4), this time. On the other
box (gcc 4.5.2) it crashed every time with caching, so I had to add a call to
fac.grouping outside the thread function to initialize the "facet".
Liviu