On 9/16/12 3:20 AM, Stefan Teleman wrote:
> On Sat, Sep 15, 2012 at 4:53 PM, Liviu Nicoara <[email protected]> wrote:
>
>> Now, to clear the confusion I created: the timing numbers I posted in the
>> attachment stdcxx-1056-timings.tgz to STDCXX-1066 (09/11/2012) showed that a
>> perfectly forwarding, no caching public interface (exemplified by a changed
>> grouping) performs better than the current implementation. It was that test
>> case that I hoped you could time, perhaps on SPARC, in both MT and ST
>> builds. The t.cpp program is for MT, s.cpp for ST.
>
> I got your patch, and have tested it.
>
> I have created two Experiments (that's what they are called) with the
> SunPro Performance Analyzer. Both experiments are targeting race
> conditions and deadlocks in the instrumented program, and both
> experiments are running the 22.locale.numpunct.mt program from the
> stdcxx test harness. One experiment is with your patch applied. The
> other experiment is with our (Solaris) patch applied.
>
> Here are the results:
I looked at the analysis more closely.
>
> 1. with your patch applied:
>
> http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.nts/
I see here (http://tinyurl.com/94pbmzc) that the implementation of the facet
public interface is forwarding, with no caching.
>
> 2. with our (Solaris) patch applied:
>
> http://s247136804.onlinehome.us/22.locale.numpunct.mt.1.er.ts/
Unfortunately, I can't do the same here. Could you please refresh my memory on what this patch contains? Is it not part of the patch set you published here earlier (http://tinyurl.com/8pyql4g)?
AFAICT, the race accesses that the analyzer points out are writes to shared locations which occur along the thread execution path. They do not necessarily mean that a race condition exists, and in fact we know that no race condition exists if the public facet interface forwards to the protected virtual interface. That is what was tested in the first analysis, as can be seen in
_numpunct.h: http://tinyurl.com/94pbmzc
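Just so we are talking about the same thing, here is a minimal sketch of what I mean by a forwarding, non-caching public interface; the names below are made up for illustration and are not the actual declarations from _numpunct.h:

    #include <string>

    struct forwarding_numpunct {
        virtual ~forwarding_numpunct () { }

        // public interface: forwards to the protected virtual on every
        // call and never stores a cached copy in the facet object, so
        // concurrent callers only ever read shared state
        std::string grouping () const {
            return do_grouping ();
        }

    protected:
        // protected virtual interface, overridden by derived facets
        virtual std::string do_grouping () const {
            return "";   // "C" locale: no grouping
        }
    };

    int main ()
    {
        forwarding_numpunct np;
        return np.grouping ().empty () ? 0 : 1;
    }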
Looking elsewhere, also in the first analysis, there is the __rw_get_numpunct function (the src link points here: http://tinyurl.com/8ez85e2). All highlighted lines, each performing a write to a shared location, are potential race points, but they do not lead to race conditions because of the proper synchronization we know occurs in the __rw_setlocale class.
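A rough model of why I think those flagged writes are benign, assuming (as described above) that __rw_setlocale serializes them; the guard class and mutex below are stand-ins of my own, not the real stdcxx internals:

    #include <mutex>

    static std::mutex rw_locale_mutex;      // stand-in for the stdcxx lock

    struct setlocale_guard {                // stand-in for __rw_setlocale
        std::lock_guard<std::mutex> lock_;
        setlocale_guard () : lock_ (rw_locale_mutex) { }
    };

    static const char* punct_data = 0;      // the shared location

    static void build_punct_data ()
    {
        // every writer holds the guard for the duration of the writes, so
        // the writes the analyzer highlights cannot race one another
        setlocale_guard guard;
        punct_data = "\003";
    }

    int main ()
    {
        build_punct_data ();
        return punct_data ? 0 : 1;
    }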
The race accesses in __rw_get_numpunct sum up to ~3400 with the forwarding patch, as you pointed out in a later email. That number was a bit puzzling, but looking at the thread function I see that the test uses the numpunct test suite code, which creates a locale and extracts the facet from it
in each iteration.
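For reference, the loop I have in mind is roughly the one below. This is my approximation of the 22.locale.numpunct.mt thread function, not the actual harness code, and it uses std::thread only to keep the sketch self-contained:

    #include <locale>
    #include <thread>
    #include <vector>

    static void thread_func ()
    {
        for (int i = 0; i != 10000; ++i) {
            // a locale is constructed and the facet extracted from it on
            // every iteration, so every iteration goes through
            // __rw_get_numpunct in the stdcxx build
            const std::locale loc ("C");
            const std::numpunct<char> &np =
                std::use_facet<std::numpunct<char> > (loc);
            (void)np.grouping ();
        }
    }

    int main ()
    {
        std::vector<std::thread> threads;
        for (int i = 0; i != 4; ++i)    // 4 threads x 10000 iterations
            threads.push_back (std::thread (thread_func));
        for (std::size_t i = 0; i != threads.size (); ++i)
            threads[i].join ();
        return 0;
    }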
That means that, ideally, for 4 threads each iterating 10,000 times, I would expect locales to be created 40K times, and likewise for the facet extractions, the __rw_get_numpunct calls, etc. Could the number of race accesses collected, far less than that, be explained by a lesser degree of thread overlap? I.e., some threads start earlier, others later, and they only partially overlap?
If that is the case, I would not ascribe much importance to these numbers. As I think was pointed out earlier, a numpunct facet is initialized on the first trip through __rw_get_numpunct, and only that trip is properly synchronized. All subsequent trips through __rw_get_numpunct find the facet data already there; they just read it, with no synchronization needed, and return it. Therefore, the cost of
initialization/synchronization is paid only once.
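Put differently, the structure I have in mind is the usual initialize-once pattern sketched below. The names are invented, and the atomic flag is only there to keep the sketch well-defined in today's terms; it is not how __rw_get_numpunct itself is written:

    #include <atomic>
    #include <mutex>
    #include <string>

    static std::atomic<bool> initialized (false);
    static std::mutex        init_mutex;      // stands in for __rw_setlocale
    static std::string       grouping_data;   // the shared facet data

    static const std::string& get_numpunct_data ()
    {
        if (!initialized.load (std::memory_order_acquire)) {
            // first trip: build the facet data under the lock
            std::lock_guard<std::mutex> guard (init_mutex);
            if (!initialized.load (std::memory_order_relaxed)) {
                grouping_data = "\003";
                initialized.store (true, std::memory_order_release);
            }
        }
        // subsequent trips: the data is already there and is only read
        return grouping_data;
    }

    int main ()
    {
        return get_numpunct_data ().size () == 1 ? 0 : 1;
    }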
Thanks.
Liviu