Re: [PATCH] Use __rw_atomic_xxx() on Windows

2007-09-07 Thread Martin Sebor

Travis Vitek wrote:

Doh! I should know better. Here are the results from a 12d build on the
same hardware.


Does this mean that there is almost no difference between the
intrinsic functions and the out-of-line ones, or that the test
is too simple to demonstrate it?

I expect the greatest advantage of the intrinsics over ordinary
out-of-line functions to be that they (might) make it possible
for the optimizer to generate better code *in certain contexts*
depending on where they are called from. This is going to be
hard to demonstrate in a simple test case. I suspect we would
need a more realistic test with a number of different uses of
string (and the atomic functions) to get some idea of how much
they might help.
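
To make the distinction concrete, here is a minimal sketch in portable C++. It is not the stdcxx code: inline_increment(), opaque_increment(), bump_n_times() and the SKETCH_NOINLINE macro are all hypothetical names, and the noinline attribute only approximates a routine the compiler cannot see into (such as one defined in a separate .asm file).

```cpp
#include <atomic>

#if defined(_MSC_VER)
#  define SKETCH_NOINLINE __declspec(noinline)
#else
#  define SKETCH_NOINLINE __attribute__((noinline))
#endif

// Visible to the optimizer at every call site, so it can be inlined
// and scheduled together with the surrounding code.
inline long inline_increment(std::atomic<long>& counter)
{
    return counter.fetch_add(1) + 1;
}

// Stand-in for an out-of-line routine: the caller only sees a call
// and has to assume the usual function-call costs.
SKETCH_NOINLINE long opaque_increment(std::atomic<long>& counter)
{
    return counter.fetch_add(1) + 1;
}

// Hypothetical caller that increments the counter n times through
// either path and returns the last value observed.
long bump_n_times(std::atomic<long>& counter, long n, bool use_inline)
{
    long last = counter.load();
    for (long i = 0; i < n; ++i)
        last = use_inline ? inline_increment(counter)
                          : opaque_increment(counter);
    return last;
}
```

Both paths compute the same result. Any difference would only show up in the code generated around the call, which is why a tight loop like this one is unlikely to reveal much.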

Martin



  normal                   patched
  -- 1 threads             -- 1 threads
  ms            934        ms           1015
  ms/op      0.5567        ms/op      0.6050
  -- 2 threads             -- 2 threads
  ms           6049        ms           6266
  ms/op  0.00036055        ms/op  0.00037348
  -- 4 threads             -- 4 threads
  ms          11948        ms          11813
  ms/op  0.00071216        ms/op  0.00070411
  -- 8 threads             -- 8 threads
  ms          23855        ms          24743
  ms/op  0.00142187        ms/op  0.00147480




Martin Sebor wrote:

8d is not thread-safe so the atomic function templates should
be implemented in terms of ordinary increments and decrements
(if they aren't it's a bug). They should only expand to the
atomic assembly (or the Win32 Interlocked) functions in 12X
and 15X build types.

Martin





RE: [PATCH] Use __rw_atomic_xxx() on Windows

2007-09-06 Thread Travis Vitek

Doh! I should know better. Here are the results from a 12d build on the
same hardware.

  normal                   patched
  -- 1 threads             -- 1 threads
  ms            934        ms           1015
  ms/op      0.5567        ms/op      0.6050
  -- 2 threads             -- 2 threads
  ms           6049        ms           6266
  ms/op  0.00036055        ms/op  0.00037348
  -- 4 threads             -- 4 threads
  ms          11948        ms          11813
  ms/op  0.00071216        ms/op  0.00070411
  -- 8 threads             -- 8 threads
  ms          23855        ms          24743
  ms/op  0.00142187        ms/op  0.00147480



Martin Sebor wrote:
>
>8d is not thread-safe so the atomic function templates should
>be implemented in terms of ordinary increments and decrements
>(if they aren't it's a bug). They should only expand to the
>atomic assembly (or the Win32 Interlocked) functions in 12X
>and 15X build types.
>
>Martin
>


Re: [PATCH] Use __rw_atomic_xxx() on Windows

2007-09-06 Thread Martin Sebor

Farid Zaripov wrote:

-Original Message-
From: Martin Sebor [mailto:[EMAIL PROTECTED] 
Sent: Thursday, September 06, 2007 5:49 AM

To: stdcxx-dev@incubator.apache.org
Subject: Re: [PATCH] Use __rw_atomic_xxx() on Windows

Travis Vitek wrote:
Oh, yeah. that is the other thing that I did Friday. I wrote a
testcase to compare __rw_atomic_add32() against
InterlockedIncrement() on Win32.
There is a performance penalty...

I'd be curious to know if the performance penalty is due to
the function call overhead or something else.


In any case though, I think we could tweak the patch and 
change the __rw_atomic_pre{de,in}crement() overloads for int 
and long to call the appropriate Interlocked{De,In}crement() 
intrinsics and have the other overloads use the new ones.


Farid, what do you think about this approach?


  I agree, and I have decided to make the changes above
without running tests to measure the performance penalty.


Okay, that sounds like a good approach to me. You can still
commit the int and long atomic functions, we just won't call
them.

Martin


RE: [PATCH] Use __rw_atomic_xxx() on Windows

2007-09-06 Thread Farid Zaripov
> -Original Message-
> From: Martin Sebor [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, September 06, 2007 5:49 AM
> To: stdcxx-dev@incubator.apache.org
> Subject: Re: [PATCH] Use __rw_atomic_xxx() on Windows
> 
> Travis Vitek wrote:
> > Oh, yeah. that is the other thing that I did Friday. I wrote a 
> > testcase to compare __rw_atomic_add32() against 
> InterlockedIncrement() on Win32.
> > There is a performance penalty...
> 
> I'd be curious to know if the performance penalty is due to 
> the function call overhead or something else.
> 
> In any case though, I think we could tweak the patch and 
> change the __rw_atomic_pre{de,in}crement() overloads for int 
> and long to call the appropriate Interlocked{De,In}crement() 
> intrinsics and have the other overloads use the new ones.
> 
> Farid, what do you think about this approach?

  I agree, and I have decided to make the changes above
without running tests to measure the performance penalty.

Farid.


Re: [PATCH] Use __rw_atomic_xxx() on Windows

2007-09-05 Thread Martin Sebor

Travis Vitek wrote:

Since we don't have a string perf test that I could find, I wrote up a
quick and dirty one that just made many copies of the same string
repeatedly to exercise the atomic increment/decrement. The results show
a 3% performance penalty when using the newer atomic functions. This
test was run with an 8d configuration, so the atomic functions were
compiled into the stdcxx dll. The test hardware is a Lenovo T60p [Intel
Core 2 T7600 2.33GHz CPU, 2GB RAM].


8d is not thread-safe so the atomic function templates should
be implemented in terms of ordinary increments and decrements
(if they aren't it's a bug). They should only expand to the
atomic assembly (or the Win32 Interlocked) functions in 12X
and 15X build types.
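
As a rough illustration of that rule (not the actual stdcxx headers), the sketch below selects between plain and atomic arithmetic at compile time. SKETCH_THREAD_SAFE is a hypothetical macro standing in for whatever the build system defines in thread-safe (12X/15X) configurations, and the sketch_* names are assumptions as well.

```cpp
#include <atomic>

// Hypothetical switch: 1 in thread-safe builds, 0 in builds such as 8d.
#ifndef SKETCH_THREAD_SAFE
#  define SKETCH_THREAD_SAFE 1
#endif

#if SKETCH_THREAD_SAFE

// Thread-safe builds: expand to real atomic read-modify-write operations.
typedef std::atomic<long> sketch_counter;

inline long sketch_preincrement(sketch_counter& c) { return c.fetch_add(1) + 1; }
inline long sketch_predecrement(sketch_counter& c) { return c.fetch_sub(1) - 1; }

#else

// Non-thread-safe builds: ordinary increments and decrements suffice.
typedef long sketch_counter;

inline long sketch_preincrement(sketch_counter& c) { return ++c; }
inline long sketch_predecrement(sketch_counter& c) { return --c; }

#endif
```

Callers use sketch_preincrement() and sketch_predecrement() the same way in both configurations, so a benchmark built without thread safety never exercises the atomic code path at all.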

Martin



  Old                      new [patched]
  -- 1 threads             -- 1 threads
  ms            714        ms            737
  ms/op      0.4256        ms/op      0.4393
  -- 2 threads             -- 2 threads
  ms           3911        ms           4024
  ms/op  0.00023311        ms/op  0.00023985
  -- 4 threads             -- 4 threads
  ms           7660        ms           7865
  ms/op  0.00045657        ms/op  0.00046879
  -- 8 threads             -- 8 threads
  ms          15192        ms          15585
  ms/op  0.00090551        ms/op  0.00092894

I wonder whether the cost would be reduced if we used inline assembly
for the __rw_atomic_* functions. We could also evaluate the intrinsic
pragma that is available on MSVC.

Travis


-Original Message-

I will do a quick run using the string performance test after lunch.
I'll report the results on that later. I've pasted the source for the
bulk of my test below. If someone wants the entire thing, let me know
and I'll provide everything.

Travis





Re: [PATCH] Use __rw_atomic_xxx() on Windows

2007-09-05 Thread Martin Sebor

Travis Vitek wrote:

Oh, yeah. that is the other thing that I did Friday. I wrote a testcase
to compare __rw_atomic_add32() against InterlockedIncrement() on Win32.
There is a performance penalty...


I'd be curious to know if the performance penalty is due to the
function call overhead or something else.

In any case though, I think we could tweak the patch and change
the __rw_atomic_pre{de,in}crement() overloads for int and long
to call the appropriate Interlocked{De,In}crement() intrinsics
and have the other overloads use the new ones.
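
A sketch of what that dispatch could look like, written in portable C++ rather than against the real _mutex.h: the int and long overloads take a fast path (here ++ on std::atomic stands in for the Interlocked intrinsics), while a template covers every other integer type through a generic helper standing in for the new out-of-line functions. All sketch_* names are hypothetical.

```cpp
#include <atomic>

// Records which implementation handled a call (for illustration only).
enum sketch_path { sketch_intrinsic_path, sketch_generic_path };

// Stand-in for the new out-of-line atomic add routine.
template <class T>
T sketch_generic_add(std::atomic<T>& x, T delta)
{
    return static_cast<T>(x.fetch_add(delta) + delta);
}

// Generic overload: used for every type without a dedicated overload.
template <class T>
sketch_path sketch_preincrement(std::atomic<T>& x, T& result)
{
    result = sketch_generic_add(x, T(1));
    return sketch_generic_path;
}

// int overload: stand-in for a call to the increment intrinsic.
inline sketch_path sketch_preincrement(std::atomic<int>& x, int& result)
{
    result = ++x;
    return sketch_intrinsic_path;
}

// long overload: same fast path as int.
inline sketch_path sketch_preincrement(std::atomic<long>& x, long& result)
{
    result = ++x;
    return sketch_intrinsic_path;
}
```

Overload resolution prefers the non-template functions for int and long, so only the remaining types reach the generic path.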

Farid, what do you think about this approach?

Martin



  C:\Temp>t 2 && t 4 && t 8
  --         locked inc  atomic_add  2 threads
  ms               4266        4469
  ms/op          0.3178      0.3330  -4.7586%
  thr ms          18117       18437
  thr ms/op  0.00013498  0.00013737  -1.7663%
  --         locked inc  atomic_add  4 threads
  ms               7969        8609
  ms/op          0.5937      0.6414  -8.0311%
  thr ms          36359       37019
  thr ms/op  0.00027090  0.00027581  -1.8152%
  --         locked inc  atomic_add  8 threads
  ms               5016        5484
  ms/op          0.3737      0.4086  -9.3301%
  thr ms          60846       66130
  thr ms/op  0.00045334  0.00049271  -8.6842%

  C:\Temp>t 2 && t 4 && t 8
  --         locked inc  atomic_add  2 threads
  ms               2781        2906
  ms/op          0.2072      0.2165  -4.4948%
  thr ms          14961       16093
  thr ms/op  0.00011147  0.00011990  -7.5663%
  --         locked inc  atomic_add  4 threads
  ms               2781        2891
  ms/op          0.2072      0.2154  -3.9554%
  thr ms          30867       31328
  thr ms/op  0.00022998  0.00023341  -1.4935%
  --         locked inc  atomic_add  8 threads
  ms               2782        2890
  ms/op          0.2073      0.2153  -3.8821%
  thr ms          64318       64341
  thr ms/op  0.00047921  0.00047938  -0.0358%

I will do a quick run using the string performance test after lunch.
I'll report the results on that later. I've pasted the source for the
bulk of my test below. If someone wants the entire thing, let me know
and I'll provide everything.

Travis


Martin Sebor wrote:

Subject: Re: [PATCH] Use __rw_atomic_xxx() on Windows

What's the status of this? We need to decide if we can put this
in 4.2 or defer it for 4.2.1. To put it in 4.2 we need to make
sure the new functions don't cause a performance regression in
basic_string. I.e., we need to see the before and after numbers.

Martin

Martin Sebor wrote:

One concern I have is performance. Does replacing the intrinsics with
out-of-line function calls whose semantics the compiler has no idea
about have any impact on the runtime efficiency of the generated code?
I would be especially interested in "real life" scenarios such as the
usage of the atomic operations in basic_string.

It would be good to see some before and after numbers. If you don't
have all the platforms to run the test, post your benchmark and Travis
can help you put them together.


#include 
#include 

#define WIN32_LEAN_AND_MEAN
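#include <windows.h>  // assumed name: needed for GetTickCount, InterlockedIncrement, HANDLE
#include <process.h>  // assumed name: needed for _beginthread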

#include "lib.h"

#define MIN_THREADS 2
#define MAX_THREADS 16

unsigned long locked_inc(long* val, long iters)
{
const unsigned long t0 = GetTickCount ();

long n;
for (n = 0; n < iters; ++n)
{
InterlockedIncrement(val);
}

const unsigned long t1 = GetTickCount ();

return (t1 - t0);
}

unsigned long atomic_add(long* val, long iters)
{
const unsigned long t0 = GetTickCount ();

long n;
for (n = 0; n < iters; ++n)
{
__rw_atomic_add32(val, 1);
}

const unsigned long t1 = GetTickCount ();

return (t1 - t0);
}

struct thread_param {

// atomic variable
long* variable;

// number of iterations
long iters;

// function to invoke
unsigned long (*fun)(long*, long);

// result of function
unsigned long result;

// thread handle used by main thread
HANDLE thread;
};

extern "C" {

void thread_func(void* p)
{
thread_param* param = (thread_param*)p;
param->result = (param->fun)(param->variable, param->iters);
}

} // extern "C"


unsigned long run_threads(int nthreads, unsigned long (*fun)(long*,
long), long iters)
{
thread_param params[MAX_THREADS];
long thread_var = 0;

int i;
for (i = 0; i < nthreads; ++i) {
params[i].variable = &thread_var;
params[i].result   = 0;
params[i].fun  = fun;
params[i].iters= iters;
}

int n;
for (n = 0; n < nthreads; ++n) {
params[n].thread = (HANDLE)_beginthread(thread_func, 0,
&params

RE: [PATCH] Use __rw_atomic_xxx() on Windows

2007-09-05 Thread Travis Vitek

Since we don't have a string perf test that I could find, I wrote up a
quick and dirty one that just made many copies of the same string
repeatedly to exercise the atomic increment/decrement. The results show
a 3% performance penalty when using the newer atomic functions. This
test was run with an 8d configuration, so the atomic functions were
compiled into the stdcxx dll. The test hardware is a Lenovo T60p [Intel
Core 2 T7600 2.33GHz CPU, 2GB RAM].
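
For readers without the test source, the idea can be sketched in portable C++ roughly as follows. This is not the test that produced the numbers below: it assumes a reference-counted string, models it with a hypothetical sketch_handle class whose copies only touch a shared atomic counter, and times the copy loop with std::chrono instead of GetTickCount().

```cpp
#include <atomic>
#include <chrono>

// Hypothetical stand-in for a reference-counted string: copying a
// handle increments the shared count, destroying it decrements it.
class sketch_handle {
public:
    explicit sketch_handle(std::atomic<long>& refs) : refs_(&refs) { ++*refs_; }
    sketch_handle(const sketch_handle& other) : refs_(other.refs_) { ++*refs_; }
    ~sketch_handle() { --*refs_; }

    sketch_handle& operator=(const sketch_handle&) = delete;

private:
    std::atomic<long>* refs_;
};

// Copies `original` `iters` times and returns the elapsed time in
// milliseconds. Each iteration costs one atomic increment and one
// atomic decrement, which is what the benchmark is meant to exercise.
inline double time_copies(const sketch_handle& original, long iters)
{
    const std::chrono::steady_clock::time_point t0 =
        std::chrono::steady_clock::now();

    for (long i = 0; i < iters; ++i) {
        sketch_handle copy(original);
    }

    const std::chrono::steady_clock::time_point t1 =
        std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Running time_copies() from several threads against the same handle would add contention on the counter, which is the multi-threaded case in the tables.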

  Old                      new [patched]
  -- 1 threads             -- 1 threads
  ms            714        ms            737
  ms/op      0.4256        ms/op      0.4393
  -- 2 threads             -- 2 threads
  ms           3911        ms           4024
  ms/op  0.00023311        ms/op  0.00023985
  -- 4 threads             -- 4 threads
  ms           7660        ms           7865
  ms/op  0.00045657        ms/op  0.00046879
  -- 8 threads             -- 8 threads
  ms          15192        ms          15585
  ms/op  0.00090551        ms/op  0.00092894

I wonder whether the cost would be reduced if we used inline assembly
for the __rw_atomic_* functions. We could also evaluate the intrinsic
pragma that is available on MSVC.

Travis

>-Original Message-
>
>I will do a quick run using the string performance test after lunch.
>I'll report the results on that later. I've pasted the source for the
>bulk of my test below. If someone wants the entire thing, let me know
>and I'll provide everything.
>
>Travis
>


RE: [PATCH] Use __rw_atomic_xxx() on Windows

2007-09-05 Thread Travis Vitek

Oh, yeah. that is the other thing that I did Friday. I wrote a testcase
to compare __rw_atomic_add32() against InterlockedIncrement() on Win32.
There is a performance penalty...

  C:\Temp>t 2 && t 4 && t 8
  --         locked inc  atomic_add  2 threads
  ms               4266        4469
  ms/op          0.3178      0.3330  -4.7586%
  thr ms          18117       18437
  thr ms/op  0.00013498  0.00013737  -1.7663%
  --         locked inc  atomic_add  4 threads
  ms               7969        8609
  ms/op          0.5937      0.6414  -8.0311%
  thr ms          36359       37019
  thr ms/op  0.00027090  0.00027581  -1.8152%
  --         locked inc  atomic_add  8 threads
  ms               5016        5484
  ms/op          0.3737      0.4086  -9.3301%
  thr ms          60846       66130
  thr ms/op  0.00045334  0.00049271  -8.6842%

  C:\Temp>t 2 && t 4 && t 8
  --         locked inc  atomic_add  2 threads
  ms               2781        2906
  ms/op          0.2072      0.2165  -4.4948%
  thr ms          14961       16093
  thr ms/op  0.00011147  0.00011990  -7.5663%
  --         locked inc  atomic_add  4 threads
  ms               2781        2891
  ms/op          0.2072      0.2154  -3.9554%
  thr ms          30867       31328
  thr ms/op  0.00022998  0.00023341  -1.4935%
  --         locked inc  atomic_add  8 threads
  ms               2782        2890
  ms/op          0.2073      0.2153  -3.8821%
  thr ms          64318       64341
  thr ms/op  0.00047921  0.00047938  -0.0358%

I will do a quick run using the string performance test after lunch.
I'll report the results on that later. I've pasted the source for the
bulk of my test below. If someone wants the entire thing, let me know
and I'll provide everything.

Travis


Martin Sebor wrote:
>Subject: Re: [PATCH] Use __rw_atomic_xxx() on Windows
>
>What's the status of this? We need to decide if we can put this
>in 4.2 or defer it for 4.2.1. To put it in 4.2 we need to make
>sure the new functions don't cause a performance regression in
>basic_string. I.e., we need to see the before and after numbers.
>
>Martin
>
>Martin Sebor wrote:
>>
>> One concern I have is performance. Does replacing the intrinsics with
>> out-of-line function calls whose semantics the compiler has no idea
>> about have any impact on the runtime efficiency of the generated code?
>> I would be especially interested in "real life" scenarios such as the
>> usage of the atomic operations in basic_string.
>> 
>> It would be good to see some before and after numbers. If you don't
>> have all the platforms to run the test, post your benchmark and Travis
>> can help you put them together.
>

#include 
#include 

#define WIN32_LEAN_AND_MEAN
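#include <windows.h>  // assumed name: needed for GetTickCount, InterlockedIncrement, HANDLE
#include <process.h>  // assumed name: needed for _beginthread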

#include "lib.h"

#define MIN_THREADS 2
#define MAX_THREADS 16

unsigned long locked_inc(long* val, long iters)
{
    const unsigned long t0 = GetTickCount ();

    long n;
    for (n = 0; n < iters; ++n)
    {
        InterlockedIncrement(val);
    }

    const unsigned long t1 = GetTickCount ();

    return (t1 - t0);
}

unsigned long atomic_add(long* val, long iters)
{
    const unsigned long t0 = GetTickCount ();

    long n;
    for (n = 0; n < iters; ++n)
    {
        __rw_atomic_add32(val, 1);
    }

    const unsigned long t1 = GetTickCount ();

    return (t1 - t0);
}

struct thread_param {

    // atomic variable
    long* variable;

    // number of iterations
    long iters;

    // function to invoke
    unsigned long (*fun)(long*, long);

    // result of function
    unsigned long result;

    // thread handle used by main thread
    HANDLE thread;
};

extern "C" {

void thread_func(void* p)
{
    thread_param* param = (thread_param*)p;
    param->result = (param->fun)(param->variable, param->iters);
}

} // extern "C"


unsigned long run_threads(int nthreads,
                          unsigned long (*fun)(long*, long), long iters)
{
    thread_param params[MAX_THREADS];
    long thread_var = 0;

    int i;
    for (i = 0; i < nthreads; ++i) {
        params[i].variable = &thread_var;
        params[i].result   = 0;
        params[i].fun      = fun;
        params[i].iters    = iters;
    }

    int n;
    for (n = 0; n < nthreads; ++n) {
        params[n].thread = (HANDLE)_beginthread(thread_func, 0,
                                                &params[n]);
    }

    unsigned long thread_time = 0;

    for (n = 0; n < nthreads; ++n) {
        WaitForSingleObject (params[n].thread, INFINITE);
        thread_time += params[n].result;
    }

    return thread_time;
}


int main(int argc, char* argv[])
{
int nthreads = MIN_THREADS;
if (1 < a

Re: [PATCH] Use __rw_atomic_xxx() on Windows

2007-09-04 Thread Martin Sebor

What's the status of this? We need to decide if we can put this
in 4.2 or defer it for 4.2.1. To put it in 4.2 we need to make
sure the new functions don't cause a performance regression in
basic_string. I.e., we need to see the before and after numbers.

Martin

Martin Sebor wrote:

One concern I have is performance. Does replacing the intrinsics with
out-of-line function calls whose semantics the compiler has no idea
about have any impact on the runtime efficiency of the generated code?
I would be especially interested in "real life" scenarios such as the
usage of the atomic operations in basic_string.

It would be good to see some before and after numbers. If you don't
have all the platforms to run the test, post your benchmark and Travis
can help you put them together.

FYI, in case you're not aware of this (it's not immediately obvious),
even though we define the full set of atomic operations (i.e., for all
integer types) the library only uses the overloads for int and long.

Martin

Farid Zaripov wrote:
 Attached is a patch adding __rw_atomic_{add|xchg}xx() functions on
 Win32/Win64 platforms.

 ChangeLog:
 * msvc-7.0.config: Added AS config variable.
 * msvc-8.0-x64.config: Ditto.
 * filterdef.js: Added definition of the CustomFileDef class.
 * projectdef.js (InitAsmTool): New function to init custom build rule
 for .asm files.
 * projects.js: Added definitions of the platform dependent files.
 * utilities.js: Read AS configuration variable from the .config file.
 * i86/atomic.asm: New file with definitions of the __rw_atomic_xxx()
 for Win32 platform.
 * i86_64/atomic.asm: New file with definitions of the
 __rw_atomic_xxx() for Windows/x64 platform.
 * _mutex.h: Removed all dependencies on InterlockedXXX() API functions.
 Use new __rw_atomic_xxx() functions instead of InterlockedXXX().
 * once.cpp [_WIN32 && _DLL]: Tell linker to export __atomic_xxx()
 functions, defined in .asm files.
Farid.





Re: [PATCH] Use __rw_atomic_xxx() on Windows

2007-08-29 Thread Martin Sebor

One concern I have is performance. Does replacing the intrinsics with
out-of-line function calls whose semantics the compiler has no idea
about have any impact on the runtime efficiency of the generated code?
I would be especially interested in "real life" scenarios such as the
usage of the atomic operations in basic_string.

It would be good to see some before and after numbers. If you don't
have all the platforms to run the test, post your benchmark and Travis
can help you put them together.

FYI, in case you're not aware of this (it's not immediately obvious),
even though we define the full set of atomic operations (i.e., for all
integer types) the library only uses the overloads for int and long.

Martin

Farid Zaripov wrote:
 Attached is a patch adding __rw_atomic_{add|xchg}xx() functions on
 Win32/Win64 platforms.

 ChangeLog:
 * msvc-7.0.config: Added AS config variable.
 * msvc-8.0-x64.config: Ditto.
 * filterdef.js: Added definition of the CustomFileDef class.
 * projectdef.js (InitAsmTool): New function to init custom build rule
 for .asm files.
 * projects.js: Added definitions of the platform dependent files.
 * utilities.js: Read AS configuration variable from the .config file.
 * i86/atomic.asm: New file with definitions of the __rw_atomic_xxx()
 for Win32 platform.
 * i86_64/atomic.asm: New file with definitions of the
 __rw_atomic_xxx() for Windows/x64 platform.
 * _mutex.h: Removed all dependencies on InterlockedXXX() API functions.
 Use new __rw_atomic_xxx() functions instead of InterlockedXXX().
 * once.cpp [_WIN32 && _DLL]: Tell linker to export __atomic_xxx()
 functions, defined in .asm files.

Farid.





[PATCH] Use __rw_atomic_xxx() on Windows

2007-08-28 Thread Farid Zaripov
 Attached is a patch adding __rw_atomic_{add|xchg}xx() functions on
 Win32/Win64 platforms.

 ChangeLog:
 * msvc-7.0.config: Added AS config variable.
 * msvc-8.0-x64.config: Ditto.
 * filterdef.js: Added definition of the CustomFileDef class.
 * projectdef.js (InitAsmTool): New function to init custom build rule
 for .asm files.
 * projects.js: Added definitions of the platform dependent files.
 * utilities.js: Read AS configuration variable from the .config file.
 * i86/atomic.asm: New file with definitions of the __rw_atomic_xxx()
 for Win32 platform.
 * i86_64/atomic.asm: New file with definitions of the
 __rw_atomic_xxx() for Windows/x64 platform.
 * _mutex.h: Removed all dependencies on InterlockedXXX() API functions.
 Use new __rw_atomic_xxx() functions instead of InterlockedXXX().
 * once.cpp [_WIN32 && _DLL]: Tell linker to export __atomic_xxx()
 functions, defined in .asm files.

Farid.

Index: etc/config/windows/filterdef.js
===
--- etc/config/windows/filterdef.js (revision 570339)
+++ etc/config/windows/filterdef.js (working copy)
@@ -25,7 +25,7 @@
 
 var sourceFilterName = "Source Files";
 var sourceFilterUuid = "{4FC737F1-C7A5-4376-A066-2A32D752A2FF}";
-var sourceFilterExts = ".cpp;.cxx;.s";
+var sourceFilterExts = ".cpp;.cxx;.s;.asm";
 
 var headerFilterName = "Header Files";
 var headerFilterUuid = "{93995380-89BD-4b04-88EB-625FBE52EBFB}";
@@ -56,6 +56,21 @@
 return str;
 }
 
+//
+// CustomFileDef class
+//
+
+// CustomFileDef .ctor
+function CustomFileDef(filepath, platform, initfun)
+{
+this.filepath = filepath;
+this.platform = platform;
+this.initfun  = initfun;
+}
+
+// global array with platform dependent files definitions
+var customFileDefs = new Array();
+
 // common macros
 var cmnMacros = new Array();
 
@@ -126,7 +141,29 @@
 var VCFile = filter.AddFile(filename);
 if (null != filetype && typeof(VCFile.FileType) != "undefined")
 VCFile.FileType = filetype;
-
+
+var customFileDef = null;
+
+if (!exclude)
+{
+// find the platform dependent file definition
+for (var i = 0; i < customFileDefs.length; ++i)
+{
+var custFileDef = customFileDefs[i];
+var pos = VCFile.FullPath.length - custFileDef.filepath.length;
+if (0 <= pos && pos == 
VCFile.FullPath.indexOf(custFileDef.filepath))
+{
+customFileDef = custFileDef;
+break;
+}
+}
+
+// exclude this file from build if current platform
+// is not custom file target platform
+if (null != customFileDef && customFileDef.platform != PLATFORM)
+exclude = true;
+}
+
 if (exclude)
 {
 var cfgs = VCFile.FileConfigurations;
@@ -144,6 +181,12 @@
 cfg.ExcludedFromBuild = exclude;
 }
 }
+else if (null != customFileDef &&
+ "undefined" != typeof(customFileDef.initfun))
+{
+// init
+customFileDef.initfun(VCFile);
+}
 }
 
 // create VCFilter object from the FilterDef definition
Index: etc/config/windows/msvc-7.0.config
===
--- etc/config/windows/msvc-7.0.config  (revision 570339)
+++ etc/config/windows/msvc-7.0.config  (working copy)
@@ -38,6 +38,7 @@
 CXX=cl
 LD=cl
 AR=lib
+AS=ml
 
 // Use singlethreaded or mutlithreaded CRT in 11s, 11d solution configurations
 // 0 for MS VisualStudio .NET and MS VisualStudio .NET 2003
Index: etc/config/windows/msvc-8.0-x64.config
===
--- etc/config/windows/msvc-8.0-x64.config  (revision 570339)
+++ etc/config/windows/msvc-8.0-x64.config  (working copy)
@@ -1,2 +1,3 @@
 #include msvc-8.0
 PLATFORM=x64
+AS=ml64
Index: etc/config/windows/projectdef.js
===
--- etc/config/windows/projectdef.js(revision 570339)
+++ etc/config/windows/projectdef.js(working copy)
@@ -941,3 +941,25 @@
 
 return projectDef;
 }
+
+// init custom build rule for .asm files
+function InitAsmTool(VCFile)
+{
+var cfgs = VCFile.FileConfigurations;
+for (var i = 1; i <= cfgs.Count; ++i)
+{
+var cfg = cfgs.Item(i);
+if ((typeof(cfg.Tool.ToolKind) != "undefined" &&
+cfg.Tool.ToolKind != "VCCustomBuildTool") ||
+cfg.Tool.ToolName != "Custom Build Tool")
+{
+cfg.Tool = 
cfg.ProjectConfiguration.FileTools.Item("VCCustomBuildTool");
+}
+
+var tool = cfg.Tool;
+tool.Description = "Compiling .asm file...";
+tool.Outputs = "$(IntDir)\\$(InputName).obj";
+tool.CommandLine = AS + " /c /nologo /Fo" + tool.Outputs +
+   " /W3 /Zi /Ta" + VCFile.RelativePath;
+