Re: [PATCH] Use __rw_atomic_xxx() on Windows
Travis Vitek wrote:
> Doh! I should know better. Here are the results from a 12d build on
> the same hardware.

Does this mean that there is almost no difference between the intrinsic
functions and the out-of-line ones, or that the test is too simple to
demonstrate the difference?

I expect the greatest advantage of the intrinsics over ordinary
out-of-line functions to be that they (might) make it possible for the
optimizer to generate better code *in certain contexts* depending on
where they are called from. This is going to be hard to demonstrate
in a simple test case. I suspect we would need a more realistic test
with a number of different uses of string (and the atomic functions)
to get some idea of how much they might help.

Martin

>                  normal        patched
> -- 1 threads --
> ms               934           1015
> ms/op            0.5567        0.6050
> -- 2 threads --
> ms               6049          6266
> ms/op            0.00036055    0.00037348
> -- 4 threads --
> ms               11948         11813
> ms/op            0.00071216    0.00070411
> -- 8 threads --
> ms               23855         24743
> ms/op            0.00142187    0.00147480
>
> Martin Sebor wrote:
>> 8d is not thread-safe so the atomic function templates should
>> be implemented in terms of ordinary increments and decrements
>> (if they aren't it's a bug). They should only expand to the
>> atomic assembly (or the Win32 Interlocked) functions in 12X
>> and 15X build types.
>>
>> Martin
RE: [PATCH] Use __rw_atomic_xxx() on Windows
Doh! I should know better. Here are the results from a 12d build on the
same hardware.

                 normal        patched
-- 1 threads --
ms               934           1015
ms/op            0.5567        0.6050
-- 2 threads --
ms               6049          6266
ms/op            0.00036055    0.00037348
-- 4 threads --
ms               11948         11813
ms/op            0.00071216    0.00070411
-- 8 threads --
ms               23855         24743
ms/op            0.00142187    0.00147480

Martin Sebor wrote:
>
>8d is not thread-safe so the atomic function templates should
>be implemented in terms of ordinary increments and decrements
>(if they aren't it's a bug). They should only expand to the
>atomic assembly (or the Win32 Interlocked) functions in 12X
>and 15X build types.
>
>Martin
>
Re: [PATCH] Use __rw_atomic_xxx() on Windows
Farid Zaripov wrote:
>> -----Original Message-----
>> From: Martin Sebor [mailto:[EMAIL PROTECTED]
>> Sent: Thursday, September 06, 2007 5:49 AM
>> To: stdcxx-dev@incubator.apache.org
>> Subject: Re: [PATCH] Use __rw_atomic_xxx() on Windows
>>
>> Travis Vitek wrote:
>>> Oh, yeah. That is the other thing that I did Friday. I wrote a
>>> testcase to compare __rw_atomic_add32() against
>>> InterlockedIncrement() on Win32.
>>> There is a performance penalty...
>>
>> I'd be curious to know if the performance penalty is due to
>> the function call overhead or something else.
>>
>> In any case though, I think we could tweak the patch and
>> change the __rw_atomic_pre{de,in}crement() overloads for int
>> and long to call the appropriate Interlocked{De,In}crement()
>> intrinsics and have the other overloads use the new ones.
>>
>> Farid, what do you think about this approach?
>
> I agree. And I decided to make the changes above without running the
> tests to measure the performance penalty.

Okay, that sounds like a good approach to me. You can still commit the
int and long atomic functions; we just won't call them.

Martin
RE: [PATCH] Use __rw_atomic_xxx() on Windows
> -----Original Message-----
> From: Martin Sebor [mailto:[EMAIL PROTECTED]
> Sent: Thursday, September 06, 2007 5:49 AM
> To: stdcxx-dev@incubator.apache.org
> Subject: Re: [PATCH] Use __rw_atomic_xxx() on Windows
>
> Travis Vitek wrote:
> > Oh, yeah. That is the other thing that I did Friday. I wrote a
> > testcase to compare __rw_atomic_add32() against
> > InterlockedIncrement() on Win32.
> > There is a performance penalty...
>
> I'd be curious to know if the performance penalty is due to
> the function call overhead or something else.
>
> In any case though, I think we could tweak the patch and
> change the __rw_atomic_pre{de,in}crement() overloads for int
> and long to call the appropriate Interlocked{De,In}crement()
> intrinsics and have the other overloads use the new ones.
>
> Farid, what do you think about this approach?

I agree. And I decided to make the changes above without running the
tests to measure the performance penalty.

Farid.
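The split Martin proposes above (int and long go to the intrinsics, every other integer type goes to the new routines) can be sketched with ordinary overloading. This is only an illustration, not the code from the patch: the `sketch` namespace, `last_path()`, and `out_of_line_add()` are hypothetical names, and `std::atomic` stands in for both the Win32 `Interlocked*` intrinsics and the `.asm` routines so that the example is portable.

```cpp
#include <atomic>
#include <cassert>

namespace sketch {

// Records which overload ran. It exists only so the dispatch is
// observable in this example and is not itself thread-safe.
enum class path { intrinsic, out_of_line };
inline path& last_path() { static path p = path::out_of_line; return p; }

// Hypothetical stand-in for the new out-of-line assembly routines.
template <class T>
T out_of_line_add(std::atomic<T>& x, T n) {
    last_path() = path::out_of_line;
    return static_cast<T>(x.fetch_add(n) + n);
}

// int and long: stand-ins for the overloads that would call the
// Interlocked{In,De}crement() intrinsics on Windows.
inline int preincrement(std::atomic<int>& x) {
    last_path() = path::intrinsic;
    return ++x;
}

inline long preincrement(std::atomic<long>& x) {
    last_path() = path::intrinsic;
    return ++x;
}

// Every other integer type falls through to the out-of-line routine.
// The non-template overloads above win overload resolution for int
// and long because they are an equally good match and not templates.
template <class T>
T preincrement(std::atomic<T>& x) {
    return out_of_line_add(x, static_cast<T>(1));
}

}  // namespace sketch
```

Calling `sketch::preincrement()` on a `std::atomic<int>` or `std::atomic<long>` takes the "intrinsic" path, while a `std::atomic<short>` takes the out-of-line one, which mirrors the point that the library itself only uses the int and long overloads.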
Re: [PATCH] Use __rw_atomic_xxx() on Windows
Travis Vitek wrote:
> Since we don't have a string perf test that I could find, I wrote up a
> quick and dirty one that just made many copies of the same string
> repeatedly to exercise the atomic increment/decrement. The results show
> a 3% performance penalty when using the newer atomic functions. This
> test was run with an 8d configuration, so the atomic functions were
> compiled into the stdcxx dll. The test hardware is a Lenovo T60p [Intel
> Core 2 T7600 2.33GHz CPU, 2GB RAM].

8d is not thread-safe so the atomic function templates should
be implemented in terms of ordinary increments and decrements
(if they aren't it's a bug). They should only expand to the
atomic assembly (or the Win32 Interlocked) functions in 12X
and 15X build types.

Martin

>                  Old           new [patched]
> -- 1 threads --
> ms               714           737
> ms/op            0.4256        0.4393
> -- 2 threads --
> ms               3911          4024
> ms/op            0.00023311    0.00023985
> -- 4 threads --
> ms               7660          7865
> ms/op            0.00045657    0.00046879
> -- 8 threads --
> ms               15192         15585
> ms/op            0.00090551    0.00092894
>
> I'm wondering if we used inline assembly for the __rw_atomic_*
> functions if the cost would be reduced. We could also evaluate the
> intrinsic pragma that is available on MSVC.
>
> Travis
>
>> -----Original Message-----
>>
>> I will do a quick run using the string performance test after lunch.
>> I'll report the results on that later. I've pasted the source for the
>> bulk of my test below. If someone wants the entire thing, let me know
>> and I'll provide everything.
>>
>> Travis
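Martin's point above, that a non-thread-safe build should compile the atomic templates down to plain increments and decrements, might look roughly like the sketch below. It is an assumption-laden illustration, not the library's code: `SKETCH_REENTRANT` and `ref_count` are hypothetical names, and `std::atomic` stands in for the atomic assembly or Win32 Interlocked functions used in the thread-safe build types.

```cpp
#include <atomic>
#include <cassert>

// SKETCH_REENTRANT is a hypothetical macro standing in for whatever the
// library's configuration defines in its thread-safe build types.
template <class T>
class ref_count {
public:
    explicit ref_count(T v = T()) : value_(v) {}

    T preincrement() {
#ifdef SKETCH_REENTRANT
        // thread-safe build: atomic read-modify-write
        return static_cast<T>(value_.fetch_add(1) + 1);
#else
        // non-thread-safe build: ordinary increment, no atomics involved
        return ++value_;
#endif
    }

    T predecrement() {
#ifdef SKETCH_REENTRANT
        return static_cast<T>(value_.fetch_sub(1) - 1);
#else
        return --value_;
#endif
    }

    // Reports which flavour was compiled in.
    static bool is_atomic() {
#ifdef SKETCH_REENTRANT
        return true;
#else
        return false;
#endif
    }

private:
#ifdef SKETCH_REENTRANT
    std::atomic<T> value_;
#else
    T value_;
#endif
};
```

Built without `SKETCH_REENTRANT`, a benchmark of this class would measure plain increments only, which is why numbers from a non-thread-safe configuration say little about the cost of the atomic routines.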
Re: [PATCH] Use __rw_atomic_xxx() on Windows
Travis Vitek wrote:
> Oh, yeah. That is the other thing that I did Friday. I wrote a testcase
> to compare __rw_atomic_add32() against InterlockedIncrement() on Win32.
> There is a performance penalty...

I'd be curious to know if the performance penalty is due to
the function call overhead or something else.

In any case though, I think we could tweak the patch and
change the __rw_atomic_pre{de,in}crement() overloads for int
and long to call the appropriate Interlocked{De,In}crement()
intrinsics and have the other overloads use the new ones.

Farid, what do you think about this approach?

Martin

> C:\Temp>t 2 && t 4 && t 8
> --               locked inc    atomic_add
> 2 threads
> ms               4266          4469
> ms/op            0.3178        0.3330        -4.7586%
> thr ms           18117         18437
> thr ms/op        0.00013498    0.00013737    -1.7663%
> --               locked inc    atomic_add
> 4 threads
> ms               7969          8609
> ms/op            0.5937        0.6414        -8.0311%
> thr ms           36359         37019
> thr ms/op        0.00027090    0.00027581    -1.8152%
> --               locked inc    atomic_add
> 8 threads
> ms               5016          5484
> ms/op            0.3737        0.4086        -9.3301%
> thr ms           60846         66130
> thr ms/op        0.00045334    0.00049271    -8.6842%
>
> C:\Temp>t 2 && t 4 && t 8
> --               locked inc    atomic_add
> 2 threads
> ms               2781          2906
> ms/op            0.2072        0.2165        -4.4948%
> thr ms           14961         16093
> thr ms/op        0.00011147    0.00011990    -7.5663%
> --               locked inc    atomic_add
> 4 threads
> ms               2781          2891
> ms/op            0.2072        0.2154        -3.9554%
> thr ms           30867         31328
> thr ms/op        0.00022998    0.00023341    -1.4935%
> --               locked inc    atomic_add
> 8 threads
> ms               2782          2890
> ms/op            0.2073        0.2153        -3.8821%
> thr ms           64318         64341
> thr ms/op        0.00047921    0.00047938    -0.0358%
>
> I will do a quick run using the string performance test after lunch.
> I'll report the results on that later. I've pasted the source for the
> bulk of my test below. If someone wants the entire thing, let me know
> and I'll provide everything.
>
> Travis
>
> Martin Sebor wrote:
>> Subject: Re: [PATCH] Use __rw_atomic_xxx() on Windows
>>
>> What's the status of this? We need to decide if we can put this
>> in 4.2 or defer it for 4.2.1. To put it in 4.2 we need to make
>> sure the new functions don't cause a performance regression in
>> basic_string.
RE: [PATCH] Use __rw_atomic_xxx() on Windows
Since we don't have a string perf test that I could find, I wrote up a
quick and dirty one that just made many copies of the same string
repeatedly to exercise the atomic increment/decrement. The results show
a 3% performance penalty when using the newer atomic functions. This
test was run with an 8d configuration, so the atomic functions were
compiled into the stdcxx dll. The test hardware is a Lenovo T60p [Intel
Core 2 T7600 2.33GHz CPU, 2GB RAM].

                 Old           new [patched]
-- 1 threads --
ms               714           737
ms/op            0.4256        0.4393
-- 2 threads --
ms               3911          4024
ms/op            0.00023311    0.00023985
-- 4 threads --
ms               7660          7865
ms/op            0.00045657    0.00046879
-- 8 threads --
ms               15192         15585
ms/op            0.00090551    0.00092894

I'm wondering if we used inline assembly for the __rw_atomic_*
functions if the cost would be reduced. We could also evaluate the
intrinsic pragma that is available on MSVC.

Travis

>-----Original Message-----
>
>I will do a quick run using the string performance test after lunch.
>I'll report the results on that later. I've pasted the source for the
>bulk of my test below. If someone wants the entire thing, let me know
>and I'll provide everything.
>
>Travis
>
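The "3%" figure quoted in the message above can be reproduced from the elapsed times in its table. A minimal sketch follows; the function name `slowdown_percent` is made up for this example and is not part of Travis's test.

```cpp
#include <cassert>
#include <cmath>

// Relative slowdown of `patched` versus `baseline`, in percent.
// A positive result means the patched build took longer.
inline double slowdown_percent(double baseline, double patched) {
    return (patched - baseline) / baseline * 100.0;
}
```

With the single-thread totals (714 ms old, 737 ms patched) this gives about 3.2%, and with the 8-thread totals (15192 ms and 15585 ms) about 2.6%, so "3%" is a fair summary of the table.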
RE: [PATCH] Use __rw_atomic_xxx() on Windows
Oh, yeah. That is the other thing that I did Friday. I wrote a testcase
to compare __rw_atomic_add32() against InterlockedIncrement() on Win32.
There is a performance penalty...

C:\Temp>t 2 && t 4 && t 8
--               locked inc    atomic_add
2 threads
ms               4266          4469
ms/op            0.3178        0.3330        -4.7586%
thr ms           18117         18437
thr ms/op        0.00013498    0.00013737    -1.7663%
--               locked inc    atomic_add
4 threads
ms               7969          8609
ms/op            0.5937        0.6414        -8.0311%
thr ms           36359         37019
thr ms/op        0.00027090    0.00027581    -1.8152%
--               locked inc    atomic_add
8 threads
ms               5016          5484
ms/op            0.3737        0.4086        -9.3301%
thr ms           60846         66130
thr ms/op        0.00045334    0.00049271    -8.6842%

C:\Temp>t 2 && t 4 && t 8
--               locked inc    atomic_add
2 threads
ms               2781          2906
ms/op            0.2072        0.2165        -4.4948%
thr ms           14961         16093
thr ms/op        0.00011147    0.00011990    -7.5663%
--               locked inc    atomic_add
4 threads
ms               2781          2891
ms/op            0.2072        0.2154        -3.9554%
thr ms           30867         31328
thr ms/op        0.00022998    0.00023341    -1.4935%
--               locked inc    atomic_add
8 threads
ms               2782          2890
ms/op            0.2073        0.2153        -3.8821%
thr ms           64318         64341
thr ms/op        0.00047921    0.00047938    -0.0358%

I will do a quick run using the string performance test after lunch.
I'll report the results on that later. I've pasted the source for the
bulk of my test below. If someone wants the entire thing, let me know
and I'll provide everything.

Travis

Martin Sebor wrote:
>Subject: Re: [PATCH] Use __rw_atomic_xxx() on Windows
>
>What's the status of this? We need to decide if we can put this
>in 4.2 or defer it for 4.2.1. To put it in 4.2 we need to make
>sure the new functions don't cause a performance regression in
>basic_string. I.e., we need to see the before and after numbers.
>
>Martin
>
>Martin Sebor wrote:
>>
>> One concern I have is performance. Does replacing the intrinsics with
>> out of line function call whose semantics the compiler has no idea
>> about have any impact on the runtime efficiency of the generated code?
>> I would be especially interested in "real life" scenarios such as the
>> usage of the atomic operations in basic_string.
>>
>> It would be good to see some before and after numbers. If you don't
>> have all the platforms to run the test post your benchmark and Travis
>> can help you put them together.
>

#include
#include
#define WIN32_LEAN_AND_MEAN
#include
#include
#include "lib.h"

#define MIN_THREADS 2
#define MAX_THREADS 16

unsigned long locked_inc(long* val, long iters)
{
    const unsigned long t0 = GetTickCount ();

    long n;
    for (n = 0; n < iters; ++n) {
        InterlockedIncrement(val);
    }

    const unsigned long t1 = GetTickCount ();
    return (t1 - t0);
}

unsigned long atomic_add(long* val, long iters)
{
    const unsigned long t0 = GetTickCount ();

    long n;
    for (n = 0; n < iters; ++n) {
        __rw_atomic_add32(val, 1);
    }

    const unsigned long t1 = GetTickCount ();
    return (t1 - t0);
}

struct thread_param
{
    // atomic variable
    long* variable;

    // number of iterations
    long iters;

    // function to invoke
    unsigned long (*fun)(long*, long);

    // result of function
    unsigned long result;

    // thread handle used by main thread
    HANDLE thread;
};

extern "C" {

void thread_func(void* p)
{
    thread_param* param = (thread_param*)p;
    param->result = (param->fun)(param->variable, param->iters);
}

} // extern "C"

unsigned long run_threads(int nthreads,
                          unsigned long (*fun)(long*, long),
                          long iters)
{
    thread_param params[MAX_THREADS];

    long thread_var = 0;

    int i;
    for (i = 0; i < nthreads; ++i) {
        params[i].variable = &thread_var;
        params[i].result   = 0;
        params[i].fun      = fun;
        params[i].iters    = iters;
    }

    int n;
    for (n = 0; n < nthreads; ++n) {
        params[n].thread =
            (HANDLE)_beginthread(thread_func, 0, &params[n]);
    }

    unsigned long thread_time = 0;
    for (n = 0; n < nthreads; ++n) {
        WaitForSingleObject (params[n].thread, INFINITE);
        thread_time += params[n].result;
    }

    return thread_time;
}

int main(int argc, char* argv[])
{
    int nthreads = MIN_THREADS;
    if (1 < a
Re: [PATCH] Use __rw_atomic_xxx() on Windows
What's the status of this? We need to decide if we can put this
in 4.2 or defer it for 4.2.1. To put it in 4.2 we need to make
sure the new functions don't cause a performance regression in
basic_string. I.e., we need to see the before and after numbers.

Martin

Martin Sebor wrote:
> One concern I have is performance. Does replacing the intrinsics with
> out of line function call whose semantics the compiler has no idea
> about have any impact on the runtime efficiency of the generated code?
> I would be especially interested in "real life" scenarios such as the
> usage of the atomic operations in basic_string.
>
> It would be good to see some before and after numbers. If you don't
> have all the platforms to run the test post your benchmark and Travis
> can help you put them together.
>
> FYI, in case you're not aware of this (it's not immediately obvious),
> even though we define the full set of atomic operations (i.e., for
> all integer types) the library only uses the overloads for int and
> long.
>
> Martin
>
> Farid Zaripov wrote:
>> Attached is a patch, adding __rw_atomic_{add|xchg}xx() functions on
>> Win32/Win64 platforms.
>>
>> ChangeLog:
>> * msvc-7.0.config: Added AS config variable.
>> * msvc-8.0-x64.config: Ditto.
>> * filterdef.js: Added definition of the CustomFileDef class
>> * projectdef.js (InitAsmTool): New function to init custom build
>>   rule for .asm files.
>> * projects.js: Added definitions of the platform dependent files.
>> * utilities.js: Read AS configuration variable from the .config file.
>> * i86/atomic.asm: New file with definitions of the
>>   __rw_atomic_xxx() for Win32 platform.
>> * i86_64/atomic.asm: New file with definitions of the
>>   __rw_atomic_xxx() for Windows/x64 platform.
>> * _mutex.h: Removed all dependencies on InterlockedXXX() API
>>   functions. Use new __rw_atomic_xxx() functions instead of
>>   InterlockedXXX().
>> * once.cpp [_WIN32 && _DLL]: Tell linker to export __atomic_xxx()
>>   functions, defined in .asm files.
>>
>> Farid.
Re: [PATCH] Use __rw_atomic_xxx() on Windows
One concern I have is performance. Does replacing the intrinsics with
out of line function call whose semantics the compiler has no idea
about have any impact on the runtime efficiency of the generated code?
I would be especially interested in "real life" scenarios such as the
usage of the atomic operations in basic_string.

It would be good to see some before and after numbers. If you don't
have all the platforms to run the test post your benchmark and Travis
can help you put them together.

FYI, in case you're not aware of this (it's not immediately obvious),
even though we define the full set of atomic operations (i.e., for
all integer types) the library only uses the overloads for int and
long.

Martin

Farid Zaripov wrote:
> Attached is a patch, adding __rw_atomic_{add|xchg}xx() functions on
> Win32/Win64 platforms.
>
> ChangeLog:
> * msvc-7.0.config: Added AS config variable.
> * msvc-8.0-x64.config: Ditto.
> * filterdef.js: Added definition of the CustomFileDef class
> * projectdef.js (InitAsmTool): New function to init custom build
>   rule for .asm files.
> * projects.js: Added definitions of the platform dependent files.
> * utilities.js: Read AS configuration variable from the .config file.
> * i86/atomic.asm: New file with definitions of the
>   __rw_atomic_xxx() for Win32 platform.
> * i86_64/atomic.asm: New file with definitions of the
>   __rw_atomic_xxx() for Windows/x64 platform.
> * _mutex.h: Removed all dependencies on InterlockedXXX() API
>   functions. Use new __rw_atomic_xxx() functions instead of
>   InterlockedXXX().
> * once.cpp [_WIN32 && _DLL]: Tell linker to export __atomic_xxx()
>   functions, defined in .asm files.
>
> Farid.
[PATCH] Use __rw_atomic_xxx() on Windows
Attached is a patch, adding __rw_atomic_{add|xchg}xx() functions on
Win32/Win64 platforms.

ChangeLog:
* msvc-7.0.config: Added AS config variable.
* msvc-8.0-x64.config: Ditto.
* filterdef.js: Added definition of the CustomFileDef class
* projectdef.js (InitAsmTool): New function to init custom build
  rule for .asm files.
* projects.js: Added definitions of the platform dependent files.
* utilities.js: Read AS configuration variable from the .config file.
* i86/atomic.asm: New file with definitions of the
  __rw_atomic_xxx() for Win32 platform.
* i86_64/atomic.asm: New file with definitions of the
  __rw_atomic_xxx() for Windows/x64 platform.
* _mutex.h: Removed all dependencies on InterlockedXXX() API
  functions. Use new __rw_atomic_xxx() functions instead of
  InterlockedXXX().
* once.cpp [_WIN32 && _DLL]: Tell linker to export __atomic_xxx()
  functions, defined in .asm files.

Farid.

Index: etc/config/windows/filterdef.js
===================================================================
--- etc/config/windows/filterdef.js (revision 570339)
+++ etc/config/windows/filterdef.js (working copy)
@@ -25,7 +25,7 @@
 
 var sourceFilterName = "Source Files";
 var sourceFilterUuid = "{4FC737F1-C7A5-4376-A066-2A32D752A2FF}";
-var sourceFilterExts = ".cpp;.cxx;.s";
+var sourceFilterExts = ".cpp;.cxx;.s;.asm";
 
 var headerFilterName = "Header Files";
 var headerFilterUuid = "{93995380-89BD-4b04-88EB-625FBE52EBFB}";
@@ -56,6 +56,21 @@
     return str;
 }
 
+//
+// CustomFileDef class
+//
+
+// CustomFileDef .ctor
+function CustomFileDef(filepath, platform, initfun)
+{
+    this.filepath = filepath;
+    this.platform = platform;
+    this.initfun = initfun;
+}
+
+// global array with platform dependent files definitions
+var customFileDefs = new Array();
+
 // common macros
 var cmnMacros = new Array();
 
@@ -126,7 +141,29 @@
     var VCFile = filter.AddFile(filename);
     if (null != filetype && typeof(VCFile.FileType) != "undefined")
         VCFile.FileType = filetype;
-
+
+    var customFileDef = null;
+
+    if (!exclude)
+    {
+        // find the platform dependent file definition
+        for (var i = 0; i < customFileDefs.length; ++i)
+        {
+            var custFileDef = customFileDefs[i];
+            var pos = VCFile.FullPath.length - custFileDef.filepath.length;
+            if (0 <= pos && pos == VCFile.FullPath.indexOf(custFileDef.filepath))
+            {
+                customFileDef = custFileDef;
+                break;
+            }
+        }
+
+        // exclude this file from build if current platform
+        // is not custom file target platform
+        if (null != customFileDef && customFileDef.platform != PLATFORM)
+            exclude = true;
+    }
+
     if (exclude)
     {
         var cfgs = VCFile.FileConfigurations;
@@ -144,6 +181,12 @@
             cfg.ExcludedFromBuild = exclude;
         }
     }
+    else if (null != customFileDef &&
+             "undefined" != typeof(customFileDef.initfun))
+    {
+        // init
+        customFileDef.initfun(VCFile);
+    }
 }
 
 // create VCFilter object from the FilterDef definition
Index: etc/config/windows/msvc-7.0.config
===================================================================
--- etc/config/windows/msvc-7.0.config (revision 570339)
+++ etc/config/windows/msvc-7.0.config (working copy)
@@ -38,6 +38,7 @@
 CXX=cl
 LD=cl
 AR=lib
+AS=ml
 
 // Use singlethreaded or mutlithreaded CRT in 11s, 11d solution configurations
 // 0 for MS VisualStudio .NET and MS VisualStudio .NET 2003
Index: etc/config/windows/msvc-8.0-x64.config
===================================================================
--- etc/config/windows/msvc-8.0-x64.config (revision 570339)
+++ etc/config/windows/msvc-8.0-x64.config (working copy)
@@ -1,2 +1,3 @@
 #include msvc-8.0
 PLATFORM=x64
+AS=ml64
Index: etc/config/windows/projectdef.js
===================================================================
--- etc/config/windows/projectdef.js (revision 570339)
+++ etc/config/windows/projectdef.js (working copy)
@@ -941,3 +941,25 @@
 
     return projectDef;
 }
+
+// init custom build rule for .asm files
+function InitAsmTool(VCFile)
+{
+    var cfgs = VCFile.FileConfigurations;
+    for (var i = 1; i <= cfgs.Count; ++i)
+    {
+        var cfg = cfgs.Item(i);
+        if ((typeof(cfg.Tool.ToolKind) != "undefined" &&
+            cfg.Tool.ToolKind != "VCCustomBuildTool") ||
+            cfg.Tool.ToolName != "Custom Build Tool")
+        {
+            cfg.Tool = cfg.ProjectConfiguration.FileTools.Item("VCCustomBuildTool");
+        }
+
+        var tool = cfg.Tool;
+        tool.Description = "Compiling .asm file...";
+        tool.Outputs = "$(IntDir)\\$(InputName).obj";
+        tool.CommandLine = AS + " /c /nologo /Fo" + tool.Outputs +
+            " /W3 /Zi /Ta" + VCFile.RelativePath;
+