Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()
On 8/3/2013 3:28 PM, Jonathan M Davis wrote: On Saturday, August 03, 2013 14:55:29 Walter Bright wrote: This is for testing porpoises, and of course for those that Feel Da Need For Speed. But what if I prefer to test dolphins? ;) They all look alike anyway, what's the difference?
On 05.08.2013 19:52, Walter Bright wrote: On 8/5/2013 4:01 AM, Richard Webb wrote: Using the latest DMD and this snn.lib, I'm seeing it take about 11.5 seconds to compile the algorithm unit tests (when I tried it last week, it was taking closer to 17 seconds). For comparison, the MSVC build takes about 10 seconds on the same machine (Athlon 64X2 6000+). Keep up the good work :-) So I guess the DMC code generator isn't as awful as has been assumed! This is hardly the first time the culprit was a library routine, not the code generator. Don't start the party too early, there are still 1.5 seconds left :)
On 05/08/2013 18:52, Walter Bright wrote: This is hardly the first time the culprit was a library routine It's possible that other library routines are causing some of the remaining difference from the MSVC build (e.g. the profiler suggests that the DMC build spends somewhat more time inside memcpy than the MSVC build). Not sure if it's down to implementation or optimization though - might be down to intrinsics/inlining and such? (the profile for the DMC build says it's using ~1% of its time inside strlen and the profile for the MSVC build doesn't mention it at all, which I guess is because it's using an intrinsic version).
On 8/6/2013 5:13 AM, Richard Webb wrote: It's possible that other library routines are causing some of the remaining difference from the MSVC build (e.g. the profiler suggests that the DMC build spends somewhat more time inside memcpy than the MSVC build). Not sure if it's down to implementation or optimization though - might be down to intrinsics/inlining and such? (the profile for the DMC build says it's using ~1% of its time inside strlen and the profile for the MSVC build doesn't mention it at all, which I guess is because it's using an intrinsic version). If it's inlined then it won't show up in the profile. And yes, it's possible MSVC has a faster memcpy(). After all, enormous effort has been poured into memcpy().
On Tuesday, 6 August 2013 at 17:48:57 UTC, Walter Bright wrote: On 8/6/2013 5:13 AM, Richard Webb wrote: It's possible that other library routines are causing some of the remaining difference from the MSVC build (e.g. the profiler suggests that the DMC build spends somewhat more time inside memcpy than the MSVC build). Not sure if it's down to implementation or optimization though - might be down to intrinsics/inlining and such? (the profile for the DMC build says it's using ~1% of its time inside strlen and the profile for the MSVC build doesn't mention it at all, which I guess is because it's using an intrinsic version). If it's inlined then it won't show up in the profile. And yes, it's possible MSVC has a faster memcpy(). After all, enormous effort has been poured into memcpy(). If you use a profiler with line or instruction granularity (like perf on Linux), it will show up. On Windows, that would probably be VTune or CodeAnalyst.
On Tuesday, 6 August 2013 at 18:38:43 UTC, Kiith-Sa wrote: On Tuesday, 6 August 2013 at 17:48:57 UTC, Walter Bright wrote: On 8/6/2013 5:13 AM, Richard Webb wrote: It's possible that other library routines are causing some of the remaining difference from the MSVC build (e.g. the profiler suggests that the DMC build spends somewhat more time inside memcpy than the MSVC build). Not sure if it's down to implementation or optimization though - might be down to intrinsics/inlining and such? (the profile for the DMC build says it's using ~1% of its time inside strlen and the profile for the MSVC build doesn't mention it at all, which I guess is because it's using an intrinsic version). If it's inlined then it won't show up in the profile. And yes, it's possible MSVC has a faster memcpy(). After all, enormous effort has been poured into memcpy(). If you use a profiler with line or instruction granularity (like perf on Linux), it will show up. On Windows, that would probably be VTune or CodeAnalyst. (Obviously it shows up as part of the function it was inlined into, but you'll get the time consumed at the lines/instructions that came from the inlined function.)
On 03/08/2013 22:55, Walter Bright wrote: The execrable existing implementation was scrapped, and the new one uses Windows HeapAlloc(). http://ftp.digitalmars.com/snn.lib This is for testing porpoises, and of course for those that Feel Da Need For Speed. Using the latest DMD and this snn.lib, I'm seeing it take about 11.5 seconds to compile the algorithm unit tests (when I tried it last week, it was taking closer to 17 seconds). For comparison, the MSVC build takes about 10 seconds on the same machine (Athlon 64X2 6000+). Keep up the good work :-)
On 04.08.2013 11:28, Denis Shelomovskij wrote: On 04.08.2013 1:55, Walter Bright wrote: The execrable existing implementation was scrapped, and the new one uses Windows HeapAlloc(). http://ftp.digitalmars.com/snn.lib This is for testing porpoises, and of course for those that Feel Da Need For Speed. So I suppose you use `HeapFree` too? Please, be sure you use this Windows API BOOL/BOOLEAN bug workaround: https://github.com/denis-sh/phobos-additions/blob/e061d1ad282b4793d1c75dfcc20962b99ec842df/unstd/windows/heap.d#L178 But please, without using two ifs and GetVersion on every free call.
On 8/5/2013 4:01 AM, Richard Webb wrote: Using the latest DMD and this snn.lib, I'm seeing it take about 11.5 seconds to compile the algorithm unit tests (when I tried it last week, it was taking closer to 17 seconds). For comparison, the MSVC build takes about 10 seconds on the same machine (Athlon 64X2 6000+). Keep up the good work :-) So I guess the DMC code generator isn't as awful as has been assumed! This is hardly the first time the culprit was a library routine, not the code generator.
On Sunday, 4 August 2013 at 09:28:11 UTC, Denis Shelomovskij wrote: So I suppose you use `HeapFree` too? Please, be sure you use this Windows API BOOL/BOOLEAN bug workaround: https://github.com/denis-sh/phobos-additions/blob/e061d1ad282b4793d1c75dfcc20962b99ec842df/unstd/windows/heap.d#L178 BOOLEAN is either TRUE or FALSE, so it should be ok to check only the least significant byte.
On Monday, 5 August 2013 at 21:42:11 UTC, Kagamin wrote: On Sunday, 4 August 2013 at 09:28:11 UTC, Denis Shelomovskij wrote: So I suppose you use `HeapFree` too? Please, be sure you use this Windows API BOOL/BOOLEAN bug workaround: https://github.com/denis-sh/phobos-additions/blob/e061d1ad282b4793d1c75dfcc20962b99ec842df/unstd/windows/heap.d#L178 BOOLEAN is either TRUE or FALSE, so it should be ok to check only the least significant byte. Not in Windows: typedef BYTE BOOLEAN; typedef int BOOL; (c) http://msdn.microsoft.com/en-us/library/windows/desktop/aa383751%28v=vs.85%29.aspx While ideally it should be TRUE or FALSE, sometimes it isn't. In fact, for functions that return BOOL, MSDN states the following: If the function succeeds, the return value is nonzero.
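To make the BOOL/BOOLEAN point concrete, here is a minimal sketch in C. It assumes the situation described elsewhere in this thread: a function declared as returning a 4-byte BOOL is really a thin wrapper over one that sets only a 1-byte BOOLEAN, so the upper three bytes of the result may hold garbage. The type names BOOL32 and BOOLEAN8 and the helper bool_from_boolean are made up for illustration (they stand in for the Windows typedefs so the sketch compiles without windows.h), and this is not the code from the linked workaround.

```c
#include <stdint.h>

/* Stand-ins for the Windows typedefs quoted above (assumed names,
   chosen so the sketch does not need <windows.h>). */
typedef int32_t       BOOL32;    /* plays the role of BOOL    (4 bytes) */
typedef unsigned char BOOLEAN8;  /* plays the role of BOOLEAN (1 byte)  */

/* Hypothetical helper: interpret a value that is declared as a 4-byte
   BOOL but whose callee only set the least significant byte. The cast
   discards the upper three bytes, which may contain garbage. */
static int bool_from_boolean(BOOL32 raw)
{
    return (BOOLEAN8)raw != 0;
}
```

A plain `raw != 0` test would report success for a value such as 0x12345600, even though the byte that was actually set is zero; the helper reports failure for it.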
On 03.08.2013 23:55, Walter Bright wrote: The execrable existing implementation was scrapped, and the new one uses Windows HeapAlloc(). http://ftp.digitalmars.com/snn.lib This is for testing porpoises, and of course for those that Feel Da Need For Speed. Ever tested nedmalloc (http://www.nedprod.com/programs/portable/nedmalloc/) or other malloc allocators?
On 8/3/2013 11:07 PM, dennis luehring wrote: ever tested nedmalloc (http://www.nedprod.com/programs/portable/nedmalloc/) or other malloc allocators? No, I haven't.
On Sunday, 4 August 2013 at 06:07:54 UTC, dennis luehring wrote: ever tested nedmalloc (http://www.nedprod.com/programs/portable/nedmalloc/) or other malloc allocators? Windows 7, Linux 3.x, FreeBSD 8, Mac OS X 10.6 all contain state-of-the-art allocators and no third party allocator is likely to significantly improve on them in real world results. So there may be minimal returns from incorporating nedmalloc on modern OS's ... ?
On 8/4/2013 12:19 AM, Joseph Rushton Wakeling wrote: On Sunday, 4 August 2013 at 06:07:54 UTC, dennis luehring wrote: ever tested nedmalloc (http://www.nedprod.com/programs/portable/nedmalloc/) or other malloc allocators? Windows 7, Linux 3.x, FreeBSD 8, Mac OS X 10.6 all contain state-of-the-art allocators and no third party allocator is likely to significantly improve on them in real world results. So there may be minimal returns from incorporating nedmalloc on modern OS's ... ? As I wrote earlier, Microsoft has enormous incentive to make Heap as fast as possible, as it will pay dividends for every Microsoft software product and software designed for Windows. I'm sure the engineers there know all about the various strategies available on the intarnets. Why not take advantage of their work?
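For anyone who wants numbers rather than expectations, a rough way to compare allocators is to time a burst of allocations and frees against whichever malloc() is linked in. The sketch below is only an illustration: time_alloc_free is a made-up helper, the workload (fixed-size blocks, allocate all, then free all) is an assumption that says little about real programs such as a compiler, and clock() measures CPU time at coarse resolution.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: allocate and then free `count` blocks of `size`
   bytes, `rounds` times over, using whatever malloc()/free() the program
   is linked against. Returns elapsed CPU time in seconds, or -1.0 if the
   arguments are unusable or the bookkeeping array cannot be allocated. */
static double time_alloc_free(size_t count, size_t size, int rounds)
{
    if (count == 0 || size == 0)
        return -1.0;

    void **blocks = malloc(count * sizeof *blocks);
    if (blocks == NULL)
        return -1.0;

    clock_t start = clock();
    for (int r = 0; r < rounds; r++) {
        for (size_t i = 0; i < count; i++) {
            blocks[i] = malloc(size);
            if (blocks[i] != NULL)
                *(volatile char *)blocks[i] = 1;  /* touch the block so the
                                                     allocation is not elided */
        }
        for (size_t i = 0; i < count; i++)
            free(blocks[i]);                      /* free(NULL) is a no-op */
    }
    clock_t end = clock();

    free(blocks);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling, say, time_alloc_free(100000, 64, 50) once per allocator (by relinking against a different library) gives a first impression, but a real workload is the only measurement that counts.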
On 04.08.2013 09:35, Walter Bright wrote: On 8/4/2013 12:19 AM, Joseph Rushton Wakeling wrote: On Sunday, 4 August 2013 at 06:07:54 UTC, dennis luehring wrote: ever tested nedmalloc (http://www.nedprod.com/programs/portable/nedmalloc/) or other malloc allocators? Windows 7, Linux 3.x, FreeBSD 8, Mac OS X 10.6 all contain state-of-the-art allocators and no third party allocator is likely to significantly improve on them in real world results. So there may be minimal returns from incorporating nedmalloc on modern OS's ... ? As I wrote earlier, Microsoft has enormous incentive to make Heap as fast as possible, as it will pay dividends for every Microsoft software product and software designed for Windows. I'm sure the engineers there know all about the various strategies available on the intarnets. Why not take advantage of their work? HeapAlloc is a forwarder to RtlHeapAlloc and C++ new does call RtlHeapAlloc directly - would it be better to use this kernel32 api directly? (maybe if used in druntime to reduce dll dependencies)
On 8/4/2013 12:53 AM, dennis luehring wrote: HeapAlloc is a forwarder to RtlHeapAlloc and C++ new does call RtlHeapAlloc directly - would it be better to use this kernel32 api directly? (maybe if used in druntime to reduce dll dependencies) I can't find any documentation on RtlHeapAlloc.
On 04.08.2013 11:53, dennis luehring wrote: On 04.08.2013 09:35, Walter Bright wrote: On 8/4/2013 12:19 AM, Joseph Rushton Wakeling wrote: On Sunday, 4 August 2013 at 06:07:54 UTC, dennis luehring wrote: ever tested nedmalloc (http://www.nedprod.com/programs/portable/nedmalloc/) or other malloc allocators? Windows 7, Linux 3.x, FreeBSD 8, Mac OS X 10.6 all contain state-of-the-art allocators and no third party allocator is likely to significantly improve on them in real world results. So there may be minimal returns from incorporating nedmalloc on modern OS's ... ? As I wrote earlier, Microsoft has enormous incentive to make Heap as fast as possible, as it will pay dividends for every Microsoft software product and software designed for Windows. I'm sure the engineers there know all about the various strategies available on the intarnets. Why not take advantage of their work? HeapAlloc is a forwarder to RtlHeapAlloc and C++ new does call RtlHeapAlloc directly - would it be better to use this kernel32 api directly? (maybe if used in druntime to reduce dll dependencies) Up to Windows XP (at least), KERNEL32's HeapAlloc function is forwarded to the RtlAllocateHeap [1] function exported by NTDLL, so there is no runtime performance overhead. There is no RtlHeapAlloc function on my Windows XP and I can't find any information about it on the web. It looks like either Windows 6.x stuff or a mistake in the name. Also note there are tons of errors caused by such slightly different names. If we are talking about Heap* functions: 1. Incorrect RtlAllocHeap name here [2]. 2. Incorrect HeapFree function signature (a 4-byte BOOL is returned, but it is just a wrapper of RtlFreeHeap, which returns a 1-byte BOOLEAN) (fixed in Windows 6.x). [1] http://msdn.microsoft.com/en-us/library/windows/hardware/ff552108(v=vs.85).aspx [2] http://msdn.microsoft.com/ru-ru/magazine/cc301808(en-us).aspx -- Денис В. Шеломовский Denis V. Shelomovskij
On 04.08.2013 1:55, Walter Bright wrote: The execrable existing implementation was scrapped, and the new one uses Windows HeapAlloc(). http://ftp.digitalmars.com/snn.lib This is for testing porpoises, and of course for those that Feel Da Need For Speed. So I suppose you use `HeapFree` too? Please, be sure you use this Windows API BOOL/BOOLEAN bug workaround: https://github.com/denis-sh/phobos-additions/blob/e061d1ad282b4793d1c75dfcc20962b99ec842df/unstd/windows/heap.d#L178 -- Денис В. Шеломовский Denis V. Shelomovskij
On 8/4/2013 2:28 AM, Denis Shelomovskij wrote: On 04.08.2013 1:55, Walter Bright wrote: The execrable existing implementation was scrapped, and the new one uses Windows HeapAlloc(). http://ftp.digitalmars.com/snn.lib This is for testing porpoises, and of course for those that Feel Da Need For Speed. So I suppose you use `HeapFree` too? Yes. Please, be sure you use this Windows API BOOL/BOOLEAN bug workaround: https://github.com/denis-sh/phobos-additions/blob/e061d1ad282b4793d1c75dfcc20962b99ec842df/unstd/windows/heap.d#L178 That's good to know, thanks!
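For readers wondering what a malloc() built on the Windows heap looks like in outline, here is a minimal sketch. It is not the snn.lib implementation, whose source is not shown in this thread: the names sketch_malloc and sketch_free are invented, the choice of the default process heap via GetProcessHeap() is an assumption, and the non-Windows branch exists only so the sketch compiles elsewhere. It also ignores the HeapFree return value, so the BOOL/BOOLEAN issue mentioned above does not arise here.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/* Hypothetical wrapper: allocate from the default process heap.
   With flags == 0, HeapAlloc returns NULL on failure, which matches
   what callers of malloc() expect. */
static void *sketch_malloc(size_t size)
{
    return HeapAlloc(GetProcessHeap(), 0, size);
}

/* Hypothetical wrapper: release a block obtained from sketch_malloc. */
static void sketch_free(void *p)
{
    if (p != NULL)                        /* free(NULL) must do nothing */
        HeapFree(GetProcessHeap(), 0, p); /* return value ignored here  */
}

#else
#include <stdlib.h>

/* Fallback so the sketch also compiles off Windows (assumption: the
   platform malloc stands in for the Windows heap). */
static void *sketch_malloc(size_t size) { return malloc(size); }
static void  sketch_free(void *p)       { free(p); }
#endif
```

A complete C runtime would also need realloc() and calloc() counterparts (HeapReAlloc and the HEAP_ZERO_MEMORY flag are the natural candidates), which are left out to keep the sketch short.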
You're right, it was RtlAllocateHeap. On 04.08.2013 11:25, Denis Shelomovskij wrote: On 04.08.2013 11:53, dennis luehring wrote: On 04.08.2013 09:35, Walter Bright wrote: On 8/4/2013 12:19 AM, Joseph Rushton Wakeling wrote: On Sunday, 4 August 2013 at 06:07:54 UTC, dennis luehring wrote: ever tested nedmalloc (http://www.nedprod.com/programs/portable/nedmalloc/) or other malloc allocators? Windows 7, Linux 3.x, FreeBSD 8, Mac OS X 10.6 all contain state-of-the-art allocators and no third party allocator is likely to significantly improve on them in real world results. So there may be minimal returns from incorporating nedmalloc on modern OS's ... ? As I wrote earlier, Microsoft has enormous incentive to make Heap as fast as possible, as it will pay dividends for every Microsoft software product and software designed for Windows. I'm sure the engineers there know all about the various strategies available on the intarnets. Why not take advantage of their work? HeapAlloc is a forwarder to RtlHeapAlloc and C++ new does call RtlHeapAlloc directly - would it be better to use this kernel32 api directly? (maybe if used in druntime to reduce dll dependencies) Up to Windows XP (at least), KERNEL32's HeapAlloc function is forwarded to the RtlAllocateHeap [1] function exported by NTDLL, so there is no runtime performance overhead. There is no RtlHeapAlloc function on my Windows XP and I can't find any information about it on the web. It looks like either Windows 6.x stuff or a mistake in the name. Also note there are tons of errors caused by such slightly different names. If we are talking about Heap* functions: 1. Incorrect RtlAllocHeap name here [2]. 2. Incorrect HeapFree function signature (a 4-byte BOOL is returned, but it is just a wrapper of RtlFreeHeap, which returns a 1-byte BOOLEAN) (fixed in Windows 6.x). [1] http://msdn.microsoft.com/en-us/library/windows/hardware/ff552108(v=vs.85).aspx [2] http://msdn.microsoft.com/ru-ru/magazine/cc301808(en-us).aspx
On 8/3/2013 2:55 PM, Walter Bright wrote: Feel Da Need For Speed. So much better than: Feel Da Need For Reduced Elapsed Time :-)
On Saturday, August 03, 2013 14:55:29 Walter Bright wrote: The execrable existing implementation was scrapped, and the new one uses Windows HeapAlloc(). http://ftp.digitalmars.com/snn.lib This is for testing porpoises, and of course for those that Feel Da Need For Speed. But what if I prefer to test dolphins? ;) - Jonathan M Davis P.S. So long, and thanks for all the fish.