Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-11 Thread Walter Bright

On 8/3/2013 3:28 PM, Jonathan M Davis wrote:

On Saturday, August 03, 2013 14:55:29 Walter Bright wrote:

This is for testing porpoises, and of course for those that Feel Da Need For
Speed.


But what if I prefer to test dolphins? ;)


They all look alike anyway; what's the difference?



Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-06 Thread dennis luehring

On 05.08.2013 19:52, Walter Bright wrote:

On 8/5/2013 4:01 AM, Richard Webb wrote:

Using the latest DMD and this snn.lib, I'm seeing it take about 11.5 seconds to
compile the algorithm unit tests (when I tried it last week, it was taking
closer to 17 seconds).

For comparison, the MSVC build takes about 10 seconds on the same machine
(Athlon 64X2 6000+).

Keep up the good work :-)



So I guess the DMC code generator isn't as awful as has been assumed! This is
hardly the first time the culprit was a library routine, not the code generator.



Don't start the party too early; there are still 1.5 seconds left :)


Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-06 Thread Richard Webb

On 05/08/2013 18:52, Walter Bright wrote:

This is hardly the first time the culprit was a library routine



It's possible that other library routines are causing some of the 
remaining difference from the MSVC build (e.g. the profiler suggests 
that the DMC build spends somewhat more time inside memcpy than the MSVC 
build).


Not sure if it's down to implementation or optimization, though. Might 
be down to intrinsics/inlining and such? (The profile for the DMC build 
says it's using ~1% of its time inside strlen, and the profile for the 
MSVC build doesn't mention it at all, which I guess is because it's 
using an intrinsic version.)






Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-06 Thread Walter Bright

On 8/6/2013 5:13 AM, Richard Webb wrote:

It's possible that other library routines are causing some of the remaining
difference from the MSVC build (e.g. the profiler suggests that the DMC build
spends somewhat more time inside memcpy than the MSVC build).

Not sure if it's down to implementation or optimization, though. Might be down
to intrinsics/inlining and such? (The profile for the DMC build says it's using
~1% of its time inside strlen, and the profile for the MSVC build doesn't mention
it at all, which I guess is because it's using an intrinsic version.)



If it's inlined then it won't show up in the profile. And yes, it's possible 
MSVC has a faster memcpy(). After all, enormous effort has been poured into 
memcpy().
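A minimal C sketch of the effect described above, assuming an optimizing compiler that actually chooses to inline the helper (`static inline` is only a hint, never a guarantee). The names `count_chars` and `total_length` are hypothetical; this illustrates the profiling mechanism only and does not reproduce the DMC or MSVC strlen.

```c
#include <stddef.h>

/* Hypothetical hand-written strlen. If the optimizer inlines it into its
   caller, a function-level profiler has no separate frame to charge the
   time to, so those samples are attributed to the caller instead. */
static inline size_t count_chars(const char *s)
{
    const char *p = s;
    while (*p != '\0')
        ++p;
    return (size_t)(p - s);
}

/* Caller: if count_chars is inlined, a per-function profile shows only
   total_length, never count_chars. A line- or instruction-level profiler
   can still show the cost of the inlined loop inside total_length. */
static size_t total_length(const char *const *words, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; ++i)
        total += count_chars(words[i]);
    return total;
}
```

Call `total_length` on an array of strings; whether `count_chars` appears as its own entry in a profile depends entirely on the compiler's inlining decision.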


Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-06 Thread Kiith-Sa

On Tuesday, 6 August 2013 at 17:48:57 UTC, Walter Bright wrote:

On 8/6/2013 5:13 AM, Richard Webb wrote:
It's possible that other library routines are causing some of the remaining
difference from the MSVC build (e.g. the profiler suggests that the DMC build
spends somewhat more time inside memcpy than the MSVC build).

Not sure if it's down to implementation or optimization, though. Might be down
to intrinsics/inlining and such? (The profile for the DMC build says it's using
~1% of its time inside strlen, and the profile for the MSVC build doesn't mention
it at all, which I guess is because it's using an intrinsic version.)



If it's inlined then it won't show up in the profile. And yes, 
it's possible MSVC has a faster memcpy(). After all, enormous 
effort has been poured into memcpy().


If you use a profiler with line or instruction granularity
(like perf on Linux), it will show up. On Windows, that would
probably be VTune and CodeAnalyst.


Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-06 Thread Kiith-Sa

On Tuesday, 6 August 2013 at 18:38:43 UTC, Kiith-Sa wrote:

On Tuesday, 6 August 2013 at 17:48:57 UTC, Walter Bright wrote:

On 8/6/2013 5:13 AM, Richard Webb wrote:
It's possible that other library routines are causing some of the remaining
difference from the MSVC build (e.g. the profiler suggests that the DMC build
spends somewhat more time inside memcpy than the MSVC build).

Not sure if it's down to implementation or optimization, though. Might be down
to intrinsics/inlining and such? (The profile for the DMC build says it's using
~1% of its time inside strlen, and the profile for the MSVC build doesn't mention
it at all, which I guess is because it's using an intrinsic version.)



If it's inlined then it won't show up in the profile. And yes, 
it's possible MSVC has a faster memcpy(). After all, enormous 
effort has been poured into memcpy().


If you use a profiler with line or instruction granularity
(like perf on Linux), it will show up. On Windows, that would
probably be VTune and CodeAnalyst.


(Obviously, as part of the function it was inlined into,
but you'll see the time spent on the lines/instructions that came
from the inlined function.)


Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-05 Thread Richard Webb

On 03/08/2013 22:55, Walter Bright wrote:

The execrable existing implementation was scrapped, and the new one uses
Windows HeapAlloc().

http://ftp.digitalmars.com/snn.lib

This is for testing porpoises, and of course for those that Feel Da Need
For Speed.



Using the latest DMD and this snn.lib, I'm seeing it take about 11.5 
seconds to compile the algorithm unit tests (when I tried it last week, 
it was taking closer to 17 seconds).


For comparison, the MSVC build takes about 10 seconds on the same 
machine (Athlon 64X2 6000+).


Keep up the good work :-)
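Putting the approximate timings above side by side (all three figures are rounded, so the percentages are only indicative):

```latex
\begin{aligned}
\text{speedup vs. last week} &= \frac{17\ \text{s}}{11.5\ \text{s}} \approx 1.48
  \quad \left(\text{a reduction of } \tfrac{17 - 11.5}{17} \approx 32\%\right)\\
\text{remaining gap vs. MSVC} &= \frac{11.5\ \text{s}}{10\ \text{s}} = 1.15
  \quad \left(1.5\ \text{s, i.e. about } 15\% \text{ slower}\right)
\end{aligned}
```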



Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-05 Thread dennis luehring

On 04.08.2013 11:28, Denis Shelomovskij wrote:

04.08.2013 1:55, Walter Bright wrote:

The execrable existing implementation was scrapped, and the new one uses
Windows HeapAlloc().

http://ftp.digitalmars.com/snn.lib

This is for testing porpoises, and of course for those that Feel Da Need
For Speed.


So I suppose you use `HeapFree` too? Please, be sure you use this
Windows API BOOL/BOOLEAN bug workaround:
https://github.com/denis-sh/phobos-additions/blob/e061d1ad282b4793d1c75dfcc20962b99ec842df/unstd/windows/heap.d#L178



But please do it without two ifs and a GetVersion call on every free.
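A minimal C sketch of that idea, assuming (as stated in this thread) that on pre-6.x Windows the 4-byte BOOL returned by HeapFree carries RtlFreeHeap's 1-byte BOOLEAN in its low byte, with the upper bytes undefined. The version decision is made once and cached as a mask, so each free pays a single AND and compare. The names `init_free_result_mask` and `free_succeeded` are hypothetical; a real implementation would derive the major version from GetVersion() or similar, whereas here it is passed in so the code compiles anywhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask applied to HeapFree's 32-bit result (hypothetical helper state):
   0xFF on pre-6.x Windows, where only the low BOOLEAN byte is meaningful;
   all bits on 6.x and later, where the result is a genuine BOOL. */
static uint32_t free_result_mask = 0xFFFFFFFFu;

/* Call once at startup. Assumption: major_version would come from
   GetVersion(); it is a parameter here so the sketch is portable. */
static void init_free_result_mask(unsigned major_version)
{
    free_result_mask = (major_version < 6) ? 0xFFu : 0xFFFFFFFFu;
}

/* Per-free check: no version query, just one mask and one compare. */
static bool free_succeeded(uint32_t raw_result)
{
    return (raw_result & free_result_mask) != 0;
}
```

The cut-off at major version 6 follows the "fixed in Windows 6.x" remark later in the thread rather than independent verification.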


Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-05 Thread Walter Bright

On 8/5/2013 4:01 AM, Richard Webb wrote:

Using the latest DMD and this snn.lib, I'm seeing it take about 11.5 seconds to
compile the algorithm unit tests (when I tried it last week, it was taking
closer to 17 seconds).

For comparison, the MSVC build takes about 10 seconds on the same machine
(Athlon 64X2 6000+).

Keep up the good work :-)



So I guess the DMC code generator isn't as awful as has been assumed! This is 
hardly the first time the culprit was a library routine, not the code generator.


Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-05 Thread Kagamin
On Sunday, 4 August 2013 at 09:28:11 UTC, Denis Shelomovskij wrote:
So I suppose you use `HeapFree` too? Please, be sure you use 
this Windows API BOOL/BOOLEAN bug workaround:

https://github.com/denis-sh/phobos-additions/blob/e061d1ad282b4793d1c75dfcc20962b99ec842df/unstd/windows/heap.d#L178


BOOLEAN is either TRUE or FALSE, so it should be ok to check only 
the least significant byte.


Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-05 Thread Mr. Anonymous

On Monday, 5 August 2013 at 21:42:11 UTC, Kagamin wrote:
On Sunday, 4 August 2013 at 09:28:11 UTC, Denis Shelomovskij wrote:
So I suppose you use `HeapFree` too? Please, be sure you use 
this Windows API BOOL/BOOLEAN bug workaround:

https://github.com/denis-sh/phobos-additions/blob/e061d1ad282b4793d1c75dfcc20962b99ec842df/unstd/windows/heap.d#L178


BOOLEAN is either TRUE or FALSE, so it should be ok to check 
only the least significant byte.


Not in Windows:
typedef BYTE BOOLEAN;
typedef int BOOL;

Source: 
http://msdn.microsoft.com/en-us/library/windows/desktop/aa383751%28v=vs.85%29.aspx


While ideally it should be TRUE or FALSE, sometimes it isn't.
In fact, for functions that return BOOL, MSDN states the 
following:

If the function succeeds, the return value is nonzero.
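A small portable C sketch of why that distinction matters: with BOOL defined as int (local mirrors of the Windows typedefs quoted above, so the code compiles without windows.h), a function that reports success with a nonzero value other than 1 fails an `== TRUE` comparison. `fake_api_call` is a hypothetical stand-in for a real API that is only documented as returning nonzero on success.

```c
/* Local mirrors of the Windows typedefs, for illustration only. */
typedef unsigned char BYTE;
typedef BYTE BOOLEAN;
typedef int BOOL;

#define FALSE 0
#define TRUE 1

/* Hypothetical API documented only as "nonzero on success";
   it happens to return 2 here. */
static BOOL fake_api_call(void)
{
    return 2;
}

/* Wrong: assumes success is exactly TRUE (1). */
static int succeeded_wrong(BOOL r) { return r == TRUE; }

/* Right: any nonzero value means success. */
static int succeeded_right(BOOL r) { return r != FALSE; }
```

Testing against FALSE (or simply `if (r)`) matches the documented contract; comparing against TRUE does not.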


Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-04 Thread dennis luehring

On 03.08.2013 23:55, Walter Bright wrote:

The execrable existing implementation was scrapped, and the new one uses Windows
HeapAlloc().

http://ftp.digitalmars.com/snn.lib

This is for testing porpoises, and of course for those that Feel Da Need For 
Speed.



ever tested nedmalloc 
(http://www.nedprod.com/programs/portable/nedmalloc/) or other malloc 
allocators?


Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-04 Thread Walter Bright

On 8/3/2013 11:07 PM, dennis luehring wrote:

ever tested nedmalloc (http://www.nedprod.com/programs/portable/nedmalloc/) or
other malloc allocators?


No, I haven't.


Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-04 Thread Joseph Rushton Wakeling

On Sunday, 4 August 2013 at 06:07:54 UTC, dennis luehring wrote:
ever tested nedmalloc 
(http://www.nedprod.com/programs/portable/nedmalloc/) or other 
malloc allocators?


Windows 7, Linux 3.x, FreeBSD 8, Mac OS X 10.6 all contain 
state-of-the-art allocators and no third party allocator is 
likely to significantly improve on them in real world results.


So there may be minimal returns from incorporating nedmalloc on 
modern OSes...?
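One way to check such a claim for a particular workload is a malloc/free churn micro-benchmark, built once against each allocator under test. The sketch below is a hypothetical harness, not code from nedmalloc or DMC: it uses a fixed-seed linear congruential generator so every run sees the same size sequence (sizes up to 1 KiB, an arbitrary choice), keeps a small ring of live blocks so frees interleave with allocations, and times the loop with the standard clock() function. The function name `churn` and the `LIVE_SLOTS` constant are assumptions for illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define LIVE_SLOTS 256   /* number of blocks kept alive at once */

/* Performs `iterations` allocations, each replacing an older block, and
   returns how many succeeded; the time measured by clock() is stored in
   *seconds. Link against different allocators to compare them. */
static long churn(long iterations, double *seconds)
{
    void *live[LIVE_SLOTS] = { 0 };
    unsigned long seed = 12345;   /* fixed seed: same sizes every run */
    long ok = 0;
    clock_t start = clock();

    for (long i = 0; i < iterations; ++i) {
        seed = seed * 1103515245ul + 12345ul;          /* simple LCG */
        size_t size = 1 + (size_t)((seed >> 16) % 1024);
        size_t slot = (size_t)(i % LIVE_SLOTS);

        free(live[slot]);                   /* free(NULL) is a no-op */
        live[slot] = malloc(size);
        if (live[slot] != NULL) {
            memset(live[slot], 0, size);    /* touch the memory */
            ++ok;
        }
    }
    for (size_t s = 0; s < LIVE_SLOTS; ++s)
        free(live[s]);

    *seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    return ok;
}
```

A synthetic loop like this only approximates real programs; running the actual workload (e.g. a compiler build) against each allocator remains the more convincing test.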


Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-04 Thread Walter Bright

On 8/4/2013 12:19 AM, Joseph Rushton Wakeling wrote:

On Sunday, 4 August 2013 at 06:07:54 UTC, dennis luehring wrote:

ever tested nedmalloc (http://www.nedprod.com/programs/portable/nedmalloc/) or
other malloc allocators?


Windows 7, Linux 3.x, FreeBSD 8, Mac OS X 10.6 all contain state-of-the-art
allocators and no third party allocator is likely to significantly improve on
them in real world results.

So there may be minimal returns from incorporating nedmalloc on modern OSes...?


As I wrote earlier, Microsoft has enormous incentive to make Heap as fast as 
possible, as it will pay dividends for every Microsoft software product and 
software designed for Windows. I'm sure the engineers there know all about the 
various strategies available on the intarnets. Why not take advantage of their work?


Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-04 Thread dennis luehring

On 04.08.2013 09:35, Walter Bright wrote:

On 8/4/2013 12:19 AM, Joseph Rushton Wakeling wrote:

On Sunday, 4 August 2013 at 06:07:54 UTC, dennis luehring wrote:

ever tested nedmalloc (http://www.nedprod.com/programs/portable/nedmalloc/) or
other malloc allocators?


Windows 7, Linux 3.x, FreeBSD 8, Mac OS X 10.6 all contain state-of-the-art
allocators and no third party allocator is likely to significantly improve on
them in real world results.

So there may be minimal returns from incorporating nedmalloc on modern OSes...?


As I wrote earlier, Microsoft has enormous incentive to make Heap as fast as
possible, as it will pay dividends for every Microsoft software product and
software designed for Windows. I'm sure the engineers there know all about the
various strategies available on the intarnets. Why not take advantage of their 
work?


HeapAlloc is a forwarder to RtlHeapAlloc, and C++ new calls 
RtlHeapAlloc directly. Would it be better to use this kernel32 API 
directly? (Maybe if used in druntime, to reduce DLL dependencies.)




Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-04 Thread Walter Bright

On 8/4/2013 12:53 AM, dennis luehring wrote:

HeapAlloc is a forwarder to RtlHeapAlloc, and C++ new calls RtlHeapAlloc
directly. Would it be better to use this kernel32 API directly? (Maybe if used
in druntime, to reduce DLL dependencies.)



I can't find any documentation on RtlHeapAlloc.


Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-04 Thread Denis Shelomovskij

04.08.2013 11:53, dennis luehring wrote:

On 04.08.2013 09:35, Walter Bright wrote:

On 8/4/2013 12:19 AM, Joseph Rushton Wakeling wrote:

On Sunday, 4 August 2013 at 06:07:54 UTC, dennis luehring wrote:

ever tested nedmalloc (http://www.nedprod.com/programs/portable/nedmalloc/) or
other malloc allocators?


Windows 7, Linux 3.x, FreeBSD 8, Mac OS X 10.6 all contain state-of-the-art
allocators and no third party allocator is likely to significantly improve on
them in real world results.

So there may be minimal returns from incorporating nedmalloc on modern OSes...?


As I wrote earlier, Microsoft has enormous incentive to make Heap as fast as
possible, as it will pay dividends for every Microsoft software product and
software designed for Windows. I'm sure the engineers there know all about the
various strategies available on the intarnets. Why not take advantage of their work?


HeapAlloc is a forwarder to RtlHeapAlloc, and C++ new calls
RtlHeapAlloc directly. Would it be better to use this kernel32 API
directly? (Maybe if used in druntime, to reduce DLL dependencies.)



Up to (at least) Windows XP, KERNEL32's HeapAlloc function is forwarded 
to the RtlAllocateHeap [1] function exported by NTDLL, so there is no runtime 
performance overhead.


There is no RtlHeapAlloc function on my Windows XP, and I can't find any 
information about it on the web. It looks like either a Windows 6.x addition 
or a mistake in the name.


Also note that there are many errors caused by such slightly different 
names. Regarding the Heap* functions:

1. Incorrect RtlAllocHeap name here [2].
2. Incorrect HeapFree function signature (a 4-byte BOOL is returned, but 
HeapFree is just a wrapper around RtlFreeHeap, which returns a 1-byte 
BOOLEAN; fixed in Windows 6.x).


[1] 
http://msdn.microsoft.com/en-us/library/windows/hardware/ff552108(v=vs.85).aspx

[2] http://msdn.microsoft.com/ru-ru/magazine/cc301808(en-us).aspx

--
Денис В. Шеломовский
Denis V. Shelomovskij


Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-04 Thread Denis Shelomovskij

04.08.2013 1:55, Walter Bright wrote:

The execrable existing implementation was scrapped, and the new one uses
Windows HeapAlloc().

http://ftp.digitalmars.com/snn.lib

This is for testing porpoises, and of course for those that Feel Da Need
For Speed.


So I suppose you use `HeapFree` too? Please, be sure you use this 
Windows API BOOL/BOOLEAN bug workaround:

https://github.com/denis-sh/phobos-additions/blob/e061d1ad282b4793d1c75dfcc20962b99ec842df/unstd/windows/heap.d#L178

--
Денис В. Шеломовский
Denis V. Shelomovskij


Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-04 Thread Walter Bright

On 8/4/2013 2:28 AM, Denis Shelomovskij wrote:

04.08.2013 1:55, Walter Bright wrote:

The execrable existing implementation was scrapped, and the new one uses
Windows HeapAlloc().

http://ftp.digitalmars.com/snn.lib

This is for testing porpoises, and of course for those that Feel Da Need
For Speed.


So I suppose you use `HeapFree` too?


Yes.


Please, be sure you use this Windows API
BOOL/BOOLEAN bug workaround:
https://github.com/denis-sh/phobos-additions/blob/e061d1ad282b4793d1c75dfcc20962b99ec842df/unstd/windows/heap.d#L178


That's good to know, thanks!
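For readers unfamiliar with the approach, here is a minimal C sketch of a malloc/free pair layered on the default process heap, assuming the documented Win32 signatures of GetProcessHeap, HeapAlloc and HeapFree. This is not the snn.lib source; the names `heap_malloc` and `heap_free` are hypothetical, and the non-Windows branch simply falls back to the C library so the sketch also compiles elsewhere.

```c
#include <stddef.h>

#ifdef _WIN32
#include <windows.h>

/* Allocate from the default process heap. Without the
   HEAP_GENERATE_EXCEPTIONS flag, HeapAlloc returns NULL on failure,
   matching malloc's contract. */
static void *heap_malloc(size_t size)
{
    return HeapAlloc(GetProcessHeap(), 0, size);
}

/* Release a block. Like free(), this ignores HeapFree's BOOL result;
   code that does check it must keep the BOOL/BOOLEAN issue in mind. */
static void heap_free(void *p)
{
    if (p != NULL)                /* free(NULL) semantics: nothing to do */
        HeapFree(GetProcessHeap(), 0, p);
}
#else
#include <stdlib.h>

/* Portable fallback, used only so the sketch builds off Windows. */
static void *heap_malloc(size_t size) { return malloc(size); }
static void heap_free(void *p) { free(p); }
#endif
```

Delegating to the system heap means the C runtime inherits whatever tuning the OS allocator already has, which is the trade-off argued for earlier in the thread.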



Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-04 Thread dennis luehring

You're right, it was RtlAllocateHeap.

On 04.08.2013 11:25, Denis Shelomovskij wrote:

04.08.2013 11:53, dennis luehring wrote:

On 04.08.2013 09:35, Walter Bright wrote:

On 8/4/2013 12:19 AM, Joseph Rushton Wakeling wrote:

On Sunday, 4 August 2013 at 06:07:54 UTC, dennis luehring wrote:

ever tested nedmalloc (http://www.nedprod.com/programs/portable/nedmalloc/) or
other malloc allocators?


Windows 7, Linux 3.x, FreeBSD 8, Mac OS X 10.6 all contain state-of-the-art
allocators and no third party allocator is likely to significantly improve on
them in real world results.

So there may be minimal returns from incorporating nedmalloc on modern OSes...?


As I wrote earlier, Microsoft has enormous incentive to make Heap as fast as
possible, as it will pay dividends for every Microsoft software product and
software designed for Windows. I'm sure the engineers there know all about the
various strategies available on the intarnets. Why not take advantage of their work?


HeapAlloc is a forwarder to RtlHeapAlloc, and C++ new calls
RtlHeapAlloc directly. Would it be better to use this kernel32 API
directly? (Maybe if used in druntime, to reduce DLL dependencies.)



Up to (at least) Windows XP, KERNEL32's HeapAlloc function is forwarded
to the RtlAllocateHeap [1] function exported by NTDLL, so there is no runtime
performance overhead.

There is no RtlHeapAlloc function on my Windows XP, and I can't find any
information about it on the web. It looks like either a Windows 6.x addition
or a mistake in the name.

Also note that there are many errors caused by such slightly different
names. Regarding the Heap* functions:
1. Incorrect RtlAllocHeap name here [2].
2. Incorrect HeapFree function signature (a 4-byte BOOL is returned, but
HeapFree is just a wrapper around RtlFreeHeap, which returns a 1-byte
BOOLEAN; fixed in Windows 6.x).

[1]
http://msdn.microsoft.com/en-us/library/windows/hardware/ff552108(v=vs.85).aspx
[2] http://msdn.microsoft.com/ru-ru/magazine/cc301808(en-us).aspx





Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-03 Thread Walter Bright

On 8/3/2013 2:55 PM, Walter Bright wrote:

Feel Da Need For Speed.


So much better than:

 Feel Da Need For Reduced Elapsed Time

:-)



Re: New malloc() for win32 that should produce faster DMD's and faster D code that uses malloc()

2013-08-03 Thread Jonathan M Davis
On Saturday, August 03, 2013 14:55:29 Walter Bright wrote:
 The execrable existing implementation was scrapped, and the new one uses
 Windows HeapAlloc().
 
 http://ftp.digitalmars.com/snn.lib
 
 This is for testing porpoises, and of course for those that Feel Da Need For
 Speed.

But what if I prefer to test dolphins? ;)

- Jonathan M Davis


P.S. So long, and thanks for all the fish.