Re: memset and related things

2009-09-20 Thread bearophile
language_fan:

> Every time I see your benchmarks, Java is actually doing just 
> fine, sometimes it's even faster than D. Are your benchmarks somehow 
> flawed every time Java beats D in performance?

I don't fully understand your question. The main purpose of some of the 
benchmarks I've done in the last few months is to find performance bugs (or 
sometimes just bugs) in LDC or LLVM (I have stopped doing something similar for 
DMD because people seem less interested in improving its performance).

Now I often use code originally written in C++ and Java because I've seen that 
in most situations LLVM is able to produce good binaries from very C-like code 
(with few exceptions).

My benchmarks aren't chosen randomly: I naturally focus on things that are 
slower in D, so sometimes you can see Java "win". I usually discard the code 
where Java turns out slower :-)

Installing and using a debug build of the JavaVM to see the assembly its JIT 
produces is not handy, but I sometimes do it. Once in a while you can find some 
surprises there.

Recently I have found a nice single-file C++ program that heavily uses 
templates and partial specialization, but I am having problems translating 
it to good D to test templates in LDC because I don't know enough C++ yet. It 
finds the optimal solution to the 15 problems, and it's quite efficient:
http://codepad.org/96ATkuhx
Help/suggestions are welcome :-)

Bye,
bearophile


Re: memset and related things

2009-09-20 Thread language_fan
Sun, 20 Sep 2009 16:09:50 -0400, bearophile thusly wrote:

> I'm just a newbie on this stuff, while
> people that write the memset of 64bit glibc are experts.

I have been wondering why people here often complain that Java is slower 
than D. Every time I see your benchmarks, Java is actually doing just 
fine, sometimes it's even faster than D. Are your benchmarks somehow 
flawed every time Java beats D in performance?


Re: memset and related things

2009-09-20 Thread Jeremie Pelletier

bearophile wrote:
> I think this version is a bit better:
>
> [snip]
>
> Bye,
> bearophile


This is quite interesting; it made me rethink my own uses of memset. By 
the way, you should first verify that SSE is present through the cpuid 
instruction.
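
A minimal sketch of the check suggested above, assuming a current D2 toolchain where druntime's core.cpuid module exposes the `sse` property (in 2009 the equivalent lived in std.cpuid). The names `fill` and `fillScalar` are hypothetical, and the SSE branch uses a plain slice assignment as a stand-in for a routine such as memset4:

```d
import core.cpuid : sse;
import std.stdio : writeln;

/// Plain fallback: fills the slice one element at a time.
void fillScalar(int[] a, int value) {
    foreach (ref x; a)
        x = value;
}

/// Hypothetical dispatcher: takes the SSE path only when cpuid reports SSE.
void fill(int[] a, int value) {
    const bool hasSse = sse;   // queried by druntime through the cpuid instruction
    if (hasSse) {
        // An SSE routine such as memset4 would be called here.
        // This sketch uses the built-in slice assignment as a stand-in.
        a[] = value;
    } else {
        fillScalar(a, value);
    }
}

void main() {
    auto data = new int[](10);
    fill(data, 7);
    const bool hasSse = sse;
    writeln("SSE available: ", hasSse, ", data[0] = ", data[0]);
}
```

Doing the check once at startup and storing a function pointer would avoid the branch on every call; the sketch keeps the branch for brevity.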


Re: memset and related things

2009-09-20 Thread bearophile
Don:

> It'll be interesting to see what the priorities are now -- 
> maybe this stuff is of more interest now.

Removing bugs is probably still more important :-)
For example, your work has changed a little how compile-time functions can be 
used in D.


> BTW the AMD manual for K7 (or might be K6 optimisation manual? don't 
> exactly remember) goes into great detail about both memcpy() and 
> memset(). Turns out there's about five different cases.

In the meantime Deewiant has told me that on 64 bit the glibc memset is better, 
and that on more modern CPUs the timings are different (and on 64 bit my first 
version may not work, maybe the second one is better. I have not tested it on 
64 bit LDC yet). I'm just a newbie on this stuff, while people that write the 
memset of 64bit glibc are experts.

Bye and thank you,
bearophile


Re: memset and related things

2009-09-20 Thread Don

bearophile wrote:
> In a program I've seen that in the inner loop clearing an array was taking too 
> much time. To solve the problem I've done many experiments, and I've also 
> produced the following testing program.
>
> The short summary is, to set an array of 4-byte integers to a certain constant, 
> the best ways are:
> - if len <~ 20, then just use an inlined loop.
> - if 20 < len < 200_000 it's better to use a loop unrolled 4 times with the 
>   movaps instruction (8 times unrolled is a little worse).
> - if len > 200_000 a loop with the movntps instruction is better.
>
> Generally such solutions are better than memset() (only when len is about 
> 150_000 is memset a bit better than four movaps).
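
The size thresholds quoted above can be sketched as a simple dispatch. This is only an illustration: `Strategy` and `chooseStrategy` are hypothetical names, the cut-offs are the approximate ones from the post (they depend on the CPU), and the post does not say which side len == 200_000 falls on, so the sketch arbitrarily gives it to movaps:

```d
import std.stdio : writeln;

/// The three approaches named in the summary.
enum Strategy { inlinedLoop, unrolledMovaps, nonTemporalMovntps }

/// Picks a strategy from the element count (4-byte items),
/// using the approximate thresholds from the post.
Strategy chooseStrategy(size_t len) {
    if (len <= 20)
        return Strategy.inlinedLoop;         // tiny arrays: plain loop
    if (len <= 200_000)
        return Strategy.unrolledMovaps;      // medium: 4x unrolled movaps
    return Strategy.nonTemporalMovntps;      // large: movntps stores that bypass the cache
}

void main() {
    foreach (len; [10, 5_000, 1_000_000])
        writeln(len, " items -> ", chooseStrategy(len));
}
```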


Yeah, DMD's memset() and memcpy() are far from optimal. IIRC memcpy() is 
even worse. I had done a bit of work on it, as well, but when I posted 
preliminary stuff, there wasn't much interest. The general feedback 
seemed to be that it'd be more useful to fix the compiler ICE bugs. So I 
did that. It'll be interesting to see what the priorities are now -- 
maybe this stuff is of more interest now.


BTW the AMD manual for K7 (or might be K6 optimisation manual? don't 
exactly remember) goes into great detail about both memcpy() and 
memset(). Turns out there's about five different cases.


Re: Mixin a constructor ?

2009-09-20 Thread Christopher Wright

Michel Fortin wrote:
> On 2009-09-19 21:17:36 -0400, language_fan said:
>
>> Since the constructor has no meaning outside classes, should it be
>> interpreted as a free function if mixed in a non-class context? I really
>> wonder how this could be valid code. Does the grammar even support the
>> 3rd line?
>
> Personally, I'd like it very much if functions from template mixins 
> could overload with functions from outside the mixin. It'd allow me to 
> replace string mixins with template mixins in quite a few places.


Also if you could implement a function from an interface with a template 
mixin.
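
For illustration, a minimal sketch of what implementing an interface function through a template mixin looks like. To my knowledge a current D2 compiler accepts this; that was not necessarily the case for the compilers discussed in this thread. `Greeter`, `GreetImpl` and `Friendly` are hypothetical names:

```d
import std.stdio : writeln;

interface Greeter {
    string greet();
}

/// Hypothetical mixin that supplies the implementation of greet().
mixin template GreetImpl() {
    string greet() { return "hello"; }
}

class Friendly : Greeter {
    mixin GreetImpl;   // greet() from the mixin satisfies the interface
}

void main() {
    Greeter g = new Friendly;
    writeln(g.greet());
}
```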


Re: memset and related things

2009-09-20 Thread bearophile
I think this version is a bit better:

void memset4(T)(T[] a, T value=T.init) {
    static assert (T.sizeof == 4);
    static assert (size_t.sizeof == (T*).sizeof);
    if (!a.length)
        return;
    auto a_ptr = a.ptr;
    auto a_end = a_ptr + a.length;

    // align pointer to 16 bytes, processing leading unaligned items
    // (stop at a_end so that short arrays are never overrun)
    size_t a_end_trimmed = (cast(size_t)a_ptr + 15) & (~cast(size_t)15);
    while (cast(size_t)a_ptr < a_end_trimmed && a_ptr < a_end)
        *a_ptr++ = value;

    // bytes left for the SSE loops: a whole number of 64-byte blocks,
    // counted from the aligned pointer so the loops never write past a_end
    size_t n_bytes = (cast(size_t)a_end - cast(size_t)a_ptr) & (~cast(size_t)63);
    a_end_trimmed = cast(size_t)a_ptr + n_bytes;

    //printf("%u %u %u\n", a_ptr, a_end, a_end_trimmed);
    //int counter1, counter2;

    if (n_bytes > (200_000 * T.sizeof))
        asm {
            mov ESI, a_ptr;
            mov EDI, a_end_trimmed;

            // XMM0 = value,value,value,value
            movss XMM0, value;
            shufps XMM0, XMM0, 0;

            align 8;
        LOOP1: // writes 4 * 4 * 4 bytes each loop, non-temporal stores
            //inc counter1;
            add ESI, 64;
            movntps [ESI+ 0-64], XMM0;
            movntps [ESI+16-64], XMM0;
            movntps [ESI+32-64], XMM0;
            movntps [ESI+48-64], XMM0;
            cmp ESI, EDI;
            jb LOOP1;

            mov a_ptr, ESI;
        }
    else if (n_bytes > 0)
        asm {
            mov ESI, a_ptr;
            mov EDI, a_end_trimmed;

            // XMM0 = value,value,value,value
            movss XMM0, value;
            shufps XMM0, XMM0, 0;

            align 8;
        LOOP2: // writes 4 * 4 * 4 bytes each loop, aligned stores
            //inc counter2;
            add ESI, 64;
            movaps [ESI+ 0-64], XMM0;
            movaps [ESI+16-64], XMM0;
            movaps [ESI+32-64], XMM0;
            movaps [ESI+48-64], XMM0;
            cmp ESI, EDI;
            jb LOOP2;

            mov a_ptr, ESI;
        }

    //printf("counter1, counter2: %d %d\n", counter1, counter2);

    // the remaining tail (fewer than 64 bytes)
    while (a_ptr < a_end)
        *a_ptr++ = value;
}
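
The pointer arithmetic in the function above splits the array into an unaligned head, a run of whole 64-byte blocks, and a tail. A small sketch of that arithmetic on plain integers; the helper names and the addresses are hypothetical and chosen only for illustration:

```d
import std.stdio : writefln;

/// Rounds addr up to the next multiple of 16 (unchanged if already aligned).
size_t alignUp16(size_t addr) {
    return (addr + 15) & ~cast(size_t)15;
}

/// Rounds n down to a multiple of 64.
size_t roundDown64(size_t n) {
    return n & ~cast(size_t)63;
}

void main() {
    // hypothetical start and end addresses of an array
    size_t start = 0x1004, end = 0x10F8;
    size_t aligned = alignUp16(start);                 // first 16-byte aligned address
    size_t blockBytes = roundDown64(end - aligned);    // bytes handled by the SSE loop
    writefln("head: %s bytes, blocks: %s bytes, tail: %s bytes",
             aligned - start, blockBytes, end - aligned - blockBytes);
}
```

Counting the blocks from the aligned start, instead of rounding the end address down, keeps the block region a whole multiple of 64 bytes, which the 4x unrolled loops rely on.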


Bye,
bearophile


Re: Mixin a constructor ?

2009-09-20 Thread Ellery Newcomer
language_fan wrote:
> Since the constructor has no meaning outside classes, should it be 
> interpreted as a free function if mixed in a non-class context? I really 
> wonder how this could be valid code. Does the grammar even support the 
> 3rd line?

Checking whether a constructor is inside a class happens during one of
the semantic passes. The parser makes no distinction between
class/interface/template/struct/union bodies.
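
A minimal sketch of a constructor mixed into a class, assuming a current D2 compiler (behaviour in the compilers of the time may have differed). `ValueCtor` and `Foo` are hypothetical names:

```d
import std.stdio : writeln;

/// Hypothetical template mixin that injects a field and a constructor.
/// The template body parses on its own; whether the constructor is legal
/// is only decided when the mixin is instantiated in some scope.
mixin template ValueCtor() {
    int value;
    this(int v) { value = v; }
}

class Foo {
    mixin ValueCtor;   // here the constructor becomes Foo's constructor
}

// Writing `mixin ValueCtor;` at module scope instead is expected to be
// rejected during semantic analysis, not by the parser.

void main() {
    auto f = new Foo(42);
    writeln(f.value);
}
```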


Re: Mixin a constructor ?

2009-09-20 Thread Michel Fortin

On 2009-09-19 21:17:36 -0400, language_fan said:

> Since the constructor has no meaning outside classes, should it be
> interpreted as a free function if mixed in a non-class context? I really
> wonder how this could be valid code. Does the grammar even support the
> 3rd line?


Personally, I'd like it very much if functions from template mixins 
could overload with functions from outside the mixin. It'd allow me to 
replace string mixins with template mixins in quite a few places.
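
To illustrate the limitation: a function declared in the enclosing scope hides a same-named function from a template mixin instead of overloading with it. A sketch of the usual workaround in current D2, assuming a named mixin plus an alias; `IntHandler`, `Widget` and `handle` are hypothetical names:

```d
import std.stdio : writeln;

mixin template IntHandler() {
    string handle(int x) { return "int from mixin"; }
}

class Widget {
    mixin IntHandler m;            // named, so its members can be referred to as m.handle
    string handle(string s) { return "string from class"; }
    alias handle = m.handle;       // explicitly merge the mixin's function into the overload set
}

void main() {
    auto w = new Widget;
    writeln(w.handle(1));
    writeln(w.handle("x"));
}
```

Without the alias line, the call `w.handle(1)` is expected to fail to compile, because only the class's own `handle(string)` is visible.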


--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/



Re: Rich Hickey's slides from jvm lang summit - worth a read?

2009-09-20 Thread Lutger
Hell yeah, super interesting! Also extremely well presented.

I found the conclusion of the presentation on youtube; does anybody know if the 
full presentation is (or will be) on the internet?

http://www.youtube.com/watch?v=zRTx1oGG_1Y&feature=channel_page


Re: How Nested Functions Work, part 2

2009-09-20 Thread Lutger
language_fan wrote:

> Sun, 20 Sep 2009 01:09:56 +, language_fan thusly wrote:
> 
>> Sat, 19 Sep 2009 11:44:33 -0700, Walter Bright thusly wrote:
>> 
>>> Lutger wrote:
>>>> Cool article, I posted a comment. Reddit seems to be going downhill
>>>> fast though, it's even worse than slashdot.
>>> 
>>> I know, the negative comments don't even make any sense.
>>> 
>>>> Are locally instantiated templates used in phobos?
>>> 
>>> Yes.
>> 
>> I read the comments and I think some of them are justified. You cannot
>> really expect the way you built dmd to be the only alternative.
>> Efficient closures can be implemented differently if you have a VM that
>> supports precise generational gc, region inference, and the language is
>> a bit more value oriented (= functional).

Where does it say that the article describes the one true method? 

> Another thing is that often when an article about D is released, the only
> positive comments come from the members of the existing (smallish)
> community. There is no real interest in D outside the community (IMHO).
> Reddit's programming section is full of language fanatics, thus it is a
> bit hard to impress folks with tiny tricks as they daily work with
> various programming languages and concepts.

The majority of the negative comments on this post follow one of two lines 
of attack: 1) D sucks or 2) a glaringly obvious straw man. From most of 
these, it's quite clear the commenters haven't even read the article but 
responded anyway.