Re: Does dmd have SSE intrinsics?

2009-09-23 Thread Christopher Wright

Andrei Alexandrescu wrote:
Yah, but inside "do something interesting" you need to do special casing 
anyway.


Andrei


Sure, but if you're writing a generic library you can punt the problem 
to the user, who may or may not care about the return value at all. As 
is, it's a cost you pay whether you care or not.


Re: Does dmd have SSE intrinsics?

2009-09-23 Thread Michel Fortin
On 2009-09-22 12:32:25 -0400, Andrei Alexandrescu 
 said:



Daniel Keep wrote:

P.S. And another thing while I'm at it: why can't we declare void
variables?  This is another thing that really complicates generic code.


How would you use them?


Here's some generic code that would benefit from void as a variable 
type in the D/Objective-C bridge. Basically, it keeps the result of a 
function call, does some cleaning, and returns the result (with value 
conversions if needed). Unfortunately, you need a separate path for 
functions that return void:


// Call Objective-C code that may raise an exception here.
static if (is(R == void)) func(objcArgs);
else ObjcType!(R) objcResult = func(objcArgs);

_NSRemoveHandler2(&_localHandler);

// Converting return value.
static if (is(R == void)) return;
else return decapsulate!(R)(objcResult);

It could be rewritten in a simpler way if void variables were supported:

// Call Objective-C code that may raise an exception here.
ObjcType!(R) objcResult = func(objcArgs);

_NSRemoveHandler2(&_localHandler);

// Converting return value.
return decapsulate!(R)(objcResult);

Note that returning a void result from a function call already works 
in D. You just can't "store" the result of such a function in a variable.
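A minimal sketch of the already-working case (hypothetical function names, just for illustration):

```d
void g() { /* side effects only */ }

void f()
{
    // Legal today: a void result can be forwarded with a return statement.
    return g();
}
```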


That said, it's not a big hassle in this case, thanks to static if. 
What suffers most is code readability.


--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/



Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Jeremie Pelletier

Andrei Alexandrescu wrote:

Jeremie Pelletier wrote:

Andrei Alexandrescu wrote:

Daniel Keep wrote:

Andrei Alexandrescu wrote:

Daniel Keep wrote:

P.S. And another thing while I'm at it: why can't we declare void
variables?  This is another thing that really complicates generic 
code.

How would you use them?


Andrei


Here's an OLD example:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
alias ReturnType!(Fn) returnT;

static if( is( returnT == void ) )
Fn(args);
else
auto result = Fn(args);

glCheckError();

static if( !is( returnT == void ) )
return result;
}

This function is used to wrap OpenGL calls so that error checking is
performed automatically.  Here's what it would look like if we could use
void variables:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
auto result = Fn(args);

glCheckError();

return result;
}


ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
scope(exit) glCheckError();
return Fn(args);
}

:o)


Andrei


Calling into a frame handler for such a trivial routine, especially if 
used with real-time rendering, is definitely not a good idea, no 
matter how elegant its syntax is!


I guess that's what the smiley was about!

Andrei


I thought it meant "there, problem solved!"

:o)


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Andrei Alexandrescu

Jeremie Pelletier wrote:

Andrei Alexandrescu wrote:

Daniel Keep wrote:

Andrei Alexandrescu wrote:

Daniel Keep wrote:

P.S. And another thing while I'm at it: why can't we declare void
variables?  This is another thing that really complicates generic 
code.

How would you use them?


Andrei


Here's an OLD example:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
alias ReturnType!(Fn) returnT;

static if( is( returnT == void ) )
Fn(args);
else
auto result = Fn(args);

glCheckError();

static if( !is( returnT == void ) )
return result;
}

This function is used to wrap OpenGL calls so that error checking is
performed automatically.  Here's what it would look like if we could use
void variables:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
auto result = Fn(args);

glCheckError();

return result;
}


ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
scope(exit) glCheckError();
return Fn(args);
}

:o)


Andrei


Calling into a frame handler for such a trivial routine, especially if 
used with real-time rendering, is definitely not a good idea, no matter 
how elegant its syntax is!


I guess that's what the smiley was about!

Andrei


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Jeremie Pelletier

Andrei Alexandrescu wrote:

Daniel Keep wrote:

Andrei Alexandrescu wrote:

Daniel Keep wrote:

P.S. And another thing while I'm at it: why can't we declare void
variables?  This is another thing that really complicates generic code.

How would you use them?


Andrei


Here's an OLD example:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
alias ReturnType!(Fn) returnT;

static if( is( returnT == void ) )
Fn(args);
else
auto result = Fn(args);

glCheckError();

static if( !is( returnT == void ) )
return result;
}

This function is used to wrap OpenGL calls so that error checking is
performed automatically.  Here's what it would look like if we could use
void variables:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
auto result = Fn(args);

glCheckError();

return result;
}


ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
scope(exit) glCheckError();
return Fn(args);
}

:o)


Andrei


Calling into a frame handler for such a trivial routine, especially if 
used with real-time rendering, is definitely not a good idea, no matter 
how elegant its syntax is!


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Andrei Alexandrescu

Daniel Keep wrote:

Andrei Alexandrescu wrote:

Daniel Keep wrote:

P.S. And another thing while I'm at it: why can't we declare void
variables?  This is another thing that really complicates generic code.

How would you use them?


Andrei


Here's an OLD example:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
alias ReturnType!(Fn) returnT;

static if( is( returnT == void ) )
Fn(args);
else
auto result = Fn(args);

glCheckError();

static if( !is( returnT == void ) )
return result;
}

This function is used to wrap OpenGL calls so that error checking is
performed automatically.  Here's what it would look like if we could use
void variables:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
auto result = Fn(args);

glCheckError();

return result;
}


ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
scope(exit) glCheckError();
return Fn(args);
}

:o)


Andrei


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Robert Jacques
On Tue, 22 Sep 2009 19:40:03 -0400, Jeremie Pelletier   
wrote:



Christopher Wright wrote:

Jeremie Pelletier wrote:
Why would you declare void variables? The point of declaring typed  
variables is to know what kind of storage to use, void means no  
storage at all. The only time I use void in variable types is for  
void* and void[] (which really is just a void* with a length).


In fact, every single scope has an infinity of void variables, you  
just don't need to explicitly declare them :)


'void foo;' is the same semantically as ''.
 It simplifies generic code a fair bit. Let's say you want to intercept  
a method call transparently -- maybe wrap it in a database transaction,  
for instance. I do similar things in dmocks.

 Anyway, you need to store the return value. You could write:
 ReturnType!(func) func(ParameterTupleOf!(func) params)
{
auto result = innerObj.func(params);
// do something interesting
return result;
}
 Except then you get the error: voids have no value
 So instead you need to do some amount of special casing, perhaps quite  
a lot if you have to do something with the function result.


I don't get how void could be used to simplify generic code. You can  
already use type unions and variants for that and if you need a single  
more generic type you can always use void* to point to the data.


Besides, in your above example, suppose the interesting thing it's doing  
is to modify the result data; how would the compiler know how to modify  
void? It would just push back the error to the next statement.


Why don't you just replace ReturnType!func by auto and let the compiler  
resolve the return type to void?


Because auto returns suffer from forward-referencing problems:

//Bad
auto x = bar;
auto bar() { return foo; }
auto foo() { return 1.0; }

//Okay
auto foo() { return 1.0; }
auto bar() { return foo; }
auto x = bar;



Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Jeremie Pelletier

Daniel Keep wrote:


Jeremie Pelletier wrote:

I don't get how void could be used to simplify generic code. You can
already use type unions and variants for that and if you need a single
more generic type you can always use void* to point to the data.


You can't take the address of a return value.  I'm not even sure you
could define a union type that would function generically without
specialising on void anyway.

And using a Variant is just ridiculous; it's adding runtime overhead
that is completely unnecessary.


Besides, in your above example, suppose the interesting thing it's doing
is to modify the result data; how would the compiler know how to modify
void? It would just push back the error to the next statement.


Example from actual code:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
alias ReturnType!(Fn) returnT;

static if( is( returnT == void ) )
Fn(args);
else
auto result = Fn(args);

glCheckError();

static if( !is( returnT == void ) )
return result;
}

This function is used to wrap OpenGL calls so that error checking is
performed automatically.  Here's what it would look like if we could use
void variables:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
auto result = Fn(args);

glCheckError();

return result;
}

I don't CARE about the result.  If I did, I wouldn't be allowing voids
at all, or I would be special-casing on it anyway and it wouldn't be an
issue.

The point is that there is NO WAY in a generic function to NOT care what
the return type is.  You have to, even if it ultimately doesn't matter.


Why don't you just replace ReturnType!func by auto and let the compiler
resolve the return type to void?


Well, there's this thing called "D1".  Quite a few people use it.

Especially since D2 isn't finished yet.


Oops, sorry! I tend to forget the semantics and syntax of D1; I haven't 
used it since I first found out about D2!


I would have to agree that you do make a good point here, void values 
could be useful in such a case, so long as the value is only assigned by 
method calls and not modified locally.


Basically in your example, auto result would just mean "use no storage 
and ignore return statements on result if auto resolves to void, but 
keep the value around until I return result if auto resolves to any 
other type".


Jeremie


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Daniel Keep


Jeremie Pelletier wrote:
> I don't get how void could be used to simplify generic code. You can
> already use type unions and variants for that and if you need a single
> more generic type you can always use void* to point to the data.

You can't take the address of a return value.  I'm not even sure you
could define a union type that would function generically without
specialising on void anyway.

And using a Variant is just ridiculous; it's adding runtime overhead
that is completely unnecessary.

> Besides, in your above example, suppose the interesting thing it's doing
> is to modify the result data; how would the compiler know how to modify
> void? It would just push back the error to the next statement.

Example from actual code:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
alias ReturnType!(Fn) returnT;

static if( is( returnT == void ) )
Fn(args);
else
auto result = Fn(args);

glCheckError();

static if( !is( returnT == void ) )
return result;
}

This function is used to wrap OpenGL calls so that error checking is
performed automatically.  Here's what it would look like if we could use
void variables:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
auto result = Fn(args);

glCheckError();

return result;
}

I don't CARE about the result.  If I did, I wouldn't be allowing voids
at all, or I would be special-casing on it anyway and it wouldn't be an
issue.

The point is that there is NO WAY in a generic function to NOT care what
the return type is.  You have to, even if it ultimately doesn't matter.

> Why don't you just replace ReturnType!func by auto and let the compiler
> resolve the return type to void?

Well, there's this thing called "D1".  Quite a few people use it.

Especially since D2 isn't finished yet.


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Daniel Keep

Andrei Alexandrescu wrote:
> Daniel Keep wrote:
>> P.S. And another thing while I'm at it: why can't we declare void
>> variables?  This is another thing that really complicates generic code.
> 
> How would you use them?
> 
> 
> Andrei

Here's an OLD example:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
alias ReturnType!(Fn) returnT;

static if( is( returnT == void ) )
Fn(args);
else
auto result = Fn(args);

glCheckError();

static if( !is( returnT == void ) )
return result;
}

This function is used to wrap OpenGL calls so that error checking is
performed automatically.  Here's what it would look like if we could use
void variables:

ReturnType!(Fn) glCheck(alias Fn)(ParameterTypeTuple!(Fn) args)
{
auto result = Fn(args);

glCheckError();

return result;
}


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread bearophile
Robert Jacques:

> Well, fixed length arrays are an implicit/explicit pointer to some  
> (stack/heap) allocated memory. So returning a fixed length array usually  
> means returning a pointer to now invalid stack memory. Allowing  
> fixed-length arrays to be returned by value would be nice, but basically  
> means the compiler is wrapping the array in a struct, which is easy enough  
> to do yourself. Using wrappers also avoids breaking the logical  
> semantics of arrays (i.e. pass by reference).
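The "wrap it in a struct yourself" workaround quoted above can be sketched like this (Vec4 and makeVec are hypothetical names, not from any library):

```d
// Hypothetical wrapper type holding the fixed-length array.
struct Vec4
{
    float[4] data;
}

// Returning the struct copies the array out by value,
// so no pointer to dead stack memory escapes the function.
Vec4 makeVec(float x)
{
    Vec4 v;
    v.data[] = x; // array-wise fill of all four elements
    return v;
}
```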

As usual this discussion is developing in other directions that are both 
interesting and borderline too complex for me :-)

Arrays are the most common and useful data structure (besides single 
values/variables). And experience shows me that in some situations static 
arrays can lead to higher performance (for example, if you have a matrix whose 
number of columns is known at compile time and is a power of 2, 
then the compiler can use just a shift to find a cell).
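A tiny sketch of that indexing (D2 syntax; COLS is a hypothetical compile-time constant):

```d
enum COLS = 8; // hypothetical column count, a power of 2

float cell(const(float)[] data, size_t r, size_t c)
{
    // COLS is known at compile time, so the compiler can lower
    // the multiply r * COLS to the shift r << 3.
    return data[r * COLS + c];
}
```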

So I'd like to see D's management of such arrays improved (for me it's a 
MUCH more common problem than, for example, the contravariant argument types 
last discussed by Andrei. I am for improving simple things that I can understand and 
use every day first, and complex things later. D2 is getting too difficult 
for me), even if some extra annotations are necessary.

The possible ways that could be useful:
- To return small arrays (for example the ones used by SSE/AVX registers) by 
value. No need to create silly wrapper structs. The compiler could show a 
performance warning when such an array is bigger than 1024 bytes of RAM.
- LLVM has good stack-allocated (alloca) arrays, like the ones introduced by 
C99. Having a way to use them in D too would be good.
- A way to return just the reference to a dynamic array when the function 
already takes that reference as input.
- To automatically allocate and copy returned static arrays on the heap, to 
keep the situation safe and avoid too many copies of large arrays (so it gets 
copied only once here). I'm not sure about this.

Bye,
bearophile


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Robert Jacques
On Tue, 22 Sep 2009 19:06:22 -0400, Christopher Wright  
 wrote:

Robert Jacques wrote:
On Tue, 22 Sep 2009 07:09:09 -0400, bearophile  
 wrote:

Robert Jacques:

[snip]
Also, another issue for game/graphic/robotic programmers is the ability to 
return fixed-length arrays from functions. Though struct wrappers 
mitigate this.


Why doesn't D allow returning fixed-sized arrays from functions? It's  
a basic feature that I can find useful in many situations; it looks  
more useful than most of the latest features implemented in D2.


Bye,
bearophile
 Well, fixed length arrays are an implicit/explicit pointer to some  
(stack/heap) allocated memory. So returning a fixed length array  
usually means returning a pointer to now invalid stack memory. Allowing  
fixed-length arrays to be returned by value would be nice, but  
basically means the compiler is wrapping the array in a struct, which  
is easy enough to do yourself. Using wrappers also avoids breaking  
the logical semantics of arrays (i.e. pass by reference).


You could ease the restriction by disallowing implicit conversion from  
static to dynamic arrays in certain situations. A function returning a  
dynamic array cannot return a static array; you cannot assign the return  
value of a function returning a static array to a dynamic array.


Or in those cases, put the static array on the heap.


I'm not sure what you're referencing.


A function returning a dynamic array cannot return a static array;

This is already true; you have to .dup the array to return it.

you cannot assign the return value of a function returning a static  
array to a dynamic array.
This is already sorta true; once the return value is assigned to a  
static array, it may then be implicitly cast to dynamic.


Neither of which help the situation.



Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Andrei Alexandrescu

Christopher Wright wrote:

Jeremie Pelletier wrote:
Why would you declare void variables? The point of declaring typed 
variables is to know what kind of storage to use, void means no 
storage at all. The only time I use void in variable types is for 
void* and void[] (which really is just a void* with a length).


In fact, every single scope has an infinity of void variables, you 
just don't need to explicitly declare them :)


'void foo;' is the same semantically as ''.


It simplifies generic code a fair bit. Let's say you want to intercept a 
method call transparently -- maybe wrap it in a database transaction, 
for instance. I do similar things in dmocks.


Anyway, you need to store the return value. You could write:

ReturnType!(func) func(ParameterTupleOf!(func) params)
{
auto result = innerObj.func(params);
// do something interesting
return result;
}

Except then you get the error: voids have no value

So instead you need to do some amount of special casing, perhaps quite a 
lot if you have to do something with the function result.


Yah, but inside "do something interesting" you need to do special casing 
anyway.


Andrei


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Robert Jacques
On Tue, 22 Sep 2009 18:56:12 -0400, Christopher Wright  
 wrote:



Robert Jacques wrote:
Yes, although classes have hidden vars, which are runtime dependent,  
changing the offset. Structs may be embedded in other things (therefore  
offset). And then there's the whole slicing from an array issue.


Um, no. Field accesses for class variables are (pointer + offset).  
Successive subclasses append their fields to the object, so if you  
sliced an object and changed its vtbl pointer, you could get a valid  
instance of its superclass.


If the class layout weren't determined at compile time, field accesses  
would be as slow as virtual function calls.


Clarification: I meant slicing an array of value types, i.e. if the size  
of the value type isn't a multiple of 16, then the alignment will change  
(e.g. float3[]).


As for classes, yes the compiler knows, but the point is that you don't  
know the size, and therefore the alignment, of your super-class. Worse, it could  
change with different runtimes or OSes. So trying to manually align  
things by introducing spacing vars, etc. is hard, error-prone and  
non-portable.


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Jeremie Pelletier

Christopher Wright wrote:

Jeremie Pelletier wrote:
Why would you declare void variables? The point of declaring typed 
variables is to know what kind of storage to use, void means no 
storage at all. The only time I use void in variable types is for 
void* and void[] (which really is just a void* with a length).


In fact, every single scope has an infinity of void variables, you 
just don't need to explicitly declare them :)


'void foo;' is the same semantically as ''.


It simplifies generic code a fair bit. Let's say you want to intercept a 
method call transparently -- maybe wrap it in a database transaction, 
for instance. I do similar things in dmocks.


Anyway, you need to store the return value. You could write:

ReturnType!(func) func(ParameterTupleOf!(func) params)
{
auto result = innerObj.func(params);
// do something interesting
return result;
}

Except then you get the error: voids have no value

So instead you need to do some amount of special casing, perhaps quite a 
lot if you have to do something with the function result.


I don't get how void could be used to simplify generic code. You can 
already use type unions and variants for that and if you need a single 
more generic type you can always use void* to point to the data.


Besides, in your above example, suppose the interesting thing it's doing 
is to modify the result data; how would the compiler know how to modify 
void? It would just push back the error to the next statement.


Why don't you just replace ReturnType!func by auto and let the compiler 
resolve the return type to void?


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Christopher Wright

Jeremie Pelletier wrote:
Why would you declare void variables? The point of declaring typed 
variables is to know what kind of storage to use, void means no storage 
at all. The only time I use void in variable types is for void* and 
void[] (which really is just a void* with a length).


In fact, every single scope has an infinity of void variables, you just 
don't need to explicitly declare them :)


'void foo;' is the same semantically as ''.


It simplifies generic code a fair bit. Let's say you want to intercept a 
method call transparently -- maybe wrap it in a database transaction, 
for instance. I do similar things in dmocks.


Anyway, you need to store the return value. You could write:

ReturnType!(func) func(ParameterTupleOf!(func) params)
{
auto result = innerObj.func(params);
// do something interesting
return result;
}

Except then you get the error: voids have no value

So instead you need to do some amount of special casing, perhaps quite a 
lot if you have to do something with the function result.


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Christopher Wright

Robert Jacques wrote:
On Tue, 22 Sep 2009 07:09:09 -0400, bearophile 
 wrote:

Robert Jacques:

[snip]
Also, another issue for game/graphic/robotic programmers is the ability to 
return fixed-length arrays from functions. Though struct wrappers 
mitigate this.


Why doesn't D allow returning fixed-sized arrays from functions? It's 
a basic feature that I can find useful in many situations; it looks 
more useful than most of the latest features implemented in D2.


Bye,
bearophile


Well, fixed length arrays are an implicit/explicit pointer to some 
(stack/heap) allocated memory. So returning a fixed length array usually 
means returning a pointer to now invalid stack memory. Allowing 
fixed-length arrays to be returned by value would be nice, but basically 
means the compiler is wrapping the array in a struct, which is easy 
enough to do yourself. Using wrappers also avoids breaking the 
logical semantics of arrays (i.e. pass by reference).


You could ease the restriction by disallowing implicit conversion from 
static to dynamic arrays in certain situations. A function returning a 
dynamic array cannot return a static array; you cannot assign the return 
value of a function returning a static array to a dynamic array.


Or in those cases, put the static array on the heap.


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Christopher Wright

Robert Jacques wrote:
Yes, although classes have hidden vars, which are runtime dependent, 
changing the offset. Structs may be embedded in other things (therefore 
offset). And then there's the whole slicing from an array issue.


Um, no. Field accesses for class variables are (pointer + offset). 
Successive subclasses append their fields to the object, so if you 
sliced an object and changed its vtbl pointer, you could get a valid 
instance of its superclass.


If the class layout weren't determined at compile time, field accesses 
would be as slow as virtual function calls.


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Andrei Alexandrescu

grauzone wrote:

Robert Jacques wrote:
On Tue, 22 Sep 2009 12:32:25 -0400, Andrei Alexandrescu 
 wrote:

Daniel Keep wrote:

[snip]

 The problem is that currently you have a class of types which can be
passed as arguments but cannot be returned.
 For example, Tango's Variant has this horrible hack where the ACTUAL
definition of Variant.get is:
 returnT!(S) get(S)();
 where you have:
 template returnT(T)
{
static if( isStaticArrayType!(T) )
alias typeof(T.dup) returnT;
else
alias T returnT;
}
I can't recall the number of times this stupid hole in the language has 
bitten me.  As for safety concerns, it's really no different to allowing 
people to return delegates.  Not a very good reason, but I *REALLY* hate 
having to special-case static arrays.


Yah, same in std.variant. I think there it's called 
DecayStaticToDynamicArray!T. Has someone added the correct handling 
of static arrays to bugzilla? Walter wants to implement it, but we 
want to make sure it's not forgotten.


Well, what is the correct handling? Struct style RVO or delegate 
auto-magical heap allocation? Something else?


Both solutions are far from perfect.
RVO breaks the reference semantics of arrays, though it works for many 
common cases and is high performance. This would be my choice, as I 
would like to efficiently return short vectors from functions.
Delegate style heap allocation runs into the whole 
I'd-rather-be-safe-than-sorry issue of excessive heap allocations. I'd 
imagine this would be better for generic code, since it would always 
work.


I think static arrays should be value types. Then this isn't a problem 
anymore, and returning a static array can be handled exactly like 
returning structs.


Didn't Walter once say that a type shouldn't behave differently if it's 
wrapped in a struct? With current static array semantics, this rule is 
violated. Whether a static array has reference or value semantics kind of 
depends on whether it's inside a struct: if you copy a struct, the embedded 
static array obviously loses its reference semantics.


Yah.

Also, I second that it should be possible to declare void variables. 
It'd be really useful for doing return value handling when transparently 
wrapping delegate calls in generic code.


I think that already works.


Andrei


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread grauzone

Robert Jacques wrote:
On Tue, 22 Sep 2009 12:32:25 -0400, Andrei Alexandrescu 
 wrote:

Daniel Keep wrote:

[snip]

 The problem is that currently you have a class of types which can be
passed as arguments but cannot be returned.
 For example, Tango's Variant has this horrible hack where the ACTUAL
definition of Variant.get is:
 returnT!(S) get(S)();
 where you have:
 template returnT(T)
{
static if( isStaticArrayType!(T) )
alias typeof(T.dup) returnT;
else
alias T returnT;
}
 I can't recall the number of times this stupid hole in the language has
bitten me.  As for safety concerns, it's really no different to allowing
people to return delegates.  Not a very good reason, but I *REALLY* hate
having to special-case static arrays.


Yah, same in std.variant. I think there it's called 
DecayStaticToDynamicArray!T. Has someone added the correct handling of 
static arrays to bugzilla? Walter wants to implement it, but we want 
to make sure it's not forgotten.


Well, what is the correct handling? Struct style RVO or delegate 
auto-magical heap allocation? Something else?


Both solutions are far from perfect.
RVO breaks the reference semantics of arrays, though it works for many 
common cases and is high performance. This would be my choice, as I 
would like to efficiently return short vectors from functions.
Delegate style heap allocation runs into the whole 
I'd-rather-be-safe-than-sorry issue of excessive heap allocations. I'd 
imagine this would be better for generic code, since it would always work.


I think static arrays should be value types. Then this isn't a problem 
anymore, and returning a static array can be handled exactly like 
returning structs.


Didn't Walter once say that a type shouldn't behave differently if it's 
wrapped in a struct? With current static array semantics, this rule is 
violated. Whether a static array has reference or value semantics kind of 
depends on whether it's inside a struct: if you copy a struct, the embedded 
static array obviously loses its reference semantics.


Also, I second that it should be possible to declare void variables. 
It'd be really useful for doing return value handling when transparently 
wrapping delegate calls in generic code.


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Jeremie Pelletier

Robert Jacques wrote:
On Tue, 22 Sep 2009 12:09:23 -0400, Jeremie Pelletier 
 wrote:



#ponce wrote:

In practice it's about an 8X speed difference!

On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
On i7, movups on aligned data is the same speed as movaps. It's 
still slower if it's an unaligned access.


It all depends on how important you think performance on Core2 and 
earlier Intel processors is.
I wasn't aware of that, and here I was wondering why my SSE code was 
slower than the FPU in certain places on my Core2 quad; I now recall 
using a lot of movups instructions. Thanks for the tip.

 Indeed SSE is known to be overkill when dealing with unaligned data.
In C++ writing SSE code is so painful you either have to use 
intrinsics, or use libraries like Eigen (a SIMD vectorization library 
based on expression templates, which can generate SSE, AVX or FPU 
code). But using such a library is often way too intrusive, and 
alignment is not in standard C++.
 D already understands array operations like Eigen does, in order 
to increase cacheability. It would be great if it could statically 
detect 16-byte aligned data and perform SSE when possible (though 
there must be many other things to do :) ).


The D memory manager already aligns data on 16 bytes boundaries. The 
only case I can think of right now is when data is in a struct or class:


struct {
float[4] vec1; // aligned!
int a;
float[4] vec2; // unaligned!
}


Yes, although classes have hidden vars, which are runtime dependent, 
changing the offset. Structs may be embedded in other things (therefore 
offset). And then there's the whole slicing from an array issue.


Ah yes, you are right. Then I guess it really is up to the programmer to 
know whether the data is aligned and to select different code paths 
accordingly. Adding checks at runtime just adds to the overhead we're 
trying to save by using SSE in the first place.
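For reference, such a runtime check is trivial to write (hypothetical helper name); the cost is the extra branch per call, as noted above:

```d
// True when ptr sits on a 16-byte boundary.
bool isAligned16(const(void)* ptr)
{
    return (cast(size_t) ptr & 15) == 0;
}
```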


It would be great if we could declare aliases to asm instructions and 
use template functions with a (bool aligned = true) parameter, setting a 
movps alias to either movaps or movups depending on the value of aligned.
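A hedged sketch of that idea using static if around DMD-style x86 inline asm (hypothetical helper, 32-bit register usage assumed; not tested against any particular compiler version):

```d
// Copy four floats, choosing the aligned or unaligned move at compile time.
void copy4(bool aligned = true)(float* dst, const(float)* src)
{
    static if (aligned)
    {
        asm
        {
            mov EAX, src;
            movaps XMM0, [EAX]; // requires 16-byte alignment
            mov EAX, dst;
            movaps [EAX], XMM0;
        }
    }
    else
    {
        asm
        {
            mov EAX, src;
            movups XMM0, [EAX]; // works on any alignment
            mov EAX, dst;
            movups [EAX], XMM0;
        }
    }
}
```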


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread bearophile
Jeremie Pelletier:

> The D memory manager already aligns data on 16 bytes boundaries. The 
> only case I can think of right now is when data is in a struct or class:

LDC doesn't align to 16 the normal arrays inside functions:
A small test program:

void main() {
    float[4] a = [1.0f, 2.0, 3.0, 4.0];
    float[4] b, c;
    b[] = 10.0f;
    c[] = a[] + b[];
}


The .ll code (the LLVM IR) that LDC produces; this is the head:
ldc -O3 -inline -release -output-ll vect1.d

define x86_stdcallcc i32 @_Dmain(%"char[][]" %unnamed) {
entry:
  %a = alloca [4 x float], align 4; <[4 x float]*> [#uses=5]
  %b = alloca [4 x float], align 4; <[4 x float]*> [#uses=4]
  %c = alloca [4 x float], align 4; <[4 x float]*> [#uses=4]
  %.gc_mem = call noalias i8* @_d_newarrayvT(%object.TypeInfo* 
@_D11TypeInfo_Af6__initZ, i32 4) ; <i8*> [#uses=5]
[...]


The asm it produces for the whole main (the call to the array op is inlined, 
while _d_array_init_float is not inlined, I don't know why):
ldc -O3 -inline -release -output-s vect1.d

_Dmain:
    pushl   %esi
    subl    $64, %esp
    movl    $4, 4(%esp)
    movl    $_D11TypeInfo_Af6__initZ, (%esp)
    call    _d_newarrayvT
    movl    $1065353216, (%eax)
    movl    $1073741824, 4(%eax)
    movl    $1077936128, 8(%eax)
    movl    $1082130432, 12(%eax)
    movl    8(%eax), %ecx
    movl    %ecx, 56(%esp)
    movl    4(%eax), %ecx
    movl    %ecx, 52(%esp)
    movl    (%eax), %eax
    movl    %eax, 48(%esp)
    movl    $1082130432, 60(%esp)
    leal    32(%esp), %esi
    movl    %esi, (%esp)
    movl    $2143289344, 8(%esp)
    movl    $4, 4(%esp)
    call    _d_array_init_float
    leal    16(%esp), %eax
    movl    %eax, (%esp)
    movl    $2143289344, 8(%esp)
    movl    $4, 4(%esp)
    call    _d_array_init_float
    movl    %esi, (%esp)
    movl    $1092616192, 8(%esp)
    movl    $4, 4(%esp)
    call    _d_array_init_float
    movss   48(%esp), %xmm0
    addss   32(%esp), %xmm0
    movss   %xmm0, 16(%esp)
    movss   52(%esp), %xmm0
    addss   36(%esp), %xmm0
    movss   %xmm0, 20(%esp)
    movss   56(%esp), %xmm0
    addss   40(%esp), %xmm0
    movss   %xmm0, 24(%esp)
    movss   60(%esp), %xmm0
    addss   44(%esp), %xmm0
    movss   %xmm0, 28(%esp)
    xorl    %eax, %eax
    addl    $64, %esp
    popl    %esi
    ret     $8


By the way, using Link-Time Optimization and internalization, LDC produces 
this LL (whole main):

define x86_stdcallcc i32 @_Dmain(%"char[][]" %unnamed) {
entry:
  %b = alloca [4 x float], align 4; <[4 x float]*> [#uses=1]
  %c = alloca [4 x float], align 4; <[4 x float]*> [#uses=1]
  %.gc_mem = call noalias i8* @_d_newarrayvT(%object.TypeInfo* 
@_D11TypeInfo_Af6__initZ, i32 4) ; <i8*> [#uses=4]
  %.gc_mem1 = bitcast i8* %.gc_mem to float*  ; <float*> [#uses=1]
  store float 1.00e+00, float* %.gc_mem1
  %tmp3 = getelementptr i8* %.gc_mem, i32 4   ; <i8*> [#uses=1]
  %0 = bitcast i8* %tmp3 to float*            ; <float*> [#uses=1]
  store float 2.00e+00, float* %0
  %tmp4 = getelementptr i8* %.gc_mem, i32 8   ; <i8*> [#uses=1]
  %1 = bitcast i8* %tmp4 to float*            ; <float*> [#uses=1]
  store float 3.00e+00, float* %1
  %tmp5 = getelementptr i8* %.gc_mem, i32 12  ; <i8*> [#uses=1]
  %2 = bitcast i8* %tmp5 to float*            ; <float*> [#uses=1]
  store float 4.00e+00, float* %2
  %tmp8 = getelementptr [4 x float]* %b, i32 0, i32 0 ; <float*> [#uses=2]
  call void @_d_array_init_float(float* nocapture %tmp8, i32 4, float 
0x7FF8)
  %tmp9 = getelementptr [4 x float]* %c, i32 0, i32 0 ; <float*> [#uses=1]
  call void @_d_array_init_float(float* nocapture %tmp9, i32 4, float 
0x7FF8)
  call void @_d_array_init_float(float* nocapture %tmp8, i32 4, float 
1.00e+01)
  ret i32 0
}


Bye,
bearophile


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Robert Jacques
On Tue, 22 Sep 2009 12:32:25 -0400, Andrei Alexandrescu  
 wrote:

Daniel Keep wrote:

[snip]

 The problem is that currently you have a class of types which can be
passed as arguments but cannot be returned.
 For example, Tango's Variant has this horrible hack where the ACTUAL
definition of Variant.get is:
 returnT!(S) get(S)();
 where you have:
 template returnT(T)
{
static if( isStaticArrayType!(T) )
alias typeof(T.dup) returnT;
else
alias T returnT;
}
 I can't recall the number of times this stupid hole in the language has
bitten me.  As for safety concerns, it's really no different to allowing
people to return delegates.  Not a very good reason, but I *REALLY* hate
having to special-case static arrays.


Yah, same in std.variant. I think there it's called  
DecayStaticToDynamicArray!T. Has someone added the correct handling of  
static arrays to bugzilla? Walter wants to implement it, but we want to  
make sure it's not forgotten.


Well, what is the correct handling? Struct style RVO or delegate  
auto-magical heap allocation? Something else?


Both solutions are far from perfect.
RVO breaks the reference semantics of arrays, though it works for many  
common cases and is high performance. This would be my choice, as I would  
like to efficiently return short vectors from functions.
Delegate style heap allocation runs into the whole  
I'd-rather-be-safe-than-sorry issue of excessive heap allocations. I'd  
imagine this would be better for generic code, since it would always work.







Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Robert Jacques
On Tue, 22 Sep 2009 12:09:23 -0400, Jeremie Pelletier   
wrote:



#ponce wrote:

In practice it's about an 8X speed difference!

On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
On i7, movups on aligned data is the same speed as movaps. It's still  
slower if it's an unaligned access.


It all depends on how important you think performance on Core2 and  
earlier Intel processors is.
I wasn't aware of that, and here I was wondering why my SSE code was  
slower than the FPU in certain places on my core2 quad, I now recall  
using a lot of movups instructions, thanks for the tip.

 Indeed SSE is known to be overkill when dealing with unaligned data.
In C++ writing SSE code is so painful you either have to use intrinsics,  
or use libraries like Eigen (a SIMD vectorization library based on  
expression templates, which can generate SSE, AVX or FPU code). But  
using such a library is often way too intrusive, and alignment is not  
in standard C++.
 D already understands array operations, like Eigen does, in order to  
increase cacheability. It would be great if it could statically detect  
16-byte aligned data and use SSE when possible (though there must  
be many other things to do :) ).


The D memory manager already aligns data on 16 bytes boundaries. The  
only case I can think of right now is when data is in a struct or class:


struct {
    float[4] vec1; // aligned!
    int a;
    float[4] vec2; // unaligned!
}


Yes, although classes have hidden vars, which are runtime dependent,  
changing the offset. Structs may be embedded in other things (and are  
therefore at an offset). And then there's the whole slicing-from-an-array issue.


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Andrei Alexandrescu

Daniel Keep wrote:


Robert Jacques wrote:

On Tue, 22 Sep 2009 07:09:09 -0400, bearophile
 wrote:

Robert Jacques:

[snip]

Also, another issue for game/graphic/robotic programmers is the
ability to
return fixed length arrays from functions. Though struct wrappers
mitigates this.

Why doesn't D allow to return fixed-sized arrays from functions? It's
a basic feature that I can find useful in many situations, it looks
more useful than most of the last features implemented in D2.

Bye,
bearophile

Well, fixed length arrays are an implicit/explicit pointer to some
(stack/heap) allocated memory. So returning a fixed length array usually
means returning a pointer to now invalid stack memory. Allowing
fixed-length arrays to be returned by value would be nice, but basically
means the compiler is wrapping the array in a struct, which is easy
enough to do yourself. Using wrappers also avoids the breaking the
logical semantics of arrays (i.e. pass by reference).


The problem is that currently you have a class of types which can be
passed as arguments but cannot be returned.

For example, Tango's Variant has this horrible hack where the ACTUAL
definition of Variant.get is:

returnT!(S) get(S)();

where you have:

template returnT(T)
{
static if( isStaticArrayType!(T) )
alias typeof(T.dup) returnT;
else
alias T returnT;
}

I can't recall the number of times this stupid hole in the language has
bitten me.  As for safety concerns, it's really no different to allowing
people to return delegates.  Not a very good reason, but I *REALLY* hate
having to special-case static arrays.


Yah, same in std.variant. I think there it's called 
DecayStaticToDynamicArray!T. Has someone added the correct handling of 
static arrays to bugzilla? Walter wants to implement it, but we want to 
make sure it's not forgotten.



P.S. And another thing while I'm at it: why can't we declare void
variables?  This is another thing that really complicates generic code.


How would you use them?


Andrei


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Lutger
Jeremie Pelletier wrote:

...
> Why would you declare void variables? The point of declaring typed
> variables is to know what kind of storage to use, void means no storage
> at all. The only time I use void in variable types is for void* and
> void[] (which really is just a void* with a length).
> 
> In fact, every single scope has an infinity of void variables, you just
> don't need to explicitly declare them :)
> 
> 'void foo;' is the same semantically as ''.

Exactly: thus 'return foo;' in generic code can mean 'return;' when foo is 
of type void. This is similar to how 'return foo();' is already allowed when 
foo itself returns void.


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Jeremie Pelletier

#ponce wrote:

In practice it's about an 8X speed difference!

On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
On i7, movups on aligned data is the same speed as movaps. It's still 
slower if it's an unaligned access.


It all depends on how important you think performance on Core2 and 
earlier Intel processors is.
I wasn't aware of that, and here I was wondering why my SSE code was 
slower than the FPU in certain places on my core2 quad, I now recall 
using a lot of movups instructions, thanks for the tip.


Indeed SSE is known to be overkill when dealing with unaligned data.
In C++ writing SSE code is so painful you either have to use intrinsics, or use 
libraries like Eigen (a SIMD vectorization library based on expression 
templates, which can generate SSE, AVX or FPU code). But using such a library 
is often way too intrusive, and alignment is not in standard C++.

D already understands array operations, like Eigen does, in order to increase 
cacheability. It would be great if it could statically detect 16-byte aligned 
data and use SSE when possible (though there must be many other things to 
do :) ).


The D memory manager already aligns data on 16 bytes boundaries. The 
only case I can think of right now is when data is in a struct or class:


struct {
    float[4] vec1; // aligned!
    int a;
    float[4] vec2; // unaligned!
}


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread #ponce
> > In practice it's about an 8X speed difference!
> > 
> > On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
> > On i7, movups on aligned data is the same speed as movaps. It's still 
> > slower if it's an unaligned access.
> > 
> > It all depends on how important you think performance on Core2 and 
> > earlier Intel processors is.
> 
> I wasn't aware of that, and here I was wondering why my SSE code was 
> slower than the FPU in certain places on my core2 quad, I now recall 
> using a lot of movups instructions, thanks for the tip.

Indeed SSE is known to be overkill when dealing with unaligned data.
In C++ writing SSE code is so painful you either have to use intrinsics, or use 
libraries like Eigen (a SIMD vectorization library based on expression 
templates, which can generate SSE, AVX or FPU code). But using such a library 
is often way too intrusive, and alignment is not in standard C++.

D already understands array operations, like Eigen does, in order to increase 
cacheability. It would be great if it could statically detect 16-byte aligned 
data and use SSE when possible (though there must be many other things to 
do :) ).


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Jeremie Pelletier

Don wrote:

bearophile wrote:

Robert Jacques:


Yes, but the unaligned version is slower, even for aligned data.


This is true today, but in future it may become a little less true, 
thanks to improvements in the CPUs.


The problem is that difference today is so extreme. On core2:
 movaps [mem128], xmm0; // aligned,   1 micro-op
 movups [mem128], xmm0; // unaligned, 9 micro-ops, even on aligned data!
In practice it's about an 8X speed difference!

On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
On i7, movups on aligned data is the same speed as movaps. It's still 
slower if it's an unaligned access.


It all depends on how important you think performance on Core2 and 
earlier Intel processors is.


I wasn't aware of that, and here I was wondering why my SSE code was 
slower than the FPU in certain places on my Core2 quad; I now recall 
using a lot of movups instructions. Thanks for the tip.


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Don

bearophile wrote:

Robert Jacques:


Yes, but the unaligned version is slower, even for aligned data.


This is true today, but in future it may become a little less true, thanks to 
improvements in the CPUs.


The problem is that the difference today is so extreme. On core2:
 movaps [mem128], xmm0; // aligned,   1 micro-op
 movups [mem128], xmm0; // unaligned, 9 micro-ops, even on aligned data!
In practice it's about an 8X speed difference!

On AMD K8, it's only 2 vs 5 ops, and on K10 it's 2 vs 3 ops.
On i7, movups on aligned data is the same speed as movaps. It's still 
slower if it's an unaligned access.


It all depends on how important you think performance on Core2 and 
earlier Intel processors is.


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Jeremie Pelletier

Daniel Keep wrote:


Robert Jacques wrote:

On Tue, 22 Sep 2009 07:09:09 -0400, bearophile
 wrote:

Robert Jacques:

[snip]

Also, another issue for game/graphic/robotic programmers is the
ability to
return fixed length arrays from functions. Though struct wrappers
mitigates this.

Why doesn't D allow to return fixed-sized arrays from functions? It's
a basic feature that I can find useful in many situations, it looks
more useful than most of the last features implemented in D2.

Bye,
bearophile

Well, fixed length arrays are an implicit/explicit pointer to some
(stack/heap) allocated memory. So returning a fixed length array usually
means returning a pointer to now invalid stack memory. Allowing
fixed-length arrays to be returned by value would be nice, but basically
means the compiler is wrapping the array in a struct, which is easy
enough to do yourself. Using wrappers also avoids the breaking the
logical semantics of arrays (i.e. pass by reference).


The problem is that currently you have a class of types which can be
passed as arguments but cannot be returned.

For example, Tango's Variant has this horrible hack where the ACTUAL
definition of Variant.get is:

returnT!(S) get(S)();

where you have:

template returnT(T)
{
static if( isStaticArrayType!(T) )
alias typeof(T.dup) returnT;
else
alias T returnT;
}

I can't recall the number of times this stupid hole in the language has
bitten me.  As for safety concerns, it's really no different to allowing
people to return delegates.  Not a very good reason, but I *REALLY* hate
having to special-case static arrays.
P.S. And another thing while I'm at it: why can't we declare void
variables?  This is another thing that really complicates generic code.


Why would you declare void variables? The point of declaring typed 
variables is to know what kind of storage to use, void means no storage 
at all. The only time I use void in variable types is for void* and 
void[] (which really is just a void* with a length).


In fact, every single scope has an infinity of void variables, you just 
don't need to explicitly declare them :)


'void foo;' is the same semantically as ''.


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Daniel Keep


Robert Jacques wrote:
> On Tue, 22 Sep 2009 07:09:09 -0400, bearophile
>  wrote:
>> Robert Jacques:
> [snip]
>>> Also, another issue for game/graphic/robotic programmers is the
>>> ability to
>>> return fixed length arrays from functions. Though struct wrappers
>>> mitigates this.
>>
>> Why doesn't D allow to return fixed-sized arrays from functions? It's
>> a basic feature that I can find useful in many situations, it looks
>> more useful than most of the last features implemented in D2.
>>
>> Bye,
>> bearophile
> 
> Well, fixed length arrays are an implicit/explicit pointer to some
> (stack/heap) allocated memory. So returning a fixed length array usually
> means returning a pointer to now invalid stack memory. Allowing
> fixed-length arrays to be returned by value would be nice, but basically
> means the compiler is wrapping the array in a struct, which is easy
> enough to do yourself. Using wrappers also avoids the breaking the
> logical semantics of arrays (i.e. pass by reference).

The problem is that currently you have a class of types which can be
passed as arguments but cannot be returned.

For example, Tango's Variant has this horrible hack where the ACTUAL
definition of Variant.get is:

returnT!(S) get(S)();

where you have:

template returnT(T)
{
    static if( isStaticArrayType!(T) )
        alias typeof(T.dup) returnT;
    else
        alias T returnT;
}

I can't recall the number of times this stupid hole in the language has
bitten me.  As for safety concerns, it's really no different to allowing
people to return delegates.  Not a very good reason, but I *REALLY* hate
having to special-case static arrays.

P.S. And another thing while I'm at it: why can't we declare void
variables?  This is another thing that really complicates generic code.


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread Robert Jacques
On Tue, 22 Sep 2009 07:09:09 -0400, bearophile   
wrote:

Robert Jacques:

[snip]
Also, another issue for game/graphic/robotic programmers is the ability  
to

return fixed length arrays from functions. Though struct wrappers
mitigates this.


Why doesn't D allow to return fixed-sized arrays from functions? It's a  
basic feature that I can find useful in many situations, it looks more  
useful than most of the last features implemented in D2.


Bye,
bearophile


Well, fixed length arrays are an implicit/explicit pointer to some  
(stack/heap) allocated memory. So returning a fixed length array usually  
means returning a pointer to now invalid stack memory. Allowing  
fixed-length arrays to be returned by value would be nice, but basically  
means the compiler is wrapping the array in a struct, which is easy enough  
to do yourself. Using wrappers also avoids breaking the logical  
semantics of arrays (i.e. pass by reference).


Re: Does dmd have SSE intrinsics?

2009-09-22 Thread bearophile
Robert Jacques:

> Yes, but the unaligned version is slower, even for aligned data.

This is true today, but in the future it may become a little less true, thanks 
to improvements in CPUs.


> Also, another issue for game/graphic/robotic programmers is the ability to  
> return fixed length arrays from functions. Though struct wrappers  
> mitigates this.

Why doesn't D allow returning fixed-sized arrays from functions? It's a basic 
feature that I can find useful in many situations, and it looks more useful 
than most of the latest features implemented in D2.

Bye,
bearophile


Re: Does dmd have SSE intrinsics?

2009-09-21 Thread Robert Jacques
On Mon, 21 Sep 2009 18:32:50 -0400, Jeremie Pelletier   
wrote:



bearophile wrote:

Don:
(1) They don't take advantage of fixed-length arrays. In particular,  
operations on float[4] should be a single SSE instruction (no function  
call, no loop, nothing). This will make a huge difference to game and  
graphics programmers, I believe.

[...]

It's issue (1) which is the killer.

 In my answer I have forgotten to say another small thing.
 The std.gc.malloc() of D returns pointers aligned to 16 bytes (but I  
may like to add a second argument to such GC malloc, to specify the  
alignment, this can be used to save some memory when the alignment  
isn't necessary), while I think the std.c.stdlib.malloc() doesn't give  
pointers aligned to 16 bytes.
 In the following code if you want to implement the last line with one  
vector instruction then a and b arrays have to be aligned to 16 bytes.  
I think that currently LDC doesn't align a and b to 16 bytes.

 float[4] a = [1.0f, 2.0, 3.0, 4.0];
float[4] b; b[] = 10.0f;
float[4] c; c[] = a[] + b[];
 So you may need a syntax like the following, that's not handy:
 align(16) float[4] a = [1.0f, 2.0, 3.0, 4.0];
align(16) float[4] b; b[] = 10.0f;
align(16) float[4] c; c[] = a[] + b[];
 A possible solution is to automatically align to 16 (by default, but  
it can be changed to save stack space in specific situations) all  
static arrays allocated on the stack too :-)
A note: in future probably CPU vector instructions will relax their  
alignment requirements... it's already happening.

 Bye,
bearophile


That 16bytes alignment is a restriction of the current usage of bit  
fields. Since every bit in the field indexes a single 16bytes block, a  
simple shift 4 bits to the right translate a pointer into its index in  
the bit field. You could align on 4 bytes boundaries but at the cost of  
doubling the size of bit fields, and possibly having slower collection  
runs.


Doesn't SSE have aligned and unaligned versions of its move  
instructions? like MOVAPS and MOVUPS.


Yes, but the unaligned version is slower, even for aligned data.

Also, another issue for game/graphic/robotic programmers is the ability to  
return fixed length arrays from functions. Though struct wrappers  
mitigate this.


Re: Does dmd have SSE intrinsics?

2009-09-21 Thread Jeremie Pelletier

bearophile wrote:

Don:
(1) They don't take advantage of fixed-length arrays. In particular, 
operations on float[4] should be a single SSE instruction (no function 
call, no loop, nothing). This will make a huge difference to game and 
graphics programmers, I believe.

[...]

It's issue (1) which is the killer.


In my answer I have forgotten to say another small thing.

The std.gc.malloc() of D returns pointers aligned to 16 bytes (but I may like 
to add a second argument to such GC malloc, to specify the alignment, this can 
be used to save some memory when the alignment isn't necessary), while I think 
the std.c.stdlib.malloc() doesn't give pointers aligned to 16 bytes.

In the following code if you want to implement the last line with one vector 
instruction then a and b arrays have to be aligned to 16 bytes. I think that 
currently LDC doesn't align a and b to 16 bytes.

float[4] a = [1.0f, 2.0, 3.0, 4.0];
float[4] b; b[] = 10.0f;
float[4] c; c[] = a[] + b[];

So you may need a syntax like the following, that's not handy:

align(16) float[4] a = [1.0f, 2.0, 3.0, 4.0];
align(16) float[4] b; b[] = 10.0f;
align(16) float[4] c; c[] = a[] + b[];

A possible solution is to automatically align to 16 (by default, but it can be 
changed to save stack space in specific situations) all static arrays allocated 
on the stack too :-)
A note: in future probably CPU vector instructions will relax their alignment 
requirements... it's already happening.

Bye,
bearophile


That 16-byte alignment is a restriction of the current usage of bit 
fields. Since every bit in the field indexes a single 16-byte block, a 
simple shift right by 4 bits translates a pointer into its index in 
the bit field. You could align on 4-byte boundaries, but at the cost of 
doubling the size of the bit fields, and possibly having slower collection runs.


Doesn't SSE have aligned and unaligned versions of its move 
instructions, like MOVAPS and MOVUPS?


Re: Does dmd have SSE intrinsics?

2009-09-21 Thread bearophile
Don:
> (1) They don't take advantage of fixed-length arrays. In particular, 
> operations on float[4] should be a single SSE instruction (no function 
> call, no loop, nothing). This will make a huge difference to game and 
> graphics programmers, I believe.
[...]
>It's issue (1) which is the killer.

In my answer I have forgotten to say another small thing.

The std.gc.malloc() of D returns pointers aligned to 16 bytes (but I may like 
to add a second argument to such GC malloc, to specify the alignment, this can 
be used to save some memory when the alignment isn't necessary), while I think 
the std.c.stdlib.malloc() doesn't give pointers aligned to 16 bytes.

In the following code if you want to implement the last line with one vector 
instruction then a and b arrays have to be aligned to 16 bytes. I think that 
currently LDC doesn't align a and b to 16 bytes.

float[4] a = [1.0f, 2.0, 3.0, 4.0];
float[4] b; b[] = 10.0f;
float[4] c; c[] = a[] + b[];

So you may need a syntax like the following, that's not handy:

align(16) float[4] a = [1.0f, 2.0, 3.0, 4.0];
align(16) float[4] b; b[] = 10.0f;
align(16) float[4] c; c[] = a[] + b[];

A possible solution is to automatically align to 16 (by default, but it can be 
changed to save stack space in specific situations) all static arrays allocated 
on the stack too :-)
A note: in future probably CPU vector instructions will relax their alignment 
requirements... it's already happening.

Bye,
bearophile


Re: Does dmd have SSE intrinsics?

2009-09-21 Thread Jeremie Pelletier

Don wrote:

dsimcha wrote:

== Quote from Don (nos...@nospam.com)'s article

Jeremie Pelletier wrote:
While writing SSE assembly by hand in D is fun and works well, I'm 
wondering if the compiler has intrinsics for its instruction set, much 
like xmmintrin.h in C.
The reason is that the compiler can usually reorder the intrinsics 
to optimize performance.
I could always use C code to implement my SSE routines but then I'd 
lose the ability to inline them in D.

I know this is an old post, but since it wasn't answered...
Make sure you know what the SSE intrinsics actually *do* in VC++/Intel!
I've read many complaints about how poorly they perform on all compilers
-- the penalty for allowing them to be reordered is that extra
instructions are often added, which means that straightforward C code is
sometimes faster!
In this regard, I'm personally excited about array operations. I think
the need for SSE intrinsics and vectorisation is a result of abstract
inversion: the instruction set is higher-level than the "high level
language"! Array operations allow D to catch up with asm again. When
array operations get implemented properly, it'll be interesting to see
how much need for SSE intrinsics remains.


What's wrong with the current implementation of array ops (other than 
a few misc. bugs that have already been filed)? I thought they already 
use SSE if available.


(1) They don't take advantage of fixed-length arrays. In particular, 
operations on float[4] should be a single SSE instruction (no function 
call, no loop, nothing). This will make a huge difference to game and 
graphics programmers, I believe.

(2) The operations don't block on cache size.
(3) DMD doesn't allow you to generate code assuming a minimum CPU 
capabilities. (In fact, when generating inline asm, the CPU type is 
8086! (this is in bugzilla)) This limits the possible use of (1).


It's issue (1) which is the killer.



I agree that a -arch switch of some sort would be the best thing to hit 
dmd. It is already most useful in gcc, which supported up to core2 when I 
last used it.


I wrote a linear algebra module with support for 2D, 3D, 4D vectors, 
quaternions, 3x2 and 4x4 matrices, all with template structs so I can 
declare them for float, double, or real components. I used SSE for the 
bigger operations, which grew the module size considerably. This is 
where I first started looking for SSE intrinsics. It would also be 
greatly helpful if the compiler could generate SSE code by itself; it 
would save a LOT of inline assembly for simple operations.


Re: Does dmd have SSE intrinsics?

2009-09-21 Thread Don

dsimcha wrote:

== Quote from Don (nos...@nospam.com)'s article

Jeremie Pelletier wrote:

While writing SSE assembly by hand in D is fun and works well, I'm wondering 
if the compiler has intrinsics for its instruction set, much like xmmintrin.h 
in C.

The reason is that the compiler can usually reorder the intrinsics to optimize 
performance.

I could always use C code to implement my SSE routines but then I'd lose the 
ability to inline them in D.

I know this is an old post, but since it wasn't answered...
Make sure you know what the SSE intrinsics actually *do* in VC++/Intel!
I've read many complaints about how poorly they perform on all compilers
-- the penalty for allowing them to be reordered is that extra
instructions are often added, which means that straightforward C code is
sometimes faster!
In this regard, I'm personally excited about array operations. I think
the need for SSE intrinsics and vectorisation is a result of abstract
inversion: the instruction set is higher-level than the "high level
language"! Array operations allow D to catch up with asm again. When
array operations get implemented properly, it'll be interesting to see
how much need for SSE intrinsics remains.


What's wrong with the current implementation of array ops (other than a few 
misc. bugs that have already been filed)? I thought they already use SSE if 
available.


(1) They don't take advantage of fixed-length arrays. In particular, 
operations on float[4] should be a single SSE instruction (no function 
call, no loop, nothing). This will make a huge difference to game and 
graphics programmers, I believe.

(2) The operations don't block on cache size.
(3) DMD doesn't allow you to generate code assuming a minimum CPU 
capabilities. (In fact, when generating inline asm, the CPU type is 
8086! (this is in bugzilla)) This limits the possible use of (1).


It's issue (1) which is the killer.





Re: Does dmd have SSE intrinsics?

2009-09-21 Thread bearophile
dsimcha:

> What's wrong with the current implementation of array ops (other than a few 
> misc.
> bugs that have already been filed)?  I thought they already use SSE if 
> available.

The idea is to improve array operations so they become a handy way to 
efficiently use present and future (AVX too, 
http://en.wikipedia.org/wiki/Advanced_Vector_Extensions ) vector instructions.

So for example if in my D code I have:
float[4] a = [1.0f, 2.0, 3.0, 4.0];
float[4] b; b[] = 10.0f;
float[4] c; c[] = a[] + b[];

The compiler has to use a single inlined SSE instruction to implement the third 
line (the 4-float sum) of D code, and two instructions to load and broadcast 
the float value 10 to a whole XMM register.

If the D code is:
float[8] a = [1.0f, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0];
float[8] b = [10.0f, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0];
float[8] c; c[] = a[] + b[];
The current vector instructions aren't wide enough to do that in a single 
instruction (but future AVX will be able to), so the compiler has to inline two 
SSE instructions.

Currently such operations are implemented with calls to a function (that also 
tests if/what vector instructions are available), which slows down code if you 
have to sum just 4 floats.

Another problem is that some important semantics are missing, for example some 
shuffling and a few other things. With some care some, most, or all such 
operations (keeping a close eye on AVX too) can be mapped to built-in array 
methods...

The problem here is that you don't want to tie the D language too closely to 
the currently available vector instructions, because in 5-10 years CPUs may 
change. So what you want is to add enough semantics that later the compiler 
can compile as best it can (with scalar instructions, with SSE1, with a future 
1024-bit-wide AVX, or with something today unknown). If the language doesn't 
give enough semantics to the compiler, you are forced to do as GCC does now, 
trying to infer vector operations from normal code, but that's a complex thing 
and usually not as efficient as using GCC SSE intrinsics.

This is something that deserves a thread here :-) In the end implementing all 
this doesn't look hard. It's mostly a matter of designing it well (while 
auto-vectorization as done in GCC is harder to implement).

Bye,
bearophile


Re: Does dmd have SSE intrinsics?

2009-09-21 Thread dsimcha
== Quote from Don (nos...@nospam.com)'s article
> Jeremie Pelletier wrote:
> > While writing SSE assembly by hand in D is fun and works well, I'm
> > wondering if the compiler has intrinsics for its instruction set, much
> > like xmmintrin.h in C.
> >
> > The reason is that the compiler can usually reorder the intrinsics to
> > optimize performance.
> >
> > I could always use C code to implement my SSE routines but then I'd lose
> > the ability to inline them in D.
> I know this is an old post, but since it wasn't answered...
> Make sure you know what the SSE intrinsics actually *do* in VC++/Intel!
> I've read many complaints about how poorly they perform on all compilers
> -- the penalty for allowing them to be reordered is that extra
> instructions are often added, which means that straightforward C code is
> sometimes faster!
> In this regard, I'm personally excited about array operations. I think
> the need for SSE intrinsics and vectorisation is a result of abstract
> inversion: the instruction set is higher-level than the "high level
> language"! Array operations allow D to catch up with asm again. When
> array operations get implemented properly, it'll be interesting to see
> how much need for SSE intrinsics remains.

What's wrong with the current implementation of array ops (other than a few 
misc. bugs that have already been filed)?  I thought they already use SSE if 
available.


Re: Does dmd have SSE intrinsics?

2009-09-21 Thread Don

Jeremie Pelletier wrote:

While writing SSE assembly by hand in D is fun and works well, I'm wondering if 
the compiler has intrinsics for its instruction set, much like xmmintrin.h in C.

The reason is that the compiler can usually reorder the intrinsics to optimize 
performance.

I could always use C code to implement my SSE routines but then I'd lose the 
ability to inline them in D.


I know this is an old post, but since it wasn't answered...

Make sure you know what the SSE intrinsics actually *do* in VC++/Intel! 
I've read many complaints about how poorly they perform on all compilers 
-- the penalty for allowing them to be reordered is that extra 
instructions are often added, which means that straightforward C code is 
sometimes faster!


In this regard, I'm personally excited about array operations. I think 
the need for SSE intrinsics and vectorisation is a result of abstract 
inversion: the instruction set is higher-level than the "high level 
language"! Array operations allow D to catch up with asm again. When 
array operations get implemented properly, it'll be interesting to see 
how much need for SSE intrinsics remains.


Does dmd have SSE intrinsics?

2009-08-26 Thread Jeremie Pelletier
While writing SSE assembly by hand in D is fun and works well, I'm wondering if 
the compiler has intrinsics for its instruction set, much like xmmintrin.h in C.

The reason is that the compiler can usually reorder the intrinsics to optimize 
performance.

I could always use C code to implement my SSE routines but then I'd lose the 
ability to inline them in D.