[Mono-dev] Mono.SIMD supported platforms

2012-04-16 Thread Alexander Mezin
Hi.
I can't find any information on what platforms are currently supported
by Mono.Simd. In particular, is Mono.Simd hardware accelerated on
iPhone and Android?
___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list


Re: [Mono-dev] Mono.SIMD supported platforms

2012-04-16 Thread Rodrigo Kumpera
Only x86 and amd64 are supported.


On Mon, Apr 16, 2012 at 9:22 AM, Alexander Mezin mezin.alexan...@gmail.com wrote:

 Hi.
 I can't find any information on what platforms are currently supported
 by Mono.Simd. In particular, is Mono.Simd hardware accelerated on
 iPhone and Android?


Re: [Mono-dev] mono.simd

2010-09-13 Thread Miguel de Icaza

   This patch is contributed under the MIT license
 
 I don't have push access to the main repository, so please
 commit the patch yourself.

This is an oversight, could I have your GitHub account so I can add you
to the group?



Re: [Mono-dev] mono.simd

2010-09-13 Thread Miguel de Icaza

   This patch is contributed under the MIT license
 
 I don't have push access to the main repository, so please
 commit the patch yourself.

Ah, never mind, found you: robert-j

You are now part of the Mono commit team.



Re: [Mono-dev] mono.simd

2010-09-09 Thread Robert Jordan
Hi Rodrigo,

On 07.09.2010 02:32, Rodrigo Kumpera wrote:
 Robert, can you commit your patch after you state the license of it? Either
 via email
 on MDL or on the commit message.


This patch is contributed under the MIT license

I don't have push access to the main repository, so please
commit the patch yourself.

Thanks,
Robert



Re: [Mono-dev] mono.simd

2010-09-06 Thread Rodrigo Kumpera
Robert, can you commit your patch after you state the license of it? Either
via email
on MDL or on the commit message.




Re: [Mono-dev] mono.simd

2010-08-27 Thread Alan
I think you missed the important part of that last email. He wanted you to
state the license of the patch, then commit it :)

Alan.



Re: [Mono-dev] mono.simd

2010-08-26 Thread Jerry Maine - KF5ADY
Please, I found this bug to be very annoying as it hampers the use of
dynamic languages with Mono.Simd. I found this bug trying to use
Mono.Simd in IronPython.





Re: [Mono-dev] mono.simd

2010-08-25 Thread Jerry Maine - KF5ADY
Well, I tried to find a place that stores system properties, where a key
like mono.simd.accel could be used to look up the available acceleration
capabilities. That could make the code a bit cleaner by avoiding an
internal method call inside the Mono.Simd assembly.

Any ideas where that could be?




Re: [Mono-dev] mono.simd

2010-08-25 Thread Rodrigo Kumpera

On Mon, Aug 23, 2010 at 7:01 PM, Robert Jordan robe...@gmx.net wrote:

 On 23.08.2010 23:13, Rodrigo Kumpera wrote:
 
  I think it's easier to catch the security exception under MS since its
  accel mode is None anyway.
 

 I had to move icall's call site outside the .cctor and mark
 the call site's method as non-inlineable to make this work.
 Thanks for the hint.


 http://github.com/robert-j/mono/commit/0450be20f52c64e2788287400fe1eb9ff9be6817

 Robert


Patch looks good. Please just state the license it's under and you can
commit it.


Re: [Mono-dev] mono.simd

2010-08-23 Thread Robert Jordan
On 23.08.2010 13:16, Robert Jordan wrote:
 On 23.08.2010 04:53, Jerry Maine - KF5ADY wrote:
 I found a discrepancy between Mono.Simd.SimdRuntime.AccelMode and its
 equivalent access by reflection. I believe this is a bug.

 Attached is a test for this. I believe there are more cases of this in
 Mono.Simd

 Any ideas on how to fix this?

 Assuming that you want to fix this in mono: you could implement
 Mono.Simd.SimdRuntime.AccelMode as an icall. This will assure
 that both fast path and slow path would yield the same value.

A patch proposal:

http://github.com/robert-j/mono/commit/1107e83b1de0d65a00b2f62d1e88f275f17797e6

Robert



Re: [Mono-dev] mono.simd

2010-08-23 Thread Jerry Maine
Would the c# portion of the patch work on MS .Net?


Re: [Mono-dev] mono.simd

2010-08-23 Thread Robert Jordan
On 23.08.2010 19:24, Jerry Maine wrote:
 Would the c# portion of the patch work on MS .Net?

Dammit! I thought the icall would be ignored by MS.NET because
I took care of not invoking it in this case. But icalls are not
allowed in assemblies != mscorlib under MS.NET.

Unless I'm misguided, the only solution seems to revolve
around adding a branch to

marshal.c: mono_marshal_get_runtime_invoke ()

Schematic code:

if (method->klass == Mono.Simd.SimdRuntime) {
	need_direct_wrapper = TRUE;
}

The flag will instruct this function to create yet another
wrapper around calls to methods of the Mono.Simd.SimdRuntime
class. This additional wrapper lets the runtime take the
fast path even for reflection calls.

Robert



Re: [Mono-dev] mono.simd

2010-08-23 Thread Rodrigo Kumpera
On Mon, Aug 23, 2010 at 3:29 PM, Robert Jordan robe...@gmx.net wrote:

 On 23.08.2010 19:24, Jerry Maine wrote:
  Would the c# portion of the patch work on MS .Net?

 Dammit! I thought the icall would be ignored by MS.NET because
 I took care of not invoking it in this case. But icalls are not
 allowed in assemblies != mscorlib under MS.NET.

 Unless I'm misguided, the only solution seems to revolve
 around adding a branch to

 marshal.c: mono_marshal_get_runtime_invoke ()

 Schematic code:

 if (method->klass == Mono.Simd.SimdRuntime) {
 	need_direct_wrapper = TRUE;
 }

 The flag will instruct this function to create yet another
 wrapper around calls to methods of the Mono.Simd.SimdRuntime
 class. This additional wrapper lets the runtime take the
 fast path even for reflection calls.



I think it's easier to catch the security exception under MS since its
accel mode is None anyway.


Re: [Mono-dev] mono.simd

2010-08-23 Thread Robert Jordan
On 23.08.2010 23:13, Rodrigo Kumpera wrote:

 I think it's easier to catch the security exception under MS since its
 accel mode is None anyway.


I had to move icall's call site outside the .cctor and mark
the call site's method as non-inlineable to make this work.
Thanks for the hint.

http://github.com/robert-j/mono/commit/0450be20f52c64e2788287400fe1eb9ff9be6817

Robert



Re: [Mono-dev] mono.simd

2010-08-23 Thread Jerry Maine
I have an alternate idea that I'd like to research. It may lead to a
cleaner implementation.



[Mono-dev] mono.simd

2010-08-22 Thread Jerry Maine - KF5ADY
I found a discrepancy between Mono.Simd.SimdRuntime.AccelMode and its
equivalent access by reflection. I believe this is a bug.


Attached is a test for this. I believe there are more cases of this in
Mono.Simd.


Any ideas on how to fix this?

using System;
using Mono.Simd;

namespace simd
{
	class SimdReflectionTest
	{
		public static void Main (string[] args)
		{
			// Direct property access.
			AccelMode direct = SimdRuntime.AccelMode;
			// The same property read through reflection.
			AccelMode reflected = (AccelMode) typeof(SimdRuntime).GetProperty("AccelMode").GetValue(null, null);

			Console.WriteLine(direct);
			Console.WriteLine(reflected);

			// Both reads should agree; report when they do not.
			if (direct != reflected) {
				Console.WriteLine("Mismatch between direct and reflected AccelMode");
			}
		}
	}
}



Re: [Mono-dev] Mono.Simd AltiVec port

2010-05-02 Thread Sergei Dyshel
Hello Rodrigo and all,

Returning to my old problem, which deals with the alignment of vector variables.
I noticed that on x86 vector locals are aligned on an 8-byte boundary instead
of a 16-byte one, which forces the use of 'movups' instead of the much more
efficient 'movaps'.
On PowerPC there is no such bug, so I tried to compare their routines for
allocating locals. In 'mini-x86.c', in the function 'mono_arch_allocate_vars',
I discovered this strange (to me) piece of code:

 /*
  * EBP is at alignment 8 % MONO_ARCH_FRAME_ALIGNMENT, so if we
  * have locals larger than 8 bytes we need to make sure that
  * they have the appropriate offset.
  */
 if (MONO_ARCH_FRAME_ALIGNMENT > 8 && locals_stack_align > 8)
         offset += MONO_ARCH_FRAME_ALIGNMENT - sizeof (gpointer) * 2;

AFAIU, the 'if' condition is satisfied when there are vector locals, and in that
case 'offset' is incremented by 16 - 4*2 = 8 bytes, thus spoiling the alignment.
I tried removing these lines and didn't notice anything bad, except that the
alignment got fixed.
Moreover, there are no such lines in 'mini-amd64.c'.

Can somebody explain to me the meaning of this piece?
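
The arithmetic behind that snippet can be checked in isolation. What follows is a minimal C sketch, not Mono's code: it assumes 32-bit x86 (4-byte pointers), a MONO_ARCH_FRAME_ALIGNMENT of 16, locals addressed as EBP - offset, and a stack pointer that is 16-byte aligned just before the call instruction. All helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define FRAME_ALIGNMENT 16u   /* assumed value of MONO_ARCH_FRAME_ALIGNMENT */
#define POINTER_SIZE     4u   /* assumed sizeof (gpointer) on 32-bit x86 */

/* Bias added by the quoted snippet: 16 - 4 * 2 = 8 bytes. */
static uint32_t frame_bias(void)
{
    return FRAME_ALIGNMENT - POINTER_SIZE * 2u;
}

/* Hypothetical helper: EBP after the prologue, given a stack pointer
 * that was frame-aligned just before the call. */
static uint32_t ebp_after_prologue(uint32_t esp_before_call)
{
    assert(esp_before_call % FRAME_ALIGNMENT == 0);
    /* `call` pushes the return address, the prologue pushes the old EBP. */
    return esp_before_call - POINTER_SIZE * 2u;
}

/* Address of a local stored at EBP - offset. */
static uint32_t local_address(uint32_t ebp, uint32_t offset)
{
    return ebp - offset;
}
```

Under these assumptions EBP sits 8 bytes past a 16-byte boundary, so a local at EBP - 16 is misaligned while one at EBP - (8 + 16) is aligned. If the caller's stack is not actually 16-byte aligned, the same bias has the opposite effect, so the sketch only illustrates the comment's reasoning and does not settle which behaviour the JIT shows in practice.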
-- 
Regards,
Sergei Dyshel




Re: [Mono-dev] Mono.Simd AltiVec port

2010-02-11 Thread Rodrigo Kumpera
The way to handle those situations is to have an arch decomposition pass that
converts MULPS into a VZERO + MULADD.
For bonus points, you can extend the arch peephole code to fuse MULPS +
ADDPS.

For an example of that, take a look at mini-x86.c /
mono_arch_decompose_opts.
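
To make the two passes concrete, here is a small C sketch over a toy instruction list. It is not Mono's IR: the `Ins` struct, the `OP_MULADD` and `OP_NOP` opcodes, and both function names are hypothetical, and the fusion step assumes the MULPS result is not used after the ADDPS that consumes it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy opcodes; OP_MULADD computes dreg = sreg1 * sreg2 + sreg3. */
typedef enum { OP_NOP, OP_XZERO, OP_MULPS, OP_ADDPS, OP_MULADD } Opcode;

typedef struct {
    Opcode op;
    int dreg, sreg1, sreg2, sreg3;   /* -1 means "unused" */
} Ins;

/* Peephole: MULPS t = a * b followed directly by ADDPS d = t + c
 * becomes MULADD d = a, b, c. Returns the number of fused pairs. */
static size_t fuse_mul_add(Ins *code, size_t n)
{
    size_t fused = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        Ins *mul = &code[i];
        Ins *add = &code[i + 1];
        int addend;

        if (mul->op != OP_MULPS || add->op != OP_ADDPS)
            continue;
        if (add->sreg1 == mul->dreg && add->sreg2 != mul->dreg)
            addend = add->sreg2;
        else if (add->sreg2 == mul->dreg && add->sreg1 != mul->dreg)
            addend = add->sreg1;
        else
            continue;

        *add = (Ins){ OP_MULADD, add->dreg, mul->sreg1, mul->sreg2, addend };
        mul->op = OP_NOP;
        fused++;
    }
    return fused;
}

/* Decomposition: every remaining MULPS becomes XZERO tmp + MULADD with a
 * fresh virtual register as the zero addend. `out` needs room for 2 * n
 * instructions. Returns the number of instructions written. */
static size_t decompose_mulps(const Ins *in, size_t n, Ins *out, int *next_vreg)
{
    size_t k = 0;
    assert(in != out);
    for (size_t i = 0; i < n; i++) {
        if (in[i].op == OP_MULPS) {
            int tmp = (*next_vreg)++;
            out[k++] = (Ins){ OP_XZERO, tmp, -1, -1, -1 };
            out[k++] = (Ins){ OP_MULADD, in[i].dreg, in[i].sreg1, in[i].sreg2, tmp };
        } else {
            out[k++] = in[i];
        }
    }
    return k;
}
```

Because the zero lives in a fresh virtual register, the rewritten MULADD never overwrites one of its own sources, and the register allocator is left to pick a physical register for the temporary.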

Rodrigo



Re: [Mono-dev] Mono.Simd AltiVec port

2010-02-09 Thread Sergei Dyshel
Hi,
Now I'm stuck with another problem on PPC. For multiplication of floats
AltiVec has only a fused multiply-add instruction, which computes a*b+c. So in
order to implement OP_MULPS I need to ensure c == 0. The only solution which
comes to mind is:
XZERO D
MULADD D = S1, S2, D

Where MULADD is the instruction and D, S1, S2 are ins->dreg, sreg1, sreg2.
But this solution won't work in cases where S1 = D or S2 = D, since D would
be zeroed before use. So 2 possibilities remain:
1) Make sure that D != S1 and D != S2, and then the previously mentioned
solution will work.
2) Allocate an additional (vector) register for MULPS and somehow store it
inside the MonoInst structure.
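
The hazard and the second option can be shown with a tiny simulated register file. This is only a sketch in plain C: the `Vec4` type, the `regs` array and the function names are hypothetical stand-ins, not AltiVec intrinsics or Mono code.

```c
#include <assert.h>
#include <string.h>

typedef struct { float lane[4]; } Vec4;

/* Hypothetical file of eight vector registers. */
static Vec4 regs[8];

static void set_all(int r, float v)
{
    for (int i = 0; i < 4; i++)
        regs[r].lane[i] = v;
}

static void xzero(int d)
{
    memset(&regs[d], 0, sizeof regs[d]);
}

/* d = s1 * s2 + s3, lane by lane, in the style of a fused multiply-add. */
static void muladd(int d, int s1, int s2, int s3)
{
    Vec4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = regs[s1].lane[i] * regs[s2].lane[i] + regs[s3].lane[i];
    regs[d] = r;
}

/* Naive lowering: zero D, then D = S1 * S2 + D.
 * Gives the wrong answer when D is also a source. */
static void mulps_naive(int d, int s1, int s2)
{
    xzero(d);
    muladd(d, s1, s2, d);
}

/* Option 2: zero a scratch register instead, so the sources survive. */
static void mulps_with_temp(int d, int s1, int s2, int tmp)
{
    assert(tmp != s1 && tmp != s2);
    xzero(tmp);
    muladd(d, s1, s2, tmp);
}
```

With register 0 holding 3.0 and register 1 holding 2.0, `mulps_naive(0, 0, 1)` leaves zeros in register 0 because the source was cleared first, while `mulps_with_temp(0, 0, 1, 7)` produces 6.0 in every lane.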

What is the traditional way to do such things? I really need to solve this
problem, any help will be greatly appreciated!

Thanks,
Sergei




Re: [Mono-dev] Mono.Simd AltiVec port

2010-02-03 Thread Rodrigo Kumpera
Hi Sergei,

On Tue, Feb 2, 2010 at 6:59 AM, Sergei Dyshel qyron.priv...@gmail.com wrote:

 Hello all,

 I'm currently working on PowerPC port of Mono which utilizes AltiVec SIMD
 instructions. During the development I've encountered an alignment problem:

 As far as I understood from running Mono's JIT, stack-allocated
 Mono.Simd.Vector* types are always aligned by 16 byte bound, but global
 ones aren't (such as static class members). This is not a problem for SSE
 which has unaligned load/stores but AltiVec doesn't have them. Instead of
 implementing misaligned loads/stores for AltiVec I think it's better to
 force alignment in global variables, as it done in the case of stack.


No, the JIT doesn't align all Vector types to 16 bytes. There are places,
like the spill code, that still don't do it correctly. It's not a lot of
work to get there, but it's still not done.


If by global variables you mean statics, then making them properly aligned
is possible with some trickery.
The only alignment issue we can't currently fix is heap objects, due
to how our GC works.
Our new GC might eventually gain the ability to properly align such objects,
but this is something for the far future.



 Can somebody help me with that (e.g. point at relevant places in
 'mini-ppc.c')?


To fix the alignment of stack variables you need to mess with a bunch of
places:

-The spill code in mini-codegen.c
-The var allocation code in mono_allocate_stack_slots (mini.c)

To fix the static storage alignment you need to change the code that
allocates the statics area to use the proper alignment.

This is the same problem as with objects, as it uses a GC routine to allocate
the memory blob.
Fixing this requires going deep into the GC, which is not simple.
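
One common form of such trickery is to over-allocate and round the start address up to the next 16-byte boundary. The C sketch below shows only that arithmetic: `malloc` is a stand-in for whatever routine allocates the statics area, the `StaticsArea` type and function names are hypothetical, and nothing here models the constraints a garbage collector places on the block.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define VECTOR_ALIGNMENT 16u

/* Rounds addr up to the next multiple of align (align must be a power of two). */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

typedef struct {
    void *base;      /* pointer returned by the allocator; pass this to free() */
    void *aligned;   /* first 16-byte aligned address inside the block */
} StaticsArea;

/* Allocates size usable bytes starting at a 16-byte aligned address. */
static StaticsArea statics_area_alloc(size_t size)
{
    StaticsArea area;
    /* Up to 15 padding bytes may be skipped to reach the boundary. */
    area.base = malloc(size + VECTOR_ALIGNMENT - 1);
    area.aligned = area.base
        ? (void *)align_up((uintptr_t)area.base, VECTOR_ALIGNMENT)
        : NULL;
    return area;
}
```

The cost is at most 15 wasted bytes per block, and the original pointer has to be kept around for freeing, which is part of why the same trick is harder to apply to individual heap objects.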


[Mono-dev] Mono.Simd AltiVec port

2010-02-02 Thread Sergei Dyshel
Hello all,

I'm currently working on a PowerPC port of Mono which utilizes AltiVec SIMD
instructions. During development I've encountered an alignment problem:

As far as I understand from running Mono's JIT, stack-allocated
Mono.Simd.Vector* types are always aligned on a 16-byte boundary, but global
ones (such as static class members) aren't. This is not a problem for SSE,
which has unaligned loads/stores, but AltiVec doesn't have them. Instead of
implementing misaligned loads/stores for AltiVec I think it's better to
force alignment of global variables, as is done for the stack.

Can somebody help me with that (e.g. point at relevant places in
'mini-ppc.c')?

Thanks,
Sergei Dyshel


[Mono-dev] Mono.Simd and Threefish256

2009-10-01 Thread Marcus Griep
As part of my free time, I decided to start down the path to SIMD-ing
some cryptography algorithms. As a starter exercise, I took Threefish256
from the SHA-3 submission Skein. The experience was very enlightening,
and as I haven't been able to find anything of substance out there about
working with Mono.Simd, I thought I'd write some articles about it.

I'm posting my experience to my blog in a 5-part series. The first of
the posts has already been published, and I'll have the rest ready by
the end of the weekend:

http://blog.xpdm.us/2009/10/01/skein-threefish-and-mono-simd-part-1/

Thanks to all the folks who've been keeping the Mono.Simd project going.

--
Marcus Griep




Re: [Mono-dev] Mono.SIMD

2009-02-23 Thread Alan McGovern
Hey,

The big issue you're having is that you haven't implemented a SIMD algorithm
;) I spent 15 mins 'optimising' your code and came up with this. Notice that
I made everything a SIMD operation. There is no scalar code in the method
anymore. This tripled performance as compared to the non-SIMD version. On my
machine:

-FLOAT 00:00:00.3888930 Color
-SIMD   00:00:00.1266820 Mono.Simd.Vector4f

You'd want to double check the result just in case I made a mistake with my
alterations.

Alan.

public static Vector4f GradientSIMD()
{
    Vector4f finv_WH = new Vector4f (1.0f / (w*h), 1.0f / (w*h),
                                     1.0f / (w*h), 1.0f / (w*h));
    Vector4f ret = new Vector4f();

    Vector4f a = new Vector4f(0.0f, 0.0f, 1.0f, 1.0f);
    a += new Vector4f(0.0f, 1.0f, 0.0f, 1.0f);
    a += new Vector4f(1.0f, 0.0f, 0.0f, 1.0f);
    a += new Vector4f(0.5f, 0.5f, 1.0f, 1.0f);

    //Process operator
    Vector4f yVec = new Vector4f (h, h, 0, 0);
    Vector4f yDiff = new Vector4f (-1, -1, 1, 1);
    for (int y=0; y<h; y++)
    {
        Vector4f factor = yVec * finv_WH;
        yVec += yDiff;

        Vector4f xVec = new Vector4f (w, 0, w, 0);
        Vector4f xDiff = new Vector4f (-1, 1, -1, 1);
        for (int x=0; x<w; x++)
        {
            ret += (a * xVec * factor);
            xVec += xDiff;
        }
    }
    return ret;
}
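
One way to double check part of this is to confirm that the four lanes of `xVec * factor` are the four weights pa, pb, pc and pd from the scalar loop. The C sketch below does that with a plain four-float struct standing in for Vector4f; the `Vec4` type and function names are hypothetical. It checks only the weights, not how they are combined with the corner colours, which is the part still worth verifying by hand.

```c
#include <assert.h>

typedef struct { float lane[4]; } Vec4;

static Vec4 vec4(float a, float b, float c, float d)
{
    Vec4 v = {{ a, b, c, d }};
    return v;
}

static Vec4 vec4_add(Vec4 a, Vec4 b)
{
    Vec4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

static Vec4 vec4_mul(Vec4 a, Vec4 b)
{
    Vec4 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

/* Weights for pixel (x, y) built as in the SIMD loop: start vectors
 * advanced y (or x) times by the difference vectors. */
static Vec4 weights_vector(int w, int h, int x, int y)
{
    float inv = 1.0f / (float)(w * h);
    Vec4 yVec = vec4((float)h, (float)h, 0.0f, 0.0f);
    Vec4 yDiff = vec4(-1.0f, -1.0f, 1.0f, 1.0f);
    Vec4 xVec = vec4((float)w, 0.0f, (float)w, 0.0f);
    Vec4 xDiff = vec4(-1.0f, 1.0f, -1.0f, 1.0f);

    for (int i = 0; i < y; i++)
        yVec = vec4_add(yVec, yDiff);
    for (int i = 0; i < x; i++)
        xVec = vec4_add(xVec, xDiff);

    Vec4 factor = vec4_mul(yVec, vec4(inv, inv, inv, inv));
    return vec4_mul(xVec, factor);
}

/* The same weights computed as in the scalar Gradient() loop. */
static Vec4 weights_scalar(int w, int h, int x, int y)
{
    float inv = 1.0f / (float)(w * h);
    return vec4((float)((w - x) * (h - y)) * inv,
                (float)(x * (h - y)) * inv,
                (float)((w - x) * y) * inv,
                (float)(x * y) * inv);
}
```

For a 4 x 4 image every value involved is exactly representable, so the two functions agree lane for lane and the four weights sum to 1. For larger sizes the different order of multiplications may introduce small rounding differences.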

On Fri, Feb 20, 2009 at 8:12 AM, Johann_fxgen jnadalu...@gmail.com wrote:


 I have done some performance tests of SIMD under Windows.

 Test results in ms:
 In MS C      235   (Visual Studio Release Mode With SIMD)
 In MS C      360   (Visual Studio Release Mode With 4D Float)
 In Mono C#   453   (With Mono SIMD)
 In Mono C#   562   (With Mono 4D Float)
 In MS C#     609   (Visual Studio With 4D Float)
 In MS C      672   (Visual Studio Debug Mode)

 I'm just surprised by the difference between the C SIMD and Mono SIMD versions.

 Is Mono.SIMD faster under Linux than under Windows?

 Johann.

 My mono code for test:

using Mono.Simd;
using System;
using Mono;

public struct Color
{
public float r,g,b,a;
};

public class TestMonoSIMD
{
public  Color m_pixels;
const int w = 4096;
const int h = 4096;

public static void Main ()
{
//Debug
Console.WriteLine("AccelMode: {0}",
 Mono.Simd.SimdRuntime.AccelMode );

//Without SIMD
DateTime start1 = DateTime.Now;
Color ret1 = Gradient();
TimeSpan ts1 = DateTime.Now - start1;
Console.WriteLine("-FLOAT {0} {1}", ts1, ret1);

//With SIMD
DateTime start2 = DateTime.Now;
Vector4f ret2 = GradientSIMD();
TimeSpan ts2 = DateTime.Now - start2;
Console.WriteLine("-SIMD  {0} {1}", ts2, ret2);
}

public static Color Gradient()
{
float finv_WH = 1.0f / (float)(w*h);
Color ret = new Color();
ret.r=ret.g=ret.b=ret.a=0.0f;

Color a = new Color();
Color b = new Color();
Color c = new Color();
Color d = new Color();
a.r=0.0f;   a.g=0.0f; a.b=1.0f; a.a=1.0f;
b.r=0.0f;   b.g=1.0f; b.b=0.0f; b.a=1.0f;
c.r=1.0f;   c.g=0.0f; c.b=0.0f; c.a=1.0f;
d.r=0.5f;   d.g=0.5f; d.b=1.0f; d.a=1.0f;

//Process operator
for (int y=0; y<h; y++)
{
for (int x=0; x<w; x++)
{
//Calc percent A,B,C,D
float pa = (float)((w-x)*
 (h-y)) * finv_WH;
float pb = (float)((x)  *
 (h-y)) * finv_WH;
float pc = (float)((w-x)*
 (y))   * finv_WH;
float pd = (float)((x)  *
 (y))   * finv_WH;

float cr= ((a.r*pa) + (b.r*pb) +
 (c.r*pc) + (d.r*pd));
float cg= ((a.g*pa) + (b.g*pb) +
 (c.g*pc) + (d.g*pd));
float cb= ((a.b*pa) + (b.b*pb) +
 (c.b*pc) + (d.b*pd));
float ca= ((a.a*pa) + (b.a*pb) +
 (c.a*pc) + (d.a*pd));
   

Re: [Mono-dev] Mono.SIMD

2009-02-23 Thread Alan McGovern
Hey,

The C++ code seems very similar to the C# SIMD code, so I don't know what
would make the C# version any faster. This question would be best directed
at the JIT guys, who may know what causes the difference.

If you want to try speeding up the Mono version, you should just use trial
and error to see if you can rewrite things so that you get better
performance. For example, unrolling the loop may improve performance
noticeably.
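
As a generic illustration of what unrolling means, here is a C sketch that sums an array with one element per iteration and then four elements per iteration using independent accumulators. The function names are hypothetical and the example is not taken from the gradient code; whether the unrolled form is actually faster has to be measured on the target JIT or compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Straightforward loop: one accumulator, one element per iteration. */
static float sum_simple(const float *v, size_t n)
{
    float acc = 0.0f;
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Unrolled by four: fewer loop-condition checks and four independent
 * dependency chains, followed by a short loop for the leftover elements. */
static float sum_unrolled4(const float *v, size_t n)
{
    float a0 = 0.0f, a1 = 0.0f, a2 = 0.0f, a3 = 0.0f;
    size_t i = 0;
    assert(v != NULL || n == 0);
    for (; i + 4 <= n; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    float acc = (a0 + a1) + (a2 + a3);
    for (; i < n; i++)
        acc += v[i];
    return acc;
}
```

The unrolled version adds the elements in a different order, so with general floating-point data its result can differ from the simple loop in the last bits.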

Alan.

On Mon, Feb 23, 2009 at 1:16 PM, Johann Nadalutti jnadalu...@gmail.com wrote:

 Hey,
  thanks a lot for your modifications.
  The SIMD version is now 3x faster than the 4D float version!
  I wrote the same code in C++ and it's 3x faster than Mono.SIMD.
 I just want to know why, and how to optimize my Mono code.
  What do you use as an IDE to develop and debug Mono?


 My Visual C++ code for test:

 #include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_mul_ps, _mm_add_ps, _mm_load1_ps

 class VectorSIMD
 {
 public:

 VectorSIMD();
 VectorSIMD(float x, float y, float z, float w);

 VectorSIMD operator*(const VectorSIMD& other)
 {
 VectorSIMD r;
 r.vec = _mm_mul_ps(vec, other.vec);
 return r;
 }

 VectorSIMD operator*(float f)
 {
 VectorSIMD r;
 __m128 b = _mm_load1_ps(&f);  // broadcast f to all four lanes
 r.vec = _mm_mul_ps(vec, b);
 return r;
 }


 VectorSIMD operator+(const VectorSIMD& other)
 {
 VectorSIMD r;
 r.vec = _mm_add_ps(vec, other.vec);
 return r;
 }

 //Datas
 union
 {
 __m128 vec;
 struct { float x, y, z, w; };
 };

 };

 VectorSIMD::VectorSIMD()
 {
 }

 VectorSIMD::VectorSIMD(float _x, float _y, float _z, float _w)
 {
 x=_x;y=_y; z=_z; w=_w;
 }


 VectorSIMD GradientSIMD()
 {
   VectorSIMD finv_WH(1.0f / (_W*_H), 1.0f / (_W*_H), 1.0f / (_W*_H), 1.0f /
 (_W*_H));
 VectorSIMD ret(0.0, 0.0, 0.0, 0.0);

 VectorSIMD a(0.0f, 0.0f, 1.0f, 1.0f);
 a =a + VectorSIMD(0.0f, 1.0f, 0.0f, 1.0f);
 a =a + VectorSIMD(1.0f, 0.0f, 0.0f, 1.0f);
 a =a + VectorSIMD(0.5f, 0.5f, 1.0f, 1.0f);


 //Process operator
   VectorSIMD yVec(_H, _H, 0, 0);
   VectorSIMD yDiff(-1.0f, -1.0f, 1.0f, 1.0f);
 for (int y=0; y<_H; y++)
 {
 VectorSIMD factor = yVec * finv_WH;
 yVec = yVec + yDiff;

 VectorSIMD xVec(_W, 0, _W, 0);
 VectorSIMD xDiff(-1.0f, 1.0f, -1.0f, 1.0f);
 for (int x=0; x<_W; x++)
 {
 ret=ret+(a*xVec*factor);
 xVec=xVec+xDiff;
 }
 }

 return ret;
 }


 Johann.




 2009/2/23 Alan McGovern alan.mcgov...@gmail.com

 Hey,

 The big issue you're having is that you haven't implemented a SIMD
 algorithm ;) I spent 15 mins 'optimising' your code and came up with this.
 Notice that I made everything a SIMD operation. There is no scalar code in
 the method anymore. This tripled performance as compared to the non-SIMD
 version. On my machine:

 -FLOAT 00:00:00.3888930 Color
 -SIMD   00:00:00.1266820 Mono.Simd.Vector4f

 You'd want to double check the result just in case I made a mistake with
 my alterations.

 Alan.

 public static Vector4f GradientSIMD()
 {
 Vector4f finv_WH = new Vector4f (1.0f / (w*h), 1.0f / (w*h),
 1.0f / (w*h), 1.0f / (w*h));
 Vector4f ret = new Vector4f();

 Vector4f a = new Vector4f(0.0f, 0.0f, 1.0f, 1.0f);
 a += new Vector4f(0.0f, 1.0f, 0.0f, 1.0f);
 a += new Vector4f(1.0f, 0.0f, 0.0f, 1.0f);
 a += new Vector4f(0.5f, 0.5f, 1.0f, 1.0f);

 //Process operator
 Vector4f yVec = new Vector4f (h, h, 0, 0);
 Vector4f yDiff = new Vector4f (-1, -1, 1, 1);
 for (int y=0; y<h; y++)
 {
 Vector4f factor = yVec * finv_WH;
 yVec += yDiff;

 Vector4f xVec = new Vector4f (w, 0, w, 0);
 Vector4f xDiff = new Vector4f (-1, 1, -1, 1);
 for (int x=0; x<w; x++)
 {
 ret += (a * xVec * factor);
 xVec += xDiff;
 }
 }
 return ret;
 }
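
One way to do the double check suggested above is to compare a lane-based version against the plain scalar loop on a small image. The following is a minimal, portable C++ sketch and not code from this thread: `Vec4`, `splat`, `gradient_scalar`, `gradient_lanes` and `gradient_mismatch` are hypothetical names, and `Vec4` is only a stand-in for `Vector4f` / `__m128`. The four bilinear weights are kept in one vector and updated incrementally, as in the rewritten method; each weight is then broadcast across a whole color, because that is what the scalar version computes per channel.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 4-lane vector standing in for Vector4f / __m128.
struct Vec4 {
    float v[4];
};

inline Vec4 operator+(const Vec4& a, const Vec4& b) {
    return Vec4{{a.v[0] + b.v[0], a.v[1] + b.v[1], a.v[2] + b.v[2], a.v[3] + b.v[3]}};
}

inline Vec4 operator*(const Vec4& a, const Vec4& b) {
    return Vec4{{a.v[0] * b.v[0], a.v[1] * b.v[1], a.v[2] * b.v[2], a.v[3] * b.v[3]}};
}

// Broadcast one scalar into all four lanes.
inline Vec4 splat(float f) {
    return Vec4{{f, f, f, f}};
}

// The four corner colors used in the thread (r, g, b, a).
static const Vec4 A{{0.0f, 0.0f, 1.0f, 1.0f}};
static const Vec4 B{{0.0f, 1.0f, 0.0f, 1.0f}};
static const Vec4 C{{1.0f, 0.0f, 0.0f, 1.0f}};
static const Vec4 D{{0.5f, 0.5f, 1.0f, 1.0f}};

// Scalar reference: weights recomputed from x and y for every pixel.
Vec4 gradient_scalar(int w, int h) {
    assert(w > 0 && h > 0);
    const float inv = 1.0f / static_cast<float>(w * h);
    Vec4 ret = splat(0.0f);
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            const float pa = static_cast<float>((w - x) * (h - y)) * inv;
            const float pb = static_cast<float>(x * (h - y)) * inv;
            const float pc = static_cast<float>((w - x) * y) * inv;
            const float pd = static_cast<float>(x * y) * inv;
            for (int k = 0; k < 4; ++k)
                ret.v[k] += A.v[k] * pa + B.v[k] * pb + C.v[k] * pc + D.v[k] * pd;
        }
    }
    return ret;
}

// Lane version: (pa, pb, pc, pd) live in one vector and are stepped by +/-1.
Vec4 gradient_lanes(int w, int h) {
    assert(w > 0 && h > 0);
    const float fw = static_cast<float>(w);
    const float fh = static_cast<float>(h);
    const Vec4 inv = splat(1.0f / (fw * fh));
    const Vec4 yDiff{{-1.0f, -1.0f, 1.0f, 1.0f}};
    const Vec4 xDiff{{-1.0f, 1.0f, -1.0f, 1.0f}};
    Vec4 ret = splat(0.0f);
    Vec4 yVec{{fh, fh, 0.0f, 0.0f}};
    for (int y = 0; y < h; ++y) {
        const Vec4 factor = yVec * inv;
        yVec = yVec + yDiff;
        Vec4 xVec{{fw, 0.0f, fw, 0.0f}};
        for (int x = 0; x < w; ++x) {
            const Vec4 p = xVec * factor;  // lanes hold pa, pb, pc, pd
            ret = ret + A * splat(p.v[0]) + B * splat(p.v[1])
                      + C * splat(p.v[2]) + D * splat(p.v[3]);
            xVec = xVec + xDiff;
        }
    }
    return ret;
}

// Largest per-channel difference between the two versions.
float gradient_mismatch(int w, int h) {
    const Vec4 s = gradient_scalar(w, h);
    const Vec4 l = gradient_lanes(w, h);
    float worst = 0.0f;
    for (int k = 0; k < 4; ++k)
        worst = std::fmax(worst, std::fabs(s.v[k] - l.v[k]));
    return worst;
}
```

Running `gradient_mismatch` on a small size before timing the 4096x4096 case keeps the check cheap. A small tolerance is still needed in general, since the two versions round in a different order.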

 On Fri, Feb 20, 2009 at 8:12 AM, Johann_fxgen jnadalu...@gmail.com wrote:


 I have done some performance tests of SIMD under Windows.

 Test results, in ms:
 In MS C      235   (Visual Studio Release Mode With SIMD)
 In MS C      360   (Visual Studio Release Mode With 4D Float)
 In Mono C#   453   (With Mono SIMD)
 In Mono C#   562   (With Mono 4D Float)
 In MS C#     609   (Visual Studio With 4D Float)
 In MS C      672   (Visual Studio Debug Mode)

 I'm just surprised by the difference between the C SIMD and Mono SIMD versions.

 Is Mono.Simd faster under Linux than under Windows?

 Johann.

 My mono code for test:

using Mono.Simd;
using System;
using Mono;

public struct Color
{
public float r,g,b,a;
};

public class TestMonoSIMD
{
public  Color m_pixels;
const int w = 4096;
   

[Mono-dev] Mono.SIMD

2009-02-22 Thread Johann_fxgen

I have done some performance tests of SIMD under Windows.

Test results, in ms:
In MS C      235   (Visual Studio Release Mode With SIMD)
In MS C      360   (Visual Studio Release Mode With 4D Float)
In Mono C#   453   (With Mono SIMD)
In Mono C#   562   (With Mono 4D Float)
In MS C#     609   (Visual Studio With 4D Float)
In MS C      672   (Visual Studio Debug Mode)

I'm just surprised by the difference between the C SIMD and Mono SIMD versions.

Is Mono.Simd faster under Linux than under Windows?

Johann.

My mono code for test:

using Mono.Simd;
using System;
using Mono;

public struct Color
{
public float r,g,b,a;
};

public class TestMonoSIMD
{
public  Color m_pixels;
const int w = 4096;
const int h = 4096;

public static void Main ()
{
//Debug
Console.WriteLine("AccelMode: {0}", Mono.Simd.SimdRuntime.AccelMode);

//Without SIMD
DateTime start1 = DateTime.Now;
Color ret1 = Gradient();
TimeSpan ts1 = DateTime.Now - start1;
Console.WriteLine("-FLOAT {0} {1}", ts1, ret1);

//With SIMD
DateTime start2 = DateTime.Now;
Vector4f ret2 = GradientSIMD();
TimeSpan ts2 = DateTime.Now - start2;
Console.WriteLine("-SIMD  {0} {1}", ts2, ret2);
}

public static Color Gradient()
{
float finv_WH = 1.0f / (float)(w*h);
Color ret = new Color();
ret.r=ret.g=ret.b=ret.a=0.0f;

Color a = new Color();
Color b = new Color();
Color c = new Color();
Color d = new Color();  
a.r=0.0f;   a.g=0.0f; a.b=1.0f; a.a=1.0f;
b.r=0.0f;   b.g=1.0f; b.b=0.0f; b.a=1.0f;
c.r=1.0f;   c.g=0.0f; c.b=0.0f; c.a=1.0f;
d.r=0.5f;   d.g=0.5f; d.b=1.0f; d.a=1.0f;   


//Process operator
for (int y=0; y<h; y++)
{
for (int x=0; x<w; x++)
{
//Calc percent A,B,C,D
float pa = (float)((w-x) * (h-y)) * finv_WH;
float pb = (float)((x)   * (h-y)) * finv_WH;
float pc = (float)((w-x) * (y))   * finv_WH;
float pd = (float)((x)   * (y))   * finv_WH;

float cr= ((a.r*pa) + (b.r*pb) + (c.r*pc) + (d.r*pd));
float cg= ((a.g*pa) + (b.g*pb) + (c.g*pc) + (d.g*pd));
float cb= ((a.b*pa) + (b.b*pb) + (c.b*pc) + (d.b*pd));
float ca= ((a.a*pa) + (b.a*pb) + (c.a*pc) + (d.a*pd));
ret.r+=cr;  ret.g+=cg;  ret.b+=cb;  ret.a+=ca;
}
}
return ret;
}

public static Vector4f GradientSIMD()
{
float finv_WH = 1.0f / (float)(w*h);
Vector4f ret = new Vector4f(0.0f, 0.0f, 0.0f, 0.0f);

Vector4f a = new Vector4f(0.0f, 0.0f, 1.0f, 1.0f);
Vector4f b = new Vector4f(0.0f, 1.0f, 0.0f, 1.0f);
Vector4f c = new Vector4f(1.0f, 0.0f, 0.0f, 1.0f);
Vector4f d = new Vector4f(0.5f, 0.5f, 1.0f, 1.0f);  


//Process operator
Vector4f p = new Vector4f();
Vector4f r = new Vector4f();
for (int y=0; y<h; y++)
{
for (int x=0; x<w; x++)
{
//Calc percent A,B,C,D
p.X = (float)((w-x) * (h-y)) * finv_WH;
p.Y = (float)((x)   * (h-y)) * finv_WH;
p.Z = (float)((w-x) * (y))   * finv_WH;
p.W = (float)((x)   * 

Re: [Mono-dev] Mono.Simd: Accelerated methods analysis

2008-12-10 Thread Rodrigo Kumpera
Oh,

BTW, there are 2 issues with your program.

The following code is wrong: mi.GetParameters() [i].GetType(). It should be
mi.GetParameters() [i].ParameterType; otherwise you'll be querying the
ParameterInfo class instead of the parameter's type.

The other one is minor, in that some functions might not report as
accelerated because you're running it on an old machine without support.

Cheers,
Rodrigo


On Wed, Dec 10, 2008 at 10:21 AM, Rodrigo Kumpera [EMAIL PROTECTED] wrote:

 Hi Bart,

 Right now the only methods that are not accelerated are indexers, if any
 method is missing from this list, it's a bug.

 Cheers,
 Rodrigo









Re: [Mono-dev] Mono.Simd: Accelerated methods analysis

2008-12-10 Thread Bart Masschelein
 The following code is wrong: mi.GetParameters() [i].GetType(). It
 should be mi.GetParameters() [i].ParameterType; otherwise you'll be
 querying the ParameterInfo class instead of the parameter's type.

Thanks, that was what I was looking for, updated program below.

 The other one is minor, in that some functions might not report as  
 accelerated because you're running it on an old machine without  
 support.

That is exactly what this program is supposed to do: see which
functions are accelerated on a certain machine and which are not, so I
know whether to expect a speedup or should choose another option,
without having to investigate each method separately.
I just have to run this program once and keep the list at hand. As an
example, the output on my MacBookPro is added at the end.

Sorry for the lengthy mail ;-).

Bart

// Main.cs created with MonoDevelop
// User: masschel at 15:51 11/21/2008
//
// To change standard headers go to Edit->Preferences->Coding->Standard Headers
//
using System;
using Mono.Simd;
using System.Reflection;

namespace AcceleratedMethods
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            // Change to your location of Mono.Simd
            string monoSimdLocation = @"/Users/masschel/local/mono/lib/mono/2.0/Mono.Simd.dll";
            Assembly assembly = Assembly.LoadFile(monoSimdLocation);
            foreach (Type type in assembly.GetTypes())
            {
                string typeName = type.Name;
                if (typeName.Length >= 6 && typeName.Substring(0, 6) == "Vector")
                {
                    Console.WriteLine("Type {0}", type.Name);
                    foreach (MethodInfo mi in type.GetMethods())
                    {
                        string methodName = mi.Name;
                        bool ctu = methodName != "Equals"
                            && methodName != "GetHashCode"
                            && methodName != "ToString"
                            && methodName != "GetType"
                            /* && (methodName.Length >= 4
                            && methodName.Substring(0, 4) != "get_"
                            && methodName.Substring(0, 4) != "set_")*/;
                        if (ctu)
                        {
                            // Collect the parameter types so overloads can be told apart
                            Type[] types = new Type[mi.GetParameters().Length];
                            Console.Write("   Method {0}(", mi.Name);
                            for (int i = 0; i < mi.GetParameters().Length; i++)
                            {
                                types[i] = mi.GetParameters()[i].ParameterType;
                                if (i + 1 < mi.GetParameters().Length)
                                    Console.Write("{0}, ", types[i].Name);
                                else
                                    Console.Write("{0}", types[i].Name);
                            }
                            Console.WriteLine("):{0} accelerated: {1}", mi.ReturnParameter,
                                SimdRuntime.IsMethodAccelerated(type, mi.Name, types));
                        }
                    }
                }
            }
        }
    }
}


Type Vector2d
Method AndNot(Vector2d, Vector2d):Vector2d accelerated: True
Method HorizontalAdd(Vector2d, Vector2d):Vector2d accelerated: True
Method AddSub(Vector2d, Vector2d):Vector2d accelerated: True
Method HorizontalSub(Vector2d, Vector2d):Vector2d accelerated: True
Method InterleaveHigh(Vector2d, Vector2d):Vector2d accelerated: True
Method InterleaveLow(Vector2d, Vector2d):Vector2d accelerated: True
Method CompareEqual(Vector2d, Vector2d):Vector2d accelerated: True
Method CompareLessThan(Vector2d, Vector2d):Vector2d accelerated: True
Method CompareLessEqual(Vector2d, Vector2d):Vector2d accelerated: True
Method CompareUnordered(Vector2d, Vector2d):Vector2d accelerated: True
Method CompareNotEqual(Vector2d, Vector2d):Vector2d accelerated: True
Method CompareNotLessThan(Vector2d, Vector2d):Vector2d accelerated: True
Method CompareNotLessEqual(Vector2d, Vector2d):Vector2d accelerated: True
Method CompareOrdered(Vector2d, Vector2d):Vector2d accelerated: True
Method Duplicate(Vector2d):Vector2d accelerated: True
Method LoadAligned(Vector2d):Vector2d accelerated: True
Method StoreAligned(Vector2d, Vector2d):Void accelerated: True
Method LoadAligned(Vector2d*):Vector2d accelerated: True
Method StoreAligned(Vector2d*, Vector2d):Void accelerated: True
Method PrefetchTemporalAllCacheLevels(Vector2d):Void accelerated: True
Method PrefetchTemporal1stLevelCache(Vector2d):Void accelerated: True
Method PrefetchTemporal2ndLevelCache(Vector2d):Void accelerated: True
Method PrefetchNonTemporal(Vector2d):Void accelerated: True
Method PrefetchTemporalAllCacheLevels(Vector2d*):Void accelerated: True
Method PrefetchTemporal1stLevelCache(Vector2d*):Void accelerated: True
Method 

[Mono-dev] Mono.Simd: Accelerated methods analysis

2008-12-06 Thread Bart Masschelein
Hi all,

I've written a program that uses reflection to list the relevant
methods in Mono.Simd and report whether they are accelerated or
not (see below). This small program might be of interest to others, to
see how well their processor behaves.

Some methods are overloaded, for which I should give the
signature, but I'm a bit lost as to what this signature should look like.
I tried to convert the ParameterInfo[] of the methods to Type[], as
required by the IsMethodAccelerated method, but this gives erroneous
results. Is it only the parameter list, or is there more to it?

I thought of removing the overloaded methods (see list), but I guess I
would risk removing relevant methods as well. The overloaded methods
are mainly op_Explicit, LoadAligned, StoreAligned, and the PrefetchXxx
methods. Are these relevant to show up in such a list?

Anyway, I'm quite thrilled to see that almost all of the methods are  
accelerated :-).

Bart

using System;
using Mono.Simd;
using System.Reflection;

namespace AcceleratedMethods
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            // Change to your location of Mono.Simd
            string monoSimdLocation = @"/Users/masschel/local/mono/lib/mono/2.0/Mono.Simd.dll";
            Assembly assembly = Assembly.LoadFile(monoSimdLocation);
            foreach (Type type in assembly.GetTypes())
            {
                string typeName = type.Name;
                if (typeName.Length >= 6 && typeName.Substring(0, 6) == "Vector")
                {
                    Console.WriteLine("Type {0}", type.Name);
                    foreach (MethodInfo mi in type.GetMethods())
                    {
                        string methodName = mi.Name;
                        bool ctu = methodName != "Equals"
                            && methodName != "GetHashCode"
                            && methodName != "ToString"
                            && methodName != "GetType"
                            && (methodName.Length >= 4
                            && methodName.Substring(0, 4) != "get_"
                            && methodName.Substring(0, 4) != "set_");
                        if (ctu)
                        {
                            try
                            {
                                Console.WriteLine("   Method {0} {1}", mi.Name,
                                    SimdRuntime.IsMethodAccelerated(type, mi.Name));
                            }
                            // Overloaded methods
                            catch (System.Reflection.AmbiguousMatchException amme)
                            {
                                Type[] types = new Type[mi.GetParameters().Length];
                                for (int i = 0; i < mi.GetParameters().Length; i++)
                                {
                                    types[i] = mi.GetParameters()[i].GetType();
                                }
                                Console.WriteLine("AmbiguousMatchException for method {0} {1}", mi.Name,
                                    SimdRuntime.IsMethodAccelerated(type, mi.Name, types));
                            }
                        }
                    }
                }
            }
        }
    }
}


Re: [Mono-dev] mono.simd sugestions

2008-11-20 Thread crashfourit



Rodrigo Kumpera wrote:
 
 On Wed, Nov 19, 2008 at 4:23 PM, crashfourit [EMAIL PROTECTED]
 wrote:
 


 It would be nice to have the vector* have a constructor that takes in
 only
 one argument and fills all spots in the vector* with the same value.
 Like...
 Vector4f vector = new Vector4f(1);

 Second... I can really see someone doing this to use mono.simd in already
 established code base.

 [StructLayout( LayoutKind.Sequential, Pack = 0, Size = 16 )]
 class Vector4 {

/*
 some user defined vector methods.
 ..
   */

private static explicit operator Vector4f(Vector4 v){
unsafe {
Vector4f* p = (Vector4f*) &v;
return *p;
}
}

private static explicit operator Vector4(Vector4f v){
unsafe {
Vector4* p = (Vector4*) &v;
return *p;
}
}
 }

 Is it possible to accelerate these user defined operator overloads? Or do
 I
 have to resort to C# style unions?
 --


 
 This code will be inlined and work like a charm. But I recommend coding it
 in the following way to
 squeeze the maximum  performance out of it:
 
 public static unsafe Vector4f AsVector(ref Vector4 v){
 fixed (Vector4 *f = &v) {
 return *(Vector4f*)f;
 }
 }
 
 
 
 This will avoid the extra copy of passing the valuetype by value on stack
 and will inline straight to a load from the
 load/array element to a simd machine register. Pretty cool, isn't it?
 
 
 
How will the jit engine handle this?
public static unsafe Vector4 AsVector4(ref Vector4f v){
fixed (Vector4f *f = &v) {
return *(Vector4*)f;
}
}
-- 
View this message in context: 
http://www.nabble.com/mono.simd-sugestions-tp20586082p20612136.html
Sent from the Mono - Dev mailing list archive at Nabble.com.



Re: [Mono-dev] mono.simd sugestions

2008-11-20 Thread Rodrigo Kumpera
The JIT will generate reasonable code. It's in our plans to pay attention to
having a good integration story with existing code.


On Thu, Nov 20, 2008 at 9:15 PM, crashfourit [EMAIL PROTECTED] wrote:

 How will the jit engine handle this?
public static unsafe Vector4 AsVector4(ref Vector4f v){
fixed (Vector4f *f = &v) {
return *(Vector4*)f;
}
}




[Mono-dev] mono.simd sugestions

2008-11-19 Thread crashfourit


It would be nice to have the vector* have a constructor that takes in only
one argument and fills all spots in the vector* with the same value. Like...
Vector4f vector = new Vector4f(1);

Second... I can really see someone doing this to use mono.simd in already
established code base.

[StructLayout( LayoutKind.Sequential, Pack = 0, Size = 16 )]
class Vector4 {

/* 
 some user defined vector methods.
 ..
   */

private static explicit operator Vector4f(Vector4 v){
unsafe {
Vector4f* p = (Vector4f*) &v;
return *p;
}
}

private static explicit operator Vector4(Vector4f v){
unsafe {
Vector4* p = (Vector4*) &v;
return *p;
}
}
}

Is it possible to accelerate these user defined operator overloads? Or do I
have to resort to C# style unions?
-- 
View this message in context: 
http://www.nabble.com/mono.simd-sugestions-tp20586082p20586082.html
Sent from the Mono - Dev mailing list archive at Nabble.com.



Re: [Mono-dev] mono.simd sugestions

2008-11-19 Thread Rodrigo Kumpera
On Wed, Nov 19, 2008 at 4:23 PM, crashfourit [EMAIL PROTECTED] wrote:



 It would be nice to have the vector* have a constructor that takes in only
 one argument and fills all spots in the vector* with the same value.
 Like...
 Vector4f vector = new Vector4f(1);

 Second... I can really see someone doing this to use mono.simd in already
 established code base.

 [StructLayout( LayoutKind.Sequential, Pack = 0, Size = 16 )]
 class Vector4 {

/*
 some user defined vector methods.
 ..
   */

private static explicit operator Vector4f(Vector4 v){
unsafe {
Vector4f* p = (Vector4f*) &v;
return *p;
}
}

private static explicit operator Vector4(Vector4f v){
unsafe {
Vector4* p = (Vector4*) &v;
return *p;
}
}
 }

 Is it possible to accelerate these user defined operator overloads? Or do I
 have to resort to C# style unions?
 --



This code will be inlined and work like a charm. But I recommend coding it
in the following way to
squeeze the maximum  performance out of it:

public static unsafe Vector4f AsVector(ref Vector4 v){
fixed (Vector4 *f = &v) {
return *(Vector4f*)f;
}
}



This will avoid the extra copy of passing the valuetype by value on stack
and will inline straight to a load from the
load/array element to a simd machine register. Pretty cool, isn't it?
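
The same reinterpretation idea can be written down outside C# as well. What follows is a minimal C++ sketch of it, not Mono code: `Vector4` plays the user-defined type, `Vec4f` is a hypothetical stand-in for `Vector4f`, and `as_vec4f` / `as_vector4` are names made up for this illustration. The argument is taken by reference, so no extra copy is made at the call site, and the bytes are moved with `std::memcpy`, which compilers typically turn into a single 16-byte load.

```cpp
#include <cassert>
#include <cstring>

// User-defined vector type from an existing code base (illustrative).
struct Vector4 {
    float x, y, z, w;
};

// Hypothetical stand-in for a 16-byte SIMD vector type such as Vector4f.
struct alignas(16) Vec4f {
    float lane[4];
};

static_assert(sizeof(Vector4) == sizeof(Vec4f), "both types must be 16 bytes");

// Reinterpret the user type as the SIMD type without touching the lanes.
inline Vec4f as_vec4f(const Vector4& v) {
    Vec4f out;
    std::memcpy(&out, &v, sizeof out);
    return out;
}

// And back again.
inline Vector4 as_vector4(const Vec4f& v) {
    Vector4 out;
    std::memcpy(&out, &v, sizeof out);
    return out;
}

// Convenience check: a value survives the trip through the SIMD type.
inline bool round_trips(float x, float y, float z, float w) {
    const Vector4 in{x, y, z, w};
    const Vector4 back = as_vector4(as_vec4f(in));
    assert(sizeof back == 16);
    return back.x == x && back.y == y && back.z == z && back.w == w;
}
```

The size check is the C++ counterpart of the `StructLayout(..., Size = 16)` attribute in the question: the conversion only makes sense when both layouts agree.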


Re: [Mono-dev] Mono.Simd - slower than the normal implementation

2008-11-15 Thread Rodrigo Kumpera
Hi Alan,

There are a couple of issues with your code; let me go through them:

-Until recently (last night), getters were not accelerated, which causes a
significant slowdown. I fixed this in r118899. The generated code is not as
good as it could be, but this will be fixed eventually.

-Setters are still not accelerated. I'll work on this next week, so until
then your code will suffer.

-Once you use a single non-accelerated method on a Vector variable, all
operations on it will be slower due to how our JIT works - they still use SSE
instructions, but with a performance penalty.

-Getters and setters are a hint of poorly vectorized code. The last part of
your unsafe code should use temps for the intermediate results.

-In the unsafe case you should use a Vector4ui store instead of extracting
each element.

-For the safe case we still lack proper integration with arrays, in the form
of methods to extract and store vectors from them.

Your code looks a bit strange, the Vector4ui constructor indexes in
particular. Have you checked that the output of the 3 methods is the same?

I'll work on the Mono.Simd issues next week: getting setters accelerated,
adding some methods to better integrate with arrays, and other things like
element extractors.

Rodrigo

On Sat, Nov 15, 2008 at 12:13 AM, Alan McGovern [EMAIL PROTECTED] wrote:

 I found a bit of code in the SHA1 implementation which I thought was
 ideal for SIMD optimisations. However, unless I resort to unsafe code,
 it's actually substantially slower! I've attached three
 implementations of the method here: the original, the safe SIMD and
 the unsafe SIMD. The runtimes are as follows:

 Original: 600ms
 Unsafe Simd: 450ms
 Safe Simd: 1700ms

 Also, the method is always called with a uint[] of length 80.

 Is this just the wrong place to be using SIMD? It seemed ideal because
 I need 75% fewer XORs. If anyone has any ideas on whether SIMD could
 actually be useful for this case or not, let me know.

 Thanks,
 Alan.


 The original code is:

 private static void FillBuff(uint[] buff)
 {
     uint val;
     for (int i = 16; i < 80; i += 8)
     {
         val = buff[i - 3] ^ buff[i - 8] ^ buff[i - 14] ^ buff[i - 16];
         buff[i] = (val << 1) | (val >> 31);

         val = buff[i - 2] ^ buff[i - 7] ^ buff[i - 13] ^ buff[i - 15];
         buff[i + 1] = (val << 1) | (val >> 31);

         val = buff[i - 1] ^ buff[i - 6] ^ buff[i - 12] ^ buff[i - 14];
         buff[i + 2] = (val << 1) | (val >> 31);

         val = buff[i + 0] ^ buff[i - 5] ^ buff[i - 11] ^ buff[i - 13];
         buff[i + 3] = (val << 1) | (val >> 31);

         val = buff[i + 1] ^ buff[i - 4] ^ buff[i - 10] ^ buff[i - 12];
         buff[i + 4] = (val << 1) | (val >> 31);

         val = buff[i + 2] ^ buff[i - 3] ^ buff[i - 9] ^ buff[i - 11];
         buff[i + 5] = (val << 1) | (val >> 31);

         val = buff[i + 3] ^ buff[i - 2] ^ buff[i - 8] ^ buff[i - 10];
         buff[i + 6] = (val << 1) | (val >> 31);

         val = buff[i + 4] ^ buff[i - 1] ^ buff[i - 7] ^ buff[i - 9];
         buff[i + 7] = (val << 1) | (val >> 31);
     }
 }

 The unsafe SIMD code is:
 public unsafe static void FillBuff(uint[] buffb)
 {
     fixed (uint* buff = buffb) {
         Vector4ui e;
         for (int t = 16; t < buffb.Length; t += 4)
         {
             e = *((Vector4ui*)&(buff [t-16])) ^
                 *((Vector4ui*)&(buff [t-14])) ^
                 *((Vector4ui*)&(buff [t- 8])) ^
                 *((Vector4ui*)&(buff [t- 3]));
             e.W ^= buff[t];

             buff[t] = (e.X << 1) | (e.X >> 31);
             buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
             buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
             buff[t + 3] = (e.W << 1) | (e.W >> 31) ^ ((e.X << 2) | (e.X >> 30));
         }
     }
 }

 The safe simd code is:
 public static void FillBuff(uint[] buff)
 {
     Vector4ui e;
     for (int t = 16; t < buff.Length; t += 4)
     {
         e = new Vector4ui (buff [t-16], buff [t-15], buff [t-14], buff [t-13]) ^
             new Vector4ui (buff [t-14], buff [t-13], buff [t-12], buff [t-11]) ^
             new Vector4ui (buff [t-8],  buff [t-7],  buff [t-6],  buff [t-5]) ^
             new Vector4ui (buff [t-3],  buff [t-2],  buff [t-1],  buff [t-0]);

         e.W ^= buff[t];
         buff[t] = (e.X << 1) | (e.X >> 31);
         buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
         buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
         buff[t + 3] = (e.W << 1) | (e.W >> 31) ^ ((e.X << 2) | (e.X >> 30));
     }
 }


Re: [Mono-dev] Mono.Simd - slower than the normal implementation

2008-11-15 Thread Alan McGovern
Hey,

On Sat, Nov 15, 2008 at 3:50 PM, Rodrigo Kumpera [EMAIL PROTECTED] wrote:
 Hi Alan,
 -Getters and setter are a hint of ill vectorized code.

In this particular scenario, I'm not sure how i can get rid of the use
of getters/setters unless I use even more unsafe code. I don't know
whether it's feasible or not, but it'd be great to be able to use this
API without having to use unsafe code. At the moment, I don't think
it's really possible to use this API without getters and setters.

 The last part of your unsafe code should use temps for the intermediate 
 results.
Do you mean that I should copy the vector 'e', which i got from
XOR'ing my values, into another Vector4ui using the store operation?
Then I should do my bitshifting/storing into uint[] from that one?


 -For the safe case we still miss proper integration with arrays, in the form
 of methods to
 extract and store vectors from them.

I was thinking that the API could expose something like:
Vector4ui.Create (uint[] array, int offset, ref Vector4ui result)
which could be changed into:
result = *((Vector4ui*)&array [offset]);

Though I'm sure you have ideas already on this ;) A similar method for
storing the result into a uint[] would be great too.
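
To make the shape of such a load/store pair concrete, here is a minimal C++ sketch. It is not a Mono.Simd API: `Lanes4`, `load4` and `store4` are hypothetical names, and a plain `std::array` stands in for `Vector4ui`. The only point is that a safe helper has to check that four consecutive elements exist at `offset` before it hands them over as one unit.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a 4 x uint vector such as Vector4ui.
using Lanes4 = std::array<std::uint32_t, 4>;

// True when data[offset .. offset+3] is a valid range.
inline bool has_four(const std::vector<std::uint32_t>& data, std::size_t offset) {
    return offset <= data.size() && data.size() - offset >= 4;
}

// Read four consecutive elements starting at offset, with a bounds check.
Lanes4 load4(const std::vector<std::uint32_t>& data, std::size_t offset) {
    if (!has_four(data, offset))
        throw std::out_of_range("load4: need four elements starting at offset");
    Lanes4 out{};
    for (std::size_t i = 0; i < 4; ++i)
        out[i] = data[offset + i];
    return out;
}

// Write four lanes back to consecutive elements starting at offset.
void store4(std::vector<std::uint32_t>& data, std::size_t offset, const Lanes4& v) {
    if (!has_four(data, offset))
        throw std::out_of_range("store4: need four elements starting at offset");
    for (std::size_t i = 0; i < 4; ++i)
        data[offset + i] = v[i];
    assert(data[offset] == v[0]);
}
```

A single range check per four elements is what would let a runtime replace the body with one unaligned vector load or store, instead of four checked element accesses.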


 Your code looks a bit strange, the Vector4ui constructor indexes in
 particular. Have you checked that
 the output of the 3 methods are the same?
Yes, there is a bug in my implementation there: I left out a bracket
when setting the value of buff[t+3]. There should be an additional
bracket around (e.W << 1) | (e.W >> 31). Other than that, the
implementation is correct. I've pasted the correct implementation of
the unsafe and safe SIMD versions below, just for reference purposes.


 I'll work on the Mono.Simd issues next week, getting setters to be
 accelerated, some methods
 to better integrate with arrays and other things like element extractors.

Great stuff. Give me a shout when you've done that and I'll try to
improve the above implementation. Though if you have time to spare
while writing the SIMD code, you could take a look at it yourself ;)

Thanks,
Alan.

Reference implementations (non buggy ;) ):
public static void FillBuffSafe(uint[] buff)
{
    for (int t = 16; t < buff.Length; t += 4)
    {
        Vector4ui e = new Vector4ui(buff[t - 3], buff[t - 2], buff[t - 1], buff[t - 0]) ^
                      new Vector4ui(buff[t - 8], buff[t - 7], buff[t - 6], buff[t - 5]) ^
                      new Vector4ui(buff[t - 14], buff[t - 13], buff[t - 12], buff[t - 11]) ^
                      new Vector4ui(buff[t - 16], buff[t - 15], buff[t - 14], buff[t - 13]);
        e.W ^= buff[t];

        buff[t] = (e.X << 1) | (e.X >> 31);
        buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
        buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
        buff[t + 3] = ((e.W << 1) | (e.W >> 31)) ^ ((e.X << 2) | (e.X >> 30));
    }
}


public unsafe static void FillBuffUnsafe(uint[] buffb)
{
    fixed (uint* buff = buffb)
    {
        for (int t = 16; t < buffb.Length; t += 4)
        {
            Vector4ui e = *((Vector4ui*)&buff[t - 3]) ^
                          *((Vector4ui*)&buff[t - 8]) ^
                          *((Vector4ui*)&buff[t - 14]) ^
                          *((Vector4ui*)&buff[t - 16]);
            e.W ^= buff[t];

            buff[t] = (e.X << 1) | (e.X >> 31);
            buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
            buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
            buff[t + 3] = ((e.W << 1) | (e.W >> 31)) ^ ((e.X << 2) | (e.X >> 30));
        }
    }
}
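
Why the last lane needs the extra `^ rotl(e.X, 2)` term can be checked independently of Mono.Simd. The sketch below is portable C++ written for this purpose, not code from the thread: `Schedule`, `rotl`, `expand_scalar`, `expand_by_four` and `schedules_agree` are hypothetical names, and a plain array of four words stands in for `Vector4ui`. Lane 3 depends on `w[t]`, which the same batch is about to produce, so the stale value is XORed out first and the missing contribution is added back as `rotl(rotl(e0, 1), 1) = rotl(e0, 2)`.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Schedule = std::array<std::uint32_t, 80>;

// 32-bit rotate left; n must be in 1..31.
inline std::uint32_t rotl(std::uint32_t v, int n) {
    return (v << n) | (v >> (32 - n));
}

// Reference SHA-1 message schedule: one word per iteration.
void expand_scalar(Schedule& w) {
    for (int t = 16; t < 80; ++t)
        w[t] = rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
}

// Four words per iteration, mirroring the structure of the SIMD version.
void expand_by_four(Schedule& w) {
    for (int t = 16; t < 80; t += 4) {
        std::uint32_t e[4];
        for (int lane = 0; lane < 4; ++lane) {
            // Lane 3 reads w[t] here, which still holds a stale value.
            e[lane] = w[t + lane - 3] ^ w[t + lane - 8] ^
                      w[t + lane - 14] ^ w[t + lane - 16];
        }
        e[3] ^= w[t];  // XOR the stale w[t] back out of lane 3

        w[t]     = rotl(e[0], 1);
        w[t + 1] = rotl(e[1], 1);
        w[t + 2] = rotl(e[2], 1);
        // The true lane 3 is rotl(e3 ^ w[t], 1) with the new w[t] = rotl(e0, 1).
        // Rotation distributes over XOR, so that equals rotl(e3, 1) ^ rotl(e0, 2).
        w[t + 3] = rotl(e[3], 1) ^ rotl(e[0], 2);
    }
}

// Fill the first 16 words from a seed and compare both expansions.
bool schedules_agree(std::uint32_t seed) {
    Schedule a{};
    Schedule b{};
    std::uint32_t s = seed;
    for (int i = 0; i < 16; ++i) {
        s = s * 1664525u + 1013904223u;  // simple LCG, only to get varied words
        a[i] = s;
        b[i] = s;
    }
    for (int i = 16; i < 80; ++i)
        b[i] = 0xDEADBEEFu;  // stale contents that must not leak into the result
    expand_scalar(a);
    expand_by_four(b);
    assert(a[0] == b[0]);  // the input words are left untouched
    return a == b;
}
```

A comparison like this against the scalar loop is a cheap way to confirm that all three variants in the benchmark produce the same output before their timings are compared.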



Re: [Mono-dev] Mono.Simd - slower than the normal implementation

2008-11-15 Thread Alan McGovern
Here's my benchmarking file anyway, it may prove useful.

Alan.

On Sun, Nov 16, 2008 at 2:37 AM, Alan McGovern [EMAIL PROTECTED] wrote:
 Hey,

 On Sat, Nov 15, 2008 at 3:50 PM, Rodrigo Kumpera [EMAIL PROTECTED] wrote:
 Hi Alan,
 -Getters and setter are a hint of ill vectorized code.

 In this particular scenario, I'm not sure how i can get rid of the use
 of getters/setters unless I use even more unsafe code. I don't know
 whether it's feasible or not, but it'd be great to be able to use this
 API without having to use unsafe code. At the moment, I don't think
 it's really possible to use this API without getters and setters.

 The last part of your unsafe code should use temps for the intermediate 
 results.
 Do you mean that I should copy the vector 'e', which i got from
 XOR'ing my values, into another Vector4ui using the store operation?
 Then I should do my bitshifting/storing into uint[] from that one?


 -For the safe case we still miss proper integration with arrays, in the form
 of methods to
 extract and store vectors from them.

 I was thinking that the API could expose something like:
 Vector4ui.Create (uint[] array, int offset, ref Vector4ui result)
 which could be changed into:
 result = *((Vector4ui*)array [offset]);

 Though I'm sure you have ideas already on this ;) A similar method for
 storing the result into a uint[] would be great too.


 Your code looks a bit strange, the Vector4ui constructor indexes in
 particular. Have you checked that
 the output of the 3 methods are the same?
 Yes, there is a bug in my implementation there, I left out a bracket
 when setting the value of buff[t+3]. There should be an additional
 bracket around (e.W  1) | (e.W  31). Other than that, the
 implementation is correct. I've pasted the correct implementation of
 the unsafe and safe SIMD versions below. Just for reference purposes.


 I'll work on the Mono.Simd issues next week, getting setters to be
 accelerated, some methods
 to better integrate with arrays and other things like element extractors.

 Great stuff. Give me a shout when you've done that and I'll try to
 improve the above implementation. Though if you have time to spare
 while writing the SIMD code, you could take a look at it yourself ;)

 Thanks,
 Alan.

 Reference implementations (non buggy ;) ):
public static void FillBuffSafe(uint[] buff)
{
for (int t = 16; t  buff.Length; t += 4)
{
Vector4ui e = new Vector4ui(buff[t - 3], buff[t - 2],
 buff[t - 1], buff[t - 0]) ^
  new Vector4ui(buff[t - 8], buff[t - 7],
 buff[t - 6], buff[t - 5]) ^
  new Vector4ui(buff[t - 14], buff[t -
 13], buff[t - 12], buff[t - 11]) ^
  new Vector4ui(buff[t - 16], buff[t -
 15], buff[t - 14], buff[t - 13]);
e.W ^= buff[t];

buff[t] = (e.X  1) | (e.X  31);
buff[t + 1] = (e.Y  1) | (e.Y  31);
buff[t + 2] = (e.Z  1) | (e.Z  31);
buff[t + 3] = ((e.W  1) | (e.W  31)) ^ ((e.X  2)
 | (e.X  30));
}
}


public unsafe static void FillBuffUnsafe(uint[] buffb)
{
fixed (uint* buff = buffb)
{
for (int t = 16; t  buffb.Length; t += 4)
{
Vector4ui e = *((Vector4ui*)buff[t - 3]) ^
  *((Vector4ui*)buff[t - 8]) ^
  *((Vector4ui*)buff[t - 14]) ^
  *((Vector4ui*)buff[t - 16]);
e.W ^= buff[t];

buff[t] = (e.X  1) | (e.X  31);
buff[t + 1] = (e.Y  1) | (e.Y  31);
buff[t + 2] = (e.Z  1) | (e.Z  31);
buff[t + 3] = ((e.W  1) | (e.W  31)) ^ ((e.X
  2) | (e.X  30));
}
}
}


 Rodrigo

 On Sat, Nov 15, 2008 at 12:13 AM, Alan McGovern [EMAIL PROTECTED]
 wrote:

 I found a bit of code in the SHA1 implementation which i thought was
 ideal for SIMD optimisations. However, unless i resort to unsafe code,
 it's actually substantially slower! I've attached three
 implementations of the method here. The original, the safe SIMD and
 the unsafe SIMD. The runtimes are as follows:

 Original: 600ms
 Unsafe Simd: 450ms
 Safe Simd: 1700ms

 Also, the method is always called with a uint[] of length 80.

 Is this just the wrong place to be using simd? It seemed ideal because
 i need 75% less XOR's. If anyone has an ideas on whether SIMD could
 actually be useful for this case or not, let me know.

 Thanks,
 Alan.


 The original code is:

 private static void FillBuff(uint[] buff)
 {
     uint val;
     for (int i = 16; i < 80; i += 8)
     {
         val = buff[i - 3] ^ buff[i - 8] ^ buff[i - 14] ^ buff[i - 16];
         buff[i] = (val << 1) | (val >> 31);

         val = buff[i - 2] ^ buff[i - 7] ^ buff[i - 13] ^ buff[i -
 

[Mono-dev] Mono.Simd - slower than the normal implementation

2008-11-14 Thread Alan McGovern
I found a bit of code in the SHA1 implementation which i thought was
ideal for SIMD optimisations. However, unless i resort to unsafe code,
it's actually substantially slower! I've attached three
implementations of the method here. The original, the safe SIMD and
the unsafe SIMD. The runtimes are as follows:

Original: 600ms
Unsafe Simd: 450ms
Safe Simd: 1700ms

Also, the method is always called with a uint[] of length 80.

Is this just the wrong place to be using SIMD? It seemed ideal because
I need 75% fewer XORs. If anyone has any ideas on whether SIMD could
actually be useful for this case or not, let me know.

Thanks,
Alan.


The original code is:

private static void FillBuff(uint[] buff)
{
    uint val;
    for (int i = 16; i < 80; i += 8)
    {
        val = buff[i - 3] ^ buff[i - 8] ^ buff[i - 14] ^ buff[i - 16];
        buff[i] = (val << 1) | (val >> 31);

        val = buff[i - 2] ^ buff[i - 7] ^ buff[i - 13] ^ buff[i - 15];
        buff[i + 1] = (val << 1) | (val >> 31);

        val = buff[i - 1] ^ buff[i - 6] ^ buff[i - 12] ^ buff[i - 14];
        buff[i + 2] = (val << 1) | (val >> 31);

        val = buff[i + 0] ^ buff[i - 5] ^ buff[i - 11] ^ buff[i - 13];
        buff[i + 3] = (val << 1) | (val >> 31);

        val = buff[i + 1] ^ buff[i - 4] ^ buff[i - 10] ^ buff[i - 12];
        buff[i + 4] = (val << 1) | (val >> 31);

        val = buff[i + 2] ^ buff[i - 3] ^ buff[i - 9] ^ buff[i - 11];
        buff[i + 5] = (val << 1) | (val >> 31);

        val = buff[i + 3] ^ buff[i - 2] ^ buff[i - 8] ^ buff[i - 10];
        buff[i + 6] = (val << 1) | (val >> 31);

        val = buff[i + 4] ^ buff[i - 1] ^ buff[i - 7] ^ buff[i - 9];
        buff[i + 7] = (val << 1) | (val >> 31);
    }
}

The unsafe SIMD code is:
public unsafe static void FillBuff(uint[] buffb)
{
    fixed (uint* buff = buffb) {
        Vector4ui e;
        for (int t = 16; t < buffb.Length; t += 4)
        {
            e = *((Vector4ui*)&(buff [t-16])) ^
                *((Vector4ui*)&(buff [t-14])) ^
                *((Vector4ui*)&(buff [t- 8])) ^
                *((Vector4ui*)&(buff [t- 3]));
            e.W ^= buff[t];

            buff[t] = (e.X << 1) | (e.X >> 31);
            buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
            buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
            // As posted: ^ binds tighter than |, so this parses as a | (b ^ c).
            buff[t + 3] = (e.W << 1) | (e.W >> 31) ^ ((e.X << 2) | (e.X >> 30));
        }
    }
}

The safe simd code is:
public static void FillBuff(uint[] buff)
{
    Vector4ui e;
    for (int t = 16; t < buff.Length; t += 4)
    {
        e = new Vector4ui (buff [t-16], buff [t-15], buff [t-14], buff [t-13]) ^
            new Vector4ui (buff [t-14], buff [t-13], buff [t-12], buff [t-11]) ^
            new Vector4ui (buff [t-8],  buff [t-7],  buff [t-6],  buff [t-5]) ^
            new Vector4ui (buff [t-3],  buff [t-2],  buff [t-1],  buff [t-0]);

        e.W ^= buff[t];
        buff[t] = (e.X << 1) | (e.X >> 31);
        buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
        buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
        // As posted: ^ binds tighter than |, so this parses as a | (b ^ c).
        buff[t + 3] = (e.W << 1) | (e.W >> 31) ^ ((e.X << 2) | (e.X >> 30));
    }
}
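
For readers comparing the three variants, the lane dependency that the
two SIMD versions have to work around can be checked in plain C# without
Mono.Simd. The following is a minimal sketch, not code from the thread:
the class name Sha1Schedule and its helpers are made up for illustration,
and the four-lane loop only emulates the Vector4ui layout with a small
array. It shows why the fourth lane needs an extra rotate-by-two term:
buff[t + 3] depends on buff[t], which is produced in the same iteration.

```csharp
using System;

public static class Sha1Schedule
{
    // Rotate a 32-bit value left by n bits (0 < n < 32).
    static uint RotL(uint v, int n)
    {
        return (v << n) | (v >> (32 - n));
    }

    // Scalar reference: w[t] = rotl1(w[t-3] ^ w[t-8] ^ w[t-14] ^ w[t-16]).
    public static void FillScalar(uint[] w)
    {
        for (int t = 16; t < w.Length; t++)
            w[t] = RotL(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
    }

    // Four words per iteration, mimicking the lane layout of the SIMD code.
    public static void FillFourLanes(uint[] w)
    {
        uint[] e = new uint[4];
        for (int t = 16; t < w.Length; t += 4)
        {
            for (int lane = 0; lane < 4; lane++)
                e[lane] = w[t - 16 + lane] ^ w[t - 14 + lane] ^
                          w[t - 8 + lane] ^ w[t - 3 + lane];

            // Lane 3 read w[t] before it was computed; XOR it out again.
            e[3] ^= w[t];

            w[t] = RotL(e[0], 1);
            w[t + 1] = RotL(e[1], 1);
            w[t + 2] = RotL(e[2], 1);
            // w[t + 3] needs the new w[t], which is rotl1(e[0]); after the
            // outer rotl1 that contribution becomes rotl2(e[0]).
            w[t + 3] = RotL(e[3], 1) ^ RotL(e[0], 2);
        }
    }

    // Returns true when both variants produce the same 80-word schedule.
    public static bool Agree(uint seed)
    {
        uint[] a = new uint[80];
        uint[] b = new uint[80];
        for (int i = 0; i < 16; i++)
            a[i] = b[i] = unchecked(seed * 2654435761u + (uint)i * 40503u);
        FillScalar(a);
        FillFourLanes(b);
        for (int i = 0; i < 80; i++)
            if (a[i] != b[i]) return false;
        return true;
    }
}
```

Sha1Schedule.Agree should hold for any seed, because a rotation
distributes over XOR; the sketch says nothing about the timings quoted
above.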


Re: [Mono-dev] Mono.Simd - slower than the normal implementation

2008-11-14 Thread Alan McGovern
I forgot to mention that I'm on a 1.86 GHz Core 2 Duo and I was running
with --optimize=simd.

Alan.

On Sat, Nov 15, 2008 at 2:13 AM, Alan McGovern [EMAIL PROTECTED] wrote:
 I found a bit of code in the SHA1 implementation which i thought was
 ideal for SIMD optimisations. However, unless i resort to unsafe code,
 it's actually substantially slower! I've attached three
 implementations of the method here. The original, the safe SIMD and
 the unsafe SIMD. The runtimes are as follows:

 Original: 600ms
 Unsafe Simd: 450ms
 Safe Simd: 1700ms

 Also, the method is always called with a uint[] of length 80.

 Is this just the wrong place to be using SIMD? It seemed ideal because
 I need 75% fewer XORs. If anyone has any ideas on whether SIMD could
 actually be useful for this case or not, let me know.

 Thanks,
 Alan.


 The original code is:

 private static void FillBuff(uint[] buff)
 {
     uint val;
     for (int i = 16; i < 80; i += 8)
     {
         val = buff[i - 3] ^ buff[i - 8] ^ buff[i - 14] ^ buff[i - 16];
         buff[i] = (val << 1) | (val >> 31);

         val = buff[i - 2] ^ buff[i - 7] ^ buff[i - 13] ^ buff[i - 15];
         buff[i + 1] = (val << 1) | (val >> 31);

         val = buff[i - 1] ^ buff[i - 6] ^ buff[i - 12] ^ buff[i - 14];
         buff[i + 2] = (val << 1) | (val >> 31);

         val = buff[i + 0] ^ buff[i - 5] ^ buff[i - 11] ^ buff[i - 13];
         buff[i + 3] = (val << 1) | (val >> 31);

         val = buff[i + 1] ^ buff[i - 4] ^ buff[i - 10] ^ buff[i - 12];
         buff[i + 4] = (val << 1) | (val >> 31);

         val = buff[i + 2] ^ buff[i - 3] ^ buff[i - 9] ^ buff[i - 11];
         buff[i + 5] = (val << 1) | (val >> 31);

         val = buff[i + 3] ^ buff[i - 2] ^ buff[i - 8] ^ buff[i - 10];
         buff[i + 6] = (val << 1) | (val >> 31);

         val = buff[i + 4] ^ buff[i - 1] ^ buff[i - 7] ^ buff[i - 9];
         buff[i + 7] = (val << 1) | (val >> 31);
     }
 }

 The unsafe SIMD code is:
 public unsafe static void FillBuff(uint[] buffb)
 {
     fixed (uint* buff = buffb) {
         Vector4ui e;
         for (int t = 16; t < buffb.Length; t += 4)
         {
             e = *((Vector4ui*)&(buff [t-16])) ^
                 *((Vector4ui*)&(buff [t-14])) ^
                 *((Vector4ui*)&(buff [t- 8])) ^
                 *((Vector4ui*)&(buff [t- 3]));
             e.W ^= buff[t];

             buff[t] = (e.X << 1) | (e.X >> 31);
             buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
             buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
             buff[t + 3] = (e.W << 1) | (e.W >> 31) ^ ((e.X << 2) | (e.X >> 30));
         }
     }
 }

 The safe simd code is:
 public static void FillBuff(uint[] buff)
 {
     Vector4ui e;
     for (int t = 16; t < buff.Length; t += 4)
     {
         e = new Vector4ui (buff [t-16], buff [t-15], buff [t-14], buff [t-13]) ^
             new Vector4ui (buff [t-14], buff [t-13], buff [t-12], buff [t-11]) ^
             new Vector4ui (buff [t-8],  buff [t-7],  buff [t-6],  buff [t-5]) ^
             new Vector4ui (buff [t-3],  buff [t-2],  buff [t-1],  buff [t-0]);

         e.W ^= buff[t];
         buff[t] = (e.X << 1) | (e.X >> 31);
         buff[t + 1] = (e.Y << 1) | (e.Y >> 31);
         buff[t + 2] = (e.Z << 1) | (e.Z >> 31);
         buff[t + 3] = (e.W << 1) | (e.W >> 31) ^ ((e.X << 2) | (e.X >> 30));
     }
 }



Re: [Mono-dev] Mono.Simd Acceleration Attributes

2008-11-10 Thread russell.kay
Rodrigo,

 

My only problem with this is that the language is tied to the x86
architecture: when AltiVec or Paired Single etc. are added for PowerPC,
these attributes become nonsensical and will mean nothing to the user.
This would be better done in a static location (rather than spread over
the libraries) and split into a machine-agnostic part (SIMD acceleration
ON) and a machine-specific part (SSE1 - 4.2 active).

 

My 2c


Russell

 



From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Rodrigo
Kumpera
Sent: 07 November 2008 15:15
To: Christophe Guillon
Cc: mono-devel-list@lists.ximian.com
Subject: Re: [Mono-dev] Mono.Simd Acceleration Attributes

 

Hi Christophe,



2008/11/7 Christophe Guillon [EMAIL PROTECTED]

Thank you for the explanation. It confirms my point and it seems that we
agree.

For the user guide aspect:


2) the attributes on the methods are never inspected by the runtime:
they are there to guide the programmers using Mono.Simd in determining
what kind of optimizations are usually available or currently enabled.

If it is indeed just a guide for the user of Mono.Simd, why put it in
the library and couple it to a specific architecture (SSE2 or other)?
The fact that it is an AddWithSaturation on a Vector16b is sufficient
for the semantics. A note in the Mono VM documentation saying that on
SSE2 architectures -O=simd will select the corresponding SSE2 op is then
sufficient. Optionally, a note in the library documentation can say that
Mono should normally catch such calls on SSE2 architectures.


We want to expose this information on the documentation as well and
instead of having to dig this information twice we are planning on
generating this part of the docs.

 



For the choice of the accelerated or not accelerated mode at
runtime:
   static readonly bool use_mono_simd = (SimdRuntime.AccelMode & AccelMode.SSE2) != 0;


  ...
   if (use_mono_simd)
   // simd codepath
   else
   //scalar codepath

If it is actually to overcome a temporary inefficiency due to some
copy, it is IMHO far too intrusive in the user code. Here the user
clearly wrote code that depends on some external context, but instead
of querying the actual VM runtime, or simply a user-defined variable
found in some configuration file of the application, the query is on
the Mono.Simd library itself, while in fact the library itself has no
knowledge of the actual efficiency of the running VM.


There are two good reasons for using this approach. The first one is
that the user requires the best performance in all situations and
wants to know whether their method will be optimized or not.

The second reason happens when there are many different ways to
implement a given function, each one using different instruction sets
and the user wants to have improved performance on newer processors.

For example, there are 3 ways to implement dot product using Mono.Simd:

1) Only using sse1 and sse2 which takes 5 instructions (1 mul, 2 add and
2 shuffle)
2) Using sse3, which takes 3 instructions (1 mul, 2 hadd)
3) Using sse4.2 which takes 1 instruction (dotp) -- sse4.2 still not
supported by Mono.Simd.

For some users having this option is important and this is the main
objective of the runtime query capabilities.





Thus I fully agree with this (which is my point):


Note that we may eventually either return the attribute not
based on the
metadata in the assembly, but based on the runtime
understanding: this
will avoid the need to have an updated Mono.Simd assembly when
new
optimizations are added. Just use the b pattern if you want to
avoid
that issue and remember that you don't usually need to check all
the
methods, but just the ones you actually need to be optimized.

The whole question is whether there is a way to get this information
from the runtime, and by what means.
Is it possible to have attributes attached (or simulated) by the
runtime?


The SimdRuntime.AccelMode property queries the runtime for the supported
instruction sets. You might look at the implementation and get puzzled
by the fact that it returns AccelMode.None, but in fact this is a magic
method that the runtime takes special care of, making sure it returns
the right thing.


Thanks for taking your time looking at the Mono.Simd library :)


Cheers,
Rodrigo



Re: [Mono-dev] Mono.Simd Acceleration Attributes

2008-11-10 Thread Rodrigo Kumpera
Hi Russell,

Our initial goal is to make SIMD instructions available to managed code.
At first we thought about trying to make an instruction-set-agnostic
library, but there are so many quirks and differences between instruction
sets that the result could be too crippled to be usable.

There are quite a few valid use cases for having the whole SSE instruction
set available, and these are what we are targeting now.

But then, this was an analysis based on the fact that no well-known
compiler/runtime exposes such a library (arch-agnostic SIMD); they always
have a binding to a specific platform.

This doesn't mean we just won't do it. Once we have, for example, AltiVec
and VFP supported, if a usable common subset emerges, we'll work on making
it available.

Now back to the Acceleration attribute. It's meant to support not only SSE
but other instruction sets as well; they are not present for the simple
reason that we haven't had the time for it.

Anyway, the attribute right now should be considered an implementation
detail, and if it proves to be a problem in cases such as the one you
describe, we'll change it. Keep in mind that the current design is not
final, but at the same time it's hard to change it based on assumptions.

Thanks for the feedback,
Rodrigo

On Mon, Nov 10, 2008 at 10:04 AM, [EMAIL PROTECTED] wrote:

  Rodrigo,



 My only problem with this is that the language is tied to the x86
 architecture: when AltiVec or Paired Single etc. are added for PowerPC,
 these attributes become nonsensical and will mean nothing to the user.
 This would be better done in a static location (rather than spread over
 the libraries) and split into a machine-agnostic part (SIMD acceleration
 ON) and a machine-specific part (SSE1 - 4.2 active).



 My 2c


 Russell


  --

 *From:* [EMAIL PROTECTED] [mailto:
 [EMAIL PROTECTED] *On Behalf Of *Rodrigo Kumpera
 *Sent:* 07 November 2008 15:15
 *To:* Christophe Guillon
 *Cc:* mono-devel-list@lists.ximian.com
 *Subject:* Re: [Mono-dev] Mono.Simd Acceleration Attributes



 Hi Christophe,

  2008/11/7 Christophe Guillon [EMAIL PROTECTED]

 Thank you for the explanation. It confirms my point and it seems that we
 agree.

 For the user guide aspect:


 2) the attributes on the methods are never inspected by the runtime:
 they are there to guide the programmers using Mono.Simd in determining
 what kind of optimizations are usually available or currently enabled.

 If it is indeed just a guide for the user of Mono.Simd, why put it in
 the library and couple it to a specific architecture (SSE2 or other)?
 The fact that it is an AddWithSaturation on a Vector16b is sufficient
 for the semantics. A note in the Mono VM documentation saying that on
 SSE2 architectures -O=simd will select the corresponding SSE2 op is then
 sufficient. Optionally, a note in the library documentation can say that
 Mono should normally catch such calls on SSE2 architectures.


 We want to expose this information on the documentation as well and instead
 of having to dig this information twice we are planning on generating this
 part of the docs.





 For the choice of the accelerated or not accelerated mode at runtime:
    static readonly bool use_mono_simd = (SimdRuntime.AccelMode & AccelMode.SSE2) != 0;


   ...
if (use_mono_simd)
// simd codepath
else
//scalar codepath

 If it is actually to overcome a temporary inefficiency due to some
 copy, it is IMHO far too intrusive in the user code. Here the user
 clearly wrote code that depends on some external context, but instead
 of querying the actual VM runtime, or simply a user-defined variable
 found in some configuration file of the application, the query is on
 the Mono.Simd library itself, while in fact the library itself has no
 knowledge of the actual efficiency of the running VM.


 There are two good reasons for using this approach. The first one is
 that the user requires the best performance in all situations and
 wants to know whether their method will be optimized or not.

 The second reason happens when there are many different ways to implement a
 given function, each one using different instruction sets and the user wants
 to have improved performance on newer processors.

 For example, there are 3 ways to implement dot product using Mono.Simd:

 1) Only using sse1 and sse2 which takes 5 instructions (1 mul, 2 add and 2
 shuffle)
 2) Using sse3, which takes 3 instructions (1 mul, 2 hadd)
 3) Using sse4.2 which takes 1 instruction (dotp) -- sse4.2 still not
 supported by Mono.Simd.

 For some users having this option is important and this is the main
 objective of the runtime query capabilities.



 Thus I fully agree with this (which is my point):


 Note that we may eventually either return the attribute not based on the
 metadata in the assembly, but based on the runtime understanding: this
 will avoid the need to have an updated Mono.Simd assembly when new

Re: [Mono-dev] Mono.Simd suggestion: Add static members for common values

2008-11-07 Thread Rodrigo Kumpera
Hi John,

Default values are indeed a useful addition. So far we have focused on API
completeness and not so much on making it easier to use. Adding such
helpers is in our plans.

2008/11/6 Hurliman, John [EMAIL PROTECTED]

  I'm in the process of converting over my OpenMetaverseTypes.dll library
 (basic 3D type library) to use Mono.Simd. One thing that is very handy to
 have is static members for common values, such as:



 public static readonly Vector4f Zero = new Vector4f();

 public static readonly Vector4f One = new Vector4f(1f, 1f, 1f, 1f);

 public static readonly Vector4f MinusOne = new Vector4f(-1f, -1f, -1f,
 -1f);



 Which makes comparisons much easier:



 if (myvector4f == Vector4f.Zero) { … }





 -John



Re: [Mono-dev] Mono.Simd Acceleration Attributes

2008-11-07 Thread Paolo Molaro
On 11/07/08 Christophe Guillon wrote:
 It seems that as soon as the Mono.Simd primitives have a well defined
 semantic it is not useful to specify which architecture feature is able to
 emulate each of these primitives. I would have expected this to be the
 choice of the virtual execution environment.

It _is_ ultimately a choice of the runtime.
These attributes are never inspected by the runtime to decide whether to
optimize a method call or not.

 - if my underlying hardware XXX (not SSE2) is able to support efficiently
 add with saturation, I do not have to know whether SSE2 also supports it,
 the virtual machine for XXX can use the corresponding add with saturation
 instruction of XXX at the call sites of AddWithSaturation()   anyway,

When the runtime implements that optimization, the attribute will be
changed to include SSE2 and your architecture (say AltiVec or Neon
etc). Yes, this requires a re-release of Mono.Simd, but it's not a big
deal as the changes will be relatively rare and if you are happy to
use unoptimized Mono.Simd anyway it doesn't matter.

 - if my underlying hardware features SSE2, the attribute is not useful, the
 virtual machine knows the underlying hardware and thus know that a SSE2
 instruction is able to emulate this,

It's useful to the Mono.Simd programmers, the runtime doesn't use it.

 - if the attribute is there to restrict the mapping to only SSE2 (and above)
 machines, it is an important restriction to the usage of the library.
 Imagine as above that I have in the future a hardware support XXX that is
 able to do AddWithSaturation on Vector16b; if I want a virtual machine to
 execute efficiently this primitive on XXX I would first have to modify the
 Mono.Simd library to add the corresponding XXX attribute and modify the
 primitives declaration to account for it.

Nope, this is not correct.
The behaviour is as follows:
1) the runtime will choose whether a method is optimized or not
depending on the optimization flags (-O=simd, on by default) and on
the features of the current processor.
2) the attributes on the methods are never inspected by the runtime:
they are there to guide the programmers using Mono.Simd in determining
what kind of optimizations are usually available or currently enabled.

The reasoning is this: using unoptimized Mono.Simd is currently
significantly slower than the equivalent scalar code. This has mostly to
do with the additional copies that happen because of the operator
overloading. This overhead is expected to decrease as we add more jit
optimizations. So you have two cases:

1) the slowdown is not significant to you (you must test! Run your
program with mono -O=simd and with mono -O=-simd): in this case
you should ignore completely the acceleration attributes and just enjoy
the speedup that the jit will give you when it can optimize the methods.

2) if the slowdown is significant you might want to have two codepaths,
mostly in the same way in C/C++ you have a C implementation and a simd
implementation of the critical functions. Now the question becomes:
how do you choose at runtime if you want to use Mono.Simd or the scalar
codepath? We offer two patterns:

a) do a coarse decision: you take a look at the methods you use in your
algorithms and see that they are optimized when SSE2 is enabled, so you
just do:
	static readonly bool use_mono_simd = (SimdRuntime.AccelMode & AccelMode.SSE2) != 0;
	...
	if (use_mono_simd)
		// simd codepath
	else
		// scalar codepath

b) a fine-grained decision based on all or some of the methods you use:
for each method you check
	(SimdRuntime.MethodAccelerationMode (typeof(...), ...) & SimdRuntime.AccelMode) != 0
until you determine that enough of your methods are accelerated to make
it worth using the Mono.Simd codepath.

Note that we may eventually either return the attribute not based on the
metadata in the assembly, but based on the runtime understanding: this
will avoid the need to have an updated Mono.Simd assembly when new
optimizations are added. Just use the b pattern if you want to avoid
that issue and remember that you don't usually need to check all the
methods, but just the ones you actually need to be optimized.
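
A minimal sketch of the two patterns described above, written so that it
compiles without Mono.Simd: the AccelMode enum below is a local stand-in
(its numeric values are assumptions, not the library's), and the runtime
mask is passed in as a parameter where real code would read
SimdRuntime.AccelMode. The per-method masks in the fine-grained check
stand for whatever SimdRuntime.MethodAccelerationMode reports for each
method; its exact arguments are elided in the message and are not
reproduced here.

```csharp
using System;

namespace DispatchSketch
{
    // Local stand-in for Mono.Simd.AccelMode; the values are assumptions.
    [Flags]
    public enum AccelMode
    {
        None = 0,
        SSE1 = 1,
        SSE2 = 2,
        SSE3 = 4
    }

    public static class Codepath
    {
        // Pattern a: one coarse test against a single instruction set.
        public static bool UseSimdCoarse(AccelMode runtime)
        {
            return (runtime & AccelMode.SSE2) != 0;
        }

        // Pattern b: count how many of the methods we rely on are
        // accelerated on this runtime, and require at least `needed`.
        public static bool UseSimdFine(AccelMode runtime,
                                       AccelMode[] methodModes,
                                       int needed)
        {
            int accelerated = 0;
            foreach (AccelMode mode in methodModes)
                if ((mode & runtime) != 0)
                    accelerated++;
            return accelerated >= needed;
        }

        // Picks the codepath per call; both branches are plain loops here,
        // the "simd" branch only marks where vector code would go.
        public static float Sum(float[] data, AccelMode runtime)
        {
            float total = 0f;
            if (UseSimdCoarse(runtime))
            {
                // simd codepath (placeholder: four elements per step)
                int i = 0;
                for (; i + 4 <= data.Length; i += 4)
                    total += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
                for (; i < data.Length; i++)
                    total += data[i];
            }
            else
            {
                // scalar codepath
                foreach (float f in data)
                    total += f;
            }
            return total;
        }
    }
}
```

In real code the result of the check would normally be cached in a
static readonly field, as in pattern a, so that the test is evaluated
once rather than on every call.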

lupus

-- 
-
[EMAIL PROTECTED] debian/rules
[EMAIL PROTECTED] Monkeys do it better


Re: [Mono-dev] Mono.Simd Acceleration Attributes

2008-11-07 Thread Christophe Guillon
Thank you for the explanation. It confirms my point and it seems that we
agree.

For the user guide aspect:
2) the attributes on the methods are never inspected by the runtime:
they are there to guide the programmers using Mono.Simd in determining
what kind of optimizations are usually available or currently enabled.
If it is indeed just a guide for the user of Mono.Simd, why put it in
the library and couple it to a specific architecture (SSE2 or other)?
The fact that it is an AddWithSaturation on a Vector16b is sufficient
for the semantics. A note in the Mono VM documentation saying that on
SSE2 architectures -O=simd will select the corresponding SSE2 op is then
sufficient. Optionally, a note in the library documentation can say that
Mono should normally catch such calls on SSE2 architectures.

For the choice of the accelerated or not accelerated mode at runtime:
   static readonly bool use_mono_simd = (SimdRuntime.AccelMode & AccelMode.SSE2) != 0;
  ...
   if (use_mono_simd)
   // simd codepath
   else
   //scalar codepath
If it is actually to overcome a temporary inefficiency due to some
copy, it is IMHO far too intrusive in the user code. Here the user
clearly wrote code that depends on some external context, but instead
of querying the actual VM runtime, or simply a user-defined variable
found in some configuration file of the application, the query is on
the Mono.Simd library itself, while in fact the library itself has no
knowledge of the actual efficiency of the running VM.

Thus I fully agree with this (which is my point):
Note that we may eventually either return the attribute not based on the
metadata in the assembly, but based on the runtime understanding: this
will avoid the need to have an updated Mono.Simd assembly when new
optimizations are added. Just use the b pattern if you want to avoid
that issue and remember that you don't usually need to check all the
methods, but just the ones you actually need to be optimized.

The whole question is whether there is a way to get this information
from the runtime, and by what means.
Is it possible to have attributes attached (or simulated) by the runtime?


  -- Christophe

2008/11/7 Paolo Molaro [EMAIL PROTECTED]

 On 11/07/08 Christophe Guillon wrote:
  It seems that as soon as the Mono.Simd primitives have a well defined
  semantic it is not useful to specify which architecture feature is able
 to
  emulate each of these primitives. I would have expected this to be the
  choice of the virtual execution environment.

 It _is_ ultimately a choice of the runtime.
 These attributes are never inspected by the runtime to decide whether to
 optimize a method call or not.

  - if my underlying hardware XXX (not SSE2) is able to support efficiently
  add with saturation, I do not have to know whether SSE2 also supports it,
  the virtual machine for XXX can use the corresponding add with saturation
  instruction of XXX at the call sites of AddWithSaturation()   anyway,

 When the runtime implements that optimization, the attribute will be
 changed to include SSE2 and your architecture (say AltiVec or Neon
 etc). Yes, this requires a re-release of Mono.Simd, but it's not a big
 deal as the changes will be relatively rare and if you are happy to
 use unoptimized Mono.Simd anyway it doesn't matter.

  - if my underlying hardware features SSE2, the attribute is not useful,
 the
  virtual machine knows the underlying hardware and thus know that a SSE2
  instruction is able to emulate this,

 It's useful to the Mono.Simd programmers, the runtime doesn't use it.

  - if the attribute is there to restrict the mapping to only SSE2 (and
 above)
  machines, it is an important restriction to the usage of the library.
  Imagine as above that I have in the future a hardware support XXX that is
  able to do AddWithSaturation on Vector16b; if I want a virtual machine to
  execute efficiently this primitive on XXX I would first have to modify
 the
  Mono.Simd library to add the corresponding XXX attribute and modify the
  primitives declaration to account for it.

 Nope, this is not correct.
 The behaviour is as follows:
 1) the runtime will choose whether a method is optimized or not
 depending on the optimization flags (-O=simd, on by default) and on
 the features of the current processor.
 2) the attributes on the methods are never inspected by the runtime:
 they are there to guide the programmers using Mono.Simd in determining
 what kind of optimizations are usually available or currently enabled.

 The reasoning is this: using unoptimized Mono.Simd is currently
 significantly slower than the equivalent scalar code. This has mostly to
 do with the additional copies that happen because of the operator
 overloading. This overhead is expected to decrease as we add more jit
 optimizations. So you have two cases:

 1) the slowdown is not significant to you (you must test! Run your
 

[Mono-dev] Mono.Simd Acceleration Attributes

2008-11-07 Thread Christophe Guillon
Hi all,
Looking at the proposal for the Mono.Simd primitives, I'm wondering how the
Mono.Simd.Acceleration attributes and the corresponding Mono.Simd.AccelMode
parameters are useful, and what the rationale is for having these attributes
defined and used in the definition of the primitives.

It seems that as soon as the Mono.Simd primitives have a well defined
semantic it is not useful to specify which architecture feature is able to
emulate each of these primitives. I would have expected this to be the
choice of the virtual execution environment.

For instance the add with saturation for the Vector16b type which is defined
as:

[Mono.Simd.Acceleration(Mono.Simd.AccelMode.SSE2)]
public static Vector16b AddWithSaturation (Vector16b va, Vector16b vb)

Well, but:
- if my underlying hardware XXX (not SSE2) is able to support efficiently
add with saturation, I do not have to know whether SSE2 also supports it,
the virtual machine for XXX can use the corresponding add with saturation
instruction of XXX at the call sites of AddWithSaturation()   anyway,
- if my underlying hardware features SSE2, the attribute is not useful, the
virtual machine knows the underlying hardware and thus know that a SSE2
instruction is able to emulate this,
- if the attribute is there to restrict the mapping to only SSE2 (and above)
machines, it is an important restriction to the usage of the library.
Imagine as above that I have in the future a hardware support XXX that is
able to do AddWithSaturation on Vector16b; if I want a virtual machine to
execute efficiently this primitive on XXX I would first have to modify the
Mono.Simd library to add the corresponding XXX attribute and modify the
primitives declaration to account for it.

  -- Christophe


Re: [Mono-dev] Mono.Simd Acceleration Attributes

2008-11-07 Thread Rodrigo Kumpera
Hi Christophe,


2008/11/7 Christophe Guillon [EMAIL PROTECTED]

 Thank you for the explanation. It confirms my point and it seems that we
 agree.

 For the user guide aspect:
 2) the attributes on the methods are never inspected by the runtime:
 they are there to guide the programmers using Mono.Simd in determining
 what kind of optimizations are usually available or currently enabled.
 If it is indeed just a guide for the user of Mono.Simd, why put it in
 the library and couple it to a specific architecture (SSE2 or other)?
 The fact that it is an AddWithSaturation on a Vector16b is sufficient
 for the semantics. A note in the Mono VM documentation saying that on
 SSE2 architectures -O=simd will select the corresponding SSE2 op is then
 sufficient. Optionally, a note in the library documentation can say that
 Mono should normally catch such calls on SSE2 architectures.


We want to expose this information on the documentation as well and instead
of having to dig this information twice we are planning on generating this
part of the docs.





 For the choice of the accelerated or not accelerated mode at runtime:
    static readonly bool use_mono_simd = (SimdRuntime.AccelMode & AccelMode.SSE2) != 0;
   ...
if (use_mono_simd)
// simd codepath
else
//scalar codepath
 If it is actually to overcome a temporary inefficiency due to some
 copy, it is IMHO far too intrusive in the user code. Here the user
 clearly wrote code that depends on some external context, but instead
 of querying the actual VM runtime, or simply a user-defined variable
 found in some configuration file of the application, the query is on
 the Mono.Simd library itself, while in fact the library itself has no
 knowledge of the actual efficiency of the running VM.


There are two good reasons for using this approach. The first one is that
the user requires the best performance in all situations and wants to know
whether their method will be optimized or not.

The second reason happens when there are many different ways to implement a
given function, each one using different instruction sets and the user wants
to have improved performance on newer processors.

For example, there are 3 ways to implement dot product using Mono.Simd:

1) Only using sse1 and sse2 which takes 5 instructions (1 mul, 2 add and 2
shuffle)
2) Using sse3, which takes 3 instructions (1 mul, 2 hadd)
3) Using sse4.2 which takes 1 instruction (dotp) -- sse4.2 still not
supported by Mono.Simd.

For some users having this option is important and this is the main
objective of the runtime query capabilities.
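
To make the instruction counts above concrete, here is a small sketch
that emulates the first two variants on plain float arrays. Nothing in it
is Mono.Simd API: Mul, Add, Shuffle and HAdd are hypothetical helpers
that each stand for one vector instruction, with HAdd following the usual
horizontal-add lane layout (a0+a1, a2+a3, b0+b1, b2+b3).

```csharp
using System;

public static class DotProductSketch
{
    // One lane-wise multiply of two 4-float vectors.
    static float[] Mul(float[] a, float[] b)
    {
        return new float[] { a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3] };
    }

    // One lane-wise add of two 4-float vectors.
    static float[] Add(float[] a, float[] b)
    {
        return new float[] { a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3] };
    }

    // One shuffle: output lane i takes input lane sel[i].
    static float[] Shuffle(float[] v, int x, int y, int z, int w)
    {
        return new float[] { v[x], v[y], v[z], v[w] };
    }

    // One horizontal add: (a0+a1, a2+a3, b0+b1, b2+b3).
    static float[] HAdd(float[] a, float[] b)
    {
        return new float[] { a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3] };
    }

    // Variant 1: 1 mul, 2 shuffle, 2 add.
    public static float DotShuffle(float[] a, float[] b)
    {
        float[] p = Mul(a, b);
        float[] s = Add(p, Shuffle(p, 1, 0, 3, 2)); // (p0+p1, p0+p1, p2+p3, p2+p3)
        s = Add(s, Shuffle(s, 2, 3, 0, 1));         // every lane holds the full sum
        return s[0];
    }

    // Variant 2: 1 mul, 2 hadd.
    public static float DotHAdd(float[] a, float[] b)
    {
        float[] p = Mul(a, b);
        float[] h = HAdd(p, p); // (p0+p1, p2+p3, p0+p1, p2+p3)
        h = HAdd(h, h);         // every lane holds the full sum
        return h[0];
    }
}
```

Both variants leave the dot product in every lane, which is why either
one can be swapped in behind a runtime check without changing the
calling code.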




 Thus I fully agree with this (which is my point):
 Note that we may eventually either return the attribute not based on the
 metadata in the assembly, but based on the runtime understanding: this
 will avoid the need to have an updated Mono.Simd assembly when new
 optimizations are added. Just use the b pattern if you want to avoid
 that issue and remember that you don't usually need to check all the
 methods, but just the ones you actually need to be optimized.

 The whole question is whether there is a way to get this information
 from the runtime, and by what means.
 Is it possible to have attributes attached (or simulated) by the runtime?


The SimdRuntime.AccelMode property queries the runtime for the supported
instruction sets. You might look at the implementation and get puzzled
by the fact that it returns AccelMode.None, but in fact this is a magic
method that the runtime takes special care of, making sure it returns
the right thing.


Thanks for taking the time to look at the Mono.Simd library :)


Cheers,
Rodrigo
___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list


[Mono-dev] Mono.Simd API Suggestions

2008-11-06 Thread Rodrigo Kumpera
Hey Jonathan,

Thanks for taking the time to look at the Mono.Simd API and making some
suggestions but, please, make them on a more visible mailing list such as
mono-devel.


 Just perusing through the Mono.Simd API, and one question (and a few
 other suggestions) occurs to me: Why the non-reliance on method
 overloading?

Right now it is mostly due to implementation details, so there is no real
reason. I'm still playing with the option of using extension methods. They
would have the same benefit as overloading, would reduce typing, and would
not make people mad when changing the underlying type.

It's a matter of choosing between a.UnpackLow (b), Vector2l.UnpackLow (a,b)
and VectorOperations.UnpackLow (a,b). Feedback on this subject is more
than welcome.

Only a small part of the operations is available for all types, and some
are under different instruction sets. This should be enough to make it
pretty confusing for the user.




 On a completely different note (and to start a bikeshed discussion ;-),
 why ShiftRightLogic?  Wouldn't LogicalRightShift be more
 conventional?  We should also avoid abbreviations, so
 SubtractWithSaturation() would be better than SubWithSaturation()...

There is a very reasonable and compelling argument for that. I'm very bad
at naming methods, and only now are people starting to look more deeply at
the names. The shift one is a pretty bad choice indeed.

OTOH, in some cases this might lead to some very, very long method names.

Thanks,
Rodrigo


Re: [Mono-dev] Mono.Simd API Suggestions

2008-11-06 Thread Rodrigo Kumpera
Hi Jonathan,

Answering your other suggestions.

 Other suggestions:

 SimdRuntime.IsMethodAccelerated() and
 SimdRuntime.MethodAccelerationMode() should be overloaded to accept a
 MethodInfo of the desired method, as it can be ~trivial to get a
 MethodInfo in a static, type-checked fashion, e.g.:

    MethodInfo average = ((Func<Vector8us, Vector8us, Vector8us>)
    Vector8us.Average).Method;
    bool b = SimdRuntime.IsMethodAccelerated (average);

Your idea is quite interesting. The MethodInfo overload is kind of useful, but
letting the user just pass a delegate for the proper type is way nicer.
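
To illustrate the two overload shapes being discussed, here is a small sketch.
It is not the Mono.Simd API: Vec, AccelerationQuery and Describe are
hypothetical stand-ins, and Describe only reports which method it was handed
instead of asking the runtime anything. The part that is standard .NET is
Delegate.Method, which yields the MethodInfo behind a delegate.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for a Mono.Simd vector type.
struct Vec
{
    public int X;

    // Rounded average, used here only so there is a static method to look up.
    public static Vec Average(Vec a, Vec b)
    {
        return new Vec { X = (a.X + b.X + 1) / 2 };
    }
}

static class AccelerationQuery
{
    // Hypothetical overload taking a MethodInfo, as suggested in the thread.
    public static string Describe(MethodInfo method)
    {
        return method.DeclaringType.Name + "." + method.Name;
    }

    // Hypothetical overload taking a delegate; Delegate.Method supplies the
    // MethodInfo, so the caller never touches reflection or string names.
    public static string Describe(Delegate d)
    {
        return Describe(d.Method);
    }
}

static class AccelerationQueryDemo
{
    static void Main()
    {
        // Type-checked: a typo in the method name fails at compile time.
        Func<Vec, Vec, Vec> average = Vec.Average;
        Console.WriteLine(AccelerationQuery.Describe(average));

        // String-based lookup: a typo here only fails at run time.
        MethodInfo viaReflection = typeof(Vec).GetMethod("Average");
        Console.WriteLine(AccelerationQuery.Describe(viaReflection));
    }
}
```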

 Even better, this is _faster_ than typeof(Vector8us).GetMethod
 ("Average").  (Not by a lot, but faster nonetheless.)

Well, speed is irrelevant in this case, as this is a startup-time check and
it doesn't make sense to use it during normal execution.


 AccelerationAttribute should be AcceleratedOnAttribute, and AccelMode
 should be InstructionSet.  I think this would make for more readable
 documentation prototypes:

Makes sense.


 Finally (for now), parameter names should be more consistent.  On some
 methods the arguments are (va, vb) (e.g. Vector8us.Average()), while on
 others they're (v1, v2) (e.g. Vector2d.InterleaveHigh()).  I don't care
 which we choose, but we should stick to something and use it
 consistently (`a` and `b`, perhaps?).

There is little consistency in this, and sticking to a single naming scheme
will become relevant once C# 4.0 is released.
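
The C# 4.0 remark presumably refers to named arguments, which let callers
spell out parameter names and so make those names part of the public API. A
tiny sketch, using a hypothetical Average method that is not taken from
Mono.Simd:

```csharp
using System;

static class NamedArgumentDemo
{
    // Hypothetical method; only the parameter names matter here.
    public static int Average(int a, int b)
    {
        return (a + b + 1) / 2;
    }

    static void Main()
    {
        // The names 'a' and 'b' now appear in caller code, so renaming the
        // parameters later (say to 'va' and 'vb') would break this call.
        Console.WriteLine(Average(a: 3, b: 6));
    }
}
```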


[Mono-dev] Mono.Simd suggestion: Add static members for common values

2008-11-06 Thread Hurliman, John
I'm in the process of converting over my OpenMetaverseTypes.dll library (basic 
3D type library) to use Mono.Simd. One thing that is very handy to have is 
static members for common values, such as:

public static readonly Vector4f Zero = new Vector4f();
public static readonly Vector4f One = new Vector4f(1f, 1f, 1f, 1f);
public static readonly Vector4f MinusOne = new Vector4f(-1f, -1f, -1f, -1f);

Which makes comparisons much easier:

if (myvector4f == Vector4f.Zero) { ... }


-John


Re: [Mono-dev] Mono.Simd API Suggestions

2008-11-06 Thread Jonathan Pryor
On Thu, 2008-11-06 at 12:04 -0200, Rodrigo Kumpera wrote:
 Thanks for taking the time to look at the Mono.Simd API and making some
 suggestions but, please, make them on a more visible mailing list
 such as mono-devel.

Because I'm an idiot who saw mono-d... and assumed it was
mono-devel-list.  My bad...

 Just perusing through the Mono.Simd API, and one question (and a few
 other suggestions) occurs to me: Why the non-reliance on method
 overloading?
...
 It's a matter of choosing between a.UnpackLow (b), Vector2l.UnpackLow
 (a,b) and VectorOperations.UnpackLow (a,b). Feedback on this
 subject is more than welcome.

I'm not even sure what UnpackLow() does (and reading the source only
helps a little...).

Regardless, I don't like Vector2l.UnpackLow(a,b), and would prefer the
first or last options.

That said, I'm not sure which is better between static method syntax and
instance method syntax.  The advantage to instance method syntax is that
you can readily find the method through the IDE (code completion ftw!),
and should likely be preferred for that reason alone.

On the other hand, an instance method _may_ imply that the instance
variable will be modified by the method call, which is not the case
for .UnpackLow().  (Then again, this implication is already bogus; see
System.String instance methods...)

So for usability/findability, I'd suggest the instance method syntax
(even if it's really done via extension methods).

 Only a small part of the operations are available for all types and
 some are  under different instruction sets. This should be enough to
 make it pretty confusing for the user.

I'm not sure about that, but it is something to keep in mind.  So the
question remains, which is easier for the user to understand:

  - Instance methods, documented in the relevant type.
      Pro: You know which operations are available specifically for a
           given type.
      Con: You can't easily see which other types support the same
           operation.  This may not be relevant at all; I don't know.

  - Static methods on e.g. a VectorOperations type.
      Pro: You can easily determine which operations are available across
           numerous types.
      Con: You can't tell from the type's documentation which operations
           are supported.

  - Extension methods.
      Pro: Methods are referenced from the type documentation and, since
           extension methods can be in the same extension class, they can
           also be listed as overloads.  This easily allows determining
           which operations are common across types AND which operations
           are supported on a specific type from that type's
           documentation.
      Con: Requires C# 3.0.  (Is this really a con?)

From that breakdown, it looks like extension methods are best. :-)
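
A rough sketch of the extension-method option, for concreteness. Long2 and
Int4 are hypothetical stand-ins for the real vector types, and the bodies are
plain scalar code that assumes UnpackLow interleaves the low elements of its
two operands; only the VectorOperations class name comes from this thread.

```csharp
using System;

// Hypothetical stand-ins for two Mono.Simd vector types.
struct Long2
{
    public long X, Y;
    public Long2(long x, long y) { X = x; Y = y; }
}

struct Int4
{
    public int X, Y, Z, W;
    public Int4(int x, int y, int z, int w) { X = x; Y = y; Z = z; W = w; }
}

// One extension class holds the overloads for every type, so documentation
// can list them together while each call site still reads a.UnpackLow (b).
static class VectorOperations
{
    // Two-element case: low element of a, then low element of b.
    public static Long2 UnpackLow(this Long2 a, Long2 b)
    {
        return new Long2(a.X, b.X);
    }

    // Four-element case: interleave the low halves of a and b.
    public static Int4 UnpackLow(this Int4 a, Int4 b)
    {
        return new Int4(a.X, b.X, a.Y, b.Y);
    }
}

static class ExtensionDemo
{
    static void Main()
    {
        Int4 a = new Int4(1, 2, 3, 4);
        Int4 b = new Int4(5, 6, 7, 8);

        Int4 viaInstanceSyntax = a.UnpackLow(b);                 // found via code completion
        Int4 viaStaticSyntax = VectorOperations.UnpackLow(a, b); // same method, static call

        Console.WriteLine(viaInstanceSyntax.Y == viaStaticSyntax.Y);
    }
}
```

Neither call modifies a or b; the result is a new value, which matches the
System.String analogy above.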

 On a completely different note (and to start a bikeshed discussion ;-),
 why ShiftRightLogic?  Wouldn't LogicalRightShift be more
 conventional?  We should also avoid abbreviations, so
 ...
 OTOH, in some cases this might lead to some very, very long method names.

Even vim supports code completion [0], so I don't consider this to be a
significant problem...

 - Jon

[0] Ctrl+N/Ctrl+P will complete words already present within the current
buffer and also use any words found in a `ctags` file, if available.

