I would even say server-side UTF-8 to UCS-2/UTF-16 conversion, HTML decompression, and XML parsing can now be done in SIMD. That's my key point: SIMD has crossed into general-purpose programming, and auto-vectorization is almost impossible on those algorithms.

It's just a pain to write SIMD now, so languages that make it easier to express problems in a form that admits SIMD, in a platform-independent way, are very attractive (and once code is in that form, some of it can also be scheduled easily and safely in parallel). Consider how clever these algorithms are, and how much could be done in a language that made it easier to express the intention. Maybe the advantages are so great that in the future many stream/tree algorithms will be replaced by array algorithms built for SIMD. (I can work out the atoi and matrix ones myself, but the UTF and XML parsing decoders surprised me; I was always told flow control was the bad case, so I never considered that the sheer amount of data processed would simply wipe out the cost of the conditionals.)

The 512-bit SIMD CPUs being released now will increase the performance of those algorithms by close to 100% (or 400% compared to the common 128-bit SIMD), in addition to the other benefits of the CPU. As we use more SIMD (which we are), Intel and AMD will add 1024/2048/4096-bit widths; AVX-1024 is designed and specified already. Maybe we get safe parallelism through single-threaded SIMD (or SIMD combined with cores) instead of cores alone; it all comes down to being able to express the intent of the code.

This (more SIMD) is a likely path for the future, and non-native, intrinsic-style algorithms will be left behind (due to marshalling), meaning C or ASM wrapped libs for higher-level languages (and if C#/Java did not get SIMD memcmp/memcpy from libc, they would be much further behind in benchmarks). Sure, I'm exaggerating for effect, but consider how hard it is to write these SIMD algorithms now, and yet they dominate and are growing rapidly (the most common compression and video compression libs, etc.). Also note that these already highly performing libs often use only 128-bit SIMD, because of platform support and the need for different intrinsics/algorithms per width; a JIT-style approach would be of great use here.

I also agree on the higher-level factors... (in fact, due to concurrency contention and cache efficiency, not to mention simplicity, which allows better tuning and testing, I have gone back to a single-threaded approach for most high-speed code).

I think a language like C# that adds:
- Region analysis
- Easier expression of SIMD in a portable way (like VecImp), with support for 256/512/1024-bit code generation and non-temporal hints
- Fixed-size arrays without requiring unsafe code

has a compelling argument... it needs nothing more; single-threaded is fine. Though if it is to be used by native libs, you need some mechanism for native code to call into the managed code (easy on Windows via WinRT, harder on Linux). You could even modify the Mono C# compiler, except for the key issue of generating SIMD code, which needs either static code generation or a new runtime (an extension of CIL plus changes to the JIT).

Things like:
- JIT vs static compilation
- Immutable types
- Const / deep clone
- Type classes
- Match / functional syntax, tail calls
- Concise syntax

are secondary / nice to have, but the earlier list is enough.

Ben



On Sat, Oct 12, 2013 at 3:26 AM, david j <[email protected]> wrote:

> Thanks for sending this Bennie..
>
> It's easy to get caught up in surface debates about gc-pauses or the
> cost-of-malloc, but when dealing with many computation or client-side
> applications, these higher level factors can dominate performance instead
> (SIMD, concurrency overhead, talking to kernel/hardware apis, memory layout
> and cache efficiency).
>
>
>
> _______________________________________________
> bitc-dev mailing list
> [email protected]
> http://www.coyotos.org/mailman/listinfo/bitc-dev
>
>
