I would go even further: server-side UTF-8 to UCS-2/UTF-16 conversion, HTML decompression, and XML parsing can all now be done in SIMD. That's my key point: SIMD has crossed into general-purpose programming, and auto-vectorization is almost impossible on those algorithms.
It's just a pain to write SIMD now, so languages that make it easier to express problems in a way that allows SIMD in a platform-independent manner are very attractive (and once code is in this form, some of it can also be scheduled easily and safely in parallel). Consider how clever these algorithms are and how much more could be done in a language that made it easier to express the intent; maybe the advantages are so great that in the future many stream/tree algorithms will be replaced by array algorithms suited to SIMD. (I can work out the atoi and matrix ones, but the UTF and XML parsing decoders surprised me; I was always told flow control was a bad case for SIMD, so I never considered that the sheer amount of data processed would simply wipe out the cost of the conditionals.)

The 512-bit SIMD CPUs being released now will improve the performance of those algorithms by close to 100% over 256-bit SIMD (roughly 4x the common 128-bit SIMD), in addition to the other benefits of those CPUs. As we use more SIMD (which we are), Intel and AMD will add 1024/2048/4096-bit variants; AVX-1024 is reportedly designed and specified already. Maybe safe parallelism will come through single-threaded SIMD (or SIMD combined with cores) rather than cores alone; it all comes down to being able to express the intent of the code.

This (more SIMD) is a likely path for the future, and non-native, intrinsic-style algorithms will be left behind (due to marshalling overhead), meaning C or ASM wrapped libs for higher-level languages (and if C#/Java did not get SIMD memcmp/memcpy from libc, they would be much further behind in benchmarks). Sure, I'm exaggerating for effect, but consider how hard these SIMD algorithms are to write now, and yet they dominate and are growing rapidly (most common compression and video-compression libs, etc.). Also note that these already highly performing libs often only use 128-bit SIMD, because of platform support and the need for different intrinsics/algorithms per target; a JIT-style approach would be of great use here.
Also agree on the higher-level factors. (In fact, due to concurrency contention, cache efficiency, and not least simplicity (which allows better tuning/testing), I have gone back to a one-thread approach for most high-speed code.)

I think a language like C# that adds:
- region analysis
- easier expression of SIMD in a portable way (like VecImp), with support for 256/512/1024-bit generation and non-temporal hints
- fixed arrays outside of unsafe code

has a compelling argument... it needs nothing more; single-threaded is fine. Though if it is to be used by native libs, you need some mechanism for native code to call the managed code (easy on Windows (WinRT), harder on Linux). One could even modify the Mono C# compiler, except for the key issue of generating SIMD code, which needs either static compilation OR a new runtime (an extension of CIL and changes to the JIT).

Things like:
- JIT vs static compilation
- immutable types
- const / deep clone
- type classes
- match / functional syntax, tail calls
- concise syntax

are secondary / nice to have, but the earlier list is enough.

Ben

On Sat, Oct 12, 2013 at 3:26 AM, david j <[email protected]> wrote:
> Thanks for sending this Bennie..
>
> It's easy to get caught up in surface debates about gc-pauses or the
> cost-of-malloc, but when dealing with many computation or client-side
> applications, these higher level factors can dominate performance instead
> (SIMD, concurrency overhead, talking to kernel/hardware apis, memory layout
> and cache efficiency).
>
>
>
> _______________________________________________
> bitc-dev mailing list
> [email protected]
> http://www.coyotos.org/mailman/listinfo/bitc-dev
>
