This is a pretty interesting look at automatic compiler parallelization. Using a
mostly-automatic technique, they closed the performance gap between serial and
hand-coded parallel implementations from 50x to 1.2x.

http://www.intel.com/content/dam/www/public/us/en/documents/technology-briefs/intel-labs-closing-ninja-gap-paper.pdf

A technique worth noting here is a strategy where they block data structures to
fit in the cache by converting an array-of-structures into a structure-of-arrays.
This reminds me of column-oriented databases, which in turn reminds me of
research showing that merely using a column-oriented layout within local blocks
provides most of the benefits of a fully column-oriented layout without many of
the drawbacks.

http://www.vldb.org/conf/2002/S12P03.pdf
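
To make the array-of-structures to structure-of-arrays idea concrete, here's a
minimal C sketch. It's my own illustration, not code from the paper; the
particle example and names are hypothetical, but the layout change is the one
described: each field gets its own contiguous array, so a loop that touches one
field only pulls that field's cache lines and can use unit-stride vector loads.

#include <stddef.h>

#define N 1024

/* Array-of-structures: the fields of one element are adjacent in memory,
 * so a loop that only reads x also drags y and z through the cache. */
struct particle_aos {
    float x, y, z;
};
struct particle_aos particles_aos[N];

/* Structure-of-arrays: each field is a contiguous array, so a loop over x
 * touches only x's cache lines and vectorizes with unit-stride loads. */
struct particles_soa {
    float x[N];
    float y[N];
    float z[N];
};
struct particles_soa particles_soa;

/* The same computation in both layouts: shift every x coordinate. */
void shift_x_aos(float dx) {
    for (size_t i = 0; i < N; i++)
        particles_aos[i].x += dx;
}

void shift_x_soa(float dx) {
    for (size_t i = 0; i < N; i++)
        particles_soa.x[i] += dx;
}

The analogy to the block-local column layout in the VLDB paper would be doing
this conversion per cache-sized block rather than over the whole array, so
complete records stay close together while accesses within a block are still
column-wise.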