This is a pretty interesting look at automatic compiler parallelization. Through a mostly-automatic technique, they closed the performance gap between serial and hand-coded parallel implementations from 50x to 1.2x.
http://www.intel.com/content/dam/www/public/us/en/documents/technology-briefs/intel-labs-closing-ninja-gap-paper.pdf

A technique worth noting here is a strategy where they block data structures to fit in the cache by converting an array-of-structures into a structure-of-arrays. This reminds me of column-oriented databases, which in turn reminds me of research showing that merely using column-oriented local block layouts provides most of the benefits of a fully column-oriented layout without many of the drawbacks:
http://www.vldb.org/conf/2002/S12P03.pdf
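
For anyone unfamiliar with the transformation, here is a minimal sketch (not taken from the Intel paper; all names such as ParticleAoS, ParticleBlock, BLOCK, and N are invented for illustration). It contrasts an array-of-structures layout with a structure-of-arrays layout, plus a blocked hybrid that keeps each "column" short enough to stay cache-resident, which is the closest analogue to the column-oriented local block layout mentioned above:

/* Sketch only: AoS vs. SoA vs. a blocked hybrid (sometimes called AoSoA). */
#include <stddef.h>
#include <stdio.h>

#define N     1024          /* number of elements (illustrative)      */
#define BLOCK 64            /* elements per cache-friendly block      */

/* AoS: fields of each element are interleaved in memory, so a loop
 * that touches only `x` still drags `y` and `z` through the cache.   */
struct ParticleAoS { float x, y, z; };

/* SoA: each field is one long contiguous "column"; good for
 * vectorizing a loop over a single field.                            */
struct ParticlesSoA { float x[N], y[N], z[N]; };

/* Blocked hybrid: short SoA columns of BLOCK elements inside each
 * block, blocks laid out one after another. Most of the SoA benefit
 * while keeping all fields of a block close together, analogous to a
 * column-oriented layout within a database page.                     */
struct ParticleBlock { float x[BLOCK], y[BLOCK], z[BLOCK]; };

static float sum_x_aos(const struct ParticleAoS *p, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += p[i].x;                              /* strided access   */
    return s;
}

static float sum_x_blocked(const struct ParticleBlock *b, size_t nblocks) {
    float s = 0.0f;
    for (size_t j = 0; j < nblocks; j++)
        for (size_t i = 0; i < BLOCK; i++)
            s += b[j].x[i];                       /* unit-stride runs */
    return s;
}

int main(void) {
    static struct ParticleAoS  aos[N];
    static struct ParticleBlock blocks[N / BLOCK];
    for (size_t i = 0; i < N; i++) {
        aos[i].x = (float)i;
        blocks[i / BLOCK].x[i % BLOCK] = (float)i;
    }
    printf("%f %f\n", sum_x_aos(aos, N), sum_x_blocked(blocks, N / BLOCK));
    return 0;
}

The point of the blocked form is that a compiler (or programmer) can vectorize the inner unit-stride loops just as with full SoA, while the working set per block stays small and locally contiguous.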
