I often use Julia's multicore features with pmap and @parallel for loops.  So 
is the best way to achieve this to split the array up into parts, one per 
core, and then run @simd loops on each parallel process?  Will there ever be 
a time when you can add a tag like @simd that has the compiler do this 
automatically, the way it already happens for BLAS functions?
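
For concreteness, here is a rough sketch of the split-then-SIMD pattern I 
mean, assuming Julia was started with julia -p N and the data lives in a 
SharedArray so the workers can read it without copying (simd_sum_range and 
parallel_simd_sum are just names made up for the example):

@everywhere function simd_sum_range(A, r)
    # SIMD-vectorized reduction over one contiguous chunk of indices
    s = zero(eltype(A))
    @simd for i in r
        @inbounds s += A[i]
    end
    s
end

function parallel_simd_sum(A::SharedArray)
    n, p = length(A), nworkers()
    # split 1:n into one contiguous chunk per worker process
    chunks = [(div((k - 1) * n, p) + 1):div(k * n, p) for k in 1:p]
    # each worker runs the @simd loop on its chunk; combine the partial sums
    sum(pmap(r -> simd_sum_range(A, r), chunks))
end

So with something like A = SharedArray(Float64, 10^8) filled with data, 
parallel_simd_sum(A) should vectorize within each core and spread the 
chunks across cores.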

On Saturday, April 16, 2016 at 3:26:22 AM UTC-6, Valentin Churavy wrote:
>
> BLAS uses a combination of SIMD and multi-core processing. Multi-core 
> (threading) is coming in Julia v0.5 as an experimental feature. 
>
> On Saturday, 16 April 2016 14:13:00 UTC+9, Jason Eckstein wrote:
>>
>> I noticed that in Julia 0.4, if you call A+B where A and B are matrices 
>> of equal size, the LLVM code shows vectorization, indicating it is 
>> equivalent to writing my own function with an @simd-tagged for loop.  I 
>> still notice, though, that it uses a single core to maximum capacity but 
>> never spreads an SIMD loop out over multiple cores.  In contrast, if I 
>> use BLAS functions like gemm!, or even just A*B, it will use every core 
>> of the processor.  I'm not sure whether these linear algebra operations 
>> also use SIMD vectorization, but I imagine they do since BLAS is highly 
>> optimized.  Is there a way to write an SIMD loop that spreads the data 
>> out across all processor cores, not just the multiple functional units 
>> of a single core?
>>
>
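
As a follow-up on the experimental threading mentioned above: in the v0.5 
development builds the same chunk-per-core idea can be written with 
Threads.@threads instead of worker processes.  A rough sketch, assuming 
Julia was started with the JULIA_NUM_THREADS environment variable set 
(threaded_simd_sum is a made-up name):

function threaded_simd_sum(A)
    p = Threads.nthreads()
    partials = zeros(eltype(A), p)
    # one chunk of 1:n per thread; each thread runs its own @simd loop
    Threads.@threads for k in 1:p
        n = length(A)
        lo, hi = div((k - 1) * n, p) + 1, div(k * n, p)
        s = zero(eltype(A))
        @simd for i in lo:hi
            @inbounds s += A[i]
        end
        partials[k] = s  # each thread writes only its own slot, so no race
    end
    sum(partials)
end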
