[ https://issues.apache.org/jira/browse/SYSTEMML-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthias Boehm updated SYSTEMML-1752:
-------------------------------------
Description:
The fused mmchain matrix multiply for patterns such as {{t(X) %*% (w * (X %*% v))}} uses row-wise {{dotProduct}} and {{vectMultAdd}} operations, which works very well for the common case of tall & skinny matrices, where individual rows fit into the L1 cache. However, for graph and text scenarios with wide matrices, this leads to cache thrashing on the input and output vectors.

This task aims to generalize these dense and sparse operations to perform the computation in a cache-conscious manner when necessary, by accessing fragments of the input and output vectors for groups of rows. For dense inputs this is trivial to realize, while for sparse inputs it requires a careful determination of the block sizes according to the input sparsity.


> Cache-conscious mmchain matrix multiply for wide matrices
> ---------------------------------------------------------
>
>                 Key: SYSTEMML-1752
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-1752
>             Project: SystemML
>          Issue Type: Task
>            Reporter: Matthias Boehm
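For illustration, below is a minimal sketch of the proposed cache-conscious blocking for the dense case. The class and method names ({{MMChainBlockedSketch}}, {{mmChainDenseBlocked}}) and the block-size parameters are hypothetical and not part of the actual SystemML {{LibMatrixMult}} code; the sketch only shows the idea of accessing fragments of the input and output vectors for groups of rows, so the working set stays cache-resident.

{code:java}
import java.util.Arrays;

/**
 * Hypothetical sketch (not the actual SystemML implementation): computes
 * out = t(X) %*% (w * (X %*% v)) for a dense X in a cache-conscious way.
 * Instead of finishing one full row at a time, it processes column blocks
 * of v and out across a group of rows, so each fragment stays in L1 cache.
 */
public class MMChainBlockedSketch {
  public static double[] mmChainDenseBlocked(double[][] X, double[] v,
      double[] w, int rowGroup, int blockSize) {
    int n = X.length, m = v.length;      // X is n x m, v has length m, w length n
    double[] out = new double[m];
    double[] tmp = new double[rowGroup]; // intermediates of w * (X %*% v) per row group
    for (int ri = 0; ri < n; ri += rowGroup) {
      int rmax = Math.min(ri + rowGroup, n);
      Arrays.fill(tmp, 0, rmax - ri, 0);
      // pass 1: blocked dot products, reusing the fragment v[bj..jmax)
      // across all rows of the group before moving to the next fragment
      for (int bj = 0; bj < m; bj += blockSize) {
        int jmax = Math.min(bj + blockSize, m);
        for (int i = ri; i < rmax; i++) {
          double sum = 0;
          for (int j = bj; j < jmax; j++)
            sum += X[i][j] * v[j];
          tmp[i - ri] += sum;
        }
      }
      // scale the per-row intermediates by w
      for (int i = ri; i < rmax; i++)
        tmp[i - ri] *= w[i];
      // pass 2: blocked vectMultAdd, reusing the fragment out[bj..jmax)
      for (int bj = 0; bj < m; bj += blockSize) {
        int jmax = Math.min(bj + blockSize, m);
        for (int i = ri; i < rmax; i++) {
          double wi = tmp[i - ri];
          for (int j = bj; j < jmax; j++)
            out[j] += wi * X[i][j];
        }
      }
    }
    return out;
  }
}
{code}

Note that when m is small enough that a single block covers the whole row, the sketch degenerates to the existing row-wise {{dotProduct}} / {{vectMultAdd}} path, which matches the tall & skinny case; a sparse variant would additionally need to derive the block size from the expected number of non-zeros per row, as stated in the description.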