[ 
https://issues.apache.org/jira/browse/SPARK-16820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15401039#comment-15401039
 ] 

Ohad Raviv commented on SPARK-16820:
------------------------------------

I will create a PR soon with a suggested fix; in the meantime, let me know what 
you think about it.
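To make the direction concrete, here is a small sketch of the idea behind the fix, illustrated with SciPy rather than Spark's actual Scala/Breeze code (the matrix sizes and density are hypothetical): multiply the sparse blocks directly instead of converting them to dense first, which addresses point 3 of the issue below.

```python
# Illustration only (SciPy, not Spark/Breeze): multiply two sparse blocks
# directly instead of densifying them first.
import numpy as np
from scipy.sparse import random as sparse_random

# Two hypothetical 1024x1024 blocks with ~0.1% nonzeros.
a = sparse_random(1024, 1024, density=0.001, format="csr", random_state=0)
b = sparse_random(1024, 1024, density=0.001, format="csc", random_state=1)

# Current approach: densify, then multiply (memory-heavy for large blocks).
dense_product = a.toarray() @ b.toarray()

# Sparse-sparse multiply: same result, without materializing dense inputs.
sparse_product = (a @ b).toarray()

assert np.allclose(dense_product, sparse_product)
```

The products agree; the difference is that the sparse path never allocates the two dense input blocks, which is what blows up at large block sizes.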

> Sparse - Sparse matrix multiplication
> -------------------------------------
>
>                 Key: SPARK-16820
>                 URL: https://issues.apache.org/jira/browse/SPARK-16820
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.0.0
>            Reporter: Ohad Raviv
>
> While working on MCL implementation on Spark we have encountered some 
> difficulties.
> The main part of this process is distributed sparse matrix multiplication 
> that has two main steps:
> 1.    Simulate multiply – preparation before the real multiplication in order 
> to see which blocks should be multiplied.
> 2.    The actual blocks multiplication and summation.
> In our case the sparse matrix has 50M rows and columns, and 2B non-zeros.
> The current multiplication suffers from these issues:
> 1.    A relatively trivial bug in the first step that caused the process to 
> be very slow; it has already been fixed [SPARK-16469].
> 2.    Even after the bug fix, if there are too many blocks, the simulate 
> multiply step takes a very long time and multiplies the data many times 
> (O(n^3), where n is the number of blocks).
> 3.    Spark supports multiplication only with dense matrices. Thus, it 
> converts a sparse matrix into a dense matrix before the multiplication.
> 4.    For summing the intermediate block results, Spark uses Breeze’s CSC 
> matrix operations – here the problem is that updating a zero value (i.e. 
> inserting a new nonzero) into a CSC matrix is very inefficient.
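A quick illustration of point 4 (again in SciPy rather than Breeze, so the exact APIs differ, but CSC storage behaves the same way): writing into a position that is currently zero forces the CSC index/data arrays to be restructured, so each single-element update costs O(nnz); accumulating triplets COO-style and converting once avoids that.

```python
# Illustration: per-element updates into CSC are expensive (SciPy even warns
# about it), while accumulating (row, col, value) triplets and converting
# once is cheap. Sizes here are hypothetical.
import warnings
from scipy.sparse import csc_matrix, coo_matrix

n = 1000
m = csc_matrix((n, n))
with warnings.catch_warnings():
    # SciPy raises SparseEfficiencyWarning for exactly this access pattern.
    warnings.simplefilter("ignore")
    for i in range(100):
        m[i, i] = 1.0  # each assignment restructures the CSC arrays

# COO-style accumulation: collect triplets, convert to CSC once at the end.
rows, cols, vals = zip(*[(i, i, 1.0) for i in range(100)])
m2 = coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsc()

assert (m != m2).nnz == 0  # identical contents, very different update cost
```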
> That means that with many blocks (the default block size is 1024) – in our 
> case 50M/1024 ~= 50K blocks per dimension – the simulate multiply will 
> effectively never finish, or will generate 50K*16GB ~= 1000TB of data. On 
> the other hand, if we use a bigger block size, e.g. 100K, we get an 
> OutOfMemoryException in the “toDense” method of the multiply. We have worked 
> around this by implementing both the sparse multiplication and the addition 
> ourselves in a very naïve way – but at least it works.
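For the simulate-multiply step (point 2), the key observation is that only block pairs A[i,k] and B[k,j] that are both nonzero can contribute to output block (i,j). A minimal sketch of that pairing over block coordinates (the coordinates here are made up for illustration; this is not the actual Spark implementation):

```python
# Sketch of a "simulate multiply" over nonzero block coordinates only:
# group B's blocks by their block-row index k, then for each nonzero block
# A[i, k] look up only the matching blocks B[k, j]. This avoids enumerating
# all O(n^3) (i, k, j) combinations when most blocks are empty.
from collections import defaultdict

a_blocks = {(0, 0), (0, 2), (1, 1)}  # hypothetical nonzero block coords of A
b_blocks = {(0, 1), (2, 0), (1, 1)}  # hypothetical nonzero block coords of B

b_by_row = defaultdict(list)         # k -> [j] with B[k, j] nonzero
for k, j in b_blocks:
    b_by_row[k].append(j)

contributions = defaultdict(list)    # output block (i, j) -> contributing k's
for i, k in a_blocks:
    for j in b_by_row[k]:
        contributions[(i, j)].append(k)

# A[0,0]*B[0,1] -> (0,1); A[0,2]*B[2,0] -> (0,0); A[1,1]*B[1,1] -> (1,1)
assert set(contributions) == {(0, 1), (0, 0), (1, 1)}
```

The work is proportional to the number of matching nonzero block pairs rather than to the cube of the number of blocks, which is what makes the 50K-blocks-per-dimension case feasible.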



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
