[ 
https://issues.apache.org/jira/browse/SYSTEMML-831?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15525147#comment-15525147
 ] 

Matthias Boehm edited comment on SYSTEMML-831 at 9/27/16 5:46 AM:
------------------------------------------------------------------

ok, I had a closer look and here is the high-level summary: the problematic 
expression is indeed {{X %*% t(X)}} (in function {{distance_matrix}}). Given 
the 60Kx784 input, the output matrix is 60Kx60K, i.e., roughly 29GB in dense 
format. The size itself is not problematic, but because the input is tiny, we 
only get 2 input partitions (and hence 2 output partitions), which gives us 
about 15GB per partition and exceeds Spark's 2GB limit on partition sizes (in 
the block manager and during shuffle).
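
Just to make these numbers concrete, here is the back-of-envelope calculation 
(a sketch in DML syntax; the 60K dimension and the 2 partitions are the values 
from above):

{code}
# back-of-envelope for the dense output of X %*% t(X) on a 60Kx784 input
n = 60000;                  # number of rows of X
outBytes = n * n * 8;       # dense 60Kx60K output, 8 bytes per double value
print("dense output GB:  " + (outBytes / (1000^3)));     # ~28.8 GB, i.e., the ~29GB mentioned above
print("GB per partition: " + (outBytes / 2 / (1000^3))); # ~14.4 GB with only 2 partitions, far above the 2GB limit
{code}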

I now added a new robustness feature to our {{mapmm}} operator (broadcast-based 
matrix multiply, which is often the source of these outer-product-like 
matrices): we now automatically repartition the input if the output size per 
partition is expected to be large. Running with {{-exec hybrid_spark}}, a 20GB 
driver, and 55GB executors, this already solved the issue. The patch will be 
available in master tomorrow. However, note that a single matrix block (of 
blocksize 1K) is the smallest granularity for repartitioning - in this scenario 
a single block already produces almost 500MB of output. Hence, even this 
approach is somewhat limited. As a workaround you can additionally reduce the 
block size in {{SystemML-config.xml}} to increase the potential degree of 
parallelism. But even then, the problem remains O(n^2), which will inevitably 
run into issues on large input data.
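
For example, reducing the block size from the default 1K to 500 would roughly 
look as follows (sketch only; please double-check the exact property name 
against the {{SystemML-config.xml}} template shipped with your build):

{code}
<!-- default block dimension for binary block matrices; smaller blocks
     increase the number of blocks and hence the potential parallelism -->
<defaultblocksize>500</defaultblocksize>
{code}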

Apart from that, the script exhibits further optimization opportunities. For 
example, the entire inner for loop over the number of rows can be vectorized, 
i.e., the conditional assignments can be rewritten to something like 
{{betamin = (Hdiff>0)*beta + (Hdiff<=0)*betamin}}. Let me know if you need 
help with that.


> Implement t-SNE algorithm
> -------------------------
>
>                 Key: SYSTEMML-831
>                 URL: https://issues.apache.org/jira/browse/SYSTEMML-831
>             Project: SystemML
>          Issue Type: Improvement
>          Components: Algorithms
>            Reporter: Imran Younus
>            Assignee: Imran Younus
>         Attachments: out_2016_09_26_10.log
>
>
> This JIRA implements the t-distributed Stochastic Neighbor Embedding (t-SNE) 
> algorithm for dimensionality reduction presented in the following paper:
> Visualizing Data using t-SNE 
> by Laurens van der Maaten, Geoffrey Hinton
> http://www.jmlr.org/papers/v9/vandermaaten08a.html


