[ https://issues.apache.org/jira/browse/SPARK-21152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16061762#comment-16061762 ]

Yanbo Liang commented on SPARK-21152:
-------------------------------------

[~sethah] I'm a little pessimistic about this. From the experience of refactoring 
{{KMeans}} to use level 3 BLAS operations, I saw a huge performance degradation 
for sparse input. Digging into this problem, I found that there is no native BLAS 
implementation for sparse matrix operations, so sparse input cannot benefit 
from native BLAS speedups. However, I'm not very confident it's the same 
situation here; maybe we can hear others' thoughts. Thanks.
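
For reference, here is a minimal sketch of the two multiply paths I mean, using only 
the public {{org.apache.spark.ml.linalg}} API (the sizes, density, and timing harness 
are illustrative, not a real benchmark):

{code}
import java.util.Random

import org.apache.spark.ml.linalg.{DenseMatrix, SparseMatrix}

object GemmPathSketch {
  def main(args: Array[String]): Unit = {
    val rng = new Random(42)
    val n = 1024

    // Dense * dense: Matrix.multiply goes through gemm, which netlib-java can
    // route to an optimized native BLAS (e.g. OpenBLAS or MKL) when one is installed.
    val denseA = DenseMatrix.rand(n, n, rng)
    val denseB = DenseMatrix.rand(n, n, rng)

    // Sparse * dense: there is no native sparse gemm behind this call; the
    // multiplication is a plain JVM loop over the stored non-zeros, so a
    // level 3 refactoring gains nothing for sparse input.
    val sparseA = SparseMatrix.sprand(n, n, 0.01, rng)

    def time[T](label: String)(body: => T): T = {
      val start = System.nanoTime()
      val result = body
      println(f"$label%-16s ${(System.nanoTime() - start) / 1e6}%.1f ms")
      result
    }

    time("dense gemm")(denseA.multiply(denseB))
    time("sparse multiply")(sparseA.multiply(denseB))
  }
}
{code}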

> Use level 3 BLAS operations in LogisticAggregator
> -------------------------------------------------
>
>                 Key: SPARK-21152
>                 URL: https://issues.apache.org/jira/browse/SPARK-21152
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML
>    Affects Versions: 2.1.1
>            Reporter: Seth Hendrickson
>
> In the logistic regression gradient update, we currently compute the gradient 
> row by row. If we block the rows together, we can do a blocked gradient 
> update which leverages the BLAS GEMM operation.
> On high-dimensional dense datasets, I've observed ~10x speedups. The problem 
> here, though, is that it likely won't improve the sparse case, so we need to 
> keep both implementations around, and this blocked algorithm will require 
> caching a new dataset of type:
> {code}
> case class BlockInstance(label: Vector, weight: Vector, features: Matrix)
> {code}
> In the past we have avoided caching anything besides the original dataset passed 
> to train, because it adds memory overhead if the user has cached this 
> original dataset for other reasons. Here, I'd like to discuss whether we 
> think this patch would be worth the investment, given that it only improves a 
> subset of the use cases.
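
To make the blocked update concrete, here is a rough sketch of the per-row vs. blocked 
margin computation for the binary case (the {{BlockInstance}} fields mirror the type 
suggested above; the block contents are made-up toy data, not the actual patch):

{code}
import org.apache.spark.ml.linalg.{DenseMatrix, DenseVector, Matrix, Vector, Vectors}

// Illustrative stand-in for the proposed cached block type.
case class BlockInstance(label: Vector, weight: Vector, features: Matrix)

object BlockedMarginsSketch {

  // Current approach: one dot product per row (level 1 BLAS at best).
  def marginsPerRow(rows: Seq[Vector], coefficients: Vector): Array[Double] =
    rows.map { row =>
      var margin = 0.0
      row.foreachActive((i, v) => margin += v * coefficients(i))
      margin
    }.toArray

  // Blocked approach: all margins of a block come from a single matrix-vector
  // multiply; with a coefficient *matrix* (the multinomial case) this becomes
  // one GEMM call per block instead of one dot product per row.
  def marginsBlocked(block: BlockInstance, coefficients: DenseVector): DenseVector =
    block.features.multiply(coefficients)

  def main(args: Array[String]): Unit = {
    val coefficients = new DenseVector(Array(0.5, -1.0, 2.0))
    val rows = Seq(Vectors.dense(1.0, 0.0, 3.0), Vectors.dense(0.0, 2.0, 1.0))

    // The same two rows stacked into a 2 x 3 matrix (values are column-major).
    val features = new DenseMatrix(2, 3, Array(1.0, 0.0, 0.0, 2.0, 3.0, 1.0))
    val block = BlockInstance(Vectors.dense(1.0, 0.0), Vectors.dense(1.0, 1.0), features)

    println(marginsPerRow(rows, coefficients).mkString(", "))          // 6.5, 0.0
    println(marginsBlocked(block, coefficients).values.mkString(", ")) // 6.5, 0.0
  }
}
{code}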



