[ https://issues.apache.org/jira/browse/SPARK-4129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14188037#comment-14188037 ]

Apache Spark commented on SPARK-4129:
-------------------------------------

User 'dbtsai' has created a pull request for this issue:
https://github.com/apache/spark/pull/2992

> Performance tuning in MultivariateOnlineSummarizer
> --------------------------------------------------
>
>                 Key: SPARK-4129
>                 URL: https://issues.apache.org/jira/browse/SPARK-4129
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: DB Tsai
>
> In MultivariateOnlineSummarizer, breeze's activeIterator is used to loop
> through the nonzero elements of the vector. However, activeIterator performs
> poorly because it carries a lot of overhead. This PR replaces it with a
> native while loop for both DenseVector and SparseVector.
> Benchmark results with 20 executors on the mnist8m dataset:
> Before:
> DenseVector: 48.2 seconds
> SparseVector: 16.3 seconds
> After:
> DenseVector: 17.8 seconds
> SparseVector: 11.2 seconds
> Since MultivariateOnlineSummarizer is used in several places, this PR
> should yield a significant overall performance gain in MLlib.
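
The change described above can be illustrated with a minimal sketch. This is
not the code from the pull request: SparseVec, SummarizerSketch,
addWithIterator and addWithWhile are hypothetical names, and plain Scala
arrays stand in for breeze vectors so the example has no dependencies. The
iterator version only approximates the overhead of an activeIterator-style
loop (one tuple allocated per element); the while-loop version reads the
backing arrays directly.

```scala
// Hypothetical stand-in for a sparse vector: parallel arrays of indices and values.
final case class SparseVec(size: Int, indices: Array[Int], values: Array[Double])

object SummarizerSketch {
  // Iterator-style update: every element is wrapped in an (Int, Double) tuple
  // before it is consumed, similar in spirit to looping over an activeIterator.
  def addWithIterator(sum: Array[Double], nnz: Array[Long], v: SparseVec): Unit = {
    v.indices.iterator.zip(v.values.iterator).foreach { case (i, x) =>
      if (x != 0.0) {
        sum(i) += x
        nnz(i) += 1L
      }
    }
  }

  // While-loop update: direct indexed access into the backing arrays,
  // with no tuple or iterator objects created per element.
  def addWithWhile(sum: Array[Double], nnz: Array[Long], v: SparseVec): Unit = {
    var k = 0
    val n = v.indices.length
    while (k < n) {
      val x = v.values(k)
      if (x != 0.0) {
        val i = v.indices(k)
        sum(i) += x
        nnz(i) += 1L
      }
      k += 1
    }
  }
}
```

Both methods are meant to produce the same running sums and nonzero counts;
only the traversal differs. The sketch makes no claim about the size of the
speedup, which depends on the data and the JVM.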



