[ 
https://issues.apache.org/jira/browse/SPARK-21680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean Owen reassigned SPARK-21680:
---------------------------------

    Assignee: Peng Meng
    Priority: Minor  (was: Major)

> ML/MLLIB Vector compressed optimization
> ---------------------------------------
>
>                 Key: SPARK-21680
>                 URL: https://issues.apache.org/jira/browse/SPARK-21680
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib
>    Affects Versions: 2.3.0
>            Reporter: Peng Meng
>            Assignee: Peng Meng
>            Priority: Minor
>             Fix For: 2.3.0
>
>
> When Vector.compressed is used to convert a Vector to a SparseVector, it is
> much slower than Vector.toSparse.
> This is because Vector.compressed scans the values three times, whereas
> Vector.toSparse scans them only twice.
> When the vector is long, there is a significant performance difference
> between the two methods.
> Current code of Vector.compressed:
> {code:java}
>   def compressed: Vector = {
>     val nnz = numNonzeros
>     // A dense vector needs 8 * size + 8 bytes, while a sparse vector needs
>     // 12 * nnz + 20 bytes.
>     if (1.5 * (nnz + 1.0) < size) {
>       toSparse
>     } else {
>       toDense
>     }
>   }
> {code}
> I propose changing it to:
> {code:java}
> // Some comments here
>   def compressed: Vector = {
>     val nnz = numNonzeros
>     // A dense vector needs 8 * size + 8 bytes, while a sparse vector needs
>     // 12 * nnz + 20 bytes.
>     if (1.5 * (nnz + 1.0) < size) {
>       val ii = new Array[Int](nnz)
>       val vv = new Array[Double](nnz)
>       var k = 0
>       foreachActive { (i, v) =>
>         if (v != 0) {
>           ii(k) = i
>           vv(k) = v
>           k += 1
>         }
>       }
>       new SparseVector(size, ii, vv)
>     } else {
>       toDense
>     }
>   }
> {code}
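
For readers without a Spark checkout, the idea in the description can be sketched in isolation. The condition 1.5 * (nnz + 1.0) < size is the byte comparison 12 * nnz + 20 < 8 * size + 8 rearranged, and the sparse branch reuses the nnz that was already counted instead of counting it again. This is only a rough illustration under stated assumptions: CompressedSketch, Vec, Dense and Sparse are hypothetical stand-ins, not Spark's Vector classes, and the input is assumed to be a plain array of doubles.

```scala
// Standalone sketch with no Spark dependency. All names are hypothetical
// stand-ins for illustration, not Spark's actual Vector API.
object CompressedSketch {
  sealed trait Vec
  final case class Dense(values: Array[Double]) extends Vec
  final case class Sparse(size: Int, indices: Array[Int], values: Array[Double]) extends Vec

  def compressed(values: Array[Double]): Vec = {
    val size = values.length
    // Scan 1: count the nonzero entries.
    val nnz = values.count(_ != 0.0)
    // 12 * nnz + 20 < 8 * size + 8  is equivalent to  1.5 * (nnz + 1) < size.
    if (1.5 * (nnz + 1.0) < size) {
      val ii = new Array[Int](nnz)
      val vv = new Array[Double](nnz)
      var k = 0
      var i = 0
      // Scan 2: copy the nonzero entries, reusing nnz instead of recounting.
      while (i < size) {
        val v = values(i)
        if (v != 0.0) {
          ii(k) = i
          vv(k) = v
          k += 1
        }
        i += 1
      }
      Sparse(size, ii, vv)
    } else {
      Dense(values)
    }
  }
}
```

With this sketch, an 8-element array holding two nonzeros takes the sparse branch (1.5 * 3 = 4.5 < 8), while a 3-element array holding two nonzeros stays dense (4.5 is not less than 3).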



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
