[ 
https://issues.apache.org/jira/browse/SPARK-19759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16242314#comment-16242314
 ] 

Marco Gaido commented on SPARK-19759:
-------------------------------------

I compared the current implementation against a simple for loop that computes 
the dot product directly, using 500,000 vectors of 100 features each. The code 
I used is:

{code}
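// Runs the block once and prints the elapsed wall-clock time in nanoseconds.
def time[R](block: => R): Unit = {
    val t0 = System.nanoTime()
    block
    val t1 = System.nanoTime()
    println("Elapsed time: " + (t1 - t0) + "ns")
}
val r = new scala.util.Random(100)
// 500000 random vectors with 100 features each
val input = (1 to 500000).map(_ => (1 to 100).map(_ => r.nextFloat).toSeq)
// Plain for-loop dot product operating directly on the Seqs
def f(a: Seq[Float], b: Seq[Float]): Float = {
    var sum = 0.0f
    for (i <- 0 until a.length) {
        sum += a(i) * b(i)
    }
    sum
}
import com.github.fommil.netlib.BLAS.{getInstance => blas}
val b = (1 to 100).map(_ => r.nextFloat).toSeq
// Current implementation: convert both Seqs to Arrays, then call blas.sdot
time { input.foreach(a => blas.sdot(100, a.toArray, 1, b.toArray, 1)) }
// Proposed implementation: for loop without any conversion
time { input.foreach(a => f(a, b)) }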
{code}

Over 10 runs, the current implementation takes 2968.718815 ms on average, while 
the for-loop implementation takes 515.510185 ms.
Thus I am submitting a PR with the second implementation.
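
For anyone who wants to repeat the comparison, the following is a minimal sketch of a variant of the test above, not the code in the PR. The names dotWhile and avgMillis are hypothetical helpers introduced here for illustration only. It assumes indexed sequences (such as the Vectors built above), since a(i) on a linked list would be linear per access, and it adds a few untimed warm-up runs before averaging so that JIT compilation is less likely to skew the first measurement.

```scala
// Hypothetical helper: dot product over two Float sequences using a while loop.
// Assumes both sequences support fast indexed access (e.g. Vector or Array-backed Seq).
def dotWhile(a: Seq[Float], b: Seq[Float]): Float = {
  require(a.length == b.length, "vectors must have the same length")
  var sum = 0.0f
  var i = 0
  while (i < a.length) {
    sum += a(i) * b(i)
    i += 1
  }
  sum
}

// Hypothetical helper: evaluates `block` `warmup` times without timing it,
// then returns the average elapsed time in milliseconds over `runs` timed evaluations.
def avgMillis[R](runs: Int, warmup: Int = 3)(block: => R): Double = {
  (1 to warmup).foreach(_ => block)
  val totalNanos = (1 to runs).map { _ =>
    val t0 = System.nanoTime()
    block
    System.nanoTime() - t0
  }.sum
  totalNanos / 1e6 / runs
}
```

With the input and b values defined earlier, a call such as avgMillis(10) { input.foreach(a => dotWhile(a, b)) } would return the mean time per run in milliseconds. I have not measured this variant, so no timings are claimed for it.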

> ALSModel.predict on Dataframes : potential optimization by not using blas 
> --------------------------------------------------------------------------
>
>                 Key: SPARK-19759
>                 URL: https://issues.apache.org/jira/browse/SPARK-19759
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.1.1
>            Reporter: Sue Ann Hong
>            Priority: Minor
>
> In the DataFrame ALS prediction function, we use blas.sdot which may be 
> slower due to the conversion to Arrays. We can try operating on Seqs or 
> another data structure to see if avoiding the conversion makes the operation 
> faster. Ref: 
> https://github.com/apache/spark/pull/17090/files/707bc6b153a7f899fbf3fe2a5675cacba1f95711#diff-be65dd1d6adc53138156641b610fcada
>  


