[ https://issues.apache.org/jira/browse/SPARK-19759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16242314#comment-16242314 ]
Marco Gaido commented on SPARK-19759:
-------------------------------------

I compared the current implementation against a simple for loop that computes the dot product directly, on 500,000 elements with 100 features each. The code I used is:

{code}
def time[R](block: => R): Unit = {
  val t0 = System.nanoTime()
  block
  val t1 = System.nanoTime()
  println("Elapsed time: " + (t1 - t0) + "ns")
}

val r = new scala.util.Random(100)
val input = (1 to 500000).map(_ => (1 to 100).map(_ => r.nextFloat).toSeq)

def f(a: Seq[Float], b: Seq[Float]): Float = {
  var r = 0.0f
  for (i <- 0 until a.length) {
    r += a(i) * b(i)
  }
  r
}

import com.github.fommil.netlib.BLAS.{getInstance => blas}
val b = (1 to 100).map(_ => r.nextFloat).toSeq

time { input.foreach(a => blas.sdot(100, a.toArray, 1, b.toArray, 1)) }
time { input.foreach(a => f(a, b)) }
{code}

Over 10 runs, the current implementation takes 2968.718815 ms on average, while the for-loop implementation takes 515.510185 ms. Thus I am submitting a PR with this second implementation.

> ALSModel.predict on Dataframes : potential optimization by not using blas
> --------------------------------------------------------------------------
>
>                 Key: SPARK-19759
>                 URL: https://issues.apache.org/jira/browse/SPARK-19759
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.1.1
>            Reporter: Sue Ann Hong
>            Priority: Minor
>
> In the DataFrame ALS prediction function, we use blas.sdot, which may be
> slower due to the conversion to Arrays. We can try operating on Seqs or
> another data structure to see if avoiding the conversion makes the operation
> faster. Ref:
> https://github.com/apache/spark/pull/17090/files/707bc6b153a7f899fbf3fe2a5675cacba1f95711#diff-be65dd1d6adc53138156641b610fcada

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
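A note on the benchmark above: the {{blas.sdot}} branch pays for a fresh {{a.toArray}} and {{b.toArray}} copy on every call, so the gap likely measures the Seq-to-Array conversion more than BLAS itself, which is exactly the overhead the issue describes. A minimal sketch (plain Scala, outside Spark; {{dotArray}} is a hypothetical name, not Spark's API) of the copy-free while-loop variant over pre-converted arrays, which the proposed PR direction corresponds to:

{code}
// Hypothetical sketch: dot product over Array[Float] with a while loop,
// avoiding the per-row Seq -> Array copy that the sdot call path incurs.
def dotArray(a: Array[Float], b: Array[Float]): Float = {
  require(a.length == b.length, "vectors must have the same length")
  var sum = 0.0f
  var i = 0
  while (i < a.length) {   // while loop avoids Range/closure overhead
    sum += a(i) * b(i)
    i += 1
  }
  sum
}

val x = Array(1.0f, 2.0f, 3.0f)
val y = Array(4.0f, 5.0f, 6.0f)
println(dotArray(x, y)) // 32.0
{code}

If the user and item factors are kept as arrays to begin with, this variant does no allocation at all per prediction.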