[ https://issues.apache.org/jira/browse/SPARK-18781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15824111#comment-15824111 ]
Eyal Allweil commented on SPARK-18781:
--------------------------------------

It seems like the approximation count is taking 10-20% of the total running time. When I opened this issue my jobs were taking about an hour, so it was more noticeable - the jobs I've been running lately have been 10-20 minutes, so it "feels" less important, because it's just a few minutes, but it's always at least 10%, usually around 15%.

> Allow MatrixFactorizationModel.predict to skip user/product approximation count
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-18781
>                 URL: https://issues.apache.org/jira/browse/SPARK-18781
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Eyal Allweil
>            Priority: Minor
>
> When [MatrixFactorizationModel.predict|https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.html#predict(org.apache.spark.rdd.RDD)]
> is used, it first calculates an approximation count of the users and
> products in order to determine the most efficient way to proceed. In many
> cases, the answer to this question is fixed (typically there are more users
> than products by an order of magnitude) and this check is unnecessary. Adding
> a parameter to this predict method to allow choosing the implementation (and
> skipping the check) would be nice.
> It would be especially nice in development cycles when you are repeatedly
> tweaking your model and which pairs you're predicting for and this
> approximate count represents a meaningful portion of the time you wait for
> results.
> I can provide a pull request with this ability added that preserves the
> existing behavior.
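
To make the request concrete, here is a minimal sketch in plain Python (not Spark code, and not the proposed patch). It assumes a toy model in which the features are ordinary dictionaries and the "count" is an exact distinct count standing in for the approximate one. The names `predict`, `choose_side` and `join_first` are hypothetical and exist only for illustration; the actual join strategy inside Spark is not described in the issue.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Pair = Tuple[int, int]            # (user id, product id)
Rating = Tuple[int, int, float]   # (user id, product id, predicted score)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equally sized feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def choose_side(pairs: List[Pair]) -> str:
    """Stand-in for the counting step: one extra pass over the input."""
    distinct_users = len({u for u, _ in pairs})
    distinct_products = len({p for _, p in pairs})
    return "users" if distinct_users < distinct_products else "products"


def predict(
    pairs: List[Pair],
    user_features: Dict[int, List[float]],
    product_features: Dict[int, List[float]],
    join_first: Optional[str] = None,  # hypothetical parameter
) -> List[Rating]:
    """Score each (user, product) pair.

    join_first=None keeps the existing behavior (count, then decide).
    join_first="users" or "products" skips the counting pass entirely.
    """
    if join_first is None:
        join_first = choose_side(pairs)
    elif join_first not in ("users", "products"):
        raise ValueError("join_first must be None, 'users' or 'products'")

    if join_first == "users":
        # Attach user features first, then look up product features.
        keyed = [(p, (u, user_features[u])) for u, p in pairs if u in user_features]
        return [
            (u, p, dot(uf, product_features[p]))
            for p, (u, uf) in keyed
            if p in product_features
        ]

    # Attach product features first, then look up user features.
    keyed = [(u, (p, product_features[p])) for u, p in pairs if p in product_features]
    return [
        (u, p, dot(user_features[u], pf))
        for u, (p, pf) in keyed
        if u in user_features
    ]
```

Both branches return the same ratings; only the order of the lookups differs, so a caller who already knows which side is smaller can pass `join_first` and avoid the extra pass, while the default of `None` leaves current callers unaffected.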