[ 
https://issues.apache.org/jira/browse/SPARK-18781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15787309#comment-15787309
 ] 

Sean Owen commented on SPARK-18781:
-----------------------------------

I take it back; I see why remembering the result probably doesn't help.
[~viirya] what do you think of a new method that takes an optional boolean, 
something like "partition by users", to let the caller avoid this overhead?
I think the RDD predict method is meant for large batches where this overhead 
is trivial, so I'm still not sure it's worth complicating the API for this. 
This is never going to be a real-time method.
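
For illustration only, here is a minimal pure-Python sketch of the idea under 
discussion. It is not Spark code and not the MLlib implementation: the names 
should_partition_by_users, predict and partition_by_users are hypothetical, 
exact set sizes stand in for the approximate distinct counts, and the rule 
"key the output by whichever side has at least as many distinct IDs" is an 
assumption about how the choice is made.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Pair = Tuple[int, int]            # (user, product)
Rating = Tuple[int, int, float]   # (user, product, predicted score)


def should_partition_by_users(pairs: Sequence[Pair]) -> bool:
    """The check the issue proposes to make skippable.

    Exact distinct counts are used here in place of an approximate count.
    Assumption: results are keyed by the side with at least as many distinct IDs.
    """
    users = {u for u, _ in pairs}
    products = {p for _, p in pairs}
    return len(users) >= len(products)


def predict(
    pairs: Sequence[Pair],
    user_features: Dict[int, List[float]],
    product_features: Dict[int, List[float]],
    partition_by_users: Optional[bool] = None,  # hypothetical parameter
) -> Dict[int, List[Rating]]:
    """Score each (user, product) pair and group the results by the chosen key.

    When partition_by_users is None the counting step runs (existing behavior);
    passing True or False skips it and uses the caller's choice.
    """
    if partition_by_users is None:
        partition_by_users = should_partition_by_users(pairs)

    grouped: Dict[int, List[Rating]] = defaultdict(list)
    for user, product in pairs:
        # Predicted rating is the dot product of the two factor vectors.
        score = sum(
            a * b for a, b in zip(user_features[user], product_features[product])
        )
        key = user if partition_by_users else product
        grouped[key].append((user, product, score))
    return dict(grouped)
```

A caller who already knows there are far more users than products would pass 
partition_by_users=True and never pay for the counting pass; leaving the 
argument out keeps the current behavior.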

> Allow MatrixFactorizationModel.predict to skip user/product approximation 
> count
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-18781
>                 URL: https://issues.apache.org/jira/browse/SPARK-18781
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>            Reporter: Eyal Allweil
>            Priority: Minor
>
> When 
> [MatrixFactorizationModel.predict|https://spark.apache.org/docs/1.6.1/api/java/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.html#predict(org.apache.spark.rdd.RDD)]
>  is used, it first calculates an approximate count of the distinct users and 
> products in order to determine the most efficient way to proceed. In many 
> cases, the answer to this question is fixed (typically there are more users 
> than products by an order of magnitude) and this check is unnecessary. Adding 
> a parameter to this predict method to allow choosing the implementation (and 
> skipping the check) would be nice.
> It would be especially nice in development cycles, when you are repeatedly 
> tweaking your model and the pairs you're predicting for, and this 
> approximate count accounts for a meaningful portion of the time you wait for 
> results.
> I can provide a pull request with this ability added that preserves the 
> existing behavior.


