[jira] [Commented] (SPARK-4675) Find similar products and similar users in MatrixFactorizationModel
[ https://issues.apache.org/jira/browse/SPARK-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14546995#comment-14546995 ]

Apache Spark commented on SPARK-4675:
-------------------------------------

User 'debasish83' has created a pull request for this issue:
https://github.com/apache/spark/pull/6213

Find similar products and similar users in MatrixFactorizationModel
-------------------------------------------------------------------

Key: SPARK-4675
URL: https://issues.apache.org/jira/browse/SPARK-4675
Project: Spark
Issue Type: Improvement
Components: MLlib
Reporter: Steven Bourke
Priority: Trivial
Labels: mllib, recommender

Using the latent feature space that is learnt in MatrixFactorizationModel, I have added 2 new functions to find similar products and similar users. A user of the API can, for example, pass a product ID and get the closest products.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)

To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
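The issue description talks about passing a product ID and getting the closest products in the latent feature space. A minimal sketch of that idea in plain Python follows. It is not the code from either pull request: the function names `cosine` and `similar_products`, the in-memory dict of factors, and the toy two-dimensional vectors are all assumptions made for illustration, and cosine similarity is assumed as the measure because it is the one discussed elsewhere in this thread.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_products(product_features, product_id, k):
    """Return the k (product id, similarity) pairs closest to product_id.

    product_features is a hypothetical dict: product id -> latent factor vector.
    """
    query = product_features[product_id]
    scored = [
        (other_id, cosine(query, vec))
        for other_id, vec in product_features.items()
        if other_id != product_id  # a product is trivially similar to itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy latent factors (made up, rank 2).
product_features = {
    1: [1.0, 0.0],
    2: [0.9, 0.1],
    3: [0.0, 1.0],
    4: [-1.0, 0.0],
}

top = similar_products(product_features, 1, 2)
print(top)
```

With these toy factors, product 2 ranks first for product 1 because its vector points in almost the same direction. The same function would work for users if given user factors instead.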
[jira] [Commented] (SPARK-4675) Find similar products and similar users in MatrixFactorizationModel
[ https://issues.apache.org/jira/browse/SPARK-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242808#comment-14242808 ]

Sean Owen commented on SPARK-4675:
----------------------------------

The lower-dimensional space is of course smaller. This makes it faster and more efficient to work with, which is certainly an advantage at scale. But the real reason to prefer it is that the original high-dimensional space is extremely sparse. Standard similarity measures are undefined for most pairs, or are 0. It's sort of a symptom of the curse of dimensionality.
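The sparsity point can be illustrated with a small sketch. Everything here is hypothetical: the rating dicts, the two-dimensional factors, and the helper names `sparse_cosine` and `dense_cosine` are made up for the example. It only shows that two products rated by disjoint sets of users get a cosine of 0 in the raw space, and that a product with no ratings has no defined cosine at all, whereas dense factor vectors always yield a value.

```python
import math


def sparse_cosine(a, b):
    """Cosine of two sparse rating vectors (dict: user id -> rating).

    Missing entries count as 0. Returns None when a norm is zero,
    since the measure is undefined in that case.
    """
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return None
    return dot / (norm_a * norm_b)


def dense_cosine(u, v):
    """Cosine of two dense, non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


# Raw ratings: products A and B share no users; C has no ratings yet.
ratings = {
    "A": {1: 5.0, 2: 4.0},
    "B": {3: 5.0, 4: 4.0},
    "C": {},
}

# Made-up latent factors for A and B, standing in for a trained model.
factors = {"A": [0.8, 0.1], "B": [0.7, 0.2]}

raw_ab = sparse_cosine(ratings["A"], ratings["B"])  # no shared users
raw_ac = sparse_cosine(ratings["A"], ratings["C"])  # undefined
latent_ab = dense_cosine(factors["A"], factors["B"])

print(raw_ab, raw_ac, round(latent_ab, 3))
```

In a real catalogue most product pairs look like A and B: no user has rated both, so the raw-space similarity carries no signal.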
[jira] [Commented] (SPARK-4675) Find similar products and similar users in MatrixFactorizationModel
[ https://issues.apache.org/jira/browse/SPARK-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243026#comment-14243026 ]

Debasish Das commented on SPARK-4675:
-------------------------------------

Is there a metric, something like MAP or AUC, that can help us validate similarUsers and similarProducts?

Right now, if I run column similarities with sparse vectors on matrix factorization datasets to get product similarities, it will treat all unvisited entries (which should be ?) as 0 and compute column similarities on that basis. If the sparse vector should have ? in place of 0, then basically every similarity calculation is incorrect, so in that sense it makes more sense to compute the similarities on the matrix factors.

But then we are back to a map-reduce calculation of rowSimilarities.
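To make the "? versus 0" concern concrete, here is a small sketch that compares two ways of scoring the same pair of products. The data and the function names are hypothetical. Restricting the computation to co-rated users is used only as a contrast to show how much the zero-filling assumption moves the number; it is not something proposed in this thread, and it has its own weaknesses (a single shared user gives a perfect score).

```python
import math


def cosine(xs, ys):
    """Cosine similarity of two equal-length lists; 0.0 if a norm is zero."""
    dot = sum(x * y for x, y in zip(xs, ys))
    norm = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    return dot / norm if norm > 0 else 0.0


def zero_filled_cosine(a, b):
    """Treat every unobserved rating as 0, as a sparse vector would."""
    users = sorted(a.keys() | b.keys())
    return cosine([a.get(u, 0.0) for u in users], [b.get(u, 0.0) for u in users])


def co_rated_cosine(a, b):
    """Use only users who rated both products; unobserved entries are skipped."""
    users = sorted(a.keys() & b.keys())
    return cosine([a[u] for u in users], [b[u] for u in users])


# Hypothetical ratings (user id -> rating). Product y has one rating so far.
product_x = {1: 5.0, 2: 5.0, 3: 5.0}
product_y = {1: 5.0}

zero_filled = zero_filled_cosine(product_x, product_y)
co_rated = co_rated_cosine(product_x, product_y)

print(round(zero_filled, 3), co_rated)
```

The only user who rated both products gave them the same score, yet the zero-filled cosine drops well below 1 because users 2 and 3 are counted as having rated product y with 0.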
[jira] [Commented] (SPARK-4675) Find similar products and similar users in MatrixFactorizationModel
[ https://issues.apache.org/jira/browse/SPARK-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241535#comment-14241535 ]

Debasish Das commented on SPARK-4675:
-------------------------------------

There are a few issues:

1. A batch API for topK similar users and topK similar products
2. A comparison of the product x product similarities generated with columnSimilarities against the topK similar products

I added batch APIs for topK product recommendation for each user and topK user recommendation for each product in SPARK-4231. A similar batch API would be very helpful for topK similar users and topK similar products.

I agree with cosine similarity. You should be able to re-use the column similarity calculations. I think a better idea is to add rowMatrix.similarRows and re-use that code to generate product similarities and user similarities.

But my question is more about validation. We can compute product similarities on the raw features, and we can compute product similarities on the matrix product factors. Which one is better?
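The batch request in item 1 amounts to computing, for every product, its topK neighbours by cosine similarity over the factor rows. A single-machine sketch of that computation is below. It is an assumption-laden illustration rather than the SPARK-4231 code or the proposed rowMatrix.similarRows: the names `normalize` and `batch_similar` and the toy factors are invented, and the all-pairs loop is quadratic in the number of products, which is exactly the cost a distributed implementation would need to manage.

```python
import heapq
import math


def normalize(vec):
    """Scale a vector to unit length so that dot product equals cosine."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def batch_similar(features, k):
    """For each id, return its k most similar other ids as (id, cosine) pairs.

    features is a hypothetical dict: id -> latent factor vector.
    """
    unit = {i: normalize(v) for i, v in features.items()}
    result = {}
    for i, u in unit.items():
        # (score, id) tuples so that nlargest orders by score first.
        candidates = (
            (sum(a * b for a, b in zip(u, v)), j)
            for j, v in unit.items()
            if j != i
        )
        result[i] = [(j, score) for score, j in heapq.nlargest(k, candidates)]
    return result


# Toy product factors (made up, rank 2).
product_features = {
    1: [1.0, 0.0],
    2: [0.9, 0.1],
    3: [0.0, 1.0],
    4: [0.1, 0.9],
}

neighbours = batch_similar(product_features, 1)
print({i: pairs[0][0] for i, pairs in neighbours.items()})
```

Normalizing each row once up front keeps the inner loop to a plain dot product, and `heapq.nlargest` avoids sorting the full candidate list when k is small.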
[jira] [Commented] (SPARK-4675) Find similar products and similar users in MatrixFactorizationModel
[ https://issues.apache.org/jira/browse/SPARK-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241952#comment-14241952 ]

Joseph K. Bradley commented on SPARK-4675:
------------------------------------------

Just to make sure I get your last question, are you asking, "Why compute product similarities using the low-dimensional space when we could do it in the high-dimensional space?" If so, then my understanding is that the low-dimensional space will give more meaningful similarities in general.
[jira] [Commented] (SPARK-4675) Find similar products and similar users in MatrixFactorizationModel
[ https://issues.apache.org/jira/browse/SPARK-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14242034#comment-14242034 ]

Debasish Das commented on SPARK-4675:
-------------------------------------

[~josephkb] how do we validate that the low-dimensional space gives more meaningful similarities than the feature space (which is sparse)?
[jira] [Commented] (SPARK-4675) Find similar products and similar users in MatrixFactorizationModel
[ https://issues.apache.org/jira/browse/SPARK-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14229610#comment-14229610 ]

Apache Spark commented on SPARK-4675:
-------------------------------------

User 'sbourke' has created a pull request for this issue:
https://github.com/apache/spark/pull/3536