[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-59889793 @mdagost Thanks for working on the SerDe! I tested it locally and it works correctly, but the unit tests for the added methods are missing. Do you mind adding them? You can follow https://github.com/mdagost/spark/blob/mf_user_features/python/pyspark/mllib/recommendation.py#L55 as an example. Basically, we want to verify that `userFeatures`/`productFeatures` returns an RDD of key-value pairs with the correct number of records, and that for each record the feature dimension is correct. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
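A minimal sketch of the check described above, written against plain Python lists so it runs without a cluster. It assumes the RDD has already been collected (for example with `model.userFeatures().collect()`) into `(id, features)` pairs; `check_feature_pairs` and the sample data are hypothetical and not part of PySpark.

```python
def check_feature_pairs(pairs, expected_count, rank):
    """Verify collected (id, features) pairs: the right number of records,
    and every feature vector has length `rank`."""
    # Each record must be a key-value pair.
    for record in pairs:
        assert len(record) == 2, "expected an (id, features) pair"
    # One record per user (or per product).
    assert len(pairs) == expected_count, (
        "expected %d records, got %d" % (expected_count, len(pairs)))
    # Every latent vector must have the model's rank as its dimension.
    for key, features in pairs:
        assert len(features) == rank, (
            "record %r has dimension %d, expected %d" % (key, len(features), rank))
    return True


# Hypothetical collected output for 2 users with rank=3.
sample = [(1, [0.1, 0.2, 0.3]), (2, [0.4, 0.5, 0.6])]
print(check_feature_pairs(sample, expected_count=2, rank=3))  # True
```

In a real test the expected count would come from the number of distinct users (or products) in the training ratings, and `rank` from the value passed to the ALS trainer.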
Github user mdagost commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-59927704 Whoops. Forgot the tests :) I'll work on those today.
Github user mdagost commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-59935805 @mengxr Unit tests are added. I get some unrelated test failures locally (everything in `recommendation.py`, including the new stuff, passes).
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-59956077 this is ok to test
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-59956125 test this please
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-59957012 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21994/consoleFull) for PR 2636 at commit [`c98f9e2`](https://github.com/apache/spark/commit/c98f9e22a87b640b9787e054067a49506aabf2b6). * This patch merges cleanly.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-59967277 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21994/
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-59967265 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21994/consoleFull) for PR 2636 at commit [`c98f9e2`](https://github.com/apache/spark/commit/c98f9e22a87b640b9787e054067a49506aabf2b6). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2636
Github user mdagost commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-59769268 @MLnick It doesn't look like `pairRDDToPython` does the trick. I tried
```python
def userFeatures(self):
    juf = self._java_model.userFeatures()
    juf = sc._jvm.SerDeUtil.pairRDDToPython(juf, 1)
    return juf
```
but when I print the first element of the resulting RDD, all I get is `[[B@176fa1a5` rather than any kind of nicely formatted Python object.
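The `[[B@176fa1a5` output is Java's default `toString` for a `byte[][]`, which suggests the JVM handed back serialized byte arrays rather than deserialized Python objects. As a rough, stdlib-only analogy (Python's `pickle` stands in for the JVM-side pickler here, which is an assumption and not Spark's actual code path), those bytes only become usable once something on the Python side unpickles them. The helper `deserialize` is hypothetical.

```python
import pickle

# Simulated JVM output: each element is an opaque, pickled batch of
# (id, features) pairs. Printing it shows bytes, not Python objects.
batches = [pickle.dumps([(1, [0.1, 0.2]), (2, [0.3, 0.4])])]
print(type(batches[0]))  # <class 'bytes'>


def deserialize(batches):
    """Unpickle each batch and flatten it into individual records."""
    for batch in batches:
        for record in pickle.loads(batch):
            yield record


records = list(deserialize(batches))
print(records[0])  # (1, [0.1, 0.2])
```

In PySpark, this unpickling step is normally the job of the serializer attached to a Python-side RDD wrapper; the sketch only illustrates why raw bytes print the way they do.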
Github user mdagost commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-59784360 @davies Your idea of adding something like `fromTupleRDD` to `PythonMLLibAPI` seems to be the way to go. I'm just doing some cleanup and will push `userFeatures` and `productFeatures` in just a bit.
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-58243752 @MLnick @mdagost There are a few functions available that you could use for the serialization, but `PythonRDD.javaToPython` might be a good option. You can see example usage in `recommendation.py`.
Github user mdagost commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-58272430 I've been having trouble getting either `PythonRDD.javaToPython` or `pairRDDToPython` to work. But porting the general function I wrote from `MatrixFactorizationModel.scala` to `PythonMLLibAPI` is also giving me some trouble. I'll get back to it later this week and try to make some progress...
Github user MLnick commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-57905979 Can we use the existing `pairRDDToPython` function? https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala#L120
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-57785126 @mdagost If you convert `(Int, Array[Double])` to a `java.util.List<Object>` (with the id first and the features second, without converting them to strings), you should be able to get the data correctly on the Python side. If that works, could you add `productFeatures` as well? Thanks! @davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-57813143 @mdagost @mengxr We use Pyrolite to convert Java objects into Python objects; you can find the type mapping here: https://github.com/irmen/Pyrolite
Github user mdagost commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-57840219 I'm totally new to Spark, so sorry if these are all dumb questions. Are you suggesting that I convert the userFeatures `RDD[(Int, Array[Double])]` to `RDD[Array[Object]]`? If so, do you want a helper function for doing that, like I did for the string helper, or should I change the main `userFeatures` to be of that type? Also, I'm sure this is dumb, but what exact type of `Object` are we talking about?
Github user davies commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-57855517 We still need this wrapper, but since `RDD[Array[Object]]` is only used by the Python API, it's better to put it in `PythonMLLibAPI`. It could also be more general, e.g. a `fromTupleRDD` that converts any `RDD[Tuple2[_, _]]` into `RDD[Array[Any]]` (`Any` is similar to Java's `Object`).
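The proposed `fromTupleRDD` lives on the Scala side, but the data shape it produces can be sketched in plain Python. This assumes, as suggested in the thread, that each `(id, features)` pair is flattened into a two-element array and turned back into a tuple on the Python side; `from_tuples` and `to_tuples` are hypothetical names used only for illustration, operating on local lists rather than RDDs.

```python
def from_tuples(pairs):
    """Mirror the proposed fromTupleRDD: (key, value) -> [key, value].
    Nothing is stringified; each element keeps its original type."""
    return [[key, value] for key, value in pairs]


def to_tuples(arrays):
    """Python-side inverse: turn each two-element array back into a pair."""
    return [(arr[0], arr[1]) for arr in arrays]


user_features = [(7, [0.5, -1.25]), (9, [2.0, 0.0])]
as_arrays = from_tuples(user_features)
print(as_arrays[0])  # [7, [0.5, -1.25]]
print(to_tuples(as_arrays) == user_features)  # True
```

Keeping the features numeric, rather than stringifying them as the first version of the patch did, means the Python side needs no parsing step to recover the vectors.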
GitHub user mdagost opened a pull request: https://github.com/apache/spark/pull/2636
SPARK-3770: Make userFeatures accessible from python
https://issues.apache.org/jira/browse/SPARK-3770
We need access to the underlying latent user features from Python. However, the `userFeatures` RDD from the `MatrixFactorizationModel` isn't accessible from the Python bindings. I've added a method to the underlying Scala class to turn the `RDD[(Int, Array[Double])]` into an `RDD[String]`. This is then accessed from `recommendation.py` on the Python side.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/mdagost/spark mf_user_features
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/2636.patch
To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #2636
commit e1fbe5e82a6b9436ce745175670cd005f6481173 Author: Michelangelo D'Agostino mdagost...@civisanalytics.com Date: 2014-10-02T13:33:45Z Added scala function to stringify userFeatures for access in python.
commit cdd98e3a43cc465844a3b38432f4edc679ffa0dd Author: Michelangelo D'Agostino mdagost...@civisanalytics.com Date: 2014-10-02T16:05:48Z It's working now.
commit 34cb2a2889649e3f29f1686745320884f1fbc945 Author: Michelangelo D'Agostino mdagost...@civisanalytics.com Date: 2014-10-02T21:41:51Z A couple of lint cleanups and a comment.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2636#issuecomment-57715181 Can one of the admins verify this patch?