[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-21 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-59889793
  
@mdagost Thanks for working on the SerDe! I tested it locally and it works 
correctly, but the unit tests for the added methods are missing. Do you mind 
adding them? You can follow


https://github.com/mdagost/spark/blob/mf_user_features/python/pyspark/mllib/recommendation.py#L55

Basically, we want to verify that userFeatures/productFeatures returns an 
RDD of key-value pairs with the correct number of records, and that the 
feature dimension is correct for each record.
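
The check described above can be sketched in plain Python. This is only an illustration, not the test that was added to the PR: `check_features` is a hypothetical helper, and the `sample` list stands in for whatever something like `model.userFeatures().collect()` would return.

```python
def check_features(pairs, expected_count, rank):
    """Return True if `pairs` holds `expected_count` (key, features)
    records and every feature vector has length `rank`."""
    pairs = list(pairs)
    if len(pairs) != expected_count:
        return False
    for record in pairs:
        # Each record should be a two-element (key, features) pair.
        if len(record) != 2:
            return False
        _, features = record
        if len(features) != rank:
            return False
    return True


# Hypothetical stand-in for the collected contents of the features RDD.
sample = [(1, [0.1, 0.2, 0.3, 0.4]), (2, [0.5, 0.6, 0.7, 0.8])]
print(check_features(sample, expected_count=2, rank=4))
```

In a real test, `expected_count` would be the number of distinct users (or products) in the training data and `rank` the rank passed to ALS.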


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-21 Thread mdagost
Github user mdagost commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-59927704
  
Whoops.  Forgot the tests :)  I'll work on those today.





[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-21 Thread mdagost
Github user mdagost commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-59935805
  
@mengxr Unit tests are added.  I get some unrelated test failures on my 
local machine, but everything in `recommendation.py`, including the new 
stuff, passes.





[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-21 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-59956077
  
this is ok to test





[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-21 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-59956125
  
test this please





[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-59957012
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21994/consoleFull)
 for   PR 2636 at commit 
[`c98f9e2`](https://github.com/apache/spark/commit/c98f9e22a87b640b9787e054067a49506aabf2b6).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-59967277
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21994/
Test PASSed.





[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-59967265
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21994/consoleFull)
 for   PR 2636 at commit 
[`c98f9e2`](https://github.com/apache/spark/commit/c98f9e22a87b640b9787e054067a49506aabf2b6).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-21 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2636





[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-20 Thread mdagost
Github user mdagost commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-59769268
  
@MLnick It doesn't look like `pairRDDToPython` does the trick.  I tried

```{python}
def userFeatures(self):
    juf = self._java_model.userFeatures()
    juf = sc._jvm.SerDeUtil.pairRDDToPython(juf, 1)
    return juf
```

but when I print the first element of the resulting RDD, I just get 
`[[B@176fa1a5` rather than any kind of nicely formatted Python object.
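
For context, `[[B` is the JVM's name for `byte[][]`, so the printed value looks like the default string form of an array of byte arrays rather than a deserialized record. As a loose illustration only, using the standard-library `pickle` module and not Spark's or Pyrolite's actual code path, serialized bytes stay opaque until something unpickles them. The `record` value is a made-up example.

```python
import pickle

# Hypothetical (id, features) pair, similar in shape to a userFeatures entry.
record = (1, [0.25, 0.5, 0.75])

# Serializing yields raw bytes, which are not a usable record by themselves.
payload = pickle.dumps(record, protocol=2)
print(type(payload).__name__)

# Only after unpickling do we get the original Python object back.
restored = pickle.loads(payload)
print(restored)
```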





[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-20 Thread mdagost
Github user mdagost commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-59784360
  
@davies Your idea of adding something like `fromTupleRDD` to 
`PythonMLLibAPI` seems to be the way to go.  I'm just doing some cleanup and 
will push `userFeatures` and `productFeatures` in just a bit. 





[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-07 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-58243752
  
@MLnick  @mdagost  There are a few functions available which you could use 
for the serialization, but PythonRDD.javaToPython might be a good option.  You 
can see example usage in recommendation.py





[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-07 Thread mdagost
Github user mdagost commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-58272430
  
I've been having trouble getting either `PythonRDD.javaToPython` or 
`pairRDDToPython` to work.  But porting the general function I wrote from 
`MatrixFactorizationModel.scala` to `PythonMLLibAPI` is also giving me some 
trouble.  I'll get back to it later this week and try to make some progress...






[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-04 Thread MLnick
Github user MLnick commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-57905979
  
Can we use the existing `pairRDDToPython` function?


https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/api/python/SerDeUtil.scala#L120





[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-03 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-57785126
  
@mdagost If you convert `(Int, Array[Double])` to a 
`java.util.List<Object>`, with the id as the first element and the features 
as the second (without converting to a string), you should be able to get the 
data correctly on the Python side. If that works, could you add 
`productFeatures` as well? Thanks!

@davies 





[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-03 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-57813143
  
@mdagost @mengxr We use Pyrolite to convert Java objects into Python 
objects; you can find the type mapping here: https://github.com/irmen/Pyrolite





[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-03 Thread mdagost
Github user mdagost commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-57840219
  
I'm totally new to Spark, so sorry if these are all dumb questions.  

Are you suggesting that I convert the userFeatures `RDD[(Int, 
Array[Double])]` to `RDD[Array[Object]]` ?  If so, do you want a helper 
function for doing that like I did for the string helper, or should I convert 
the main userFeatures to be of that type?

Also, I'm sure this is dumb, but what exact type of `Object` are we talking 
about?





[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-03 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-57855517
  
We still need this wrapper, but RDD[Array[Object]] is only used for the 
Python API, so it's better to put it in PythonMLLibAPI. It could be more 
general, like fromTupleRDD, which would convert any RDD[Tuple[_,_]] into 
RDD[Array[Any]] (Any is similar to Java's Object).
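
The shape of that conversion can be shown with a small Python analogue. This is a sketch only: the helper discussed here would live on the Scala side in PythonMLLibAPI, and `from_tuple_pairs` is a hypothetical name operating on a plain list instead of an RDD.

```python
def from_tuple_pairs(pairs):
    """Turn each (key, value) tuple into a two-element list,
    mirroring the RDD[Tuple[_,_]] -> RDD[Array[Any]] idea."""
    return [[key, value] for key, value in pairs]


# Made-up (id, features) pairs standing in for the contents of an RDD.
features = [(7, [0.1, 0.2]), (9, [0.3, 0.4])]
converted = from_tuple_pairs(features)
print(converted)
```

The point of the generic form is that the element types are left alone, so the same helper would serve both `userFeatures` and `productFeatures`.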





[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-02 Thread mdagost
GitHub user mdagost opened a pull request:

https://github.com/apache/spark/pull/2636

SPARK-3770: Make userFeatures accessible from python

https://issues.apache.org/jira/browse/SPARK-3770

We need access to the underlying latent user features from Python. However, 
the userFeatures RDD from the MatrixFactorizationModel isn't accessible from 
the Python bindings. I've added a method to the underlying Scala class to turn 
the RDD[(Int, Array[Double])] into an RDD[String]. This is then accessed from 
the Python recommendation.py.
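
To make the string round trip concrete, here is a minimal sketch in plain Python. The comma-separated `id,f1,f2,...` layout and the `stringify`/`parse` names are assumptions for illustration; the PR description does not specify the actual string format.

```python
def stringify(record):
    """Encode an (id, features) pair as 'id,f1,f2,...' (assumed format)."""
    user_id, features = record
    return ",".join([str(user_id)] + [repr(f) for f in features])


def parse(line):
    """Decode a line produced by `stringify` back into (id, features)."""
    parts = line.split(",")
    return int(parts[0]), [float(p) for p in parts[1:]]


# Made-up record; repr() keeps enough digits for floats to round-trip.
record = (3, [0.1, -2.5, 1e-07])
line = stringify(record)
print(line)
print(parse(line))
```

Later comments in the review steer away from strings and toward passing the id and features through as objects.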

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mdagost/spark mf_user_features

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2636.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2636


commit e1fbe5e82a6b9436ce745175670cd005f6481173
Author: Michelangelo D'Agostino mdagost...@civisanalytics.com
Date:   2014-10-02T13:33:45Z

Added scala function to stringify userFeatures for access in python.

commit cdd98e3a43cc465844a3b38432f4edc679ffa0dd
Author: Michelangelo D'Agostino mdagost...@civisanalytics.com
Date:   2014-10-02T16:05:48Z

It's working now.

commit 34cb2a2889649e3f29f1686745320884f1fbc945
Author: Michelangelo D'Agostino mdagost...@civisanalytics.com
Date:   2014-10-02T21:41:51Z

A couple of lint cleanups and a comment.







[GitHub] spark pull request: SPARK-3770: Make userFeatures accessible from ...

2014-10-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2636#issuecomment-57715181
  
Can one of the admins verify this patch?

