[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-49131064
  
Besides breaking the API, I'm also worried about two things:

1. The increase in storage. We had some discussion before v1.0 about 
whether we should switch to long or not. ALS is not computation heavy for small 
k but communication heavy. I posted some screenshots on the JIRA page, where 
ALS shuffles ~200GB data in each iteration. With Long ids, this number may 
become ~300GB and hence ALS may slow down by 50%. Instead of upgrading the id 
type to Long, I'm actually thinking about downgrading the rating type to Float.

2. Is collision really bad? ALS needs somewhat dense matrix to compute 
good recommendations. If there are 3 billion users but each user only gives 1 
or 2 ratings, ALS is very likely to overfit. In this case, making a random 
projection on the user side would certainly help, while hashing is one of the 
commonly used techniques for random projection. There will be bad 
recommendations no matter whether there exist hash collisions or not. So I'm 
really interested in some measurements on the downside of hash collision.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-49149396
  
Let me close this PR for now. I will fork or wrap as necessary. Keep it in 
mind, and maybe in a 2.x release this can be revisited. (Matei I ran into more 
problems with the `Rating` class retrofit anyway.)

Yes storage is the downside. Your comments on JIRA about effects of 
serialization in compressing away the difference are promising. I completely 
agree with using `Float` for ratings and even feature vectors.

Yes I understand why random projections are helpful. It doesn't help 
accuracy, but may only trivially hurt it in return for some performance gain. 
If I have just 1 rating, it doesn't make my recs better to arbitrarily add your 
ratings to mine. Sure that's denser, and maybe you're getting less overfitting, 
but it's fitting the wrong input for both of us.

A collision here and there is probably acceptable. One in a million 
customers? OK. 1%? maybe a problem. I agree, you'd have to quantify this to 
decide. If I'm an end user of MLlib bringing even millions of things to my 
model, I have to decide. And if it's a problem, have to maintain a lookup table 
to use it.

It seemed simplest to moot the problem with a much bigger key space and 
engineer around the storage issue. A bit more memory is cheap; accuracy and 
engineer time are expensive. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-49201924
  
Sean, I'd still be okay with adding a LongALS class if you see benefit for 
it in some use cases. Let's just see how it works in comparison.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-49203419
  
@mateiz I think it would mean mostly cloning `ALS.scala`, as the `Rating` 
object is woven throughout. Probably some large chunks could be refactored and 
shared. Is that what you mean? even I'm not sure if two APIs are worth the 
trouble. When I get to this point and see what it takes to make a 64-bit key 
implementation, yes I can propose what that looks like.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-49210917
  
BTW this could also be a place to use the dreaded Scala @specialized 
annotation to template the code for Ints vs Longs, though as far as I know 
that's being deprecated by the Scala developers.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-49210831
  
Yeah, that's what I meant, we can clone it at first but we might be able to 
share code later (at least the math code we run on each block, or stuff like 
that). But let's do it only if you find out it's worth it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-16 Thread srowen
Github user srowen closed the pull request at:

https://github.com/apache/spark/pull/1393


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-48999523
  
I am trying to figure out why it happend, this might not be my conclusion 
but at the moment I feel that since this class has a private [mllib] 
constructor, there is an entry in ignores file as follows 
`org.apache.spark.mllib.recommendation.MatrixFactorizationModel.init`. This 
particular entry makes the whole class ignored by mima tool. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-48999631
  
And to my surprise also has 
`org.apache.spark.mllib.recommendation.MatrixFactorizationModel.predict` not 
sure why it has that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread ScrapCodes
Github user ScrapCodes commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-49000211
  
Ahh understood (please ignore my previous theory.), so it happened because 
we have a function which is `@developerApi` in the same class with same name. 
So this was added by our GenerateMimaIgnoreTool to ignores file and as a result 
MIMA check for all predict methods was disabled. Not sure if mima can 
disambiguate them. I will check that. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/1393#discussion_r14926032
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
 ---
@@ -53,7 +53,7 @@ class MatrixFactorizationModel private[mllib] (
 * @param usersProducts  RDD of (user, product) pairs.
 * @return RDD of Ratings.
 */
-  def predict(usersProducts: RDD[(Int, Int)]): RDD[Rating] = {
+  def predict(usersProducts: RDD[(Long, Long)]): RDD[Rating] = {
--- End diff --

I had understood all of this to be an `@Experimental` API, though it is not 
consistently marked. For example, `Rating` is experimental but its API is 
actually bound to this API here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-49013972
  
Yes you could also tell callers to track their own user-ID mapping and 
maintain it consistently everywhere. Callers have to share that state then 
somehow. Hashing is easier, and 64 bits makes it work for practical purposes. 

A caller has to do something like these to deal with real-world identifiers 
because an `Int` ID API by itself doesn't quite work. This is an instance of a 
meta-concern I have, if an API which (from my perspective) is going to be 
problematic at scale is already unchangeable before battle-testing. (I actually 
thought all of MLlib was de facto `@Experimental`?)

Yeah however you can layer on other APIs to fix it, or use `@deprecated` in 
cases like this to keep existing methods but add new signatures too. I think 
that would be the simplest solution to this particular concern.

The question of serialized size is still out there. That is worth weighing 
in on.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-49072664
  
Yeah API stability is very important. I keep banging on about the flip-side 
-- freezing an API that may still need to change. You get a different important 
problem. I'm sure everyone gets that, and it's a judgment call and trade-off.

I will change the PR to preserve the existing methods and add new ones. 
That's the thing we can consider and merge or not. I'm not offended if nobody 
else is feeling this one. I can always fork/wrap this aspect to fit what I need 
it do. (And I have other API suggestions I'd rather spend time on if anything.)

I wouldn't want to add the overhead of a separate set of implementations 
just for 64 bit values. Users would have a hard time understanding the 
difference and choosing.

3 billion people is a lot! It could happen, yes. Maybe not with people but 
with, say, URLs. *If* collisions mattered much, then with many billions of 
things, you can't use the ALS implementation as it stands, since _most_ IDs 
would collide no matter how you map or hash. That's the best motivation I can 
offer for this change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-49099667
  
QA tests have started for PR 1393. This patch merges cleanly. brView 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16694/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-49104938
  
Wise words Matei! anyway here's another cut that preserves the original 
API. Tests are still running. Up to you guys' judgment on whether it's 
worthwhile.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/1393#discussion_r14972737
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
 ---
@@ -53,7 +53,7 @@ class MatrixFactorizationModel private[mllib] (
 * @param usersProducts  RDD of (user, product) pairs.
 * @return RDD of Ratings.
 */
-  def predict(usersProducts: RDD[(Int, Int)]): RDD[Rating] = {
+  def predict(usersProducts: RDD[(Long, Long)]): RDD[Rating] = {
--- End diff --

Interesting, I think Rating shouldn't have been marked experimental. We 
should definitely fix that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/1393#discussion_r14973020
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -53,15 +53,15 @@ private[recommendation] case class 
OutLinkBlock(elementIds: Array[Int], shouldSe
  * we get product block b's message to update the corresponding users.
  */
 private[recommendation] case class InLinkBlock(
-  elementIds: Array[Int], ratingsForBlock: Array[Array[(Array[Int], 
Array[Double])]])
+  elementIds: Array[Long], ratingsForBlock: Array[Array[(Array[Int], 
Array[Double])]])
 
 
 /**
  * :: Experimental ::
- * A more compact class to represent a rating than Tuple3[Int, Int, 
Double].
+ * A more compact class to represent a rating than Tuple3[Long, Long, 
Double].
  */
 @Experimental
-case class Rating(user: Int, product: Int, rating: Double)
+case class Rating(user: Long, product: Long, rating: Double)
--- End diff --

This is also a breaking change; we need to add another constructor that 
takes Ints. You may be able to add one like this:
```
case class Rating(user: Long, product: Long, rating: Double) {
  def this(u: Int, p: Int, r: Double) = this(u.toLong, p.toLong, r)
}
```
Or if it doesn't work, you need to turn this into a class instead of a case 
class and add extends Serializable and the various constructors yourself.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/1393#discussion_r14973237
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala ---
@@ -53,15 +53,15 @@ private[recommendation] case class 
OutLinkBlock(elementIds: Array[Int], shouldSe
  * we get product block b's message to update the corresponding users.
  */
 private[recommendation] case class InLinkBlock(
-  elementIds: Array[Int], ratingsForBlock: Array[Array[(Array[Int], 
Array[Double])]])
+  elementIds: Array[Long], ratingsForBlock: Array[Array[(Array[Int], 
Array[Double])]])
 
 
 /**
  * :: Experimental ::
- * A more compact class to represent a rating than Tuple3[Int, Int, 
Double].
+ * A more compact class to represent a rating than Tuple3[Long, Long, 
Double].
  */
 @Experimental
-case class Rating(user: Int, product: Int, rating: Double)
+case class Rating(user: Long, product: Long, rating: Double)
--- End diff --

Actually this is kind of tricky, we probably need a separate LongRating of 
some kind. This is because user and product and such are public fields and 
they used to be Int.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-49106922
  
@mateiz yeah it was `@Experimental` but if you want to preserve the API of 
this class I can only imagine it would have a `Long` field internally but still 
also expose accessors of type `Int` as well with the current name, which just 
truncate or something. It still works the same for values that fit in an `Int` 
I suppose. I can try that to see how it looks. 

I'd not be offended if this is better considered for 2.x, after more 
evidence exists whether this is really causing problems or not.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-49107533
  
QA results for PR 1393:br- This patch PASSES unit tests.br- This patch 
merges cleanlybr- This patch adds the following public classes 
(experimental):brcase class Rating(user: Long, product: Long, rating: 
Double)brbrFor more information see test 
ouptut:brhttps://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16694/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-15 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-49110420
  
Yeah, this is tricky. Maybe we need to create a LongRating class, even 
though it's ugly. @mengxr what do you think? I do think we'd like to have 
support for Longs here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-14 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/1393#discussion_r14916107
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
 ---
@@ -53,7 +53,7 @@ class MatrixFactorizationModel private[mllib] (
 * @param usersProducts  RDD of (user, product) pairs.
 * @return RDD of Ratings.
 */
-  def predict(usersProducts: RDD[(Int, Int)]): RDD[Rating] = {
+  def predict(usersProducts: RDD[(Long, Long)]): RDD[Rating] = {
--- End diff --

This is a breaking API change unfortunately, and because of type erasure, 
users who compiled against an old MLlib (passing RDD[(Int, Int)]) will have 
their code link against the new one and get a ClassCastException


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-14 Thread mateiz
Github user mateiz commented on a diff in the pull request:

https://github.com/apache/spark/pull/1393#discussion_r14916135
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
 ---
@@ -36,10 +36,10 @@ import org.apache.spark.mllib.api.python.PythonMLLibAPI
  */
 class MatrixFactorizationModel private[mllib] (
 val rank: Int,
-val userFeatures: RDD[(Int, Array[Double])],
-val productFeatures: RDD[(Int, Array[Double])]) extends Serializable {
+val userFeatures: RDD[(Long, Array[Double])],
+val productFeatures: RDD[(Long, Array[Double])]) extends Serializable {
   /** Predict the rating of one user for one product. */
-  def predict(user: Int, product: Int): Double = {
+  def predict(user: Long, product: Long): Double = {
--- End diff --

This is also a breaking API change, weird that Jenkins didn't catch it. 
@scrapcodes and @pwendell is this something we're not checking for right now in 
MIMA? Because this class is not experimental or anything.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-14 Thread mateiz
Github user mateiz commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-48987125
  
I'm somewhat worried about this because it seems difficult to do without 
breaking API changes. Do users really want to rely on luck to avoid collisions? 
It might be better to add some efficient way to assign unique IDs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/1393

SPARK-2465. Use long as user / item ID for ALS

I'd like to float this for consideration: use longs instead of ints for 
user and product IDs in the ALS implementation.

The main reason for is that identifiers are not generally numeric at all, 
and will be hashed to an integer. (This is a separate issue.) Hashing to 32 
bits means collisions are likely after hundreds of thousands of users and 
items, which is not unrealistic. Hashing to 64 bits pushes this back to 
billions.

It would also mean numeric IDs that happen to be larger than the largest 
int can be used directly as identifiers.

On the downside of course: 8 bytes instead of 4 bytes of memory used per 
Rating.

Thoughts?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-2465

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1393.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1393


commit d4082ad7aa7468b63605d141f6d68e278983678a
Author: Sean Owen so...@cloudera.com
Date:   2014-07-13T14:57:15Z

Use long instead of int for user/product IDs




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-48842883
  
QA tests have started for PR 1393. This patch merges cleanly. brView 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16603/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread witgo
Github user witgo commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-48843123
  
The overall increase how much memory? Have a detailed contrast?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---



[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-48843270
  
I think the most significant change is the Rating object. It goes from 8 + 
(ref) + 8 (object) + 4 (int) + 4 (int) + 8 (double) = 32 bytes to 8 (ref) + 8 
(object) + 4 (long) + 4 (long) + 8 (double) = 40 bytes


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-48845420
  
QA results for PR 1393:br- This patch FAILED unit tests.br- This patch 
merges cleanlybr- This patch adds the following public classes 
(experimental):brcase class Rating(user: Long, product: Long, rating: 
Double)brbrFor more information see test 
ouptut:brhttps://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16603/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-48846073
  
QA tests have started for PR 1393. This patch merges cleanly. brView 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16604/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-48848503
  
QA results for PR 1393:br- This patch FAILED unit tests.br- This patch 
merges cleanlybr- This patch adds the following public classes 
(experimental):brcase class Rating(user: Long, product: Long, rating: 
Double)brbrFor more information see test 
ouptut:brhttps://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16604/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-48850704
  
QA tests have started for PR 1393. This patch merges cleanly. brView 
progress: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16607/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-2465. Use long as user / item ID for ALS

2014-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1393#issuecomment-48852700
  
QA results for PR 1393:br- This patch PASSES unit tests.br- This patch 
merges cleanlybr- This patch adds the following public classes 
(experimental):brcase class Rating(user: Long, product: Long, rating: 
Double)brbrFor more information see test 
ouptut:brhttps://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16607/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---