[GitHub] spark pull request: [SPARK-11968] [MLlib] : MatrixFactorizationMod...

2015-12-13 Thread rekhajoshm
Github user rekhajoshm closed the pull request at:

https://github.com/apache/spark/pull/9980


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-11968] [MLlib] : MatrixFactorizationMod...

2015-11-30 Thread rekhajoshm
Github user rekhajoshm commented on the pull request:

https://github.com/apache/spark/pull/9980#issuecomment-160803099
  
I concur, @mengxr. I tried profiling with YourKit and VisualVM. Based on my 
runs with MovieLensALS and RecommendationExample, this change does not fix the 
concern. I do run into a set of other issues :-) If I do not get anything on 
this soon, I will close this pull request. Thanks.





[GitHub] spark pull request: [SPARK-11968] [MLlib] : MatrixFactorizationMod...

2015-11-30 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/9980#issuecomment-160798418
  
@rekhajoshm You need to do profiling on big datasets. If the improvement is 
not significant, then this is not the right fix. Essentially we are shuffling 
many small objects `(srcId, (dstId, rating))`. I don't think the fix would be 
trivial. We could probably see improvement if we switch the backend to 
DataFrame/Tungsten.
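
To make the "many small objects" point concrete, here is a rough sketch in plain Scala. It is not the DataFrame/Tungsten approach mentioned above; it only illustrates holding one block of ratings in three primitive arrays instead of one nested tuple per rating. `FlatRatings`, `fromTuples` and `toTuples` are hypothetical names, and whether such a layout would actually reduce shuffle or GC cost in Spark is untested here.

```scala
object FlatRatingsSketch {
  // Hypothetical container: one block of ratings stored column-wise in
  // primitive arrays rather than as one (srcId, (dstId, rating)) tuple each.
  final case class FlatRatings(srcIds: Array[Int], dstIds: Array[Int], ratings: Array[Double]) {
    def size: Int = ratings.length
  }

  // Packs tuple-shaped ratings into the three parallel arrays.
  def fromTuples(rows: Seq[(Int, (Int, Double))]): FlatRatings = {
    val n = rows.length
    val src = new Array[Int](n)
    val dst = new Array[Int](n)
    val r = new Array[Double](n)
    var k = 0
    rows.foreach { case (s, (d, v)) =>
      src(k) = s
      dst(k) = d
      r(k) = v
      k += 1
    }
    FlatRatings(src, dst, r)
  }

  // Unpacks back to the tuple shape, e.g. at the point where a caller needs it.
  def toTuples(f: FlatRatings): Seq[(Int, (Int, Double))] =
    (0 until f.size).map(k => (f.srcIds(k), (f.dstIds(k), f.ratings(k))))
}
```

The round trip `toTuples(fromTuples(rows))` returns the same rows in the same order, so the flat form carries the same information as the tuples.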





[GitHub] spark pull request: [SPARK-11968] [MLlib] : MatrixFactorizationMod...

2015-11-30 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/9980#discussion_r46220915
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
 ---
@@ -275,16 +276,13 @@ object MatrixFactorizationModel extends 
Loader[MatrixFactorizationModel] {
   num: Int): RDD[(Int, Array[(Int, Double)])] = {
 val srcBlocks = blockify(rank, srcFeatures)
 val dstBlocks = blockify(rank, dstFeatures)
+val output = new ArrayBuffer[(Int, (Int, Double))]()
 val ratings = srcBlocks.cartesian(dstBlocks).flatMap {
   case ((srcIds, srcFactors), (dstIds, dstFactors)) =>
-val m = srcIds.length
-val n = dstIds.length
 val ratings = srcFactors.transpose.multiply(dstFactors)
-val output = new Array[(Int, (Int, Double))](m * n)
-var k = 0
+output.clear()
 ratings.foreachActive { (i, j, r) =>
--- End diff --

We don't need `output` to hold the buffer. The following should work, 
though it doesn't really fix the GC problem:

~~~scala
// Assumes `m` and `n` are kept from the original code; emits (srcId, (dstId, rating)).
for (i <- 0 until m; j <- 0 until n) yield {
  (srcIds(i), (dstIds(j), ratings(i, j)))
}
~~~
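
For readers without the surrounding Spark code, the same comprehension can be sketched in a self-contained form. This is an illustration only: `blockRatings` is a hypothetical helper, plain arrays stand in for the MLlib matrices, and the dot product stands in for `srcFactors.transpose.multiply(dstFactors)`. The element type `(Int, (Int, Double))` matches the one used in the diff.

```scala
object BlockRatingsSketch {
  // Hypothetical helper: scores every (src, dst) pair in one block.
  // srcFactors(i) and dstFactors(j) are the factor vectors for
  // srcIds(i) and dstIds(j) respectively.
  def blockRatings(
      srcIds: Array[Int],
      srcFactors: Array[Array[Double]],
      dstIds: Array[Int],
      dstFactors: Array[Array[Double]]): Seq[(Int, (Int, Double))] = {
    val m = srcIds.length
    val n = dstIds.length
    for (i <- 0 until m; j <- 0 until n) yield {
      // The dot product of the two factor vectors is the predicted rating.
      val r = srcFactors(i).zip(dstFactors(j)).map { case (a, b) => a * b }.sum
      (srcIds(i), (dstIds(j), r))
    }
  }
}
```

As noted above, this still allocates one nested tuple per rating, so it tidies the code without addressing the GC concern.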





[GitHub] spark pull request: [SPARK-11968] [MLlib] : MatrixFactorizationMod...

2015-11-25 Thread rekhajoshm
Github user rekhajoshm commented on the pull request:

https://github.com/apache/spark/pull/9980#issuecomment-159772331
  
Thanks, @mengxr. Any alternative suggestions for reducing the number of objects 
needed by the recommendAll functionality? I did multiple profiling runs and heap 
dumps by running MatrixFactorizationModelSuite with IntelliJ/VisualVM. The GC %, 
used heap space, and heap dump instance counts are inconclusive. Thanks.
Thanks, @srowen, fixed per your comment.
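
As a complement to a profiler, GC activity around a piece of code can also be read from the JVM's standard `GarbageCollectorMXBean`s. The sketch below is an assumption about how one might do that, not something used in this pull request; `gcTotals` and `withGcDelta` are hypothetical helper names. The numbers cover the whole JVM, so other threads and small workloads will add noise.

```scala
import java.lang.management.ManagementFactory

object GcProbeSketch {
  // Sums collection count and collection time (ms) over all collectors
  // that the running JVM reports.
  def gcTotals(): (Long, Long) = {
    val it = ManagementFactory.getGarbageCollectorMXBeans.iterator()
    var count = 0L
    var timeMs = 0L
    while (it.hasNext) {
      val bean = it.next()
      // Both getters return -1 when the value is undefined for a collector.
      count += math.max(bean.getCollectionCount, 0L)
      timeMs += math.max(bean.getCollectionTime, 0L)
    }
    (count, timeMs)
  }

  // Runs `body` and returns its result together with the GC count and
  // GC time deltas observed while it ran.
  def withGcDelta[A](body: => A): (A, Long, Long) = {
    val (c0, t0) = gcTotals()
    val result = body
    val (c1, t1) = gcTotals()
    (result, c1 - c0, t1 - t0)
  }
}
```

Wrapping the same recommendAll workload with `withGcDelta` before and after a change, on a large dataset, would give two pairs of numbers to compare.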





[GitHub] spark pull request: [SPARK-11968] [MLlib] : MatrixFactorizationMod...

2015-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9980#issuecomment-159763910
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46720/





[GitHub] spark pull request: [SPARK-11968] [MLlib] : MatrixFactorizationMod...

2015-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9980#issuecomment-159763909
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11968] [MLlib] : MatrixFactorizationMod...

2015-11-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9980#issuecomment-159763828
  
**[Test build #46720 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46720/consoleFull)**
 for PR 9980 at commit 
[`4104978`](https://github.com/apache/spark/commit/41049787a1b2f3cba8e77623c69a9f590006199f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11968] [MLlib] : MatrixFactorizationMod...

2015-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9980#issuecomment-159755471
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-11968] [MLlib] : MatrixFactorizationMod...

2015-11-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/9980#issuecomment-159755473
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/46718/





[GitHub] spark pull request: [SPARK-11968] [MLlib] : MatrixFactorizationMod...

2015-11-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9980#issuecomment-159755392
  
**[Test build #46718 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46718/consoleFull)**
 for PR 9980 at commit 
[`4b2bb59`](https://github.com/apache/spark/commit/4b2bb59f46dad86cd7f09671040800f2664dfad0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-11968] [MLlib] : MatrixFactorizationMod...

2015-11-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9980#issuecomment-159754992
  
**[Test build #46720 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46720/consoleFull)**
 for PR 9980 at commit 
[`4104978`](https://github.com/apache/spark/commit/41049787a1b2f3cba8e77623c69a9f590006199f).





[GitHub] spark pull request: [SPARK-11968] [MLlib] : MatrixFactorizationMod...

2015-11-25 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/9980#discussion_r45929208
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
 ---
@@ -275,15 +276,13 @@ object MatrixFactorizationModel extends 
Loader[MatrixFactorizationModel] {
   num: Int): RDD[(Int, Array[(Int, Double)])] = {
 val srcBlocks = blockify(rank, srcFeatures)
 val dstBlocks = blockify(rank, dstFeatures)
+val output = new ArrayBuffer[(Int, (Int, Double))]()
 val ratings = srcBlocks.cartesian(dstBlocks).flatMap {
   case ((srcIds, srcFactors), (dstIds, dstFactors)) =>
-val m = srcIds.length
-val n = dstIds.length
 val ratings = srcFactors.transpose.multiply(dstFactors)
-val output = new Array[(Int, (Int, Double))](m * n)
 var k = 0
 ratings.foreachActive { (i, j, r) =>
-  output(k) = (srcIds(i), (dstIds(j), r))
+  output.append((srcIds(i), (dstIds(j), r)))
--- End diff --

Is `k` even needed now?





[GitHub] spark pull request: [SPARK-11968] [MLlib] : MatrixFactorizationMod...

2015-11-25 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/9980#issuecomment-159750518
  
This won't help much, and it may cause issues because the buffer is never 
cleared. It would be helpful if you could profile the implementation and show 
that the number of temporary objects is reduced.
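
The buffer concern can be illustrated with a minimal sketch in plain Scala, without Spark. `BlockScorer` is a hypothetical stand-in for the per-block work inside `flatMap`, and the rating value is a placeholder. The point is only that an `ArrayBuffer` created outside the per-block function keeps entries from earlier calls unless it is cleared first.

```scala
import scala.collection.mutable.ArrayBuffer

object SharedBufferSketch {
  // Hypothetical stand-in for the per-block work inside flatMap. The buffer
  // is created once and shared by every call to `score`.
  final class BlockScorer(clearFirst: Boolean) {
    private val output = new ArrayBuffer[(Int, (Int, Double))]()

    def score(srcIds: Array[Int], dstIds: Array[Int]): List[(Int, (Int, Double))] = {
      if (clearFirst) output.clear()
      for (s <- srcIds; d <- dstIds) {
        output += ((s, (d, 1.0))) // placeholder rating, not a real prediction
      }
      output.toList // immutable snapshot of whatever the buffer holds now
    }
  }
}
```

With `clearFirst = false`, a second call returns the first call's entries followed by its own, so results from earlier blocks are emitted again. With `clearFirst = true`, each call returns only its own block.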





[GitHub] spark pull request: [SPARK-11968] [MLlib] : MatrixFactorizationMod...

2015-11-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/9980#issuecomment-159749151
  
**[Test build #46718 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/46718/consoleFull)**
 for PR 9980 at commit 
[`4b2bb59`](https://github.com/apache/spark/commit/4b2bb59f46dad86cd7f09671040800f2664dfad0).





[GitHub] spark pull request: [SPARK-11968] [MLlib] : MatrixFactorizationMod...

2015-11-25 Thread rekhajoshm
GitHub user rekhajoshm opened a pull request:

https://github.com/apache/spark/pull/9980

[SPARK-11968] [MLlib] : MatrixFactorizationModel recommendAll for GC times

Fix for the ALS recommendAll methods to reduce GC times

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rekhajoshm/spark SPARK-11968

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/9980.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #9980


commit e3677c9fa9697e0d34f9df52442085a6a481c9e9
Author: Rekha Joshi 
Date:   2015-05-05T23:10:08Z

Merge pull request #1 from apache/master

Pulling functionality from apache spark

commit 106fd8eee8f6a6f7c67cfc64f57c1161f76d8f75
Author: Rekha Joshi 
Date:   2015-05-08T21:49:09Z

Merge pull request #2 from apache/master

pull latest from apache spark

commit 0be142d6becba7c09c6eba0b8ea1efe83d649e8c
Author: Rekha Joshi 
Date:   2015-06-22T00:08:08Z

Merge pull request #3 from apache/master

Pulling functionality from apache spark

commit 6c6ee12fd733e3f9902e10faf92ccb78211245e3
Author: Rekha Joshi 
Date:   2015-09-17T01:03:09Z

Merge pull request #4 from apache/master

Pulling functionality from apache spark

commit b123c601e459d1ad17511fd91dd304032154882a
Author: Rekha Joshi 
Date:   2015-11-25T18:50:32Z

Merge pull request #5 from apache/master

pull request from apache/master

commit 4b2bb59f46dad86cd7f09671040800f2664dfad0
Author: Joshi 
Date:   2015-11-25T22:48:56Z

Fix for ALS recommend all methods for GC times



