[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-30 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22784
  
Thanks a lot @srowen 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-30 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged to master


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98178/
Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98178 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98178/testReport)**
 for PR 22784 at commit 
[`0effc85`](https://github.com/apache/spark/commit/0effc85ccfc831bcc4c469b4a4c1d8db26fab72e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98178 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98178/testReport)**
 for PR 22784 at commit 
[`0effc85`](https://github.com/apache/spark/commit/0effc85ccfc831bcc4c469b4a4c1d8db26fab72e).


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98164/
Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98164 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98164/testReport)**
 for PR 22784 at commit 
[`2b7ee7b`](https://github.com/apache/spark/commit/2b7ee7b0a6d2cbcc159826d8dbe286a4a144d463).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98161/
Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98161 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98161/testReport)**
 for PR 22784 at commit 
[`0d9eea8`](https://github.com/apache/spark/commit/0d9eea8fcffbdd72bdb8dd8b93de3ac9a782fc85).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98164 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98164/testReport)**
 for PR 22784 at commit 
[`2b7ee7b`](https://github.com/apache/spark/commit/2b7ee7b0a6d2cbcc159826d8dbe286a4a144d463).


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98161 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98161/testReport)**
 for PR 22784 at commit 
[`0d9eea8`](https://github.com/apache/spark/commit/0d9eea8fcffbdd72bdb8dd8b93de3ac9a782fc85).


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98145 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98145/testReport)**
 for PR 22784 at commit 
[`18af032`](https://github.com/apache/spark/commit/18af0325e95552a00983983224795e71f2e66204).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98145/
Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98144/
Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98144 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98144/testReport)**
 for PR 22784 at commit 
[`094594b`](https://github.com/apache/spark/commit/094594bf63a22be65bac7b31932d5d870f1142d3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98145 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98145/testReport)**
 for PR 22784 at commit 
[`18af032`](https://github.com/apache/spark/commit/18af0325e95552a00983983224795e71f2e66204).


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98144 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98144/testReport)**
 for PR 22784 at commit 
[`094594b`](https://github.com/apache/spark/commit/094594bf63a22be65bac7b31932d5d870f1142d3).


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98141/
Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98141 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98141/testReport)**
 for PR 22784 at commit 
[`3cbe017`](https://github.com/apache/spark/commit/3cbe017c640764db0fe95bcc2a820917bbc5fb3e).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98140 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98140/testReport)**
 for PR 22784 at commit 
[`5674e17`](https://github.com/apache/spark/commit/5674e177b7894d61904c6748dbf7721359163938).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98140/
Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98141 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98141/testReport)**
 for PR 22784 at commit 
[`3cbe017`](https://github.com/apache/spark/commit/3cbe017c640764db0fe95bcc2a820917bbc5fb3e).


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98140 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98140/testReport)**
 for PR 22784 at commit 
[`5674e17`](https://github.com/apache/spark/commit/5674e177b7894d61904c6748dbf7721359163938).


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98135/
Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98135 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98135/testReport)**
 for PR 22784 at commit 
[`a8c4391`](https://github.com/apache/spark/commit/a8c43919a5d8624a5a5ddf7ea862a93f2db098c6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98134/
Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98134 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98134/testReport)**
 for PR 22784 at commit 
[`b1789d7`](https://github.com/apache/spark/commit/b1789d7a2305c53b463960e1d60f85abde5934ad).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98135 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98135/testReport)**
 for PR 22784 at commit 
[`a8c4391`](https://github.com/apache/spark/commit/a8c43919a5d8624a5a5ddf7ea862a93f2db098c6).


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22784
  
Thank you @srowen for the review.  I have addressed the comments.

>  I wonder if the SVD should be used at even smaller scales? as you point 
out, it's pretty hard to compute a gramian on even a 40k x 40k matrix. 
> 

Yes, we can compute the PCA using SVD even at smaller scales. In fact, when 
the number of columns is small, Spark SVD performs the eigendecomposition by 
first computing the Gramian matrix, which is the same approach as in PCA.

The condition that decides whether Spark SVD computes the Gramian matrix first 
is given below.

https://github.com/apache/spark/blob/d5573c578a1eea9ee04886d9df37c7178e67bb30/mllib/src/main/scala/org/apache/spark/mllib/linalg/distributed/RowMatrix.scala#L232-L243

So, for a smaller number of columns (< 15000), Spark SVD prefers to compute 
the Gramian matrix first and then compute the SVD, which is the same as the 
current implementation of PCA.
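
As a rough illustration of the linked condition, the sketch below paraphrases 
how the `auto` mode appears to choose between the local and the distributed 
path. The helper name `chooseSvdMode` is hypothetical, and the thresholds are 
written from memory of `RowMatrix.computeSVD`, so they are assumptions; the 
linked lines remain authoritative.

```scala
// Hypothetical helper, not part of Spark. It mirrors (from memory, so treat it
// as an assumption) the "auto" mode selection in RowMatrix.computeSVD.
// n = number of columns, k = number of singular values requested.
def chooseSvdMode(n: Int, k: Int): String = {
  if (n < 100 || (k > n / 2 && n <= 15000)) {
    // Small n, or k large relative to n: compute the Gramian first, then
    // decompose it locally on the driver.
    if (k < n / 3) "local-eigs" // ARPACK on the local Gramian
    else "local-svd"            // LAPACK on the local Gramian
  } else {
    // k small relative to n: ARPACK with distributed Gramian-vector products.
    "dist-eigs"
  }
}

println(chooseSvdMode(50, 5))
println(chooseSvdMode(10000, 6000))
println(chooseSvdMode(100000, 100))
```

Under this reading, whether the Gramian is computed first depends on `k` as 
well as on the number of columns.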


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-27 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #98134 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98134/testReport)**
 for PR 22784 at commit 
[`b1789d7`](https://github.com/apache/spark/commit/b1789d7a2305c53b463960e1d60f85abde5934ad).


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-22 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22784
  
Sorry for my mistake. The '4' key on my keyboard sometimes fails to register.
> I think, INT_MAX is 2147483647, so n ~= sqrt(2*2147483647) = 65536.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-22 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22784
  
Hi @srowen, thanks for the comment. To my knowledge, PCA/SVD is not limited 
by the number of rows.

1) Currently the number of rows is not a constraint. Ultimately we need to 
compute the Gramian matrix, or a Gramian-matrix-vector product, to compute the 
SVD. So the computation of the SVD is limited by the number of columns only.

2) Sparsity matters only for the computation of the Gramian matrix or the 
Gramian-matrix-vector product, in both PCA and Spark SVD. A mean-centred vector 
will always be dense. Currently PCA is computed with a dense matrix and SVD 
uses a dense vector, so the only density constraint arises in the 
matrix-vector product computation.

3) In this PR, if the column limit is exceeded, the computation is done in a 
distributed manner, which the current PCA does not support.

4) Currently PCA does not scale with the number of columns. 
With 40 GB of driver memory, 40,000 columns, and 1 lakh (100,000) rows, I get 
the following error.

```
scala> val pca = new PCA(k).fit(rad)
2018-10-22 22:44:23,128 | WARN  | main | 40000 columns will require at least 12800 megabytes of memory! | org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66)
2018-10-22 22:47:02,836 | WARN  | main | 40000 columns will require at least 12800 megabytes of memory! | org.apache.spark.internal.Logging$class.logWarning(Logging.scala:66)
java.lang.OutOfMemoryError
  at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
  at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
  at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
  at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
  at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:41)
  at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
  at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189)
  at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
  at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)

```
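
The warning and the failure in the trace are consistent with a 
back-of-the-envelope estimate. The sketch below assumes the Gramian is held as 
a dense n x n array of 8-byte doubles and that Java serialization buffers it in 
a single byte array; both are assumptions about the cause, and the names are 
illustrative rather than Spark APIs.

```scala
// Bytes needed for a dense n x n matrix of 8-byte doubles.
def denseGramianBytes(n: Long): Long = n * n * 8L

val cols = 40000L
val gramianBytes = denseGramianBytes(cols)     // 12,800,000,000 bytes
val gramianMegabytes = gramianBytes / 1000000L // 12800, the figure in the warning

// A single JVM byte array can hold at most Int.MaxValue elements, so a buffer
// of this size cannot exist no matter how much driver memory is available.
val fitsInOneByteArray = gramianBytes <= Int.MaxValue.toLong

println(s"$cols columns -> $gramianMegabytes MB, fits in one byte array: $fitsInOneByteArray")
```

If this reading is right, adding driver memory would not help, because the 
limit is the array index range rather than the heap size.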


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-22 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/22784
  
Hm, as a general comment, is this going to scale? This is making a 
potentially huge sparse data set dense, and computing a PCA via SVD. I get the 
idea that it's better to have some option than none, but I wonder if this 
approach is realistic for a data set with even 100K rows, and if not, is it 
going to confuse people.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-22 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22784
  
Hi @kiszk, the maximum is bounded by the following limit.

https://github.com/apache/spark/blob/23cfda1547355a823a3b2b2d374e64608c9ce175/mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala#L78-L79

where ncv = min(n, 2*k), and normally k << n.

For example, if n = 1 million features, we can compute the top 100 principal 
components. 

The number of principal components to compute is configurable.
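
To make the trade-off between n and k concrete, here is a small sketch of such 
a bound. The helper `withinArpackLimit` is hypothetical, and the exact form of 
the requirement (both `n * ncv` and `ncv * (ncv + 8)` fitting in an Int) is an 
assumption written from memory of the linked lines, which remain authoritative.

```scala
// Hypothetical check, not a Spark API.
// n = number of columns (features), k = number of principal components.
def withinArpackLimit(n: Int, k: Int): Boolean = {
  val ncv = math.min(2L * k, n.toLong) // ncv = min(n, 2*k), as stated above
  // Assumed form of the linked requirement: the ARPACK work arrays are plain
  // JVM arrays, so their lengths must not exceed Int.MaxValue.
  n.toLong * ncv <= Int.MaxValue && ncv * (ncv + 8L) <= Int.MaxValue
}

println(withinArpackLimit(1000000, 100))  // the example above: 1 million features, top 100
println(withinArpackLimit(1000000, 1100)) // n * ncv = 2.2e9 exceeds Int.MaxValue
```

Under these assumptions, one million features would allow on the order of a 
thousand components before the bound is hit.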


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-22 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22784
  
One question: after this PR, what is the maximum number of columns that we can accept?


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-22 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22784
  
Can I clarify the description?
> Because we are passing an array of size n*(n+1)/2 to the breeze library 
and the size cannot be more than INT_MAX. so, the maximum column size we can 
give is 65,500.

If n > 20726, `n*(n+1)/2` > 214783647 (= INT_MAX). Where does this 
limitation `65,500` come from?




---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-22 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22784
  
Hi @kiszk ,
 I think, INT_MAX is 2147483647, so n ~= sqrt(2*2147483647) = 65536.
Thanks
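
The estimate can be checked exactly. This sketch assumes, as the PR 
description quoted in this thread says, that the packed array has n*(n+1)/2 
entries and must fit in an Int-indexed array; `packedSize` is an illustrative 
name, not a Spark or Breeze function.

```scala
// Entries in the packed upper triangle of an n x n symmetric matrix.
def packedSize(n: Long): Long = n * (n + 1) / 2

// sqrt(2 * INT_MAX) is just under 65536, so truncating gives the candidate 65535.
val candidate = math.sqrt(2.0 * Int.MaxValue).toLong

println(packedSize(candidate) <= Int.MaxValue)     // 65535 columns still fit
println(packedSize(candidate + 1) <= Int.MaxValue) // 65536 columns do not
```

So the exact bound is 65535 columns, which agrees with the figure in the PR 
title.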


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-21 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22784
  
Can I clarify the description?
> Because we are passing an array of size n*(n+1)/2 to the breeze library 
and the size cannot be more than INT_MAX. so, the maximum column size we can 
give is 65,500.

If n > 20726, `n*(n+1)/2` > 214783647 (= INT_MAX). Where does `65,500` 
come from?




---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-21 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22784
  
Can I clarify the description?
> Because we are passing an array of size n*(n+1)/2 to the breeze library 
and the size cannot be more than INT_MAX. so, the maximum column size we can 
give is 65,500.

If n > 20726, `n*(n+1)/2` > 214783647 ( = INT_MAX)`. Where does `65,500` 
come from?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-21 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22784
  
Can I clarify the description?
> Because we are passing an array of size n*(n+1)/2 to the breeze library 
and the size cannot be more than INT_MAX. so, the maximum column size we can 
give is 65,500.

If n > 20726, `n*(n+1)/2` > 214783647 ( = INT_MAX)`. Where does `65,500` 
come from?




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #97683 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97683/testReport)**
 for PR 22784 at commit 
[`23cfda1`](https://github.com/apache/spark/commit/23cfda1547355a823a3b2b2d374e64608c9ce175).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97683/
Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #97683 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97683/testReport)**
 for PR 22784 at commit 
[`23cfda1`](https://github.com/apache/spark/commit/23cfda1547355a823a3b2b2d374e64608c9ce175).


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-21 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22784
  
All the UTs pass locally, so this looks like a random failure.
retest this please.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97682/
Test FAILed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #97682 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97682/testReport)**
 for PR 22784 at commit 
[`9aff54f`](https://github.com/apache/spark/commit/9aff54fecc530c77e5f97941e15a478a421827d0).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #97682 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97682/testReport)**
 for PR 22784 at commit 
[`9aff54f`](https://github.com/apache/spark/commit/9aff54fecc530c77e5f97941e15a478a421827d0).


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-21 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/22784
  
retest this please


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97680/
Test FAILed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #97680 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97680/testReport)**
 for PR 22784 at commit 
[`9aff54f`](https://github.com/apache/spark/commit/9aff54fecc530c77e5f97941e15a478a421827d0).
 * This patch **fails due to an unknown error code, -9**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97679/
Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #97679 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97679/testReport)**
 for PR 22784 at commit 
[`4c1776f`](https://github.com/apache/spark/commit/4c1776f14f1453c2f64350a58cab7209764f826c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #97680 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97680/testReport)**
 for PR 22784 at commit 
[`9aff54f`](https://github.com/apache/spark/commit/9aff54fecc530c77e5f97941e15a478a421827d0).


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97677/
Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #97677 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97677/testReport)**
 for PR 22784 at commit 
[`e6cf661`](https://github.com/apache/spark/commit/e6cf6612ee488ace7bfc11db26ee4d6cc72e3368).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #97679 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97679/testReport)**
 for PR 22784 at commit 
[`4c1776f`](https://github.com/apache/spark/commit/4c1776f14f1453c2f64350a58cab7209764f826c).


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97678/
Test FAILed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #97678 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97678/testReport)**
 for PR 22784 at commit 
[`9111fca`](https://github.com/apache/spark/commit/9111fcab296ca71bfa280010d60aa803ece69509).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #97678 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97678/testReport)**
 for PR 22784 at commit 
[`9111fca`](https://github.com/apache/spark/commit/9111fcab296ca71bfa280010d60aa803ece69509).


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #97677 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97677/testReport)**
 for PR 22784 at commit 
[`e6cf661`](https://github.com/apache/spark/commit/e6cf6612ee488ace7bfc11db26ee4d6cc72e3368).


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22784
  
retest this please


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97673/
Test FAILed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #97673 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97673/testReport)**
 for PR 22784 at commit 
[`1252526`](https://github.com/apache/spark/commit/12525266fe76b767974e5ba94cd131251bc7ed3e).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22784
  
**[Test build #97673 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97673/testReport)**
 for PR 22784 at commit 
[`1252526`](https://github.com/apache/spark/commit/12525266fe76b767974e5ba94cd131251bc7ed3e).


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/22784
  
ok to test.


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22784
  
cc @srowen 


---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread shahidki31
Github user shahidki31 commented on the issue:

https://github.com/apache/spark/pull/22784
  
Test results comparing the existing PCA with PCA computed via SVD, without 
computing the covariance matrix:

val data = Array(
  Vectors.sparse(5, Seq((1, 1.0), (3, 7.0))),
  Vectors.dense(2.0, 0.0, 3.0, 4.0, 5.0),
  Vectors.dense(4.0, 0.0, 0.0, 6.0, 7.0))

1) PCA using the covariance matrix
Explained variance = [0.7943932532, 0.2056067468, 1.26E-16]
Top 2 principal components:
[[-0.44859172075072673  -0.28423808214073987
   0.13301985745398526  -0.05621155904253121
  -0.1252315635978212    0.7636264774662965
   0.21650756651919933  -0.5652958773533949
  -0.8476512931126826   -0.11560340501314653]]

2) PCA using SVD, without computing the covariance matrix
Explained variance = [0.7943932532, 0.2056067468, 5.55E-17]
Top 2 principal components:
[[-0.44859172075072673  -0.2842380821407399
   0.13301985745398529  -0.056211559042531424
  -0.12523156359782125   0.7636264774662964
   0.21650756651919945  -0.5652958773533953
  -0.8476512931126826   -0.11560340501314664]]

**Leading eigenvalues MSE = 0.0**
**Leading eigenvectors MSE = 0.0**
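
A rough way to re-check the two MSE figures from the printed numbers alone. This is only a sketch: it assumes the MSE is taken element-wise over the top 2 explained variances and over the 5x2 component entries, which the comment does not spell out. The object and value names are made up for illustration and do not come from the patch.

```scala
object PcaResultCheck {
  // Mean squared error between two equally sized sequences.
  def mse(a: Seq[Double], b: Seq[Double]): Double = {
    require(a.length == b.length, "sequences must have the same length")
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum / a.length
  }

  // Top 2 explained variances as printed for each method.
  val covVariance: Seq[Double] = Seq(0.7943932532, 0.2056067468)
  val svdVariance: Seq[Double] = Seq(0.7943932532, 0.2056067468)

  // Top 2 principal components as printed, read row by row.
  val covComponents: Seq[Double] = Seq(
    -0.44859172075072673, -0.28423808214073987,
    0.13301985745398526, -0.05621155904253121,
    -0.1252315635978212, 0.7636264774662965,
    0.21650756651919933, -0.5652958773533949,
    -0.8476512931126826, -0.11560340501314653)
  val svdComponents: Seq[Double] = Seq(
    -0.44859172075072673, -0.2842380821407399,
    0.13301985745398529, -0.056211559042531424,
    -0.12523156359782125, 0.7636264774662964,
    0.21650756651919945, -0.5652958773533953,
    -0.8476512931126826, -0.11560340501314664)

  val varianceMse: Double = mse(covVariance, svdVariance)
  val componentMse: Double = mse(covComponents, svdComponents)
}
```

With the values as printed, `varianceMse` is exactly 0.0 and `componentMse` is far below 1e-24, i.e. the two methods differ only at floating-point rounding level, which is consistent with the 0.0 reported above.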









---




[GitHub] spark issue #22784: [SPARK-25790][MLLIB] PCA: Support more than 65535 column...

2018-10-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22784
  
Can one of the admins verify this patch?


---



