[GitHub] spark pull request #19078: [SPARK-21862][ML] Add overflow check in PCA

2017-08-31 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19078


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19078: [SPARK-21862][ML] Add overflow check in PCA

2017-08-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/19078#discussion_r136248809
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala ---
@@ -110,3 +115,17 @@ class PCAModel private[spark] (
 }
   }
 }
+
+object PCAUtil {
--- End diff --

This should be made package private.
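
For readers unfamiliar with Scala's qualified access modifiers, here is a minimal sketch of what "package private" could look like. All names are hypothetical, and an enclosing object named `feature` stands in for the real package so that the snippet stays self-contained; it is not the code from the pull request.

```scala
// Hypothetical names throughout. `feature` is an object standing in for a
// package so that the snippet compiles as a single self-contained unit.
object feature {
  // Visible anywhere inside `feature`, hidden from code outside it.
  private[feature] object CostUtilSketch {
    // Multiply in Long so the product cannot wrap around.
    def cost(k: Int, numFeatures: Int): Long = k.toLong * numFeatures
  }

  class PcaSketch(val k: Int) {
    // Allowed: this class lives in the same enclosing scope as CostUtilSketch.
    def fits(numFeatures: Int): Boolean =
      CostUtilSketch.cost(k, numFeatures) < Int.MaxValue
  }
}

// Outside `feature`, a call such as `feature.CostUtilSketch.cost(1, 1)`
// would be rejected by the compiler.
```

The helper stays usable by its neighbours (and by tests placed in the same scope) without becoming part of the public API.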





[GitHub] spark pull request #19078: [SPARK-21862][ML] Add overflow check in PCA

2017-08-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/19078#discussion_r136249083
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala ---
@@ -44,6 +44,11 @@ class PCA @Since("1.4.0") (@Since("1.4.0") val k: Int) {
 require(k <= numFeatures,
   s"source vector size $numFeatures must be no less than k=$k")
 
+require(PCAUtil.memoryCost(k, numFeatures) <= Int.MaxValue,
--- End diff --

As long as you're making updates, how about making this a strict inequality?
(I could imagine boundary issues with indexing somewhere in Breeze or MLlib.)
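
A small sketch of the difference the strict inequality makes, assuming the cost is carried as a `Long` (as the comparison against `Int.MaxValue` in the diff suggests). The object and method names are hypothetical, and the body of `PCAUtil.memoryCost` is not shown in this thread, so the sketch takes the cost as a plain argument.

```scala
object StrictBoundSketch {
  // Inclusive bound, as in the diff under review: a cost of exactly
  // Int.MaxValue is accepted.
  def passesInclusive(cost: Long): Boolean = cost <= Int.MaxValue

  // Strict bound, as suggested in the comment: the boundary value is rejected.
  def passesStrict(cost: Long): Boolean = cost < Int.MaxValue
}
```

The two checks differ only when the cost is exactly `Int.MaxValue`, which is the boundary case the comment is worried about.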





[GitHub] spark pull request #19078: [SPARK-21862][ML] Add overflow check in PCA

2017-08-30 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/19078#discussion_r136248974
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala ---
@@ -44,6 +44,11 @@ class PCA @Since("1.4.0") (@Since("1.4.0") val k: Int) {
 require(k <= numFeatures,
   s"source vector size $numFeatures must be no less than k=$k")
 
+require(PCAUtil.memoryCost(k, numFeatures) <= Int.MaxValue,
+  "The param k and numFeatures is too large for SVD computation." +
--- End diff --

Put a space between sentences: "computation. "





[GitHub] spark pull request #19078: [SPARK-21862][ML] Add overflow check in PCA

2017-08-30 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19078#discussion_r136032375
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala ---
@@ -44,6 +44,13 @@ class PCA @Since("1.4.0") (@Since("1.4.0") val k: Int) {
 require(k <= numFeatures,
   s"source vector size $numFeatures must be no less than k=$k")
 
+val workSize = ( 3
--- End diff --

But catching `NegativeArraySizeException` won't be very precise; other
places could also throw `NegativeArraySizeException`. It is a very common
exception.
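
The imprecision can be illustrated with a toy example. Everything here is hypothetical (the names, the unrelated buffer step, the message text); it only shows how a broad catch can attribute an unrelated failure to `k`.

```scala
object ImpreciseCatchSketch {
  // Stand-in for any unrelated step that may also allocate with a bad size.
  def unrelatedStep(size: Int): Array[Byte] = new Array[Byte](size)

  def run(k: Int, bufferSize: Int): String =
    try {
      unrelatedStep(bufferSize)
      // Int multiplication wraps silently, so a large k yields a negative size.
      val work = new Array[Double](k * k)
      s"ok: ${work.length} doubles"
    } catch {
      // Catches the exception no matter which allocation threw it.
      case _: NegativeArraySizeException => "k is too large for SVD computation"
    }
}
```

Calling `ImpreciseCatchSketch.run(2, -1)` reports that `k` is too large even though `k = 2` is fine and the failure came from the unrelated buffer.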





[GitHub] spark pull request #19078: [SPARK-21862][ML] Add overflow check in PCA

2017-08-30 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/19078#discussion_r136030423
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala ---
@@ -44,6 +44,13 @@ class PCA @Since("1.4.0") (@Since("1.4.0") val k: Int) {
 require(k <= numFeatures,
   s"source vector size $numFeatures must be no less than k=$k")
 
+val workSize = ( 3
--- End diff --

OK, how about catching `NegativeArraySizeException` and rethrowing a better
error? Replicating the check is really what I'm questioning, as it's pretty
obtuse.
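
A minimal sketch of the catch-and-rethrow idea, under assumptions: the names and the message are hypothetical, and a plain `n * n` allocation stands in for the real work-array allocation, which is not shown in this thread.

```scala
object RethrowSketch {
  // Int multiplication wraps silently: 50000 * 50000 is negative as an Int,
  // so the allocation below throws NegativeArraySizeException.
  def allocateWork(n: Int): Array[Double] = new Array[Double](n * n)

  // Wrap the low-level failure in an error that names the offending input,
  // keeping the original exception as the cause.
  def allocateOrExplain(n: Int): Array[Double] =
    try allocateWork(n)
    catch {
      case e: NegativeArraySizeException =>
        throw new IllegalArgumentException(
          s"Work array for n=$n exceeds the maximum array size.", e)
    }
}
```

This avoids duplicating the size formula in the caller, at the price of reacting only after the allocation has already failed.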

