[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14555



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-11 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74397043
  
--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/VectorsSuite.scala ---
@@ -72,6 +72,12 @@ class VectorsSuite extends SparkMLFunSuite {
     }
   }
 
+  test("sparse vector construction with negative indices") {
+    intercept[IllegalArgumentException] {
+      Vectors.sparse(3, Array(-1, 1), Array(3.0, 5.0))
+    }
+  }
+
--- End diff --

... but https://github.com/apache/spark/blob/master/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala#L210 does not assume this, and that's the method to test.



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-11 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74396667
  
--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/VectorsSuite.scala ---
@@ -72,6 +72,12 @@ class VectorsSuite extends SparkMLFunSuite {
     }
   }
 
+  test("sparse vector construction with negative indices") {
+    intercept[IllegalArgumentException] {
+      Vectors.sparse(3, Array(-1, 1), Array(3.0, 5.0))
+    }
+  }
+
--- End diff --

For now, it is assumed that the indices are in order:
https://github.com/apache/spark/blob/master/mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala#L554



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-11 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74395330
  
--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/VectorsSuite.scala ---
@@ -72,6 +72,12 @@ class VectorsSuite extends SparkMLFunSuite {
     }
   }
 
+  test("sparse vector construction with negative indices") {
+    intercept[IllegalArgumentException] {
+      Vectors.sparse(3, Array(-1, 1), Array(3.0, 5.0))
+    }
+  }
+
--- End diff --

Lastly: do we have a test that exercises the check for missorted indices? I don't
see one; I suppose one should have been added before.

Same for Python.

I'm OK with the error messages, but now that I see Python has parallel checks,
it'd be nice to make the error messages match.
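For illustration, a minimal sketch of such a Scala test, in the same `intercept` style as the test in this diff (the test name is a placeholder):

```scala
  // hypothetical sketch: indices 2, 1 are out of order, so the
  // strictly-increasing check should reject this construction
  test("sparse vector construction with mis-sorted indices") {
    intercept[IllegalArgumentException] {
      Vectors.sparse(3, Array(2, 1), Array(3.0, 5.0))
    }
  }
```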



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-10 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74208842
  
--- Diff: python/pyspark/ml/linalg/__init__.py ---
@@ -511,6 +519,12 @@ def __init__(self, size, *args):
 "Indices %s and %s are not strictly increasing"
 % (self.indices[i], self.indices[i + 1]))
 
+assert np.max(self.indices) < self.size, \
--- End diff --

Will this disallow a size 0 vector? I'm not sure what np.max does here.
PS: I like its error message better for "not strictly increasing", and we
should probably make the messages generally consistent.



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-10 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74208436
  
--- Diff: mllib-local/src/test/scala/org/apache/spark/ml/linalg/VectorsSuite.scala ---
@@ -72,6 +72,12 @@ class VectorsSuite extends SparkMLFunSuite {
     }
   }
 
+  test("sparse vector construction with negative indices") {
+    intercept[IllegalArgumentException] {
+      Vectors.sparse(3, Array(1, -2), Array(3.0, 5.0))
--- End diff --

This actually won't test the check for negative indices, because the 
indices are missorted.
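A sorted variant that does reach the negative-index check, essentially what the later revision of this test uses (sketch only):

```scala
  // indices are in order, so only the negative-index check can fire
  intercept[IllegalArgumentException] {
    Vectors.sparse(3, Array(-1, 1), Array(3.0, 5.0))
  }
```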



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-10 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74208355
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -560,11 +554,26 @@ class SparseVector @Since("2.0.0") (
     @Since("2.0.0") val indices: Array[Int],
     @Since("2.0.0") val values: Array[Double]) extends Vector {
 
-  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
-    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
-    s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
-    s"which exceeds the specified vector size ${size}.")
+  // validate the data
+  {
+    require(size >= 0, "The size of the requested sparse vector must be greater than 0.")
+    require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
+      s" indices match the dimension of the values. You provided ${indices.length} indices and " +
+      s" ${values.length} values.")
+    require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
+      s"which exceeds the specified vector size ${size}.")
+
+    var prev = -1
+    if (indices.nonEmpty) {
+      require(indices(0) >= 0, s"Found negative index: ${indices(0)}.")
+    }
+    indices.foreach { i =>
+      require(prev < i, s"Found duplicate index: $i.")
+      prev = i
+    }
+    require(prev < size, s"You may not write an element to index $prev because the declared " +
--- End diff --

Maybe more direct as `s"Index $prev out of bounds for vector of size $size"`? The
way it starts, it sounds like the problem is that something is read-only.
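As a sketch, the suggested wording would make that final check read roughly:

```scala
    require(prev < size, s"Index $prev out of bounds for vector of size $size")
```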



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-10 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74208246
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -560,11 +554,26 @@ class SparseVector @Since("2.0.0") (
     @Since("2.0.0") val indices: Array[Int],
     @Since("2.0.0") val values: Array[Double]) extends Vector {
 
-  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
-    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
-    s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
-    s"which exceeds the specified vector size ${size}.")
+  // validate the data
+  {
+    require(size >= 0, "The size of the requested sparse vector must be greater than 0.")
+    require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
+      s" indices match the dimension of the values. You provided ${indices.length} indices and " +
+      s" ${values.length} values.")
+    require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
+      s"which exceeds the specified vector size ${size}.")
+
+    var prev = -1
+    if (indices.nonEmpty) {
+      require(indices(0) >= 0, s"Found negative index: ${indices(0)}.")
+    }
+    indices.foreach { i =>
+      require(prev < i, s"Found duplicate index: $i.")
--- End diff --

These are just nits now, but `prev` should be declared just before the loop 
where it's used. This error message could be more general, since it will catch 
the case that the indices aren't sorted. It could say `s"Index $i follows $prev 
and is not strictly increasing"`
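Putting both suggestions together, the loop might look roughly like this (a sketch, not the final patch):

```scala
    // declare prev immediately before the loop that uses it
    var prev = -1
    indices.foreach { i =>
      require(prev < i, s"Index $i follows $prev and is not strictly increasing.")
      prev = i
    }
```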



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-09 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74022257
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
     @Since("2.0.0") val indices: Array[Int],
     @Since("2.0.0") val values: Array[Double]) extends Vector {
 
-  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
-    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
-    s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
-    s"which exceeds the specified vector size ${size}.")
+  validate()
+
+  private def validate(): Unit = {
+    require(size >= 0, "The size of the requested sparse vector must be greater than 0.")
--- End diff --

Yes, I do see a test for a 0-length vector:

https://github.com/apache/spark/blob/master/mllib-local/src/test/scala/org/apache/spark/ml/linalg/VectorsSuite.scala#L81



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-09 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74021856
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
     @Since("2.0.0") val indices: Array[Int],
     @Since("2.0.0") val values: Array[Double]) extends Vector {
 
-  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
-    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
-    s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
-    s"which exceeds the specified vector size ${size}.")
+  validate()
--- End diff --

I also thought about `{...}`, but I feel putting it into one method is better.
Anyway, I can do it that way if this is not proper for Spark code style.



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74021332
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
     @Since("2.0.0") val indices: Array[Int],
     @Since("2.0.0") val values: Array[Double]) extends Vector {
 
-  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
-    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
-    s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
-    s"which exceeds the specified vector size ${size}.")
+  validate()
+
+  private def validate(): Unit = {
+    require(size >= 0, "The size of the requested sparse vector must be greater than 0.")
--- End diff --

This allows a size 0 vector now. I guess that's good, because `DenseVector` 
allows this (a 0 length array).
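For reference, a minimal sketch of the zero-size constructions under discussion; assuming the `size >= 0` check, both should now construct successfully:

```scala
    // empty sparse vector: size 0, no indices, no values
    val emptySparse = Vectors.sparse(0, Array.empty[Int], Array.empty[Double])
    // DenseVector already allows a 0-length array
    val emptyDense = Vectors.dense(Array.empty[Double])
```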



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-09 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74021092
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
     @Since("2.0.0") val indices: Array[Int],
     @Since("2.0.0") val values: Array[Double]) extends Vector {
 
-  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
-    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
-    s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
-    s"which exceeds the specified vector size ${size}.")
+  validate()
--- End diff --

They wouldn't become fields unless used outside the constructor. You can 
also use a simple scope `{...}` to guard against this. I understand the 
argument and don't feel strongly either way, but we don't do this in other code 
in general.
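A minimal sketch of the scope-block alternative being suggested, which keeps the temporary `prev` from becoming a field of `SparseVector`:

```scala
  // plain constructor code inside a local scope; prev never escapes the block
  {
    var prev = -1
    indices.foreach { i =>
      require(prev < i, s"Found duplicate index: $i.")
      prev = i
    }
  }
```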



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-09 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74009150
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
     @Since("2.0.0") val indices: Array[Int],
     @Since("2.0.0") val values: Array[Double]) extends Vector {
 
-  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
-    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
-    s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
-    s"which exceeds the specified vector size ${size}.")
+  validate()
--- End diff --

Two reasons:
* It groups the validation code together.
* I may define some temporary variables for validation; without a method they
would become fields of SparseVector.



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-08 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74006226
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
     @Since("2.0.0") val indices: Array[Int],
     @Since("2.0.0") val values: Array[Double]) extends Vector {
 
-  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
-    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
-    s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
-    s"which exceeds the specified vector size ${size}.")
+  validate()
--- End diff --

Sure, but why bother writing a method? It's invoked only once, directly above.
This is just constructor code.



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-08 Thread zjffdu
Github user zjffdu commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74006057
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
     @Since("2.0.0") val indices: Array[Int],
     @Since("2.0.0") val values: Array[Double]) extends Vector {
 
-  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
-    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
-    s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
-    s"which exceeds the specified vector size ${size}.")
+  validate()
--- End diff --

What do you mean? This method would be called when SparseVector is created;
it can refer to any variable in SparseVector.



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-08 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74005289
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
     @Since("2.0.0") val indices: Array[Int],
     @Since("2.0.0") val values: Array[Double]) extends Vector {
 
-  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
-    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
-    s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
-    s"which exceeds the specified vector size ${size}.")
+  validate()
+
+  private def validate(): Unit = {
+    require(size >= 0, "The size of the requested sparse vector must be greater than 0.")
+    require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
+      s" indices match the dimension of the values. You provided ${indices.length} indices and " +
+      s" ${values.length} values.")
+    require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
+      s"which exceeds the specified vector size ${size}.")
+
+    var prev = -1
+    indices.foreach { i =>
+      require(i >= 0, s"Found negative indice: $i.")
+      require(prev < i, s"Found duplicate indices: $i.")
+      prev = i
+    }
+    require(prev < size, s"You may not write an element to index $prev because the declared " +
--- End diff --

If you're doing it this way, then just check whether the first index is >= 0; since
the indices must be strictly increasing, if any of them were negative, the first one
would be.
This message could also be a little more straightforward: "found index that
exceeds size" or something along those lines.



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-08 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74005178
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
     @Since("2.0.0") val indices: Array[Int],
     @Since("2.0.0") val values: Array[Double]) extends Vector {
 
-  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
-    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
-    s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
-    s"which exceeds the specified vector size ${size}.")
+  validate()
+
+  private def validate(): Unit = {
+    require(size >= 0, "The size of the requested sparse vector must be greater than 0.")
+    require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
+      s" indices match the dimension of the values. You provided ${indices.length} indices and " +
+      s" ${values.length} values.")
+    require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
+      s"which exceeds the specified vector size ${size}.")
+
+    var prev = -1
+    indices.foreach { i =>
+      require(i >= 0, s"Found negative indice: $i.")
--- End diff --

index, not indice



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-08 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/14555#discussion_r74005184
  
--- Diff: mllib-local/src/main/scala/org/apache/spark/ml/linalg/Vectors.scala ---
@@ -560,11 +554,25 @@ class SparseVector @Since("2.0.0") (
     @Since("2.0.0") val indices: Array[Int],
     @Since("2.0.0") val values: Array[Double]) extends Vector {
 
-  require(indices.length == values.length, "Sparse vectors require that the dimension of the" +
-    s" indices match the dimension of the values. You provided ${indices.length} indices and " +
-    s" ${values.length} values.")
-  require(indices.length <= size, s"You provided ${indices.length} indices and values, " +
-    s"which exceeds the specified vector size ${size}.")
+  validate()
--- End diff --

What's the value in this method?



[GitHub] spark pull request #14555: [SPARK-16965][MLLIB][PYSPARK] Fix bound checking ...

2016-08-08 Thread zjffdu
GitHub user zjffdu opened a pull request:

https://github.com/apache/spark/pull/14555

[SPARK-16965][MLLIB][PYSPARK] Fix bound checking for SparseVector.

## What changes were proposed in this pull request?

1. In Scala, add negative lower-bound checking and put all the lower/upper bound checking in one place.
2. In Python, add lower/upper bound checking of indices.


## How was this patch tested?

Unit tests added.
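For illustration only, a sketch of the kinds of invalid constructions the added tests exercise (the exact assertions and messages are in the review diffs above):

```scala
  // negative index: rejected by the new lower-bound check
  intercept[IllegalArgumentException] {
    Vectors.sparse(3, Array(-1, 1), Array(3.0, 5.0))
  }
  // index beyond the declared size: rejected by the upper-bound check
  intercept[IllegalArgumentException] {
    Vectors.sparse(3, Array(1, 5), Array(3.0, 5.0))
  }
```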





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zjffdu/spark SPARK-16965

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/14555.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14555


commit 57b9c84a727f6bbc6bdc28148696c8c985408f80
Author: Jeff Zhang 
Date:   2016-08-09T05:30:34Z

[SPARK-16965][MLLIB][PYSPARK] Fix bound checking for SparseVector.



