zhengruifeng created SPARK-16875:
------------------------------------

             Summary: Add args checking for Dataset randomSplit and sample
                 Key: SPARK-16875
                 URL: https://issues.apache.org/jira/browse/SPARK-16875
             Project: Spark
          Issue Type: Improvement
          Components: SQL
            Reporter: zhengruifeng
            Priority: Minor


{code}
scala> data
res73: org.apache.spark.sql.DataFrame = [label: double, features: vector]

scala> data.count
res74: Long = 150

scala> val s = data.randomSplit(Array(1,2,-0.01))
s: Array[org.apache.spark.sql.Dataset[org.apache.spark.sql.Row]] = 
Array([label: double, features: vector], [label: double, features: vector], 
[label: double, features: vector])

scala> s(0).count
res75: Long = 51

scala> s(2).count
16/08/03 18:28:27 ERROR Executor: Exception in task 0.0 in stage 76.0 (TID 66)
java.lang.IllegalArgumentException: requirement failed: Upper bound 
(1.0033444816053512) must be <= 1.0
        at scala.Predef$.require(Predef.scala:224)

scala> data.sample(false, -0.01)
res80: org.apache.spark.sql.Dataset[org.apache.spark.sql.Row] = [label: double, 
features: vector]

scala> data.sample(false, -0.01).count
16/08/03 18:30:33 ERROR Executor: Exception in task 0.0 in stage 84.0 (TID 71)
java.lang.IllegalArgumentException: requirement failed: Lower bound (0.0) must 
be <= upper bound (-0.01)
{code}

{{val s = data.randomSplit(Array(1,2,-0.01))}} runs successfully even though the last weight is negative, and {{s(0)}} can still be used in the following lines; the {{IllegalArgumentException}} is only thrown later, when an action such as {{s(2).count}} executes. Likewise, {{data.sample(false, -0.01)}} returns a Dataset and only fails at execution time. Both {{randomSplit}} and {{sample}} should validate their arguments and fail immediately.
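
A minimal sketch of the kind of eager check this asks for, assuming the validation runs at the top of {{Dataset.randomSplit}} and {{Dataset.sample}} (the helper names and error messages below are illustrative only, not an actual patch):

{code}
// Sketch only: eager argument validation that could run at the top of
// Dataset.randomSplit / Dataset.sample, before any Spark job is launched.
// Helper names and messages are illustrative.
object ArgChecks {
  def validateSplitWeights(weights: Array[Double]): Unit = {
    require(weights.forall(_ >= 0.0),
      s"Weights must be nonnegative, but got ${weights.mkString("[", ", ", "]")}")
    require(weights.sum > 0.0,
      s"Sum of weights must be positive, but got ${weights.mkString("[", ", ", "]")}")
  }

  def validateSampleFraction(fraction: Double): Unit = {
    require(fraction >= 0.0,
      s"Fraction must be nonnegative, but got $fraction")
  }
}

// With such checks in place, both calls from the transcript above would
// throw IllegalArgumentException at call time instead of at s(2).count:
//   data.randomSplit(Array(1,2,-0.01))
//   data.sample(false, -0.01)
{code}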



