Marie Beaulieu created SPARK-25289:
--------------------------------------

             Summary: ChiSqSelector max on empty collection
                 Key: SPARK-25289
                 URL: https://issues.apache.org/jira/browse/SPARK-25289
             Project: Spark
          Issue Type: Bug
          Components: MLlib
    Affects Versions: 2.3.1
            Reporter: Marie Beaulieu
In org.apache.spark.mllib.feature.ChiSqSelector.fit, a max is taken on a possibly empty collection. I am using Spark 2.3.1.

Here is an example to reproduce:
{code:java}
import org.apache.spark.mllib.feature.ChiSqSelector
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
implicit val spark = sqlContext.sparkSession

val labeledPoints = (0 to 1).map { n =>
  val v = Vectors.dense((1 to 3).map(_ => n * 1.0).toArray)
  LabeledPoint(n.toDouble, v)
}
val rdd = sc.parallelize(labeledPoints)
val selector = new ChiSqSelector().setSelectorType("fdr").setFdr(0.05)
selector.fit(rdd)
{code}
Here is the stack trace:
{code:java}
java.lang.UnsupportedOperationException: empty.max
  at scala.collection.TraversableOnce$class.max(TraversableOnce.scala:229)
  at scala.collection.mutable.ArrayOps$ofInt.max(ArrayOps.scala:234)
  at org.apache.spark.mllib.feature.ChiSqSelector.fit(ChiSqSelector.scala:280)
{code}
Looking at line 280 in ChiSqSelector.scala, it is pretty clear how the collection can be empty: in the "fdr" branch, the indices of features whose p-values pass the Benjamini-Hochberg threshold are collected, and max is then taken over those indices. When no feature passes the threshold, as in the example above, max is called on an empty array. A simple non-empty validation should do the trick.
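For illustration, here is a minimal, self-contained sketch of the kind of non-empty check suggested above. The p-values and variable names are hypothetical stand-ins for what fit computes from the chi-squared tests; this mirrors the shape of the Benjamini-Hochberg selection, not an actual patch.
{code:java}
// Illustrative sketch only, not the committed fix.
object FdrGuardSketch {
  def main(args: Array[String]): Unit = {
    val fdr = 0.05
    // P-values sorted ascending (hypothetical data where nothing is significant).
    val sortedPValues = Array(0.9, 0.95, 0.99)
    // Benjamini-Hochberg: keep index i when p(i) <= fdr * (i + 1) / n.
    val passed = sortedPValues.zipWithIndex
      .filter { case (p, i) => p <= fdr * (i + 1) / sortedPValues.length }
      .map(_._2)
    // Guard the max: when no feature passes, select zero features instead of
    // throwing UnsupportedOperationException: empty.max.
    val numSelected = if (passed.isEmpty) 0 else passed.max + 1
    println(s"selected $numSelected feature(s)") // prints: selected 0 feature(s)
  }
}
{code}
With these inputs nothing passes the 5% threshold, so the guarded version selects zero features where the 2.3.1 code throws.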