Liang-Chi Hsieh created SPARK-2355:
--------------------------------------

             Summary: Check for the number of clusters to avoid ArrayIndexOutOfBoundsException
                 Key: SPARK-2355
                 URL: https://issues.apache.org/jira/browse/SPARK-2355
             Project: Spark
          Issue Type: Bug
          Components: MLlib
    Affects Versions: 1.0.0
            Reporter: Liang-Chi Hsieh

When org.apache.spark.mllib.clustering.KMeans is run in parallel initialization mode with a number of clusters greater than the number of data points, it throws an ArrayIndexOutOfBoundsException. The KMeans class should check that the number of clusters is not greater than the number of data points.

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: -1
	at org.apache.spark.mllib.clustering.LocalKMeans$$anonfun$kMeansPlusPlus$1.apply$mcVI$sp(LocalKMeans.scala:62)
	at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
	at org.apache.spark.mllib.clustering.LocalKMeans$.kMeansPlusPlus(LocalKMeans.scala:49)
	at org.apache.spark.mllib.clustering.KMeans$$anonfun$20.apply(KMeans.scala:297)
	at org.apache.spark.mllib.clustering.KMeans$$anonfun$20.apply(KMeans.scala:294)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
	at scala.collection.immutable.Range.foreach(Range.scala:141)
	at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
	at scala.collection.AbstractTraversable.map(Traversable.scala:105)
	at org.apache.spark.mllib.clustering.KMeans.initKMeansParallel(KMeans.scala:294)
	at org.apache.spark.mllib.clustering.KMeans.runBreeze(KMeans.scala:143)
	at org.apache.spark.mllib.clustering.KMeans.run(KMeans.scala:126)
	at org.apache.spark.examples.mllib.DenseKMeans$.run(DenseKMeans.scala:102)
	at org.apache.spark.examples.mllib.DenseKMeans$$anonfun$main$1.apply(DenseKMeans.scala:72)
	at org.apache.spark.examples.mllib.DenseKMeans$$anonfun$main$1.apply(DenseKMeans.scala:71)
	at scala.Option.map(Option.scala:145)
	at org.apache.spark.examples.mllib.DenseKMeans$.main(DenseKMeans.scala:71)
	at org.apache.spark.examples.mllib.DenseKMeans.main(DenseKMeans.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:303)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
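
The check requested in the description could look roughly like the sketch below. It is only an illustration and not the actual fix: `ClusterCountCheck` and `validateK` are hypothetical names that do not exist in MLlib, and the point count is passed in as a plain number (in Spark it would presumably come from something like `count()` on the input RDD). The sketch uses no Spark classes so that it runs standalone.

```scala
// Hypothetical helper, not part of Spark MLlib: validates the requested
// number of clusters against the number of data points before clustering.
object ClusterCountCheck {

  /** Returns Right(k) when k is usable, or Left(message) describing the problem. */
  def validateK(k: Int, numPoints: Long): Either[String, Int] =
    if (k <= 0)
      Left(s"Number of clusters must be positive, got $k")
    else if (k > numPoints)
      Left(s"Number of clusters ($k) must not exceed the number of data points ($numPoints)")
    else
      Right(k)

  def main(args: Array[String]): Unit = {
    // Sample sizes chosen for illustration only.
    println(validateK(3, 10L)) // accepted: fewer clusters than points
    println(validateK(5, 2L))  // rejected: more clusters than points
  }
}
```

Returning an `Either` keeps the sketch free of exceptions; failing fast with a descriptive error before initialization starts would serve the same purpose of replacing the ArrayIndexOutOfBoundsException with a clear message.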