I believe you're running into an erasure issue we also hit in DecisionTree. Check out:

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala#L134

That line retags RDDs which were created from Java, to prevent exactly the exception you're running into.
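In your case the same idea, applied on the Scala side, would look roughly like this (an untested sketch; dc, prepareKMeans and data are the names from your mail below, and as far as I know retag() itself is Spark-internal, so the sketch uses a no-op map, which lets the Scala compiler re-derive the element ClassTag):

    import org.apache.spark.mllib.linalg.Vector
    import org.apache.spark.rdd.RDD

    // The RDD coming back from the Java API carries an Object ClassTag, so
    // collect() builds an Array[Object] and the cast to Array[Vector] fails.
    // A no-op map on the Scala side produces a new RDD whose element
    // ClassTag is Vector, so collect() returns a proper Array[Vector].
    val parsedData: RDD[Vector] = dc.prepareKMeans(data).map(v => v)
    val p = parsedData.collect()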
Hope this helps!
Joseph

On Thu, Jan 8, 2015 at 12:48 PM, Devl Devel <devl.developm...@gmail.com> wrote:

> Thanks for the suggestion. Can anyone offer any advice on the
> ClassCastException going from Java to Scala? Why does JavaRDD.rdd()
> followed by collect() result in this exception?
>
> On Thu, Jan 8, 2015 at 4:13 PM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:
>
> > How about
> >
> >     data.map(s => s.split(","))
> >       .filter(_.length > 1)
> >       .map(good_entry => Vectors.dense(good_entry(0).toDouble, good_entry(1).toDouble))
> >
> > (full disclosure, I didn't actually run this). But after the first map
> > you should have an RDD[Array[String]]; then you'd discard everything
> > shorter than 2 and convert the rest to dense vectors. In fact, if you're
> > expecting length exactly 2, you might want to filter with == 2.
> >
> > On Thu, Jan 8, 2015 at 10:58 AM, Devl Devel <devl.developm...@gmail.com> wrote:
> >
> >> Hi All,
> >>
> >> I'm trying a simple K-Means example as per the website:
> >>
> >>     val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble)))
> >>
> >> but I'm trying to write a Java-based validation method first, so that
> >> missing values are omitted or replaced with 0.
> >>
> >>     public RDD<Vector> prepareKMeans(JavaRDD<String> data) {
> >>         JavaRDD<Vector> words = data.flatMap(new FlatMapFunction<String, Vector>() {
> >>             public Iterable<Vector> call(String s) {
> >>                 String[] split = s.split(",");
> >>                 ArrayList<Vector> add = new ArrayList<Vector>();
> >>                 if (split.length != 2) {
> >>                     add.add(Vectors.dense(0, 0));
> >>                 } else {
> >>                     add.add(Vectors.dense(Double.parseDouble(split[0]),
> >>                             Double.parseDouble(split[1])));
> >>                 }
> >>                 return add;
> >>             }
> >>         });
> >>
> >>         return words.rdd();
> >>     }
> >>
> >> When I then call from Scala:
> >>
> >>     val parsedData = dc.prepareKMeans(data);
> >>     val p = parsedData.collect();
> >>
> >> I get: Exception in thread "main" java.lang.ClassCastException:
> >> [Ljava.lang.Object; cannot be cast to [Lorg.apache.spark.mllib.linalg.Vector;
> >>
> >> Why is the class tag Object rather than Vector?
> >>
> >> 1) How do I get this working correctly using the Java validation example above? or
> >> 2) How can I modify val parsedData = data.map(s => Vectors.dense(s.split(',').map(_.toDouble)))
> >>    so that when the s.split size is < 2 I ignore the line? or
> >> 3) Is there a better way to do input validation first?
> >>
> >> Using Spark and MLlib:
> >>
> >>     libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.2.0"
> >>     libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.2.0"
> >>
> >> Many thanks in advance
> >> Dev
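On your question (2): if you'd rather do the validation directly in Scala and simply drop malformed lines, a filter along the lines Yana suggested should do it (again an untested sketch, assuming two comma-separated numeric fields per line):

    import org.apache.spark.mllib.linalg.Vectors

    // keep only lines that split into exactly 2 fields, then build dense vectors
    val parsedData = data.map(_.split(','))
      .filter(_.length == 2)
      .map(parts => Vectors.dense(parts.map(_.toDouble)))

That skips short lines instead of replacing them with (0, 0), and since it stays entirely on the Scala side it avoids the ClassTag issue as well.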