I believe you're running into an erasure issue which we found in
DecisionTree too.  Check out:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/RandomForest.scala#L134

That code retags RDDs which were created from Java, restoring the correct
ClassTag and preventing the exception you're running into.
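
In case it's useful, here's a minimal sketch of the equivalent workaround
from user code (untested; retag itself is marked private[spark], so you
can't call it directly). Mapping through identity on the Scala side
rebuilds the RDD with the ClassTag the compiler supplies, so collect()
allocates an Array[Vector] instead of an Array[Object]:

import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// rdd() keeps the ClassTag[Object] that the JavaRDD was built with
val raw: RDD[Vector] = dc.prepareKMeans(data)
// map(identity) on the Scala side picks up ClassTag[Vector] implicitly
val parsedData: RDD[Vector] = raw.map(identity)
val p: Array[Vector] = parsedData.collect()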

Hope this helps!
Joseph

On Thu, Jan 8, 2015 at 12:48 PM, Devl Devel <devl.developm...@gmail.com>
wrote:

> Thanks for the suggestion. Can anyone offer advice on the
> ClassCastException when going from Java to Scala? Why does JavaRDD.rdd()
> followed by a collect() result in this exception?
>
> On Thu, Jan 8, 2015 at 4:13 PM, Yana Kadiyska <yana.kadiy...@gmail.com>
> wrote:
>
> > How about
> >
> >
> > data.map(_.split(","))
> >   .filter(_.length > 1)
> >   .map(good_entry =>
> >     Vectors.dense(good_entry(0).toDouble, good_entry(1).toDouble))
> > (full disclosure, I didn't actually run this). But after the first map
> > you should have an RDD[Array[String]]; then you'd discard everything
> > shorter than 2 and convert the rest to dense vectors. In fact, if
> > you're expecting length exactly 2, you might want to filter on == 2.
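> >
> > Spelled out with the exact-length filter (again, I haven't run this;
> > it assumes the same two-column input):
> >
> > import org.apache.spark.mllib.linalg.Vectors
> >
> > // keep only rows that split into exactly two fields, then parse
> > val parsedData = data.map(_.split(","))
> >   .filter(_.length == 2)
> >   .map(cols => Vectors.dense(cols(0).toDouble, cols(1).toDouble))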
> >
> >
> > On Thu, Jan 8, 2015 at 10:58 AM, Devl Devel <devl.developm...@gmail.com>
> > wrote:
> >
> >> Hi All,
> >>
> >> I'm trying a simple K-Means example as per the website:
> >>
> >> val parsedData = data.map(s =>
> >> Vectors.dense(s.split(',').map(_.toDouble)))
> >>
> >> but first I'm trying to write a Java-based validation method, so that
> >> rows with missing values are omitted or replaced with zeros.
> >>
> >> public RDD<Vector> prepareKMeans(JavaRDD<String> data) {
> >>     JavaRDD<Vector> words = data.flatMap(
> >>         new FlatMapFunction<String, Vector>() {
> >>             public Iterable<Vector> call(String s) {
> >>                 String[] split = s.split(",");
> >>                 ArrayList<Vector> add = new ArrayList<Vector>();
> >>                 if (split.length != 2) {
> >>                     // malformed row: substitute a zero vector
> >>                     add.add(Vectors.dense(0, 0));
> >>                 } else {
> >>                     add.add(Vectors.dense(Double.parseDouble(split[0]),
> >>                             Double.parseDouble(split[1])));
> >>                 }
> >>                 return add;
> >>             }
> >>         });
> >>     return words.rdd();
> >> }
> >>
> >> When I then call this from Scala:
> >>
> >> val parsedData = dc.prepareKMeans(data)
> >> val p = parsedData.collect()
> >>
> >> I get Exception in thread "main" java.lang.ClassCastException:
> >> [Ljava.lang.Object; cannot be cast to
> >> [Lorg.apache.spark.mllib.linalg.Vector;
> >>
> >> Why is the class tag Object rather than Vector?
> >>
> >> 1) How do I get this working correctly using the Java validation
> >> example above? Or
> >> 2) How can I modify val parsedData = data.map(s =>
> >> Vectors.dense(s.split(',').map(_.toDouble))) so that lines where the
> >> split produces fewer than 2 fields are ignored? Or
> >> 3) Is there a better way to do input validation first?
> >>
> >> Using Spark and MLlib:
> >> libraryDependencies += "org.apache.spark" % "spark-core_2.10" %  "1.2.0"
> >> libraryDependencies += "org.apache.spark" % "spark-mllib_2.10" % "1.2.0"
> >>
> >> Many thanks in advance
> >> Dev
> >>
> >
> >
>
