Hmm, I get the same error with master. Here is another test example that fails. This time I explicitly create an RDD of Rows, which matches my actual use case:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.sql.{Row, SQLContext}

    object TestDataFrame {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("TestDataFrame").setMaster("local[4]")
        val sc = new SparkContext(conf)
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // DataFrame built from the case class via toDF: saving works.
        val data = Seq(LabeledPoint(1, Vectors.zeros(10)))
        val dataDF = sc.parallelize(data).toDF
        dataDF.printSchema()
        dataDF.save("test1.parquet") // OK

        // DataFrame built from an explicit RDD[Row] plus the same schema: saving fails.
        val dataRow = data.map { case LabeledPoint(l: Double, f: mllib.linalg.Vector) => Row(l, f) }
        val dataRowRDD = sc.parallelize(dataRow)
        val dataDF2 = sqlContext.createDataFrame(dataRowRDD, dataDF.schema)
        dataDF2.printSchema()
        dataDF2.saveAsParquetFile("test3.parquet") // FAIL !!!
      }
    }

On Tue, Mar 31, 2015 at 11:18 PM, Xiangrui Meng <men...@gmail.com> wrote:

> I cannot reproduce this error on master, but I'm not aware of any
> recent bug fixes that are related. Could you build and try the current
> master? -Xiangrui
>
> On Tue, Mar 31, 2015 at 4:10 AM, Jaonary Rabarisoa <jaon...@gmail.com>
> wrote:
> > Hi all,
> >
> > A DataFrame with a user-defined type (here mllib.Vector) created with
> > sqlContext.createDataFrame can't be saved to a Parquet file and raises a
> > ClassCastException: org.apache.spark.mllib.linalg.DenseVector cannot be
> > cast to org.apache.spark.sql.Row error.
> >
> > Here is an example of code to reproduce this error:
> >
> > object TestDataFrame {
> >
> >   def main(args: Array[String]): Unit = {
> >     //System.loadLibrary(Core.NATIVE_LIBRARY_NAME)
> >     val conf = new SparkConf().setAppName("RankingEval").setMaster("local[8]")
> >       .set("spark.executor.memory", "6g")
> >
> >     val sc = new SparkContext(conf)
> >     val sqlContext = new SQLContext(sc)
> >
> >     import sqlContext.implicits._
> >
> >     val data = sc.parallelize(Seq(LabeledPoint(1, Vectors.zeros(10))))
> >     val dataDF = data.toDF
> >
> >     dataDF.save("test1.parquet")
> >
> >     val dataDF2 = sqlContext.createDataFrame(dataDF.rdd, dataDF.schema)
> >
> >     dataDF2.save("test2.parquet")
> >   }
> > }
> >
> >
> > Is this related to https://issues.apache.org/jira/browse/SPARK-5532 and
> > how can it be solved?
> >
> >
> > Cheers,
> >
> >
> > Jao