Hi, I have a DataFrame which I need to convert into a JavaRDD<Row> and back into a DataFrame. I have the following code:
    DataFrame sourceFrame = hiveContext.read().format("orc").load("/path/to/orc/file");

    // I do an order by on sourceFrame above, then convert it into a JavaRDD
    JavaRDD<Row> modifiedRDD = sourceFrame.toJavaRDD().map(new Function<Row, Row>() {
        public Row call(Row row) throws Exception {
            if (row != null) {
                // update the row by creating a new Row
                return RowFactory.create(updateRow);
            }
            return null;
        }
    });

    // now I convert the above JavaRDD<Row> back into a DataFrame
    DataFrame modifiedFrame = sqlContext.createDataFrame(modifiedRDD, schema);

sourceFrame and modifiedFrame have the same schema. When I call sourceFrame.show() the output is as expected: every column has its corresponding values and no column is empty. But when I call modifiedFrame.show(), all the column values get merged into the first column. For example, assume the source DataFrame has 3 columns as shown below:

    _col1  _col2  _col3
    ABC    10     DEF
    GHI    20     JKL

When I print modifiedFrame, which I converted from the JavaRDD, it shows the rows in the following form:

    _col1       _col2  _col3
    ABC,10,DEF
    GHI,20,JKL

As shown above, _col1 has all the values while _col2 and _col3 are empty. I don't know what I am doing wrong. Please guide me; I am new to Spark. Thanks in advance.

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-DataFrame-created-from-JavaRDD-Row-copies-all-columns-data-into-first-column-tp23961.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
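A possible explanation for the symptom: RowFactory.create takes Object... varargs, so the number of arguments passed determines the number of fields in the resulting Row. If updateRow is a single pre-joined value (e.g. one comma-separated String) rather than one argument per column, the Row ends up with exactly one field, which then lands in _col1. The sketch below is plain Java with no Spark dependency; fieldCount is a hypothetical stand-in that mimics only the arity behavior of RowFactory.create(Object... values):

```java
// Hypothetical demo of Java varargs arity, mimicking how
// org.apache.spark.sql.RowFactory.create(Object... values) decides the
// number of fields in the Row it builds.
public class VarargsDemo {
    // Stand-in: returns how many fields a Row built from these values would have.
    static int fieldCount(Object... values) {
        return values.length;
    }

    public static void main(String[] args) {
        // One comma-joined String -> a single field: reproduces the symptom
        // where every value lands in _col1.
        System.out.println(fieldCount("ABC,10,DEF"));                    // 1

        // Separate arguments -> three fields, matching a 3-column schema.
        System.out.println(fieldCount("ABC", 10, "DEF"));                // 3

        // An Object[] is spread across the varargs, also giving three fields.
        System.out.println(fieldCount(new Object[]{"ABC", 10, "DEF"})); // 3
    }
}
```

So if the map function builds updateRow by concatenating the source columns into one String, passing the individual values (or an Object[] of them) to RowFactory.create should restore the three-column layout.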