Thank you for the quick response! I’ll use Tuple1.

From: Feynman Liang [mailto:fli...@databricks.com]
Sent: Monday, September 14, 2015 11:05 AM
To: Ulanov, Alexander
Cc: dev@spark.apache.org
Subject: Re: Data frame with one column

For an example, see the ml-feature word2vec user guide <https://spark.apache.org/docs/latest/ml-features.html#word2vec>.

On Mon, Sep 14, 2015 at 11:03 AM, Feynman Liang <fli...@databricks.com> wrote:

You could use `Tuple1(x)` instead of `Hack`.

On Mon, Sep 14, 2015 at 10:50 AM, Ulanov, Alexander <alexander.ula...@hpe.com> wrote:

Dear Spark developers,

I would like to create a dataframe with one column. However, the createDataFrame method requires at least a Product:

    val data = Seq(1.0, 2.0)
    val rdd = sc.parallelize(data, 2)
    val df = sqlContext.createDataFrame(rdd)

[fail]

    <console>:25: error: overloaded method value createDataFrame with alternatives:
      [A <: Product](data: Seq[A])(implicit evidence$2: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame <and>
      [A <: Product](rdd: org.apache.spark.rdd.RDD[A])(implicit evidence$1: reflect.runtime.universe.TypeTag[A])org.apache.spark.sql.DataFrame
     cannot be applied to (org.apache.spark.rdd.RDD[Double])
           val df = sqlContext.createDataFrame(rdd)

So, if I zip the rdd with an index, then it is OK:

    val df = sqlContext.createDataFrame(rdd.zipWithIndex)

[success]

    df: org.apache.spark.sql.DataFrame = [_1: double, _2: bigint]

If I use a case class, it also seems to work:

    case class Hack(x: Double)
    val caseRDD = rdd.map(x => Hack(x))
    val df = sqlContext.createDataFrame(caseRDD)

[success]

    df: org.apache.spark.sql.DataFrame = [x: double]

What is the recommended way of creating a dataframe with one column?

Best regards,
Alexander
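
For reference, the `Tuple1(x)` suggestion from this thread might look roughly like the sketch below. It assumes the Spark 1.x `SQLContext` API that the thread uses. The object name `OneColumn`, the helper `fromDoubles`, and the column name "x" are hypothetical and only there for illustration. The idea is that `Double` is not a subtype of `Product`, while `Tuple1[Double]` is, so wrapping each element should let the `RDD[A <: Product]` overload apply. A tuple field is named `_1` by default, so the sketch renames it with `toDF`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

object OneColumn {
  // Hypothetical helper: wrap each Double in a Tuple1 so the element type
  // is a Product, then rename the default tuple column `_1` to `name`.
  def fromDoubles(sqlContext: SQLContext, rdd: RDD[Double], name: String): DataFrame =
    sqlContext.createDataFrame(rdd.map(x => Tuple1(x))).toDF(name)

  def main(args: Array[String]): Unit = {
    // Local master and app name are assumptions for a standalone run;
    // in spark-shell, `sc` and `sqlContext` already exist.
    val conf = new SparkConf().setAppName("one-column-df").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    val rdd = sc.parallelize(Seq(1.0, 2.0), 2)
    val df = fromDoubles(sqlContext, rdd, "x")
    df.printSchema()

    sc.stop()
  }
}
```

Compared with the `Hack` case class, this avoids declaring a throwaway type, at the cost of an explicit rename if a meaningful column name is wanted.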