Hi all,

I just upgraded Spark from 1.2.1 to 1.3.0 and changed "import sqlContext.createSchemaRDD" to "import sqlContext.implicits._" in my code (I scanned the programming guide, and this seems to be the only change I need to make). But compilation now fails with the following error:

>>>
[ERROR] ...\magic.scala:527: error: value registerTempTable is not a member of org.apache.spark.rdd.RDD[com.yhd.ycache.magic.Table]
[INFO] tableRdd.registerTempTable(tableName)
<<<
Then I tried the exact example from the 1.3 programming guide in spark-shell, and it hits the same error:

>>>
scala> sys.env.get("CLASSPATH")
res7: Option[String] = Some(:/root/scala/spark-1.3.0-bin-hadoop2.4/conf:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/spark-assembly-1.3.0-hadoop2.4.0.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-core-3.2.10.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-rdbms-3.2.9.jar:/root/scala/spark-1.3.0-bin-hadoop2.4/lib/datanucleus-api-jdo-3.2.6.jar)

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@4b05b3ff

scala> import sqlContext.implicits._
import sqlContext.implicits._

scala> case class Person(name: String, age: Int)
defined class Person

scala> val t1 = sc.textFile("hdfs://heju:8020/user/root/magic/poolInfo.txt")
15/03/25 11:13:35 INFO MemoryStore: ensureFreeSpace(81443) called with curMem=186397, maxMem=278302556
15/03/25 11:13:35 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 79.5 KB, free 265.2 MB)
15/03/25 11:13:35 INFO MemoryStore: ensureFreeSpace(31262) called with curMem=267840, maxMem=278302556
15/03/25 11:13:35 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 30.5 KB, free 265.1 MB)
15/03/25 11:13:35 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on heju:48885 (size: 30.5 KB, free: 265.4 MB)
15/03/25 11:13:35 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
15/03/25 11:13:35 INFO SparkContext: Created broadcast 3 from textFile at <console>:34
t1: org.apache.spark.rdd.RDD[String] = hdfs://heju:8020/user/root/magic/poolInfo.txt MapPartitionsRDD[9] at textFile at <console>:34

scala> val t2 = t1.flatMap(_.split("\n")).map(_.split(" ")).map(p => Person(p(0),1))
t2: org.apache.spark.rdd.RDD[Person] = MapPartitionsRDD[12] at map at <console>:38

scala> t2.registerTempTable("people")
<console>:41: error: value registerTempTable is not a member of org.apache.spark.rdd.RDD[Person]
       t2.registerTempTable("people")
          ^
<<<

I found the following explanation in the programming guide about implicitly converting case classes to DataFrames, but I don't understand what I should do. Could anyone tell me how to convert an RDD of a case class to a DataFrame?

>>>
Isolation of Implicit Conversions and Removal of dsl Package (Scala-only)

Many of the code examples prior to Spark 1.3 started with import sqlContext._, which brought all of the functions from sqlContext into scope. In Spark 1.3 we have isolated the implicit conversions for converting RDDs into DataFrames into an object inside of the SQLContext. Users should now write import sqlContext.implicits._. Additionally, the implicit conversions now only augment RDDs that are composed of Products (i.e., case classes or tuples) with a method toDF, instead of applying automatically.
<<<

Thanks
Jason
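P.S. Re-reading the excerpt, my best guess is that the conversion no longer happens automatically and an explicit toDF() call is now required before registerTempTable, since in 1.3 that method lives on DataFrame rather than RDD. An untested sketch of what I think the migrated code should look like (reusing the Person class and HDFS path from the shell session above):

```scala
// Untested sketch for Spark 1.3: per the guide excerpt, the implicits
// only add a toDF() method to RDDs of Products; they no longer convert
// RDD[Person] to a DataFrame automatically.
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._

case class Person(name: String, age: Int)

val t2 = sc.textFile("hdfs://heju:8020/user/root/magic/poolInfo.txt")
  .flatMap(_.split("\n"))
  .map(_.split(" "))
  .map(p => Person(p(0), 1))

// Convert explicitly first; registerTempTable is a DataFrame method in 1.3.
val people = t2.toDF()
people.registerTempTable("people")
```

Is this the intended migration, or is there still a way to get the old implicit behavior back?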