I'm not sure I understand this problem. Why can't I reproduce it?

scala> spark.version
res22: String = 2.1.0

scala> case class Teamuser(teamid: String, userid: String, role: String)
defined class Teamuser

scala> val df = Seq(Teamuser("t1", "u1", "role1")).toDF
df: org.apache.spark.sql.DataFrame = [teamid: string, userid: string ... 1 more field]

scala> df.show
+------+------+-----+
|teamid|userid| role|
+------+------+-----+
|    t1|    u1|role1|
+------+------+-----+

scala> df.createOrReplaceTempView("teamuser")

scala> val newDF = spark.sql("select teamid, userid, role from teamuser")
newDF: org.apache.spark.sql.DataFrame = [teamid: string, userid: string ... 1 more field]

scala> val userDS: Dataset[Teamuser] = newDF.as[Teamuser]
userDS: org.apache.spark.sql.Dataset[Teamuser] = [teamid: string, userid: string ... 1 more field]

scala> userDS.show
+------+------+-----+
|teamid|userid| role|
+------+------+-----+
|    t1|    u1|role1|
+------+------+-----+


scala> userDS.printSchema
root
 |-- teamid: string (nullable = true)
 |-- userid: string (nullable = true)
 |-- role: string (nullable = true)


Am I missing anything?


Yong


________________________________
From: shyla deshpande <deshpandesh...@gmail.com>
Sent: Thursday, March 23, 2017 3:49 PM
To: user
Subject: Re: Converting dataframe to dataset question

I realized that my case class was defined inside the object. It should be
defined outside the scope of the object. Thanks.
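
For anyone who finds this thread later, here is a minimal sketch of the layout described above, with the case class moved to the top level of the file. It is only an illustration under stated assumptions: the object name TeamuserExample, the local master, and the in-memory Seq standing in for the Cassandra table are all hypothetical and not from my actual job. As far as I understand it, Spark derives the encoder for a case class through a TypeTag, and that is not available when the class is declared locally in a method body.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Declared at the top level of the file, outside any object or method,
// so that import spark.implicits._ can supply an Encoder[Teamuser].
case class Teamuser(teamid: String, userid: String, role: String)

// Hypothetical object name, for illustration only.
object TeamuserExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .master("local[*]") // assumption: local run instead of a cluster
      .appName("TeamuserExample")
      .getOrCreate()

    import spark.implicits._

    // Assumption: an in-memory row stands in for the Cassandra table.
    Seq(Teamuser("t1", "u1", "role1"))
      .toDF
      .createOrReplaceTempView("teamuser")

    val userDF = spark.sql("SELECT teamid, userid, role FROM teamuser")

    // With the case class at the top level, the conversion is typed.
    val userDS: Dataset[Teamuser] = userDF.as[Teamuser]
    userDS.show()

    spark.stop()
  }
}
```

The rest of the job stays the same; only the position of the case class declaration changes.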

On Wed, Mar 22, 2017 at 6:07 PM, shyla deshpande
<deshpandesh...@gmail.com> wrote:

Why is userDS a Dataset[Any] instead of a Dataset[Teamuser]? I would
appreciate your help. Thanks.

    val spark = SparkSession
      .builder
      .config("spark.cassandra.connection.host", cassandrahost)
      .appName(getClass.getSimpleName)
      .getOrCreate()

    import spark.implicits._
    val sqlContext = spark.sqlContext
    import sqlContext.implicits._

    case class Teamuser(teamid:String, userid:String, role:String)
    spark
      .read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "test", "table" -> "teamuser"))
      .load
      .createOrReplaceTempView("teamuser")

    val userDF = spark.sql("SELECT teamid, userid, role FROM teamuser")

    userDF.show()

    val userDS:Dataset[Teamuser] = userDF.as[Teamuser]

