Ah, thanks, your code also works for me. I figured out that my problem is that I tried
to encode a tuple of (MyClass, Int):


package org.apache.spark


import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}
import org.apache.spark.sql.types._
import org.apache.spark.sql.{Encoders, SQLContext}


object Hello {
  // this class has to be OUTSIDE the method that calls it!! otherwise gives error about typetag not found
  // the UDT stuff from https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/test/ExamplePointUDT.scala
  // and http://stackoverflow.com/questions/32440461/how-to-define-schema-for-custom-type-in-spark-sql
  class Person4 {
    @scala.beans.BeanProperty def setX(x:Int): Unit = {}
    @scala.beans.BeanProperty def getX():Int = {1}
  }

  def main(args: Array[String]) {
    val logFile = "YOUR_SPARK_HOME/README.md" // Should be some file on your system
    val conf = new SparkConf().setMaster("local[*]").setAppName("Simple Application")
    val sc = new SparkContext(conf)

    val raw = Array((new Person4(), 1), (new Person4(), 1))
    val myrdd = sc.parallelize(raw)

    val sqlContext = new SQLContext(sc)

    implicit val personEncoder = Encoders.bean[Person4](classOf[Person4])
    implicit val personEncoder2 = Encoders.tuple(personEncoder, Encoders.INT) // note: Encoders.INT is the java.lang.Integer encoder; Encoders.scalaInt is the one for Scala's Int


    import sqlContext.implicits._
    //// -------- this works --------------
    Seq(new Person4(), new Person4()).toDS()

    //// ---------- this doesn't -----
    Seq((new Person4(),1), (new Person4(),1)).toDS()
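    // see the "Schema for type Person4 is not supported" stack trace quoted below, hit for the same (Person4, Int) data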


    sc.stop()
  }
}
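
For comparison, here's a minimal, untested sketch of the case-class route suggested further down in the thread; with a case class, sqlContext.implicits._ derives the product encoder for the tuple itself, so no explicit Encoders.bean / Encoders.tuple is needed (Person5 is just an illustrative name, assuming Spark 2.x):

case class Person5(x: Int)  // defined outside main(), like Person4, so a TypeTag can be found

// inside main(), after `import sqlContext.implicits._`:
Seq((Person5(1), 1), (Person5(2), 2)).toDS()  // encoder for (Person5, Int) is derived implicitly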


On Tue, May 9, 2017 at 1:37 PM, Michael Armbrust <mich...@databricks.com> wrote:

> Must be a bug.  This works for me
> <https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/908554720841389/2840265927289860/latest.html>
> in Spark 2.1.
>
> On Tue, May 9, 2017 at 12:10 PM, Yang <teddyyyy...@gmail.com> wrote:
>
>> somehow the schema check is here
>>
>> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/ScalaReflection.scala#L697-L750
>>
>> Supposedly beans are handled there, but it's not clear to me which line
>> handles bean types. If that were clear, I could probably annotate my
>> bean class properly.
>>
>> On Tue, May 9, 2017 at 11:19 AM, Michael Armbrust <mich...@databricks.com> wrote:
>>
>>> I think you are supposed to set BeanProperty on a var as they do here
>>> <https://github.com/apache/spark/blob/f830bb9170f6b853565d9dd30ca7418b93a54fe3/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala#L71-L83>.
>>> If you are using scala though I'd consider using the case class encoders.
>>>
>>> On Tue, May 9, 2017 at 12:21 AM, Yang <teddyyyy...@gmail.com> wrote:
>>>
>>>> I'm trying to use Encoders.bean() to create an encoder for my custom
>>>> class, but it fails, complaining that it can't find the schema:
>>>>
>>>>
>>>> class Person4 {
>>>>   @scala.beans.BeanProperty def setX(x: Int): Unit = {}
>>>>   @scala.beans.BeanProperty def getX(): Int = {1}
>>>> }
>>>>
>>>> val personEncoder = Encoders.bean[Person4](classOf[Person4])
>>>>
>>>> scala> val person_rdd = sc.parallelize(Array( (new Person4(), 1), (new Person4(), 2) ))
>>>> person_rdd: org.apache.spark.rdd.RDD[(Person4, Int)] = ParallelCollectionRDD[1] at parallelize at <console>:31
>>>>
>>>> scala> sqlcontext.createDataFrame(person_rdd)
>>>> java.lang.UnsupportedOperationException: Schema for type Person4 is not supported
>>>>   at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:716)
>>>>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$2.apply(ScalaReflection.scala:712)
>>>>   at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$2.apply(ScalaReflection.scala:711)
>>>>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>>>>   at ...
>>>>
>>>> But if you look at the encoder's schema, the system does seem to understand the schema for "Person4":
>>>>
>>>>
>>>> scala> personEncoder.schema
>>>> res38: org.apache.spark.sql.types.StructType = StructType(StructField(x,IntegerType,false))
>>>>
>>>>
>>>
>>
>
