This is the error; you are missing an import:

<console>:13: error: not found: type Encoder
       abstract class RawTable[A : Encoder](inDir: String) {

Works for me in a REPL.
<https://databricks-prod-cloudfront.cloud.databricks.com/public/4027ec902e239c93eaaa8714f173bcfc/1023043053387187/204687029790319/2840265927289860/latest.html>
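
For reference, a minimal sketch of the fixed definition (assuming a
SparkSession named spark is in scope, as it is in the shell):

import org.apache.spark.sql.{Dataset, Encoder}

abstract class RawTable[A : Encoder](inDir: String) {
  // the context bound [A : Encoder] supplies the implicit Encoder[A]
  // that .as[A] requires
  def load(): Dataset[A] =
    spark.read
      .option("header", "true")
      .option("mode", "FAILFAST")
      .option("escape", "\"")
      .option("nullValue", "")
      .option("inferSchema", "true")
      .csv(inDir)
      .as[A]
}

// at the instantiation site, spark.implicits._ derives the concrete encoder
import spark.implicits._
class TempTable extends RawTable[RawTemp]("/user/drake/t.csv")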

On Wed, Feb 1, 2017 at 3:34 PM, Don Drake <dondr...@gmail.com> wrote:

> Thanks for the reply.  I did give that syntax ([A : Encoder]) a try
> yesterday, but I kept getting this exception in both a spark-shell and a
> Zeppelin notebook.
>
> scala> import org.apache.spark.sql.Encoder
> import org.apache.spark.sql.Encoder
>
> scala>
>
> scala> case class RawTemp(f1: String, f2: String, temp: Long, created_at:
> java.sql.Timestamp, data_filename: String)
> defined class RawTemp
>
> scala>
>
> scala> import spark.implicits._
> import spark.implicits._
>
> scala>
>
> scala> abstract class RawTable[A : Encoder](inDir: String) {
>      |     import spark.implicits._
>      |     def load() = {
>      |         import spark.implicits._
>      |         spark.read
>      |             .option("header", "true")
>      |             .option("mode", "FAILFAST")
>      |             .option("escape", "\"")
>      |             .option("nullValue", "")
>      |             .option("indferSchema", "true")
>      |             .csv(inDir)
>      |             .as[A]
>      |     }
>      | }
> <console>:13: error: not found: type Encoder
>        abstract class RawTable[A : Encoder](inDir: String) {
>                                    ^
> <console>:24: error: Unable to find encoder for type stored in a Dataset.
> Primitive types (Int, String, etc) and Product types (case classes) are
> supported by importing spark.implicits._  Support for serializing other
> types will be added in future releases.
>                    .as[A]
>
>
> I gave it a try today in a Scala application and it seems to work.  Is
> this a known issue in a spark-shell?
>
> In my Scala application, this class is defined in a separate file,
> without direct access to a Spark session.
>
> I had to add the following code snippet so the import spark.implicits._
> would take effect:
>
> // ugly hack to get around "Encoder can't be found" compile-time errors
> import org.apache.spark.sql.{SQLContext, SQLImplicits}
>
> private object myImplicits extends SQLImplicits {
>   protected override def _sqlContext: SQLContext =
>     MySparkSingleton.getCurrentSession().sqlContext
> }
>
> import myImplicits._
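>
> With that in place, the compiler can derive the encoder where I
> instantiate a concrete table, e.g.:
>
> // myImplicits._ (imported above) provides the Product encoder that the
> // [A : Encoder] context bound needs:
> class TempTable extends RawTable[RawTemp]("/user/drake/t.csv")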
>
> I found that in about the hundredth SO post I read while searching for
> this problem.  Is this the best way to bring the implicits into scope?
>
> Thanks.
>
> -Don
>
>
>
> On Wed, Feb 1, 2017 at 3:16 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> You need to enforce that an Encoder is available for the type A using a
>> context bound <http://docs.scala-lang.org/tutorials/FAQ/context-bounds>.
>>
>> import org.apache.spark.sql.Encoder
>> abstract class RawTable[A : Encoder](inDir: String) {
>>   ...
>> }
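>>
>> A context bound is just syntactic sugar for an implicit evidence
>> parameter, so the above is equivalent to this sketch:
>>
>> abstract class RawTable[A](inDir: String)(implicit enc: Encoder[A]) {
>>   ...
>> }
>>
>> Either form puts an Encoder[A] in implicit scope inside the class, which
>> is what .as[A] needs.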
>>
>> On Tue, Jan 31, 2017 at 8:12 PM, Don Drake <dondr...@gmail.com> wrote:
>>
>>> I have a set of CSV files that I need to perform ETL on, with the plan
>>> to re-use a lot of code across files via a parent abstract class.
>>>
>>> I tried creating the following simple abstract class, parameterized by
>>> the type of a case class that represents the schema being read in.
>>>
>>> This won't compile; it just complains about not being able to find an
>>> encoder, but I'm importing the implicits, so I don't believe this error
>>> is right.
>>>
>>>
>>> scala> import spark.implicits._
>>> import spark.implicits._
>>>
>>> scala>
>>>
>>> scala> case class RawTemp(f1: String, f2: String, temp: Long,
>>> created_at: java.sql.Timestamp, data_filename: String)
>>> defined class RawTemp
>>>
>>> scala>
>>>
>>> scala> abstract class RawTable[A](inDir: String) {
>>>      |     def load() = {
>>>      |         spark.read
>>>      |             .option("header", "true")
>>>      |             .option("mode", "FAILFAST")
>>>      |             .option("escape", "\"")
>>>      |             .option("nullValue", "")
>>>      |             .option("indferSchema", "true")
>>>      |             .csv(inDir)
>>>      |             .as[A]
>>>      |     }
>>>      | }
>>> <console>:27: error: Unable to find encoder for type stored in a
>>> Dataset.  Primitive types (Int, String, etc) and Product types (case
>>> classes) are supported by importing spark.implicits._  Support for
>>> serializing other types will be added in future releases.
>>>                    .as[A]
>>>
>>> scala> class TempTable extends RawTable[RawTemp]("/user/drake/t.csv")
>>> <console>:13: error: not found: type RawTable
>>>        class TempTable extends RawTable[RawTemp]("/user/drake/t.csv")
>>>                       ^
>>>
>>> What's odd is that this output looks okay:
>>>
>>> scala> val RTEncoder = Encoders.product[RawTemp]
>>> RTEncoder: org.apache.spark.sql.Encoder[RawTemp] = class[f1[0]: string,
>>> f2[0]: string, temp[0]: bigint, created_at[0]: timestamp, data_filename[0]:
>>> string]
>>>
>>> scala> RTEncoder.schema
>>> res4: org.apache.spark.sql.types.StructType =
>>> StructType(StructField(f1,StringType,true),
>>> StructField(f2,StringType,true), StructField(temp,LongType,false),
>>> StructField(created_at,TimestampType,true),
>>> StructField(data_filename,StringType,true))
>>>
>>> scala> RTEncoder.clsTag
>>> res5: scala.reflect.ClassTag[RawTemp] = RawTemp
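>>>
>>> Since Encoders.product[RawTemp] resolves fine, I wonder if an
>>> alternative is to derive the encoder inside the class instead of
>>> relying on the implicits import (untested sketch, still assuming spark
>>> is in scope):
>>>
>>> import scala.reflect.runtime.universe.TypeTag
>>> import org.apache.spark.sql.{Dataset, Encoder, Encoders}
>>>
>>> abstract class RawTable[A <: Product : TypeTag](inDir: String) {
>>>   // derive the case-class encoder explicitly via Encoders.product
>>>   implicit val enc: Encoder[A] = Encoders.product[A]
>>>   def load(): Dataset[A] = spark.read.csv(inDir).as[A]
>>> }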
>>>
>>> Any ideas?
>>>
>>> --
>>> Donald Drake
>>> Drake Consulting
>>> http://www.drakeconsulting.com/
>>> https://twitter.com/dondrake
>>> 800-733-2143
>>>
>>
>>
>
>
> --
> Donald Drake
> Drake Consulting
> http://www.drakeconsulting.com/
> https://twitter.com/dondrake
> 800-733-2143
>
