I would suggest walking through a Spark tutorial in Scala; that will be the best way to learn this.

In brief, though: a Scala case class is like a Java bean / POJO, but with a more concise syntax (no getters/setters):

case class Person(firstName: String, lastName: String, age: Int)
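For example, a case class gets the scala.Product machinery for free, and that is exactly what Catalyst's converters look for. A minimal sketch (assuming the Spark 1.x SQLContext API and a local master; untested):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

case class Person(firstName: String, lastName: String, age: Int)

object PersonDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("person-demo").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // A case class implements scala.Product automatically; Product is
    // the interface Catalyst's type converters match on.
    val p = Person("Grace", "Hopper", 58)
    println(p.productArity)      // prints 3
    println(p.productElement(1)) // prints Hopper

    // So an RDD of Person converts directly to a DataFrame.
    val df = sc.parallelize(Seq(p)).toDF()
    df.printSchema()
  }
}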
Thanks,

Andy

--
Andy Grove
Chief Architect
AgilData - Simple Streaming SQL that Scales
www.agildata.com

On Wed, Jan 20, 2016 at 10:28 AM, Raghu Ganti <raghuki...@gmail.com> wrote:

Ah, OK! I am a novice to Scala - I will take a look at Scala case classes. It would be awesome if you could provide some pointers.

Thanks,
Raghu

On Wed, Jan 20, 2016 at 12:25 PM, Andy Grove <andy.gr...@agildata.com> wrote:

I'm talking about implementing CustomerRecord as a Scala case class, rather than as a Java class. Scala case classes implement the scala.Product trait, which is what Catalyst is looking for.

Thanks,
Andy

On Wed, Jan 20, 2016 at 10:21 AM, Raghu Ganti <raghuki...@gmail.com> wrote:

Is that not internal to the Catalyst implementation? I should not have to modify the Spark source to get things to work, should I? :-)

On Wed, Jan 20, 2016 at 12:21 PM, Raghu Ganti <raghuki...@gmail.com> wrote:

Case classes where?

On Wed, Jan 20, 2016 at 12:21 PM, Andy Grove <andy.gr...@agildata.com> wrote:

Honestly, moving to Scala and using case classes is the path of least resistance in the long term.

Thanks,
Andy

On Wed, Jan 20, 2016 at 10:19 AM, Raghu Ganti <raghuki...@gmail.com> wrote:

Thanks for your reply, Andy.

Yes, that is what I concluded based on the stack trace. The problem stems from Java's implementation of generics, but I thought this would go away when compiling against Java 1.8, which solves the issues with proper generics implementation.

Any ideas?

Also, are you saying that in order for my example to work, I would need to move to Scala and have the UDT implemented in Scala?

On Wed, Jan 20, 2016 at 10:27 AM, Andy Grove <andy.gr...@agildata.com> wrote:

Catalyst is expecting a class that implements scala.Row or scala.Product and is instead finding a Java class. I've run into this issue a number of times: DataFrames don't work so well with Java. Here's a blog post with more information on this:

http://www.agildata.com/apache-spark-rdd-vs-dataframe-vs-dataset/
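For what it's worth, a UDT written in Scala would look roughly like this. This is a rough, untested sketch modeled on the ExamplePointUDT in the Spark source; Point and PointUDT are hypothetical names, and serialize/deserialize go through Catalyst's internal ArrayData representation, whose exact signatures vary between Spark versions:

import org.apache.spark.sql.types._
import org.apache.spark.sql.catalyst.util.{ArrayData, GenericArrayData}

// The annotation tells Catalyst which UDT handles this user class.
@SQLUserDefinedType(udt = classOf[PointUDT])
class Point(val x: Double, val y: Double) extends Serializable

class PointUDT extends UserDefinedType[Point] {
  // Catalyst-facing storage type: a fixed pair of doubles.
  override def sqlType: DataType = ArrayType(DoubleType, containsNull = false)

  // Convert the user object into the Catalyst representation.
  override def serialize(obj: Any): Any = obj match {
    case p: Point => new GenericArrayData(Array[Any](p.x, p.y))
  }

  // Convert the Catalyst representation back into the user object.
  override def deserialize(datum: Any): Point = datum match {
    case values: ArrayData => new Point(values.getDouble(0), values.getDouble(1))
  }

  override def userClass: Class[Point] = classOf[Point]
}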
Thanks,

Andy

On Wed, Jan 20, 2016 at 7:07 AM, raghukiran <raghuki...@gmail.com> wrote:

Hi,

I created a custom UserDefinedType in Java as follows:

UserDefinedType<JavaPoint> SQLPoint = new UserDefinedType<JavaPoint>() {
    // overrides of the serialize, deserialize, sqlType, and userClass methods go here
};

When creating a DataFrame, I am following the manual mapping approach. I have a constructor for JavaPoint - JavaPoint(double x, double y) - and a CustomerRecord as follows:

public class CustomerRecord {
    private int id;
    private String name;
    private Object location;

    // setters and getters follow here
}

Following the example in the Spark source, I create an RDD as follows:

sc.textFile(inputFileName).map(new Function<String, CustomerRecord>() {
    @Override
    public CustomerRecord call(String line) {
        // ... parse x and y from the line ...
        CustomerRecord rec = new CustomerRecord();
        rec.setLocation(SQLPoint.serialize(new JavaPoint(x, y)));
        return rec;
    }
});

This results in a MatchError. The stack trace is as follows:

scala.MatchError: [B@45aa3dd5 (of class [B)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:255)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:250)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
    at org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1358)
    at org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1358)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
    at org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1.apply(SQLContext.scala:1358)
    at org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1.apply(SQLContext.scala:1356)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
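Note: judging from the trace, the struct converter is being handed the already-serialized value (a byte array, hence the "[B") rather than a JavaPoint, because the code calls SQLPoint.serialize() manually and stores the result in a field typed as Object. In the Scala direction suggested above, you store the point itself and let Catalyst apply the UDT. A minimal, untested sketch reusing the hypothetical Point and PointUDT from the earlier sketch (the "id,name,x,y" input format is also assumed):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical Scala rewrite of the CustomerRecord bean: a case class
// (hence a scala.Product) holding the Point itself, not its serialized form.
case class CustomerRecord(id: Int, name: String, location: Point)

object CustomerRecordDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("udt-demo").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Assumed input: one "id,name,x,y" record per line.
    val records = sc.textFile("customers.csv").map { line =>
      val Array(id, name, x, y) = line.split(",")
      // Store the Point directly; Catalyst invokes PointUDT.serialize itself
      // when building the DataFrame, so no manual serialize() call is needed.
      CustomerRecord(id.toInt, name, new Point(x.toDouble, y.toDouble))
    }

    val df = records.toDF()
    df.printSchema()
    df.show()
  }
}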