I would suggest walking through a Spark tutorial in Scala. That will be the best
way to learn this.

In brief, though, a Scala case class is like a Java bean / POJO but with a
more concise syntax (no explicit getters/setters).

case class Person(firstName: String, lastName: String, age: Int)
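
An RDD of case class instances can be turned into a DataFrame directly,
because case classes implement scala.Product, which is what Catalyst needs.
A rough sketch (Spark 1.6-style API; the app name and sample data are made up):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf().setAppName("case-class-example").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// Catalyst derives the schema from the case class fields
val people = sc.parallelize(Seq(Person("Jane", "Doe", 30), Person("John", "Smith", 25)))
val df = people.toDF()
df.printSchema()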


Thanks,

Andy.

--

Andy Grove
Chief Architect
AgilData - Simple Streaming SQL that Scales
www.agildata.com


On Wed, Jan 20, 2016 at 10:28 AM, Raghu Ganti <raghuki...@gmail.com> wrote:

> Ah, OK! I am a novice to Scala - will take a look at Scala case classes.
> It would be awesome if you could provide some pointers.
>
> Thanks,
> Raghu
>
> On Wed, Jan 20, 2016 at 12:25 PM, Andy Grove <andy.gr...@agildata.com>
> wrote:
>
>> I'm talking about implementing CustomerRecord as a Scala case class,
>> rather than as a Java class. Scala case classes implement the scala.Product
>> trait, which is what Catalyst is looking for.
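>>
>> As a rough sketch only (field names copied from your Java bean; this assumes
>> JavaPoint is annotated with @SQLUserDefinedType so Catalyst knows its UDT):
>>
>> case class CustomerRecord(id: Int, name: String, location: JavaPoint)
>>
>> // parseRecord is a placeholder for your own String => CustomerRecord logic
>> val records = sc.textFile(inputFileName).map(parseRecord)
>> val df = sqlContext.createDataFrame(records)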
>>
>>
>> Thanks,
>>
>> Andy.
>>
>> --
>>
>> Andy Grove
>> Chief Architect
>> AgilData - Simple Streaming SQL that Scales
>> www.agildata.com
>>
>>
>> On Wed, Jan 20, 2016 at 10:21 AM, Raghu Ganti <raghuki...@gmail.com>
>> wrote:
>>
>>> Is it not internal to the Catalyst implementation? I shouldn't have to
>>> modify the Spark source to get things to work, should I? :-)
>>>
>>> On Wed, Jan 20, 2016 at 12:21 PM, Raghu Ganti <raghuki...@gmail.com>
>>> wrote:
>>>
>>>> Case classes where?
>>>>
>>>> On Wed, Jan 20, 2016 at 12:21 PM, Andy Grove <andy.gr...@agildata.com>
>>>> wrote:
>>>>
>>>>> Honestly, moving to Scala and using case classes is the path of least
>>>>> resistance in the long term.
>>>>>
>>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Andy.
>>>>>
>>>>> --
>>>>>
>>>>> Andy Grove
>>>>> Chief Architect
>>>>> AgilData - Simple Streaming SQL that Scales
>>>>> www.agildata.com
>>>>>
>>>>>
>>>>> On Wed, Jan 20, 2016 at 10:19 AM, Raghu Ganti <raghuki...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks for your reply, Andy.
>>>>>>
>>>>>> Yes, that is what I concluded based on the stack trace. The problem
>>>>>> stems from the Java implementation of generics, but I thought this would
>>>>>> go away when compiling against Java 1.8, which solves the issues of a
>>>>>> proper generics implementation.
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> Also, are you saying that in order for my example to work, I would
>>>>>> need to move to Scala and have the UDT implemented in Scala?
>>>>>>
>>>>>>
>>>>>> On Wed, Jan 20, 2016 at 10:27 AM, Andy Grove <andy.gr...@agildata.com
>>>>>> > wrote:
>>>>>>
>>>>>>> Catalyst is expecting a class that implements org.apache.spark.sql.Row
>>>>>>> or scala.Product and is instead finding a Java class. I've run into this
>>>>>>> issue a number of times. DataFrame doesn't work so well with Java. Here's
>>>>>>> a blog post with more information on this:
>>>>>>>
>>>>>>> http://www.agildata.com/apache-spark-rdd-vs-dataframe-vs-dataset/
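>>>>>>>
>>>>>>> If you need to stay in Java for now, the explicit-schema route avoids the
>>>>>>> bean-reflection path entirely (the same API is available from Java via
>>>>>>> RowFactory). A rough sketch in Scala, with column names made up and the
>>>>>>> UDT column left out:
>>>>>>>
>>>>>>> import org.apache.spark.sql.Row
>>>>>>> import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
>>>>>>>
>>>>>>> val schema = StructType(Seq(
>>>>>>>   StructField("id", IntegerType, nullable = false),
>>>>>>>   StructField("name", StringType, nullable = true)))
>>>>>>>
>>>>>>> // each input line is assumed to be "id,name"
>>>>>>> val rowRDD = sc.textFile(inputFileName).map { line =>
>>>>>>>   val parts = line.split(",")
>>>>>>>   Row(parts(0).toInt, parts(1).trim)
>>>>>>> }
>>>>>>> val df = sqlContext.createDataFrame(rowRDD, schema)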
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Andy.
>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>> Andy Grove
>>>>>>> Chief Architect
>>>>>>> AgilData - Simple Streaming SQL that Scales
>>>>>>> www.agildata.com
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jan 20, 2016 at 7:07 AM, raghukiran <raghuki...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I created a custom UserDefinedType in Java as follows:
>>>>>>>>
>>>>>>>> UserDefinedType<JavaPoint> SQLPoint = new UserDefinedType<JavaPoint>() {
>>>>>>>>     // overriding serialize, deserialize, sqlType, userClass here
>>>>>>>> };
>>>>>>>>
>>>>>>>> When creating a DataFrame, I am following the manual mapping approach. I
>>>>>>>> have a constructor for JavaPoint - JavaPoint(double x, double y) - and a
>>>>>>>> CustomerRecord as follows:
>>>>>>>>
>>>>>>>> public class CustomerRecord {
>>>>>>>>     private int id;
>>>>>>>>     private String name;
>>>>>>>>     private Object location;
>>>>>>>>
>>>>>>>>     // setters and getters follow here
>>>>>>>> }
>>>>>>>>
>>>>>>>> Following the example in the Spark source, when I create an RDD as
>>>>>>>> follows:
>>>>>>>>
>>>>>>>> sc.textFile(inputFileName).map(new Function<String, CustomerRecord>() {
>>>>>>>>     @Override
>>>>>>>>     public CustomerRecord call(String line) {
>>>>>>>>         CustomerRecord rec = new CustomerRecord();
>>>>>>>>         // x and y are parsed from the input line
>>>>>>>>         rec.setLocation(SQLPoint.serialize(new JavaPoint(x, y)));
>>>>>>>>         return rec;
>>>>>>>>     }
>>>>>>>> });
>>>>>>>>
>>>>>>>> This results in a MatchError. The stack trace is as follows:
>>>>>>>>
>>>>>>>> scala.MatchError: [B@45aa3dd5 (of class [B)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:255)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:250)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:102)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:401)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1358)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1$$anonfun$apply$1.apply(SQLContext.scala:1358)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>>>>>>>>         at
>>>>>>>> scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>>>>>>>>         at
>>>>>>>> scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>>>>>>>         at
>>>>>>>> scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1.apply(SQLContext.scala:1358)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> org.apache.spark.sql.SQLContext$$anonfun$org$apache$spark$sql$SQLContext$$beansToRows$1.apply(SQLContext.scala:1356)
>>>>>>>>         at
>>>>>>>> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>>>>>         at
>>>>>>>> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>>>>>         at
>>>>>>>> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>>>>>         at
>>>>>>>> scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>>>>>>>>         at
>>>>>>>> scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
>>>>>>>>         at
>>>>>>>> scala.collection.Iterator$class.foreach(Iterator.scala:727)
>>>>>>>>         at
>>>>>>>> scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>>>>>>>>         at
>>>>>>>> scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>>>>>>>>         at
>>>>>>>> scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>>>>>>>>         at scala.collection.TraversableOnce$class.to
>>>>>>>> (TraversableOnce.scala:273)
>>>>>>>>         at scala.collection.AbstractIterator.to
>>>>>>>> (Iterator.scala:1157)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>>>>>>>>         at
>>>>>>>> scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>>>>>>>>         at
>>>>>>>> scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> org.apache.spark.sql.execution.SparkPlan$$anonfun$5.apply(SparkPlan.scala:212)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
>>>>>>>>         at
>>>>>>>> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>>>>>>>>         at org.apache.spark.scheduler.Task.run(Task.scala:89)
>>>>>>>>         at
>>>>>>>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>>>>>>>         at
>>>>>>>>
>>>>>>>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>>>>>>>         at java.lang.Thread.run(Thread.java:745)
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> View this message in context:
>>>>>>>> http://apache-spark-user-list.1001560.n3.nabble.com/Scala-MatchError-in-Spark-SQL-tp26021.html
>>>>>>>> Sent from the Apache Spark User List mailing list archive at
>>>>>>>> Nabble.com.
>>>>>>>>
>>>>>>>>
>>>>>>>> ---------------------------------------------------------------------
>>>>>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>>>>>> For additional commands, e-mail: user-h...@spark.apache.org
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
