Thanks, Michael. Should the cast be done in the source RDD or while doing
the SUM?
To give a better picture, here is the code sequence:

val sourceRdd = sql("select ... from source-hive-table")
sourceRdd.registerAsTable("sourceRDD")
val aggRdd = sql("select c1, c2, sum(c3) from sourceRDD group by c1, c2")
// This query throws the exception when I collect the results

I tried adding the cast to the aggRdd query above and that didn't help.
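Concretely, the two placements in question would look something like this
(just a sketch; it assumes c1, c2, c3 come straight from the source table
and that c3 should be an integer):

val sourceRdd = sql("select c1, c2, cast(c3 as int) as c3 from source-hive-table")
sourceRdd.registerAsTable("sourceRDD")
// vs. casting inside the aggregate, which is the variant I tried:
val aggRdd = sql("select c1, c2, sum(cast(c3 as int)) from sourceRDD group by c1, c2")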


- Ranga

On Wed, Oct 8, 2014 at 3:52 PM, Michael Armbrust <mich...@databricks.com>
wrote:

> Using SUM on a string should automatically cast the column. Also, you can
> use CAST to change the datatype:
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-TypeConversionFunctions>
>
> What version of Spark are you running?  This could be
> https://issues.apache.org/jira/browse/SPARK-1994
>
> On Wed, Oct 8, 2014 at 3:47 PM, Ranga <sra...@gmail.com> wrote:
>
>> Hi
>>
>> I am in the process of migrating some logic in Pig scripts to Spark SQL.
>> As part of this process, I am creating a few "Select...Group By" queries
>> and registering them as tables using the SchemaRDD.registerAsTable
>> feature. When using such a registered table in a subsequent
>> "Select...Group By" query, I get a ClassCastException:
>> java.lang.ClassCastException: java.lang.String cannot be cast to
>> java.lang.Integer
>>
>> This happens when I use the "Sum" function on one of the columns. Is
>> there any way to specify the data type for the columns when the
>> registerAsTable function is called? Are there other approaches that I
>> should be looking at?
>>
>> Thanks for your help.
>>
>>
>>
>> - Ranga
>>
>
>
