Thanks Michael. Should the cast be done in the source RDD or while doing the SUM? To give a better picture, here is the code sequence:
val sourceRdd = sql("select ... from source-hive-table")
sourceRdd.registerAsTable("sourceRDD")
val aggRdd = sql("select c1, c2, sum(c3) from sourceRDD group by c1, c2")
// This query throws the exception when I collect the results

I tried adding the cast to the aggRdd query above and that didn't help.

- Ranga

On Wed, Oct 8, 2014 at 3:52 PM, Michael Armbrust <mich...@databricks.com> wrote:

> Using SUM on a string should automatically cast the column. Also you can
> use CAST to change the datatype
> <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-TypeConversionFunctions>.
>
> What version of Spark are you running? This could be
> https://issues.apache.org/jira/browse/SPARK-1994
>
> On Wed, Oct 8, 2014 at 3:47 PM, Ranga <sra...@gmail.com> wrote:
>
>> Hi,
>>
>> I am in the process of migrating some logic in Pig scripts to Spark SQL.
>> As part of this process, I am creating a few "Select...Group By" queries
>> and registering them as tables using the SchemaRDD.registerAsTable
>> feature. When using such a registered table in a subsequent
>> "Select...Group By" query, I get a ClassCastException:
>>
>> java.lang.ClassCastException: java.lang.String cannot be cast to
>> java.lang.Integer
>>
>> This happens when I use the SUM function on one of the columns. Is
>> there any way to specify the data type for the columns when the
>> registerAsTable function is called? Are there other approaches that I
>> should be looking at?
>>
>> Thanks for your help.
>>
>> - Ranga
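
P.S. For anyone hitting the same error, here is a minimal sketch of the two places the cast can go. It assumes a Spark 1.1-era shell with a HiveContext; the setup lines, "source_table", and the column names c1-c3 are illustrative placeholders, not the actual schema from this thread:

import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)  // sc: the existing SparkContext
import sqlContext._                   // brings sql(...) into scope

// Option 1: cast in the source query, so the registered table already
// exposes c3 as an integer ("source_table" is a placeholder name).
val sourceRdd = sql("select c1, c2, cast(c3 as int) as c3 from source_table")
sourceRdd.registerAsTable("sourceRDD")

// Option 2: keep the string column and cast inside the aggregate itself.
val aggRdd = sql("select c1, c2, sum(cast(c3 as int)) from sourceRDD group by c1, c2")
aggRdd.collect().foreach(println)

If option 1 works, it is usually preferable: the registered table then carries an integer-typed column, and every downstream query can rely on it without repeating the cast.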