This is a bit strange. When I print the schema for the RDD, it reflects the
correct data type for each column. But any kind of mathematical calculation
seems to result in a ClassCastException. Here is a sample query that results
in the exception:
select c1, c2
...
cast (c18 as int) * cast (c21 as int)
...
from table
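
For reference, the fix I plan to try (the custom-class idea from my earlier
mail below) would look roughly like this. This is only a minimal sketch: the
column names are made up, and csvRdd stands for the RDD of string arrays
produced by the OpenCSV deserializer:

case class Source(c1: String, c2: String, c18: Int, c21: Int)

import sqlContext.createSchemaRDD  // implicit conversion RDD[Product] -> SchemaRDD
// Parse the string columns into the proper types before registering the table
val typed = csvRdd.map(r => Source(r(0), r(1), r(17).trim.toInt, r(20).trim.toInt))
typed.registerAsTable("typedTable")
sql("select c1, c2, c18 * c21 from typedTable")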

Any other pointers? Thanks for the help.


- Ranga

On Wed, Oct 8, 2014 at 5:20 PM, Ranga <sra...@gmail.com> wrote:

> Sorry, it's 1.1.0.
> After digging a bit more into this, it seems like the OpenCSV Deserializer
> converts all the columns to a String type. This may be throwing the
> execution off. Planning to create a class and map the rows to this custom
> class. Will keep this thread updated.
>
> On Wed, Oct 8, 2014 at 5:11 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> Which version of Spark are you running?
>>
>> On Wed, Oct 8, 2014 at 4:18 PM, Ranga <sra...@gmail.com> wrote:
>>
>>> Thanks Michael. Should the cast be done in the source RDD or while doing
>>> the SUM?
>>> To give a better picture here is the code sequence:
>>>
>>> val sourceRdd = sql("select ... from source-hive-table")
>>> sourceRdd.registerAsTable("sourceRDD")
>>> val aggRdd = sql("select c1, c2, sum(c3) from sourceRDD group by c1, c2")
>>> // This query throws the exception when I collect the results
>>>
>>> I tried adding the cast to the aggRdd query above and that didn't help.
>>>
>>>
>>> - Ranga
>>>
>>> On Wed, Oct 8, 2014 at 3:52 PM, Michael Armbrust <mich...@databricks.com>
>>> wrote:
>>>
>>>> Using SUM on a string should automatically cast the column. You can also
>>>> use CAST to change the datatype:
>>>> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-TypeConversionFunctions
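>>>>
>>>> For example, reusing the column names from your snippet:
>>>>
>>>> SELECT c1, c2, SUM(CAST(c3 AS INT)) FROM sourceRDD GROUP BY c1, c2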
>>>>
>>>> What version of Spark are you running?  This could be
>>>> https://issues.apache.org/jira/browse/SPARK-1994
>>>>
>>>> On Wed, Oct 8, 2014 at 3:47 PM, Ranga <sra...@gmail.com> wrote:
>>>>
>>>>> Hi
>>>>>
>>>>> I am in the process of migrating some logic in Pig scripts to
>>>>> Spark-SQL. As part of this process, I am creating a few "Select...Group By"
>>>>> queries and registering them as tables using the SchemaRDD.registerAsTable
>>>>> feature.
>>>>> When using such a registered table in a subsequent "Select...Group By"
>>>>> query, I get a ClassCastException:
>>>>> java.lang.ClassCastException: java.lang.String cannot be cast to
>>>>> java.lang.Integer
>>>>>
>>>>> This happens when I use the "Sum" function on one of the columns. Is
>>>>> there any way to specify the data type for the columns when the
>>>>> registerAsTable function is called? Are there other approaches that I
>>>>> should be looking at?
>>>>>
>>>>> Thanks for your help.
>>>>>
>>>>>
>>>>>
>>>>> - Ranga
>>>>>
>>>>
>>>>
>>>
>>
>
