Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-09 Thread Ranga
Resolution: After realizing that the OpenCSV SerDe was causing every field to be defined as a String type, I modified the Hive load statement to use the default serializer. I was able to modify the CSV input file to use a different delimiter. Although this is a workaround, I am able to proceed
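
For readers who want to try the same workaround, here is a minimal sketch of what a table definition using Hive's default delimited-text serializer might look like when issued through Spark. It is not the statement used in this thread: the table name `source_hive_table`, the columns `c1`, `c2`, `c3` and their types, the `|` delimiter, and the input path are all hypothetical, and it assumes a Spark 1.1.x build with Hive support.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object DelimitedTableSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("delimited-table-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // No SERDE clause: Hive falls back to its default delimited-text serializer,
    // so the declared column types (here INT for c3) are honoured.
    // Table name, columns and the '|' delimiter are assumptions for illustration.
    hiveContext.sql(
      "CREATE TABLE IF NOT EXISTS source_hive_table (c1 STRING, c2 STRING, c3 INT) " +
      "ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' " +
      "STORED AS TEXTFILE")

    // Hypothetical path to the input file after its delimiter was changed.
    hiveContext.sql("LOAD DATA LOCAL INPATH '/tmp/input.psv' INTO TABLE source_hive_table")

    // Print the schema Spark SQL sees for the table.
    hiveContext.sql("SELECT c1, c2, c3 FROM source_hive_table").printSchema()

    sc.stop()
  }
}
```

The trade-off is the one described above: a plain delimiter does not handle quoted fields the way a CSV SerDe does, so the input file has to be rewritten with a delimiter that never appears in the data.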

Spark-SQL: SchemaRDD - ClassCastException

2014-10-08 Thread Ranga
Hi, I am in the process of migrating some logic from Pig scripts to Spark-SQL. As part of this process, I am creating a few Select...Group By queries and registering them as tables using the SchemaRDD.registerAsTable feature. When using such a registered table in a subsequent Select...Group By query,

Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-08 Thread Michael Armbrust
Using SUM on a string column should automatically cast it. You can also use CAST to change the data type: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-TypeConversionFunctions . What version of Spark are you running? This could be

Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-08 Thread Ranga
Thanks Michael. Should the cast be done in the source RDD or while doing the SUM? To give a better picture, here is the code sequence: val sourceRdd = sql("select ... from source-hive-table") sourceRdd.registerAsTable("sourceRDD") val aggRdd = sql("select c1, c2, sum(c3) from sourceRDD group by c1, c2")
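
To make the two placements in the question concrete, here is a hedged sketch of both: casting in the source projection, and casting inside the aggregate. The table name `source_hive_table`, the registered name `rawRDD`, and the target type INT are assumptions for illustration, and the code assumes Spark 1.1.x with Hive support. Nothing in this thread establishes that either placement avoids the ClassCastException.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object CastPlacementSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cast-placement-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // Option 1: cast once in the source projection, then aggregate the typed column.
    val sourceRdd = hiveContext.sql(
      "SELECT c1, c2, CAST(c3 AS INT) AS c3 FROM source_hive_table")
    sourceRdd.registerAsTable("sourceRDD")
    val aggFromTypedSource = hiveContext.sql(
      "SELECT c1, c2, SUM(c3) FROM sourceRDD GROUP BY c1, c2")

    // Option 2: leave the source untouched and cast inside the aggregate.
    val rawRdd = hiveContext.sql("SELECT c1, c2, c3 FROM source_hive_table")
    rawRdd.registerAsTable("rawRDD")
    val aggWithInlineCast = hiveContext.sql(
      "SELECT c1, c2, SUM(CAST(c3 AS INT)) FROM rawRDD GROUP BY c1, c2")

    // Both queries should describe the same result if the casts behave as expected.
    aggFromTypedSource.collect().foreach(println)
    aggWithInlineCast.collect().foreach(println)

    sc.stop()
  }
}
```

Casting in the source projection keeps later queries simpler, since every query against the registered table sees the typed column; casting inside the aggregate leaves the registered table identical to the Hive source.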

Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-08 Thread Michael Armbrust
Which version of Spark are you running? On Wed, Oct 8, 2014 at 4:18 PM, Ranga sra...@gmail.com wrote: Thanks Michael. Should the cast be done in the source RDD or while doing the SUM? To give a better picture, here is the code sequence: val sourceRdd = sql("select ... from source-hive-table")

Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-08 Thread Ranga
Sorry, it's 1.1.0. After digging a bit more into this, it seems the OpenCSV deserializer converts all the columns to a String type. This may be throwing the execution off. I am planning to create a class and map the rows to this custom class. Will keep this thread updated. On Wed, Oct 8, 2014 at

Re: Spark-SQL: SchemaRDD - ClassCastException

2014-10-08 Thread Ranga
This is a bit strange. When I print the schema for the RDD, it reflects the correct data type for each column. But any kind of mathematical calculation seems to result in a ClassCastException. Here is a sample that results in the exception: select c1, c2 ... cast (c18 as int) * cast (c21 as
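
One way to picture how a schema can print the expected types while arithmetic still fails is the plain-Scala sketch below. It does not use Spark and is not a description of Spark SQL's internals; it only illustrates the general mechanism suggested by the OpenCSV observation in this thread, where the declared type says int but the value actually stored is a String. All names in it are hypothetical.

```scala
object SchemaMismatchSketch {
  // Declared column types, as a printed schema would report them (hypothetical).
  val declaredSchema: Map[String, String] = Map("c18" -> "int", "c21" -> "int")

  // Values as a String-only deserializer would hand them over.
  val row: Map[String, Any] = Map("c18" -> "4", "c21" -> "5")

  // Trusts the declared schema and casts blindly; throws ClassCastException
  // because the runtime values are Strings, not boxed Integers.
  def unsafeProduct(r: Map[String, Any]): Int =
    r("c18").asInstanceOf[Int] * r("c21").asInstanceOf[Int]

  // Parses the text explicitly instead of trusting the declared type.
  def safeProduct(r: Map[String, Any]): Int =
    r("c18").toString.trim.toInt * r("c21").toString.trim.toInt

  def main(args: Array[String]): Unit = {
    val blindCastFailed =
      try { unsafeProduct(row); false }
      catch { case _: ClassCastException => true }
    println(s"blind cast threw ClassCastException: $blindCastFailed")
    println(s"explicit parse: ${safeProduct(row)}")
  }
}
```

If the real cause is of this kind, the fix has to happen where the values are produced (the SerDe or an explicit row mapping), not in how the schema is declared.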