spark sql (can it call impala udf)

2014-08-11 Thread marspoc
I want to run the query below in Spark SQL. I currently run it in Impala, where it calls a C++ UDF. pnl_flat_pp and pfh_flat are both partitioned Impala tables. Can Spark SQL do that? select a.pnl_type_code,percentile_udf_cloudera(cast(90.0 as

Re: RDD join, index key: composite keys

2014-07-11 Thread marspoc
I want to build an index on keyPnl, similar to an RDBMS index, on pnl_type_code so that the group by can be done efficiently. How do I achieve that? Currently the code below runs out of memory in Spark on 60GB of data. keyPnl is a very large file. We have been stuck for 1 week, trying kryo, mapvalue etc. but without