I want to run the query below in Spark SQL. I currently run it in Impala,
where it calls a C++ UDF.
Both pnl_flat_pp and pfh_flat are partitioned Impala tables.
Can Spark SQL do that?
select a.pnl_type_code,percentile_udf_cloudera(cast(90.0 as
I want to build an index, similar to an RDBMS index, on keyPnl by
pnl_type_code so that the group by can be done efficiently. How do I
achieve that?
Currently the code below runs out of memory in Spark on 60 GB of data.
keyPnl is a very large file. We have been stuck for a week, trying Kryo,
mapValues etc. but without