spark sql (can it call impala udf)
I want to run the query below, which I currently run in Impala calling a C++ UDF, in Spark SQL. Both pnl_flat_pp and pfh_flat are partitioned Impala tables. Can Spark SQL do that?

SELECT a.pnl_type_code,
       percentile_udf_cloudera(CAST(90.0 AS DOUBLE),
                               SUM(pnl_vector1),  SUM(pnl_vector2),  SUM(pnl_vector3),
                               SUM(pnl_vector4),  SUM(pnl_vector5),  SUM(pnl_vector6),
                               SUM(pnl_vector7),  SUM(pnl_vector8),  SUM(pnl_vector9),
                               SUM(pnl_vector10), SUM(pnl_vector11), SUM(pnl_vector12),
                               SUM(pnl_vector13), SUM(pnl_vector14))
FROM ibrisk.pnl_flat_pp a
JOIN (SELECT portfolio_code FROM ibrisk.pfh_flat WHERE pl0_code = '3') b
  ON a.portfolio_code = b.portfolio_code
WHERE rf_level = '0' AND calc_ref = 7020704 AND excl_pnl != '1'
GROUP BY a.pnl_type_code

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-sql-can-it-call-impala-udf-tp11878.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
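[Editor's note] Spark SQL cannot load Impala's native C++ UDFs; it can call JVM-based (Hive) UDFs registered through a HiveContext, so the usual workaround is to reimplement the function's logic on the JVM and register it with Spark. The sketch below is a minimal, self-contained stand-in for the percentile logic, assuming nearest-rank semantics; the actual behavior of percentile_udf_cloudera is not known here, so treat the function body as an assumption, not a port.

```scala
// Hedged sketch: a JVM percentile function that could be registered as a
// Spark SQL UDF in place of Impala's native percentile_udf_cloudera.
// Nearest-rank semantics are an ASSUMPTION about what the C++ UDF computes.
object PercentileSketch {
  // Nearest-rank percentile: p in [0, 100], over the summed P&L values.
  def percentile(p: Double, values: Seq[Double]): Double = {
    require(values.nonEmpty, "percentile of empty input is undefined")
    require(p >= 0.0 && p <= 100.0, "p must be in [0, 100]")
    val sorted = values.sorted
    val rank = math.ceil(p / 100.0 * sorted.length).toInt
    sorted(math.max(rank - 1, 0))
  }

  def main(args: Array[String]): Unit = {
    // Ten sample summed P&L values standing in for SUM(pnl_vector1..14).
    val pnl = Seq(-5.0, 1.0, 2.5, 3.0, 4.0, 8.0, 9.0, 10.0, 12.0, 20.0)
    println(percentile(90.0, pnl)) // prints 12.0
  }
}
```

In Spark 1.x this function could then be exposed to SQL via the HiveContext's UDF registration, after which the original query (minus the Impala-specific UDF name) could run against the same Hive-metastore tables.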
Re: RDD join, index key: composite keys
I want to create an index, similar to an RDBMS index, on keyPnl over pnl_type_code so that the group-by can be done efficiently. How do I achieve that? Currently the code below blows out of memory in Spark on 60 GB of data; keyPnl is a very large file. We have been stuck for a week, trying Kryo, mapValues, etc., to no avail. We want to partition on pnl_type_code but have no idea how to do that. Please advise.

val keyPnl = pnl.filter(_.rf_level == 0).keyBy(f => f.portfolio_code)
val keyPosition = positions.filter(_.pl0_code == 3).keyBy(f => f.portfolio_code)
val JoinPnlPortfolio = keyPnl.leftOuterJoin(keyPosition)
var result = JoinPnlPortfolio.groupBy(r => r._2._1.pnl_type_code)
  .mapValues(kv => kv.map(mapper).fold(List[Double]())(Vector.reduceVector _))
  .mapValues(kv => Var.percentile(kv, 0.99))

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/RDD-join-composite-keys-tp8696p9423.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
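[Editor's note] The usual cause of this OOM is that groupBy materializes every value for a key in memory before the fold runs; folding values into a single per-key accumulator during the shuffle (what Spark's reduceByKey / aggregateByKey do) avoids that. The sketch below demonstrates the technique on plain Scala collections so it is self-contained; reduceVector is a hypothetical element-wise vector sum standing in for the poster's Vector.reduceVector, whose real implementation is not shown in the thread.

```scala
// Hedged sketch: fold each value into a running per-key accumulator instead
// of collecting all values for a key first. On an RDD this corresponds to
// rdd.aggregateByKey(List.empty[Double])(reduceVector, reduceVector)
// replacing groupBy(...).mapValues(_.fold(...)(...)).
object AggregateByKeySketch {
  // Element-wise sum of two P&L vectors; an ASSUMED stand-in for
  // Vector.reduceVector. Treats an empty accumulator as the identity.
  def reduceVector(a: List[Double], b: List[Double]): List[Double] =
    if (a.isEmpty) b
    else if (b.isEmpty) a
    else a.zip(b).map { case (x, y) => x + y }

  // Local equivalent of aggregateByKey: one accumulator per key, each
  // record folded in as it is seen, so per-key groups are never built.
  def aggregateByKey(records: Seq[(String, List[Double])]): Map[String, List[Double]] =
    records.foldLeft(Map.empty[String, List[Double]]) { case (acc, (key, v)) =>
      acc.updated(key, reduceVector(acc.getOrElse(key, Nil), v))
    }
}
```

With the heavy fold done by aggregateByKey, the final percentile step stays as a cheap mapValues over one small vector per pnl_type_code.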