Spark SQL (can it call an Impala UDF?)

2014-08-11 Thread marspoc
I want to run the query below in Spark SQL. I currently run it in Impala, where it calls a C++ UDF.
Both pnl_flat_pp and pfh_flat are partitioned Impala tables.

Can Spark SQL do that?



SELECT a.pnl_type_code,
       percentile_udf_cloudera(CAST(90.0 AS DOUBLE),
           sum(pnl_vector1), sum(pnl_vector2), sum(pnl_vector3), sum(pnl_vector4),
           sum(pnl_vector5), sum(pnl_vector6), sum(pnl_vector7), sum(pnl_vector8),
           sum(pnl_vector9), sum(pnl_vector10), sum(pnl_vector11), sum(pnl_vector12),
           sum(pnl_vector13), sum(pnl_vector14))
FROM ibrisk.pnl_flat_pp a
JOIN (SELECT portfolio_code FROM ibrisk.pfh_flat WHERE pl0_code = '3') b
  ON a.portfolio_code = b.portfolio_code
WHERE rf_level = '0'
  AND calc_ref = 7020704
  AND excl_pnl != '1'
GROUP BY a.pnl_type_code
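Spark SQL cannot load Impala's native C++ UDFs. Through a HiveContext it can read tables registered in the Hive metastore, which Impala shares, and it can call Hive UDFs written in Java. However, percentile_udf_cloudera itself would have to be reimplemented on the JVM. The sketch below shows one possible Scala version. It assumes the UDF returns the p-th percentile (0 to 100) of the values passed to it, and it uses the nearest-rank method. The object and method names are hypothetical, and the real UDF may interpolate differently.

```scala
// Hypothetical stand-in for the Impala C++ UDF percentile_udf_cloudera.
// Assumption: it returns the p-th percentile (0-100) of its arguments.
// Nearest-rank method; the real UDF may use a different definition.
object PercentileSketch {
  def percentile(p: Double, values: Seq[Double]): Double = {
    require(values.nonEmpty, "percentile of an empty set is undefined")
    require(p > 0.0 && p <= 100.0, "p must be in (0, 100]")
    val sorted = values.sorted
    // Nearest rank: the smallest value such that at least p% of values are <= it
    val rank = math.ceil(p / 100.0 * sorted.length).toInt
    sorted(rank - 1)
  }
}
```

To call it from SQL, it would still need to be registered as a Spark SQL UDF (for example with `sqlContext.udf.register` in Spark 1.3 and later). It would also need a wrapper so that it receives the 14 sums as arguments.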



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/spark-sql-can-it-call-impala-udf-tp11878.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: RDD join, index key: composite keys

2014-07-11 Thread marspoc
I want to build something like an RDBMS index on keyPnl, keyed on pnl_type_code, so that
the group by can be done efficiently. How do I achieve that?
Currently the code below runs out of memory in Spark on 60 GB of data.
keyPnl is a very large file. We have been stuck for a week, trying Kryo,
mapValues, etc., without success.
We want to partition on pnl_type_code but have no idea how to do that.
Please advise.


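  // Needed on Spark < 1.3 for pair-RDD operations such as leftOuterJoin and mapValues
  import org.apache.spark.SparkContext._

  // Key PnL rows (rf_level == 0) and positions (pl0_code == 3) by portfolio_code
  val keyPnl = pnl.filter(_.rf_level == 0).keyBy(f => f.portfolio_code)
  val keyPosition = positions.filter(_.pl0_code == 3).keyBy(f => f.portfolio_code)

  // Each joined value is (pnlRow, Option[positionRow]); with leftOuterJoin every
  // PnL row is kept, whether or not its portfolio has a matching position
  val JoinPnlPortfolio = keyPnl.leftOuterJoin(keyPosition)

  // Group joined rows by pnl_type_code, combine each group's mapped vectors
  // with Vector.reduceVector, then apply Var.percentile with 0.99
  val result = JoinPnlPortfolio.groupBy(r => r._2._1.pnl_type_code)
    .mapValues(kv => kv.map(mapper).fold(List[Double]())(Vector.reduceVector _))
    .mapValues(kv => Var.percentile(kv, 0.99))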
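Spark RDDs have no RDBMS-style index. The closest tools are:

- partitioning a pair RDD by key, for example `keyedRdd.partitionBy(new HashPartitioner(n))`;
- aggregating with `reduceByKey` instead of `groupBy`.

`groupBy` collects every joined record for a pnl_type_code into one in-memory collection before the fold runs. `reduceByKey` instead combines values inside each partition before the shuffle and keeps only one running vector per key.

The pure-Scala sketch below models that behaviour; it does not use Spark. `addVectors` is a hypothetical stand-in that assumes `Vector.reduceVector` is an element-wise sum. In Spark, the corresponding change would look roughly like `JoinPnlPortfolio.map(r => (r._2._1.pnl_type_code, mapper(r))).reduceByKey(Vector.reduceVector _)`. This assumes `mapper` accepts one joined record and `reduceVector` is associative.

```scala
// Pure-Scala model of Spark's reduceByKey with a HashPartitioner (no Spark needed).
// Values are folded into one running vector per key instead of being
// collected into an in-memory list per key, as groupBy does.
object CombineByKeySketch {
  type Vec = List[Double]

  // Hypothetical: assumes Vector.reduceVector is an element-wise sum,
  // with the empty list acting as the zero vector
  def addVectors(a: Vec, b: Vec): Vec =
    if (a.isEmpty) b
    else if (b.isEmpty) a
    else a.zip(b).map { case (x, y) => x + y }

  // Same rule as Spark's HashPartitioner: null keys go to partition 0,
  // otherwise the non-negative remainder of hashCode mod numPartitions
  def partitionFor(key: Any, numPartitions: Int): Int =
    if (key == null) 0
    else {
      val raw = key.hashCode % numPartitions
      if (raw < 0) raw + numPartitions else raw
    }

  private def combine(records: Seq[(String, Vec)]): Map[String, Vec] =
    records.foldLeft(Map.empty[String, Vec]) { case (acc, (k, v)) =>
      acc.updated(k, addVectors(acc.getOrElse(k, Nil), v))
    }

  // inputPartitions: (pnl_type_code, vector) records, split as an RDD would be
  def reduceByKey(inputPartitions: Seq[Seq[(String, Vec)]],
                  numPartitions: Int): Seq[Map[String, Vec]] = {
    // Map side: at most one partial vector per key per input partition
    val mapSide = inputPartitions.map(combine)
    // Shuffle + reduce side: route each key to its hash partition and merge
    (0 until numPartitions).map { p =>
      combine(mapSide.flatten.filter { case (k, _) => partitionFor(k, numPartitions) == p })
    }
  }
}
```

Because every record for a given pnl_type_code is routed to the same output partition, each key is reduced in exactly one place. Memory then grows with the number of distinct keys rather than with the number of records per key.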
.mapValues(kv = (Var.percentile(kv, 0.99)))




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/RDD-join-composite-keys-tp8696p9423.html