Spark Analytics
Hello,

I'm new to the Spark world. My team and I are evaluating Spark as an integration framework across different data sources. So far so good, but it becomes slow when aggregations and calculations are applied to the RDD. I'm running Spark in standalone mode on Windows.

I'm running this example:

- Create a Spark SQLContext and query the SQL database like this:

    val df = sqc.read.format("jdbc").options(
      Map("url" -> url,
          "dbtable" -> "(select [Cod], cant from [FactV]) as t"))
      .load().toDF("k1", "v1")

- Perform an aggregation like this:

    val cant = df.rollup("k1")
      .agg(sum("v1").alias("v1"))

It takes about 20 seconds, whereas executing the same query directly on the SQL database takes less than one second. From what I've seen, most of the time is spent creating the context, or at least that is what I believe.

Is there any way to make this faster, or to reuse the context? What we need is to perform cube-like aggregations, but across different sources and in real time; pre-processing the data is also an option.

Kind regards,
--
Ing. Ivaldi Andres
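A minimal Scala sketch of the points raised above, assuming Spark 2.x or later, where SparkSession replaced SQLContext as the entry point (in Spark 1.x the equivalent is one long-lived SQLContext). It builds the session once with getOrCreate and reuses it, adds the parenthesis missing from the original rollup call, and caches the input so repeated aggregations do not re-read the source. The master setting, the `url` parameter, the helper names (`session`, `readFact`, `rollupTotals`) and the sample rows are illustrative assumptions; `readFact` also assumes the database's JDBC driver is on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.sum

object RollupSketch {
  // Create the session once per application and reuse it for every query;
  // getOrCreate returns the already-active session instead of starting a new one.
  def session(): SparkSession =
    SparkSession.builder()
      .appName("rollup-sketch")
      .master("local[*]") // assumption: local mode; drop this when using spark-submit --master
      .getOrCreate()

  // The JDBC read from the question. `url` is a placeholder for the real
  // connection string; the subquery is the one from the original message.
  def readFact(spark: SparkSession, url: String): DataFrame =
    spark.read.format("jdbc")
      .options(Map(
        "url" -> url,
        "dbtable" -> "(select [Cod], cant from [FactV]) as t"))
      .load()
      .toDF("k1", "v1")

  // rollup("k1") yields one row per k1 value plus a grand-total row where k1 is null.
  def rollupTotals(df: DataFrame): DataFrame =
    df.rollup("k1").agg(sum("v1").alias("v1"))

  def main(args: Array[String]): Unit = {
    val spark = session()
    import spark.implicits._
    // Hypothetical in-memory rows standing in for the JDBC result;
    // cache() keeps them in memory for further aggregations in this session.
    val df = Seq(("a", 1), ("a", 2), ("b", 3)).toDF("k1", "v1").cache()
    rollupTotals(df).orderBy("k1").show()
    spark.stop()
  }
}
```

The context start-up cost is paid when the session is created, so keeping one application (and its session) alive between queries, rather than launching a new job per query, is what avoids paying it repeatedly. For a full cube over several keys, `df.cube("k1", "k2")` accepts the same arguments as `rollup`.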
status of spark analytics functions? over, rank, percentile, row_number, etc.
I'm curious about the status of implementing the Hive analytics functions in Spark:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

Many of these seem to be missing. I'm assuming they're not implemented yet? Is there an ETA for them? Or am I the first to bring this up? :-P

--
Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com
Re: status of spark analytics functions? over, rank, percentile, row_number, etc.
Hi Kevin,

I'm currently working on implementing windowing. If you'd like to see something that's not covered by a JIRA, please file one!

best,
wb

----- Original Message -----
From: Kevin Burton bur...@spinn3r.com
To: user@spark.apache.org
Sent: Saturday, January 10, 2015 12:12:38 PM
Subject: status of spark analytics functions? over, rank, percentile, row_number, etc.
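For readers on later releases: window functions did land in Spark SQL, starting with Spark 1.4 (through HiveContext in the 1.x line). The following is a minimal Scala sketch, assuming Spark 2.x or later, of the OVER clause with rank, row_number and percent_rank from the Hive page, written both with the DataFrame Window API and in SQL syntax. The object, helper and column names and the sales rows are hypothetical; percentile is not covered here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, percent_rank, rank, row_number, sum}

object WindowSketch {
  def session(): SparkSession =
    SparkSession.builder()
      .appName("window-sketch")
      .master("local[*]") // assumption: local mode for the sketch
      .getOrCreate()

  // Hypothetical rows: (region, rep, amount).
  def sampleSales(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("east", "ann", 100), ("east", "bob", 300), ("east", "cid", 300),
        ("west", "dee", 50), ("west", "eve", 80)).toDF("region", "rep", "amount")
  }

  def rankSales(sales: DataFrame): DataFrame = {
    // Equivalent to OVER (PARTITION BY region ORDER BY amount DESC)
    val byRegion = Window.partitionBy("region").orderBy(col("amount").desc)
    sales
      .withColumn("rank", rank().over(byRegion))                 // ties share a rank, gaps follow
      .withColumn("row_number", row_number().over(byRegion))     // ties broken arbitrarily
      .withColumn("percent_rank", percent_rank().over(byRegion)) // (rank - 1) / (rows - 1)
      // No ORDER BY, so the frame is the whole partition: the region total.
      .withColumn("region_total", sum("amount").over(Window.partitionBy("region")))
  }

  def main(args: Array[String]): Unit = {
    val spark = session()
    val sales = sampleSales(spark)
    rankSales(sales).orderBy("region", "row_number").show()

    // The same ranking written with the SQL OVER clause.
    sales.createOrReplaceTempView("sales")
    spark.sql(
      """SELECT region, rep, amount,
        |       RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
        |FROM sales""".stripMargin).show()
    spark.stop()
  }
}
```

rank and row_number differ only on ties: bob and cid both get rank 1, while row_number gives them 1 and 2 in an unspecified order.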