Spark Analytics

2015-11-05 Thread Andrés Ivaldi
Hello, I'm a newbie in the Spark world. My team and I are evaluating Spark as
an integration framework between different sources. So far so good, but it
becomes slow when aggregations and calculations are applied to the RDD.

I'm using Spark in standalone mode on Windows.

I'm running this example:
- Create the SparkContext and SQLContext, and build a query against SQL like this
  val df = sqc.read.format("jdbc")
    .options(Map("url" -> url,
      "dbtable" -> "(select [Cod], cant from [FactV]) as t"))
    .load().toDF("k1","v1")

- Perform the aggregation like this
  val cant = df.rollup("k1")
    .agg(sum("v1").alias("v1"))


It takes about 20s; if I execute this query directly on SQL it takes less
than one second.

What I've seen is that most of the time is spent on context creation, or at
least that is what I believe.
Is there any way to make it faster, or to reuse the context?

What we need is to perform aggregations like a Cube, but with different
sources and in real time; it's also possible to pre-process the data.

kind regards,


-- 
Ing. Ivaldi Andres
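
On the question above about reusing the context, here is a minimal sketch rather than a definitive answer. It assumes the Spark 1.4+ DataFrame API used in the message. The names SharedSpark, RollupExample, loadFacts and rollupSum are hypothetical, the local[*] master is only there so the sketch runs on one machine, and the value chosen for spark.sql.shuffle.partitions (default 200) is an illustrative guess that may or may not help with a small table.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.sum

// Hypothetical holder: both contexts are created once per JVM and reused.
object SharedSpark {
  lazy val sc: SparkContext = new SparkContext(
    new SparkConf()
      .setAppName("rollup-example")
      .setMaster("local[*]")                    // assumption: single-machine run
      .set("spark.sql.shuffle.partitions", "4") // assumption: small data set
  )
  lazy val sqc: SQLContext = new SQLContext(sc)
}

object RollupExample {
  // Read the fact table over JDBC and mark it for in-memory caching,
  // so that repeated aggregations do not query the database again.
  def loadFacts(url: String): DataFrame =
    SharedSpark.sqc.read.format("jdbc")
      .options(Map("url" -> url,
        "dbtable" -> "(select [Cod], cant from [FactV]) as t"))
      .load()
      .toDF("k1", "v1")
      .cache()

  // Same aggregation as in the message: per-key sums plus a grand total.
  def rollupSum(df: DataFrame): DataFrame =
    df.rollup("k1").agg(sum("v1").alias("v1"))

  def main(args: Array[String]): Unit = {
    val url = args(0)    // JDBC URL supplied by the caller
    val df = loadFacts(url)
    rollupSum(df).show() // first action pays for the JDBC read
    rollupSum(df).show() // later actions read the cached rows
  }
}
```

With this layout the start-up cost is paid once per process, so it only pays off in a long-running application that serves many queries; a job that is launched fresh for every query would still create the contexts each time.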


status of spark analytics functions? over, rank, percentile, row_number, etc.

2015-01-10 Thread Kevin Burton
I’m curious what the status is of implementing Hive analytics functions in
Spark.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics

Many of these seem missing.  I’m assuming they’re not implemented yet?

Is there an ETA on them?

or am I the first to bring this up? :-P

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com


Re: status of spark analytics functions? over, rank, percentile, row_number, etc.

2015-01-10 Thread Will Benton
Hi Kevin,

I'm currently working on implementing windowing.  If you'd like to see 
something that's not covered by a JIRA, please file one!


best,
wb
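
For readers of the archive: window support reached the DataFrame API in Spark 1.4, after this thread was written. The sketch below shows how rank and row_number over a window can be expressed there, assuming Spark 1.6 or later for the function names. The object WindowSketch, the method addRanks and the column names dept and salary are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, rank, row_number}

object WindowSketch {
  // Adds a rank and a row number per department, highest salary first.
  // Assumes the input has columns named "dept" and "salary".
  def addRanks(df: DataFrame): DataFrame = {
    val w = Window.partitionBy("dept").orderBy(col("salary").desc)
    df.withColumn("rank", rank().over(w))
      .withColumn("row_number", row_number().over(w))
  }
}
```

This corresponds to the SQL form `rank() OVER (PARTITION BY dept ORDER BY salary DESC)` described on the Hive page linked in the original question.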

