Hello, I'm a newbie in the Spark world. My team and I are evaluating Spark as an integration framework between different sources. So far so good, but it becomes slow when aggregations and calculations are applied to the RDD.
I'm using Spark standalone, under Windows. I'm running this example:

- Create a SparkContext and SQLContext, and issue a query against the SQL server like this:

    val df = sqc.read.format("jdbc")
      .options(Map(
        "url"     -> url,
        "dbtable" -> "(select [Cod], cant from [FactV]) as t"))
      .load()
      .toDF("k1", "v1")

- Perform an aggregation like this:

    val cant = df.rollup("k1")
      .agg(sum("v1").alias("v1"))

It takes about 20 seconds, while if I execute the same query directly on the SQL server it takes less than one second. From what I've seen, the expensive part is the context creation, or at least that is what I believe.

Is there any way to do this faster, or to reuse the context? What we need is to perform aggregations like a cube, but across different sources and in real time; pre-processing the data would also be possible. (A self-contained version of the snippet is in the P.S. below.)

kind regards,

--
Ing. Ivaldi Andres
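P.S. For reference, here is a minimal self-contained sketch of what I'm running, so the timing can be reproduced end to end. The master URL, JDBC URL, and credentials are placeholders, and the cache() call is just my attempt to avoid re-reading over JDBC on repeated aggregations:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions.sum

    object RollupExample {
      def main(args: Array[String]): Unit = {
        // Standalone run on a local master under Windows; adjust as needed.
        val conf = new SparkConf().setAppName("RollupExample").setMaster("local[*]")
        val sc   = new SparkContext(conf)
        val sqc  = new SQLContext(sc)

        // Placeholder JDBC URL; real host, database and credentials go here.
        val url = "jdbc:sqlserver://HOST:1433;databaseName=DB;user=USER;password=PASS"

        // Push the projection down to the database as a subquery.
        val df = sqc.read.format("jdbc")
          .options(Map(
            "url"     -> url,
            "dbtable" -> "(select [Cod], cant from [FactV]) as t"))
          .load()
          .toDF("k1", "v1")

        // Cache so repeated aggregations reuse the fetched data
        // instead of going back over JDBC each time.
        df.cache()

        val cant = df.rollup("k1").agg(sum("v1").alias("v1"))
        cant.show()

        sc.stop()
      }
    }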