Hello, I'm a newbie in the Spark world. My team and I are evaluating Spark as an integration framework between different sources. So far so good, but it becomes slow when aggregations and calculations are applied to the RDD.
I'm using Spark standalone, under Windows. I'm running this example:

- Create a SparkContext and SQLContext, and issue a query against the SQL server like this:

    val df = sqc.read.format("jdbc")
      .options(Map(
        "url"     -> url,
        "dbtable" -> "(select [Cod], cant from [FactV]) as t"))
      .load()
      .toDF("k1", "v1")

- Perform an aggregation like this:

    val cant = df.rollup("k1")
      .agg(sum("v1").alias("v1"))

It takes about 20 seconds, while if I execute the same query directly on the SQL server it takes less than one second. From what I've seen, the expensive part is the context creation, or at least that is what I believe.

Is there any way to do this faster, or to reuse the context? What we need is to perform aggregations like a cube, but across different sources and in real time; pre-processing the data would also be possible. (A self-contained version of the snippet is in the P.S. below.)

kind regards,

--
Ing. Ivaldi Andres
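P.S. For reference, here is a minimal self-contained sketch of what I'm running, so the timing can be reproduced end to end. The master URL, JDBC URL, and credentials are placeholders, and the cache() call is just my attempt to avoid re-reading over JDBC on repeated aggregations:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions.sum

    object RollupExample {
      def main(args: Array[String]): Unit = {
        // Standalone run on a local master under Windows; adjust as needed.
        val conf = new SparkConf().setAppName("RollupExample").setMaster("local[*]")
        val sc   = new SparkContext(conf)
        val sqc  = new SQLContext(sc)

        // Placeholder JDBC URL; real host, database and credentials go here.
        val url = "jdbc:sqlserver://HOST:1433;databaseName=DB;user=USER;password=PASS"

        // Push the projection down to the database as a subquery.
        val df = sqc.read.format("jdbc")
          .options(Map(
            "url"     -> url,
            "dbtable" -> "(select [Cod], cant from [FactV]) as t"))
          .load()
          .toDF("k1", "v1")

        // Cache so repeated aggregations reuse the fetched data
        // instead of going back over JDBC each time.
        df.cache()

        val cant = df.rollup("k1").agg(sum("v1").alias("v1"))
        cant.show()

        sc.stop()
      }
    }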