We have gone down a similar path at Webtrends; Spark has worked amazingly well for us in this use case. Our solution goes from REST directly into Spark and back out to the UI instantly.
Here is the resulting product in case you are curious (and please pardon the self-promotion): https://www.webtrends.com/support-training/training/explore-onboarding/

> How can I automatically cache the data once a day...

If you are not memory-bound, you could easily cache the daily results for some span of time and re-union them together each time you add new data. You would serve queries off the unioned RDD.

> ... and make them available on a web service

From the unioned RDD you could always step into Spark SQL at that point. Or you could use a simple scatter/gather pattern for this. As with all things Spark, this is super easy to do: just use aggregate()()!

Cheers,
Sean

On Feb 3, 2015, at 9:59 AM, Adamantios Corais <adamantios.cor...@gmail.com> wrote:

Hi,

After some research I have decided that Spark (SQL) would be ideal for building an OLAP engine. My goal is to push aggregated data (to Cassandra or another low-latency data store) and then be able to project the results on a web page (web service). New data will be added (aggregated) only once a day. On the other hand, the web service must be able to run some fixed(?) queries (either on Spark or Spark SQL) at any time and plot the results with D3.js.

Note that I can already achieve similar speeds while in REPL mode by caching the data. Therefore, I believe that my problem should be re-phrased as follows: "How can I automatically cache the data once a day and make them available on a web service that is capable of running any Spark or Spark (SQL) statement in order to plot the results with D3.js?"

Note that I already have some experience with Spark (+ Spark SQL) as well as D3.js, but not at all with OLAP engines (at least in their traditional form).

Any ideas or suggestions?

// Adamantios
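
For what it's worth, the cache-and-union pattern described above could be sketched roughly like this in Scala. This is a minimal illustration, not our production code: all names (DailyCacheServer, addDay, totalsPerKey, the CSV layout of the daily files) are assumptions invented for the example, and a real service would also need to unpersist retired daily RDDs and schedule addDay from whatever runs your daily ingest.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Illustrative sketch only: keep one cached RDD per ingested day,
// union them all, and serve web-service queries off the unioned RDD.
object DailyCacheServer {
  val sc = new SparkContext(new SparkConf().setAppName("olap-cache"))

  @volatile private var days: List[RDD[(String, Double)]] = Nil
  @volatile private var combined: RDD[(String, Double)] = sc.emptyRDD

  // Called once a day when new aggregates land; assumes "key,value" CSV lines.
  def addDay(path: String): Unit = synchronized {
    val day = sc.textFile(path)
      .map(_.split(","))
      .map(a => (a(0), a(1).toDouble))
      .cache()
    day.count()                        // force materialization into the cache
    days = day :: days
    combined = sc.union(days).cache()  // queries are served off this RDD
  }

  // One fixed query the web service might expose for D3.js plotting.
  def totalsPerKey(): Array[(String, Double)] =
    combined.reduceByKey(_ + _).collect()
}
```

From `combined` you could equally register a DataFrame/temp table and step into Spark SQL for the fixed queries, as suggested above.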