Hi Sean,

I'm interested in trying something similar. How was your performance when you had many concurrent queries running against Spark? I know this will work well where you have a low volume of queries against a large dataset, but I am concerned about having a high volume of queries against the same large dataset. (I know I've not defined "large", but hopefully you get the gist :))
I'm using Cassandra to handle workloads where we have a large volume of low-complexity queries, but I want to move to an architecture which supports a similar(ish) large volume of higher-complexity queries. I'd like to use Spark as the query-serving layer, but I have concerns about how many concurrent queries it could handle. I'd be interested in anyone's thoughts or experience with this.

Thanks,
Andrew

From: Sean McNamara <sean.mcnam...@webtrends.com>
Date: Wednesday, February 4, 2015 at 1:01
To: Adamantios Corais <adamantios.cor...@gmail.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: Re: Spark (SQL) as OLAP engine

We have gone down a similar path at Webtrends; Spark has worked amazingly well for us in this use case. Our solution goes from REST directly into Spark, and back out to the UI instantly. Here is the resulting product in case you are curious (and please pardon the self-promotion): https://www.webtrends.com/support-training/training/explore-onboarding/

> How can I automatically cache the data once a day...

If you are not memory-bound, you could easily cache the daily results for some span of time and re-union them together each time you add new data. You would then service queries off the unioned RDD.

> ... and make them available on a web service

From the unioned RDD you could always step into Spark SQL at that point. Or you could use a simple scatter/gather pattern for this. As with all things Spark, this is super easy to do: just use aggregate()()!

Cheers,
Sean

On Feb 3, 2015, at 9:59 AM, Adamantios Corais <adamantios.cor...@gmail.com> wrote:

Hi,

After some research I have decided that Spark (SQL) would be ideal for building an OLAP engine.
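[Editorial aside: Sean's aggregate()() refers to RDD.aggregate(zeroValue)(seqOp, combOp), which folds each partition locally (scatter) and then merges the per-partition results (gather). A rough illustration of that contract, simulated on plain Python lists standing in for partitions rather than a real RDD — the aggregate() helper below is a local stand-in, not Spark API:]

```python
from functools import reduce

def aggregate(partitions, zero, seq_op, comb_op):
    """Simulate RDD.aggregate: fold each partition with seq_op (scatter),
    then merge the per-partition partial results with comb_op (gather)."""
    partials = [reduce(seq_op, part, zero) for part in partitions]
    return reduce(comb_op, partials, zero)

# Example: count and sum computed in a single pass over "partitioned" data.
partitions = [[1.0, 2.0], [3.0], [4.0, 5.0]]
count, total = aggregate(
    partitions,
    (0, 0.0),                                 # zeroValue
    lambda acc, x: (acc[0] + 1, acc[1] + x),  # seqOp: fold one element in
    lambda a, b: (a[0] + b[0], a[1] + b[1]),  # combOp: merge two partials
)
print(count, total)  # 5 15.0
```

On a real RDD the equivalent call would run the seqOp on each executor and the combOp on the driver, which is what makes it a natural fit for a low-latency query-serving layer.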
My goal is to push aggregated data (to Cassandra or some other low-latency data store) and then be able to project the results on a web page (web service). New data will be added (aggregated) only once a day. On the other hand, the web service must be able to run some fixed(?) queries (either on Spark or Spark SQL) at any time and plot the results with D3.js. Note that I can already achieve similar speeds in REPL mode by caching the data. Therefore, I believe that my problem should be re-phrased as follows: "How can I automatically cache the data once a day and make it available to a web service that is capable of running any Spark or Spark SQL statement in order to plot the results with D3.js?"

Note that I already have some experience with Spark (+ Spark SQL) as well as D3.js, but none at all with OLAP engines (at least in their traditional form). Any ideas or suggestions?

// Adamantios
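[Editorial aside: the once-a-day cache-and-serve flow discussed in this thread can be sketched as follows. This is a minimal simulation with plain Python lists standing in for cached RDDs; the names DailyCache, add_day, and query are hypothetical illustrations, not Spark API:]

```python
class DailyCache:
    """Simulates the suggested pattern: keep each day's aggregated results
    cached, union them into one serving dataset, and answer web-service
    queries off that union."""

    def __init__(self):
        self._days = []   # one cached result set per day
        self._union = []  # the unioned, query-serving dataset

    def add_day(self, rows):
        # In Spark this step would be roughly:
        #   daily = load(path).cache()
        #   serving = serving.union(daily).cache()
        self._days.append(list(rows))
        self._union = [row for day in self._days for row in day]

    def query(self, predicate):
        # Stands in for a Spark / Spark SQL query against the unioned data,
        # e.g. the fixed queries behind a D3.js dashboard.
        return [row for row in self._union if predicate(row)]

cache = DailyCache()
cache.add_day([("page_a", 10), ("page_b", 3)])   # day 1 aggregates
cache.add_day([("page_a", 7)])                   # day 2 aggregates
print(cache.query(lambda r: r[0] == "page_a"))   # [('page_a', 10), ('page_a', 7)]
```

Because only one new day arrives at a time, the daily load is cheap and the union stays hot in memory, which is why queries off it return at interactive speed.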