Hi,

First, you need to make your SLAs clear. It does not sound to me as if they are well defined, or that your proposed solution is necessary for this scenario. I also find it hard to believe that one customer has 100 million transactions per month.
Time series data is easy to precalculate, so you do not necessarily need in-memory technology here. I recommend that your company do a proof of concept and get more details and clarification on the requirements before risking millions of dollars of investment.

On Tue, Aug 18, 2015 at 21:18, Benjamin Ross <br...@lattice-engines.com> wrote:

> My company is interested in building a real-time time-series querying
> solution using Spark and Cassandra. Specifically, we’re interested in
> setting up a Spark system against Cassandra running a hive thrift server.
> We need to be able to perform real-time queries on time-series data –
> things like, how many accounts have spent in total more than $300 on
> product X in the past 3 months, and purchased product Y in the past month.
>
> These queries need to be fast – preferably sub-second but we can deal with
> a few seconds if absolutely necessary. The data sizes are in the millions
> of records when rolled up to be per-monthly records. Something on the
> order of 100M per customer.
>
> My question is, based on experience, how hard would it be to get Cassandra
> and Spark working together to give us sub-second response times in this use
> case? Note that we’ll need to use DataStax enterprise (which is
> unappealing from a cost standpoint) because it’s the only thing that
> provides the hive spark thrift server to Cassandra.
>
> The two top contenders for our solution are Spark+Cassandra and Druid.
>
> Neither of these solutions work perfectly out of the box:
>
> - Druid would need to be modified, possibly hacked, to support
> the queries we require. I’m also not clear how operationally ready it is.
>
> - Cassandra and Spark would require paying money for DataStax
> enterprise. It really feels like it’s going to be tricky to configure
> Cassandra and Spark to be lightning fast for our use case. Finally, window
> functions (which we need – see above) are not supported unless we use a
> pre-release milestone of the datastax spark Cassandra connector.
>
> I was wondering if anyone had any thoughts. How easy is it to get Spark
> and Cassandra down to sub-second speeds in our use case?
>
> Thanks,
>
> Ben
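
To make the precalculation point concrete, here is a minimal Python sketch of the kind of monthly rollup that could answer the example query from the quoted message without scanning raw transactions. Everything in it is an assumption for illustration: the tuple layout, the function names (`build_rollup`, `spend_in_window`, `count_matching_accounts`), the sample data, and the reading of "past 3 months" as the current calendar month plus the two before it. It is not tied to Spark, Cassandra, or Druid, and a real proof of concept would keep the rollup in whichever store is chosen.

```python
from collections import defaultdict


def month_index(year, month):
    """Map a calendar month to a single integer so windows are simple ranges."""
    return year * 12 + (month - 1)


def build_rollup(transactions):
    """Pre-aggregate raw transactions into per-account, per-product, per-month totals.

    `transactions` is an iterable of (account, product, year, month, amount);
    this layout is hypothetical.
    """
    rollup = defaultdict(float)
    for account, product, year, month, amount in transactions:
        rollup[(account, product, month_index(year, month))] += amount
    return rollup


def spend_in_window(rollup, product, end_month, n_months):
    """Total spend per account on `product` over the last `n_months` months,
    counting `end_month` itself as the most recent month."""
    totals = defaultdict(float)
    for (account, prod, month), amount in rollup.items():
        if prod == product and end_month - n_months < month <= end_month:
            totals[account] += amount
    return totals


def count_matching_accounts(rollup, now, product_x, min_spend, product_y):
    """Accounts that spent more than `min_spend` on product_x in the past
    3 months and bought product_y in the past month."""
    spent_x = spend_in_window(rollup, product_x, now, 3)
    bought_y = spend_in_window(rollup, product_y, now, 1)
    return sum(
        1
        for account, total in spent_x.items()
        if total > min_spend and account in bought_y
    )


# Hypothetical sample data: only account "a1" satisfies both conditions.
sample = [
    ("a1", "X", 2015, 6, 200.0),
    ("a1", "X", 2015, 8, 150.0),
    ("a1", "Y", 2015, 8, 20.0),
    ("a2", "X", 2015, 8, 500.0),  # spent enough on X, never bought Y
    ("a3", "X", 2015, 4, 400.0),  # X purchase falls outside the 3-month window
    ("a3", "Y", 2015, 8, 10.0),
    ("a4", "X", 2015, 7, 100.0),  # spent too little on X
    ("a4", "Y", 2015, 8, 5.0),
]
rollup = build_rollup(sample)
print(count_matching_accounts(rollup, month_index(2015, 8), "X", 300.0, "Y"))
```

The rollup is built once, or updated incrementally as transactions arrive, so each query touches only monthly totals. Whether that meets a sub-second target at the data sizes mentioned is exactly what a proof of concept would need to measure.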