Hi there,

Please consider our real-time aggregation engine, Sparkta, which is fully open source (Apache 2 License).
Here you have some slides about the project:
- http://www.slideshare.net/Stratio/strata-sparkta

And the source code:
- https://github.com/Stratio/sparkta

Sparkta is a real-time aggregation engine based on Spark Streaming. You can define your aggregation policy in a declarative way and choose the output of your rollups, too. In addition, you can store the raw data and transform data on the fly, among other features.

When working with Cassandra, it could be useful to use the Lucene integration that we have also released at Stratio:
- http://www.slideshare.net/Stratio/cassandra-meetup-20150217
- https://github.com/Stratio/cassandra-lucene-index

It is ready for use with Spark SQL or in your CQL queries.

We are now working on a SQL layer to work with the cubes in a flexible way, but it is not available at this moment.

Do not hesitate to contact us if you have any doubt.

Regards.

2015-11-10 8:16 GMT+01:00 Luke Han <luke...@gmail.com>:

> Some friends referred me to this thread about OLAP/Kylin and Spark...
>
> Here's my 2 cents..
>
> If you are trying to set up OLAP, Apache Kylin should be a good option for you to evaluate.
>
> The project has been in development for more than 2 years and is going to graduate to an Apache Top Level Project [1].
> There are already many production deployments, including eBay, Exponential, JD.com, VIP.com and others; see the Powered By page [2].
>
> Apache Kylin's Spark engine is also on the way; there is a discussion about tuning its performance [3].
>
> A variety of clients are available to interact with Kylin using ANSI SQL, including Tableau, Zeppelin, Pentaho/Mondrian and Saiku/Mondrian, and Excel/PowerBI support will roll out this week.
>
> Apache Kylin is young but mature, validated by large use cases (the biggest cube at eBay contains 85+B rows, and the production platform's 90th-percentile query latency is a few seconds).
>
> StreamingOLAP is coming in Kylin v2.0 with a pluggable architecture, and there is already one real case in production inside eBay; please refer to our design deck [4].
>
> We really welcome everyone to join and contribute to Kylin as the OLAP engine for Big Data :-)
>
> Please feel free to contact our community or me with any questions.
>
> Thanks.
>
> 1. http://s.apache.org/bah
> 2. http://kylin.incubator.apache.org/community/poweredby.html
> 3. http://s.apache.org/lHA
> 4. http://www.slideshare.net/lukehan/1-apache-kylin-deep-dive-streaming-and-plugin-architecture-apache-kylin-meetup-shanghai
> 5. http://kylin.io
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Tue, Nov 10, 2015 at 2:56 AM, tsh <t...@timshenkao.su> wrote:
>
>> Hi,
>>
>> I'm in the same position right now: we are going to implement something like OLAP BI + machine learning exploration on the same cluster.
>> Well, the question is quite ambivalent: on one hand, we have terabytes of versatile data and the need to build something like cubes (Hive and Hive on HBase are unsatisfactory). On the other, our users are accustomed to Tableau + Vertica.
>> So, right now I am considering the following choices:
>> 1) Platfora (not free, I don't know the price right now) + Spark
>> 2) AtScale + Tableau (not free, I don't know the price right now) + Spark
>> 3) Apache Kylin (young project?) + Spark on YARN + Kafka + Flume + some storage
>> 4) Apache Phoenix + Apache HBase + Mondrian + Spark on YARN + Kafka + Flume (has anybody used it in production?)
>> 5) Spark + Tableau (cubes?)
>>
>> For myself, I decided not to dive into Mesos. Cassandra is hard to configure; you'll have to dedicate a special employee to support it.
>>
>> I'll be glad to hear other ideas & propositions, as we are at the beginning of the process too.
>>
>> Sincerely yours, Tim Shenkao
>>
>>
>> On 11/09/2015 09:46 AM, fightf...@163.com wrote:
>>
>> Hi,
>>
>> Thanks for suggesting.
>> Actually, we are now evaluating and stress-testing Spark SQL on Cassandra while trying to define business models. FWIW, the solution mentioned here is different from a traditional OLAP cube engine, right? So we are still undecided about the general direction to take for the OLAP architecture.
>>
>> We would be happy to hear about more use cases from this community.
>>
>> Best,
>> Sun.
>>
>> ------------------------------
>> fightf...@163.com
>>
>>
>> *From:* Jörn Franke <jornfra...@gmail.com>
>> *Date:* 2015-11-09 14:40
>> *To:* fightf...@163.com
>> *CC:* user <user@spark.apache.org>; dev <d...@spark.apache.org>
>> *Subject:* Re: OLAP query using spark dataframe with cassandra
>>
>> Is there any distributor supporting these software components in combination? If not, and your core business is not software, then you may want to look for something else, because it might not make sense to build up internal know-how in all of these areas.
>>
>> In any case, it all depends highly on your data and queries. You will have to do your own experiments.
>>
>> On 09 Nov 2015, at 07:02, "fightf...@163.com" <fightf...@163.com> wrote:
>>
>> Hi, community
>>
>> We are especially interested in this feature integration, based on some slides from [1]. The SMACK stack (Spark + Mesos + Akka + Cassandra + Kafka) seems a good implementation of the lambda architecture in the open-source world, especially for non-Hadoop cluster environments. As we can see, the advantages clearly include:
>>
>> 1. the feasibility and scalability of the Spark DataFrame API, which can also be a perfect complement to Apache Cassandra's native CQL features.
>>
>> 2. both streaming and batch processing are available within the whole stack, cool.
>>
>> 3. we can achieve both compatibility and usability for Spark with Cassandra, including seamless integration with job scheduling and resource management.
>>
>> Our only concern is OLAP query performance, which is mainly caused by frequent aggregation work between large tables that grow every day, for both Spark SQL and Cassandra. I can see that the use case in [1] relies on FiloDB to achieve columnar storage and query performance, but we have no further knowledge of it.
>>
>> Question is: has anyone had such a use case so far, especially in a production environment? We would be interested in your architecture for designing this OLAP engine using Spark + Cassandra. How do you think this scenario compares with a traditional OLAP cube design, like Apache Kylin or Pentaho Mondrian?
>>
>> Best Regards,
>>
>> Sun.
>>
>>
>> [1] http://www.slideshare.net/planetcassandra/cassandra-summit-2014-interactive-olap-queries-using-apache-cassandra-and-spark
>>
>> ------------------------------
>> fightf...@163.com

--
David Morales de Frías :: +34 607 010 411 :: @dmoralesdf <https://twitter.com/dmoralesdf>

Vía de las dos Castillas, 33, Ática 4, 3ª Planta
28224 Pozuelo de Alarcón, Madrid
Tel: +34 91 828 6473 // www.stratio.com // *@stratiobd <https://twitter.com/StratioBD>*