Hi, community

We are especially interested in this feature integration, based on some slides from [1]. The SMACK stack (Spark + Mesos + Akka + Cassandra + Kafka) seems to be a good implementation of the lambda architecture in the open-source world, especially for non-Hadoop cluster environments. As we see it, the advantages are:

1. The feasibility and scalability of the Spark DataFrame API, which also complements Apache Cassandra's native CQL features well.
2. Both streaming and batch processing are available within a single stack.
3. Spark and Cassandra together offer both compatibility and usability, including seamless integration with job scheduling and resource management.

Our only concern is OLAP query performance, which suffers mainly from frequent aggregation across large tables that grow daily, for both Spark SQL and Cassandra. I can see that the use case in [1] uses FiloDB to achieve columnar storage and better query performance, but we know nothing more about it.

Questions:

Has anyone had such a use case, especially in a production environment? We would be interested in your architecture for designing an OLAP engine with Spark + Cassandra.

How do you think this scenario compares with traditional OLAP cube design, such as Apache Kylin or Pentaho Mondrian?

Best Regards,
Sun.

[1] http://www.slideshare.net/planetcassandra/cassandra-summit-2014-interactive-olap-queries-using-apache-cassandra-and-spark

fightf...@163.com