Hi, I am using Spark and the Cassandra connector to save customer events for later batch analysis.
The primary access pattern later on will be by time slice. One way to save the events would be to create one C* row per day, for example, and within that row store the events in decreasing time order. However, that would create a hot spot in the cluster for each day's writes. The other two options I see are sharding (e.g. creating 100 rows per day) or using a new table for every day. I prefer the last option, but I am not sure whether that is a good pattern with the C* connector. Can anyone provide insights to guide that decision? A rough sketch of the sharded variant is in the P.S. below.

Jan
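P.S. For concreteness, here is roughly what I imagine the sharded variant would look like with the connector. This is only a sketch: the keyspace (analytics), table (events_by_day), column names, bucket count, and the way the bucket is derived are all placeholders I made up, not anything I have running.

    import org.apache.spark.{SparkConf, SparkContext}
    import com.datastax.spark.connector._

    // Placeholder event type; the real fields would differ.
    case class Event(day: String, bucket: Int, ts: Long, payload: String)

    object SaveEventsSharded {
      val NumBuckets = 100 // spread each day over 100 partitions

      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("save-events-sharded")
          .set("spark.cassandra.connection.host", "127.0.0.1")
        val sc = new SparkContext(conf)

        // Assumed table, created out of band:
        //   CREATE TABLE analytics.events_by_day (
        //     day text, bucket int, ts timestamp, payload text,
        //     PRIMARY KEY ((day, bucket), ts)
        //   ) WITH CLUSTERING ORDER BY (ts DESC);

        // Dummy input for the sketch; in reality the events come from
        // wherever they are collected.
        val raw = sc.parallelize(Seq(
          ("2014-10-01", 1412121600000L, "login"),
          ("2014-10-01", 1412125200000L, "purchase")
        ))

        // Derive the bucket from the timestamp so writes for one day are
        // spread over NumBuckets partitions; a reader enumerates buckets
        // 0..NumBuckets-1 for the day it wants.
        val events = raw.map { case (day, ts, payload) =>
          Event(day, (ts % NumBuckets).toInt, ts, payload)
        }

        events.saveToCassandra("analytics", "events_by_day",
          SomeColumns("day", "bucket", "ts", "payload"))

        sc.stop()
      }
    }

With that layout, a query for a given time slice fans out over the 100 buckets instead of hammering a single partition, which is why I am weighing it against the table-per-day option.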