Re: What is performance gain of clustering columns

2017-10-03 Thread kurt greaves
Clustering info is stored in the index of an SSTable, so if you are only
querying a subset of rows within the partition you don't necessarily have
to hit all SSTables, just the SSTables that contain the relevant clustering
column values. They can make a big improvement, and can also be used quite
effectively in a time series use case to remove the need for time buckets
in your partition key.
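
As a rough illustration of that pruning idea, here is a toy Python model. It
is only a sketch and not Cassandra's actual read path: ToySSTable,
clustering_bounds and sstables_to_read are hypothetical names, and the
assumption is simply that each SSTable can report the min/max clustering
value it holds for a partition.

```python
from dataclasses import dataclass, field


@dataclass
class ToySSTable:
    """Hypothetical stand-in for an SSTable: maps (partition, clustering) -> value."""
    name: str
    rows: dict = field(default_factory=dict)

    def clustering_bounds(self, partition_key):
        # Min/max clustering value this table holds for the partition, or None.
        keys = [ck for (pk, ck) in self.rows if pk == partition_key]
        return (min(keys), max(keys)) if keys else None


def sstables_to_read(sstables, partition_key, lo, hi):
    """Names of the tables whose clustering range overlaps [lo, hi]."""
    selected = []
    for table in sstables:
        bounds = table.clustering_bounds(partition_key)
        if bounds is None:
            continue  # partition is not in this table at all
        low, high = bounds
        if low <= hi and high >= lo:
            selected.append(table.name)
    return selected


# Made-up sample data: one sensor's readings flushed into two tables by time.
tables = [
    ToySSTable("sstable-1", {("sensor-1", t): float(t) for t in range(1, 11)}),
    ToySSTable("sstable-2", {("sensor-1", t): float(t) for t in range(11, 21)}),
    ToySSTable("sstable-3", {("sensor-2", t): float(t) for t in range(1, 31)}),
]
print(sstables_to_read(tables, "sensor-1", 12, 18))
```

With time-ordered writes each table tends to cover a distinct clustering
range, so a narrow slice touches few tables.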

On 3 October 2017 at 15:30, eugene miretsky 
wrote:

> Hi,
>
> Clustering columns are used to order the data in a partition. However,
> since data is split into SSTables, the rows are ordered by clustering key
> only within each SSTable. Cassandra still needs to check all SSTables, and
> merge the data if it is found in several SSTables. The only scenario where
> I can imagine a big performance gain is super wide partitions, where each
> partition is within a single SSTable (time series data, where partition
> keys are time-buckets).
>
> Has anybody done benchmarks on that and can share the data model they have
> used?
>
> Cheers,
> Eugene
>


What is performance gain of clustering columns

2017-10-03 Thread eugene miretsky
Hi,

Clustering columns are used to order the data in a partition. However,
since data is split into SSTables, the rows are ordered by clustering key
only within each SSTable. Cassandra still needs to check all SSTables, and
merge the data if it is found in several SSTables. The only scenario where
I can imagine a big performance gain is super wide partitions, where each
partition is within a single SSTable (time series data, where partition
keys are time-buckets).
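
To make the merge step concrete, here is a minimal Python sketch of combining
the rows of one partition from several SSTables. It is a simplification under
stated assumptions, not Cassandra's implementation: merge_partition is a
hypothetical name, each run is assumed to be already sorted by clustering key,
and conflicts are resolved per row by the highest write timestamp.

```python
import heapq


def merge_partition(runs):
    """Merge per-SSTable runs of (clustering_key, write_timestamp, value).

    Each run must already be sorted by clustering_key. When the same
    clustering key appears in several runs, the newest write wins.
    """
    merged = []
    for ck, ts, value in heapq.merge(*runs, key=lambda row: row[0]):
        if merged and merged[-1][0] == ck:
            # Same row seen in another run: keep the newer write.
            if ts > merged[-1][1]:
                merged[-1] = (ck, ts, value)
        else:
            merged.append((ck, ts, value))
    return merged


# Made-up sample data: two runs for the same partition.
older = [(1, 100, "a"), (3, 100, "c")]
newer = [(2, 200, "b"), (3, 200, "c2")]
print(merge_partition([older, newer]))
```

Because every run is sorted, the merge is a single streaming pass; the cost
grows with the number of runs that actually have to be read.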

Has anybody done benchmarks on that and can share the data model they have
used?

Cheers,
Eugene