Benedict created CASSANDRA-7980:
-----------------------------------
Summary: cassandra-stress should support partial clustering column
generation
Key: CASSANDRA-7980
URL: https://issues.apache.org/jira/browse/CASSANDRA-7980
Project: Cassandra
Issue Type: Bug
Reporter: Benedict
Assignee: Branimir Lambov
Priority: Minor
cassandra-stress generates its data randomly, in tiers, so that we can scroll
through the partitions it generates without having to generate their entirety.
The problem is that to support very large partitions (important for
benchmarking certain cases, and for acceptance testing) we need a large
number of clustering columns, generally more than we would otherwise use,
which changes the performance characteristics. We should effectively split each
clustering column into a number of byte-ranges that become tiers for
visitation. The only real complexity here is in obeying the specified
size/count distribution range, which would be difficult for exponential
distributions. However, we could require the user to specify the ranges, and a
distribution for each range, upfront. We could even treat them exactly like
other column specifications, but as sub-specs within a given column in the
yaml. Or we could simply accept that we follow the distribution imperfectly in
these situations.
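
To make the byte-range idea concrete, here is a rough sketch in Python. It is
not cassandra-stress code. The names tier_bytes and clustering_value, the
fixed per-tier fan-outs, and the SHA-256 seeding are all assumptions chosen for
illustration. Each tier contributes a fixed number of bytes to the column
value, and any value can be derived from its index alone, without generating
the rest of the partition.

```python
import hashlib


def tier_bytes(seed: bytes, path: tuple, width: int) -> bytes:
    """Deterministically derive `width` bytes for the tier node at `path`.

    Hypothetical helper: hashing the seed and the path stands in for
    whatever seeded generator a real implementation would use.
    """
    material = seed + b"/" + ",".join(map(str, path)).encode()
    return hashlib.sha256(material).digest()[:width]


def clustering_value(seed: bytes, index: int, fanouts, widths) -> bytes:
    """Build the index-th value of one clustering column from byte-range tiers.

    fanouts[i] is the (assumed fixed) number of children per node in tier i;
    widths[i] is the number of bytes that tier contributes to the value.
    """
    if len(fanouts) != len(widths):
        raise ValueError("fanouts and widths must have the same length")
    total = 1
    for f in fanouts:
        total *= f
    if not 0 <= index < total:
        raise IndexError("index outside the generated range")

    # Decompose the index into one digit per tier, most significant first.
    digits = []
    for f in reversed(fanouts):
        index, d = divmod(index, f)
        digits.append(d)
    digits.reverse()

    # Each tier's bytes depend only on the path down to that tier, so values
    # under the same parent share a common byte prefix.
    out = b""
    for depth, width in enumerate(widths):
        out += tier_bytes(seed, tuple(digits[: depth + 1]), width)
    return out
```

With fanouts=(3, 4) and widths=(2, 2), a single column yields 12 values of 4
bytes each, and the values at indexes 4 and 5 share their first two bytes
because they sit under the same first-tier node. A fixed fan-out sidesteps the
distribution problem described above; honouring a per-tier count distribution
would need the user-specified ranges or the imperfect approximation the
description mentions.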