On Mon, Sep 11, 2017 at 4:48 PM, Paul Pollack <paul.poll...@klaviyo.com>
wrote:

> Hi,
>
> We run a 48-node cluster that stores counts in wide rows. Each node uses
> roughly 1 TB of a 2 TB EBS gp2 drive for its data directory, and the
> tables use LeveledCompactionStrategy. We have been trying to bootstrap new
> nodes that use a RAID 0 configuration over two 1 TB EBS drives to raise
> the I/O throughput cap from 160 MB/s to 250 MB/s (AWS limits). Every time
> a node finishes streaming, it is bombarded by a large number of
> compactions. We see CPU load on the new node spike extremely high and CPU
> load on all the other nodes in the cluster drop unreasonably low, while
> our app's write latency to this cluster averages 10 seconds or more. We've
> already tried throttling compaction throughput to 1 MB/s, and we've always
> had concurrent_compactors set to 2, but the disk is still saturated. In
> every case we have had to shut down the Cassandra process on the new node
> to resume acceptable operations.
>
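> For reference, we've been applying the throttle at runtime roughly like
> this (via nodetool on the new node; the 1 MB/s value is just what we've
> been trying, not a recommendation):
>
>     nodetool setcompactionthroughput 1   # throttle compaction I/O to 1 MB/s
>     nodetool getcompactionthroughput     # confirm the current throttle
>     nodetool compactionstats             # watch the pending compaction backlog
>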
> We're currently upgrading all of our clients to version 3.11.0 of the
> DataStax Python driver, which will let us blacklist the next newly
> bootstrapped node on the client side. The hope is that if it doesn't
> accept writes, the rest of the cluster can serve them adequately (as is
> the case whenever we turn down the bootstrapping node), and the node can
> finish its compactions.
>
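> Roughly what we have in mind on the client side, assuming the driver's
> HostFilterPolicy (addresses below are placeholders):
>
>     from cassandra.cluster import Cluster
>     from cassandra.policies import HostFilterPolicy, RoundRobinPolicy
>
>     # Hypothetical address of the still-compacting node clients should avoid.
>     BLACKLISTED_NODES = {"10.0.0.42"}
>
>     # Wrap the usual load balancing policy so the driver never routes
>     # requests to the blacklisted host.
>     policy = HostFilterPolicy(
>         child_policy=RoundRobinPolicy(),
>         predicate=lambda host: host.address not in BLACKLISTED_NODES,
>     )
>
>     cluster = Cluster(
>         contact_points=["10.0.0.1", "10.0.0.2"],  # placeholder seed nodes
>         load_balancing_policy=policy,
>     )
>     session = cluster.connect()
>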
> We were also interested in hearing whether anyone has had much luck with
> the sstableofflinerelevel tool, and whether it would be a reasonable
> approach to our issue.
>
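> If it is, our understanding is that the invocation would be roughly the
> following, run while Cassandra is stopped on the node (keyspace and table
> names here are placeholders):
>
>     sstableofflinerelevel my_keyspace my_counts_table
>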
> One of my colleagues found a post where a user with a similar issue saw an
> extremely high bloom filter false positive ratio. Although I didn't check
> that during any of these bootstrap attempts, it seems likely that with
> this many pending compactions we would observe the same thing.
>
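> Next time we'll probably check it with something like the following
> (tablestats on 3.x, cfstats on older versions; the table name is a
> placeholder):
>
>     nodetool tablestats my_keyspace.my_counts_table | grep -i "bloom filter"
>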
> Would appreciate any guidance anyone can offer.
>
> Thanks,
> Paul
>
