Hello,

I'd like to understand how other people have been aggregating metrics
using Spark Streaming and Cassandra. I have currently designed some
data models to store the rolled-up metrics. There are two models that
I am considering:

CREATE TABLE rollup_using_counters (
    metric_1 text PRIMARY KEY,
    metric_1_value counter
);

The model above is nice because updating a counter only takes a single
write; no read is required. The problem is that I need these counter
values to be fairly accurate, and based on some discussions with the
Cassandra folks it sounds like there is some potential for over- and
under-counting when the database is under load, since counter
increments are not idempotent and a retried write can be applied twice.
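For reference, here is a rough sketch of how the counter write could be
issued from the streaming job, one CQL increment per (metric, delta)
pair inside foreachPartition. It assumes the DataStax Spark Cassandra
Connector's CassandraConnector helper, a keyspace called my_ks, and a
DStream[(String, Long)] of per-batch deltas; all of those names are
just placeholders for illustration:

import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.streaming.dstream.DStream

def incrementCounters(deltas: DStream[(String, Long)],
                      connector: CassandraConnector): Unit = {
  deltas.foreachRDD { rdd =>
    rdd.foreachPartition { partition =>
      // One session per partition; the connector reuses sessions internally.
      connector.withSessionDo { session =>
        val increment = session.prepare(
          "UPDATE my_ks.rollup_using_counters " +
          "SET metric_1_value = metric_1_value + ? WHERE metric_1 = ?")
        partition.foreach { case (metric, delta) =>
          session.execute(increment.bind(Long.box(delta), metric))
        }
      }
    }
  }
}

The catch is exactly the one above: the increment is not idempotent, so
if a batch or a write is retried the delta can be applied twice.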

CREATE TABLE rollup_using_update (
    metric_1 text PRIMARY KEY,
    metric_1_value int
);

This model, on the other hand, is written with a plain overwrite of the
rolled-up value, so the write itself is idempotent. The problem is that
I will need to read the current metric values into the Spark Streaming
application and perform the addition before writing the result back to
Cassandra. That should keep the metrics accurate, but I believe it also
introduces a lot of complexity, and possibly latency, into my Spark
Streaming application.
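To make the second option concrete, a minimal sketch of the
read-add-write loop could look like the following, again assuming the
connector's CassandraConnector helper, a my_ks keyspace, and a
DStream[(String, Int)] of per-batch deltas (all illustrative names):

import com.datastax.spark.connector.cql.CassandraConnector
import org.apache.spark.streaming.dstream.DStream

def updateRollups(deltas: DStream[(String, Int)],
                  connector: CassandraConnector): Unit = {
  deltas.foreachRDD { rdd =>
    rdd.foreachPartition { partition =>
      connector.withSessionDo { session =>
        val select = session.prepare(
          "SELECT metric_1_value FROM my_ks.rollup_using_update WHERE metric_1 = ?")
        val update = session.prepare(
          "UPDATE my_ks.rollup_using_update SET metric_1_value = ? WHERE metric_1 = ?")
        partition.foreach { case (metric, delta) =>
          // Read the current rollup; treat a missing row as zero.
          val row = session.execute(select.bind(metric)).one()
          val current = if (row == null) 0 else row.getInt("metric_1_value")
          // Overwrite with the new absolute total.
          session.execute(update.bind(Int.box(current + delta), metric))
        }
      }
    }
  }
}

One variation that avoids the extra round trip to Cassandra is to keep
the running totals inside the streaming job itself (for example with
updateStateByKey) and simply overwrite the row with the absolute total
each batch, though that moves the state-management burden into Spark.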

Has anyone else run into this problem before, and how did you solve it?

Thanks, Mike.
