Re: Cassandra 2.0 Batch Statement for timeseries schema

2015-11-05 Thread DuyHai Doan
"Get me the count of orders changed in a given sequence-id range" --> Can
you give an example of a SELECT statement for this query?

Because, given the table structure, you have to provide the shard-and-date
partition key, and I don't see how you can know this value unless you issue
as many SELECTs as there are Cassandra nodes for a given date...
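
To make the fan-out concrete, here is a rough Python sketch of what "one
SELECT per shard" could look like. It only builds the statements and bind
values; it does not talk to a cluster. The underscored column names
(shard_and_date, sequence_id), the zero-padded day in the key, and the
helper names are all assumptions for illustration, not something taken
from the original schema.

```python
from datetime import date

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def partition_keys(day, num_shards):
    """Return every shard-and-date key for one day, e.g. '2-Nov-11-2015'."""
    stamp = f"{MONTHS[day.month - 1]}-{day.day:02d}-{day.year}"
    return [f"{shard}-{stamp}" for shard in range(num_shards)]


def count_queries(day, num_shards, seq_lo, seq_hi):
    """Build one (cql, bind_values) pair per shard for a sequence-id range."""
    # Assumed column names; hyphens are not valid in unquoted CQL identifiers.
    cql = ("SELECT COUNT(*) FROM order_sequence "
           "WHERE shard_and_date = %s "
           "AND sequence_id >= %s AND sequence_id <= %s")
    return [(cql, (key, seq_lo, seq_hi))
            for key in partition_keys(day, num_shards)]


# One statement per shard; the caller has to sum the per-shard counts.
queries = count_queries(date(2015, 11, 11), num_shards=3, seq_lo=100, seq_hi=200)
```

A range that crosses midnight would need the keys for both dates, so the
number of statements doubles in that case.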

On Thu, Nov 5, 2015 at 4:21 PM, Sachin Nikam wrote:

> I currently have a keyspace with table definition that looks like this.
>
>
> CREATE TABLE orders (
>   order_id bigint PRIMARY KEY,
>   order_blob text
> );
>
> This table will have a write load of ~40-100 tps and a read load of ~200-400 
> tps.
>
> We are now considering adding another table definition which closely 
> resembles a timeseries table.
>
> CREATE TABLE order_sequence (
>   // shard-id is generated as order-id % (number of nodes in the Cassandra
>   // ring), then suffixed with the current date. Example: 2-Nov-11-2015
>   shard_and_date text,
>
>   // A simple flake-generated long
>   sequence_id bigint,
>
>   PRIMARY KEY (shard_and_date, sequence_id)
> ) WITH CLUSTERING ORDER BY (sequence_id DESC);
>
>
> The goal of this table is to answer queries like "Get me the count of orders 
> changed in a given sequence-id range". This query will be called once every 5 
> sec.
>
> The plan is to write both these tables in a single BATCH statement.
>
> 1. Will this impact the write latency?
>
> 2. Will it also impact the read latency of the "orders" table?
>
> 3. Will it impact the overall stability of the cluster?
>
>


Re: Cassandra 2.0 Batch Statement for timeseries schema

2015-11-05 Thread Eric Stevens
If you're talking about logged batches, they absolutely have a performance
impact, of about 30%.  The whole batch will succeed or fail as a unit,
but throughput will go down and load will go up.  Keep in mind that logged
batches are atomic but are not isolated - i.e. it's totally possible to get
a dirty read.  See
http://www.datastax.com/dev/blog/atomic-batches-in-cassandra-1-2

If you're not doing some kind of CAS operation inside the logged batch,
then the only advantage of a logged batch over an unlogged batch is that
when consistency can't be accomplished for the second statement (so it
fails the write), then the first statement will also not succeed (but at
that point your cluster is effectively offline).

Unlogged batches offer very few guarantees over single statements, and even
have the drawback of eliminating your driver's ability to operate in a
token aware fashion.
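
For reference, the difference between the two batch types in CQL text is
just the header keyword. The Python sketch below composes both forms for
the two inserts being discussed. The build_batch helper and the underscored
column names are assumptions for illustration; a real application would
more likely use its driver's batch API.

```python
def build_batch(statements, logged=True):
    """Wrap CQL statements in a logged (default) or unlogged batch."""
    header = "BEGIN BATCH" if logged else "BEGIN UNLOGGED BATCH"
    # Normalise each statement to end with exactly one semicolon.
    body = "\n".join(f"  {s.rstrip(';')};" for s in statements)
    return f"{header}\n{body}\nAPPLY BATCH;"


# Assumed column names; '?' are bind markers for a prepared statement.
ORDER_INSERT = "INSERT INTO orders (order_id, order_blob) VALUES (?, ?)"
SEQUENCE_INSERT = ("INSERT INTO order_sequence (shard_and_date, sequence_id) "
                   "VALUES (?, ?)")

logged_batch = build_batch([ORDER_INSERT, SEQUENCE_INSERT])
unlogged_batch = build_batch([ORDER_INSERT, SEQUENCE_INSERT], logged=False)
```

Since the two inserts target different partition keys, either form spans
more than one partition.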



Cassandra 2.0 Batch Statement for timeseries schema

2015-11-05 Thread Sachin Nikam
I currently have a keyspace with table definition that looks like this.


CREATE TABLE orders (
  order_id bigint PRIMARY KEY,
  order_blob text
);

This table will have a write load of ~40-100 tps and a read load of
~200-400 tps.

We are now considering adding another table definition which closely
resembles a timeseries table.

CREATE TABLE order_sequence (
  // shard-id is generated as order-id % (number of nodes in the Cassandra
  // ring), then suffixed with the current date. Example: 2-Nov-11-2015
  shard_and_date text,

  // A simple flake-generated long
  sequence_id bigint,

  PRIMARY KEY (shard_and_date, sequence_id)
) WITH CLUSTERING ORDER BY (sequence_id DESC);
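
As a rough illustration of the key scheme described in the comments above,
the partition key could be derived like this in Python. The function name,
the fixed month abbreviations, and the zero-padded day are assumptions; the
only format given is the example 2-Nov-11-2015.

```python
from datetime import date

MONTHS = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


def shard_and_date(order_id, num_nodes, day):
    """Build the partition key: (order_id % num_nodes) suffixed with the date."""
    shard = order_id % num_nodes
    return f"{shard}-{MONTHS[day.month - 1]}-{day.day:02d}-{day.year}"


# Order 14 on a 6-node ring lands in shard 2 for Nov 11, 2015.
key = shard_and_date(14, 6, date(2015, 11, 11))
```

Because the shard is order-id modulo the node count, changing the number
of nodes changes which shard new writes for a given order-id go to.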


The goal of this table is to answer queries like "Get me the count of
orders changed in a given sequence-id range". This query will be
called once every 5 sec.

The plan is to write both these tables in a single BATCH statement.

1. Will this impact the write latency?

2. Will it also impact the read latency of the "orders" table?

3. Will it impact the overall stability of the cluster?