Re: Cassandra Counters

2019-09-19 Thread Federico Razzoli
Hi Tarun,

That documentation page is a bit ambiguous. My understanding of it is that:

* Cassandra guarantees that counters are updated consistently across the
cluster by doing background reads that don't affect write latency.
* If you use a consistency level stricter than ONE, the same read is done;
the difference is that it's not done in the background, and your write
latency is affected.
* They say that "typically" you want to use ONE because writes will be
faster, but they don't specify what "typically" means in this case.
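
In driver terms, the choice looks roughly like this (a rough Python sketch;
keyspace and table names are made up):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(['node1']).connect('my_ks')

    # CL ONE: the write is acknowledged quickly; the reconciling read
    # happens in the background.
    fast = SimpleStatement(
        "UPDATE counters SET hits = hits + 1 WHERE id = %s",
        consistency_level=ConsistencyLevel.ONE)

    # Stricter CL: the same read is done, but the coordinator waits for
    # it, so write latency goes up.
    safe = SimpleStatement(
        "UPDATE counters SET hits = hits + 1 WHERE id = %s",
        consistency_level=ConsistencyLevel.QUORUM)

    session.execute(fast, ('row1',))
    session.execute(safe, ('row1',))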

I suppose that "typically" means "if you are not concerned about Cassandra
node crashes". Here's my educated guess:

1. You write to node1, updating a counter. The new value should
theoretically be 10.
2. node1 returns success.
3. node1 should read the new value in the background, but it crashes before
finishing this read. What is the current state of the counter on node1?
From the documentation, I fail to find this out.
4. You read the counter from node3 with ONE. You read 9.
5. You restart node1. I don't know what happens to the counter.

It would be great to have a clarification from the Cassandra team.

Federico


On Wed, 18 Sep 2019 at 14:10, Tarun Chabarwal wrote:



Cassandra Counters

2019-09-18 Thread Tarun Chabarwal
Hi

I stumbled on this
post, which says to use consistency level ONE with counters. I'm using
Cassandra 3 with 3 replicas in one data center. I have to support consistent
reads.

Can we do LOCAL_QUORUM reads/writes against counters? Is there any downside
to using quorum with counters?

If we can't use quorum, is it possible to get consistent reads?
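
For concreteness, this is roughly what I'd like to do (a Python driver
sketch; the table name is made up):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster

    session = Cluster(['10.0.0.1']).connect('my_ks')

    # In a counter table every non-primary-key column must be a counter:
    # CREATE TABLE cnt (id text PRIMARY KEY, c counter)

    inc = session.prepare("UPDATE cnt SET c = c + 1 WHERE id = ?")
    inc.consistency_level = ConsistencyLevel.LOCAL_QUORUM

    get = session.prepare("SELECT c FROM cnt WHERE id = ?")
    get.consistency_level = ConsistencyLevel.LOCAL_QUORUM

    session.execute(inc, ('page1',))
    print(session.execute(get, ('page1',)).one().c)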


Regards
Tarun Chabarwal


Re: Cassandra counters

2015-07-10 Thread Ajay
Any pointers on this?

In 2.1, updating a counter in an UNLOGGED batch with a timestamp isn't safe
the way other column updates at a given consistency level are (can a counter
update be made idempotent with a timestamp?).
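
For reference, this is the kind of update I mean (a Python driver sketch;
table and column names are made up). As far as I can tell, the driver only
accepts counter updates in a COUNTER-type batch, and CQL rejects USING
TIMESTAMP on them:

    from cassandra.cluster import Cluster
    from cassandra.query import BatchStatement, BatchType

    session = Cluster(['127.0.0.1']).connect('my_ks')

    # Counter updates must go in a COUNTER batch; they cannot be mixed
    # with regular writes, and there is no client timestamp that could
    # make the increments idempotent.
    batch = BatchStatement(batch_type=BatchType.COUNTER)
    batch.add("UPDATE stats SET views = views + 1 WHERE day = %s",
              ('2015-07-10',))
    batch.add("UPDATE stats SET clicks = clicks + 1 WHERE day = %s",
              ('2015-07-10',))
    session.execute(batch)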

Thanks
Ajay

On 09-Jul-2015 11:47 am, Ajay ajay.ga...@gmail.com wrote:



Cassandra counters

2015-07-09 Thread Ajay
Hi,

What is the accuracy improvement of counters in 2.1 over 2.0?

The post below mentions the 2.0.x issues fixed in 2.1 and the performance
improvements.
http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters

But how accurate are counters in 2.1.x, and are there any known issues in 2.1
when using an UNLOGGED batch for counter updates?

Thanks
Ajay


RE: Cassandra Counters

2012-09-25 Thread Roshni Rajagopal

Thanks for the reply, and sorry for being bull-headed.
Once you're past the stage where you've decided it's distributed, and NoSQL,
and Cassandra out of all the NoSQL options, you can count something in
different ways in Cassandra. In all of them you want to use Cassandra's best
features: availability, tunable consistency, partition tolerance, etc.
Given this, what are the performance tradeoffs of using counters vs a
standard column family for counting? Because, as I see it, if the counter
number in a counter column family becomes wrong, it will not be 'eventually
consistent' - you will need intervention to correct it. So the key aspect is
how much faster a counter column family would be, and at what numbers we
start seeing a difference.

Re: Cassandra Counters

2012-09-25 Thread Robin Verlangen
From my point of view, another problem with using a standard column
family for counting is transactions: Cassandra lacks them, so if you're
updating counters from multiple threads, how will you keep track of that?
Yes, I'm aware of software like Zookeeper to do that, however I'm not sure
whether that's the best option.

I think you should stick with Cassandra counter column families.

Best regards,

Robin Verlangen
*Software engineer*
W http://www.robinverlangen.nl
E ro...@us2.nl

http://goo.gl/Lt7BC




2012/9/25 Roshni Rajagopal roshni_rajago...@hotmail.com


Re: Cassandra Counters

2012-09-25 Thread Edward Kibardin
I've recently noticed several threads about Cassandra Counters
inconsistencies and have started thinking seriously about possible
workarounds, like storing realtime counters in Redis and dumping them daily
to Cassandra.
So, a general question: should I rely on Counters if I want 100% accuracy?

Thanks, Ed

On Tue, Sep 25, 2012 at 8:15 AM, Robin Verlangen ro...@us2.nl wrote:


Re: Cassandra Counters

2012-09-25 Thread rohit bhatia
@Edward,

We use counters in production with Cassandra 1.0.5. Our application is
sensitive to write latency, and we are seeing problems with frequent young
garbage collections; also, we only do increments (decrements have caused
problems for some people).
We don't see inconsistencies in our data.
So if you want 99.99% accurate counters and can manage with eventual
consistency, Cassandra works nicely.

On Tue, Sep 25, 2012 at 4:52 PM, Edward Kibardin infa...@gmail.com wrote:


Re: Cassandra Counters

2012-09-25 Thread Sylvain Lebresne

 So general question, should I rely on Counters if I want 100% accuracy?


No.

Even without considering potential bugs, counters are not idempotent: if you
get a TimeoutException during a write (which can happen even in relatively
normal conditions), you won't know whether the increment went in or not (and
you have no way to know, unless you have an external way to check the value).
This is probably fine if you use counters for, say, real-time analytics, but
not if you need 100% accuracy.
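
To illustrate the retry dilemma (a minimal sketch in today's Python driver
terms; the table name is made up):

    from cassandra import WriteTimeout
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('my_ks')

    try:
        session.execute(
            "UPDATE counters SET hits = hits + 1 WHERE id = %s", ('row1',))
    except WriteTimeout:
        # The increment may or may not have been applied: retrying risks
        # double counting, not retrying risks under counting. A regular,
        # idempotent INSERT could simply be retried here.
        pass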

--
Sylvain


Re: Cassandra Counters

2012-09-25 Thread rohit bhatia
@Sylvain

In a relatively untroubled cluster, even timed-out writes go through,
provided no messages are dropped, which you can monitor on the Cassandra
nodes. We see 100% consistency on our production servers because we don't
see messages being dropped.
Though, as you mention, there would be no way to repair dropped messages.

On Tue, Sep 25, 2012 at 6:57 PM, Sylvain Lebresne sylv...@datastax.com wrote:


Re: Cassandra Counters

2012-09-25 Thread Edward Kibardin
@Sylvain and @Rohit: Thanks for your answers.


On Tue, Sep 25, 2012 at 2:27 PM, Sylvain Lebresne sylv...@datastax.com wrote:



Re: Cassandra Counters

2012-09-25 Thread Sylvain Lebresne
 In a relatively untroubled cluster, even timed out writes go through,
 provided no messages are dropped.

This all depends on your definition of an untroubled cluster, but to be
clear: in a cluster where a node dies (which for Cassandra is not
considered abnormal, and will happen to everyone no matter how good
your monitoring is), you have a good chance of getting TimeoutExceptions
on counter writes while the other nodes of the cluster haven't yet
detected the failure (which can take a few seconds), AND those writes
won't get through. Whether or not Cassandra logs dropped messages has
nothing to do with that.

 We have 100% consistency on our production servers as we don't
 see messages being dropped on our servers.

Though I'm happy for you that you achieve 100% consistency, I want to
reiterate that not seeing any log of dropped messages does not guarantee
that all counter writes went through: the ones that time out may or may
not have been persisted.

--
Sylvain


 Though as you mention, there would be no way to repair your dropped 
 messages .

 On Tue, Sep 25, 2012 at 6:57 PM, Sylvain Lebresne sylv...@datastax.com 
 wrote:
 So general question, should I rely on Counters if I want 100% accuracy?


 No.

  Even not considering potential bugs, counters being not idempotent, if you
 get a TimeoutException during a write (which can happen even in relatively
 normal conditions), you won't know if the increment went in or not (and you
 have no way to know unless you have an external way to check the value).
 This is probably fine if you use counters for say real-time analytics, but
 not if you use 100% accuracy.

 --
 Sylvain


RE: Cassandra Counters

2012-09-24 Thread Roshni Rajagopal

Hi folks,
I looked at my mail below, and I'm rambling a bit, so I'll try to re-state
my queries pointwise.
a) What are the performance tradeoffs on reads & writes between creating a
standard column family and manually doing the counts by a lookup on a key,
versus using counters?
b) What's the current state of counter limitations in the latest version of
Apache Cassandra?
c) With there being a possibility of counter values getting out of sync,
would counters not be recommended where strong consistency is desired? The
normal benefits of Cassandra's tunable consistency would not be applicable,
as re-tries may cause overstating. So the normal use case is high
performance, where consistency is not paramount.
Regards,
roshni


From: roshni_rajago...@hotmail.com
To: user@cassandra.apache.org
Subject: Cassandra Counters
Date: Mon, 24 Sep 2012 16:21:55 +0530





Hi,
I'm trying to understand if counters are a good fit for my use case. I've
watched http://blip.tv/datastax/counters-in-cassandra-5497678 many times
over now... and still need help!
Suppose I have a list of items, to which I can add or delete a set of items
at a time, and I want a count of the items, without changing the database or
adding components like Zookeeper. I have 2 options: the first is a counter
column family, and the second is a standard one.

1. List_Counter_CF

           | TotalItems
   --------+------------
    ListId | 50

2. List_Std_CF

           | TimeUUID1 | TimeUUID2 | TimeUUID3 | TimeUUID4 | TimeUUID5
   --------+-----------+-----------+-----------+-----------+-----------
    ListId | 3         | 70        | -20       | 3         | -6

And in the second, I can add a new column with every set of items added or
deleted. Over time this row may grow wide. To display the final count, I'd
need to read the row, slice through all the columns and add them up.
In both cases the writes should be fast; in fact the standard column family
should be faster, as there's no read before write. And for a CL ONE write
the latency should be the same. For reads, the first option is very good:
just read one column for a key.
For the second, the read involves reading the row and adding up each column
value via application code. I don't think there's a way to do math via CQL
yet. There should be no hot-spotting if the key is sharded well. I could
even maintain the count derived from List_Std_CF in a separate standard
column family holding the final number, but I could do that as a separate
process immediately after the write to List_Std_CF completes, so that it's
not blocking. I understand Cassandra is faster for writes than for reads,
but how slow would reading by row key be? Is there any number for how many
columns it takes before performance starts deteriorating, or for how much
worse the performance would be?
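
To sketch the second option's read path in rough CQL/driver terms (a Python
sketch; table and column names are made up):

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('my_ks')

    # Each add/delete writes a new column holding a +/- delta; the count
    # is the sum of all deltas stored under the key.
    rows = session.execute(
        "SELECT delta FROM list_std_cf WHERE list_id = %s", ('list1',))
    total = sum(row.delta for row in rows)

    # The counter option would be a single-column read instead:
    # SELECT total_items FROM list_counter_cf WHERE list_id = 'list1'
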
The advantage I see is that I can use the same consistency rules as for the
rest of the column families. If quorum for reads & writes, then you get
strongly consistent values. With counters I see that, in case of timeout
exceptions because the first replica is down or not responding, there's a
chance of the values getting messed up, and re-trying can mess them up
further. It's not idempotent like a standard column family design can be.
If it gets messed up, it would need an administrator's help (is there a
document on how we could resolve counter values going wrong?)
I believe the rest of the limitations still hold good - has anything changed
in recent versions? In my opinion, they are not as major as the consistency
question:
- removing a counter and then modifying the value - behaviour is undetermined
- special process for counter column family sstable loss (need to remove all
files)
- no TTL support
- no secondary indexes

In short, I can recommend counters for analytics, or for dealing with data
where the exact numbers are not important, or when it's OK to take some time
to fix a mismatch and the performance requirements are most important.
However, where the numbers should match, it's better to use a standard
column family and a manual implementation.
Please share your thoughts on this.
Regards,
roshni

RE: Cassandra Counters

2012-09-24 Thread Milind Parikh
IMO

You would use Cassandra Counters (or another variation of distributed
counting) once you have determined that a centralized version of counting
is not going to work.

You'd determine the non-feasibility of centralized counting by figuring out
the speed at which you need to sustain writes and reads, and reconciling
that with your hard disk seek times (essentially).

Once you have proved that you can't do centralized counting, the second
layer of arsenal comes into play, which is distributed counting.

In distributed counting, the CAP theorem comes into play. In Cassandra,
Availability and Partition tolerance trump Consistency.

So yes, you sacrifice strong consistency for availability and partition
tolerance; for eventual consistency.
On Sep 24, 2012 10:28 AM, Roshni Rajagopal roshni_rajago...@hotmail.com
wrote:



RE: Cassandra Counters

2012-09-24 Thread Roshni Rajagopal

Thanks Milind,
Has anyone implemented counting in a standard column family in Cassandra,
when you can have increments and decrements to the count? Any comparisons
in performance to using counter column families?
Regards,
Roshni


Re: Cassandra Counters

2012-09-24 Thread Oleksandr Petrov
Maybe I'm missing the point, but counting in a standard column family would
be a little overkill.

I assume that distributed counting here was more of a map/reduce
approach, where Hadoop (+ Cascading, Pig, Hive, Cascalog) would help you a
lot. We're doing some more complex counting (e.g. based on sets of rules)
like that. Of course, that would perform _way_ slower than counting
beforehand. On the other side, you will always have a consistent result for
a consistent dataset.

On the other hand, if you use things like AMQP or Storm (sorry to put up my
sentence together like that, as tools are mostly either orthogonal or
complementary, but I hope you get my point), you could build a topology
that makes fault-tolerant writes independently of your original write. Of
course, it would still have a consistency tradeoff, mostly because of race
conditions and different network latencies etc.

So I would say that building a data model in a distributed system often
depends more on your problem than on the common patterns, because
everything has a tradeoff.

Want to have an immediate result? Modify your counter while writing the row.
Can sacrifice speed, but have more counting opportunities? Go with offline
distributed counting.
Want kind of both? Dispatch a message and react upon it, keeping the
processing logic and writes decoupled from the main application, allowing
you to care less about speed.
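
For the first option, a sketch in rough CQL/driver terms (names are made
up): bump the counter together with the data write, so the count is
immediately readable.

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('my_ks')

    # Write the row and modify the counter at write time.
    session.execute(
        "INSERT INTO events (id, payload) VALUES (%s, %s)",
        ('evt1', 'data'))
    session.execute(
        "UPDATE event_counts SET total = total + 1 WHERE day = %s",
        ('2012-09-24',))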

However, I may have missed the point somewhere (early morning, you know),
so I may be wrong in any given statement.
Cheers


On Tue, Sep 25, 2012 at 6:53 AM, Roshni Rajagopal 
roshni_rajago...@hotmail.com wrote:


TTL and Cassandra counters

2012-08-26 Thread Avi-h
Hello

Our current application uses Cassandra to hold the chat items for users'
conversations and a counter of unread chat messages (per conversation). We
use TTL to delete old chat items, but we fail to see how we can define a
callback which would trigger an update (decrease) of the counter's value.

Please advise on how we can achieve this.






Re: Cassandra Counters and TTL

2011-11-07 Thread Vlad Paiu

Hello,

Thanks for your answer. See my reply in-line.

On 11/04/2011 01:46 PM, Amit Chavan wrote:


Answers inline.

On Fri, Nov 4, 2011 at 4:59 PM, Vlad Paiu vladp...@opensips.org wrote:


Hello,

I'm a new user of Cassandra and I think it's great.
Still, while developing my APP using Cassandra, I got stuck with
some things and I'm not really sure that Cassandra can handle them
at the moment.

So, first of all, does Cassandra allow for Counters and regular
Keys to be located in the same ColumnFamily ?

What do you mean when you say regular Keys? If you are hinting at 
columns apart from counters, then the answer is *no*: only counters 
can exist in a CounterColumnFamily and other column families cannot 
hold counters.

Yes, this is what I was asking. Thanks for the answer.



Secondly, is there any way to dynamically set the TTL for a key ?
In the sense that I have a key, I initially set it with no TTL,
but after a while I decide that it should expire in 100 seconds.
Can Cassandra do this ?

TTL is not for one key, it is for one column.


When I was saying 'Key' I actually meant to say column. Seems I'm not 
yet very acquainted with Cassandra terminology. So in the end, can you 
dynamically alter the TTL of a Column ?




3. Can counters have a TTL ?

No. Currently, counters do not (or if I am correct - cannot) have TTL.



Ok. Any info if this will be implemented anytime soon ?


4. Is there any way to atomically reset a counter ? I read on the
website that the only way to do it is read the variable value, and
then set it to -value, which seems rather bogus to me.

I think that is the only way to reset a counter. I would like to know 
if there is another way.


OK then, waiting for someone to confirm. It's bad that you cannot
atomically reset a counter value, as a two-step reset might lead to
undetermined behaviour.


Also, can I set the counter to a specific value, without keeping state 
on the client ? For example, if the client does not know the current 
counter value is 3. Can it set the counter value to 10, without first 
getting the counter value, and then incrementing by 7 ?







Regards,

Vlad Paiu
OpenSIPS Developer




Re: Cassandra Counters and TTL

2011-11-07 Thread Sylvain Lebresne
On Mon, Nov 7, 2011 at 10:12 AM, Vlad Paiu vladp...@opensips.org wrote:
 Hello,

 Thanks for your answer. See my reply in-line.

 On 11/04/2011 01:46 PM, Amit Chavan wrote:

 Answers inline.

 On Fri, Nov 4, 2011 at 4:59 PM, Vlad Paiu vladp...@opensips.org wrote:

 Hello,

 I'm a new user of Cassandra and I think it's great.
 Still, while developing my APP using Cassandra, I got stuck with some
 things and I'm not really sure that Cassandra can handle them at the moment.

 So, first of all, does Cassandra allow for Counters and regular Keys to be
 located in the same ColumnFamily ?

 What do you mean when you say regular Keys? If you are hinting at columns
 apart from counters, then the answer is *no*: only counters can exist in a
 CounterColumnFamily and other column families cannot hold counters.


 Yes, this is what I was asking. Thanks for the answer.

 Secondly, is there any way to dynamically set the TTL for a key ? In the
 sense that I have a key, I initially set it with no TTL, but after a while I
 decide that it should expire in 100 seconds. Can Cassandra do this ?

 TTL is not for one key, it is for one column.

 When I was saying 'Key' I actually meant to say column. Seems I'm not yet
 very acquainted with Cassandra terminology. So in the end, can you
 dynamically alter the TTL of a Column ?

You'll have to update the column with the new TTL, which means you need to
know the column value and so may have to read the column first.
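
In today's CQL terms, a minimal sketch (names are made up): read the
current value, then write it back with the desired TTL.

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('my_ks')

    # Re-setting a TTL means rewriting the column with the value it
    # already has, this time with a TTL attached.
    row = session.execute("SELECT v FROM t WHERE k = %s", ('key1',)).one()
    if row is not None:
        session.execute(
            "UPDATE t USING TTL 100 SET v = %s WHERE k = %s",
            (row.v, 'key1'))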




 3. Can counters have a TTL ?

 No. Currently, counters do not (or if I am correct - cannot) have TTL.

 Ok. Any info if this will be implemented anytime soon ?

The current status is not anytime soon because we don't have a good solution
for it so far. See https://issues.apache.org/jira/browse/CASSANDRA-2103 for
more details.


 4. Is there any way to atomically reset a counter ? I read on the website
 that the only way to do it is read the variable value, and then set it to
 -value, which seems rather bogus to me.

 I think that is the only way to reset a counter. I would like to know if
 there is another way.

 Ok then, waiting for someone to confirm. It's bad that you cannot atomically
 reset a counter value, as a two-way resetting might lead to undetermined
 behaviour.

There is no other way. This means you need some external way to make sure
that no two clients will attempt to reset the same counter at the same time.
Or model your data so that you don't need counter resets (I'm not saying
this is always possible, but there is probably a number of cases where
resetting a counter could be replaced by switching to a brand-new counter).
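
For the record, the read-then-negate reset looks like this (a sketch with
made-up names); the gap between the two statements is exactly where
concurrent increments can break it:

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('my_ks')

    # Non-atomic reset: read the value, then subtract it. A concurrent
    # increment between the two statements makes the reset wrong.
    row = session.execute(
        "SELECT cnt FROM counters WHERE k = %s", ('key1',)).one()
    if row is not None:
        session.execute(
            "UPDATE counters SET cnt = cnt - %s WHERE k = %s",
            (row.cnt, 'key1'))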

 Also, can I set the counter to a specific value, without keeping state on
 the client ? For example, if the client does not know the current counter
 value is 3. Can it set the counter value to 10, without first getting the
 counter value, and then incrementing by 7 ?

No.

--
Sylvain





Cassandra Counters and TTL

2011-11-04 Thread Vlad Paiu

Hello,

I'm a new user of Cassandra and I think it's great.
Still, while developing my APP using Cassandra, I got stuck with some 
things and I'm not really sure that Cassandra can handle them at the 
moment.


So, first of all, does Cassandra allow for Counters and regular Keys to
be located in the same ColumnFamily?


Secondly, is there any way to dynamically set the TTL for a key? In the
sense that I have a key, I initially set it with no TTL, but after a
while I decide that it should expire in 100 seconds. Can Cassandra do
this?


3. Can counters have a TTL?

4. Is there any way to atomically reset a counter? I read on the
website that the only way to do it is to read the counter's value and then
set it to -value, which seems rather bogus to me.


Regards,

--
Vlad Paiu
OpenSIPS Developer



Re: Cassandra Counters and TTL

2011-11-04 Thread Amit Chavan
Answers inline.

On Fri, Nov 4, 2011 at 4:59 PM, Vlad Paiu vladp...@opensips.org wrote:

 Hello,

 I'm a new user of Cassandra and I think it's great.
 Still, while developing my APP using Cassandra, I got stuck with some
 things and I'm not really sure that Cassandra can handle them at the moment.

 So, first of all, does Cassandra allow for Counters and regular Keys to be
 located in the same ColumnFamily ?

What do you mean when you say regular Keys? If you are hinting at columns
apart from counters, then the answer is *no*: only counters can exist in a
CounterColumnFamily and other column families cannot hold counters.



 Secondly, is there any way to dynamically set the TTL for a key ? In the
 sense that I have a key, I initially set it with no TTL, but after a while
 I decide that it should expire in 100 seconds. Can Cassandra do this ?

TTL is not for one key, it is for one column.



 3. Can counters have a TTL ?

No. Currently, counters do not (or if I am correct - cannot) have TTL.


 4. Is there any way to atomically reset a counter ? I read on the website
 that the only way to do it is read the variable value, and then set it to
 -value, which seems rather bogus to me.

I think that is the only way to reset a counter. I would like to know if
there is another way.

Background: I have been using Cassandra for the past two months. I hope the
community corrects me if I am wrong.



 Regards,

 --
 Vlad Paiu
 OpenSIPS Developer




-- 
Regards
Amit S. Chavan


Cassandra Counters and Replication Factor

2011-10-12 Thread Amit Chavan
Hi,

Looking at this talk (
http://www.datastax.com/wp-content/uploads/2011/07/cassandra_sf_counters.pdf)
by Sylvain Lebresne at DataStax, I had a few questions related to my
understanding Cassandra architecture.

Assuming that we have a keyspace in Cassandra with:
1. Replication Factor (RF) = 1.
2. Counters as a counter column family, having row-key as a row key, which
has cnt as a counter column.
3. We always update Counters[row-key][cnt] with a consistency level of ONE.

My understanding is that in such a case, the updates/second of that counter
will be limited by the performance of just one node in the cluster. Adding
new nodes will not increase the rate of updates. However, if RF were 3
(keeping everything else the same), updates/second would roughly be 3 times
the current value. Am I correct here?

Moreover, any write operation to a column in a key in the above-mentioned
configuration can scale only if RF increases. Is this inference correct?
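
For clarity, the setup I'm describing, in today's CQL/driver terms (a
sketch; keyspace, table and key names are made up):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    session = Cluster(['127.0.0.1']).connect()
    session.execute("""
        CREATE KEYSPACE ks WITH replication =
            {'class': 'SimpleStrategy', 'replication_factor': 1}
    """)
    session.set_keyspace('ks')
    session.execute(
        "CREATE TABLE counters (key text PRIMARY KEY, cnt counter)")

    # With RF = 1, every update of a given row lands on the same single
    # node, whatever the consistency level.
    inc = SimpleStatement(
        "UPDATE counters SET cnt = cnt + 1 WHERE key = %s",
        consistency_level=ConsistencyLevel.ONE)
    session.execute(inc, ('row-key',))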

-- 
Regards
Amit S. Chavan