Re: if cassandra isn't ideal for keep track of counts, how does digg count diggs?

2010-04-11 Thread Paul Prescod
On Sun, Apr 11, 2010 at 3:30 AM, Mark Robson  wrote:
> Can we not implement counts by just storing all the deltas in a row, and
> then summing them all up to achieve a count?
>
> If a row ends up with too many deltas, a reader could just summarise the
> deltas occasionally into a single value (in a way which avoids race
> conditions, of course).

How do you avoid the race condition? Don't you need a lock?
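
To make the question concrete, here is a minimal sketch of one way the race
could show up, assuming a summariser works in two steps (read and sum the
deltas, then write a summary column and delete the deltas it replaced). The
dict standing in for a row and the function names are hypothetical; none of
this is a Cassandra API.

```python
# Row modelled as a plain dict of column name -> delta (illustration only).
row = {"a": 1, "b": 1, "c": 1}

def plan_summary(row):
    """Step 1 of a summariser: read the columns and compute their sum."""
    names = list(row)
    return names, sum(row[n] for n in names)

def apply_summary(row, names, total, summary_name):
    """Step 2: write the summary column, then delete the deltas it replaced."""
    row[summary_name] = total
    for n in names:
        row.pop(n, None)

# Two summarisers both finish step 1 before either starts step 2.
plan_1 = plan_summary(row)
plan_2 = plan_summary(row)
apply_summary(row, *plan_1, summary_name="sum-1")
apply_summary(row, *plan_2, summary_name="sum-2")

# The true count is 3, but both summaries survive and the row now sums to 6.
print(sum(row.values()))
```

Without a lock or some other way to pick a single summariser, both summary
columns are kept and the deltas are counted twice.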

 Paul Prescod
 Ayogo, Inc.


Re: if cassandra isn't ideal for keep track of counts, how does digg count diggs?

2010-04-11 Thread Mark Robson
Can we not implement counts by just storing all the deltas in a row, and
then summing them all up to achieve a count?

If a row ends up with too many deltas, a reader could just summarise the
deltas occasionally into a single value (in a way which avoids race
conditions, of course).

So you'd map

key => { uniqueid: delta1, uniqueid: delta2 }

Every column in Cassandra also has a timestamp, so your app can decide, when
it does a read, which deltas to summarise.
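
A rough sketch of the idea, assuming an in-memory dict in place of a real
row. The helper names (add_delta, read_count, summarise) and the cutoff
parameter are made up for illustration and are not part of any Cassandra
client; the summarise step is written for a single process and does not by
itself solve the race between concurrent summarisers.

```python
import time
import uuid

# In-memory stand-in for one row: column name -> (delta, timestamp).
row = {}

def add_delta(row, delta, now=None):
    """Record one increment or decrement as its own uniquely named column."""
    ts = time.time() if now is None else now
    row[uuid.uuid4().hex] = (delta, ts)

def read_count(row):
    """The count is the sum of every delta stored in the row."""
    return sum(delta for delta, _ in row.values())

def summarise(row, cutoff):
    """Fold all deltas with timestamp <= cutoff into a single column.

    Not safe against concurrent readers or summarisers as written.
    """
    old = [name for name, (_, ts) in row.items() if ts <= cutoff]
    if len(old) < 2:
        return  # nothing worth folding
    total = sum(row[name][0] for name in old)
    for name in old:
        del row[name]
    row[uuid.uuid4().hex] = (total, cutoff)

add_delta(row, 1, now=100.0)
add_delta(row, 1, now=101.0)
add_delta(row, -1, now=102.0)
add_delta(row, 1, now=200.0)

before = read_count(row)
summarise(row, cutoff=150.0)   # folds the three oldest deltas into one column
after = read_count(row)
print(before, after, len(row))
```

The count is the same before and after summarising; only the number of
columns that a reader has to sum goes down.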

Mark


Re: if cassandra isn't ideal for keep track of counts, how does digg count diggs?

2010-04-11 Thread Boris Shulman
What will the latency be for the ZooKeeper-based atomic increment?

On Tue, Apr 6, 2010 at 8:22 PM, Chris Goffinet  wrote:
> http://issues.apache.org/jira/browse/CASSANDRA-704
> http://issues.apache.org/jira/browse/CASSANDRA-721
> We have our own internal codebase of Cassandra at Digg, but we are using
> the patches above until we have the vector clock work cleaned up; that
> patch will also go to JIRA. Most likely the vector clock work will go into
> 0.7, but since we run 0.6 and built it for that version, we will share that
> patch too.
> -Chris
> On Apr 6, 2010, at 10:17 AM, S Ahmed wrote:
>
> Chris,
> When you say patch, does that mean for Cassandra or your own internal
> codebase?
> Sounds interesting thanks!
>
> On Tue, Apr 6, 2010 at 12:54 PM, Chris Goffinet  wrote:
>>
>> That's not true. We have been using the ZooKeeper work we posted on JIRA.
>> That's what we are using internally and have been for months. We are now
>> just wrapping up our vector clocks + distributed counter patch so we can
>> begin transitioning away from the Zookeeper approach because there are
>> problems with it long-term.
>>
>> -Chris
>>
>> On Apr 6, 2010, at 9:50 AM, Ryan King wrote:
>>
>> > They don't use cassandra for it yet.
>> >
>> > -ryan
>> >
>> > On Tue, Apr 6, 2010 at 9:00 AM, S Ahmed  wrote:
>> >> From what I read in another thread, Cassandra isn't 'ideal' for
>> >> keeping track of counts.
>> >> For example, I would understand this to mean keeping track of which
>> >> stories were dugg.
>> >> If this is true, how would a site like digg keep track of the 'dugg'
>> >> counter?
>> >> Also, I am assuming with eventual consistency the number *may* not be
>> >> 100% accurate.  If you wanted it to be accurate, would you just use the
>> >> Quorum flag? (I believe quorum is to ensure all writes are written to
>> >> disk)
>>
>
>
>


Re: if cassandra isn't ideal for keep track of counts, how does digg count diggs?

2010-04-06 Thread Chris Goffinet
http://issues.apache.org/jira/browse/CASSANDRA-704
http://issues.apache.org/jira/browse/CASSANDRA-721

We have our own internal codebase of Cassandra at Digg, but we are using the
patches above until we have the vector clock work cleaned up; that patch will
also go to JIRA. Most likely the vector clock work will go into 0.7, but since
we run 0.6 and built it for that version, we will share that patch too.

-Chris

On Apr 6, 2010, at 10:17 AM, S Ahmed wrote:

> Chris,
> 
> When you say patch, does that mean for Cassandra or your own internal
> codebase?
> 
> Sounds interesting thanks!
> 
> On Tue, Apr 6, 2010 at 12:54 PM, Chris Goffinet  wrote:
> That's not true. We have been using the ZooKeeper work we posted on JIRA.
> That's what we are using internally and have been for months. We are now just 
> wrapping up our vector clocks + distributed counter patch so we can begin 
> transitioning away from the Zookeeper approach because there are problems 
> with it long-term.
> 
> -Chris
> 
> On Apr 6, 2010, at 9:50 AM, Ryan King wrote:
> 
> > They don't use cassandra for it yet.
> >
> > -ryan
> >
> > On Tue, Apr 6, 2010 at 9:00 AM, S Ahmed  wrote:
> >> From what I read in another thread, Cassandra isn't 'ideal'
> >> for keeping track of counts.
> >> For example, I would understand this to mean keeping track of which stories
> >> were dugg.
> >> If this is true, how would a site like digg keep track of the 'dugg'
> >> counter?
> >> Also, I am assuming with eventual consistency the number *may* not be 100%
> >> accurate.  If you wanted it to be accurate, would you just use the Quorum
> >> flag? (I believe quorum is to ensure all writes are written to disk)
> 
> 



Re: if cassandra isn't ideal for keep track of counts, how does digg count diggs?

2010-04-06 Thread S Ahmed
Chris,

When you say patch, does that mean for Cassandra or your own internal
codebase?

Sounds interesting thanks!

On Tue, Apr 6, 2010 at 12:54 PM, Chris Goffinet  wrote:

> That's not true. We have been using the ZooKeeper work we posted on JIRA.
> That's what we are using internally and have been for months. We are now
> just wrapping up our vector clocks + distributed counter patch so we can
> begin transitioning away from the Zookeeper approach because there are
> problems with it long-term.
>
> -Chris
>
> On Apr 6, 2010, at 9:50 AM, Ryan King wrote:
>
> > They don't use cassandra for it yet.
> >
> > -ryan
> >
> > On Tue, Apr 6, 2010 at 9:00 AM, S Ahmed  wrote:
> >> From what I read in another thread, Cassandra isn't 'ideal'
> >> for keeping track of counts.
> >> For example, I would understand this to mean keeping track of which
> >> stories were dugg.
> >> If this is true, how would a site like digg keep track of the 'dugg'
> >> counter?
> >> Also, I am assuming with eventual consistency the number *may* not be
> >> 100% accurate.  If you wanted it to be accurate, would you just use the
> >> Quorum flag? (I believe quorum is to ensure all writes are written to
> >> disk)
>
>


Re: if cassandra isn't ideal for keep track of counts, how does digg count diggs?

2010-04-06 Thread Chris Goffinet
That's not true. We have been using the ZooKeeper work we posted on JIRA. That's
what we are using internally and have been for months. We are now just wrapping 
up our vector clocks + distributed counter patch so we can begin transitioning 
away from the Zookeeper approach because there are problems with it long-term. 

-Chris

On Apr 6, 2010, at 9:50 AM, Ryan King wrote:

> They don't use cassandra for it yet.
> 
> -ryan
> 
> On Tue, Apr 6, 2010 at 9:00 AM, S Ahmed  wrote:
>> From what I read in another thread, Cassandra isn't 'ideal'
>> for keeping track of counts.
>> For example, I would understand this to mean keeping track of which stories
>> were dugg.
>> If this is true, how would a site like digg keep track of the 'dugg'
>> counter?
>> Also, I am assuming with eventual consistency the number *may* not be 100%
>> accurate.  If you wanted it to be accurate, would you just use the Quorum
>> flag? (I believe quorum is to ensure all writes are written to disk)



Re: if cassandra isn't ideal for keep track of counts, how does digg count diggs?

2010-04-06 Thread S Ahmed
Is it just the counters they are using MySQL/PostgreSQL for, or also the list
of stories?

e.g. get me the top stories in category x.

On Tue, Apr 6, 2010 at 12:50 PM, Ryan King  wrote:

> They don't use cassandra for it yet.
>
> -ryan
>
> On Tue, Apr 6, 2010 at 9:00 AM, S Ahmed  wrote:
> > From what I read in another thread, Cassandra isn't 'ideal'
> > for keeping track of counts.
> > For example, I would understand this to mean keeping track of which
> > stories were dugg.
> > If this is true, how would a site like digg keep track of the 'dugg'
> > counter?
> > Also, I am assuming with eventual consistency the number *may* not be
> > 100% accurate.  If you wanted it to be accurate, would you just use the
> > Quorum flag? (I believe quorum is to ensure all writes are written to
> > disk)
>


Re: if cassandra isn't ideal for keep track of counts, how does digg count diggs?

2010-04-06 Thread Ryan King
They don't use cassandra for it yet.

-ryan

On Tue, Apr 6, 2010 at 9:00 AM, S Ahmed  wrote:
> From what I read in another thread, Cassandra isn't 'ideal'
> for keeping track of counts.
> For example, I would understand this to mean keeping track of which stories
> were dugg.
> If this is true, how would a site like digg keep track of the 'dugg'
> counter?
> Also, I am assuming with eventual consistency the number *may* not be 100%
> accurate.  If you wanted it to be accurate, would you just use the Quorum
> flag? (I believe quorum is to ensure all writes are written to disk)


if cassandra isn't ideal for keep track of counts, how does digg count diggs?

2010-04-06 Thread S Ahmed
From what I read in another thread, Cassandra isn't 'ideal'
for keeping track of counts.

For example, I would understand this to mean keeping track of which stories
were dugg.

If this is true, how would a site like digg keep track of the 'dugg'
counter?

Also, I am assuming with eventual consistency the number *may* not be 100%
accurate.  If you wanted it to be accurate, would you just use the Quorum
flag? (I believe quorum is to ensure all writes are written to disk)
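
On the quorum question: in Cassandra, QUORUM is about how many replicas must
acknowledge a read or write (a majority of the replication factor), not about
flushing to disk. A small sketch of that arithmetic follows; the function
names are made up for illustration and are not a Cassandra API.

```python
def quorum(replication_factor):
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def overlaps(replication_factor, write_replicas, read_replicas):
    """True when every read set must share a replica with every write set."""
    return write_replicas + read_replicas > replication_factor

rf = 3
q = quorum(rf)

# With RF = 3, a quorum is 2 replicas. Quorum writes plus quorum reads
# overlap (2 + 2 > 3); single-replica writes and reads do not (1 + 1 <= 3).
print(q, overlaps(rf, q, q), overlaps(rf, 1, 1))
```

Overlapping reads and writes mean a read sees the latest acknowledged write,
but a counter built as read, add one, write back can still lose updates when
two clients increment at the same time.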