Re: Counters and Top 10

2011-12-25 Thread Benoit Perroud
With a composite column name, you can even have a column name composed of
the score (int) and the user ID (UUID or whatever). Leave the column value
empty to avoid repeating the user UUID.
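
A minimal sketch of the idea in Python, simulating the row in memory rather
than talking to Cassandra. With a composite name of (score, user ID), columns
are ordered by score first, so a reversed slice of 10 columns is the top 10.
The names `add_score` and `top_n` are made up for illustration and are not
part of any Cassandra client.

```python
import bisect


def add_score(row, score, user_id):
    """Insert a (score, user_id) column name, keeping the row sorted.

    `row` is a plain list standing in for one wide row whose column
    names are composites; the column value is left empty, so only the
    names are stored here.
    """
    bisect.insort(row, (score, str(user_id)))


def top_n(row, n=10):
    """Emulate a reversed slice: the n highest-scoring column names."""
    return row[-n:][::-1]


if __name__ == "__main__":
    row = []
    for i in range(25):
        add_score(row, i * 10, f"user-{i:02d}")
    for score, user_id in top_n(row):
        print(score, user_id)
```

Because the score is compared as an integer, no padding of the score is
needed for the ordering to come out right.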




Re: Re: Counters and Top 10

2011-12-25 Thread cbert...@libero.it
Hi all,
I've read all your messages concerning the top 10. Any of the solutions is
possible, but I still haven't found the best one.

Using a composite column name as suggested would be smart, because it gives
me a sorted row from which I can read my top 10 at any moment. But it could
slow down the whole platform since, for every operation, I have to read data
from Cassandra, recalculate, and store the data back. Using counters I could
just say "hey, +1 on this" and forget about it. But with counters I don't get
any kind of value sorting ...
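
The trade-off above can be sketched in Python with in-memory stand-ins (the
names `scores`, `sorted_row`, `add_points_sorted` and `add_points_counter` are
hypothetical, and no Cassandra client is involved). Keeping the sorted row
current takes a read, a delete of the old column and an insert of the new one
on every operation, whereas a counter is a single blind increment that leaves
nothing sorted.

```python
import bisect
from collections import Counter

scores = {}           # user_id -> current score (what has to be read back)
sorted_row = []       # (score, user_id) column names, kept in sorted order
counters = Counter()  # counter-style storage: increment only, no ordering


def add_points_sorted(user_id, points):
    """Read-modify-write: three steps for every user operation."""
    old = scores.get(user_id)                  # 1. read the current score
    if old is not None:
        idx = bisect.bisect_left(sorted_row, (old, user_id))
        del sorted_row[idx]                    # 2. delete the old column
    new = (old or 0) + points
    scores[user_id] = new
    bisect.insort(sorted_row, (new, user_id))  # 3. insert the new column
    return new


def add_points_counter(user_id, points=1):
    """Fire-and-forget increment; the result is not sorted by value."""
    counters[user_id] += points
```

Note that the read-modify-write version is not safe with concurrent writers
for the same user unless the three steps are serialized somehow.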

I know Redis, but I think adding a new key-value DB just for this sorting is
too much ... I think I'll use a thread that runs every X to generate the
top-10 row ... it won't be real-time, but at least it will keep platform
performance at a good level.

Thank you all and Merry Christmas

Original message
From: ben...@noisette.ch
Date: 25/12/2011 10:19
To: user@cassandra.apache.org
Subject: Re: Counters and Top 10

With a composite column name, you can even have a column name composed of
the score (int) and the user ID (UUID or whatever). Leave the column value
empty to avoid repeating the user UUID.







Re: Counters and Top 10

2011-12-24 Thread Janne Jalkanen

In our case we didn't need an exact daily top-10 list of pages, just a good
guess at it.  So the way we did it was to insert a column with a short TTL
(e.g. 12 hours) with the page ID as the column name.  Then, when constructing
the top-10 list, we'd just slice through the entire list of unexpired page
IDs, get the actual activity data for each from another CF, and then sort.  The
theory is that if a page is popular, it will have been referenced at least
once in the past 12 hours anyway.  Depending on the size of your set of hot
pages and the frequency at which you need the top-10 list, you can then tune
the TTL accordingly.  We started at 24 hrs, then went down to 12, and then
gradually lower.

So while it's not guaranteed to be the precise top-10 list for the day, it is a 
fairly accurate sampling of one.
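
A rough Python sketch of this sampling approach, using a dict of expiry times
in place of TTL'd columns and another dict in place of the activity CF. All
names here (`mark_active`, `approximate_top_n`, `recent`, `activity`) are
illustrative assumptions, not our actual code.

```python
import time


def mark_active(recent, page_id, ttl_seconds, now=None):
    """Record a reference to page_id; the entry expires after ttl_seconds."""
    now = time.time() if now is None else now
    recent[page_id] = now + ttl_seconds


def approximate_top_n(recent, activity, n=10, now=None):
    """Sort only the unexpired page IDs by their activity counts."""
    now = time.time() if now is None else now
    live = [pid for pid, expires in recent.items() if expires > now]
    live.sort(key=lambda pid: activity.get(pid, 0), reverse=True)
    return live[:n]
```

A page with a high count that has not been referenced within the TTL drops out
of the candidate set, which is exactly where the approximation comes from.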

/Janne




Re: Counters and Top 10

2011-12-23 Thread aaron morton
Counters only update the value of a column; they cannot be used as column
names. So you cannot have a dynamically updating top-ten list using counters.

You have a couple of options. The first is to use something like Redis, if
that fits your use case. Redis could either be the database of record for the
counts or just an aggregation layer: write the data to both Cassandra and
sorted sets in Redis, then read the top ten from Redis, and use Cassandra to
rebuild Redis if needed.
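
A minimal sketch of the sorted-set part in Python. It assumes `client` is a
redis-py `redis.Redis` instance (version 3.0 or later, where `zincrby` takes
the name, the amount, then the member); the key name and the function names
are made up for illustration.

```python
LEADERBOARD_KEY = "top_users"  # hypothetical key name


def record_points(client, user_id, points=1):
    """Add points to the user's score in the sorted set."""
    client.zincrby(LEADERBOARD_KEY, points, user_id)


def top_ten(client):
    """Return the ten highest-scoring (member, score) pairs, best first."""
    return client.zrevrange(LEADERBOARD_KEY, 0, 9, withscores=True)
```

The write to Cassandra would happen alongside `record_points`, so the sorted
set can be rebuilt from Cassandra if Redis loses its data.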

The other is to periodically pivot the counts into a top-ten row where you use
regular integers for the column names. With only 10K users you could do this
with a process that periodically reads all the user rows, or wherever the
counters are, and updates the aggregate row. Depending on data size you could
use Hive/Pig or whatever regular programming language you are happy with.
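
As a sketch of the pivot in plain Python, assuming the counters have already
been read into a dict of user ID to count (reading them from Cassandra and
writing the aggregate row back are left out). The names `pivot_top_n` and
`run_periodically` are hypothetical.

```python
import heapq
import threading


def pivot_top_n(counters, n=10):
    """Build the aggregate row: (score, user_id) pairs, best first."""
    best = heapq.nlargest(n, counters.items(), key=lambda kv: kv[1])
    return [(score, user_id) for user_id, score in best]


def run_periodically(job, interval_seconds, stop_event):
    """Call job() every interval_seconds until stop_event is set."""
    while not stop_event.wait(interval_seconds):
        job()
```

With 10K users, scanning every counter on each run is cheap, and
`heapq.nlargest` avoids sorting the whole set just to keep ten entries.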

I guess you could also use Redis to keep the top ten sorted and then
periodically dump that back to Cassandra and serve the read traffic from there.
 

Hope that helps 


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com




Re: Counters and Top 10

2011-12-22 Thread R. Verlangen
I would suggest creating a CF with a single row (or multiple rows for
historical data) with a date as the key (utf8, e.g. 2011-12-22) and multiple
columns, one for each user's score. The column name (utf8) would then be the
score + something unique to the user (e.g. the hex representation of the
TimeUUID). The value would be the TimeUUID of the user.

By default columns will be sorted and you can perform a slice to get the
top 10.
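
One caveat worth checking with utf8 column names: strings sort
lexicographically, so a score of "9" would sort after "10" unless the score is
zero-padded to a fixed width. The sketch below emulates the row in Python; the
padding width, the ":" separator and the function names are assumptions added
for illustration.

```python
def column_name(score, user_hex, width=10):
    """Zero-pad the score so that string order matches numeric order."""
    return f"{score:0{width}d}:{user_hex}"


def top_n(column_names, n=10):
    """Emulate a reversed slice over the sorted column names."""
    return sorted(column_names, reverse=True)[:n]


if __name__ == "__main__":
    names = [column_name(s, f"{s:04x}") for s in (9, 10, 100)]
    print(top_n(names, 2))
```

When a user's score changes, the old column has to be deleted and a new one
inserted, since the score is part of the column name.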






Counters and Top 10

2011-12-14 Thread cbert...@libero.it
Hi all,
I'm using Cassandra in production for a small social network (~10,000 people).
Now I have to assign some credits to each user operation (login, writing a
post, and so on) and then be able to provide, at any moment, the top 10 most
active users. I'm on Cassandra 0.7.6 and I'd like to migrate to a newer
version in order to use counters for the user points, but ... what about the
top 10?
I was thinking about a specific row that always keeps the 10 most active users
... but I think it would be heavy (to write and to handle in a thread-safe way)
... can counters provide something like a value-ordered list?

Thanks for any help. 
Best regards,

Carlo