fixed size collection possible?

2014-04-22 Thread Jimmy Lin
Hi,
Looking at the collection type support in CQL3, e.g.
http://www.datastax.com/documentation/cql/3.0/cql/cql_using/use_list_t.html

we can append or remove elements using the + and - operators:

UPDATE users
  SET top_places = top_places + [ 'mordor' ] WHERE user_id = 'frodo';

UPDATE users
  SET top_places = top_places - ['riddermark'] WHERE user_id = 'frodo';


Is there a way to keep the list (collection) at a fixed size?

I was thinking about using a TTL to remove older data after a certain
time, but then the list will become too big if the TTL is too long, and
if the TTL is too short I run the risk of having an empty list (if there
is no new activity).

Even if I don't use a collection type and have my own table, I still run
into the same issue.


Any recommendations for handling this type of situation?


thanks


Re: fixed size collection possible?

2014-04-22 Thread Chris Lohfink
It isn't natively supported, but there are some things you can do if you need it.

A lot depends on how frequently this list is getting updated. For heavier
workloads I would recommend using a custom CF (table) for this instead of
collections. If inserts are extremely heavy, you would want to add additional
partitioning to it as well. As mentioned below, I'd recommend a cleanup
MapReduce job to periodically trim it if the risk of TTLs leaving you with 0
entries is unacceptable. Putting it in its own CF keeps the elements of the
list from polluting your users partition. If there get to be a lot of
tombstones/inserts, reading the user could become slow (it would look like a
queue, which has horrible performance), so this at least sections off that
badness from the regular user lookups.

CREATE TABLE user_top_places (
  user_id varchar,
  created timeuuid,
  place varchar,
  PRIMARY KEY (user_id, created))
  WITH CLUSTERING ORDER BY (created DESC);

Then, to add a new one to the front of the "list":

INSERT INTO user_top_places (user_id, created, place) VALUES ('frodo', now(), 'mordor');

and you can read the last 10 entries with:

SELECT * FROM user_top_places WHERE user_id = 'frodo' LIMIT 10;

This returns the 10 most recent entries (duplicates are allowed, though). Older
records will still be around, and disk space could eventually become a problem
for you. If it gets bad, I would recommend a periodic job (e.g. on Hadoop) to
remove the excess columns, solely to save disk space. If you can afford the
disk, though, you will get better performance by just letting it grow up to a
point (provided rows don't get too large, e.g. 64 MB). If this isn't very
write-heavy, there might be some more clever things you can do...
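
As a rough sketch of what such a cleanup job would do per user (plain Python,
no Hadoop or driver calls; `excess_entries` and the `keep` parameter are
hypothetical names, not part of any library): given the entries newest first,
as the table above returns them, everything past the first `keep` is a
candidate for deletion.

```python
def excess_entries(created_newest_first, keep=10):
    """Return the `created` values that fall outside the newest `keep`.

    The input list must already be ordered newest first, which is how the
    CLUSTERING ORDER BY (created DESC) table returns a partition.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    return list(created_newest_first[keep:])

# Hypothetical partition with 13 entries, labelled newest (t13) to oldest (t1).
entries = ["t%d" % i for i in range(13, 0, -1)]
to_delete = excess_entries(entries, keep=10)
print(to_delete)  # ['t3', 't2', 't1']
```

Each returned value would then be removed with a DELETE on (user_id, created).
Keep in mind that every such delete writes a tombstone, so this trades disk
space for the tombstone cost mentioned earlier.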

If avoiding duplicates is more important, you can make "place" the clustering
column (i.e. the column name):

CREATE TABLE user_top_places (
  user_id varchar,
  place varchar,
  created timestamp,
  PRIMARY KEY (user_id, place));

INSERT INTO user_top_places (user_id, place, created) VALUES ('frodo', 'mordor', dateof(now()));

The results won't come back in order of most recent insert, though, so you
might have to do some client-side filtering on the "created" field to show
only the latest.
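
That client-side filtering could look roughly like the sketch below. It is
plain Python and assumes the rows for one user have already been fetched as
(place, created) pairs; `latest_places` and the sample data are made up for
illustration and are not part of any driver API.

```python
from datetime import datetime

def latest_places(rows, limit=10):
    """Return the `limit` most recently created places, newest first.

    `rows` is an iterable of (place, created) tuples as read from the
    de-duplicated table, which orders them by place rather than by time.
    """
    newest_first = sorted(rows, key=lambda row: row[1], reverse=True)
    return [place for place, _created in newest_first[:limit]]

# Hypothetical rows for user 'frodo', in clustering (alphabetical) order.
rows = [
    ("bree", datetime(2014, 4, 20, 9, 0)),
    ("mordor", datetime(2014, 4, 22, 1, 51)),
    ("rivendell", datetime(2014, 4, 21, 12, 30)),
]
print(latest_places(rows, limit=2))  # ['mordor', 'rivendell']
```

Since the whole partition has to be read and sorted on the client, this only
makes sense while the number of distinct places per user stays small.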

---
Chris Lohfink
