It isn’t natively supported, but there are some things you can do if you need it.
A lot depends on how frequently this list gets updated. For heavier
workloads I would recommend using a custom CF for this instead of collections.
With an extreme insert rate you would want to add additional partitioning to it as well.
As mentioned below, I’d recommend a periodic cleanup MR job to trim it
if relying on TTLs, with the risk of ending up with 0 entries, is too costly.
Putting it in its own CF helps in that it keeps the elements of the list from
polluting your user’s partition. If there get to be a lot of
tombstones/inserts, reading the user could become slow (the partition would behave
like a queue, which has horrible performance), so this at least sections off that
badness from the regular user lookups.
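The extra partitioning for very high insert rates isn’t spelled out here. One common approach, shown only as a sketch under assumptions, is to add a time bucket to the partition key, e.g. a hypothetical `PRIMARY KEY ((user_id, bucket), created)`, so each user/day pair gets its own partition. The Python below models just the client side of that: computing the bucket, and reading newest-first across buckets until enough rows are found. The daily granularity and the names `day_bucket`, `partition_key`, `buckets_newest_first` and `latest_across_buckets` are illustrative, and `fetch` stands in for a per-partition `SELECT ... LIMIT n`.

```python
from datetime import datetime, timedelta, timezone

def day_bucket(ts: datetime) -> str:
    """Hypothetical bucket value: the UTC calendar day of the insert, e.g. '2014-04-22'."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")

def partition_key(user_id: str, ts: datetime) -> tuple:
    """Composite partition key (user_id, bucket) for the hypothetical bucketed table."""
    return (user_id, day_bucket(ts))

def buckets_newest_first(now: datetime, days: int) -> list:
    """Bucket values to scan, starting with the current day and going backwards."""
    return [day_bucket(now - timedelta(days=i)) for i in range(days)]

def latest_across_buckets(fetch, user_id: str, now: datetime, limit: int, max_days: int) -> list:
    """Collect up to `limit` rows, newest bucket first.

    fetch(user_id, bucket, n) is assumed to return at most n rows of that
    partition, newest first (as a clustering order of created DESC would).
    """
    rows = []
    for bucket in buckets_newest_first(now, max_days):
        if len(rows) >= limit:
            break
        rows.extend(fetch(user_id, bucket, limit - len(rows)))
    return rows[:limit]
```

The trade-off is that a read for a quiet user may touch several partitions, so `max_days` bounds how far back it looks.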
CREATE TABLE user_top_places (
user_id varchar,
created timeuuid,
place varchar,
PRIMARY KEY (user_id, created))
WITH CLUSTERING ORDER BY (created DESC);
Then, to add a new one to the front of the “list”:
INSERT INTO user_top_places (user_id, created, place) VALUES ('frodo', now(),
'mordor');
and to see the last 10 entries:
SELECT * FROM user_top_places WHERE user_id = 'frodo' LIMIT 10;
This will give you the last 10 entries (duplicates are allowed, though). Older
records will still be around, and disk space could eventually become a
problem for you. If it gets bad, I would recommend a periodic job (e.g. with
Hadoop) to remove the excess columns, solely to save disk space. If you can
afford the disk, though, you will get better performance by just letting it grow
to a point (provided rows don’t get too large, i.e. >64 MB). If this table isn’t
very write-heavy, there might be some more clever things you can do...
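The periodic cleanup job could be as simple as: for each user, read the rows newest first and delete everything past the first N. The sketch below models only that selection step in plain Python. It assumes rows arrive as (created, place) pairs where `created` is any sortable value standing in for the timeuuid; `KEEP`, `rows_to_delete` and `delete_statements` are hypothetical names, and the statements are only built, not executed.

```python
KEEP = 10  # hypothetical number of entries to retain per user

# Targets the (user_id, created) table; %s marks the bound parameters.
DELETE_CQL = "DELETE FROM user_top_places WHERE user_id = %s AND created = %s"

def rows_to_delete(rows, keep=KEEP):
    """Return every row beyond the newest `keep`, i.e. what the cleanup job would remove."""
    newest_first = sorted(rows, key=lambda r: r[0], reverse=True)
    return newest_first[keep:]

def delete_statements(user_id, rows, keep=KEEP):
    """Pair the DELETE statement with parameters for each excess row of one user."""
    return [(DELETE_CQL, (user_id, created)) for created, _place in rows_to_delete(rows, keep)]
```

Each delete still writes a tombstone, so this trades disk space for some read overhead until compaction clears them.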
If avoiding duplicates is more important, you can make “place” your
column name (the clustering column) instead:
CREATE TABLE user_top_places (
user_id varchar,
place varchar,
created timestamp,
PRIMARY KEY (user_id, place));
INSERT INTO user_top_places (user_id, place, created)
VALUES ('frodo', 'mordor', dateof(now()));
but the results won’t be ordered by most recent insert, so you might have to do
some client-side filtering on the created field to show only the latest.
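With place as the clustering column, rows come back ordered by place, so the client has to order by created itself. A minimal sketch, assuming each row from `SELECT place, created FROM user_top_places WHERE user_id = 'frodo'` has already been turned into a (place, created) tuple; `latest_places` is a hypothetical helper name.

```python
from datetime import datetime
from typing import List, Tuple

def latest_places(rows: List[Tuple[str, datetime]], limit: int = 10) -> List[str]:
    """Return up to `limit` places, most recently inserted first.

    rows are (place, created) pairs read from the place-keyed table,
    which arrive sorted by place rather than by time.
    """
    newest_first = sorted(rows, key=lambda r: r[1], reverse=True)
    return [place for place, _created in newest_first[:limit]]
```

Since an INSERT for an existing place simply overwrites its created value, each place appears at most once and revisiting it moves it to the front. The catch is that the whole partition is read to pick the latest few.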
---
Chris Lohfink
On Apr 22, 2014, at 1:51 AM, Jimmy Lin wrote:
> hi,
> look at the collection type support in cql3,
> e.g
> http://www.datastax.com/documentation/cql/3.0/cql/cql_using/use_list_t.html
>
> we can append or remove using "+" and "-" operator
> UPDATE users
> SET top_places = top_places + [ 'mordor' ] WHERE user_id = 'frodo';
> UPDATE users
> SET top_places = top_places - ['riddermark'] WHERE user_id = 'frodo';
>
> is there a way to keep a fixed size of the list(collection) ?
> I was thinking about using TTL to remove older data after a certain time, but
> then the list will become too big if the TTL is too long, and if the TTL is too
> short I run the risk of having an empty list (if there is no new activity).
>
> Even if I don't use collection type and have my own table, I still ran into
> the same issue.
>
> Any recommendation to handle this type of situation?
>
> thanks
>