Re: limiting columns in a row

2011-01-14 Thread Sylvain Lebresne
Hi,

> does this seem like a generally useful feature?

I do think this could be a useful feature, if only because I don't think there
is any satisfactory/efficient way to do this client side.

> if so, would it be hard to implement (maybe it could be done at compaction
> time like the TTL feature)?

Off the top of my head (i.e., I haven't really thought this through, but I'll
still give my opinion), I see the following difficulties:
  1) You can only do this limiting during major compaction, or in the same
     cases as CASSANDRA-1074 for minor compaction, since you need to make sure
     the x columns you are keeping are not deleted ones. Alternatively, you
     could disable deletes altogether on the cf with this 'limit' option (I
     feel like this last option would really simplify things).
  2) Even if the removal of the columns exceeding the limit is eventual (and
     it will be), you'll want queries to only ever return columns inside the
     limit (otherwise the feature would be too unpredictable). But I think
     this will be quite challenging. That is, slice queries from the start of
     the row are easy. Everything else is harder (at least if you want to make
     it efficient).
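
To illustrate both points, here is a rough sketch in plain Python. It is an
in-memory model, not Cassandra code: the names (merge_row,
live_columns_within_limit, slice_within_limit) are made up for this example,
a value of None stands in for a tombstone, and I am assuming the limit keeps
the first x columns in comparator order.

```python
from typing import Dict, List, Optional, Tuple

# (timestamp, value); a value of None models a tombstone (delete marker).
Version = Tuple[int, Optional[str]]
# One fragment of a row per SSTable: column name -> latest version in that file.
Fragment = Dict[str, Version]


def merge_row(fragments: List[Fragment]) -> Fragment:
    """Reconcile fragments: per column, keep the version with the highest timestamp."""
    merged: Fragment = {}
    for fragment in fragments:
        for name, version in fragment.items():
            if name not in merged or version[0] > merged[name][0]:
                merged[name] = version
    return merged


def live_columns_within_limit(fragments: List[Fragment], limit: int) -> List[str]:
    """Names of the first `limit` live (non-deleted) columns, in name order."""
    merged = merge_row(fragments)
    live = sorted(name for name, (_, value) in merged.items() if value is not None)
    return live[:max(limit, 0)]


def slice_within_limit(fragments: List[Fragment], limit: int,
                       start: str, finish: str) -> List[str]:
    """Slice [start, finish], restricted to columns that are inside the limit."""
    allowed = live_columns_within_limit(fragments, limit)
    return [name for name in allowed if start <= name <= finish]


# Hypothetical row: an older fragment with four columns, a newer one deleting c1.
old = {"c1": (1, "a"), "c2": (1, "b"), "c3": (1, "c"), "c4": (1, "d")}
new = {"c1": (2, None)}

with_all_fragments = live_columns_within_limit([old, new], 2)   # ['c2', 'c3']
with_old_only = live_columns_within_limit([old], 2)             # ['c1', 'c2']
middle_slice = slice_within_limit([old, new], 2, "c3", "c9")    # ['c3']
```

Trimming with only the older fragment keeps c1 (which is in fact deleted) and
would throw away c3, which is why the trim needs to see every fragment of the
row. The last call also shows why a slice that does not start at the beginning
of the row is harder: c4 is live and inside the requested range, but you only
know it is outside the limit after counting the live columns before it.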

That was my 2 cents. Anyway, you can always open a JIRA ticket.

--
Sylvain


On Fri, Jan 14, 2011 at 7:38 AM, mike dooley wrote:

> hi,
>
> the time-to-live feature in 0.7 is very nice and it made me want to ask
> about a somewhat similar feature.
>
> i have a stream of data consisting of entities and associated samples. so i
> create a row for each entity and the columns in each row contain the samples
> for that entity. when i get around to processing an entity i only care about
> the most recent N samples. so i read the most recent N columns and delete
> all the rest.
>
> what i would like would be a column family property that allows me to
> specify a maximum number of columns per row. then i could just keep writing
> and not have to do the deletes.
>
> in my case it would be fine if the limit is only 'eventually' applied (so
> that sometimes there might be extra columns).
>
> does this seem like a generally useful feature? if so, would it be hard to
> implement (maybe it could be done at compaction time like the TTL feature)?
>
> thanks,
> -mike


limiting columns in a row

2011-01-13 Thread mike dooley
hi,

the time-to-live feature in 0.7 is very nice and it made me want to ask about
a somewhat similar feature.

i have a stream of data consisting of entities and associated samples. so i
create a row for each entity and the columns in each row contain the samples
for that entity. when i get around to processing an entity i only care about
the most recent N samples. so i read the most recent N columns and delete all
the rest.
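
for illustration, the read-and-trim step looks roughly like the sketch below.
this is plain python over a dict, not real client code: split_recent is a
made-up helper, and it assumes the column names are sample timestamps, so the
largest names are the most recent samples.

```python
from typing import Dict, List, Tuple


def split_recent(row: Dict[int, str], n: int) -> Tuple[List[int], List[int]]:
    """Split column names into (the n most recent to keep, the rest to delete).

    Assumes column names are timestamps, so sorting them in descending
    order puts the most recent samples first.
    """
    names = sorted(row, reverse=True)
    keep = names[:max(n, 0)]
    return keep, names[len(keep):]


# Hypothetical entity row: column name (timestamp) -> sample.
entity_row = {10: "s1", 20: "s2", 30: "s3", 40: "s4"}
keep, delete = split_recent(entity_row, 2)
# keep   == [40, 30]  (processed)
# delete == [20, 10]  (each one needs an explicit delete from the client)
```

the second list is the part i would like to stop sending.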

what i would like would be a column family property that allows me to
specify a maximum number of columns per row.  then i could just keep writing
and not have to do the deletes.

in my case it would be fine if the limit is only 'eventually' applied (so that
sometimes there might be extra columns).

does this seem like a generally useful feature?  if so, would it be hard to
implement (maybe it could be done at compaction time like the TTL feature)?

thanks,
-mike