[jira] [Commented] (CASSANDRA-5357) Query cache / partition head cache

Sylvain Lebresne (JIRA) Wed, 15 Jan 2014 02:06:06 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871885#comment-13871885
 ]


Sylvain Lebresne commented on CASSANDRA-5357:
---------------------------------------------

bq. The case you describe of "wanting to cache a full table" is not dependent 
on rows per partition but on cache size = number of partitions cached

But if you don't want to cache a full table, you still at least need to make 
sure that for each partition, all rows are cached. You still need "rows per 
partition = <n> where <n> > max number of rows per partition in that table" and 
all I'm saying is that "rows per partition = all" is a bit more user friendly.  
It's true you also need to make sure your cache is big enough if you want to 
cache the table in full but that doesn't invalidate the first part (unless I'm 
missing something).

bq. We're talking about static CFs aka partition key == primary key, right? 
Then there is one row per partition, so there is no need for a special "rows 
per partition = all" setting.

I guess I'm saying 2 things:
# I think that what users sometimes really want is "cache full partitions". 
That's the basic intention.  So what's the harm in adding an "all" alias that 
expresses that intention better, for user-friendliness' sake, provided adding 
it doesn't require noticeable complexity? And given "all" can just be an alias 
for Integer.MAX_VALUE, it doesn't add complexity, so ...
# It's somewhat a detail, but I don't think that technically "rows per 
partition = 1" will work equivalently to the current row cache behavior for 
static tables in practice, at least not always.  More precisely, suppose you 
get a query "select * from foo where pk=3", that "pk=3" is a cache hit, and 
that "rows_per_partition=1" is set on that table. Then you can only serve the 
read from the cache hit if you know *for sure* that this is a static table, 
i.e. that there cannot be more rows in that partition that haven't been cached 
due to the 
per-partition limitation.  And, at least for thrift, we never really know for 
sure if a table is a static one.  I do note that "rows_per_partition=2" would 
work, because if your cache hit has 1 row and you know you cache the first 2 
rows of the partition, then you can infer all rows of the partition are cached 
without any more info, but at that point, I think it's a lot simpler to have a 
"all" alias than to have to explain those implementation details.
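
To make the inference in the second point concrete, here is a minimal sketch. 
It is not taken from the attached patch or from Cassandra's code base: the 
class name, the method names and the "all" parsing are hypothetical and only 
illustrate the reasoning, assuming the cache stores the first rows of a 
partition up to the configured limit.

```java
// Illustrative sketch only; every name here is hypothetical, not Cassandra API.
final class PartitionHeadCacheSketch {
    // Assumption from the comment above: "all" is just an alias for Integer.MAX_VALUE.
    static final int ALL = Integer.MAX_VALUE;

    // Parse a rows-per-partition setting, accepting "all" or a positive integer.
    static int parseRowsPerPartition(String setting) {
        String s = setting.trim();
        if (s.equalsIgnoreCase("all")) {
            return ALL;
        }
        int n = Integer.parseInt(s);
        if (n <= 0) {
            throw new IllegalArgumentException("rows per partition must be positive: " + setting);
        }
        return n;
    }

    // A cached entry is known to hold the whole partition only when it contains
    // strictly fewer rows than the limit. If it holds exactly `limit` rows, the
    // partition may have more rows that were cut off by the limit.
    static boolean coversWholePartition(int cachedRows, int limit) {
        return cachedRows < limit;
    }

    public static void main(String[] args) {
        // limit = 1, one cached row: cannot tell whether more rows exist.
        System.out.println(coversWholePartition(1, parseRowsPerPartition("1")));
        // limit = 2, one cached row: the partition is fully cached.
        System.out.println(coversWholePartition(1, parseRowsPerPartition("2")));
        // limit = all, one cached row: the partition is fully cached.
        System.out.println(coversWholePartition(1, parseRowsPerPartition("all")));
    }
}
```

With a limit of 1, a hit holding one row cannot be served without extra 
knowledge about the table, whereas a limit of 2 or "all" lets the same hit be 
served directly.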

Not saying it's a big deal, just that I think it's user friendly and has no 
real downside that I can see.


> Query cache / partition head cache
> ----------------------------------
>
>                 Key: CASSANDRA-5357
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Marcus Eriksson
>             Fix For: 2.1
>
>         Attachments: 0001-Cache-a-configurable-amount-of-columns.patch
>
>
> I think that most people expect the row cache to act like a query cache, 
> because that's a reasonable model.  Caching the entire partition is, in 
> retrospect, not really reasonable, so it's not surprising that it catches 
> people off guard, especially given the confusion we've inflicted on ourselves 
> as to what a "row" constitutes.
> I propose replacing it with a true query cache.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
