[ 
https://issues.apache.org/jira/browse/CASSANDRA-4478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504465#comment-13504465
 ] 

Sylvain Lebresne commented on CASSANDRA-4478:
---------------------------------------------

bq. What if instead we make index_interval be CQL3 rows instead of partitions?

I'm not sure I see much benefit of that over measuring it in bytes. Namely:
# that doesn't make tuning easier. What index_interval represents is how 
much of the index file you will need to read at most to find the indexed 
block you are looking for. So it does feel to me that having this size in 
bytes is *ideal*. In particular, even if CQL3 rows vary less in size than 
internal ones, their size is still not constant and depends on the table.
# it will be more complicated/less efficient to implement in practice with the 
current code because the index summary is built from the index file, and the 
index file doesn't currently have enough information to count CQL3 rows.
# a CQL3 row count might be fairly meaningless for thrift users. 
# currently we still have 2 nested levels of indexing: the internal rows and, 
inside that, the column index. They are in the same file now, but they are 
not merged together. In that situation, I'm not really sure counting CQL3 rows 
makes any sense (of course, we could merge the two levels of indexing 
together, but that's not a small/simple patch, whereas this ticket is more 
straightforward and still puts us in a situation that is probably good 
enough for a while). 
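
To illustrate the first point, here is a minimal sketch comparing the two ways of choosing index summary samples. It is not the actual Cassandra code: the function names, the entry sizes and the interval values are all made up for the example, and it assumes the summary simply records one sample per interval while walking the index file in order.

```python
from typing import List


def sample_by_key_count(entry_sizes: List[int], interval_keys: int) -> List[int]:
    """Hypothetical: sample one index entry every `interval_keys` entries."""
    return list(range(0, len(entry_sizes), interval_keys))


def sample_by_bytes(entry_sizes: List[int], interval_bytes: int) -> List[int]:
    """Hypothetical: sample an entry once at least `interval_bytes` of index
    data has accumulated since the previous sample."""
    samples = []
    since_last = interval_bytes  # forces the first entry to be sampled
    for i, size in enumerate(entry_sizes):
        if since_last >= interval_bytes:
            samples.append(i)
            since_last = 0
        since_last += size
    return samples


def max_scan_bytes(entry_sizes: List[int], samples: List[int]) -> int:
    """Worst-case number of index bytes between two consecutive samples."""
    bounds = samples + [len(entry_sizes)]
    return max(sum(entry_sizes[a:b]) for a, b in zip(bounds, bounds[1:]))


# Assumed workload: 8 small index entries followed by 8 large ones
# (e.g. wide rows carrying a column index).
entry_sizes = [100] * 8 + [4000] * 8

by_keys = sample_by_key_count(entry_sizes, interval_keys=4)
by_bytes = sample_by_bytes(entry_sizes, interval_bytes=4000)

print(max_scan_bytes(entry_sizes, by_keys))   # worst-case scan with a key-count interval
print(max_scan_bytes(entry_sizes, by_bytes))  # worst-case scan with a byte interval
```

In this sketch the key-count interval lets the worst-case scan grow with the entry size, while the byte interval keeps it below the interval plus the size of the largest single entry, whatever the mix of small and large rows.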
                
> Make index_interval be measured in kb (instead of number of keys)
> -----------------------------------------------------------------
>
>                 Key: CASSANDRA-4478
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4478
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 1.3
>
>         Attachments: 4478-incomplete.txt
>
>
> Currently, index_interval is measured in number of keys: how many keys before 
> adding an entry to the index summary. After CASSANDRA-2319, each index entry 
> also contains the columns index for the row, so index entries can be a bit 
> bigger and of differing sizes. Measuring in number of keys is thus 
> sub-optimal and difficult to tune, since you might want a different setting 
> depending on whether your rows are big or small, but the setting is global.
> So we should move to measuring the interval in bytes.

