[jira] [Commented] (CASSANDRA-6432) Calculate estimated Cql row count per token range

Alex Liu (JIRA) Tue, 03 Dec 2013 15:08:00 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838301#comment-13838301
 ]


Alex Liu commented on CASSANDRA-6432:
-------------------------------------

SSTableMetadata.estimatedColumnCount collects column counts per SSTable, but 
there are no column counts per key, so we can't use the current statistics to 
calculate the number of columns per token range.

The same column can be spread across multiple SSTables, so we would need to 
merge the columns to count the unique ones, which is not practical here.
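
To illustrate the overcounting, here is a minimal Python sketch. It assumes a 
toy model in which each SSTable is a dict from partition key to a set of column 
names; `sstables`, `summed_column_count` and `merged_column_count` are made-up 
names for illustration, not Cassandra APIs.

```python
# Hypothetical toy data: two SSTables, each mapping key -> set of column names.
# Column "b" of key "k1" is present in both SSTables.
sstables = [
    {"k1": {"a", "b"}, "k2": {"a"}},
    {"k1": {"b", "c"}},
]


def summed_column_count(sstables):
    """Add up the column counts of each SSTable independently."""
    return sum(len(cols) for table in sstables for cols in table.values())


def merged_column_count(sstables):
    """Merge columns per key across all SSTables, then count unique columns."""
    merged = {}
    for table in sstables:
        for key, cols in table.items():
            merged.setdefault(key, set()).update(cols)
    return sum(len(cols) for cols in merged.values())


print(summed_column_count(sstables), merged_column_count(sstables))
```

The summed figure counts the duplicated column twice, while the merged figure 
needs a pass over the data in every SSTable, which is the cost we want to avoid.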

SELECT count(*) FROM cf scans all the rows, so it's not useful for big data.

> Calculate estimated Cql row count per token range
> -------------------------------------------------
>
>                 Key: CASSANDRA-6432
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6432
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Alex Liu
>
> CASSANDRA-6311 uses the client side to calculate the actual CF row count for 
> the Hadoop job. We need to fix it by using the Cql row count, which needs an 
> estimated Cql row count per token range.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
