[jira] [Commented] (CASSANDRA-7280) Hadoop support not respecting cassandra.input.split.size

Alex Liu (JIRA) Tue, 14 Oct 2014 12:31:51 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-7280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14171391#comment-14171391 ]


Alex Liu commented on CASSANDRA-7280:
-------------------------------------

cassandra.input.split.size controls how rows are partitioned into input 
splits by partition key. It does not affect native paging. Native paging has 
its own page size, which can be set with "cassandra.input.page.row.size".
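
To make the distinction concrete, here is a rough sketch of how the two 
settings act on different things. The property names come from the comment 
above; the values, the row estimate, and the plan_reads helper are 
hypothetical, and the arithmetic assumes the split size is treated as a target 
number of rows per split (actual split boundaries depend on the server's 
estimates).

```python
import math

# Property names as given in the comment; the values are illustrative assumptions.
job_conf = {
    "cassandra.input.split.size": 10_000,    # target rows per input split
    "cassandra.input.page.row.size": 500,    # rows fetched per native-paging page
}


def plan_reads(estimated_rows, conf):
    """Hypothetical helper: estimate the split count and the pages read per split."""
    split_size = conf["cassandra.input.split.size"]
    page_size = conf["cassandra.input.page.row.size"]
    # The split size decides how many splits (and so map tasks) cover the data.
    num_splits = math.ceil(estimated_rows / split_size)
    # The page size only decides how many round trips it takes to read one split.
    pages_per_split = math.ceil(split_size / page_size)
    return num_splits, pages_per_split


splits, pages = plan_reads(1_000_000, job_conf)
print(splits, pages)
```

Under these assumptions, changing the page size alters only the number of 
pages fetched within a split; the number of splits stays the same.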

> Hadoop support not respecting cassandra.input.split.size
> --------------------------------------------------------
>
>                 Key: CASSANDRA-7280
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7280
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>            Reporter: Jeremy Hanna
>
> Long ago (in 0.7), I tried to set the cassandra.input.split.size property 
> and never really got it to respect that property.  However, the batch size 
> was useful for what I needed to affect the timeouts.
> Now with the CQL record reader and native paging, users can specify 
> queries potentially using ALLOW FILTERING clauses.  The input split size is 
> more important because the server may have to scan through many, many 
> records to get matching records.  If the user can effectively set the input 
> split size, then that gives a hard limit on how many records it will traverse.
> Currently the property appears to be overridden, perhaps in the 
> client.describe_splits_ex method on the server side.
> It can be argued that users shouldn't be using ALLOW FILTERING clauses in 
> their CQL in the first place.  However, it is still a bug that the input 
> split size is not honored.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
