[ https://issues.apache.org/jira/browse/CASSANDRA-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405054#comment-13405054 ]
Christian Spriegel edited comment on CASSANDRA-4304 at 7/6/12 11:29 AM:
------------------------------------------------------------------------

Hi Jonathan,

I agree with you that operator limits are useful. Nevertheless, my use case (mobile devices asking for the next chunk of data) would benefit from a client-defined limit. This way, a call from a mobile device could be handled in a single Cassandra request.

There are probably many more areas where such a feature would be useful:
- Hector's ColumnSliceIterator could use it. Currently it loads batches of 100 items.
- CASSANDRA-4415 could use a bytes page size internally too.
- Cassandra could maybe use it internally as well, e.g. HintedHandoffManager calculates a limit based on the average column size. This could simply be replaced by a batch size in bytes.
- Anything that slices through blobs of variable size (e.g. my blobs vary between 100 bytes and 5 MB).

I think this should be pretty low-hanging fruit that can do very little damage. Imho the only ugly thing is that it requires a new attribute in the Thrift API/CQL/CLI.
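The soft byte-limit paging described above can be sketched client-side. This is a minimal illustration, not Cassandra or Hector code: `sliceByBytes` and its signature are hypothetical, and it assumes columns arrive as raw byte arrays. It takes columns until a byte budget is reached, but always returns at least one column, even an oversized one, so that progress is guaranteed.

```java
import java.util.ArrayList;
import java.util.List;

public class SoftByteLimit {
    // Hypothetical sketch of a soft bytes limit on a slice: accumulate
    // columns until the budget is used up, but never return an empty page,
    // so a single blob larger than the budget is still returned alone.
    static List<byte[]> sliceByBytes(List<byte[]> columns, int byteLimit) {
        List<byte[]> page = new ArrayList<>();
        long used = 0;
        for (byte[] col : columns) {
            if (!page.isEmpty() && used + col.length > byteLimit) {
                break; // budget exhausted; stop before this column
            }
            page.add(col);
            used += col.length;
        }
        return page;
    }

    public static void main(String[] args) {
        List<byte[]> cols = java.util.Arrays.asList(
                new byte[100],        // small column
                new byte[5_000_000],  // 5 MB blob
                new byte[200]);

        // A 1 KB budget takes only the first small column.
        System.out.println(sliceByBytes(cols, 1024).size()); // prints 1
        // Even a tiny budget returns the oversized blob when it comes first.
        System.out.println(sliceByBytes(cols.subList(1, 3), 1024).size()); // prints 1
    }
}
```

A count-based batch (e.g. Hector's fixed 100 items) cannot make this guarantee both ways: with 5 MB blobs it risks OOM, and with 100-byte columns it wastes round trips.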
> Add bytes-limit clause to queries
> ---------------------------------
>
>                  Key: CASSANDRA-4304
>                  URL: https://issues.apache.org/jira/browse/CASSANDRA-4304
>              Project: Cassandra
>           Issue Type: New Feature
>           Components: API, Core
>             Reporter: Christian Spriegel
>              Fix For: 1.2
>
>          Attachments: TestImplForSlices.patch
>
>
> The idea is to add a second limit clause to (slice) queries. This would allow easy loading of batches, even if the content is variable-sized.
> Imagine the following use case: you want to load a batch of XMLs, where each is between 100 bytes and 5 MB large.
> Currently you can load either
> - a large number of XMLs, but risk OOMs or timeouts, or
> - a small number of XMLs, and do too many queries, where each query usually retrieves very little data.
> With Cassandra being able to limit by size and not just count, we could do a single query which would never OOM but always return a decent amount of data -- with no extra overhead for multiple queries.
> A few thoughts from my side:
> - The limit should be a soft limit, not a hard limit. Therefore it will always return at least one row/column, even if that one is larger than the limit specifies.
> - HintedHandoffManager:303 is already doing an InMemoryCompactionLimit/averageColumnSize calculation to avoid OOM. It could then simply use the new limit clause :-)
> - A bytes limit on a range or indexed query should always return a complete row.
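The HintedHandoffManager workaround mentioned above can be sketched as deriving a count limit from a memory budget and an average column size. This is a hedged illustration, not the actual Cassandra code: `countLimit` and its parameters are hypothetical names, and the clamping to at least one column is an assumption.

```java
public class AvgSizeLimit {
    // Hypothetical sketch of the average-size workaround: instead of
    // limiting by bytes directly, derive a column-count limit from a
    // memory budget divided by the observed average column size.
    static int countLimit(long memoryBudgetBytes, long totalBytes, long columnCount) {
        long avg = Math.max(1, totalBytes / Math.max(1, columnCount)); // avoid div by zero
        return (int) Math.max(1, memoryBudgetBytes / avg);             // always allow >= 1
    }

    public static void main(String[] args) {
        // 64 MB budget over 1 GB of columns averaging ~4 KB each
        System.out.println(countLimit(64L << 20, 1L << 30, 262_144)); // prints 16384
    }
}
```

The weakness this sketch exposes is exactly the motivation for the ticket: an average-based count limit still OOMs when a few columns are far larger than the average, whereas a direct bytes limit bounds the batch regardless of the size distribution.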