[ 
https://issues.apache.org/jira/browse/CASSANDRA-4304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405054#comment-13405054
 ] 

Christian Spriegel edited comment on CASSANDRA-4304 at 7/6/12 11:29 AM:
------------------------------------------------------------------------

Hi Jonathan, I agree with you that operator limits are useful. Nevertheless, my 
use-case (mobile devices asking for the next chunk of data) would benefit from 
a client-defined limit. That way, a call from a mobile device could be handled 
in a single Cassandra request.

There are probably many more areas where such a feature would be useful:
- Hector's ColumnSliceIterator could use it. Currently it loads batches of 100 
items.
- CASSANDRA-4415 could use a bytes page size internally too.
- Cassandra itself could use it internally, e.g. HintedHandoffManager 
calculates a limit based on the average column size. This could simply be 
replaced by a batch size in bytes.
- Anything that slices through blobs of variable size (e.g. my blobs vary 
between 100 bytes and 5 MB).

I think this should be pretty low-hanging fruit that can do very little 
damage. IMHO the only ugly thing is that it requires a new attribute in the 
Thrift API/CQL/CLI.
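To illustrate the soft-limit semantics: a server-side slice loop honouring a byte limit might look roughly like this (a minimal sketch; the method name `sliceWithByteLimit` and the `List<byte[]>` column representation are made up for illustration, not actual Cassandra internals):

```java
import java.util.ArrayList;
import java.util.List;

public class ByteLimitSlice {

    // Returns the leading columns whose cumulative size stays within
    // byteLimit, but always returns at least one column -- the "soft
    // limit" semantics proposed here.
    static List<byte[]> sliceWithByteLimit(List<byte[]> columns, long byteLimit) {
        List<byte[]> result = new ArrayList<>();
        long bytes = 0;
        for (byte[] col : columns) {
            if (!result.isEmpty() && bytes + col.length > byteLimit) {
                break; // limit reached; remaining columns go into the next batch
            }
            result.add(col);
            bytes += col.length;
        }
        return result;
    }

    public static void main(String[] args) {
        List<byte[]> columns = new ArrayList<>();
        columns.add(new byte[100]);       // small column
        columns.add(new byte[5_000_000]); // 5 MB blob
        columns.add(new byte[200]);

        // Only the first column fits: the 5 MB blob would exceed the limit.
        System.out.println(sliceWithByteLimit(columns, 1000).size()); // 1

        // A limit smaller than any column still returns one column (soft limit).
        System.out.println(sliceWithByteLimit(columns, 10).size()); // 1

        // A generous limit returns everything.
        System.out.println(sliceWithByteLimit(columns, 10_000_000).size()); // 3
    }
}
```

The client would then issue the next query starting after the last column returned, exactly as with a count-based limit today.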

                
> Add bytes-limit clause to queries
> ---------------------------------
>
>                 Key: CASSANDRA-4304
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4304
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Christian Spriegel
>             Fix For: 1.2
>
>         Attachments: TestImplForSlices.patch
>
>
> The idea is to add a second limit clause to (slice) queries. This would allow 
> easy loading of batches, even if the content is variable-sized.
> Imagine the following use case:
> You want to load a batch of XMLs, where each is between 100 bytes and 5 MB 
> in size.
> Currently you can load either
> - a large number of XMLs, and risk OOMs or timeouts, or
> - a small number of XMLs, and do too many queries, each of which usually 
> retrieves very little data.
> With Cassandra being able to limit by size and not just count, we could do a 
> single query that would never OOM but would always return a decent amount of 
> data -- with no extra overhead from multiple queries.
> Few thoughts from my side:
> - The limit should be a soft limit, not a hard limit. Therefore it will 
> always return at least one row/column, even if that one is larger than the 
> limit specifies.
> - HintedHandoffManager:303 is already doing an 
> InMemoryCompactionLimit/averageColumnSize calculation to avoid OOM. It could 
> then simply use the new limit clause :-)
> - A bytes-limit on a range- or indexed-query should always return a complete 
> row
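The HintedHandoffManager point above boils down to simple arithmetic: today a column-count page size is derived from a byte budget and an *estimated* average column size, which a direct bytes limit would make exact. A sketch (the constants are illustrative, not actual Cassandra values):

```java
public class PageSizeEstimate {
    public static void main(String[] args) {
        // Current approach: derive a column-count page size from a byte
        // budget and an estimated average column size.
        long byteBudget = 32L * 1024 * 1024; // e.g. an in-memory compaction limit
        long avgColumnSize = 64 * 1024;      // estimated from recent data

        int pageSize = (int) Math.max(1, byteBudget / avgColumnSize);
        System.out.println(pageSize); // 512 columns per page

        // Because avgColumnSize is only an average, a single oversized
        // column can still blow the byte budget. A native bytes limit
        // would enforce the budget directly instead of estimating it.
    }
}
```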

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
