Issue with pagination using paging state

2023-09-08 Thread Ritesh Kumar
Hello Team,

I am trying to achieve pagination in Cassandra using the paging state mechanism. I am paginating through records with LIMIT set to 250 and fetch size set to 50. I get the paging state for each next set of 50 records up until the 250 records are retrieved, but how can I paginate through the next 250 records? If I leave out the LIMIT and simply use the fetch size, I am able to paginate through all the records. I want to understand whether leaving out LIMIT has a negative performance impact. Don't fetch size and LIMIT have similar functionality? Please help.

Regards,
Ritesh Kumar
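
A note on the distinction the question turns on: LIMIT caps the total number of rows the whole query may ever return, while the fetch size only controls how many rows travel per page. As Andy Tolbert's replies further down explain, the server only reads enough rows to satisfy each page, so dropping LIMIT does not by itself make the read heavier. A minimal sketch with the DataStax Python driver (contact point, keyspace and table are illustrative):

from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(['127.0.0.1']).connect('my_keyspace')

# LIMIT 250 bounds the entire result set: after five 50-row pages the query
# is simply finished, and there is no paging state left to continue from.
capped = SimpleStatement("SELECT * FROM my_table LIMIT 250", fetch_size=50)

# Without LIMIT, the same fetch size pages through every row, 50 at a time;
# each page is a separate request, so memory use stays bounded either way.
unbounded = SimpleStatement("SELECT * FROM my_table", fetch_size=50)

for row in session.execute(unbounded):
    pass  # the driver fetches the next 50-row page transparently as needed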


Re: Application level pagination in Cassandra

2020-07-30 Thread Erick Ramirez
I don't have experience with ScalarDB so I don't have an answer for you. I
have to admit I've been here for a bit and ScalarDB doesn't really come up
in discussion. So if you don't get a response, I would recommend you post
your question in the ScalarDB community as well. Cheers!


Application level pagination in Cassandra

2020-07-30 Thread Manu Chadha
Hi

This question is part Cassandra and part ScalarDB. I am using ScalarDB, which 
provides ACID support on top of `Cassandra`. The library seems to be working 
well! Unfortunately, ScalarDB doesn't support pagination, so I have to 
implement it in the application.

Consider this scenario, in which `P` is the partition key, `C` is the 
clustering key, and `E` is other data within the partition:

Partition => {
  P,C1,E1
  P,C2,E1
  P,C2,E2
  P,C2,E3
  P,C2,E4
  P,C3,E1
  ...
  P,Cm,En
}

In ScalarDB, I can specify start and end values of clustering keys, so I suppose 
ScalarDB will get data only from the specified rows. I can also limit the number 
of entries fetched.

https://scalar-labs.github.io/scalardb/javadoc/com/scalar/db/api/Scan.html

Say I want to get entries `E3` and `E4` from `P,C2`. For smaller values, I can 
specify start and end clustering keys as C2, set the fetch limit to, say, 4, and 
ignore `E1` and `E2`. But if there are several hundred records then this method 
will not scale.

For example, say `P,C1` has 10 records, `P,C2` has 100 records, and I want to 
implement pagination of 20 records per query. To implement this, I'll have to:

Query 1 – Scan – the partition key will be P, the clustering start will be C1, 
and the clustering end will be Cm, as I don't know how many records there are.
- get `P,C1`. This will give 10 records.
- get `P,C2`. This will give me 20 records. I'll ignore the last 10, combine 
`P,C1`'s 10 with `P,C2`'s first 10, and return the result.

I'll also have to record that the last clustering key queried was `C2` and 
that 10 records were fetched from it.

Query 2 (for the next pagination request) – Scan – the partition key will be P, 
the clustering start will be C2, and the clustering end will be Cm, as I don't 
know how many records there are.
Now I'll fetch `P,C2` and get 20, ignore the first 10 (as they were sent last 
time), take the remaining 10, do another fetch using the same Scan, and take the 
first 10 from that.

Is this how it should be done, or is there a better way? My concern with the 
above implementation is that every time I'll have to fetch loads of records and 
dump them. For example, say I want to get records 70-90 from `P,C2`; then I'll 
still query up to record 60 and dump the result!
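
For what it's worth, here is a rough sketch of the cursor Manu describes: remember the last clustering key returned plus how many of its rows were already sent, and over-fetch on the next scan to skip them. scan() below is a hypothetical stand-in for a ScalarDB Scan with a start clustering key and a limit, not the real API:

def scan(p, start_c, limit):
    """Hypothetical stand-in for a ScalarDB Scan: returns up to `limit` rows
    of partition `p`, in clustering order, starting at `start_c` inclusive."""
    raise NotImplementedError

def next_page(p, cursor, page_size):
    # cursor is (last_clustering_key, rows_already_sent_for_that_key) or None
    start_c, skip = cursor if cursor else (None, 0)
    # over-fetch by `skip` so the rows the previous page already sent can be
    # dropped -- this is exactly the "fetch and dump" cost described above
    rows = scan(p, start_c, page_size + skip)[skip:]
    if not rows:
        return [], None
    last_c = rows[-1].clustering_key
    sent = sum(1 for r in rows if r.clustering_key == last_c)
    if last_c == start_c:
        sent += skip  # earlier pages already consumed part of this key
    return rows, (last_c, sent)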

Thanks
Manu




What happens to pagination if some data is inserted before it is resumed?

2019-10-31 Thread jagernicolas
Hi,

What would happen if, between the moment I save a paging state and the moment I 
resume it, some data has been added to the database?

For example, say I do a query which returns 100 results paged by 10 rows. I 
get my first page, i.e., my first 10 elements.
Then, say that before I ask for the second page, some data is added to 
what was my first page, some to the second, etc.

What will I see when I resume the pagination? Will I get the results as if 
nothing was added to the database, or am I going to see on my second page some 
results that were pushed out of the first page?
In my case we are using the Python driver, and our code uses the same functions 
as the following example:

from cassandra.cqlengine import columns
from cassandra.cqlengine.functions import Token
from cassandra.cqlengine.models import Model

class Items(Model):
    id = columns.Text(primary_key=True)
    data = columns.Bytes()

query = Items.objects.all().limit(10)

first_page = list(query)
last = first_page[-1]
next_page = list(query.filter(pk__token__gt=Token(last.pk)))

source: 
https://docs.datastax.com/en/developer/python-driver/3.20/cqlengine/queryset/

Is there another way to store the pagination than storing the token? (If I'm 
showing the example and asking, it's because I have the feeling there are 
two ways to use the Python driver: one using functions like filter, and 
another where we send a query as we would have written it in cqlsh.)
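
For reference, the second style does exist: you can send the query as you would write it in cqlsh and let the driver hand you an opaque paging state, instead of building the token filter yourself. A minimal sketch with the Python driver (contact point, keyspace and table are illustrative):

from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(['127.0.0.1']).connect('my_keyspace')
stmt = SimpleStatement("SELECT id, data FROM items", fetch_size=10)

result = session.execute(stmt)
first_page = result.current_rows    # the first 10 rows
saved_state = result.paging_state   # opaque bytes marking where page 1 ended

# later -- possibly in another process -- resume from the saved position
result = session.execute(stmt, paging_state=saved_state)
second_page = result.current_rows

As for the inserts that happen in between: the saved state is just a position in the result order, so, as far as I understand it, rows inserted ahead of that position will show up in later pages, while rows inserted behind it (into what was the first page) will not be seen.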

regards,
Nicolas Jäger


Re: Cassandra Driver Pagination

2018-04-25 Thread Andy Tolbert
Hi Ahmed,

It does not; it only reads enough rows to satisfy the client's request.
That may be a bit of an oversimplification, as it has to scan
through sstable files, read indices, pass over tombstones and so on, but it
will stop reading new rows once it has read the number of rows the driver
has requested.   If you have a really wide partition or a lot of tombstones
in the partition, you may find your query performance is slow in general
when reading rows from that partition.

Thanks,
Andy

On Wed, Apr 25, 2018 at 3:08 AM, Ahmed Eljami 
wrote:

> Hi Andy,
>
> Thanks.
>
> When the driver requests X rows, will C* load the whole partition (all
> rows) before replying to the driver?
>
> Thanks.
>
> 2018-04-24 18:11 GMT+02:00 Andy Tolbert :
>
>> Hi Ahmed,
>>
>> The java driver docs do a good job explaining how the driver uses paging,
>> including providing a sequence diagram that describes the flow of the
>> process:  https://docs.datastax.com/en/developer/java-driver/3.5/manual/paging/
>>
>> The driver requests X rows (5000 by default, controlled via
>> QueryOptions.setFetchSize)
>> at a time.  When C* replies, it returns a 'paging state' id which
>> identifies where in the result set (partition and clustering key) to
>> continue retrieving the next set of rows.  When you continue iterating over
>> the result set in the java driver and hit the end of the current page, it
>> will send another request to C* using that paging state to get the next set
>> of rows.
>>
>> Thanks,
>> Andy
>>
>> On Tue, Apr 24, 2018 at 9:49 AM, Ahmed Eljami 
>> wrote:
>>
>>> Hello,
>>>
>>> Can someone explain to me how paging is implemented?
>>>
>>> According to the DataStax docs, the goal is to avoid loading too many
>>> results in memory.
>>>
>>> Does that mean the whole partition is not loaded into heap memory?
>>>
>>>
>>> C* version: 2.1
>>>
>>> Java Driver version: 3.0
>>>
>>> Best regards
>>>
>>>
>>
>
>
> --
> Cordialement;
>
> Ahmed ELJAMI
>


Re: Cassandra Driver Pagination

2018-04-25 Thread Ahmed Eljami
Hi Andy,

Thanks.

When the driver requests X rows, will C* load the whole partition (all
rows) before replying to the driver?

Thanks.

2018-04-24 18:11 GMT+02:00 Andy Tolbert :

> Hi Ahmed,
>
> The java driver docs do a good job explaining how the driver uses paging,
> including providing a sequence diagram that describes the flow of the
> process:  https://docs.datastax.com/en/developer/java-driver/3.5/manual/paging/
>
> The driver requests X rows (5000 by default, controlled via
> QueryOptions.setFetchSize)
> at a time.  When C* replies, it returns a 'paging state' id which
> identifies where in the result set (partition and clustering key) to
> continue retrieving the next set of rows.  When you continue iterating over
> the result set in the java driver and hit the end of the current page, it
> will send another request to C* using that paging state to get the next set
> of rows.
>
> Thanks,
> Andy
>
> On Tue, Apr 24, 2018 at 9:49 AM, Ahmed Eljami 
> wrote:
>
>> Hello,
>>
>> Can someone explain to me how paging is implemented?
>>
>> According to the DataStax docs, the goal is to avoid loading too many
>> results in memory.
>>
>> Does that mean the whole partition is not loaded into heap memory?
>>
>>
>> C* version: 2.1
>>
>> Java Driver version: 3.0
>>
>> Best regards
>>
>>
>


-- 
Cordialement;

Ahmed ELJAMI


Re: Cassandra Driver Pagination

2018-04-24 Thread Andy Tolbert
Hi Ahmed,

The java driver docs do a good job explaining how the driver uses paging,
including providing a sequence diagram that describes the flow of the
process:
https://docs.datastax.com/en/developer/java-driver/3.5/manual/paging/

The driver requests X rows (5000 by default, controlled via
QueryOptions.setFetchSize)
at a time.  When C* replies, it returns a 'paging state' id which
identifies where in the result set (partition and clustering key) to
continue retrieving the next set of rows.  When you continue iterating over
the result set in the java driver and hit the end of the current page, it
will send another request to C* using that paging state to get the next set
of rows.
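
The Python driver exposes the same protocol feature, with iteration driving the page requests; a small sketch, assuming an existing session:

from cassandra.query import SimpleStatement

stmt = SimpleStatement("SELECT * FROM wide_table WHERE pk = %s", fetch_size=5000)
for row in session.execute(stmt, ['some-partition']):
    # when the current 5000-row page is exhausted, the driver sends the
    # saved paging state back to the coordinator to fetch the next page
    print(row)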

Thanks,
Andy

On Tue, Apr 24, 2018 at 9:49 AM, Ahmed Eljami 
wrote:

> Hello,
>
> Can someone explain to me how paging is implemented?
>
> According to the DataStax docs, the goal is to avoid loading too many
> results in memory.
>
> Does that mean the whole partition is not loaded into heap memory?
>
>
> C* version: 2.1
>
> Java Driver version: 3.0
>
> Best regards
>
>


Cassandra Driver Pagination

2018-04-24 Thread Ahmed Eljami
Hello,

Can someone explain to me how paging is implemented?

According to the DataStax docs, the goal is to avoid loading too many results
in memory.

Does that mean the whole partition is not loaded into heap memory?


C* version: 2.1

Java Driver version: 3.0

Best regards


Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

2017-10-07 Thread Daniel Hölbling-Inzko
I have now finished an implementation where I just save the pagination state to
a separate table and retrieve it to get to the next page.

So far it seems to work pretty well. But I have to do more thorough
testing.

Greetings.
On Wed 4. Oct 2017 at 19:42, Jon Haddad  wrote:

> Seems pretty overengineered, imo, given you can just save the pagination
> state as Andy Tolbert pointed out.
>
> On Oct 4, 2017, at 8:38 AM, Daniel Hölbling-Inzko <
> daniel.hoelbling-in...@bitmovin.com> wrote:
>
> Thanks for pointing me to Elassandra.
> Have you had any experience running this in production at scale? Not sure
> if I
>
> I think ES will enter the picture at some point since some things just
> don't work efficiently with Cassandra and so it's inevitable in the end.
> But I'd rather delay that step for as long as possible since it would add
> a lot of complexity and another layer of eventual consistency I'd rather
> not deal with at the moment :)
>
> greetings Daniel
>
> On Wed, 4 Oct 2017 at 08:36 Greg Saylor  wrote:
>
>> Without knowing other details, of course, have you considered using
>> something like Elassandra?  That is a pretty tightly integrated Cassandra +
>> Elastic Search solution.   You’d insert data into Cassandra like you do
>> normally, then query it with Elastic Search.  Of course this would increase
>> the size of your storage requirements.
>>
>> - Greg
>>
>>
>> On Oct 3, 2017, at 11:10 PM, Daniel Hölbling-Inzko <
>> daniel.hoelbling-in...@bitmovin.com> wrote:
>>
>> Thanks Kurt,
>> I thought about that but one issue is that we are doing limit/offset not
>> pages. So one customer can choose to page through the list in 10 Item
>> increments, another might want to page through with 100 elements per page.
>> So I can't have a clustering key that represents a page range.
>>
>> What I was thinking about doing was saving the paginationState in a
>> separate table along with limit/offset info of the last query the
>> paginationState originated from so I can use the last paginationState to
>> continue the iteration from if the customer requests the next page with the
>> same limit but a different offset.
>> This breaks down if the customer does a cold offset=1000 request but
>> that's something I can throw error messages for at, what I do need to
>> support is a customer doing
>> Request 1: offset=0 + limit=100
>> Request 2: offset=100 + limit=100
>> Request 3: offset=200 + limit=100
>>
>> So next question would be: How long is the paginationState from the
>> driver current? I was thinking about inserting the paginationState with a
>> TTL into another Cassandra table - not sure if that's smart though.
>>
>> greetings Daniel
>>
>> On Tue, 3 Oct 2017 at 12:20 kurt greaves  wrote:
>>
>>> I get the impression that you are paging through a single partition in
>>> Cassandra? If so you should probably use bounds on clustering keys to get
>>> your "next page". You could use LIMIT as well here but it's mostly
>>> unnecessary. Probably just use the pagesize that you intend for the API.
>>>
>>> Yes you'll need a table for each sort order, which ties into how you
>>> would use clustering keys for LIMIT/OFFSET. Essentially just do range
>>> slices on the clustering keys for each table to get your "pages".
>>>
>>> Also I'm assuming there's a lot of data per partition if in-mem sorting
>>> isn't an option, if this is true you will want to be wary of creating large
>>> partitions and reading them all at once. Although this depends on your data
>>> model and compaction strategy choices.
>>>
>>> On 3 October 2017 at 08:36, Daniel Hölbling-Inzko <
>>> daniel.hoelbling-in...@bitmovin.com> wrote:
>>>
>>>> Hi,
>>>> I am currently working on migrating a service that so far was MySQL
>>>> based to Cassandra.
>>>> Everything seems to work fine so far, but a few things in the old
>>>> services API Spec is posing some interesting data modeling challenges:
>>>>
>>>> The old service was doing Limit/Offset pagination which is obviously
>>>> something Cassandra can't really do. I understand how paginationState works
>>>> - but so far I haven't figured out a good way to make Limit/Offset work on
>>>> top of paginationState (as I need to be 100% backwards compatible).
>>>> The only ways which I could think of to make Limit/Offset work would
>>>> create scalability issues down the road.
>>>>
>>>> The old service allowed sorting by any field. If I understood correctly
>>>> that would require a table for each sort order right? (In-Mem sorting is
>>>> not an option unfortunately)
>>>> In doing so, how can I make the Java Datastax mapper save to another
>>>> table (I really don't want to be writing a Subclass of the Entity for each
>>>> Table to add the @Table annotation.
>>>>
>>>> greetings Daniel
>>>>
>>>
>>>
>>
>


Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

2017-10-04 Thread Jon Haddad
Seems pretty overengineered, imo, given you can just save the pagination state 
as Andy Tolbert pointed out.

> On Oct 4, 2017, at 8:38 AM, Daniel Hölbling-Inzko 
>  wrote:
> 
> Thanks for pointing me to Elassandra.
> Have you had any experience running this in production at scale? Not sure if 
> I 
> 
> I think ES will enter the picture at some point since some things just don't 
> work efficiently with Cassandra and so it's inevitable in the end.
> But I'd rather delay that step for as long as possible since it would add a 
> lot of complexity and another layer of eventual consistency I'd rather not 
> deal with at the moment :)
> 
> greetings Daniel
> 
> On Wed, 4 Oct 2017 at 08:36 Greg Saylor <gr...@net-virtual.com> wrote:
> Without knowing other details, of course, have you considered using something 
> like Elassandra?  That is a pretty tightly integrated Cassandra + Elastic 
> Search solution.   You’d insert data into Cassandra like you do normally, 
> then query it with Elastic Search.  Of course this would increase the size of 
> your storage requirements.
> 
> - Greg
> 
> 
>> On Oct 3, 2017, at 11:10 PM, Daniel Hölbling-Inzko
>> <daniel.hoelbling-in...@bitmovin.com> wrote:
>> 
>> Thanks Kurt,
>> I thought about that but one issue is that we are doing limit/offset not 
>> pages. So one customer can choose to page through the list in 10 Item 
>> increments, another might want to page through with 100 elements per page. 
>> So I can't have a clustering key that represents a page range.
>> 
>> What I was thinking about doing was saving the paginationState in a separate 
>> table along with limit/offset info of the last query the paginationState 
>> originated from so I can use the last paginationState to continue the 
>> iteration from if the customer requests the next page with the same limit 
>> but a different offset.
>> This breaks down if the customer does a cold offset=1000 request but that's 
>> something I can throw error messages for at, what I do need to support is a 
>> customer doing
>> Request 1: offset=0 + limit=100
>> Request 2: offset=100 + limit=100
>> Request 3: offset=200 + limit=100
>> 
>> So next question would be: How long is the paginationState from the driver 
>> current? I was thinking about inserting the paginationState with a TTL into 
>> another Cassandra table - not sure if that's smart though.
>> 
>> greetings Daniel
>> 
>> On Tue, 3 Oct 2017 at 12:20 kurt greaves <k...@instaclustr.com> wrote:
>> I get the impression that you are paging through a single partition in 
>> Cassandra? If so you should probably use bounds on clustering keys to get 
>> your "next page". You could use LIMIT as well here but it's mostly 
>> unnecessary. Probably just use the pagesize that you intend for the API. 
>> 
>> Yes you'll need a table for each sort order, which ties into how you would 
>> use clustering keys for LIMIT/OFFSET. Essentially just do range slices on 
>> the clustering keys for each table to get your "pages".
>> 
>> Also I'm assuming there's a lot of data per partition if in-mem sorting 
>> isn't an option, if this is true you will want to be wary of creating large 
>> partitions and reading them all at once. Although this depends on your data 
>> model and compaction strategy choices.
>> 
>> On 3 October 2017 at 08:36, Daniel Hölbling-Inzko
>> <daniel.hoelbling-in...@bitmovin.com> wrote:
>> Hi,
>> I am currently working on migrating a service that so far was MySQL based to 
>> Cassandra.
>> Everything seems to work fine so far, but a few things in the old services 
>> API Spec is posing some interesting data modeling challenges:
>> 
>> The old service was doing Limit/Offset pagination which is obviously 
>> something Cassandra can't really do. I understand how paginationState works 
>> - but so far I haven't figured out a good way to make Limit/Offset work on 
>> top of paginationState (as I need to be 100% backwards compatible).
>> The only ways which I could think of to make Limit/Offset work would create 
>> scalability issues down the road.
>> 
>> The old service allowed sorting by any field. If I understood correctly that 
>> would require a table for each sort order right? (In-Mem sorting is not an 
>> option unfortunately)
>> In doing so, how can I make the Java Datastax mapper save to another table 
>> (I really don't want to be writing a Subclass of the Entity for each Table 
>> to add the @Table annotation.
>> 
>> greetings Daniel
>> 
> 



Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

2017-10-04 Thread Greg Saylor
Yes, we're using it in production in a 22-node cluster across 4 Amazon regions 
in a large production system. We were using DSE but recently migrated to 
Elassandra. There are a few quirks (copy_to isn't supported, for example), but 
so far we've been pretty pleased with it.

- Greg

> On Oct 4, 2017, at 8:38 AM, Daniel Hölbling-Inzko 
>  wrote:
> 
> Thanks for pointing me to Elassandra.
> Have you had any experience running this in production at scale? Not sure if 
> I 
> 
> I think ES will enter the picture at some point since some things just don't 
> work efficiently with Cassandra and so it's inevitable in the end.
> But I'd rather delay that step for as long as possible since it would add a 
> lot of complexity and another layer of eventual consistency I'd rather not 
> deal with at the moment :)
> 
> greetings Daniel
> 
>> On Wed, 4 Oct 2017 at 08:36 Greg Saylor  wrote:
>> Without knowing other details, of course, have you considered using 
>> something like Elassandra?  That is a pretty tightly integrated Cassandra + 
>> Elastic Search solution.   You’d insert data into Cassandra like you do 
>> normally, then query it with Elastic Search.  Of course this would increase 
>> the size of your storage requirements.
>> 
>> - Greg
>> 
>> 
>>> On Oct 3, 2017, at 11:10 PM, Daniel Hölbling-Inzko 
>>>  wrote:
>>> 
>>> Thanks Kurt,
>>> I thought about that but one issue is that we are doing limit/offset not 
>>> pages. So one customer can choose to page through the list in 10 Item 
>>> increments, another might want to page through with 100 elements per page. 
>>> So I can't have a clustering key that represents a page range.
>>> 
>>> What I was thinking about doing was saving the paginationState in a 
>>> separate table along with limit/offset info of the last query the 
>>> paginationState originated from so I can use the last paginationState to 
>>> continue the iteration from if the customer requests the next page with the 
>>> same limit but a different offset.
>>> This breaks down if the customer does a cold offset=1000 request but that's 
>>> something I can throw error messages for at, what I do need to support is a 
>>> customer doing
>>> Request 1: offset=0 + limit=100
>>> Request 2: offset=100 + limit=100
>>> Request 3: offset=200 + limit=100
>>> 
>>> So next question would be: How long is the paginationState from the driver 
>>> current? I was thinking about inserting the paginationState with a TTL into 
>>> another Cassandra table - not sure if that's smart though.
>>> 
>>> greetings Daniel
>>> 
>>>> On Tue, 3 Oct 2017 at 12:20 kurt greaves  wrote:
>>>> I get the impression that you are paging through a single partition in 
>>>> Cassandra? If so you should probably use bounds on clustering keys to get 
>>>> your "next page". You could use LIMIT as well here but it's mostly 
>>>> unnecessary. Probably just use the pagesize that you intend for the API. 
>>>> 
>>>> Yes you'll need a table for each sort order, which ties into how you would 
>>>> use clustering keys for LIMIT/OFFSET. Essentially just do range slices on 
>>>> the clustering keys for each table to get your "pages".
>>>> 
>>>> Also I'm assuming there's a lot of data per partition if in-mem sorting 
>>>> isn't an option, if this is true you will want to be wary of creating 
>>>> large partitions and reading them all at once. Although this depends on 
>>>> your data model and compaction strategy choices.
>>>> 
>>>>> On 3 October 2017 at 08:36, Daniel Hölbling-Inzko 
>>>>>  wrote:
>>>>> Hi,
>>>>> I am currently working on migrating a service that so far was MySQL based 
>>>>> to Cassandra.
>>>>> Everything seems to work fine so far, but a few things in the old 
>>>>> services API Spec is posing some interesting data modeling challenges:
>>>>> 
>>>>> The old service was doing Limit/Offset pagination which is obviously 
>>>>> something Cassandra can't really do. I understand how paginationState 
>>>>> works - but so far I haven't figured out a good way to make Limit/Offset 
>>>>> work on top of paginationState (as I need to be 100% backwards 
>>>>> compatible).
>>>>> The only ways which I could think of to make Limit/Offset work would 
>>>>> create scalability issues down the road.
>>>>> 
>>>>> The old service allowed sorting by any field. If I understood correctly 
>>>>> that would require a table for each sort order right? (In-Mem sorting is 
>>>>> not an option unfortunately)
>>>>> In doing so, how can I make the Java Datastax mapper save to another 
>>>>> table (I really don't want to be writing a Subclass of the Entity for 
>>>>> each Table to add the @Table annotation.
>>>>> 
>>>>> greetings Daniel
>>>> 
>> 


Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

2017-10-04 Thread Daniel Hölbling-Inzko
Thanks for pointing me to Elassandra.
Have you had any experience running this in production at scale? Not sure
if I

I think ES will enter the picture at some point since some things just
don't work efficiently with Cassandra and so it's inevitable in the end.
But I'd rather delay that step for as long as possible since it would add a
lot of complexity and another layer of eventual consistency I'd rather not
deal with at the moment :)

greetings Daniel

On Wed, 4 Oct 2017 at 08:36 Greg Saylor  wrote:

> Without knowing other details, of course, have you considered using
> something like Elassandra?  That is a pretty tightly integrated Cassandra +
> Elastic Search solution.   You’d insert data into Cassandra like you do
> normally, then query it with Elastic Search.  Of course this would increase
> the size of your storage requirements.
>
> - Greg
>
>
> On Oct 3, 2017, at 11:10 PM, Daniel Hölbling-Inzko <
> daniel.hoelbling-in...@bitmovin.com> wrote:
>
> Thanks Kurt,
> I thought about that but one issue is that we are doing limit/offset not
> pages. So one customer can choose to page through the list in 10 Item
> increments, another might want to page through with 100 elements per page.
> So I can't have a clustering key that represents a page range.
>
> What I was thinking about doing was saving the paginationState in a
> separate table along with limit/offset info of the last query the
> paginationState originated from so I can use the last paginationState to
> continue the iteration from if the customer requests the next page with the
> same limit but a different offset.
> This breaks down if the customer does a cold offset=1000 request but
> that's something I can throw error messages for at, what I do need to
> support is a customer doing
> Request 1: offset=0 + limit=100
> Request 2: offset=100 + limit=100
> Request 3: offset=200 + limit=100
>
> So next question would be: How long is the paginationState from the driver
> current? I was thinking about inserting the paginationState with a TTL into
> another Cassandra table - not sure if that's smart though.
>
> greetings Daniel
>
> On Tue, 3 Oct 2017 at 12:20 kurt greaves  wrote:
>
>> I get the impression that you are paging through a single partition in
>> Cassandra? If so you should probably use bounds on clustering keys to get
>> your "next page". You could use LIMIT as well here but it's mostly
>> unnecessary. Probably just use the pagesize that you intend for the API.
>>
>> Yes you'll need a table for each sort order, which ties into how you
>> would use clustering keys for LIMIT/OFFSET. Essentially just do range
>> slices on the clustering keys for each table to get your "pages".
>>
>> Also I'm assuming there's a lot of data per partition if in-mem sorting
>> isn't an option, if this is true you will want to be wary of creating large
>> partitions and reading them all at once. Although this depends on your data
>> model and compaction strategy choices.
>>
>> On 3 October 2017 at 08:36, Daniel Hölbling-Inzko <
>> daniel.hoelbling-in...@bitmovin.com> wrote:
>>
>>> Hi,
>>> I am currently working on migrating a service that so far was MySQL
>>> based to Cassandra.
>>> Everything seems to work fine so far, but a few things in the old
>>> services API Spec is posing some interesting data modeling challenges:
>>>
>>> The old service was doing Limit/Offset pagination which is obviously
>>> something Cassandra can't really do. I understand how paginationState works
>>> - but so far I haven't figured out a good way to make Limit/Offset work on
>>> top of paginationState (as I need to be 100% backwards compatible).
>>> The only ways which I could think of to make Limit/Offset work would
>>> create scalability issues down the road.
>>>
>>> The old service allowed sorting by any field. If I understood correctly
>>> that would require a table for each sort order right? (In-Mem sorting is
>>> not an option unfortunately)
>>> In doing so, how can I make the Java Datastax mapper save to another
>>> table (I really don't want to be writing a Subclass of the Entity for each
>>> Table to add the @Table annotation.
>>>
>>> greetings Daniel
>>>
>>
>>
>


Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

2017-10-04 Thread Andy Tolbert
Hi Daniel,

To answer this question:

> How long is the paginationState from the driver current?

The paging state itself contains information about the position in data
where to proceed from, so you don't need to worry about it becoming
stale/invalid.  The only exception is if you upgrade your cluster and start
using a newer protocol version, at which point the paging state will likely
become invalid.  The java driver guide has an explanation of saving and
reusing the paging state
<http://docs.datastax.com/en/developer/java-driver/3.3/manual/paging/#saving-and-reusing-the-paging-state>
that explains this.
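
Because the paging state is opaque bytes, one pattern is to hand it straight to the client (for example, base64-encoded in a REST response) rather than persisting it server-side; a sketch with the Python driver, where the handler name and table are illustrative:

import base64

from cassandra.query import SimpleStatement

def fetch_page(session, page_token=None, page_size=100):
    stmt = SimpleStatement("SELECT * FROM items", fetch_size=page_size)
    state = base64.b64decode(page_token) if page_token else None
    result = session.execute(stmt, paging_state=state)
    rows = result.current_rows
    # hand back a token only if there is more data; the client echoes it in
    next_token = (base64.b64encode(result.paging_state).decode('ascii')
                  if result.has_more_pages else None)
    return rows, next_token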

Thanks,
Andy

On Wed, Oct 4, 2017 at 1:36 AM Greg Saylor  wrote:

> Without knowing other details, of course, have you considered using
> something like Elassandra?  That is a pretty tightly integrated Cassandra +
> Elastic Search solution.   You’d insert data into Cassandra like you do
> normally, then query it with Elastic Search.  Of course this would increase
> the size of your storage requirements.
>
> - Greg
>
>
> On Oct 3, 2017, at 11:10 PM, Daniel Hölbling-Inzko <
> daniel.hoelbling-in...@bitmovin.com> wrote:
>
> Thanks Kurt,
> I thought about that but one issue is that we are doing limit/offset not
> pages. So one customer can choose to page through the list in 10 Item
> increments, another might want to page through with 100 elements per page.
> So I can't have a clustering key that represents a page range.
>
> What I was thinking about doing was saving the paginationState in a
> separate table along with limit/offset info of the last query the
> paginationState originated from so I can use the last paginationState to
> continue the iteration from if the customer requests the next page with the
> same limit but a different offset.
> This breaks down if the customer does a cold offset=1000 request but
> that's something I can throw error messages for at, what I do need to
> support is a customer doing
> Request 1: offset=0 + limit=100
> Request 2: offset=100 + limit=100
> Request 3: offset=200 + limit=100
>
> So next question would be: How long is the paginationState from the driver
> current? I was thinking about inserting the paginationState with a TTL into
> another Cassandra table - not sure if that's smart though.
>
> greetings Daniel
>
> On Tue, 3 Oct 2017 at 12:20 kurt greaves  wrote:
>
>> I get the impression that you are paging through a single partition in
>> Cassandra? If so you should probably use bounds on clustering keys to get
>> your "next page". You could use LIMIT as well here but it's mostly
>> unnecessary. Probably just use the pagesize that you intend for the API.
>>
>> Yes you'll need a table for each sort order, which ties into how you
>> would use clustering keys for LIMIT/OFFSET. Essentially just do range
>> slices on the clustering keys for each table to get your "pages".
>>
>> Also I'm assuming there's a lot of data per partition if in-mem sorting
>> isn't an option, if this is true you will want to be wary of creating large
>> partitions and reading them all at once. Although this depends on your data
>> model and compaction strategy choices.
>>
>> On 3 October 2017 at 08:36, Daniel Hölbling-Inzko <
>> daniel.hoelbling-in...@bitmovin.com> wrote:
>>
>>> Hi,
>>> I am currently working on migrating a service that so far was MySQL
>>> based to Cassandra.
>>> Everything seems to work fine so far, but a few things in the old
>>> services API Spec is posing some interesting data modeling challenges:
>>>
>>> The old service was doing Limit/Offset pagination which is obviously
>>> something Cassandra can't really do. I understand how paginationState works
>>> - but so far I haven't figured out a good way to make Limit/Offset work on
>>> top of paginationState (as I need to be 100% backwards compatible).
>>> The only ways which I could think of to make Limit/Offset work would
>>> create scalability issues down the road.
>>>
>>> The old service allowed sorting by any field. If I understood correctly
>>> that would require a table for each sort order right? (In-Mem sorting is
>>> not an option unfortunately)
>>> In doing so, how can I make the Java Datastax mapper save to another
>>> table (I really don't want to be writing a Subclass of the Entity for each
>>> Table to add the @Table annotation.
>>>
>>> greetings Daniel
>>>
>>
>>
>


Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

2017-10-03 Thread Greg Saylor
Without knowing other details, of course, have you considered using something 
like Elassandra?  That is a pretty tightly integrated Cassandra + Elastic 
Search solution.   You’d insert data into Cassandra like you do normally, then 
query it with Elastic Search.  Of course this would increase the size of your 
storage requirements.

- Greg


> On Oct 3, 2017, at 11:10 PM, Daniel Hölbling-Inzko 
>  wrote:
> 
> Thanks Kurt,
> I thought about that but one issue is that we are doing limit/offset not 
> pages. So one customer can choose to page through the list in 10 Item 
> increments, another might want to page through with 100 elements per page. So 
> I can't have a clustering key that represents a page range.
> 
> What I was thinking about doing was saving the paginationState in a separate 
> table along with limit/offset info of the last query the paginationState 
> originated from so I can use the last paginationState to continue the 
> iteration from if the customer requests the next page with the same limit but 
> a different offset.
> This breaks down if the customer does a cold offset=1000 request but that's 
> something I can throw error messages for at, what I do need to support is a 
> customer doing
> Request 1: offset=0 + limit=100
> Request 2: offset=100 + limit=100
> Request 3: offset=200 + limit=100
> 
> So next question would be: How long is the paginationState from the driver 
> current? I was thinking about inserting the paginationState with a TTL into 
> another Cassandra table - not sure if that's smart though.
> 
> greetings Daniel
> 
> On Tue, 3 Oct 2017 at 12:20 kurt greaves <k...@instaclustr.com> wrote:
> I get the impression that you are paging through a single partition in 
> Cassandra? If so you should probably use bounds on clustering keys to get 
> your "next page". You could use LIMIT as well here but it's mostly 
> unnecessary. Probably just use the pagesize that you intend for the API. 
> 
> Yes you'll need a table for each sort order, which ties into how you would 
> use clustering keys for LIMIT/OFFSET. Essentially just do range slices on the 
> clustering keys for each table to get your "pages".
> 
> Also I'm assuming there's a lot of data per partition if in-mem sorting isn't 
> an option, if this is true you will want to be wary of creating large 
> partitions and reading them all at once. Although this depends on your data 
> model and compaction strategy choices.
> 
> On 3 October 2017 at 08:36, Daniel Hölbling-Inzko
> <daniel.hoelbling-in...@bitmovin.com> wrote:
> Hi,
> I am currently working on migrating a service that so far was MySQL based to 
> Cassandra.
> Everything seems to work fine so far, but a few things in the old services 
> API Spec is posing some interesting data modeling challenges:
> 
> The old service was doing Limit/Offset pagination which is obviously 
> something Cassandra can't really do. I understand how paginationState works - 
> but so far I haven't figured out a good way to make Limit/Offset work on top 
> of paginationState (as I need to be 100% backwards compatible).
> The only ways which I could think of to make Limit/Offset work would create 
> scalability issues down the road.
> 
> The old service allowed sorting by any field. If I understood correctly that 
> would require a table for each sort order right? (In-Mem sorting is not an 
> option unfortunately)
> In doing so, how can I make the Java Datastax mapper save to another table (I 
> really don't want to be writing a Subclass of the Entity for each Table to 
> add the @Table annotation.
> 
> greetings Daniel
> 



Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

2017-10-03 Thread Daniel Hölbling-Inzko
Thanks Kurt,
I thought about that, but one issue is that we are doing limit/offset, not
pages. So one customer can choose to page through the list in 10-item
increments, while another might want to page through with 100 elements per
page. So I can't have a clustering key that represents a page range.

What I was thinking about doing was saving the paginationState in a
separate table along with limit/offset info of the last query the
paginationState originated from so I can use the last paginationState to
continue the iteration from if the customer requests the next page with the
same limit but a different offset.
This breaks down if the customer does a cold offset=1000 request, but that's
something I can throw error messages for. What I do need to support is a
customer doing:
Request 1: offset=0 + limit=100
Request 2: offset=100 + limit=100
Request 3: offset=200 + limit=100

So the next question would be: how long does the paginationState from the
driver stay current? I was thinking about inserting the paginationState with a
TTL into another Cassandra table - not sure if that's smart though.
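
A sketch of the scheme described above, with an in-memory dict standing in for the TTL'd Cassandra table (names are illustrative; assumes an existing session and statement):

_state_cache = {}  # (query_id, offset) -> paging state; a TTL'd table in production

def page(session, stmt, query_id, offset, limit):
    if offset == 0:
        state = None
    else:
        state = _state_cache.get((query_id, offset))
        if state is None:
            # a "cold" offset nobody paged up to -- reject it, as discussed
            raise ValueError("offset %d not reachable without paging from 0" % offset)
    stmt.fetch_size = limit
    result = session.execute(stmt, paging_state=state)
    if result.has_more_pages:
        # remember where this page ended so offset+limit can resume from it
        _state_cache[(query_id, offset + limit)] = result.paging_state
    return result.current_rows

Note this only lines up when the client keeps the same limit between requests, which matches the constraint described above.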

greetings Daniel

On Tue, 3 Oct 2017 at 12:20 kurt greaves  wrote:

> I get the impression that you are paging through a single partition in
> Cassandra? If so you should probably use bounds on clustering keys to get
> your "next page". You could use LIMIT as well here but it's mostly
> unnecessary. Probably just use the pagesize that you intend for the API.
>
> Yes you'll need a table for each sort order, which ties into how you would
> use clustering keys for LIMIT/OFFSET. Essentially just do range slices on
> the clustering keys for each table to get your "pages".
>
> Also I'm assuming there's a lot of data per partition if in-mem sorting
> isn't an option, if this is true you will want to be wary of creating large
> partitions and reading them all at once. Although this depends on your data
> model and compaction strategy choices.
>
> On 3 October 2017 at 08:36, Daniel Hölbling-Inzko <
> daniel.hoelbling-in...@bitmovin.com> wrote:
>
>> Hi,
>> I am currently working on migrating a service that so far was MySQL based
>> to Cassandra.
>> Everything seems to work fine so far, but a few things in the old
>> services API Spec is posing some interesting data modeling challenges:
>>
>> The old service was doing Limit/Offset pagination which is obviously
>> something Cassandra can't really do. I understand how paginationState works
>> - but so far I haven't figured out a good way to make Limit/Offset work on
>> top of paginationState (as I need to be 100% backwards compatible).
>> The only ways which I could think of to make Limit/Offset work would
>> create scalability issues down the road.
>>
>> The old service allowed sorting by any field. If I understood correctly
>> that would require a table for each sort order right? (In-Mem sorting is
>> not an option unfortunately)
>> In doing so, how can I make the Java Datastax mapper save to another
>> table (I really don't want to be writing a Subclass of the Entity for each
>> Table to add the @Table annotation.
>>
>> greetings Daniel
>>
>
>


Re: Migrating a Limit/Offset Pagination and Sorting to Cassandra

2017-10-03 Thread kurt greaves
I get the impression that you are paging through a single partition in
Cassandra? If so you should probably use bounds on clustering keys to get
your "next page". You could use LIMIT as well here but it's mostly
unnecessary. Probably just use the pagesize that you intend for the API.

Yes you'll need a table for each sort order, which ties into how you would
use clustering keys for LIMIT/OFFSET. Essentially just do range slices on
the clustering keys for each table to get your "pages".

Also I'm assuming there's a lot of data per partition if in-mem sorting
isn't an option, if this is true you will want to be wary of creating large
partitions and reading them all at once. Although this depends on your data
model and compaction strategy choices.
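
Concretely, the range-slice page query could look like this with the Python driver (hypothetical table with partition key p and clustering columns c and e, assuming an existing session and the bounds saved from the previous page):

PAGE = """
    SELECT c, e FROM my_table
    WHERE p = %s AND (c, e) > (%s, %s)
    LIMIT %s
"""
rows = list(session.execute(PAGE, [pk, last_c, last_e, page_size]))
if rows:
    last_c, last_e = rows[-1].c, rows[-1].e  # lower bound for the next page
# the very first page simply omits the (c, e) > (...) bound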

On 3 October 2017 at 08:36, Daniel Hölbling-Inzko <
daniel.hoelbling-in...@bitmovin.com> wrote:

> Hi,
> I am currently working on migrating a service that so far was MySQL based
> to Cassandra.
> Everything seems to work fine so far, but a few things in the old services
> API Spec is posing some interesting data modeling challenges:
>
> The old service was doing Limit/Offset pagination which is obviously
> something Cassandra can't really do. I understand how paginationState works
> - but so far I haven't figured out a good way to make Limit/Offset work on
> top of paginationState (as I need to be 100% backwards compatible).
> The only ways which I could think of to make Limit/Offset work would
> create scalability issues down the road.
>
> The old service allowed sorting by any field. If I understood correctly
> that would require a table for each sort order right? (In-Mem sorting is
> not an option unfortunately)
> In doing so, how can I make the Java Datastax mapper save to another table
> (I really don't want to be writing a Subclass of the Entity for each Table
> to add the @Table annotation.
>
> greetings Daniel
>


Migrating a Limit/Offset Pagination and Sorting to Cassandra

2017-10-03 Thread Daniel Hölbling-Inzko
Hi,
I am currently working on migrating a service that so far was MySQL based
to Cassandra.
Everything seems to work fine so far, but a few things in the old service's
API spec are posing some interesting data modeling challenges:

The old service was doing Limit/Offset pagination which is obviously
something Cassandra can't really do. I understand how paginationState works
- but so far I haven't figured out a good way to make Limit/Offset work on
top of paginationState (as I need to be 100% backwards compatible).
The only ways which I could think of to make Limit/Offset work would create
scalability issues down the road.

The old service allowed sorting by any field. If I understood correctly
that would require a table for each sort order right? (In-Mem sorting is
not an option unfortunately)
In doing so, how can I make the Java DataStax mapper save to another table?
(I really don't want to be writing a subclass of the entity for each table
just to add the @Table annotation.)

greetings Daniel


Re: Pagination

2017-06-21 Thread Vladimir Yudovin
Hi,

can this https://docs.datastax.com/en/developer/java-driver/2.1/manual/paging/ 
help you?



Best regards, Vladimir Yudovin,

Winguzone - Cloud Cassandra Hosting


On Wed, 21 Jun 2017 02:44:17 -0400 web master wrote:
I am migrating from MySQL to Cassandra. In MySQL I use OFFSET and LIMIT to
paginate. The problem is that we have an Android client that requests the next
page and POSTs OFFSET and LIMIT to the server, so I don't know how I can
migrate to Cassandra and keep backward compatibility.

Is there any technique for this problem?


Pagination

2017-06-20 Thread web master
I am migrating from MySQL to Cassandra. In MySQL I use OFFSET and LIMIT to
paginate. The problem is that we have an Android client that requests the next
page and POSTs OFFSET and LIMIT to the server, so I don't know how I can
migrate to Cassandra and keep backward compatibility.

Is there any technique for this problem?


Pagination and timeouts

2017-03-27 Thread Tom van den Berge
I have a table with some 1M rows, and I would like to get the partition key
of each row. Using the java driver (2.1.9), I'm executing the query

select distinct key from table;

The result set is paginated automatically. My C* cluster has two
datacenters, and when I run this query using consistency level LOCAL_ONE,
it starts returning results (page by page) as expected. But after some
time, it will give a ReadTimeoutException. This happens anywhere between 30
seconds and a few minutes.
The java driver's read timeout is set to 50 ms, and the cluster's
read_request_timeout_in_ms is 30 ms.

I'm wondering what is causing this timeout?

What is also not clear to me is whether the driver and server timeout apply
to a single page, or to the entire query?

Thanks,
Tom


Re: Sorting & pagination in apache cassandra 2.1

2016-01-15 Thread Carlos Alonso
Hi Anuja.

Yeah, that's what he means. Before Cassandra 3.0 the modelling advice is to
have one table per query. This may sound weird from a relational
perspective, but the truth is that writes in Cassandra are very cheap, and
it's better to write multiple times and have quick and easy reads than to
write just once and have expensive reads.

Carlos Alonso | Software Engineer | @calonso 

On 15 January 2016 at 05:57, anuja jain  wrote:

> @Jonathan
> what do you mean by "you'll need to maintain your own materialized view
> tables"?
> does it mean we have to create new table for each query?
>
> On Wed, Jan 13, 2016 at 7:40 PM, Narendra Sharma <
> narendra.sha...@gmail.com> wrote:
>
>> In the example you gave, the primary key user_name is the row key. Since
>> the default partitioner is random, you are getting rows in random order.
>>
>> Since each row has no clustering column, there is no further grouping of
>> data. Or, in simple terms, each row has one record and is being returned
>> ordered by column name.
>>
>> To see some meaningful ordering, there should be some clustering column
>> defined.
>>
>> You can create additional column families to maintain ordering, or use
>> external solutions like elasticsearch.
>> On Jan 12, 2016 10:07 PM, "anuja jain"  wrote:
>>
>>> I understand the meaning of SSTable but whats the reason behind sorting
>>> the table on the basis of int columns first..
>>> Is there any data type preference in cassandra?
>>> Also What is the alternative to creating materialised views if my
>>> cassandra version is prior to 3.0 (specifically 2.1) and which is already
>>> in production.?
>>>
>>>
>>> On Wed, Jan 13, 2016 at 12:17 AM, Robert Coli 
>>> wrote:
>>>
 On Mon, Jan 11, 2016 at 11:30 PM, anuja jain 
 wrote:

> 1 more question, what does it mean by "cassandra inherently sorts
> data"?
>

 SSTable = Sorted Strings Table.

 It doesn't contain "Strings" anymore, really, but that's a hint.. :)

 =Rob

>>>
>>>
>


Re: Sorting & pagination in apache cassandra 2.1

2016-01-14 Thread anuja jain
@Jonathan
what do you mean by "you'll need to maintain your own materialized view
tables"?
does it mean we have to create new table for each query?

On Wed, Jan 13, 2016 at 7:40 PM, Narendra Sharma 
wrote:

> In the example you gave, the primary key user_name is the row key. Since
> the default partitioner is random, you are getting rows in random order.
>
> Since each row has no clustering column, there is no further grouping of
> data. Or, in simple terms, each row has one record and is being returned
> ordered by column name.
>
> To see some meaningful ordering, there should be some clustering column
> defined.
>
> You can create additional column families to maintain ordering, or use
> external solutions like elasticsearch.
> On Jan 12, 2016 10:07 PM, "anuja jain"  wrote:
>
>> I understand the meaning of SSTable but whats the reason behind sorting
>> the table on the basis of int columns first..
>> Is there any data type preference in cassandra?
>> Also What is the alternative to creating materialised views if my
>> cassandra version is prior to 3.0 (specifically 2.1) and which is already
>> in production.?
>>
>>
>> On Wed, Jan 13, 2016 at 12:17 AM, Robert Coli 
>> wrote:
>>
>>> On Mon, Jan 11, 2016 at 11:30 PM, anuja jain 
>>> wrote:
>>>
 1 more question, what does it mean by "cassandra inherently sorts data"?

>>>
>>> SSTable = Sorted Strings Table.
>>>
>>> It doesn't contain "Strings" anymore, really, but that's a hint.. :)
>>>
>>> =Rob
>>>
>>
>>


Re: Sorting & pagination in apache cassandra 2.1

2016-01-13 Thread Narendra Sharma
In the example you gave, the primary key user_name is the row key. Since
the default partitioner is random, you are getting rows in random order.

Since each row has no clustering column, there is no further grouping of
data. Or, in simple terms, each row has one record and is being returned
ordered by column name.

To see some meaningful ordering, there should be some clustering column
defined.

You can create additional column families to maintain ordering, or use
external solutions like elasticsearch.
On Jan 12, 2016 10:07 PM, "anuja jain"  wrote:

> I understand the meaning of SSTable but whats the reason behind sorting
> the table on the basis of int columns first..
> Is there any data type preference in cassandra?
> Also What is the alternative to creating materialised views if my
> cassandra version is prior to 3.0 (specifically 2.1) and which is already
> in production.?
>
>
> On Wed, Jan 13, 2016 at 12:17 AM, Robert Coli 
> wrote:
>
>> On Mon, Jan 11, 2016 at 11:30 PM, anuja jain 
>> wrote:
>>
>>> 1 more question, what does it mean by "cassandra inherently sorts data"?
>>>
>>
>> SSTable = Sorted Strings Table.
>>
>> It doesn't contain "Strings" anymore, really, but that's a hint.. :)
>>
>> =Rob
>>
>
>


Re: Sorting & pagination in apache cassandra 2.1

2016-01-12 Thread Jonathan Haddad
The clustering keys determine the sorting of rows within a partition.  The
partitions within a file are sorted by their token (usually computed by
applying the murmur 3 hash to the partition key).

If you are using a version of Cassandra < 3.0, you'll need to maintain your
own materialized view tables.
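
A sketch of what maintaining such a view table by hand can look like: write each record both to the base table and to one table per sort order, in a logged batch so the copies stay in step. The users_by_birth_year schema and the single 'all' bucket are illustrative only (a real bucketing choice should avoid one giant partition), and an existing session is assumed:

from cassandra.query import BatchStatement

user_name, birth_year = 'John', 1979  # example values

batch = BatchStatement()  # logged by default: both writes eventually apply
batch.add("INSERT INTO users (user_name, birth_year) VALUES (%s, %s)",
          (user_name, birth_year))
batch.add("INSERT INTO users_by_birth_year (bucket, birth_year, user_name) "
          "VALUES ('all', %s, %s)",
          (birth_year, user_name))
session.execute(batch)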

On Tue, Jan 12, 2016 at 10:07 PM anuja jain  wrote:

> I understand the meaning of SSTable but whats the reason behind sorting
> the table on the basis of int columns first..
> Is there any data type preference in cassandra?
> Also What is the alternative to creating materialised views if my
> cassandra version is prior to 3.0 (specifically 2.1) and which is already
> in production.?
>
>
> On Wed, Jan 13, 2016 at 12:17 AM, Robert Coli 
> wrote:
>
>> On Mon, Jan 11, 2016 at 11:30 PM, anuja jain 
>> wrote:
>>
>>> 1 more question, what does it mean by "cassandra inherently sorts data"?
>>>
>>
>> SSTable = Sorted Strings Table.
>>
>> It doesn't contain "Strings" anymore, really, but that's a hint.. :)
>>
>> =Rob
>>
>
>


Re: Sorting & pagination in apache cassandra 2.1

2016-01-12 Thread anuja jain
I understand the meaning of SSTable, but what's the reason behind sorting the
table on the basis of int columns first?
Is there any data type preference in Cassandra?
Also, what is the alternative to creating materialised views if my Cassandra
version is prior to 3.0 (specifically 2.1) and is already in
production?


On Wed, Jan 13, 2016 at 12:17 AM, Robert Coli  wrote:

> On Mon, Jan 11, 2016 at 11:30 PM, anuja jain  wrote:
>
>> 1 more question, what does it mean by "cassandra inherently sorts data"?
>>
>
> SSTable = Sorted Strings Table.
>
> It doesn't contain "Strings" anymore, really, but that's a hint.. :)
>
> =Rob
>


Re: Sorting & pagination in apache cassandra 2.1

2016-01-12 Thread Robert Coli
On Mon, Jan 11, 2016 at 11:30 PM, anuja jain  wrote:

> 1 more question, what does it mean by "cassandra inherently sorts data"?
>

SSTable = Sorted Strings Table.

It doesn't contain "Strings" anymore, really, but that's a hint.. :)

=Rob


Re: Sorting & pagination in apache cassandra 2.1

2016-01-12 Thread Carlos Alonso
Hi Anuja.

Cassandra saves records on disk sorted by the clustering column. In this
case you haven't selected any, but it looks like it is picking birth_year as
the clustering column. I don't know what the clustering column
selection algorithm is, though (maybe alphabetically by name?).

Regards

Carlos Alonso | Software Engineer | @calonso 

On 12 January 2016 at 07:30, anuja jain  wrote:

> 1 more question, what does it mean by "cassandra inherently sorts data"?
> For eg:
> I have a table with schema
>
> CREATE TABLE users (
>
> ...   user_name varchar PRIMARY KEY,
>
> ...   password varchar,
>
> ...   gender varchar,
>
> ...   session_token varchar,
>
> ...   state varchar,
>
> ...   birth_year bigint
>
> ... );
>
> I inserted data in random order, but on firing a select statement I get
> data sorted by birth_year. So why does this happen?
>
>  cqlsh:learning> select * from users;
>
>
>
>  user_name | birth_year | gender | password | session_token | state
> -----------+------------+--------+----------+---------------+---------
>       John |       1979 |      M |     qwer |           abc |      JK
>    Dharini |       1980 |      F |      Xyz |           abc | Gujarat
>      Keval |       1990 |      M |      DDD |           abc |      WB
>
> On Tue, Jan 12, 2016 at 12:52 PM, anuja jain  wrote:
>
>> What is the alternative if my cassandra version is prior to 3.0
>> (specifically) 2.1) and which is already in production.?
>>
>> Also as per the docs given at
>>
>>
>> https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/srch/srchCapazty.html
>>  what does it mean by we need to do capacity planning if we need to
>> search using SOLR. What is other alternative when we do not know the size
>> of the data ?
>>
>>  Thanks,
>>
>> Anuja
>>
>>
>>
>> On Fri, Jan 8, 2016 at 12:15 AM, Tyler Hobbs  wrote:
>>
>>>
>>> On Thu, Jan 7, 2016 at 6:45 AM, anuja jain  wrote:
>>>
 My question is, what is the alternative if we need to order by col3 or
 col4 in my above example without including col2 in order by clause.

>>>
>>> The server-side alternative is to create a second table (or a
>>> materialized view, if you're using 3.0+) that uses a different clustering
>>> order.  Cassandra purposefully only supports simple and efficient queries
>>> that can be handled quickly (with a few exceptions), and arbitrary ordering
>>> is not part of that, especially if you consider complications like paging.
>>>
>>>
>>> --
>>> Tyler Hobbs
>>> DataStax 
>>>
>>
>>
>


Re: Sorting & pagination in apache cassandra 2.1

2016-01-11 Thread anuja jain
1 more question, what does it mean by "cassandra inherently sorts data"?
For eg:
I have a table with schema

CREATE TABLE users (

...   user_name varchar PRIMARY KEY,

...   password varchar,

...   gender varchar,

...   session_token varchar,

...   state varchar,

...   birth_year bigint

... );

I inserted data in random order, but on firing a select statement I get data
sorted by birth_year. So why does this happen?

 cqlsh:learning> select * from users;



 user_name | birth_year | gender | password | session_token | state
-----------+------------+--------+----------+---------------+---------
      John |       1979 |      M |     qwer |           abc |      JK
   Dharini |       1980 |      F |      Xyz |           abc | Gujarat
     Keval |       1990 |      M |      DDD |           abc |      WB

On Tue, Jan 12, 2016 at 12:52 PM, anuja jain  wrote:

> What is the alternative if my cassandra version is prior to 3.0
> (specifically) 2.1) and which is already in production.?
>
> Also as per the docs given at
>
>
> https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/srch/srchCapazty.html
>  what does it mean by we need to do capacity planning if we need to
> search using SOLR. What is other alternative when we do not know the size
> of the data ?
>
>  Thanks,
>
> Anuja
>
>
>
> On Fri, Jan 8, 2016 at 12:15 AM, Tyler Hobbs  wrote:
>
>>
>> On Thu, Jan 7, 2016 at 6:45 AM, anuja jain  wrote:
>>
>>> My question is, what is the alternative if we need to order by col3 or
>>> col4 in my above example without including col2 in order by clause.
>>>
>>
>> The server-side alternative is to create a second table (or a
>> materialized view, if you're using 3.0+) that uses a different clustering
>> order.  Cassandra purposefully only supports simple and efficient queries
>> that can be handled quickly (with a few exceptions), and arbitrary ordering
>> is not part of that, especially if you consider complications like paging.
>>
>>
>> --
>> Tyler Hobbs
>> DataStax 
>>
>
>


Re: Sorting & pagination in apache cassandra 2.1

2016-01-11 Thread anuja jain
What is the alternative if my Cassandra version is prior to 3.0
(specifically 2.1) and is already in production?

Also, as per the docs given at

https://docs.datastax.com/en/datastax_enterprise/4.6/datastax_enterprise/srch/srchCapazty.html
what does it mean that we need to do capacity planning if we need to search
using SOLR? What is the alternative when we do not know the size of the
data?

 Thanks,

Anuja



On Fri, Jan 8, 2016 at 12:15 AM, Tyler Hobbs  wrote:

>
> On Thu, Jan 7, 2016 at 6:45 AM, anuja jain  wrote:
>
>> My question is, what is the alternative if we need to order by col3 or
>> col4 in my above example without including col2 in order by clause.
>>
>
> The server-side alternative is to create a second table (or a materialized
> view, if you're using 3.0+) that uses a different clustering order.
> Cassandra purposefully only supports simple and efficient queries that can
> be handled quickly (with a few exceptions), and arbitrary ordering is not
> part of that, especially if you consider complications like paging.
>
>
> --
> Tyler Hobbs
> DataStax 
>


Re: Sorting & pagination in apache cassandra 2.1

2016-01-07 Thread Tyler Hobbs
On Thu, Jan 7, 2016 at 6:45 AM, anuja jain  wrote:

> My question is, what is the alternative if we need to order by col3 or
> col4 in my above example without including col2 in order by clause.
>

The server-side alternative is to create a second table (or a materialized
view, if you're using 3.0+) that uses a different clustering order.
Cassandra purposefully only supports simple and efficient queries that can
be handled quickly (with a few exceptions), and arbitrary ordering is not
part of that, especially if you consider complications like paging.


-- 
Tyler Hobbs
DataStax 
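
Applied to the t1 table from the original question below, the second-table
approach might look like the following sketch (names are illustrative, and
with a plain second table the application must write to both copies itself):

CREATE TABLE test.t1_by_col3 (
    col1 text,
    col2 text,
    col3 text,
    col4 text,
    PRIMARY KEY (col1, col3, col2, col4)
) WITH CLUSTERING ORDER BY (col3 ASC, col2 ASC, col4 ASC);

-- On 3.0+, a materialized view keeps the copy in sync automatically:
CREATE MATERIALIZED VIEW test.t1_by_col3_mv AS
    SELECT * FROM test.t1
    WHERE col1 IS NOT NULL AND col2 IS NOT NULL
      AND col3 IS NOT NULL AND col4 IS NOT NULL
    PRIMARY KEY (col1, col3, col2, col4);

A query such as SELECT * FROM test.t1_by_col3 WHERE col1 = 'abc'; then
returns rows ordered by col3.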


Sorting & pagination in apache cassandra 2.1

2016-01-07 Thread anuja jain
Hi All,
 Suppose I have a Cassandra table with the structure
CREATE TABLE test.t1 (
col1 text,
col2 text,
col3 text,
col4 text,
PRIMARY KEY (col1, col2, col3, col4)
) WITH CLUSTERING ORDER BY (col2 ASC, col3 ASC, col4 ASC);

and it has following data

 col1 | col2 | col3 | col4
--+--+--+--
  abc |  abc |  abc |  abc

and I query the table saying
select * from t1 where col1='abc' order by col3;

it gives me following error
InvalidRequest: code=2200 [Invalid query] message="Order by currently only
support the ordering of columns following their declared order in the
PRIMARY KEY"

While reading the docs I came to know that only the first clustering column
can be ordered by independently, and for the other columns we need to follow
the sequence of the clustering columns.
My question is, what is the alternative if we need to order by col3 or col4
in my above example without including col2 in order by clause.


Thanks,
Anuja


Re: Automatic pagination does not get all results

2015-10-23 Thread Sid Tantia
Hello Jeff,

I'm using Cassandra v2.1.4
I'm expecting the number of results to be the same every time I use the
COPY command (specifically I'm using `COPY  TO stdout`). However
here are the counts of rows exported each time I ran COPY:
1) 180389 rows exported
2) 181212 rows exported
3) 178641 rows exported
4) 176688 rows exported
5) 175433 rows exported

So it has found a different number of rows to export every single time I've
run the command, even though it's the same table and no additional writes
have been made.

CL for read is ALL
CL for write is ONE
Yes, I've run repair since last writing data.
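
One way to sanity-check whether the replicas still disagree is to compare
counts at different consistency levels from cqlsh (a sketch; the table name
is a placeholder, and on these versions COUNT stops at the default LIMIT of
10000, hence the explicit LIMIT):

CONSISTENCY ONE;
SELECT COUNT(*) FROM mykeyspace.mytable LIMIT 1000000;
CONSISTENCY ALL;
SELECT COUNT(*) FROM mykeyspace.mytable LIMIT 1000000;

If the counts differ, or vary across runs, the replicas are still out of sync
and another repair pass is warranted.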

On Thu, Oct 22, 2015 at 9:02 PM, Jeff Jirsa 
wrote:

> It’s possible that it could be different depending on your consistency
> level (on write and on read).
>
> It’s also possible it’s a bug, but you didn’t give us much information –
> here are some questions to help us help you:
>
> What version?
> What results are you seeing?
> What’s the “right” result?
> What CL did you use to write the data?
> What CL did you use to read the data?
> Have you run repair since writing the data?
>
>
> From: Sid Tantia
> Reply-To: "user@cassandra.apache.org"
> Date: Thursday, October 22, 2015 at 5:49 PM
> To: user
> Subject: Automatic pagination does not get all results
>
> Hello,
>
> Has anyone had a problem with automatic pagination returning different
> results every time (this is for a table with ~180,000 rows)? I'm going
> through each page and inserting the results into an array and each time I
> go through all the pages, the resultant array has a different size.
>
> This happens whether I use a SELECT query with automatic paging using the
> Ruby driver or a COPY to CSV command with cqlsh.
>
> -Sid
>
>


Re: Automatic pagination does not get all results

2015-10-22 Thread Jeff Jirsa
It’s possible that it could be different depending on your consistency level 
(on write and on read).

It’s also possible it’s a bug, but you didn’t give us much information – here 
are some questions to help us help you:

What version? 
What results are you seeing? 
What’s the “right” result? 
What CL did you use to write the data? 
What CL did you use to read the data? 
Have you run repair since writing the data?


From:  Sid Tantia
Reply-To:  "user@cassandra.apache.org"
Date:  Thursday, October 22, 2015 at 5:49 PM
To:  user
Subject:  Automatic pagination does not get all results

Hello,

Has anyone had a problem with automatic pagination returning different results 
every time (this is for a table with ~180,000 rows)? I'm going through each page 
and inserting the results into an array and each time I go through all the 
pages, the resultant array has a different size. 

This happens whether I use a SELECT query with automatic paging using the Ruby 
driver or a COPY to CSV command with cqlsh.

-Sid






Automatic pagination does not get all results

2015-10-22 Thread Sid Tantia
Hello,

Has anyone had a problem with automatic pagination returning different
results every time (this is for a table with ~180,000 rows)? I'm going
through each page and inserting the results into an array and each time I
go through all the pages, the resultant array has a different size.

This happens whether I use a SELECT query with automatic paging using the
Ruby driver or a COPY to CSV command with cqlsh.

-Sid


Re: Pagination support on Java Driver Query API

2015-02-13 Thread Ajay
The syntax suggested by Ondrej is not working in some cases in 2.0.11, so I
have logged an issue for it.

https://issues.apache.org/jira/browse/CASSANDRA-8797

Thanks
Ajay
On Feb 12, 2015 11:01 PM, "Bulat Shakirzyanov" <
bulat.shakirzya...@datastax.com> wrote:

> Fixed my Mail.app settings so you can see my actual name, sorry.
>
> On Feb 12, 2015, at 8:55 AM, DataStax 
> wrote:
>
> Hello,
>
> As was mentioned earlier, the Java driver doesn’t actually perform
> pagination.
>
> Instead, it uses the Cassandra native protocol to set the page size of the
> result set (
> https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v2.spec#L699-L730
> ).
> When Cassandra sends the result back to the Java driver, it includes a
> binary token.
> This token represents paging state. To fetch the next page, the driver
> re-executes the same
> statement with original page size and paging state attached. If there is
> another page available,
> Cassandra responds with a new paging state that can be used to fetch it.
>
> You could also try reporting this issue on the Cassandra user mailing list.
>
> On Feb 12, 2015, at 8:35 AM, Eric Stevens  wrote:
>
> I don't know what the shape of the page state data is deep inside the
> JavaDriver, I've actually tried to dig into that in the past and understand
> it to see if I could reproduce it as a general purpose any-query kind of
> thing.  I gave up before I fully understood it, but I think it's actually a
> handle to an in-memory state maintained by the coordinator, which is only
> maintained for the lifetime of the statement (i.e. it's not stateless
> paging). That would make it a bad candidate for stateless paging scenarios
> such as REST requests where a typical setup would load balance across HTTP
> hosts, never mind across coordinators.
>
> It shouldn't be too much work to abstract this basic idea for manual
> paging into a general purpose class that takes List[ClusteringKeyDef[T,
> O<:Ordering]], and can produce a connection agnostic PageState from a
> ResultSet or Row, or accepts a PageState to produce a WHERE CQL fragment.
>
>
>
> Also RE: possibly multiple queries to satisfy a page - yes, that's
> unfortunate.  Since you're on 2.0.11, see Ondřej's answer to avoid it.
>
> On Thu, Feb 12, 2015 at 8:13 AM, Ajay  wrote:
>
>> Thanks Eric. I figured out the same but didn't get time to put it on the
>> mail. Thanks.
>>
>> But it is highly tied to how data is stored internally in Cassandra:
>> how partition keys are used to distribute data (less likely to change, and
>> we do not depend directly on the partitioning algorithm) and how clustering
>> keys are used to sort the data within a partition (multi-level sorting,
>> hence the restrictions on the ORDER BY clause), which I think could change
>> down the line in Cassandra 3.x or 4.x for better storage or retrieval.
>>
>> That said, I am hesitant to implement this client-side logic for pagination
>> because a) pages 2+ might need more than one query to Cassandra; b) the
>> implementation is tied to Cassandra's internal storage details, which can
>> change (though not often); c) in our case, we are building REST APIs which
>> will be deployed on Tomcat clusters, so whatever we cache to support
>> pagination needs to be cached in a distributed way for failover support.
>>
>> Pagination support is best done on the server side (like ROWNUM in SQL) or
>> in the Java driver, to hide the internal details; it can be optimized
>> better there, since the server sends the paging state to the driver.
>>
>> Thanks
>> Ajay
>> On Feb 12, 2015 8:22 PM, "Eric Stevens"  wrote:
>>
>>> Your page state then needs to track the last ck1 and last ck2 you saw.
>>> Pages 2+ will end up needing to be up to two queries if the first query
>>> doesn't fill the page size.
>>>
>>> CREATE TABLE foo (
>>>   partitionkey int,
>>>   ck1 int,
>>>   ck2 int,
>>>   col1 int,
>>>   col2 int,
>>>   PRIMARY KEY ((partitionkey), ck1, ck2)
>>> ) WITH CLUSTERING ORDER BY (ck1 asc, ck2 desc);
>>>
>>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,1,1,1);
>>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,2,2,2);
>>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,3,3,3);
>>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,1,4,4);
>>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,2,5,5);
>>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,3,6,6);

Re: Pagination support on Java Driver Query API

2015-02-12 Thread Bulat Shakirzyanov
Fixed my Mail.app settings so you can see my actual name, sorry.
> On Feb 12, 2015, at 8:55 AM, DataStax  wrote:
> 
> Hello,
> 
> As was mentioned earlier, the Java driver doesn’t actually perform pagination.
> 
> Instead, it uses the Cassandra native protocol to set the page size of the
> result set (
> https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v2.spec#L699-L730
> ).
> When Cassandra sends the result back to the Java driver, it includes a
> binary token.
> This token represents paging state. To fetch the next page, the driver 
> re-executes the same
> statement with original page size and paging state attached. If there is 
> another page available,
> Cassandra responds with a new paging state that can be used to fetch it.
> 
> You could also try reporting this issue on the Cassandra user mailing list.
> 
>> On Feb 12, 2015, at 8:35 AM, Eric Stevens wrote:
>> 
>> I don't know what the shape of the page state data is deep inside the 
>> JavaDriver, I've actually tried to dig into that in the past and understand 
>> it to see if I could reproduce it as a general purpose any-query kind of 
>> thing.  I gave up before I fully understood it, but I think it's actually a 
>> handle to an in-memory state maintained by the coordinator, which is only 
>> maintained for the lifetime of the statement (i.e. it's not stateless 
>> paging). That would make it a bad candidate for stateless paging scenarios 
>> such as REST requests where a typical setup would load balance across HTTP 
>> hosts, never mind across coordinators.
>> 
>> It shouldn't be too much work to abstract this basic idea for manual paging 
>> into a general purpose class that takes List[ClusteringKeyDef[T, 
>> O<:Ordering]], and can produce a connection agnostic PageState from a 
>> ResultSet or Row, or accepts a PageState to produce a WHERE CQL fragment.
>> 
>> 
>> 
>> Also RE: possibly multiple queries to satisfy a page - yes, that's 
>> unfortunate.  Since you're on 2.0.11, see Ondřej's answer to avoid it.
>> 
>> On Thu, Feb 12, 2015 at 8:13 AM, Ajay wrote:
>> Thanks Eric. I figured out the same but didn't get time to put it on the 
>> mail. Thanks.
>> 
>> But it is highly tied to how data is stored internally in Cassandra:
>> how partition keys are used to distribute data (less likely to change, and
>> we do not depend directly on the partitioning algorithm) and how clustering
>> keys are used to sort the data within a partition (multi-level sorting,
>> hence the restrictions on the ORDER BY clause), which I think could change
>> down the line in Cassandra 3.x or 4.x for better storage or retrieval.
>> 
>> That said, I am hesitant to implement this client-side logic for pagination
>> because a) pages 2+ might need more than one query to Cassandra; b) the
>> implementation is tied to Cassandra's internal storage details, which can
>> change (though not often); c) in our case, we are building REST APIs which
>> will be deployed on Tomcat clusters, so whatever we cache to support
>> pagination needs to be cached in a distributed way for failover support.
>> 
>> Pagination support is best done on the server side (like ROWNUM in SQL) or
>> in the Java driver, to hide the internal details; it can be optimized
>> better there, since the server sends the paging state to the driver.
>> 
>> Thanks
>> Ajay
>> 
>> On Feb 12, 2015 8:22 PM, "Eric Stevens" wrote:
>> Your page state then needs to track the last ck1 and last ck2 you saw.  
>> Pages 2+ will end up needing to be up to two queries if the first query 
>> doesn't fill the page size.
>> 
>> CREATE TABLE foo (
>>   partitionkey int,
>>   ck1 int,
>>   ck2 int,
>>   col1 int,
>>   col2 int,
>>   PRIMARY KEY ((partitionkey), ck1, ck2)
>> ) WITH CLUSTERING ORDER BY (ck1 asc, ck2 desc);
>> 
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,1,1,1);
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,2,2,2);
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,3,3,3);
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,1,4,4);
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,2,5,5);
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,3,6,6);

Re: Pagination support on Java Driver Query API

2015-02-12 Thread DataStax
Hello,

As was mentioned earlier, the Java driver doesn’t actually perform pagination.

Instead, it uses the Cassandra native protocol to set the page size of the
result set
(https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v2.spec#L699-L730).
When Cassandra sends the result back to the Java driver, it includes a
binary token.
This token represents paging state. To fetch the next page, the driver 
re-executes the same
statement with original page size and paging state attached. If there is 
another page available,
Cassandra responds with a new paging state that can be used to fetch it.

You could also try reporting this issue on the Cassandra user mailing list.

> On Feb 12, 2015, at 8:35 AM, Eric Stevens  wrote:
> 
> I don't know what the shape of the page state data is deep inside the 
> JavaDriver, I've actually tried to dig into that in the past and understand 
> it to see if I could reproduce it as a general purpose any-query kind of 
> thing.  I gave up before I fully understood it, but I think it's actually a 
> handle to an in-memory state maintained by the coordinator, which is only 
> maintained for the lifetime of the statement (i.e. it's not stateless 
> paging). That would make it a bad candidate for stateless paging scenarios 
> such as REST requests where a typical setup would load balance across HTTP 
> hosts, never mind across coordinators.
> 
> It shouldn't be too much work to abstract this basic idea for manual paging 
> into a general purpose class that takes List[ClusteringKeyDef[T, 
> O<:Ordering]], and can produce a connection agnostic PageState from a 
> ResultSet or Row, or accepts a PageState to produce a WHERE CQL fragment.
> 
> 
> 
> Also RE: possibly multiple queries to satisfy a page - yes, that's 
> unfortunate.  Since you're on 2.0.11, see Ondřej's answer to avoid it.
> 
> On Thu, Feb 12, 2015 at 8:13 AM, Ajay wrote:
> Thanks Eric. I figured out the same but didn't get time to put it on the 
> mail. Thanks.
> 
> But it is highly tied to how data is stored internally in Cassandra:
> how partition keys are used to distribute data (less likely to change, and
> we do not depend directly on the partitioning algorithm) and how clustering
> keys are used to sort the data within a partition (multi-level sorting,
> hence the restrictions on the ORDER BY clause), which I think could change
> down the line in Cassandra 3.x or 4.x for better storage or retrieval.
> 
> That said, I am hesitant to implement this client-side logic for pagination
> because a) pages 2+ might need more than one query to Cassandra; b) the
> implementation is tied to Cassandra's internal storage details, which can
> change (though not often); c) in our case, we are building REST APIs which
> will be deployed on Tomcat clusters, so whatever we cache to support
> pagination needs to be cached in a distributed way for failover support.
> 
> Pagination support is best done on the server side (like ROWNUM in SQL) or
> in the Java driver, to hide the internal details; it can be optimized
> better there, since the server sends the paging state to the driver.
> 
> Thanks
> Ajay
> 
> On Feb 12, 2015 8:22 PM, "Eric Stevens" wrote:
> Your page state then needs to track the last ck1 and last ck2 you saw.  Pages 
> 2+ will end up needing to be up to two queries if the first query doesn't 
> fill the page size.
> 
> CREATE TABLE foo (
>   partitionkey int,
>   ck1 int,
>   ck2 int,
>   col1 int,
>   col2 int,
>   PRIMARY KEY ((partitionkey), ck1, ck2)
> ) WITH CLUSTERING ORDER BY (ck1 asc, ck2 desc);
> 
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,1,1,1);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,2,2,2);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,3,3,3);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,1,4,4);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,2,5,5);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,3,6,6);
> 
> If you're pulling the whole of partition 1 and your page size is 2, your 
> first page looks like:
> 
> PAGE 1
> 
> SELECT * FROM foo WHERE partitionkey = 1 LIMIT 2;
>  partitionkey | ck1 | ck2 | col1 | col2
> --+-+-+--+--
> 1 |   1 |   3 |3 |3
> 1 |   1 |   2 |2 |2
> 
> You got enough rows to satisfy the page, Your page state is taken from the 
> last row: (ck1=1, ck2=2)

Re: Pagination support on Java Driver Query API

2015-02-12 Thread Eric Stevens
I don't know what the shape of the page state data is deep inside the
JavaDriver, I've actually tried to dig into that in the past and understand
it to see if I could reproduce it as a general purpose any-query kind of
thing.  I gave up before I fully understood it, but I think it's actually a
handle to an in-memory state maintained by the coordinator, which is only
maintained for the lifetime of the statement (i.e. it's not stateless
paging). That would make it a bad candidate for stateless paging scenarios
such as REST requests where a typical setup would load balance across HTTP
hosts, never mind across coordinators.

It shouldn't be too much work to abstract this basic idea for manual paging
into a general purpose class that takes List[ClusteringKeyDef[T,
O<:Ordering]], and can produce a connection agnostic PageState from a
ResultSet or Row, or accepts a PageState to produce a WHERE CQL fragment.



Also RE: possibly multiple queries to satisfy a page - yes, that's
unfortunate.  Since you're on 2.0.11, see Ondřej's answer to avoid it.

On Thu, Feb 12, 2015 at 8:13 AM, Ajay  wrote:

> Thanks Eric. I figured out the same but didn't get time to put it on the
> mail. Thanks.
>
> But it is highly tied to how data is stored internally in Cassandra:
> how partition keys are used to distribute data (less likely to change, and
> we do not depend directly on the partitioning algorithm) and how clustering
> keys are used to sort the data within a partition (multi-level sorting,
> hence the restrictions on the ORDER BY clause), which I think could change
> down the line in Cassandra 3.x or 4.x for better storage or retrieval.
>
> That said, I am hesitant to implement this client-side logic for pagination
> because a) pages 2+ might need more than one query to Cassandra; b) the
> implementation is tied to Cassandra's internal storage details, which can
> change (though not often); c) in our case, we are building REST APIs which
> will be deployed on Tomcat clusters, so whatever we cache to support
> pagination needs to be cached in a distributed way for failover support.
>
> Pagination support is best done on the server side (like ROWNUM in SQL) or
> in the Java driver, to hide the internal details; it can be optimized
> better there, since the server sends the paging state to the driver.
>
> Thanks
> Ajay
> On Feb 12, 2015 8:22 PM, "Eric Stevens"  wrote:
>
>> Your page state then needs to track the last ck1 and last ck2 you saw.
>> Pages 2+ will end up needing to be up to two queries if the first query
>> doesn't fill the page size.
>>
>> CREATE TABLE foo (
>>   partitionkey int,
>>   ck1 int,
>>   ck2 int,
>>   col1 int,
>>   col2 int,
>>   PRIMARY KEY ((partitionkey), ck1, ck2)
>> ) WITH CLUSTERING ORDER BY (ck1 asc, ck2 desc);
>>
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,1,1,1);
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,2,2,2);
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,3,3,3);
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,1,4,4);
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,2,5,5);
>> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,3,6,6);
>>
>> If you're pulling the whole of partition 1 and your page size is 2, your
>> first page looks like:
>>
>> *PAGE 1*
>>
>> SELECT * FROM foo WHERE partitionkey = 1 LIMIT 2;
>>  partitionkey | ck1 | ck2 | col1 | col2
>> --+-+-+--+--
>> 1 |   1 |   3 |3 |3
>> 1 |   1 |   2 |2 |2
>>
>> You got enough rows to satisfy the page, Your page state is taken from
>> the last row: (ck1=1, ck2=2)
>>
>>
>> *PAGE 2*
>> Notice that you have a page state, and add some limiting clauses on the
>> statement:
>>
>> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 1 AND ck2 < 2 LIMIT 2;
>>  partitionkey | ck1 | ck2 | col1 | col2
>> --+-+-+--+--
>> 1 |   1 |   1 |1 |1
>>
>> Oops, we didn't get enough rows to satisfy the page limit, so we need to
>> continue on, we just need one more:
>>
>> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 1 LIMIT 1;
>>  partitionkey | ck1 | ck2 | col1 | col2
>> --+-+-+--+--
>> 1 |   2 |   3 |6 |6
>>
>> We have enough to satisfy page 2 now, our new page state: (ck1 = 2, ck2 =
>> 3).
>>
>>
>> *PAGE 3*
>>
>> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 3 LIMIT 2;

Re: Pagination support on Java Driver Query API

2015-02-12 Thread Eric Stevens
Thanks Ondřej!  Definitely much easier.

N.B., this is a new feature in 2.0.x; it will not work in 1.2.x.

cqlsh:scratch> SELECT * FROM foo WHERE partitionkey = 1 and (ck1, ck2) >
(1,2) limit 2;
Bad Request: line 1:45 no viable alternative at input '('


On Thu, Feb 12, 2015 at 8:44 AM, Ondřej Nešpor 
wrote:

>  There is a much easier way to do that (and I suppose the Java driver does
> it this way):
>
> page 1:
>
> SELECT * FROM foo WHERE partitionkey = 1 limit 2;
>
>  partitionkey | ck1 | ck2 | col1 | col2
>
> --+-+-+--+--
>
> 1 |   1 |   3 |3 |3
>
> 1 |   1 |   2 |2 |2
>
>
>
> page2:
>
> SELECT * FROM foo WHERE partitionkey = 1 and (ck1, ck2) > (1,2) limit 2;
>
>
>  partitionkey | ck1 | ck2 | col1 | col2
>
> --+-+-+--+--
>
> 1 |   1 |   1 |1 |1
>
> 1 |   2 |   3 |6 |6
>
>
>
>
> page 3:
>
> SELECT * FROM foo WHERE partitionkey = 1 and (ck1, ck2) > (2,3) limit 2;
>
>  partitionkey | ck1 | ck2 | col1 | col2
>
> --+-+-+--+--
>
> 1 |   2 |   2 |5 |5
>
> 1 |   2 |   1 |4 |4
>
>
>
> Basically you pass ck1 and ck2 values from the last row of the previous
> result and tell C* you want results that are greater (next results in case
> the first clustering key is ASC).
>
>
>
> Andrew
>
>
>
> On 12.2.2015 at 15:50, Eric Stevens wrote:
>
> Your page state then needs to track the last ck1 and last ck2 you saw.
> Pages 2+ will end up needing to be up to two queries if the first query
> doesn't fill the page size.
>
>  CREATE TABLE foo (
>   partitionkey int,
>   ck1 int,
>   ck2 int,
>   col1 int,
>   col2 int,
>   PRIMARY KEY ((partitionkey), ck1, ck2)
> ) WITH CLUSTERING ORDER BY (ck1 asc, ck2 desc);
>
>  INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,1,1,1);
>  INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,2,2,2);
>   INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,3,3,3);
>   INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,1,4,4);
>   INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,2,5,5);
>   INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,3,6,6);
>
>  If you're pulling the whole of partition 1 and your page size is 2, your
> first page looks like:
>
>  *PAGE 1*
>
>  SELECT * FROM foo WHERE partitionkey = 1 LIMIT 2;
>   partitionkey | ck1 | ck2 | col1 | col2
> --+-+-+--+--
> 1 |   1 |   3 |3 |3
> 1 |   1 |   2 |2 |2
>
>  You got enough rows to satisfy the page, Your page state is taken from
> the last row: (ck1=1, ck2=2)
>
>
>  *PAGE 2*
> Notice that you have a page state, and add some limiting clauses on the
> statement:
>
>  SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 1 AND ck2 < 2 LIMIT 2;
>   partitionkey | ck1 | ck2 | col1 | col2
> --+-+-+--+--
> 1 |   1 |   1 |1 |1
>
>  Oops, we didn't get enough rows to satisfy the page limit, so we need to
> continue on, we just need one more:
>
>  SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 1 LIMIT 1;
>   partitionkey | ck1 | ck2 | col1 | col2
>  --+-+-+--+--
> 1 |   2 |   3 |6 |6
>
>  We have enough to satisfy page 2 now, our new page state: (ck1 = 2, ck2
> = 3).
>
>
>  *PAGE 3*
>
>  SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 3 LIMIT 2;
>  partitionkey | ck1 | ck2 | col1 | col2
> --+-+-+--+--
> 1 |   2 |   2 |5 |5
> 1 |   2 |   1 |4 |4
>
>  Great, we satisfied this page with only one query, page state: (ck1 = 2,
> ck2 = 1).
>
>
>  *PAGE 4*
>
>  SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 1 LIMIT 2;
> (0 rows)
>
>  Oops, our initial query was on the boundary of ck1, but this looks like
> any other time that the initial query returns < pageSize rows, we just move
> on to the next page:
>
>  SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 2 LIMIT 2;
> (0 rows)
>
>  Aha, we've exhausted ck1 as well, so there are no more pages, page 3
> actually pulled the last possible value; page 4 is empty, and we're all
> done.  Generally speaking you know you're done when your first clustering
> key is the only non-equality operator in the statement, and you got no rows
> back.
>
>
>
>
>

Re: Pagination support on Java Driver Query API

2015-02-12 Thread Ondřej Nešpor
There is a much easier way to do that (and I suppose the Java driver
does it this way):

page 1:

SELECT * FROM foo WHERE partitionkey = 1 limit 2;

 partitionkey | ck1 | ck2 | col1 | col2

--+-+-+--+--

1 |   1 |   3 |3 |3

1 |   1 |   2 |2 |2



page2:

SELECT * FROM foo WHERE partitionkey = 1 and (ck1, ck2) > (1,2) limit 2;

 partitionkey | ck1 | ck2 | col1 | col2

--+-+-+--+--

1 |   1 |   1 |1 |1

1 |   2 |   3 |6 |6




page 3:

SELECT * FROM foo WHERE partitionkey = 1 and (ck1, ck2) > (2,3) limit 2;

 partitionkey | ck1 | ck2 | col1 | col2

--+-+-+--+--

1 |   2 |   2 |5 |5

1 |   2 |   1 |4 |4



Basically you pass ck1 and ck2 values from the last row of the previous
result and tell C* you want results that are greater (next results in
case the first clustering key is ASC).



Andrew



On 12.2.2015 at 15:50, Eric Stevens wrote:
> Your page state then needs to track the last ck1 and last ck2 you
> saw.  Pages 2+ will end up needing to be up to two queries if the
> first query doesn't fill the page size.
>
> CREATE TABLE foo (
>   partitionkey int,
>   ck1 int,
>   ck2 int,
>   col1 int,
>   col2 int,
>   PRIMARY KEY ((partitionkey), ck1, ck2)
> ) WITH CLUSTERING ORDER BY (ck1 asc, ck2 desc);
>
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,1,1,1);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,2,2,2);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,3,3,3);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,1,4,4);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,2,5,5);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,3,6,6);
>
> If you're pulling the whole of partition 1 and your page size is 2,
> your first page looks like:
>
> *PAGE 1*
>
> SELECT * FROM foo WHERE partitionkey = 1 LIMIT 2;
>  partitionkey | ck1 | ck2 | col1 | col2
> --+-+-+--+--
> 1 |   1 |   3 |3 |3
> 1 |   1 |   2 |2 |2
>
> You got enough rows to satisfy the page, Your page state is taken from
> the last row: (ck1=1, ck2=2)
>
>
> *PAGE 2*
> Notice that you have a page state, and add some limiting clauses on
> the statement:
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 1 AND ck2 < 2 LIMIT 2;
>  partitionkey | ck1 | ck2 | col1 | col2
> --+-+-+--+--
> 1 |   1 |   1 |1 |1
>
> Oops, we didn't get enough rows to satisfy the page limit, so we need
> to continue on, we just need one more:
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 1 LIMIT 1;
>  partitionkey | ck1 | ck2 | col1 | col2
> --+-+-+--+--
> 1 |   2 |   3 |6 |6
>
> We have enough to satisfy page 2 now, our new page state: (ck1 = 2,
> ck2 = 3).
>
>
> *PAGE 3*
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 3 LIMIT 2;
>  partitionkey | ck1 | ck2 | col1 | col2
> --+-+-+--+--
> 1 |   2 |   2 |5 |5
> 1 |   2 |   1 |4 |4
>
> Great, we satisfied this page with only one query, page state: (ck1 =
> 2, ck2 = 1).  
>
>
> *PAGE 4*
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 1 LIMIT 2;
> (0 rows)
>
> Oops, our initial query was on the boundary of ck1, but this looks
> like any other time that the initial query returns < pageSize rows, we
> just move on to the next page:
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 2 LIMIT 2;
> (0 rows)
>
> Aha, we've exhausted ck1 as well, so there are no more pages, page 3
> actually pulled the last possible value; page 4 is empty, and we're
> all done.  Generally speaking you know you're done when your first
> clustering key is the only non-equality operator in the statement, and
> you got no rows back.
>
>
>
>
>
>
> On Wed, Feb 11, 2015 at 10:55 AM, Ajay wrote:
>
> Basically I am trying different queries with your approach.
>
> One such query is like
>
> Select * from mycf where condition on partition key order by ck1
> asc, ck2 desc where ck1 and ck2 are clustering keys in that order.
>
> Here, how do we achieve pagination support?
>
> Thanks
> Ajay
>
> On Feb 11, 2015 11:16 PM, "Ajay" wrote:
>
>
> Hi Eric,
>
> Thanks for your reply.

Re: Pagination support on Java Driver Query API

2015-02-12 Thread Ajay
Thanks Eric. I figured out the same but didn't get time to put it on the
mail. Thanks.

But it is highly tied to how data is stored internally in Cassandra:
how partition keys are used to distribute data (less likely to change, and we
do not depend directly on the partitioning algorithm) and how clustering keys
are used to sort the data within a partition (multi-level sorting, hence the
restrictions on the ORDER BY clause), which I think could change down the
line in Cassandra 3.x or 4.x for better storage or retrieval.

That said, I am hesitant to implement this client-side logic for pagination
because a) pages 2+ might need more than one query to Cassandra; b) the
implementation is tied to Cassandra's internal storage details, which can
change (though not often); c) in our case, we are building REST APIs which
will be deployed on Tomcat clusters, so whatever we cache to support
pagination needs to be cached in a distributed way for failover support.

Pagination support is best done on the server side (like ROWNUM in SQL) or
in the Java driver, to hide the internal details; it can be optimized better
there, since the server sends the paging state to the driver.

Thanks
Ajay
On Feb 12, 2015 8:22 PM, "Eric Stevens"  wrote:

> Your page state then needs to track the last ck1 and last ck2 you saw.
> Pages 2+ will end up needing to be up to two queries if the first query
> doesn't fill the page size.
>
> CREATE TABLE foo (
>   partitionkey int,
>   ck1 int,
>   ck2 int,
>   col1 int,
>   col2 int,
>   PRIMARY KEY ((partitionkey), ck1, ck2)
> ) WITH CLUSTERING ORDER BY (ck1 asc, ck2 desc);
>
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,1,1,1);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,2,2,2);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,3,3,3);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,1,4,4);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,2,5,5);
> INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,3,6,6);
>
> If you're pulling the whole of partition 1 and your page size is 2, your
> first page looks like:
>
> *PAGE 1*
>
> SELECT * FROM foo WHERE partitionkey = 1 LIMIT 2;
>  partitionkey | ck1 | ck2 | col1 | col2
> --+-+-+--+--
> 1 |   1 |   3 |3 |3
> 1 |   1 |   2 |2 |2
>
> You got enough rows to satisfy the page, Your page state is taken from the
> last row: (ck1=1, ck2=2)
>
>
> *PAGE 2*
> Notice that you have a page state, and add some limiting clauses on the
> statement:
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 1 AND ck2 < 2 LIMIT 2;
>  partitionkey | ck1 | ck2 | col1 | col2
> --+-+-+--+--
> 1 |   1 |   1 |1 |1
>
> Oops, we didn't get enough rows to satisfy the page limit, so we need to
> continue on, we just need one more:
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 1 LIMIT 1;
>  partitionkey | ck1 | ck2 | col1 | col2
> --+-+-+--+--
> 1 |   2 |   3 |6 |6
>
> We have enough to satisfy page 2 now, our new page state: (ck1 = 2, ck2 =
> 3).
>
>
> *PAGE 3*
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 3 LIMIT 2;
>  partitionkey | ck1 | ck2 | col1 | col2
> --+-+-+--+--
> 1 |   2 |   2 |5 |5
> 1 |   2 |   1 |4 |4
>
> Great, we satisfied this page with only one query, page state: (ck1 = 2,
> ck2 = 1).
>
>
> *PAGE 4*
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 1 LIMIT 2;
> (0 rows)
>
> Oops, our initial query was on the boundary of ck1, but this looks like
> any other time that the initial query returns < pageSize rows, we just move
> on to the next page:
>
> SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 2 LIMIT 2;
> (0 rows)
>
> Aha, we've exhausted ck1 as well, so there are no more pages, page 3
> actually pulled the last possible value; page 4 is empty, and we're all
> done.  Generally speaking you know you're done when your first clustering
> key is the only non-equality operator in the statement, and you got no rows
> back.
>
>
>
>
>
>
> On Wed, Feb 11, 2015 at 10:55 AM, Ajay  wrote:
>
>> Basically I am trying different queries with your approach.
>>
>> One such query is like
>>
>> Select * from mycf where condition on partition key order by ck1 asc, ck2
>> desc where ck1 and ck2 are clustering keys in that order.
>>
>> Here how do we achieve pagination support?
>>

Re: Pagination support on Java Driver Query API

2015-02-12 Thread Eric Stevens
Your page state then needs to track the last ck1 and last ck2 you saw.
Pages 2+ will end up needing to be up to two queries if the first query
doesn't fill the page size.

CREATE TABLE foo (
  partitionkey int,
  ck1 int,
  ck2 int,
  col1 int,
  col2 int,
  PRIMARY KEY ((partitionkey), ck1, ck2)
) WITH CLUSTERING ORDER BY (ck1 asc, ck2 desc);

INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,1,1,1);
INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,2,2,2);
INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,1,3,3,3);
INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,1,4,4);
INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,2,5,5);
INSERT INTO foo (partitionkey, ck1, ck2, col1, col2) VALUES (1,2,3,6,6);

If you're pulling the whole of partition 1 and your page size is 2, your
first page looks like:

*PAGE 1*

SELECT * FROM foo WHERE partitionkey = 1 LIMIT 2;
 partitionkey | ck1 | ck2 | col1 | col2
--+-+-+--+--
1 |   1 |   3 |3 |3
1 |   1 |   2 |2 |2

You got enough rows to satisfy the page. Your page state is taken from the
last row: (ck1=1, ck2=2)


*PAGE 2*
Notice that you have a page state, and add some limiting clauses on the
statement:

SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 1 AND ck2 < 2 LIMIT 2;
 partitionkey | ck1 | ck2 | col1 | col2
--+-+-+--+--
1 |   1 |   1 |1 |1

Oops, we didn't get enough rows to satisfy the page limit, so we need to
continue on, we just need one more:

SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 1 LIMIT 1;
 partitionkey | ck1 | ck2 | col1 | col2
--+-+-+--+--
1 |   2 |   3 |6 |6

We have enough to satisfy page 2 now, our new page state: (ck1 = 2, ck2 =
3).


*PAGE 3*

SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 3 LIMIT 2;
 partitionkey | ck1 | ck2 | col1 | col2
--+-+-+--+--
1 |   2 |   2 |5 |5
1 |   2 |   1 |4 |4

Great, we satisfied this page with only one query, page state: (ck1 = 2,
ck2 = 1).


*PAGE 4*

SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 2 AND ck2 < 1 LIMIT 2;
(0 rows)

Oops, our initial query was on the boundary of ck1, but this looks like any
other time that the initial query returns < pageSize rows, so we just move on
to the next page:

SELECT * FROM foo WHERE partitionkey = 1 AND ck1 > 2 LIMIT 2;
(0 rows)

Aha, we've exhausted ck1 as well, so there are no more pages, page 3
actually pulled the last possible value; page 4 is empty, and we're all
done.  Generally speaking you know you're done when your first clustering
key is the only non-equality operator in the statement, and you got no rows
back.






On Wed, Feb 11, 2015 at 10:55 AM, Ajay  wrote:

> Basically I am trying different queries with your approach.
>
> One such query is like
>
> Select * from mycf where condition on partition key order by ck1 asc, ck2
> desc where ck1 and ck2 are clustering keys in that order.
>
> Here, how do we achieve pagination support?
>
> Thanks
> Ajay
> On Feb 11, 2015 11:16 PM, "Ajay"  wrote:
>
>>
>> Hi Eric,
>>
>> Thanks for your reply.
>>
>> I am using Cassandra 2.0.11, and in that version I cannot append a condition
>> like last clustering key column > value of the last row in the previous
>> batch. It fails with "Preceding column is either not restricted or by a
>> non-EQ relation", which means I need to specify an equality condition for
>> all preceding clustering key columns. With this I cannot get the pagination
>> correct.
>>
>> Thanks
>> Ajay
>> > I can't believe that everyone reads & processes all rows at once (without
>> pagination).
>>
>> Probably not too many people try to read all rows in a table as a single
>> rolling operation with a standard client driver.  But those who do would
>> use token() to keep track of where they are and be able to resume with that
>> as well.
>>
>> But it sounds like you're talking about paginating a subset of data -
>> larger than you want to process as a unit, but prefiltered by some other
>> criteria which prevents you from being able to rely on token().  For this
>> there is no general purpose solution, but it typically involves you
>> maintaining your own paging state, typically keeping track of the last
>> partitioning and clustering key seen, and using that to construct your next
>> query.
>>
>> For example, we have client queries which can span several partitioning
>> keys.  We make sure that the List of partition keys generated by a given
>> client query List(Pq) is deterministic, then our paging state is

Re: Pagination support on Java Driver Query API

2015-02-11 Thread Ajay
Basically I am trying different queries with your approach.

One such query is like

Select * from mycf where condition on partition key order by ck1 asc, ck2
desc where ck1 and ck2 are clustering keys in that order.

Here, how do we achieve pagination support?

Thanks
Ajay
On Feb 11, 2015 11:16 PM, "Ajay"  wrote:

>
> Hi Eric,
>
> Thanks for your reply.
>
> I am using Cassandra 2.0.11, and in that version I cannot append a condition
> like last clustering key column > value of the last row in the previous
> batch. It fails with "Preceding column is either not restricted or by a
> non-EQ relation", which means I need to specify an equality condition for
> all preceding clustering key columns. With this I cannot get the pagination
> correct.
>
> Thanks
> Ajay
> > I can't believe that everyone reads & processes all rows at once (without
> pagination).
>
> Probably not too many people try to read all rows in a table as a single
> rolling operation with a standard client driver.  But those who do would
> use token() to keep track of where they are and be able to resume with that
> as well.
>
> But it sounds like you're talking about paginating a subset of data -
> larger than you want to process as a unit, but prefiltered by some other
> criteria which prevents you from being able to rely on token().  For this
> there is no general purpose solution, but it typically involves you
> maintaining your own paging state, typically keeping track of the last
> partitioning and clustering key seen, and using that to construct your next
> query.
>
> For example, we have client queries which can span several partitioning
> keys.  We make sure that the List of partition keys generated by a given
> client query List(Pq) is deterministic, then our paging state is the
> index offset of the final Pq in the response, plus the value of the final
> clustering column.  A query coming in with a paging state attached to it
> starts the next set of queries from the provided Pq offset where
> clusteringKey > the provided value.
>
> So if you can just track partition key offset (if spanning multiple
> partitions), and clustering key offset, you can construct your next query
> from those instead.
>
> On Tue, Feb 10, 2015 at 6:58 PM, Ajay  wrote:
>
>> Thanks Alex.
>>
>> But is there any workaround possible? I can't believe that everyone reads
>> & processes all rows at once (without pagination).
>>
>> Thanks
>> Ajay
>> On Feb 10, 2015 11:46 PM, "Alex Popescu"  wrote:
>>
>>>
>>> On Tue, Feb 10, 2015 at 4:59 AM, Ajay  wrote:
>>>
>>>> 1) The Java driver implicitly supports pagination in the ResultSet (using
>>>> Iterator), which can be controlled through FetchSize. But it is limited in
>>>> a way that we cannot skip or go back. The FetchState is not exposed.
>>>
>>>
>>> Cassandra doesn't support skipping so this is not really a limitation of
>>> the driver.
>>>
>>>
>>> --
>>>
>>> [:>-a)
>>>
>>> Alex Popescu
>>> Sen. Product Manager @ DataStax
>>> @al3xandru
>>>
>>>
>>
>


Re: Pagination support on Java Driver Query API

2015-02-11 Thread Ajay
Hi Eric,

Thanks for your reply.

I am using Cassandra 2.0.11, and in that version I cannot append a condition
like last clustering key column > value of the last row in the previous
batch. It fails with "Preceding column is either not restricted or by a
non-EQ relation", which means I need to specify an equality condition for all
preceding clustering key columns. With this I cannot get the pagination
correct.

Thanks
Ajay
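
To make the restriction concrete with the foo table from Eric's example: the
first statement below is rejected with that error, while the second is
accepted (a sketch):

-- rejected: ck2 is restricted but the preceding clustering column ck1 is not
SELECT * FROM foo WHERE partitionkey = 1 AND ck2 < 2 LIMIT 2;

-- accepted: every preceding clustering column is restricted by equality
SELECT * FROM foo WHERE partitionkey = 1 AND ck1 = 1 AND ck2 < 2 LIMIT 2;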
> I can't believe that everyone reads & processes all rows at once (without
pagination).

Probably not too many people try to read all rows in a table as a single
rolling operation with a standard client driver.  But those who do would
use token() to keep track of where they are and be able to resume with that
as well.

But it sounds like you're talking about paginating a subset of data -
larger than you want to process as a unit, but prefiltered by some other
criteria which prevents you from being able to rely on token().  For this
there is no general purpose solution, but it typically involves you
maintaining your own paging state, typically keeping track of the last
partitioning and clustering key seen, and using that to construct your next
query.

For example, we have client queries which can span several partitioning
keys.  We make sure that the List of partition keys generated by a given
client query List(Pq) is deterministic, then our paging state is the index
offset of the final Pq in the response, plus the value of the final
clustering column.  A query coming in with a paging state attached to it
starts the next set of queries from the provided Pq offset where
clusteringKey > the provided value.

So if you can just track partition key offset (if spanning multiple
partitions), and clustering key offset, you can construct your next query
from those instead.

On Tue, Feb 10, 2015 at 6:58 PM, Ajay  wrote:

> Thanks Alex.
>
> But is there any workaround possible? I can't believe that everyone reads
> & processes all rows at once (without pagination).
>
> Thanks
> Ajay
> On Feb 10, 2015 11:46 PM, "Alex Popescu"  wrote:
>
>>
>> On Tue, Feb 10, 2015 at 4:59 AM, Ajay  wrote:
>>
>>> 1) The Java driver implicitly supports pagination in the ResultSet (using
>>> Iterator), which can be controlled through FetchSize. But it is limited in
>>> a way that we cannot skip or go back. The FetchState is not exposed.
>>
>>
>> Cassandra doesn't support skipping so this is not really a limitation of
>> the driver.
>>
>>
>> --
>>
>> [:>-a)
>>
>> Alex Popescu
>> Sen. Product Manager @ DataStax
>> @al3xandru
>>
>>
>


Re: Pagination support on Java Driver Query API

2015-02-11 Thread Eric Stevens
> I can't believe that everyone reads & processes all rows at once (without
pagination).

Probably not too many people try to read all rows in a table as a single
rolling operation with a standard client driver.  But those who do would
use token() to keep track of where they are and be able to resume with that
as well.

But it sounds like you're talking about paginating a subset of data -
larger than you want to process as a unit, but prefiltered by some other
criteria which prevents you from being able to rely on token().  For this
there is no general purpose solution, but it typically involves you
maintaining your own paging state, typically keeping track of the last
partitioning and clustering key seen, and using that to construct your next
query.

For example, we have client queries which can span several partitioning
keys.  We make sure that the List of partition keys generated by a given
client query List(Pq) is deterministic, then our paging state is the index
offset of the final Pq in the response, plus the value of the final
clustering column.  A query coming in with a paging state attached to it
starts the next set of queries from the provided Pq offset where
clusteringKey > the provided value.

So if you can just track partition key offset (if spanning multiple
partitions), and clustering key offset, you can construct your next query
from those instead.
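
As a sketch of what that resume query can look like (table and column names
are illustrative; the bind values come from the saved paging state):

-- resume within the current partition, just past the last clustering key seen
SELECT * FROM my_table WHERE pk = ? AND ck > ? LIMIT ?;

-- when this returns fewer rows than the page size, advance to the next
-- partition key in List(Pq) and drop the clustering restriction
SELECT * FROM my_table WHERE pk = ? LIMIT ?;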

On Tue, Feb 10, 2015 at 6:58 PM, Ajay  wrote:

> Thanks Alex.
>
> But is there any workaround possible? I can't believe that everyone reads
> & processes all rows at once (without pagination).
>
> Thanks
> Ajay
> On Feb 10, 2015 11:46 PM, "Alex Popescu"  wrote:
>
>>
>> On Tue, Feb 10, 2015 at 4:59 AM, Ajay  wrote:
>>
>>> 1) The Java driver implicitly supports pagination in the ResultSet (using
>>> Iterator), which can be controlled through FetchSize. But it is limited in
>>> a way that we cannot skip or go back. The FetchState is not exposed.
>>
>>
>> Cassandra doesn't support skipping so this is not really a limitation of
>> the driver.
>>
>>
>> --
>>
>> [:>-a)
>>
>> Alex Popescu
>> Sen. Product Manager @ DataStax
>> @al3xandru
>>
>>
>


Re: Pagination support on Java Driver Query API

2015-02-10 Thread Ajay
Thanks Alex.

But is there any workaround possible? I can't believe that everyone reads &
processes all rows at once (without pagination).

Thanks
Ajay
On Feb 10, 2015 11:46 PM, "Alex Popescu"  wrote:

>
> On Tue, Feb 10, 2015 at 4:59 AM, Ajay  wrote:
>
>> 1) The Java driver implicitly supports pagination in the ResultSet (using
>> Iterator), which can be controlled through FetchSize. But it is limited in
>> a way that we cannot skip or go back. The FetchState is not exposed.
>
>
> Cassandra doesn't support skipping so this is not really a limitation of
> the driver.
>
>
> --
>
> [:>-a)
>
> Alex Popescu
> Sen. Product Manager @ DataStax
> @al3xandru
>
>


Re: Pagination support on Java Driver Query API

2015-02-10 Thread Alex Popescu
On Tue, Feb 10, 2015 at 4:59 AM, Ajay  wrote:

> 1) The Java driver implicitly supports pagination in the ResultSet (using
> Iterator), which can be controlled through FetchSize. But it is limited in
> a way that we cannot skip or go back. The FetchState is not exposed.


Cassandra doesn't support skipping so this is not really a limitation of
the driver.


-- 

[:>-a)

Alex Popescu
Sen. Product Manager @ DataStax
@al3xandru


Pagination support on Java Driver Query API

2015-02-10 Thread Ajay
Hi,

I am working on exposing the Cassandra query APIs (Java driver) as REST APIs
for our internal project.

To support pagination, I looked at the Cassandra documentation, source code,
and other forums.
What I mean by pagination support is as below:

1) Client fires a query to the REST server
2) Server prepares the statement, caches the query, and returns a query id
(unique id)
3) Server takes the query id, offset, and limit, returns the set of rows
according to the offset and limit, and also returns the offset of the last
returned row
4) Client makes subsequent calls to the server with the offset returned by
the server until all rows are returned. In case one call fails or times
out, the client will make the call again.

Below are the details I found:

1) The Java driver implicitly supports pagination in the ResultSet (using
Iterator), which can be controlled through FetchSize. But it is limited in a
way that we cannot skip or go back. The FetchState is not exposed.

2) Using the token() function on the clustering keys of the last returned row,
we can skip the rows already returned, and using the LIMIT keyword we can
limit the number of rows. But the problem I see is that the token() function
cannot be used if the query contains an ORDER BY clause.

Is there any other way to achieve the pagination support?

Thanks
Ajay
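
For the token()-based skipping in point 2, the usual pattern pages across
partitions in token order of the partition key; a minimal sketch (table and
column names are illustrative, with the last partition key seen as the bind
value):

SELECT * FROM my_table WHERE token(pk) > token(?) LIMIT 50;

As noted above, this only yields token order, so it cannot honor an ORDER BY
on clustering columns across partitions.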


Re: Internal pagination in secondary index queries

2014-12-31 Thread Sam Klock
Thanks.  I've opened the following issue to track this:

https://issues.apache.org/jira/browse/CASSANDRA-8550

SK

On 2014-12-30 11:26, Tyler Hobbs wrote:
> 
> On Mon, Dec 29, 2014 at 5:20 PM, Sam Klock wrote:
> 
> 
> Our investigation led us to logic in Cassandra used to paginate scans
> of rows in indexes on composites.  The issue seems to be the short
> algorithm Cassandra uses to select the size of the pages for the scan,
> partially given on the following two lines (from
> o.a.c.db.index.composites.CompositesSearcher):
> 
> private int meanColumns =
> Math.max(index.getIndexCfs().getMeanColumns(), 1);
> private int rowsPerQuery = Math.max(Math.min(filter.maxRows(),
> filter.maxColumns() / meanColumns), 2);
> 
> The value computed for rowsPerQuery appears to be the page size.
> 
> Based on our reading of the code, unless the value obtained for
> meanColumns is very small, a large query-level page size is used, or
> the DISTINCT keyword is used, the value for (filter.maxColumns() /
> meanColumns) always ends up being small enough that the page size is
> 2.  This seems to be the case both for very low-cardinality indexes
> (two different indexed values) and for indexes with higher
> cardinalities as long as the number of entries per index row is more
> than a few thousand.
> 
> Does anyone here have relevant experience with secondary indexes that
> might shed light on the design choice here?  In particular, can anyone
> (perhaps the developers?) explain what this algorithm is intended to do
> and what we might do to safely get around this limitation?
> 
> 
> Hmm, this does seem suspect.  I'm not sure off the top of my head why
> the mean columns are used at all.  Each index entry (in other words,
> each cell in the index table) should correspond to one result row, so it
> seems like the slice limit for the index table should only be based on
> maxRows/maxColumns (or perhaps better, filter.maxResults()).
> 
> Can you go ahead and open a JIRA ticket to look into this?
>  
> 
> 
> Also (to the developers watching this list): is this the sort of
> question we should be addressing to the dev list directly?
> 
> 
> Yes, you can either send a message to the dev list or open a JIRA ticket
> when you're pretty sure you've found a bug.  We don't mind confirming
> and closing a ticket if it's not a bug.
> 
> Thanks!
> 
> -- 
> Tyler Hobbs
> DataStax 



Re: Internal pagination in secondary index queries

2014-12-30 Thread Tyler Hobbs
On Mon, Dec 29, 2014 at 5:20 PM, Sam Klock  wrote:
>
>
> Our investigation led us to logic in Cassandra used to paginate scans
> of rows in indexes on composites.  The issue seems to be the short
> algorithm Cassandra uses to select the size of the pages for the scan,
> partially given on the following two lines (from
> o.a.c.db.index.composites.CompositesSearcher):
>
> private int meanColumns = Math.max(index.getIndexCfs().getMeanColumns(),
> 1);
> private int rowsPerQuery = Math.max(Math.min(filter.maxRows(),
> filter.maxColumns() / meanColumns), 2);
>
> The value computed for rowsPerQuery appears to be the page size.
>
> Based on our reading of the code, unless the value obtained for
> meanColumns is very small, a large query-level page size is used, or
> the DISTINCT keyword is used, the value for (filter.maxColumns() /
> meanColumns) always ends up being small enough that the page size is
> 2.  This seems to be the case both for very low-cardinality indexes
> (two different indexed values) and for indexes with higher
> cardinalities as long as the number of entries per index row is more
> than a few thousand.
>
> Does anyone here have relevant experience with secondary indexes that
> might shed light on the design choice here?  In particular, can anyone
> (perhaps the developers?) explain what this algorithm is intended to do
> and what we might do to safely get around this limitation?
>

Hmm, this does seem suspect.  I'm not sure off the top of my head why the
mean columns are used at all.  Each index entry (in other words, each cell
in the index table) should correspond to one result row, so it seems like
the slice limit for the index table should only be based on
maxRows/maxColumns (or perhaps better, filter.maxResults()).

Can you go ahead and open a JIRA ticket to look into this?


>
> Also (to the developers watching this list): is this the sort of
> question we should be addressing to the dev list directly?


Yes, you can either send a message to the dev list or open a JIRA ticket
when you're pretty sure you've found a bug.  We don't mind confirming and
closing a ticket if it's not a bug.

Thanks!

-- 
Tyler Hobbs
DataStax 


Re: Internal pagination in secondary index queries

2014-12-29 Thread Jonathan Haddad
Secondary indexes are there for convenience, not performance.  If you're
looking for something performant, you'll need to maintain your own indexes.
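
A hand-maintained index is essentially another table keyed for the lookup and
paged with a clustering-key cursor; a minimal sketch (names are illustrative,
with the last item_id seen as the bind value):

CREATE TABLE items_by_status (
    status text,
    item_id uuid,
    PRIMARY KEY (status, item_id)
);

-- written alongside the base table on every insert/update, then paged with:
SELECT item_id FROM items_by_status
WHERE status = 'active' AND item_id > ?
LIMIT 100;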


On Mon Dec 29 2014 at 3:22:58 PM Sam Klock  wrote:

> Hi folks,
>
> Perhaps this is a question better addressed to the Cassandra developers
> directly, but I thought I'd ask it here first.  We've recently been
> benchmarking certain uses of secondary indexes in Cassandra 2.1.x, and
> we've noticed that when the number of items in an index reaches beyond
> some threshold (perhaps several tens of thousands depending on the
> cardinality) performance begins to degrade substantially.  This is
> particularly the case when the client does things it probably shouldn't
> do (like manually paginate results), but we suspect there's at least
> one issue in Cassandra having an impact here that we'd like to
> understand better.
>
> Our investigation led us to logic in Cassandra used to paginate scans
> of rows in indexes on composites.  The issue seems to be the short
> algorithm Cassandra uses to select the size of the pages for the scan,
> partially given on the following two lines (from
> o.a.c.db.index.composites.CompositesSearcher):
>
> private int meanColumns = Math.max(index.getIndexCfs().getMeanColumns(),
> 1);
> private int rowsPerQuery = Math.max(Math.min(filter.maxRows(),
> filter.maxColumns() / meanColumns), 2);
>
> The value computed for rowsPerQuery appears to be the page size.
>
> Based on our reading of the code, unless the value obtained for
> meanColumns is very small, a large query-level page size is used, or
> the DISTINCT keyword is used, the value for (filter.maxColumns() /
> meanColumns) always ends up being small enough that the page size is
> 2.  This seems to be the case both for very low-cardinality indexes
> (two different indexed values) and for indexes with higher
> cardinalities as long as the number of entries per index row is more
> than a few thousand.
>
> The fact that we consistently get such a small page size appears to
> have a substantial impact on performance.  The overhead is simply
> devastating, especially since it looks like the pages are likely to
> overlap with each other (the last element of one page is the first
> element of the next).  To wit: if we fix the index page size in code to
> a very large number, index queries in our environment that previously
> required over two minutes to complete can finish in under ten seconds.
>
> Some (but probably not this much) overhead might be acceptable if the
> algorithm is intended to achieve other worthy goals (safety?).  But
> what's puzzling to us is that we can't figure out what it's intended to
> do.  We suspect the algorithm is simply buggy, but we'd like insight
> from knowledgeable parties before we draw that conclusion and try to
> find a different solution.
>
> Does anyone here have relevant experience with secondary indexes that
> might shed light on the design choice here?  In particular, can anyone
> (perhaps the developers?) explain what this algorithm is intended to do
> and what we might do to safely get around this limitation?
>
> Also (to the developers watching this list): is this the sort of
> question we should be addressing to the dev list directly?
>
> Thanks,
> SK
>


Internal pagination in secondary index queries

2014-12-29 Thread Sam Klock
Hi folks,

Perhaps this is a question better addressed to the Cassandra developers 
directly, but I thought I'd ask it here first.  We've recently been 
benchmarking certain uses of secondary indexes in Cassandra 2.1.x, and 
we've noticed that when the number of items in an index reaches beyond 
some threshold (perhaps several tens of thousands depending on the  
cardinality) performance begins to degrade substantially.  This is  
particularly the case when the client does things it probably shouldn't 
do (like manually paginate results), but we suspect there's at least  
one issue in Cassandra having an impact here that we'd like to  
understand better.

Our investigation led us to logic in Cassandra used to paginate scans 
of rows in indexes on composites.  The issue seems to be the short 
algorithm Cassandra uses to select the size of the pages for the scan, 
partially given on the following two lines (from 
o.a.c.db.index.composites.CompositesSearcher):

private int meanColumns = Math.max(index.getIndexCfs().getMeanColumns(), 1);
private int rowsPerQuery = Math.max(Math.min(filter.maxRows(), 
filter.maxColumns() / meanColumns), 2);

The value computed for rowsPerQuery appears to be the page size.

Based on our reading of the code, unless the value obtained for 
meanColumns is very small, a large query-level page size is used, or 
the DISTINCT keyword is used, the value for (filter.maxColumns() / 
meanColumns) always ends up being small enough that the page size is 
2.  This seems to be the case both for very low-cardinality indexes 
(two different indexed values) and for indexes with higher 
cardinalities as long as the number of entries per index row is more 
than a few thousand.
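
To see how that plays out, here is the same computation with hypothetical but
plausible numbers plugged in (a standalone sketch, not the actual Cassandra
code path):

public class PageSizeDemo {
    public static void main(String[] args) {
        int maxRows = 10000;     // filter.maxRows()
        int maxColumns = 10000;  // filter.maxColumns(), i.e. the query page size
        int meanColumns = 5000;  // getMeanColumns(): average cells per index row
        int rowsPerQuery = Math.max(Math.min(maxRows, maxColumns / meanColumns), 2);
        System.out.println(rowsPerQuery);  // prints 2: the internal page size
    }
}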

The fact that we consistently get such a small page size appears to 
have a substantial impact on performance.  The overhead is simply 
devastating, especially since it looks like the pages are likely to 
overlap with each other (the last element of one page is the first 
element of the next).  To wit: if we fix the index page size in code to 
a very large number, index queries in our environment that previously 
required over two minutes to complete can finish in under ten seconds.

Some (but probably not this much) overhead might be acceptable if the 
algorithm is intended to achieve other worthy goals (safety?).  But 
what's puzzling to us is that we can't figure out what it's intended to 
do.  We suspect the algorithm is simply buggy, but we'd like insight 
from knowledgeable parties before we draw that conclusion and try to 
find a different solution.

Does anyone here have relevant experience with secondary indexes that 
might shed light on the design choice here?  In particular, can anyone 
(perhaps the developers?) explain what this algorithm is intended to do 
and what we might do to safely get around this limitation?

Also (to the developers watching this list): is this the sort of 
question we should be addressing to the dev list directly?

Thanks,
SK


Re: Cassandra (2.0.4) pagination and total records?

2014-03-18 Thread DuyHai Doan
With Cassandra 2.0.x and Java driver 2.0.0 you can set the fetch size on
the query and then use ResultSet.iterator(). It will iterate over your data
set by loading batches of size = fetch size.
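
For illustration, a minimal sketch of that (Java driver 2.0 API; the contact
point, keyspace and table names here are made up):

import com.datastax.driver.core.*;

public class FetchSizeDemo {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_ks");
        Statement stmt = new SimpleStatement("SELECT * FROM mytable");
        stmt.setFetchSize(50);  // rows pulled per round trip
        ResultSet rs = session.execute(stmt);
        for (Row row : rs) {    // the iterator fetches further pages as needed
            System.out.println(row);
        }
        cluster.close();
    }
}
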
On Mar 18, 2014 1:39 AM, "Philip G"  wrote:

> Thanks for the links.
>
> As I'm messing around with CQL, I'm realizing Cassandra isn't going to do
> what I need. Quite simply, here's a basic layout of my table:
>
> myTable (
> visit_dt timestamp,
> cid ascii,
> company text,
> // ... other stuff
>primary key (visit_dt, cid)
> );
> index on (company)
>
> My query starts off with visit_dt IN ('2014-01-17'). In Cassandra, I
> essentially get back just 1 wide row (but shows as many within CQL3). I can
> filter that via AND company='my company' due to the index. However, if I
> LIMIT 10; there isn't a way to get "the next 10" records as token() only
> works on the partition key, and each row has the same partition key.
>
> Or am I missing something? Is there a way I've not discovered to get "the
> next 10" on a single wide row?
>
>
> ---
> Philip
> g...@gpcentre.net
> http://www.gpcentre.net/
>
>
> On Mon, Mar 17, 2014 at 5:12 PM, Tupshin Harper wrote:
>
>> Read the automatic paging portion of this post :
>> http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0
>> On Mar 17, 2014 8:09 PM, "Philip G"  wrote:
>>
>>> On Mon, Mar 17, 2014 at 4:54 PM, Robert Coli wrote:
>>>
 The form of your question suggests you are Doing It Wrong, FWIW.

>>>
>>>
>>> Okay, let me ask a different question: how do you go about data browsing
>>> in a CQL3 table? Especially in situations where a single query could return
>>> a couple thousand records, and we want to limit it to 100 at a time.
>>>
>>> Please, feel free to point me in the right direction, if necessary. I
>>> admit I'm still figuring out Cassandra/CQL. But my knowledge has been
>>> exponentially expanding on a daily basis. I want to understand this more,
>>> and possible solutions to the problems I'm running into migrating from an
>>> RDBMS (mssql) to Cassandra. I've figured out a lot of stuff, but have not quite
>>> resolved this use-case.
>>>
>>> Thanks,
>>>
>>> ---
>>> Philip
>>> g...@gpcentre.net
>>> http://www.gpcentre.net/
>>>
>>
>


Re: Cassandra (2.0.4) pagination and total records?

2014-03-17 Thread Philip G
Thanks for the links.

As I'm messing around with CQL, I'm realizing Cassandra isn't going to do
what I need. Quite simply, here's a basic layout of my table:

myTable (
    visit_dt timestamp,
    cid ascii,
    company text,
    // ... other stuff
    primary key (visit_dt, cid)
);
index on (company)

My query starts off with visit_dt IN ('2014-01-17'). In Cassandra, I
essentially get back just 1 wide row (but shows as many within CQL3). I can
filter that via AND company='my company' due to the index. However, if I
LIMIT 10; there isn't a way to get "the next 10" records as token() only
works on the partition key, and each row has the same partition key.

Or am I missing something? Is there a way I've not discovered to get "the
next 10" on a single wide row?


---
Philip
g...@gpcentre.net
http://www.gpcentre.net/


On Mon, Mar 17, 2014 at 5:12 PM, Tupshin Harper  wrote:

> Read the automatic paging portion of this post :
> http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0
> On Mar 17, 2014 8:09 PM, "Philip G"  wrote:
>
>> On Mon, Mar 17, 2014 at 4:54 PM, Robert Coli wrote:
>>
>>> The form of your question suggests you are Doing It Wrong, FWIW.
>>>
>>
>>
>> Okay, let me ask a different question: how do you go about data browsing in
>> a CQL3 table? Especially in situations where a single query could return a
>> couple thousand records, and we want to limit it to 100 at a time.
>>
>> Please, feel free to point me in the right direction, if necessary. I
>> admit I'm still figuring out Cassandra/CQL. But my knowledge has been
>> exponentially expanding on a daily basis. I want to understand this more,
>> and possible solutions to the problems I'm running into migrating from an
>> RDBMS (mssql) to Cassandra. I've figured out a lot of stuff, but have not quite
>> resolved this use-case.
>>
>> Thanks,
>>
>> ---
>> Philip
>> g...@gpcentre.net
>> http://www.gpcentre.net/
>>
>


Re: Cassandra (2.0.4) pagination and total records?

2014-03-17 Thread Tupshin Harper
Read the automatic paging portion of this post :
http://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0
On Mar 17, 2014 8:09 PM, "Philip G"  wrote:

> On Mon, Mar 17, 2014 at 4:54 PM, Robert Coli  wrote:
>
>> The form of your question suggests you are Doing It Wrong, FWIW.
>>
>
>
> Okay, let me ask a different question: how do you go about data browsing in
> a CQL3 table? Especially in situations where a single query could return a
> couple thousand records, and we want to limit it to 100 at a time.
>
> Please, feel free to point me in the right direction, if necessary. I
> admit I'm still figuring out Cassandra/CQL. But my knowledge has been
> exponentially expanding on a daily basis. I want to understand this more,
> and possible solutions to the problems I'm running into migrating from an
> RDBMS (mssql) to Cassandra. I've figured out a lot of stuff, but have not quite
> resolved this use-case.
>
> Thanks,
>
> ---
> Philip
> g...@gpcentre.net
> http://www.gpcentre.net/
>


Re: Cassandra (2.0.4) pagination and total records?

2014-03-17 Thread Iain Finlayson
Hi Philip,

Read this blog post

http://www.wentnet.com/blog/?p=24

It talks about COUNT but might give some answers to your first question.

On Monday, March 17, 2014, Philip G  wrote:

> On Mon, Mar 17, 2014 at 4:54 PM, Robert Coli  wrote:
>
>> The form of your question suggests you are Doing It Wrong, FWIW.
>>
>
>
> Okay, let me ask a different question: how do you go about data browsing in
> a CQL3 table? Especially in situations where a single query could return a
> couple thousand records, and we want to limit it to 100 at a time.
>
> Please, feel free to point me in the right direction, if necessary. I
> admit I'm still figuring out Cassandra/CQL. But my knowledge has been
> exponentially expanding on a daily basis. I want to understand this more,
> and possible solutions to the problems I'm running into migrating from an
> RDBMS (mssql) to Cassandra. I've figured out a lot of stuff, but have not quite
> resolved this use-case.
>
> Thanks,
>
> ---
> Philip
> g...@gpcentre.net 
> http://www.gpcentre.net/
>


-- 
Iain Finlayson
Solutions Engineer
718-483-6427
i...@datastax.com






Re: Cassandra (2.0.4) pagination and total records?

2014-03-17 Thread Philip G
On Mon, Mar 17, 2014 at 4:54 PM, Robert Coli  wrote:

> The form of your question suggests you are Doing It Wrong, FWIW.
>


Okay, let me ask a different question: how do you go about data browsing in a
CQL3 table? Especially in situations where a single query could return a
couple thousand records, and we want to limit it to 100 at a time.

Please, feel free to point me in the right direction, if necessary. I admit
I'm still figuring out Cassandra/CQL. But my knowledge has been
exponentially expanding on a daily basis. I want to understand this more,
and possible solutions to the problems I'm running into migrating from an
RDBMS (mssql) to Cassandra. I've figured out a lot of stuff, but have not quite
resolved this use-case.

Thanks,

---
Philip
g...@gpcentre.net
http://www.gpcentre.net/


Re: Cassandra (2.0.4) pagination and total records?

2014-03-17 Thread Robert Coli
On Mon, Mar 17, 2014 at 4:27 PM, Philip G  wrote:

> Is there a way to get "total number of records" when working with limit
> and internal pagination? Every example I found online was purely about
> using LIMIT and sort_col > token(sort_col). Nothing about getting the total
> matching records.
>

The form of your question suggests you are Doing It Wrong, FWIW.

=Rob


Cassandra (2.0.4) pagination and total records?

2014-03-17 Thread Philip G
Is there a way to get "total number of records" when working with limit and
internal pagination? Every example I found online was purely about using
LIMIT and sort_col > token(sort_col). Nothing about getting the total
matching records.

(PS: if there's a better group to ask CQL questions, please let me know.
Thanks)

---
Philip
g...@gpcentre.net
http://www.gpcentre.net/


Re: Astyanax - multiple key search with pagination

2013-12-30 Thread Aaron Morton
You will need to paginate the list of keys to read in your app. 

Cheers


-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 21/12/2013, at 12:58 pm, Parag Patel  wrote:

> Hi,
>  
> I’m using Astyanax and trying to search for multiple keys with pagination.
> I tried “.getKeySlice” with a list of primary keys, but it doesn’t allow
> pagination.  Does anyone know how to tackle this issue with Astyanax?
>  
> Parag



Re: Cassandra python pagination

2013-12-23 Thread Aaron Morton
> Is there something wrong with it? Here 1234555665_53323232 and
> 2344555665_53323232 are super columns. Also, if I have to represent this data
> with the new composite comparator, how will I accomplish that?
> 
> 

Composite types via pycassa 
http://pycassa.github.io/pycassa/assorted/composite_types.html?highlight=composite

Create a composite where the super column value is the first part and the
second part is the column name; this is basically what cql3 does.

You will have to make all columns the same type though.

Or use CQL 3, it works well for these sorts of models. 
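
For example, a rough CQL 3 sketch of the same model without super columns (all
names here are hypothetical): the old super column name becomes a clustering
column, so one partition holds all entries for a row key, sorted by entry_id:

import com.datastax.driver.core.*;

public class Cql3ModelSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_ks");
        // entry_id plays the role of the old super column name,
        // e.g. '1234555665_53323232'.
        session.execute(
            "CREATE TABLE twitter_instagram (" +
            "  row_key text, entry_id text," +
            "  approved text, score int, likes int, screen_name text," +
            "  PRIMARY KEY (row_key, entry_id))");
        cluster.close();
    }
}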

Cheers


-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 20/12/2013, at 7:22 am, Kumar Ranjan  wrote:

> Rob - I got a question following your advice. This is how, I define my column 
> family 
> validators = {
>     'approved':     'UTF8Type',
>     'tid':          'UTF8Type',
>     'iid':          'UTF8Type',
>     'score':        'IntegerType',
>     'likes':        'IntegerType',
>     'retweet':      'IntegerType',
>     'favorite':     'IntegerType',
>     'screen_name':  'UTF8Type',
>     'created_date': 'UTF8Type',
>     'expanded_url': 'UTF8Type',
>     'embedly_data': 'BytesType',
> }
> 
> SYSTEM_MANAGER.create_column_family('KeySpaceNNN', 'Twitter_Instagram', 
> default_validation_class='UTF8Type', super=True, comparator='UTF8Type', 
> key_validation_class='UTF8Type', column_validation_classes=validator)
> 
> Actual data representation:
> 
> 'row_key': {'1234555665_53323232': {'approved': 'false', 'tid': 123,  'iid': 
> 34, 'score': 2, likes: 50, retweets: 45, favorite: 34, 
> screen_name:'goodname'},
> 
> '2344555665_53323232': {'approved': 'false', 'tid': 134,  
> 'iid': 34, 'score': 2, likes: 50, retweets: 45, favorite: 34, 
> screen_name:'newname'}.
> 
> .
> 
>}
> 
> Is there something wrong with it? Here 1234555665_53323232 and
> 2344555665_53323232 are super columns. Also, if I have to represent this data
> with the new composite comparator, how will I accomplish that?
> 
> 
> 
> Please let me know.
> 
> 
> 
> Regards.
> 
> 
> 
> On Wed, Dec 18, 2013 at 5:32 PM, Robert Coli  wrote:
> On Wed, Dec 18, 2013 at 1:28 PM, Kumar Ranjan  wrote:
> Second approach ( I used in production ):
> - fetch all super columns for a row key
> 
> Stock response mentioning that super columns are anti-advised for use, 
> especially in brand new code.
> 
> =Rob
>  
> 



Re: Cassandra python pagination

2013-12-23 Thread Aaron Morton
> First approach:

Sounds good. 

> Second approach ( I used in production ):
If the row gets big enough this will have bad performance. 

A

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 19/12/2013, at 10:28 am, Kumar Ranjan  wrote:

> I am using pycassa. So, here is how I solved this issue. Will discuss 2 
> approaches. First approach didn't work out for me. Thanks Aaron for your 
> attention.
> 
> First approach:
> - Say if column_count = 10
> - collect first 11 rows, sort first 10, send it to user (front end) as JSON 
> object and last=11th_column
> - User then calls for page 2, with prev = 1st_column_id, column_start = 
> 11th_column and column_count = 10
> - This way, I can traverse, next page and previous page.
> - The only issue with this approach is that I don't have all the columns in
> the super column sorted. So this did not work.
> 
> Second approach ( I used in production ):
> - fetch all super columns for a row key
> - Sort this in python using sorted and lambda function based on column values.
> - Once sorted, I prepare buckets and each bucket size is page size/column
> count. Also filter out any rogue data if needed
> - Store page by page results in Redis with keys such as 
> 'row_key|page_1|super_column' and keep refreshing redis periodically.
> 
> I am sure there must be a better and brighter approach but for now, the 2nd
> approach is working. Thoughts ??
> 
> 
> 
> On Tue, Dec 17, 2013 at 9:19 PM, Aaron Morton  wrote:
> CQL3 and thrift do not support an offset clause, so you can only really 
> support next / prev page calls to the database. 
> 
>> I am trying to use xget with column_count and buffer_size parameters. Can
>> someone explain to me how it works? From the docs, my understanding is that
>> I can do something like this:
> What client are you using ? 
> xget is not a standard cassandra function. 
> 
> Cheers
> 
> -
> Aaron Morton
> New Zealand
> @aaronmorton
> 
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
> On 13/12/2013, at 4:56 am, Kumar Ranjan  wrote:
> 
>> Hey Folks,
>> 
>> I need some ideas about implementing pagination support in the browser,
>> from the backend. So the Python code (backend) gets a request from the
>> frontend with page=1,2,3,4 and so on and count_per_page=50.
>> 
>> I am trying to use xget with column_count and buffer_size parameters. Can
>> someone explain to me how it works? From the docs, my understanding is that
>> I can do something like this:
>> 
>> total_cols is total columns for that key.
>> count is what user sends me.
>> .xget('Twitter_search', hh, column_count=total_cols, buffer_size=count):
>> 
>> Is my understanding correct? Because it's not working for page 2 and so on.
>> Please enlighten me with suggestions.
>> 
>> Thanks.
>> 
> 
> 



Astyanax - multiple key search with pagination

2013-12-20 Thread Parag Patel
Hi,

I'm using Astyanax and trying to search for multiple keys with pagination.
I tried ".getKeySlice" with a list of primary keys, but it doesn't allow
pagination.  Does anyone know how to tackle this issue with Astyanax?

Parag


Re: Cassandra python pagination

2013-12-19 Thread Kumar Ranjan
Rob - I got a question following your advice. This is how I define my
column family:

validators = {
    'approved':     'UTF8Type',
    'tid':          'UTF8Type',
    'iid':          'UTF8Type',
    'score':        'IntegerType',
    'likes':        'IntegerType',
    'retweet':      'IntegerType',
    'favorite':     'IntegerType',
    'screen_name':  'UTF8Type',
    'created_date': 'UTF8Type',
    'expanded_url': 'UTF8Type',
    'embedly_data': 'BytesType',
}

SYSTEM_MANAGER.create_column_family('KeySpaceNNN', 'Twitter_Instagram',
default_validation_class='UTF8Type', super=True, comparator='UTF8Type',
key_validation_class='UTF8Type', column_validation_classes=validator)

Actual data representation:

'row_key': {'1234555665_53323232': {'approved': 'false', 'tid': 123, 'iid': 34,
                                    'score': 2, 'likes': 50, 'retweets': 45,
                                    'favorite': 34, 'screen_name': 'goodname'},
            '2344555665_53323232': {'approved': 'false', 'tid': 134, 'iid': 34,
                                    'score': 2, 'likes': 50, 'retweets': 45,
                                    'favorite': 34, 'screen_name': 'newname'},
            ...
           }

Is there something wrong with it? Here 1234555665_53323232 and
2344555665_53323232 are super columns. Also, if I have to represent this
data with the new composite comparator, how will I accomplish that?


Please let me know.


Regards.


On Wed, Dec 18, 2013 at 5:32 PM, Robert Coli  wrote:

> On Wed, Dec 18, 2013 at 1:28 PM, Kumar Ranjan wrote:
>
>> Second approach ( I used in production ):
>> - fetch all super columns for a row key
>>
>
> Stock response mentioning that super columns are anti-advised for use,
> especially in brand new code.
>
> =Rob
>
>


Re: Cassandra python pagination

2013-12-18 Thread Robert Coli
On Wed, Dec 18, 2013 at 1:28 PM, Kumar Ranjan  wrote:

> Second approach ( I used in production ):
> - fetch all super columns for a row key
>

Stock response mentioning that super columns are anti-advised for use,
especially in brand new code.

=Rob


Re: Cassandra python pagination

2013-12-18 Thread Kumar Ranjan
I am using pycassa. So, here is how I solved this issue. Will discuss 2
approaches. First approach didn't work out for me. Thanks Aaron for your
attention.

First approach:
- Say if column_count = 10
- collect first 11 rows, sort first 10, send it to user (front end) as JSON
object and last=11th_column
- User then calls for page 2, with prev = 1st_column_id, column_start =
11th_column and column_count = 10
- This way, I can traverse, next page and previous page.
- The only issue with this approach is that I don't have all the columns in
the super column sorted. So this did not work.

Second approach ( I used in production ):
- fetch all super columns for a row key
- Sort this in python using sorted and lambda function based on column
values.
- Once sorted, I prepare buckets and each bucket size is page size/column
count. Also filter out any rogue data if needed
- Store page by page results in Redis with keys such as
'row_key|page_1|super_column' and keep refreshing redis periodically.

I am sure there must be a better and brighter approach but for now, the 2nd
approach is working. Thoughts ??



On Tue, Dec 17, 2013 at 9:19 PM, Aaron Morton wrote:

> CQL3 and thrift do not support an offset clause, so you can only really
> support next / prev page calls to the database.
>
> I am trying to use xget with column_count and buffer_size parameters. Can
> someone explain to me how it works? From the docs, my understanding is that
> I can do something like this:
>
> What client are you using ?
> xget is not a standard cassandra function.
>
> Cheers
>
> -
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 13/12/2013, at 4:56 am, Kumar Ranjan  wrote:
>
> Hey Folks,
>
> I need some ideas about implementing pagination support in the browser,
> from the backend. So the Python code (backend) gets a request from the
> frontend with page=1,2,3,4 and so on and count_per_page=50.
>
> I am trying to use xget with column_count and buffer_size parameters. Can
> someone explain to me how it works? From the docs, my understanding is that
> I can do something like this:
>
> total_cols is total columns for that key.
> count is what user sends me.
>
> .xget('Twitter_search', hh, column_count=total_cols, buffer_size=count):
>
> Is my understanding correct? Because it's not working for page 2 and so on.
> Please enlighten me with suggestions.
>
> Thanks.
>
>
>


Re: Cassandra python pagination

2013-12-17 Thread Aaron Morton
CQL3 and thrift do not support an offset clause, so you can only really support 
next / prev page calls to the database. 

> I am trying to use xget with column_count and buffer_size parameters. Can
> someone explain to me how it works? From the docs, my understanding is that
> I can do something like this:
What client are you using ? 
xget is not a standard cassandra function. 

Cheers

-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 13/12/2013, at 4:56 am, Kumar Ranjan  wrote:

> Hey Folks,
> 
> I need some ideas about implementing pagination support in the browser,
> from the backend. So the Python code (backend) gets a request from the
> frontend with page=1,2,3,4 and so on and count_per_page=50.
>
> I am trying to use xget with column_count and buffer_size parameters. Can
> someone explain to me how it works? From the docs, my understanding is that
> I can do something like this:
>
> total_cols is total columns for that key.
> count is what user sends me.
> .xget('Twitter_search', hh, column_count=total_cols, buffer_size=count):
>
> Is my understanding correct? Because it's not working for page 2 and so on.
> Please enlighten me with suggestions.
> 
> Thanks.
> 



Cassandra python pagination

2013-12-12 Thread Kumar Ranjan
Hey Folks,

I need some ideas about implementing pagination support in the browser,
from the backend. So the Python code (backend) gets a request from the
frontend with page=1,2,3,4 and so on and count_per_page=50.

I am trying to use xget with column_count and buffer_size parameters. Can
someone explain to me how it works? From the docs, my understanding is that
I can do something like this:

total_cols is total columns for that key.
count is what user sends me.

.xget('Twitter_search', hh, column_count=total_cols, buffer_size=count):

Is my understanding correct? Because it's not working for page 2 and so on.
Please enlighten me with suggestions.

Thanks.


Re: pagination in cql3

2013-05-24 Thread Sylvain Lebresne
The short answer is yes, you can rely on the ordering of keys being
consistent. They will always be returned in partitioner order.
This is pretty much implied by the existence of the token() function so
it's not going to change (if only because changing it would break people).

--
Sylvain


On Fri, May 24, 2013 at 4:52 PM, Ondřej Černoš  wrote:

> Hi all,
>
> I need to support a legacy API where page offset and limit are on the
> input of the API call (it used to be mapped directly to offset and limit
> MySQL select options). The data are pretty small (like really small,
> some hundreds of thousands of narrow rows maximum - I use Cassandra for its
> multiple-dc and HA capabilities, not for "big data").
>
> I know the token(key) function and its use for paging, but unfortunately I
> cannot change the API to a version where last key on previous page and
> limit would be provided.
>
> What I thought I would do - though it is violating good Cassandra
> practices like "don't fetch all keys"  - is the following:
>
> select _key_ from table limit _offset_value_;
> select _columns_ from table where token(_key_) >
> token(_last_key_from_the_select_above_);
>
> The first select tells me where the offset begins and the second one
> queries for the page. The paged queries will not be performed too often, so
> performance is not such a big deal here.
>
> This construct however depends on repeatable ordering of keys returned
> from the select key from table query. I don't care about the ordering, but
> I need to know it is actually ordered by key tokens. Afaik it should be so
> (SSTs are ordered this way, the coordinator merges the data from queried
> nodes, ssts and memtables - I suppose it all preserves the order), but I
> don't know if it really works this way and if it is "documented" so that I
> can rely on it.
>
> Or should it be done some other way?
>
> Thanks,
>
> Ondrej Cernos
>


pagination in cql3

2013-05-24 Thread Ondřej Černoš
Hi all,

I need to support a legacy API where page offset and limit are on the input
of the API call (it used to be mapped directly to offset and limit MySQL
select options). The data are pretty small (like really small,
some hundreds of thousands of narrow rows maximum - I use Cassandra for its
multiple-dc and HA capabilities, not for "big data").

I know the token(key) function and its use for paging, but unfortunately I
cannot change the API to a version where last key on previous page and
limit would be provided.

What I thought I would do - though it is violating good Cassandra practices
like "don't fetch all keys"  - is the following:

select _key_ from table limit _offset_value_;
select _columns_ from table where token(_key_) >
token(_last_key_from_the_select_above_);

The first select tells me where the offset begins and the second one
queries for the page. The paged queries will not be performed too often, so
performance is not such a big deal here.
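
Concretely, the two steps might look like this (a sketch assuming a
hypothetical table t(k text PRIMARY KEY, v text), an offset > 0, and keys that
need no escaping; a real client would bind values rather than format them in):

import com.datastax.driver.core.*;

public class TokenOffsetSketch {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("my_ks");
        int offset = 200, limit = 20;

        // Step 1: walk 'offset' keys to find where the page begins.
        String lastKey = null;
        for (Row r : session.execute("SELECT k FROM t LIMIT " + offset)) {
            lastKey = r.getString("k");
        }
        // Step 2: fetch the page itself, starting after the offset boundary.
        ResultSet page = session.execute(String.format(
            "SELECT k, v FROM t WHERE token(k) > token('%s') LIMIT %d",
            lastKey, limit));
        for (Row r : page) {
            System.out.println(r.getString("k") + " -> " + r.getString("v"));
        }
        cluster.close();
    }
}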

This construct however depends on repeatable ordering of keys returned from
the select key from table query. I don't care about the ordering, but I
need to know it is actually ordered by key tokens. Afaik it should be so
(SSTs are ordered this way, the coordinator merges the data from queried
nodes, ssts and memtables - I suppose it all preserves the order), but I
don't know if it really works this way and if it is "documented" so that I
can rely on it.

Or should it be done some other way?

Thanks,

Ondrej Cernos


Re: Pagination over row Keys in Cassandra using Kundera/CQL queries

2013-01-14 Thread aaron morton
It's the same idea.

If you want to get 50 columns, ask for 51, iterate over the first 50, and use the
51st as the first column for the next page. If you get < 51 columns back then you
are on the last page.
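
For illustration, a client-agnostic sketch of that pattern; fetchSlice is a
hypothetical stand-in for whatever slice call the client exposes, returning up
to the requested number of columns from start onwards in comparator order:

import java.util.List;
import java.util.function.BiFunction;

public class NPlusOnePaging {
    static void pageThrough(BiFunction<String, Integer, List<String>> fetchSlice,
                            int pageSize) {
        String start = "";  // empty start = beginning of the row
        while (true) {
            List<String> cols = fetchSlice.apply(start, pageSize + 1);  // ask for N+1
            boolean last = cols.size() <= pageSize;  // fewer than N+1 back: last page
            List<String> page = last ? cols : cols.subList(0, pageSize);
            page.forEach(System.out::println);       // render the N columns
            if (last) break;
            start = cols.get(pageSize);  // the (N+1)th column starts the next page
        }
    }
}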

I've not used Kundera so cannot talk about specifics. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 11/01/2013, at 7:20 AM, Snehal Nagmote  wrote:

> Thank you Aaron , that link helps.
> 
> However, in my application I am using JPA (Kundera) to query Cassandra.
> 
> Is there a way to achieve this in CQL or the JPA query language?
> 
> Thanks,
> Snehal
> 
> On 9 January 2013 16:28, aaron morton  wrote:
> Try this http://wiki.apache.org/cassandra/FAQ#iter_world
> 
> Take a look at the code examples it points to. 
> 
> Cheers
> 
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 10/01/2013, at 11:55 AM, Snehal Nagmote  wrote:
> 
>> Hello All,
>> 
>> I am using Kundera 2.0.7 and Cassandra 1.0.8. I need to implement batching/ 
>> pagination over row keys.  
>> 
>> For instance: scan the column family, getting 100 records in a batch every
>> time, till all keys are exhausted.
>>
>> I am using the random partitioner for the keyspace. I explored the LIMIT
>> option in CQL and setMaxResults(), but neither gives me the ability to do a
>> range scan over row keys.
>>
>> One option I can think of is storing those row keys in a separate column
>> family as columns and doing range queries on the columns.
>> 
>> Is there any best practice to achieve this ?
>> 
>> Any help ?
>> 
>> 
>> Thanks,
>> Snehal
>> 
> 
> 



Re: Pagination over row Keys in Cassandra using Kundera/CQL queries

2013-01-10 Thread Snehal Nagmote
Thank you Aaron , that link helps.

However, in my application I am using JPA (Kundera) to query Cassandra.

Is there a way to achieve this in CQL or the JPA query language?

Thanks,
Snehal

On 9 January 2013 16:28, aaron morton  wrote:

> Try this http://wiki.apache.org/cassandra/FAQ#iter_world
>
> Take a look at the code examples it points to.
>
> Cheers
>
> -
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 10/01/2013, at 11:55 AM, Snehal Nagmote 
> wrote:
>
> Hello All,
>
> I am using Kundera 2.0.7 and Cassandra 1.0.8. I need to implement
> batching/ pagination over row keys.
>
> For instance: scan the column family, getting 100 records in a batch every
> time, till all keys are exhausted.
>
> I am using the random partitioner for the keyspace. I explored the LIMIT
> option in CQL and setMaxResults(), but neither gives me the ability to do a
> range scan over row keys.
>
> One option I can think of is storing those row keys in a separate column
> family as columns and doing range queries on the columns.
>
> Is there any best practice to achieve this ?
>
> Any help ?
>
>
> Thanks,
> Snehal
>
>
>


Re: Pagination over row Keys in Cassandra using Kundera/CQL queries

2013-01-09 Thread aaron morton
Try this http://wiki.apache.org/cassandra/FAQ#iter_world

Take a look at the code examples it points to. 

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/01/2013, at 11:55 AM, Snehal Nagmote  wrote:

> Hello All,
> 
> I am using Kundera 2.0.7 and Cassandra 1.0.8. I need to implement batching/ 
> pagination over row keys.  
> 
> For instance: scan the column family, getting 100 records in a batch every
> time, till all keys are exhausted.
>
> I am using the random partitioner for the keyspace. I explored the LIMIT
> option in CQL and setMaxResults(), but neither gives me the ability to do a
> range scan over row keys.
>
> One option I can think of is storing those row keys in a separate column
> family as columns and doing range queries on the columns.
> 
> Is there any best practice to achieve this ?
> 
> Any help ?
> 
> 
> Thanks,
> Snehal
> 



Pagination over row Keys in Cassandra using Kundera/CQL queries

2013-01-09 Thread Snehal Nagmote
Hello All,

I am using Kundera 2.0.7 and Cassandra 1.0.8. I need to implement batching/
pagination over row keys.

For instance: scan the column family, getting 100 records in a batch every
time, till all keys are exhausted.

I am using the random partitioner for the keyspace. I explored the LIMIT
option in CQL and setMaxResults(), but neither gives me the ability to do a
range scan over row keys.

One option I can think of is storing those row keys in a separate column
family as columns and doing range queries on the columns.

Is there any best practice to achieve this?

Any help?


Thanks,
Snehal


RE: Pagination

2012-11-27 Thread Sam Hodgson

Well, I know what you mean and I have been doing that. However, I'm currently
migrating an old MySQL site onto Cassandra and just trying to keep things
consistent on the front end for the guy. I thought I might be missing a trick,
but if not then yeah, I may well ditch the page linkage if it starts causing
problems.
Cheers
Sam

Date: Tue, 27 Nov 2012 13:01:48 -0700
Subject: Re: Pagination
From: de...@fyrie.net
To: user@cassandra.apache.org

Do you really require page numbers? I usually find them annoying while paging 
through a forum, especially if it is quite active. Threads from the bottom of 
the page get bumped to the next page so you end up seeing the same content 
again. I much prefer the first page being the current N results, and the next 
page being the next N results after the last updated time of the last thread on 
the page. It is also much easier to model with Cassandra.


On Tue, Nov 27, 2012 at 12:19 PM, Sam Hodgson  wrote:





Hi All,
Wondering if anyone has any good solutions to pagination? In particular 
enumerating the number of pages and linking to each page, a common feature in 
forums.

This code is untested (using phpcassa) and may need tweaking to get the correct
range of records, or it may be completely wrong! However, it shows the concept
of taking a page number and then pulling out the range of posts belonging to
that page:

$cf_threads looks like:
thread_ID => (timestamp => post_id)

if($page > 1)
{
    $ranger = ($pagenumber * 20);
    $low_ranger = $ranger - 20;
    $arr_range = $cf_threads->get("$thread_id", $columns=null,
        $column_start="", $column_finish="", $column_reversed=True,
        $limit=$ranger);
    $arr_page = array_slice($arr_range, $low_ranger, $ranger, TRUE);
}
else
{
    $arr_page = $cf_threads->get("$thread_id", $columns=null,
        $column_start="", $column_finish="", $column_reversed=True,
        $limit=20);
}
I think this should be OK? The only concern is if there are some really long
threads where I'm having to pull the entire CF.
Another idea involved a schema change and using a super CF to include a page 
number as follows:

Thread_ID => (PageNumber(timestamp => Post_ID))
Probably more efficient, but generally page numbers go backwards, i.e. page 1
has the newest content, so this would complicate things when writing data and
cause load if logic was included to reorganise page numbers etc.

Cheers
Sam
http://Newsarc.net


-- 
Derek Williams

  

Re: Pagination

2012-11-27 Thread Derek Williams
Do you really require page numbers? I usually find them annoying while
paging through a forum, especially if it is quite active. Threads from the
bottom of the page get bumped to the next page so you end up seeing the
same content again. I much prefer the first page being the current N
results, and the next page being the next N results after the last updated
time of the last thread on the page. It is also much easier to model with
Cassandra.
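
As a sketch of that model (all names hypothetical): one row per forum with
columns (updated_ts -> thread_id), where reversedSlice stands in for a client
call returning up to count columns at or below startTs, newest first:

import java.util.List;
import java.util.function.BiFunction;

public class ThreadListPaging {
    static List<String> page(BiFunction<Long, Integer, List<String>> reversedSlice,
                             Long lastSeenTs, int pageSize) {
        long start = (lastSeenTs == null)
                ? Long.MAX_VALUE     // first page: start from the newest thread
                : lastSeenTs - 1;    // next page: just below the last one shown
        return reversedSlice.apply(start, pageSize);
    }
}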


On Tue, Nov 27, 2012 at 12:19 PM, Sam Hodgson wrote:

>  Hi All,
>
> Wondering if anyone has any good solutions to pagination? In particular
> enumerating the number of pages and linking to each page, a common feature
> in forums.
>
> This code is untested (using phpcassa) and may need tweaking to get the
> correct range of records or maybe completely wrong! however it shows the
> concept of taking a page number then pulling out a range of posts belonging
> to that page:
>
> $cf_threads looks like:
> thread_ID => (timestamp => post_id)
>
> if($page > 1)
> {
> $ranger = ($pagenumber * 20);
> $low_ranger = $ranger - 20;
> $arr_range = $cf_threads->get("$thread_id" , $columns=null ,
>  $column_start="" , $column_finish="" , $column_reversed=True,
> $limit=$ranger);
> $arr_page = array_slice($arr_range, $low_ranger , $ranger , TRUE);
> }else
> {
> $arr_page = $cf_threads->get("$thread_id" , $columns=null
> ,  $column_start="" , $column_finish="" , $column_reversed=True, $limit=20);
> }
>
> I think this should be OK? The only concern is if there are some really
> long threads where I'm having to pull the entire CF.
>
> Another idea involved a schema change and using a super CF to include a
> page number as follows:
>
> Thread_ID => (PageNumber(timestamp => Post_ID))
>
> Probably more efficient but generally page numbers go backwards ie page 1
> has newest content so this would complicate things when writing data and
> cause load if logic was included to reorganise page numbers etc.
>
> Cheers
>
> Sam
> http://Newsarc.net
>



-- 
Derek Williams


Re: Offset in slicequeries for pagination

2012-06-11 Thread Rajat Mathur
Hi Cyril,

This may help.

http://architecturalatrocities.com/post/13918146722/implementing-column-pagination-in-cassandra

On Tue, Jun 12, 2012 at 3:18 AM, Cyril Auburtin wrote:

> If my columns are ("k1:k2" => data1), ("k11:k32" => data211), ("k10:k211"
> => data91)
>
> You mean transforming to ("1:k1:k2" => data1), ("2:k11:k32" => data211)? But
> I need the previous column names to slice query on them
>
> 2012/6/11 R. Verlangen 
>
>> I solved this by creating a manual index with integers as the column keys
>> and the UUIDs of the results as the column values. Then run a slice query to
>> determine the batch to fetch.
>>
>>
>> 2012/6/11 Cyril Auburtin 
>>
>>> using  10 results maximum per page,
>>>
>>> to go directly to 14th page, there is no offset=141 possibility I guess?
>>> or does a Java client propose that?
>>>
>>> What is the best solution: perform a get with a limit = page*10, and
>>> then a get with a column_start equal to the latest column received, and a
>>> limit of 10?
>>> I guess also, client side should cache results but it's off topic
>>>
>>
>>
>>
>> --
>> With kind regards,
>>
>> Robin Verlangen
>> Software engineer
>>
>> W http://www.robinverlangen.nl
>> E ro...@us2.nl
>>
>>
>


-- 
Rajat Mathur
B.Tech (IT) Final Year
IIIT Allahabad

09945990291

Find me @ Facebook <http://www.facebook.com/rajatmathurr>
Follow me @ Twitter <http://www.twitter.com/Raj_Mathur>


Re: Offset in slicequeries for pagination

2012-06-11 Thread Cyril Auburtin
If my columns are ("k1:k2" => data1), ("k11:k32" => data211), ("k10:k211"
=> data91)

You mean transforming to ("1:k1:k2" => data1), ("2:k11:k32" => data211)? But
I need the previous column names to slice query on them

2012/6/11 R. Verlangen 

> I solved this by creating a manual index with integers as the column keys
> and the UUIDs of the results as the column values. Then run a slice query to
> determine the batch to fetch.
>
>
> 2012/6/11 Cyril Auburtin 
>
>> using  10 results maximum per page,
>>
>> to go directly to 14th page, there is no offset=141 possibility I guess?
>> or does a Java client propose that?
>>
>> What is the best solution: perform a get with a limit = page*10, and then
>> a get with a column_start equal to the latest column received, and a limit
>> of 10?
>> I guess also, client side should cache results but it's off topic
>>
>
>
>
> --
> With kind regards,
>
> Robin Verlangen
> Software engineer
>
> W http://www.robinverlangen.nl
> E ro...@us2.nl
>
>


Re: Offset in slicequeries for pagination

2012-06-11 Thread R. Verlangen
I solved this by creating a manual index with integers as the column keys and
the UUIDs of the results as the column values. Then run a slice query to
determine the batch to fetch.
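
A sketch of that index (layout and client call are hypothetical): one row whose
column names are the integer positions 1..N and whose values are the result
UUIDs; slice(first, last) stands in for an integer-comparator slice query:

import java.util.List;
import java.util.function.BiFunction;

public class ManualOffsetIndex {
    // e.g. "results_index" -> { 1: uuid_a, 2: uuid_b, 3: uuid_c, ... }
    static List<String> uuidsForPage(BiFunction<Integer, Integer, List<String>> slice,
                                     int page, int pageSize) {
        int first = (page - 1) * pageSize + 1;  // page 14, size 10 -> column 131
        int last = page * pageSize;             //                  -> column 140
        return slice.apply(first, last);        // then multiget these UUIDs
    }
}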

2012/6/11 Cyril Auburtin 

> using  10 results maximum per page,
>
> to go directly to 14th page, there is no offset=141 possibility I guess?
> or does a Java client propose that?
>
> What is the best solution: perform a get with a limit = page*10, and then
> a get with a column_start equal to the latest column received, and a limit
> of 10?
> I guess also, client side should cache results but it's off topic
>



-- 
With kind regards,

Robin Verlangen
Software engineer

W http://www.robinverlangen.nl
E ro...@us2.nl


Re: Maintain sort order on updatable property and pagination

2012-04-29 Thread aaron morton
> . Is there a better way to solve this in real time.
Not really. If however you can send a row-level delete before the insert you
don't need to read first. Of course that deletes all the other data :)

If you create a secondary index on a column value, the index will be updated 
when you change the value. Note that it has to do the same thing you do: read 
and delete the old value. 
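
A sketch of that read-then-delete-then-insert cycle (the row layout and the
three client primitives are hypothetical stand-ins):

public abstract class SortIndexMaintenance {
    abstract Long readCurrentValue(String itemId);
    abstract void deleteColumn(String row, String columnName);
    abstract void insertColumn(String row, String columnName, String value);

    // Keep a row sorted by value when an item's value changes. Column names are
    // "value:itemId" composites so the comparator keeps them in value order
    // (in practice, zero-pad the value so string order matches numeric order).
    void updateValue(String itemId, long newValue) {
        Long old = readCurrentValue(itemId);  // read the old position first
        if (old != null) {
            deleteColumn("items_by_value", old + ":" + itemId);  // drop old entry
        }
        insertColumn("items_by_value", newValue + ":" + itemId, "");
        insertColumn("item_values", itemId, Long.toString(newValue));  // remember it
    }
}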

> Also for pagination, we have to set a range for columnNames. If we know the
> last page's last columnName we can get the next page. What if we want to go
> from page 2 to page 6? This seems impossible as of now. Any suggestion?
You will need to read the intermediate pages. 

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/04/2012, at 11:28 PM, Rajat Mathur wrote:

> Hi All, 
> 
> I am using the property of columns that they are in sorted order to store
> sort orders (I believe everyone else is also doing the same).
> But if I want to maintain sort order on a property whose value changes, I
> would have to perform read and delete operations. Is there a better way to
> solve this in real time?
> 
> Also for pagination, we have to set a range for columnNames. If we know the
> last page's last columnName we can get the next page. What if we want to go
> from page 2 to page 6? This seems impossible as of now. Any suggestion?
> 
> Thank you.
> 
> 



Maintain sort order on updatable property and pagination

2012-04-26 Thread Rajat Mathur
Hi All,

I am using the property of columns that they are in sorted order to store
sort orders (I believe everyone else is also doing the same).
But if I want to maintain sort order on a property whose value changes, I
would have to perform read and delete operations. Is there a better way to
solve this in real time?

Also for pagination, we have to set a range for columnNames. If we know the
last page's last columnName we can get the next page. What if we want to go
from page 2 to page 6? This seems impossible as of now. Any suggestion?

Thank you.



RE: Previous Page Pagination?

2011-10-29 Thread Alex Major
Hey,

You can switch the reversed flag on (or off) and use the first key returned
as the start key for the previous page.
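
For illustration, a client-agnostic sketch; slice(start, reversed, count) is a
hypothetical stand-in with an inclusive start column:

import java.util.Collections;
import java.util.List;

public abstract class PreviousPagePaging {
    abstract List<String> slice(String start, boolean reversed, int count);

    List<String> previousPage(String firstColumnOfCurrentPage, int pageSize) {
        // Walk backwards from the first column currently shown...
        List<String> cols = slice(firstColumnOfCurrentPage, true, pageSize + 1);
        if (!cols.isEmpty()) {
            cols.remove(0);              // ...dropping the inclusive start itself
        }
        Collections.reverse(cols);       // restore ascending order for display
        return cols;
    }
}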

Alex
--
From: Sam Hodgson
Sent: 29/10/2011 16:15
To: user@cassandra.apache.org
Subject: Previous Page Pagination?

  Hi,

Is there a good method to use for pagination in Cassandra? I can create a
next-page link by pulling an extra column when getting; however, the previous
page is proving tricky.  I'm using integer timestamps as column keys, so I
would really need a way to pull out the 10 preceding columns as well as
the 10 following the given timestamp.

Cheers

Sam


Previous Page Pagination?

2011-10-29 Thread Sam Hodgson

Hi,

Is there a good method to use for pagination in Cassandra? I can create a
next-page link by pulling an extra column when getting; however, the previous
page is proving tricky.  I'm using integer timestamps as column keys, so I
would really need a way to pull out the 10 preceding columns as well as the 10
following the given timestamp.

Cheers

Sam
  

RE: How do you implement pagination?

2010-12-10 Thread Dan Hendry
Or you can just start the next slice at the nth id + 1, given that ids must be
unique (you don't have to specify an existing id as the start of a slice). You
don't HAVE to load the (n + 1)th record.

 

This (slightly) more optimal approach has the disadvantage that you don't
know with certainty when you have reached the end of all records. This may
or may not be acceptable for your application.
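
A sketch of that variant with unique integer ids; fetch(startId, count) is a
hypothetical slice call with an inclusive start, and since lastSeen + 1 need
not be an existing id, it is a valid start that skips the previous page's last
row:

import java.util.List;
import java.util.function.BiFunction;

public class ExclusiveStartPaging {
    static void pageThrough(BiFunction<Long, Integer, List<Long>> fetch, int pageSize) {
        long start = Long.MIN_VALUE;
        while (true) {
            List<Long> page = fetch.apply(start, pageSize);  // exactly N, not N+1
            page.forEach(System.out::println);
            // Only a short page signals the end, so a data set whose size is an
            // exact multiple of N costs one extra, empty fetch to detect it.
            if (page.size() < pageSize) break;
            start = page.get(page.size() - 1) + 1;  // skip past the last id seen
        }
    }
}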

 

Dan

 

From: joshua.j...@gmail.com [mailto:joshua.j...@gmail.com] On Behalf Of
Joshua Partogi
Sent: December-10-10 21:05
To: user@cassandra.apache.org
Subject: Re: How do you implement pagination?

 

So you're actually getting n+1 records? Correct? So this is the right way to
do it?



On Sat, Dec 11, 2010 at 1:02 PM, Tyler Hobbs  wrote:

Yes, what you described is the correct way to do it.  Your next slice will
start with that 11th column.

- Tyler

 

On Fri, Dec 10, 2010 at 7:01 PM, Joshua Partogi 
wrote:

Hi all,

I am interested to see people's ways of doing record pagination with Cassandra
because I cannot find anything like MySQL LIMIT in Cassandra.

From what I understand you need to tell Cassandra the Record ID for the
beginning of the slice and the number of records you want to get after that
Record. I am using UUID instead of Long for the Record ID.

My question is, how does your application get the next Record ID after the
current slice that is displayed on the page?
Let's say I want to display records 1-10, do I actually grab 11 records but
only display 10 records and only keep the ID of the 11th record so I can
use it for pagination?

Sorry if the question is a bit obscure, but I am still figuring out how to
do pagination.

Thanks very much for your assistance.

Kind regards,
Joshua.

-- 
http://twitter.com/jpartogi <http://twitter.com/scrum8> 

 




-- 
http://twitter.com/jpartogi




Re: How do you implement pagination?

2010-12-10 Thread Joshua Partogi
So you're actually getting n+1 records? Correct? So this is the right way to
do it?


On Sat, Dec 11, 2010 at 1:02 PM, Tyler Hobbs  wrote:

> Yes, what you described is the correct way to do it.  Your next slice will
> start with that 11th column.
>
> - Tyler
>
>
> On Fri, Dec 10, 2010 at 7:01 PM, Joshua Partogi wrote:
>
>> Hi all,
>>
>> I am interested to see people's ways of doing record pagination with Cassandra
>> because I cannot find anything like MySQL LIMIT in Cassandra.
>>
>> From what I understand you need to tell Cassandra the Record ID for the
>> beginning of the slice and the number of records you want to get after that
>> Record. I am using UUID instead of Long for the Record ID.
>>
>> My question is, how does your application get the next Record ID after the
>> current slice that is displayed on the page?
>> Let's say I want to display records 1-10, do I actually grab 11 records but
>> only display 10 records and only keep the ID of the 11th record so I can
>> use it for pagination?
>>
>> Sorry if the question is a bit obscure, but I am still figuring out how
>> to do pagination.
>>
>> Thanks very much for your assistance.
>>
>> Kind regards,
>> Joshua.
>>
>> --
>> http://twitter.com/jpartogi <http://twitter.com/scrum8>
>>
>
>


-- 
http://twitter.com/jpartogi


Re: How do you implement pagination?

2010-12-10 Thread Tyler Hobbs
Yes, what you described is the correct way to do it.  Your next slice will
start with that 11th column.

- Tyler

On Fri, Dec 10, 2010 at 7:01 PM, Joshua Partogi wrote:

> Hi all,
>
> I am interested to see people's ways of doing record pagination with Cassandra
> because I cannot find anything like MySQL LIMIT in Cassandra.
>
> From what I understand you need to tell Cassandra the Record ID for the
> beginning of the slice and the number of records you want to get after that
> Record. I am using UUID instead of Long for the Record ID.
>
> My question is, how does your application get the next Record ID after the
> current slice that is displayed on the page?
> Let's say I want to display records 1-10, do I actually grab 11 records but
> only display 10 records and only keep the ID of the 11th record so I can
> use it for pagination?
>
> Sorry if the question is a bit obscure, but I am still figuring out how to
> do pagination.
>
> Thanks very much for your assistance.
>
> Kind regards,
> Joshua.
>
> --
> http://twitter.com/jpartogi <http://twitter.com/scrum8>
>


How do you implement pagination?

2010-12-10 Thread Joshua Partogi
Hi all,

I am interested to see people's ways of doing record pagination with Cassandra
because I cannot find anything like MySQL LIMIT in Cassandra.

From what I understand you need to tell Cassandra the Record ID for the
beginning of the slice and the number of records you want to get after that
Record. I am using UUID instead of Long for the Record ID.

My question is, how does your application get the next Record ID after the
current slice that is displayed on the page?
Let's say I want to display records 1-10, do I actually grab 11 records but
only display 10 records and only keep the ID of the 11th record so I can
use it for pagination?

Sorry if the question is a bit obscure, but I am still figuring out how to
do pagination.

Thanks very much for your assistance.

Kind regards,
Joshua.

-- 
http://twitter.com/jpartogi <http://twitter.com/scrum8>


Re: Pagination

2010-12-06 Thread Jonathan Ellis
Short answer: that's a bad idea; don't do it.

Long answer: you could count 10 pages of results and jump there
manually, which is what "offset 10 * page_size" is doing for you under
the hood, but that gets slow quickly as your offset grows.  Which is
why you shouldn't do it with a SQL db either.

On Mon, Dec 6, 2010 at 3:35 PM, Mark  wrote:
> How is pagination accomplished when you dont know a start key? For example,
> how can I "jump" to page 10?
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Pagination

2010-12-06 Thread Mark
How is pagination accomplished when you dont know a start key? For 
example, how can I "jump" to page 10?


Re: SEO friendly pagination

2010-08-25 Thread Ryan King
On Wed, Aug 25, 2010 at 11:20 AM, Petr Odut  wrote:
> Hi,
> I've read about pagination in cassandra. My current implementation is
> get_range_slices with startKey = lastKey + 1, but I need to get the
> specified page directly. Is there any chance to do this?
>
> If you look at twitter, it has direct pagination too:
> http://twitter.com/PetrOdut?page=1
> http://twitter.com/PetrOdut?page=2
> http://twitter.com/PetrOdut?page=3

I'm not sure what the SEO implications are of using page numbers
rather than tokens, but (as someone who's worked on Twitter's
pagination code) I'd say to use tokens.

-ryan


SEO friendly pagination

2010-08-25 Thread Petr Odut
Hi,
I've read about pagination in cassandra. My current implementation is
get_range_slices with startKey = lastKey + 1, but I need to get the
specified page directly. Is there any chance to do this?

If you look at twitter, it has direct pagination too:
http://twitter.com/PetrOdut?page=1
http://twitter.com/PetrOdut?page=2
http://twitter.com/PetrOdut?page=3

Thanks, Petr


Re: pagination through slices with deleted keys

2010-05-07 Thread Mike Malone
On Fri, May 7, 2010 at 5:29 AM, Joost Ouwerkerk wrote:

> +1.  There is some disagreement on whether or not the API should
> return empty columns or skip rows when no data is found.  In all of
> our use cases, we would prefer skipped rows.  And based on how
> frequently new cassandra users appear to be confused about the current
> behaviour, this might be a more common use case than the need for
> empty cols.  Perhaps this could be added as an option on
> SlicePredicate ?  (e.g. skipEmpty=true).
>

That's exactly how we implemented it:

struct SlicePredicate {
    1: optional list<binary> column_names,
    2: optional SliceRange   slice_range,
    3: optional bool         ignore_empty_rows=0,
}

Mike


Re: pagination through slices with deleted keys

2010-05-07 Thread Mark Greene
I like your idea about specifying it at the SP level.

On Fri, May 7, 2010 at 8:29 AM, Joost Ouwerkerk wrote:

> +1.  There is some disagreement on whether or not the API should
> return empty columns or skip rows when no data is found.  In all of
> our use cases, we would prefer skipped rows.  And based on how
> frequently new cassandra users appear to be confused about the current
> behaviour, this might be a more common use case than the need for
> empty cols.  Perhaps this could be added as an option on
> SlicePredicate ?  (e.g. skipEmpty=true).
>
> On Fri, May 7, 2010 at 12:59 AM, Mike Malone  wrote:
> > On Thu, May 6, 2010 at 3:27 PM, Ian Kallen 
> wrote:
> >>
> >> Cool, is this a patch you've applied on the server side? Are you running
> >> 0.6.x? I'm wondering if this kind of thing can make it into future
> versions
> >> of Cassandra.
> >
> > Yea, server side. It's basically doing the same thing clients typically
> want
> > to do (again, at least for our use cases) but doing it closer to the
> data.
> > Our patch is kind of janky though. I can probably get some version of it
> > pushed back upstream - or at least on github or something - if there's
> any
> > interest.
> > Mike
>


Re: pagination through slices with deleted keys

2010-05-07 Thread Joost Ouwerkerk
+1.  There is some disagreement on whether or not the API should
return empty columns or skip rows when no data is found.  In all of
our use cases, we would prefer skipped rows.  And based on how
frequently new Cassandra users appear to be confused about the current
behaviour, this might be a more common use case than the need for
empty cols.  Perhaps this could be added as an option on
SlicePredicate? (e.g. skipEmpty=true).

On Fri, May 7, 2010 at 12:59 AM, Mike Malone  wrote:
> On Thu, May 6, 2010 at 3:27 PM, Ian Kallen  wrote:
>>
>> Cool, is this a patch you've applied on the server side? Are you running
>> 0.6.x? I'm wondering if this kind of thing can make it into future versions
>> of Cassandra.
>
> Yea, server side. It's basically doing the same thing clients typically want
> to do (again, at least for our use cases) but doing it closer to the data.
> Our patch is kind of janky though. I can probably get some version of it
> pushed back upstream - or at least on github or something - if there's any
> interest.
> Mike


Re: pagination through slices with deleted keys

2010-05-06 Thread Mike Malone
On Thu, May 6, 2010 at 3:27 PM, Ian Kallen  wrote:

> Cool, is this a patch you've applied on the server side? Are you running
> 0.6.x? I'm wondering if this kind of thing can make it into future versions
> of Cassandra.
>

Yea, server side. It's basically doing the same thing clients typically want
to do (again, at least for our use cases) but doing it closer to the data.
Our patch is kind of janky though. I can probably get some version of it
pushed back upstream - or at least on github or something - if there's any
interest.

Mike


Re: pagination through slices with deleted keys

2010-05-06 Thread Ian Kallen
Cool, is this a patch you've applied on the server side? Are you running
0.6.x? I'm wondering if this kind of thing can make it into future versions
of Cassandra.
-Ian

On Thu, May 6, 2010 at 2:56 PM, Mike Malone  wrote:

> Our solution at SimpleGeo has been to hack Cassandra to (optionally, at
> least) be sensible and drop Rows that don't have any Columns. The claim from
> the FAQ that "Cassandra would have to check if there are any other columns
> in the row" is inaccurate. The common case for us at least is that we're
> only interested in Rows that have Columns matching our predicate. So if
> there aren't any, we just don't return that row. No need to check if the
> entire row is deleted.
>
> Mike
>
>
> On Thu, May 6, 2010 at 9:17 AM, Ian Kallen wrote:
>
>> I read the DistributedDeletes and the range_ghosts FAQ entry on the wiki
>> which do a good job describing how difficult deletion is in an eventually
>> consistent system. But practical application strategies for dealing with it
>> aren't there (that I saw). I'm wondering how folks implement pagination in
>> their applications; if you want to render N results in an application, is
>> the only solution to over-fetch and filter out the tombstones? Or is there
>> something simpler that I overlooked? I'd like to be able to count (even if
>> the counts are approximate) and fetch rows with the deleted ones filtered
>> out (without waiting for the GCGraceSeconds interval + compaction) but from
>> what I see so far, the burden is on the app to deal with the tombstones.
>>  -Ian
>>
>
>


Re: pagination through slices with deleted keys

2010-05-06 Thread Mike Malone
Our solution at SimpleGeo has been to hack Cassandra to (optionally, at
least) be sensible and drop Rows that don't have any Columns. The claim from
the FAQ that "Cassandra would have to check if there are any other columns
in the row" is inaccurate. The common case for us at least is that we're
only interested in Rows that have Columns matching our predicate. So if
there aren't any, we just don't return that row. No need to check if the
entire row is deleted.

Mike

On Thu, May 6, 2010 at 9:17 AM, Ian Kallen  wrote:

> I read the DistributedDeletes and the range_ghosts FAQ entry on the wiki
> which do a good job describing how difficult deletion is in an eventually
> consistent system. But practical application strategies for dealing with it
> aren't there (that I saw). I'm wondering how folks implement pagination in
> their applications; if you want to render N results in an application, is
> the only solution to over-fetch and filter out the tombstones? Or is there
> something simpler that I overlooked? I'd like to be able to count (even if
> the counts are approximate) and fetch rows with the deleted ones filtered
> out (without waiting for the GCGraceSeconds interval + compaction) but from
> what I see so far, the burden is on the app to deal with the tombstones.
> -Ian
>


Re: pagination through slices with deleted keys

2010-05-06 Thread Ian Kallen
Thanks Mark, great illustration. I'm already splitting my time developing
directly with hector and a vastly simplified jython wrapper around it; I
guess I'll address it at some wrapping layer (patch hector or let the jython
layer deal).

My grumpy editorial about this stuff is that on the Cassandra server, the
Column has an isMarkedForDelete():boolean, so the thrift (+avro) API could
(should) expose the ability to filter out the true cases. It seems pretty
grotty to let such low-level data bookkeeping leak into the application
space; like the ConsistencyLevel and timestamp stuff that most folks don't
need at the app layer, slicing via the service API should do the filtering
unless your abstraction actually *needs* the tombstones. OK, done being
grumpy :)
-Ian

On Thu, May 6, 2010 at 9:26 AM, Mark Greene  wrote:

> Hey Ian,
>
> I actually just wrote a quick example of how to iterate over a CF that may
> have tombstones. This may help you out:
> http://markjgreene.wordpress.com/2010/05/05/iterate-over-entire-cassandra-column-family/
>
>
> On Thu, May 6, 2010 at 12:17 PM, Ian Kallen wrote:
>
>> I read the DistributedDeletes and the range_ghosts FAQ entry on the wiki
>> which do a good job describing how difficult deletion is in an eventually
>> consistent system. But practical application strategies for dealing with it
>> aren't there (that I saw). I'm wondering how folks implement pagination in
>> their applications; if you want to render N results in an application, is
>> the only solution to over-fetch and filter out the tombstones? Or is there
>> something simpler that I overlooked? I'd like to be able to count (even if
>> the counts are approximate) and fetch rows with the deleted ones filtered
>> out (without waiting for the GCGraceSeconds interval + compaction) but from
>> what I see so far, the burden is on the app to deal with the tombstones.
>>  -Ian
>>
>
>


Re: pagination through slices with deleted keys

2010-05-06 Thread Mark Greene
Hey Ian,

I actually just wrote a quick example of how to iterate over a CF that may
have tombstones. This may help you out:
http://markjgreene.wordpress.com/2010/05/05/iterate-over-entire-cassandra-column-family/
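
For anyone who doesn't want to click through, the usual shape of that
kind of iteration is roughly the following sketch (not necessarily
exactly what the post does; fetchRows is a hypothetical
get_range_slices wrapper):

import java.util.ArrayList;
import java.util.List;

abstract class CfIterator {
    static class Row { String key; List<byte[]> columns; }

    // Hypothetical wrapper over get_range_slices: returns up to `count`
    // rows ordered by key, starting at `startKey` inclusive ("" means the
    // beginning of the range).
    abstract List<Row> fetchRows(String startKey, int count);

    // batchSize must be at least 2 so each batch makes progress past the
    // repeated start key.
    List<Row> allLiveRows(int batchSize) {
        List<Row> live = new ArrayList<Row>();
        String start = "";
        while (true) {
            List<Row> batch = fetchRows(start, batchSize);
            for (Row r : batch) {
                if (r.key.equals(start)) continue; // tail of previous batch, seen already
                if (r.columns.isEmpty()) continue; // tombstoned "ghost" row, no live columns
                live.add(r);
            }
            if (batch.size() < batchSize) return live; // end of the column family
            start = batch.get(batch.size() - 1).key;   // resume from the last key seen
        }
    }
}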

On Thu, May 6, 2010 at 12:17 PM, Ian Kallen  wrote:

> I read the DistributedDeletes and the range_ghosts FAQ entry on the wiki
> which do a good job describing how difficult deletion is in an eventually
> consistent system. But practical application strategies for dealing with it
> aren't there (that I saw). I'm wondering how folks implement pagination in
> their applications; if you want to render N results in an application, is
> the only solution to over-fetch and filter out the tombstones? Or is there
> something simpler that I overlooked? I'd like to be able to count (even if
> the counts are approximate) and fetch rows with the deleted ones filtered
> out (without waiting for the GCGraceSeconds interval + compaction) but from
> what I see so far, the burden is on the app to deal with the tombstones.
>  -Ian
>


pagination through slices with deleted keys

2010-05-06 Thread Ian Kallen
I read the DistributedDeletes and the range_ghosts FAQ entry on the wiki
which do a good job describing how difficult deletion is in an eventually
consistent system. But practical application strategies for dealing with it
aren't there (that I saw). I'm wondering how folks implement pagination in
their applications; if you want to render N results in an application, is
the only solution to over-fetch and filter out the tombstones? Or is there
something simpler that I overlooked? I'd like to be able to count (even if
the counts are approximate) and fetch rows with the deleted ones filtered
out (without waiting for the GCGraceSeconds interval + compaction) but from
what I see so far, the burden is on the app to deal with the tombstones.
-Ian

