Re: Iterating through large numbers of rows with JDBC

2013-05-14 Thread aaron morton
You can iterate over them; just make sure to set a sensible row count to
chunk things up.
See 
http://www.datastax.com/docs/1.2/cql_cli/using/paging#non-ordered-partitioner-paging
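
As a rough illustration (not taken from the docs page above), here is a
minimal sketch of that kind of chunked scan over JDBC. The connection URL,
the table mytable, and its partition key column key are all made-up names,
and it assumes the CQL JDBC driver can bind a parameter inside token():

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ChunkedScan {
    private static final int PAGE_SIZE = 1000; // the "sensible row count"

    public static void main(String[] args) throws Exception {
        Connection conn = DriverManager.getConnection(
                "jdbc:cassandra://localhost:9160/mykeyspace");
        // First page: start from the beginning of the token range.
        PreparedStatement first = conn.prepareStatement(
                "SELECT key FROM mytable LIMIT " + PAGE_SIZE);
        // Later pages: resume strictly after the last key we saw.
        PreparedStatement next = conn.prepareStatement(
                "SELECT key FROM mytable WHERE token(key) > token(?) LIMIT " + PAGE_SIZE);

        String lastKey = null;
        while (true) {
            ResultSet rs;
            if (lastKey == null) {
                rs = first.executeQuery();
            } else {
                next.setString(1, lastKey);
                rs = next.executeQuery();
            }
            int rows = 0;
            while (rs.next()) {
                lastKey = rs.getString("key");
                rows++;
                // ... process the row here ...
            }
            rs.close();
            if (rows < PAGE_SIZE) {
                break; // a short page means we've seen every row
            }
        }
        conn.close();
    }
}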

You can also break up the processing so only one worker reads the token
ranges for a node. That allows you to process the rows in parallel and
avoid workers processing the same rows.
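
As a rough sketch of that second point: split the partitioner's full token
range into one contiguous slice per worker, then have each worker run the
paged scan above with WHERE token(key) > ? AND token(key) <= ? over its
slice. This assumes the Murmur3Partitioner's range of -2^63 .. 2^63-1
(RandomPartitioner uses 0 .. 2^127-1), and ideally you would align slices
with the token ranges the nodes actually own rather than an even split:

import java.math.BigInteger;

public class TokenSlices {
    public static void main(String[] args) {
        int workers = 4; // illustrative worker count
        BigInteger min = BigInteger.valueOf(Long.MIN_VALUE);
        BigInteger max = BigInteger.valueOf(Long.MAX_VALUE);
        BigInteger step = max.subtract(min).divide(BigInteger.valueOf(workers));

        for (int i = 0; i < workers; i++) {
            BigInteger start = min.add(step.multiply(BigInteger.valueOf(i)));
            // Let the last slice absorb any rounding remainder.
            BigInteger end = (i == workers - 1) ? max : start.add(step);
            // Each worker then scans:
            //   WHERE token(key) > start AND token(key) <= end
            System.out.printf("worker %d owns tokens (%s, %s]%n", i, start, end);
        }
    }
}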

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/05/2013, at 2:51 AM, Robert Wille rwi...@footnote.com wrote:

 Iterating through lots of records is not a primary use of my data.
 However, there are a number of scenarios where scanning the entire contents
 of a column family is an interesting and useful exercise. Here are a few:
 removal of orphaned records, checking the integrity of a data set, and
 analytics.
 
 On 5/12/13 3:41 AM, Oleg Dulin oleg.du...@gmail.com wrote:
 
 On 2013-05-11 14:42:32 +0000, Robert Wille said:
 
 I'm using the JDBC driver to access Cassandra. I'm wondering if it's
 possible to iterate through a large number of records (e.g. to perform
 maintenance on a large column family). I tried calling
 Connection.createStatement(ResultSet.TYPE_FORWARD_ONLY,
 ResultSet.CONCUR_READ_ONLY), but it times out, so I'm guessing that
 cursors aren't supported. Is there another way to do this, or do I need to
 use a different API?
 
 Thanks in advance
 
 Robert
 
 If you feel that you need to iterate through a large number of rows
 then you are probably not using the right data model.
 
 Can you describe your use case?
 
 -- 
 Regards,
 Oleg Dulin
 NYC Java Big Data Engineer
 http://www.olegdulin.com/
 
 
 
 



Re: Iterating through large numbers of rows with JDBC

2013-05-14 Thread David McNelis
Another thing to keep in mind when doing this with CQL is which partitioner
you're using. If you're using an order-preserving one and a single partition
key has more rows than your query limit, you can end up stuck in a loop,
fetching the same page over and over.
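
To make that concrete: if each page resumes at the last partition key seen
with a non-strict comparison, a partition wider than the page limit returns
the same page forever. A hedged sketch of the CQL involved, as strings in
Java (the table and column names are made up; col stands for a clustering
column):

public class WidePartitionPaging {
    // BAD: if one partition key owns more rows than the limit, resuming
    // at "the last key seen" with >= returns the same page forever.
    static final String STUCK =
        "SELECT key, col FROM mytable WHERE token(key) >= token(?) LIMIT 1000";

    // Better: drain the wide partition first by paging on its clustering
    // column...
    static final String DRAIN =
        "SELECT key, col FROM mytable WHERE key = ? AND col > ? LIMIT 1000";

    // ...then advance strictly past it to the next partition.
    static final String ADVANCE =
        "SELECT key, col FROM mytable WHERE token(key) > token(?) LIMIT 1000";
}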


On Tue, May 14, 2013 at 1:39 PM, aaron morton aa...@thelastpickle.com wrote:

 You can iterate over them; just make sure to set a sensible row count to
 chunk things up.
 See
 http://www.datastax.com/docs/1.2/cql_cli/using/paging#non-ordered-partitioner-paging

 You can also break up the processing so only one worker reads the token
 ranges for a node. That allows you to process the rows in parallel and
 avoid workers processing the same rows.

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 13/05/2013, at 2:51 AM, Robert Wille rwi...@footnote.com wrote:

 Iterating through lots of records is not a primary use of my data.
 However, there are a number of scenarios where scanning the entire contents
 of a column family is an interesting and useful exercise. Here are a few:
 removal of orphaned records, checking the integrity of a data set, and
 analytics.

 On 5/12/13 3:41 AM, Oleg Dulin oleg.du...@gmail.com wrote:

 On 2013-05-11 14:42:32 +0000, Robert Wille said:

 I'm using the JDBC driver to access Cassandra. I'm wondering if it's
 possible to iterate through a large number of records (e.g. to perform
 maintenance on a large column family). I tried calling
 Connection.createStatement(ResultSet.TYPE_FORWARD_ONLY,
 ResultSet.CONCUR_READ_ONLY), but it times out, so I'm guessing that
 cursors aren't supported. Is there another way to do this, or do I need to
 use a different API?

 Thanks in advance

 Robert


 If you feel that you need to iterate through a large number of rows
 then you are probably not using the right data model.

 Can you describe your use case?

 --
 Regards,
 Oleg Dulin
 NYC Java Big Data Engineer
 http://www.olegdulin.com/








Re: Iterating through large numbers of rows with JDBC

2013-05-12 Thread Oleg Dulin

On 2013-05-11 14:42:32 +0000, Robert Wille said:


I'm using the JDBC driver to access Cassandra. I'm wondering if it's
possible to iterate through a large number of records (e.g. to perform
maintenance on a large column family). I tried calling
Connection.createStatement(ResultSet.TYPE_FORWARD_ONLY,
ResultSet.CONCUR_READ_ONLY), but it times out, so I'm guessing that
cursors aren't supported. Is there another way to do this, or do I need to
use a different API?

Thanks in advance

Robert


If you feel that you need to iterate through a large number of rows
then you are probably not using the right data model.


Can you describe your use case?

--
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/




Re: Iterating through large numbers of rows with JDBC

2013-05-12 Thread Robert Wille
Iterating through lots of records is not a primary use of my data.
However, there are a number of scenarios where scanning the entire contents
of a column family is an interesting and useful exercise. Here are a few:
removal of orphaned records, checking the integrity of a data set, and
analytics.

On 5/12/13 3:41 AM, Oleg Dulin oleg.du...@gmail.com wrote:

On 2013-05-11 14:42:32 +0000, Robert Wille said:

 I'm using the JDBC driver to access Cassandra. I'm wondering if it's
 possible to iterate through a large number of records (e.g. to perform
 maintenance on a large column family). I tried calling
 Connection.createStatement(ResultSet.TYPE_FORWARD_ONLY,
 ResultSet.CONCUR_READ_ONLY), but it times out, so I'm guessing that
 cursors aren't supported. Is there another way to do this, or do I need to
 use a different API?
 
 Thanks in advance
 
 Robert

If you feel that you need to iterate through a large number of rows
then you are probably not using the right data model.

Can you describe your use case?

-- 
Regards,
Oleg Dulin
NYC Java Big Data Engineer
http://www.olegdulin.com/