Re: Efficiency of hector's setRowCount (and setStartKey!)

2011-10-13 Thread Don Smith
It's actually setStartKey that's the important method call (in 
combination with setRowCount). So I should have been clearer.


The following code performs as expected, returning the expected data 
in the expected order.  I believe that using IndexedSlicesQuery's 
setStartKey will support efficient queries, avoiding re-pulling the 
entire data set from Cassandra. Correct?



void demoPaging() {
    String lastKey = processPage("don", "");   // get first batch, starting with "" (smallest key)
    lastKey = processPage("don", lastKey);     // get second batch, starting with previous last key
    lastKey = processPage("don", lastKey);     // get third batch, starting with previous last key
    // ...
}

// Return the last key processed, or null when no records are left.
// keyspace, stringSerializer, ourColumnFamilyName, batchSize and totalCount
// are fields defined elsewhere.
String processPage(String username, String startKey) {
    String lastKey = null;
    IndexedSlicesQuery<String, String, String> indexedSlicesQuery =
        HFactory.createIndexedSlicesQuery(keyspace, stringSerializer,
            stringSerializer, stringSerializer);
    indexedSlicesQuery.addEqualsExpression("user", username);
    indexedSlicesQuery.setColumnNames("source", "ip");
    indexedSlicesQuery.setColumnFamily(ourColumnFamilyName);
    indexedSlicesQuery.setStartKey(startKey);
    indexedSlicesQuery.setRowCount(batchSize);
    QueryResult<OrderedRows<String, String, String>> result =
        indexedSlicesQuery.execute();
    OrderedRows<String, String, String> rows = result.get();

    for (Row<String, String, String> row : rows) {
        if (row == null) { continue; }
        String key = row.getKey();
        // The row at startKey was already handled by the previous page,
        // so it is neither counted again nor treated as progress.
        if (!startKey.equals(key)) {
            totalCount++;
            lastKey = key;
        }
    }
    return lastKey;
}
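
A minimal, self-contained sketch of the same start-key paging loop, with no Hector dependency. It assumes the start key is inclusive (which is why the code above skips the row whose key equals startKey) and uses a sorted in-memory map as a stand-in for the column family. The names StartKeyPagingSketch, fetchPage and readAll are hypothetical, and a real cluster returns rows in partitioner order rather than sorted key order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class StartKeyPagingSketch {

    // Hypothetical stand-in for one query execution: returns at most
    // rowCount keys that are >= startKey, in key order.
    static List<String> fetchPage(NavigableMap<String, String> store,
                                  String startKey, int rowCount) {
        List<String> keys = new ArrayList<>();
        for (String key : store.tailMap(startKey, true).keySet()) {
            if (keys.size() == rowCount) {
                break;
            }
            keys.add(key);
        }
        return keys;
    }

    // Walks the store page by page and returns every key exactly once.
    // Assumes no real row uses the empty string as its key.
    static List<String> readAll(NavigableMap<String, String> store, int rowCount) {
        if (rowCount < 2) {
            // With a one-row page the only row returned is the overlap row.
            throw new IllegalArgumentException("rowCount must be at least 2");
        }
        List<String> seen = new ArrayList<>();
        String startKey = "";
        while (true) {
            String lastKey = null;
            for (String key : fetchPage(store, startKey, rowCount)) {
                if (key.equals(startKey)) {
                    continue; // overlap with the previous page
                }
                seen.add(key);
                lastKey = key;
            }
            if (lastKey == null) {
                break; // the page held nothing new, so we are done
            }
            startKey = lastKey;
        }
        return seen;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> store = new TreeMap<>();
        for (int i = 0; i < 10; i++) {
            store.put("row" + i, "value" + i);
        }
        System.out.println(readAll(store, 4));
    }
}
```

Each call to fetchPage touches only one page of keys, which is the property the question is about: the cost of a page depends on rowCount, not on the size of the full result set.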






On 10/13/2011 09:15 AM, Patricio Echagüe wrote:
Hi Don. No, it will not. IndexedSlicesQuery reads only the number of 
rows specified by setRowCount and goes back to the DB for the next 
page when needed.


setRowCount just calls indexClause.setCount(rowCount);

On Mon, Oct 10, 2011 at 3:52 PM, Don Smith dsm...@likewise.com wrote:


Hector's IndexedSlicesQuery has a setRowCount method that you can
use to page through the results, as described in
https://github.com/rantav/hector/wiki/User-Guide .

rangeSlicesQuery.setRowCount(1001);
...
rangeSlicesQuery.setKeys(lastRow.getKey(), "");

Is it efficient?  Specifically, suppose my query returns 100,000
results and I page through batches of 1000 at a time (making 100
executes of the query). Will it internally retrieve all the
results each time (but pass only the desired set of 1000 or so to
me)? Or will it optimize queries to avoid the duplication?  I
presume the latter. :)

Can IndexedSlicesQuery's setStartKey method be used for the same
effect?

  Thanks,  Don
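
As a rough worked example of the numbers in this question: if the start key is inclusive, every page after the first repeats one row, so a page of 1000 rows yields only 999 new ones. The sketch below counts the executes needed under that assumption. PagingCost, executesNeeded and rowsTransferred are hypothetical names, and the count excludes any final probe used to detect the end of the data.

```java
public class PagingCost {

    // Executes needed to see totalRows rows when each page holds rowCount
    // rows and every page after the first repeats one already-seen row.
    static long executesNeeded(long totalRows, long rowCount) {
        if (rowCount < 2) {
            throw new IllegalArgumentException("rowCount must be at least 2");
        }
        if (totalRows <= rowCount) {
            return 1;
        }
        long remaining = totalRows - rowCount; // rows left after the first page
        long newPerPage = rowCount - 1;        // one slot is lost to the overlap
        return 1 + (remaining + newPerPage - 1) / newPerPage; // ceiling division
    }

    // Total rows sent back: every row once, plus one repeat per later page.
    static long rowsTransferred(long totalRows, long rowCount) {
        return totalRows + executesNeeded(totalRows, rowCount) - 1;
    }

    public static void main(String[] args) {
        System.out.println(executesNeeded(100_000, 1000));
        System.out.println(executesNeeded(100_000, 1001));
        System.out.println(rowsTransferred(100_000, 1001));
    }
}
```

Under these assumptions, 100,000 rows take 101 executes with a row count of 1000 and exactly 100 with a row count of 1001, which is presumably why the snippet quoted above asks for 1001 rows. In the second case roughly 100,099 rows are transferred in total, rather than 100,000 rows on each of 100 executes.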






Re: Efficiency of hector's setRowCount (and setStartKey!)

2011-10-13 Thread Patricio Echagüe
On Thu, Oct 13, 2011 at 9:39 AM, Don Smith dsm...@likewise.com wrote:

 It's actually setStartKey that's the important method call (in combination
 with setRowCount). So I should have been clearer.

 The following code performs as expected, returning the expected data in
 the expected order.  I believe that using IndexedSlicesQuery's setStartKey
 will support efficient queries, avoiding re-pulling the entire data set
 from Cassandra. Correct?


correct


