Re: Efficiency of hector's setRowCount
Hi Don.

No, it will not. IndexedSlicesQuery reads only the number of rows specified by the row count, and goes back to the database for the next page when needed. setRowCount simply calls indexClause.setCount(rowCount);

On Mon, Oct 10, 2011 at 3:52 PM, Don Smith dsm...@likewise.com wrote:

> Hector's IndexedSlicesQuery has a setRowCount method that you can use to page through the results, as described in https://github.com/rantav/hector/wiki/User-Guide.
>
>     rangeSlicesQuery.setRowCount(1001);
>     ...
>     rangeSlicesQuery.setKeys(lastRow.getKey(), "");
>
> Is it efficient? Specifically, suppose my query returns 100,000 results and I page through batches of 1000 at a time (making 100 executes of the query). Will it internally retrieve all the results each time (but pass only the desired set of 1000 or so to me)? Or will it optimize queries to avoid the duplication? I presume the latter. :)
>
> Can IndexedSlicesQuery's setStartKey method be used for the same effect?
>
> Thanks, Don
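As a rough illustration of the cost argument in this reply, here is a sketch under stated assumptions; it is not Hector or Cassandra code. A TreeMap stands in for a key-ordered store, and a counter records how many rows each page request touches. The names CountingStore, fetchPage, and pageThrough are hypothetical, and the real server-side cost of an indexed query may differ from this model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for a key-ordered store (an assumption,
// not part of Hector or Cassandra).
final class CountingStore {
    private final NavigableMap<String, String> rows = new TreeMap<>();
    private long rowsRead = 0; // rows touched across all fetchPage calls

    void put(String key, String value) {
        rows.put(key, value);
    }

    // Return at most `count` keys in ascending order, starting at `startKey` (inclusive).
    List<String> fetchPage(String startKey, int count) {
        List<String> page = new ArrayList<>();
        for (String key : rows.tailMap(startKey, true).keySet()) {
            if (page.size() == count) {
                break; // stop at the requested count; later rows are never touched
            }
            page.add(key);
            rowsRead++;
        }
        return page;
    }

    long rowsRead() {
        return rowsRead;
    }

    // Page through the whole store and return the number of fetchPage calls made.
    // Each call asks for pageSize + 1 rows so the last key can seed the next call.
    static int pageThrough(CountingStore store, int pageSize) {
        String startKey = ""; // the empty string sorts before every other key
        int calls = 0;
        while (true) {
            List<String> page = store.fetchPage(startKey, pageSize + 1);
            calls++;
            if (page.size() < pageSize + 1) {
                return calls; // short page: no rows left
            }
            startKey = page.get(page.size() - 1);
        }
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        for (int i = 0; i < 100_000; i++) {
            store.put(String.format("row%06d", i), "v");
        }
        int calls = pageThrough(store, 1000);
        System.out.println("calls=" + calls + " rowsRead=" + store.rowsRead());
    }
}
```

In this model, 100,000 rows paged 1000 at a time take 100 calls and touch 100,099 rows in total (each full page re-reads one overlapping row), rather than re-reading the full result set on every call.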
Re: Efficiency of hector's setRowCount (and setStartKey!)
It's actually setStartKey that's the important method call (in combination with setRowCount). So I should have been clearer.

The following code performs as expected, as far as returning the expected data in the expected order. I believe that the use of IndexedSlicesQuery's setStartKey will support efficient queries, avoiding re-pulling the entire data set from Cassandra. Correct?

    void demoPaging() {
        String lastKey = processPage("don", "");   // get first batch, starting with "" (smallest key)
        lastKey = processPage("don", lastKey);     // get second batch, starting with previous last key
        lastKey = processPage("don", lastKey);     // get third batch, starting with previous last key
        //
    }

    // return last key processed, null when no records left
    String processPage(String username, String startKey) {
        String lastKey = null;
        IndexedSlicesQuery<String, String, String> indexedSlicesQuery =
            HFactory.createIndexedSlicesQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
        indexedSlicesQuery.addEqualsExpression("user", username);
        indexedSlicesQuery.setColumnNames("source", "ip");
        indexedSlicesQuery.setColumnFamily(ourColumnFamilyName);
        indexedSlicesQuery.setStartKey(startKey);   //
        indexedSlicesQuery.setRowCount(batchSize);
        QueryResult<OrderedRows<String, String, String>> result = indexedSlicesQuery.execute();
        OrderedRows<String, String, String> rows = result.get();
        for (Row<String, String, String> row : rows) {
            if (row == null) { continue; }
            totalCount++;
            String key = row.getKey();
            if (!startKey.equals(key)) { lastKey = key; }
        }
        totalCount--;
        return lastKey;
    }

On 10/13/2011 09:15 AM, Patricio Echagüe wrote:

> Hi Don. No it will not. IndexedSlicesQuery will read just the amount of rows specified by RowCount and will go to the DB to get the new page when needed.
>
> SetRowCount is doing indexClause.setCount(rowCount);
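One alternative way to handle the row that repeats between pages, sketched here under assumptions: request batchSize + 1 rows, process only the first batchSize, and use the extra "look-ahead" row purely as the next start key. This differs from the code in the message above, which skips the repeated start key instead. OverlapPager, collectAll, and inMemoryFetch are hypothetical names, and the fetch callback stands in for whatever query call is actually used; nothing here is Hector API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

final class OverlapPager {

    // `fetch` is a hypothetical callback: (startKey, maxRows) -> keys in
    // ascending order, with the start key inclusive.
    static List<String> collectAll(BiFunction<String, Integer, List<String>> fetch,
                                   int batchSize) {
        List<String> all = new ArrayList<>();
        String startKey = ""; // assumed to sort before every real key
        while (true) {
            List<String> page = fetch.apply(startKey, batchSize + 1);
            if (page.size() <= batchSize) {
                all.addAll(page); // short page: this is the final batch
                return all;
            }
            // Full page: process the first batchSize rows only. The extra row is
            // not processed now; it becomes the first row of the next page.
            all.addAll(page.subList(0, batchSize));
            startKey = page.get(batchSize);
        }
    }

    // In-memory fetch over a sorted set, used only to exercise the loop.
    static BiFunction<String, Integer, List<String>> inMemoryFetch(TreeSet<String> keys) {
        return (start, maxRows) -> keys.tailSet(start, true).stream()
                .limit(maxRows)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>();
        for (int i = 0; i < 25; i++) {
            keys.add(String.format("k%03d", i));
        }
        List<String> all = collectAll(inMemoryFetch(keys), 10);
        System.out.println(all.size() + " keys, first=" + all.get(0)
                + ", last=" + all.get(all.size() - 1));
    }
}
```

With this shape every row is processed exactly once, so no count correction or key comparison is needed, at the cost of fetching one extra row per full page.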
Re: Efficiency of hector's setRowCount (and setStartKey!)
On Thu, Oct 13, 2011 at 9:39 AM, Don Smith dsm...@likewise.com wrote:

> It's actually setStartKey that's the important method call (in combination with setRowCount). So I should have been clearer. The following code performs as expected, as far as returning the expected data in the expected order. I believe that the use of IndexedSlicesQuery's setStartKey will support efficient queries, avoiding re-pulling the entire data set from Cassandra. Correct?

Correct.
Efficiency of hector's setRowCount
Hector's IndexedSlicesQuery has a setRowCount method that you can use to page through the results, as described in https://github.com/rantav/hector/wiki/User-Guide.

    rangeSlicesQuery.setRowCount(1001);
    ...
    rangeSlicesQuery.setKeys(lastRow.getKey(), "");

Is it efficient? Specifically, suppose my query returns 100,000 results and I page through batches of 1000 at a time (making 100 executes of the query). Will it internally retrieve all the results each time (but pass only the desired set of 1000 or so to me)? Or will it optimize queries to avoid the duplication? I presume the latter. :)

Can IndexedSlicesQuery's setStartKey method be used for the same effect?

Thanks, Don