[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902572#comment-13902572 ]
Edward Capriolo commented on CASSANDRA-6704:
--------------------------------------------

{quote}
One to act on the query itself before it is executed, and another to act on the result set of any query.
{quote}
In many cases it is not enough to act on the result set of the query. Scanners return a status that the framework interprets to decide whether processing continues or stops. For example, imagine a wide row with 100000000 columns where my goal is to search until I find a column that is even. I cannot materialize the result set first and then trim it down; that would likely OOM. That, however, is a totally valid use case. Intravert calls that a filter (https://github.com/zznate/intravert-ug/wiki/Filter-mode).

This could be implemented easily enough by allowing a SlicePredicate to supply an optional FilterFunction, although that has one weird issue: if the filter leaves out the last row, how do you know what the last row filtered was?

{quote}The main thing that would be sacrificed, with respect to this ticket, would be embedded groovy in select statements, as I believe this is the most controversial aspect.{quote}

Think about this. You create a function and load it onto 30 servers. It is found to have a bug. What do you do? Bob wants to create a new function? Let's shut down the entire cluster? Schedule an outage to rolling-restart every server? Without being able to load and unload functions, it is just a toy; no one can really use it in production in any meaningful way. Once you put the proper cap on it and disallow the feature for those who fear it, the problem is solved.

> Create wide row scanners
> ------------------------
>
> Key: CASSANDRA-6704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6704
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Edward Capriolo
> Assignee: Edward Capriolo
>
> The BigTable white paper demonstrates the use of scanners to iterate over
> rows and columns.
> http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
> Because Cassandra does not have a primary sorting on row keys, scanning over
> ranges of row keys is less useful.
> However, we can use the scanner concept to operate on wide rows. For example,
> many times a user wishes to do some custom processing inside a row and does
> not wish to carry the data across the network to do this processing.
> I have already implemented thrift methods to compile dynamic groovy code into
> Filters as well as some code that uses a Filter to page through and process
> data on the server side.
> https://github.com/edwardcapriolo/cassandra/compare/apache:trunk...trunk
> The following is a working code snippet.
> {code}
>     @Test
>     public void test_scanner() throws Exception
>     {
>       ColumnParent cp = new ColumnParent();
>       cp.setColumn_family("Standard1");
>       ByteBuffer key = ByteBuffer.wrap("rscannerkey".getBytes());
>       for (char a='a'; a < 'g'; a++){
>         Column c1 = new Column();
>         c1.setName((a+"").getBytes());
>         c1.setValue(new byte [0]);
>         c1.setTimestamp(System.nanoTime());
>         server.insert(key, cp, c1, ConsistencyLevel.ONE);
>       }
>
>       FilterDesc d = new FilterDesc();
>       d.setSpec("GROOVY_CLASS_LOADER");
>       d.setName("limit3");
>       d.setCode("import org.apache.cassandra.dht.* \n" +
>               "import org.apache.cassandra.thrift.* \n" +
>           "public class Limit3 implements SFilter { \n " +
>           "public FilterReturn filter(ColumnOrSuperColumn col, List<ColumnOrSuperColumn> filtered) {\n"+
>           " filtered.add(col);\n"+
>           " return filtered.size()< 3 ? FilterReturn.FILTER_MORE : FilterReturn.FILTER_DONE;\n"+
>           "} \n" +
>         "}\n");
>       server.create_filter(d);
>
>       ScannerResult res = server.create_scanner("Standard1", "limit3", key,
>               ByteBuffer.wrap("a".getBytes()));
>       Assert.assertEquals(3, res.results.size());
>     }
> {code}
> I am going to be working on this code over the next few weeks but I wanted to
> get the concept out early so the design can see some criticism.
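
To make the early-termination point in the comment above concrete, here is a minimal sketch in plain Java of the "search until I find an even column" case. It is not the code from the patch. `FilterReturn` and `SFilter` are simplified stand-ins named after the types in the quoted snippet, while `CountingColumns`, `scan`, and `FIRST_EVEN` are hypothetical helpers invented for illustration, and columns are reduced to `long` values. The idea it shows is that the framework checks the filter's return status after every column, so the row is never materialized in full.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

class ScanUntilEven {
    // Stand-ins modeled on the names in the quoted snippet (assumption: simplified,
    // not the real thrift types).
    enum FilterReturn { FILTER_MORE, FILTER_DONE }

    interface SFilter {
        FilterReturn filter(long column, List<Long> filtered);
    }

    // Hypothetical wide row: yields columns lazily and counts how many were read.
    // Every column is odd except the one at position evenAt.
    static final class CountingColumns implements Iterator<Long> {
        private final long total;
        private final long evenAt;
        long read = 0;

        CountingColumns(long total, long evenAt) {
            this.total = total;
            this.evenAt = evenAt;
        }

        @Override
        public boolean hasNext() {
            return read < total;
        }

        @Override
        public Long next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            long i = read++;
            return i == evenAt ? 2 * i : 2 * i + 1;
        }
    }

    // Feeds one column at a time to the filter and stops as soon as it
    // returns FILTER_DONE, so only the kept columns are held in memory.
    static List<Long> scan(Iterator<Long> columns, SFilter filter) {
        List<Long> filtered = new ArrayList<>();
        while (columns.hasNext()) {
            if (filter.filter(columns.next(), filtered) == FilterReturn.FILTER_DONE) {
                break;
            }
        }
        return filtered;
    }

    // Keeps the first even column, then ends the scan.
    static final SFilter FIRST_EVEN = (column, filtered) -> {
        if (column % 2 == 0) {
            filtered.add(column);
            return FilterReturn.FILTER_DONE;
        }
        return FilterReturn.FILTER_MORE;
    };

    public static void main(String[] args) {
        CountingColumns row = new CountingColumns(100_000_000L, 4);
        List<Long> result = scan(row, FIRST_EVEN);
        // prints: result=[8], columns read=5
        System.out.println("result=" + result + ", columns read=" + row.read);
    }
}
```

Acting only on a finished result set would instead require reading all 100000000 columns before trimming. The `read` counter also hints at the "last row filtered" question: a scanner along these lines would probably need to report the last column it examined, not just the last one it kept, so that a client can resume from there.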
-- This message was sent by Atlassian JIRA (v6.1.5#6160)