[ 
https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902475#comment-13902475
 ] 

Edward Capriolo commented on CASSANDRA-6704:
--------------------------------------------

For reference: Hive allows users to add jar files to the class path and then 
load user defined functions from them. Hive has similar issues with 
"unfortunately" scoped static variables. In practice the plug-ability is a huge 
win, regardless of some subtle risks.

Recently I added the capability to create groovy udfs from the hive shell using 
a similar technique.

https://issues.apache.org/jira/browse/HIVE-5250

We also provided a way for admins to allow and disallow this code

https://issues.apache.org/jira/browse/HIVE-5400

The net result is a huge win. Before we had users that did not have the time to 
set up a development environment (get jars, build jars, email jars to admins, 
push jars to cluster). It was not "security" it was just lots of hoop jumping. 
It kept people from answers they wanted. When done correctly this is 
transformative and powerful. Users have new avenues to access there data. 
Sure you give someone some rope they can hang themselves with a see loop or 
whatever, that is the down. Not saying it is perfect. 

> Create wide row scanners
> ------------------------
>
>                 Key: CASSANDRA-6704
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6704
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Edward Capriolo
>            Assignee: Edward Capriolo
>
> The BigTable white paper demonstrates the use of scanners to iterate over 
> rows and columns. 
> http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
> Because Cassandra does not have a primary sorting on row keys scanning over 
> ranges of row keys is less useful. 
> However we can use the scanner concept to operate on wide rows. For example 
> many times a user wishes to do some custom processing inside a row and does 
> not wish to carry the data across the network to do this processing. 
> I have already implemented thrift methods to compile dynamic groovy code into 
> Filters as well as some code that uses a Filter to page through and process 
> data on the server side.
> https://github.com/edwardcapriolo/cassandra/compare/apache:trunk...trunk
> The following is a working code snippet.
> {code}
>     @Test
>     public void test_scanner() throws Exception
>     {
>       ColumnParent cp = new ColumnParent();
>       cp.setColumn_family("Standard1");
>       ByteBuffer key = ByteBuffer.wrap("rscannerkey".getBytes());
>       for (char a='a'; a < 'g'; a++){
>         Column c1 = new Column();
>         c1.setName((a+"").getBytes());
>         c1.setValue(new byte [0]);
>         c1.setTimestamp(System.nanoTime());
>         server.insert(key, cp, c1, ConsistencyLevel.ONE);
>       }
>       
>       FilterDesc d = new FilterDesc();
>       d.setSpec("GROOVY_CLASS_LOADER");
>       d.setName("limit3");
>       d.setCode("import org.apache.cassandra.dht.* \n" +
>               "import org.apache.cassandra.thrift.* \n" +
>           "public class Limit3 implements SFilter { \n " +
>           "public FilterReturn filter(ColumnOrSuperColumn col, 
> List<ColumnOrSuperColumn> filtered) {\n"+
>           " filtered.add(col);\n"+
>           " return filtered.size()< 3 ? FilterReturn.FILTER_MORE : 
> FilterReturn.FILTER_DONE;\n"+
>           "} \n" +
>         "}\n");
>       server.create_filter(d);
>       
>       
>       ScannerResult res = server.create_scanner("Standard1", "limit3", key, 
> ByteBuffer.wrap("a".getBytes()));
>       Assert.assertEquals(3, res.results.size());
>     }
> {code}
> I am going to be working on this code over the next few weeks but I wanted to 
> get the concept our early so the design can see some criticism.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to