[jira] [Commented] (CASSANDRA-6704) Create wide row scanners

Edward Capriolo (JIRA) Wed, 19 Feb 2014 06:41:34 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905500#comment-13905500
 ]


Edward Capriolo commented on CASSANDRA-6704:
--------------------------------------------

Recent changes:

{code}
struct ScannerCreateDesc {
    1:required string cfname,
    2:required string filter_name,
    3:required binary key,
    4:optional binary start_column,
    5:optional binary end_column,
    6:required i32 slice_size,
    7:optional ConsistencyLevel consistency_level=ConsistencyLevel.ONE,
    8:optional map<string,binary> params
}
{code}

{code}
public interface ScanFilter {
  public FilterReturn filter(ColumnOrSuperColumn col, ScannerState state);
}
{code}

Adding the params map lets us create more generic scanners. Previously only a 
hard-coded Limit3 scanner was implemented. Now a single filter can implement 
LIMIT X by reading X from the params:

{code}
  @Override
  public FilterReturn filter(ColumnOrSuperColumn col, ScannerState state) {
    // Collect the column, then stop once "limit" columns have been gathered.
    state.getFiltered().add(col);
    int limit = ByteBufferUtil.toInt(state.getParams().get("limit"));
    return state.getFiltered().size() < limit
        ? FilterReturn.FILTER_MORE : FilterReturn.FILTER_DONE;
  }
{code}
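
For anyone who wants to try the idea outside of a Cassandra build, here is a 
minimal standalone sketch of the same params-driven limit. Everything in it is 
a stand-in and an assumption on my part: String replaces ColumnOrSuperColumn, 
DemoScannerState is a simplified guess at what ScannerState carries, and the 
scan() loop is hypothetical driver code, not the server-side paging code in 
the patch. The limit is read with an absolute ByteBuffer.getInt so the buffer 
position is left untouched.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for the two FilterReturn values used in the comment above.
enum DemoFilterReturn { FILTER_MORE, FILTER_DONE }

// Hypothetical, simplified state: the columns kept so far plus the params map.
class DemoScannerState {
    private final List<String> filtered = new ArrayList<>();
    private final Map<String, ByteBuffer> params;

    DemoScannerState(Map<String, ByteBuffer> params) { this.params = params; }

    List<String> getFiltered() { return filtered; }
    Map<String, ByteBuffer> getParams() { return params; }
}

// Same shape as ScanFilter, with String standing in for ColumnOrSuperColumn.
interface DemoScanFilter {
    DemoFilterReturn filter(String col, DemoScannerState state);
}

// LIMIT X: keep each column and stop once "limit" columns have been collected.
class DemoLimitFilter implements DemoScanFilter {
    @Override
    public DemoFilterReturn filter(String col, DemoScannerState state) {
        state.getFiltered().add(col);
        ByteBuffer raw = state.getParams().get("limit");
        int limit = raw.getInt(raw.position()); // absolute read, position unchanged
        return state.getFiltered().size() < limit
            ? DemoFilterReturn.FILTER_MORE : DemoFilterReturn.FILTER_DONE;
    }
}

class LimitFilterDemo {
    // Hypothetical driver: feed columns to the filter until it reports FILTER_DONE.
    static List<String> scan(List<String> row, int limit) {
        Map<String, ByteBuffer> params = new HashMap<>();
        params.put("limit", ByteBuffer.allocate(4).putInt(0, limit));
        DemoScannerState state = new DemoScannerState(params);
        DemoScanFilter filter = new DemoLimitFilter();
        for (String col : row) {
            if (filter.filter(col, state) == DemoFilterReturn.FILTER_DONE) {
                break;
            }
        }
        return state.getFiltered();
    }

    public static void main(String[] args) {
        System.out.println(scan(Arrays.asList("a", "b", "c", "d", "e", "f"), 3));
    }
}
```

Because the limit travels in params, the same compiled filter serves any X; 
only the ScannerCreateDesc changes between calls.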

Thinking more broadly, scanners could also work from CQL. Hive has a feature 
called UDTF (https://issues.apache.org/jira/browse/HIVE-1614) that takes in 
zero to many columns and produces zero to many rows with one to many columns. 
This roughly equates to the Scanner interface I am working on. Session-level 
tracking will need to record the position in the row so that a second, 
disconnected query can pick up where the first left off. I will draft this up 
later.
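
Until that draft exists, the session-level tracking could look roughly like 
the sketch below. This is only an illustration under my own assumptions: 
ResumableScanner, the session id strings, and the in-memory TreeMap standing 
in for a wide row are all hypothetical and are not in the patch. The idea is 
to remember the last column name returned per session and start the next 
slice strictly after it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical session-scoped scanner over one wide row.
class ResumableScanner {
    // Column name -> value, kept in comparator order like columns in a row.
    private final NavigableMap<String, String> row;
    // Last column name handed out to each session.
    private final Map<String, String> lastColumnBySession = new HashMap<>();

    ResumableScanner(NavigableMap<String, String> row) {
        this.row = row;
    }

    // Return up to sliceSize column names, resuming after the session's last one.
    List<String> next(String sessionId, int sliceSize) {
        String last = lastColumnBySession.get(sessionId);
        NavigableMap<String, String> remaining =
            (last == null) ? row : row.tailMap(last, false);
        List<String> slice = new ArrayList<>();
        for (String name : remaining.keySet()) {
            if (slice.size() == sliceSize) {
                break;
            }
            slice.add(name);
        }
        if (!slice.isEmpty()) {
            lastColumnBySession.put(sessionId, slice.get(slice.size() - 1));
        }
        return slice;
    }
}

class ResumableScannerDemo {
    static ResumableScanner sampleScanner() {
        NavigableMap<String, String> row = new TreeMap<>();
        for (char c = 'a'; c < 'g'; c++) {
            row.put(String.valueOf(c), "");
        }
        return new ResumableScanner(row);
    }

    public static void main(String[] args) {
        ResumableScanner scanner = sampleScanner();
        System.out.println(scanner.next("session-1", 2));
        System.out.println(scanner.next("session-1", 2));
    }
}
```

A real implementation would also have to expire idle sessions and decide what 
happens when the remembered column is deleted between queries; the sketch 
ignores both.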

> Create wide row scanners
> ------------------------
>
>                 Key: CASSANDRA-6704
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6704
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Edward Capriolo
>            Assignee: Edward Capriolo
>
> The BigTable white paper demonstrates the use of scanners to iterate over 
> rows and columns. 
> http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
> Because Cassandra does not have a primary sorting on row keys, scanning over 
> ranges of row keys is less useful. 
> However, we can use the scanner concept to operate on wide rows. For example, 
> a user often wishes to do some custom processing inside a row and does 
> not wish to carry the data across the network to do this processing. 
> I have already implemented thrift methods to compile dynamic groovy code into 
> Filters as well as some code that uses a Filter to page through and process 
> data on the server side.
> https://github.com/edwardcapriolo/cassandra/compare/apache:trunk...trunk
> The following is a working code snippet.
> {code}
>     @Test
>     public void test_scanner() throws Exception
>     {
>       ColumnParent cp = new ColumnParent();
>       cp.setColumn_family("Standard1");
>       ByteBuffer key = ByteBuffer.wrap("rscannerkey".getBytes());
>       for (char a='a'; a < 'g'; a++){
>         Column c1 = new Column();
>         c1.setName((a+"").getBytes());
>         c1.setValue(new byte [0]);
>         c1.setTimestamp(System.nanoTime());
>         server.insert(key, cp, c1, ConsistencyLevel.ONE);
>       }
>       
>       FilterDesc d = new FilterDesc();
>       d.setSpec("GROOVY_CLASS_LOADER");
>       d.setName("limit3");
>       d.setCode("import org.apache.cassandra.dht.* \n" +
>               "import org.apache.cassandra.thrift.* \n" +
>           "public class Limit3 implements SFilter { \n " +
>           "public FilterReturn filter(ColumnOrSuperColumn col, " +
>           "List<ColumnOrSuperColumn> filtered) {\n"+
>           " filtered.add(col);\n"+
>           " return filtered.size()< 3 ? FilterReturn.FILTER_MORE : " +
>           "FilterReturn.FILTER_DONE;\n"+
>           "} \n" +
>         "}\n");
>       server.create_filter(d);
>       
>       
>       ScannerResult res = server.create_scanner("Standard1", "limit3", key, 
> ByteBuffer.wrap("a".getBytes()));
>       Assert.assertEquals(3, res.results.size());
>     }
> {code}
> I am going to be working on this code over the next few weeks, but I wanted to 
> get the concept out early so the design can see some criticism.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
