[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915489#comment-13915489 ] Edward Capriolo commented on CASSANDRA-6704:

[~jbellis] After poking around in SliceQueryFilter and the other pieces, I see what you are saying. It is definitely better for the majority of the code to be implemented closer to ColumnFamily/StorageProxy rather than higher up in Thrift. I will read the other tickets and study that a bit, then potentially come up with a lower-level implementation.

Does anyone like the idea of using the dynamic loading feature to implement triggers, while still controlling the feature with this configuration knob?

{code}
dynamic_loading:
    - JAVA_LOCAL_CLASSPATH
    - GROOVY_CLASS_LOADER
{code}

That seems like a nice win: working with triggers without having to ship around jar files. Should I open up another ticket?

> Create wide row scanners
>
> Key: CASSANDRA-6704
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6704
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Edward Capriolo
> Assignee: Edward Capriolo
>
> The BigTable white paper demonstrates the use of scanners to iterate over rows and columns.
> http://static.googleusercontent.com/media/research.google.com/en/us/archive/bigtable-osdi06.pdf
> Because Cassandra does not have a primary sorting on row keys, scanning over ranges of row keys is less useful. However, we can use the scanner concept to operate on wide rows. For example, many times a user wishes to do some custom processing inside a row and does not wish to carry the data across the network to do this processing.
> I have already implemented Thrift methods to compile dynamic Groovy code into Filters, as well as some code that uses a Filter to page through and process data on the server side.
> https://github.com/edwardcapriolo/cassandra/compare/apache:trunk...trunk
> The following is a working code snippet.
> {code}
> @Test
> public void test_scanner() throws Exception
> {
>     ColumnParent cp = new ColumnParent();
>     cp.setColumn_family("Standard1");
>     ByteBuffer key = ByteBuffer.wrap("rscannerkey".getBytes());
>     for (char a = 'a'; a < 'g'; a++) {
>         Column c1 = new Column();
>         c1.setName((a + "").getBytes());
>         c1.setValue(new byte[0]);
>         c1.setTimestamp(System.nanoTime());
>         server.insert(key, cp, c1, ConsistencyLevel.ONE);
>     }
>
>     FilterDesc d = new FilterDesc();
>     d.setSpec("GROOVY_CLASS_LOADER");
>     d.setName("limit3");
>     d.setCode("import org.apache.cassandra.dht.* \n" +
>               "import org.apache.cassandra.thrift.* \n" +
>               "public class Limit3 implements SFilter { \n" +
>               "  public FilterReturn filter(ColumnOrSuperColumn col, List filtered) {\n" +
>               "    filtered.add(col);\n" +
>               "    return filtered.size() < 3 ? FilterReturn.FILTER_MORE : FilterReturn.FILTER_DONE;\n" +
>               "  } \n" +
>               "}\n");
>     server.create_filter(d);
>
>     ScannerResult res = server.create_scanner("Standard1", "limit3", key,
>             ByteBuffer.wrap("a".getBytes()));
>     Assert.assertEquals(3, res.results.size());
> }
> {code}
> I am going to be working on this code over the next few weeks, but I wanted to get the concept out early so the design can see some criticism.

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
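The FILTER_MORE/FILTER_DONE contract from the snippet above can be sketched in plain Java. This is a self-contained illustration, not the patch's actual code: the SFilter and FilterReturn names follow the snippet, but the String columns and the scan driver loop are hypothetical stand-ins for the server-side machinery.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the filter contract: a filter consumes columns one at a
// time and signals whether the scan should continue (FILTER_MORE) or stop
// (FILTER_DONE).
public class ScannerSketch {
    public enum FilterReturn { FILTER_MORE, FILTER_DONE }

    public interface SFilter {
        FilterReturn filter(String col, List<String> filtered);
    }

    // Hypothetical driver: walks a wide row's columns in order, applying the
    // filter until it reports FILTER_DONE.
    public static List<String> scan(List<String> row, SFilter f) {
        List<String> filtered = new ArrayList<>();
        for (String col : row) {
            if (f.filter(col, filtered) == FilterReturn.FILTER_DONE)
                break;
        }
        return filtered;
    }

    public static void main(String[] args) {
        // Equivalent of the Groovy Limit3 filter: keep at most 3 columns.
        SFilter limit3 = (col, filtered) -> {
            filtered.add(col);
            return filtered.size() < 3 ? FilterReturn.FILTER_MORE
                                       : FilterReturn.FILTER_DONE;
        };
        List<String> result = scan(List.of("a", "b", "c", "d", "e", "f"), limit3);
        System.out.println(result); // [a, b, c]
    }
}
```

The point of the contract is that the filter, not the transport, decides when enough columns have been collected, which is what lets the processing stay on the server.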
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906147#comment-13906147 ] Edward Capriolo commented on CASSANDRA-6704:

I have to read more of the other tickets. If you were going to CQL-ize this scanner concept, you could do it this way:

{code}
create function rolling_avg as 'com.whatever'
cql> start_scan select rolling_avg(col) as col1 from table where key in 'xyz' limit 1000
{code}

This would create a scanner with range_size=1000 and return something like:

{code}
scanner_id, rolling_avg, status
54, 4.5, scanner_continue
{code}

Then you would call

{code}
cql> next_scan 54
{code}

which would return

{code}
scanner_id, rolling_avg, status
54, 4.2, scanner_done
{code}
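The stateful scan above can be sketched as a fold over pages. This is a hypothetical illustration of the client-visible behavior only (a running average carried across next_scan calls), with plain doubles standing in for column values; RollingAvgScan is not a class in the patch.

```java
import java.util.List;

// Each next_scan call folds one page (one range_size batch) of values into a
// running average; the scanner's status flag would tell the client whether
// another page remains.
public class RollingAvgScan {
    private double sum = 0;
    private long count = 0;

    // Fold one page of values into the running average and return it.
    public double page(List<Double> values) {
        for (double v : values) { sum += v; count++; }
        return sum / count;
    }

    public static void main(String[] args) {
        RollingAvgScan s = new RollingAvgScan();
        System.out.println(s.page(List.of(4.0, 5.0))); // first page, avg ~ 4.5
        System.out.println(s.page(List.of(3.6)));      // second page, avg ~ 4.2
    }
}
```

The essential property is that the aggregate state lives with the scanner id on the server, so the second query resumes rather than restarts.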
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905973#comment-13905973 ] Tupshin Harper commented on CASSANDRA-6704:

I'm all in favor of this. I'd love to see a UDTF equivalent in Cassandra and CQL that could allow us to do a lot of deep mucking with server-side processing in a pluggable way. My suggestion and request would be that you practically and conceptually isolate that feature (a scanner/UDTF interface) from the other aspects of this ticket. With a sane interface, I expect there would be minimal objections. I know that this, by itself, doesn't meet all of your objectives, but it moves us in the right direction.
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905500#comment-13905500 ] Edward Capriolo commented on CASSANDRA-6704:

Recent changes:

{code}
struct ScannerCreateDesc {
  1: required string cfname,
  2: required string filter_name,
  3: required binary key,
  4: optional binary start_column,
  5: optional binary end_column,
  6: required i32 slice_size,
  7: optional ConsistencyLevel consistency_level=ConsistencyLevel.ONE,
  8: optional map params
}
{code}

{code}
public interface ScanFilter
{
    public FilterReturn filter(ColumnOrSuperColumn col, ScannerState state);
}
{code}

Adding the params allows us to create more generic scanners. Before, the scanner Limit3 had its limit hard-coded; now we can do things like LIMIT X:

{code}
@Override
public FilterReturn filter(ColumnOrSuperColumn col, ScannerState state)
{
    state.getFiltered().add(col);
    return state.getFiltered().size() < ByteBufferUtil.toInt(state.getParams().get("limit"))
            ? FilterReturn.FILTER_MORE : FilterReturn.FILTER_DONE;
}
{code}

Also, thinking more broadly, scanners could work from CQL. Hive has a feature called UDTF (https://issues.apache.org/jira/browse/HIVE-1614) that takes in zero to many columns and produces zero to many rows with one to many columns. This roughly equates to the Scanner interface I am working on. Session-level tracking will need to record the position in the row so that a second, disconnected query can pick up where the first left off. I will draft this up later.
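The parameterized-filter idea above can be sketched in a self-contained way. The names here are hypothetical stand-ins: String columns instead of ColumnOrSuperColumn, an int param instead of a binary one decoded via ByteBufferUtil.toInt, and a toy ScannerState in place of the server-side class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// One compiled filter class can now serve LIMIT 3, LIMIT 50, etc., because
// the scanner carries a params map alongside the filtered results.
public class ParamFilterSketch {
    public enum FilterReturn { FILTER_MORE, FILTER_DONE }

    // Stand-in for the server-side ScannerState: results so far plus params.
    public static class ScannerState {
        private final List<String> filtered = new ArrayList<>();
        private final Map<String, Integer> params;
        public ScannerState(Map<String, Integer> params) { this.params = params; }
        public List<String> getFiltered() { return filtered; }
        public Map<String, Integer> getParams() { return params; }
    }

    // Generic LIMIT X filter: stop once params["limit"] columns are collected.
    public static FilterReturn limitFilter(String col, ScannerState state) {
        state.getFiltered().add(col);
        return state.getFiltered().size() < state.getParams().get("limit")
                ? FilterReturn.FILTER_MORE : FilterReturn.FILTER_DONE;
    }

    public static void main(String[] args) {
        ScannerState state = new ScannerState(Map.of("limit", 2));
        for (String col : List.of("a", "b", "c")) {
            if (limitFilter(col, state) == FilterReturn.FILTER_DONE) break;
        }
        System.out.println(state.getFiltered()); // [a, b]
    }
}
```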
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905485#comment-13905485 ] Edward Capriolo commented on CASSANDRA-6704:

{quote}So in the context of my suggestion above, you get back a list of iterators. One iterator per partition? I would heartily endorse that because I almost suggested adding that additional complexity when I wrote it up in the first place.{quote}

I was actually suggesting that two scanners could implement something like a sort-merge join (http://en.wikipedia.org/wiki/Sort-merge_join) across two column families. However, there are other applications as well, like the one you suggest inside a single row.
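The sort-merge join mentioned above can be sketched with two sorted iterators standing in for scanners over two column families; the class and method names here are illustrative only, not part of the patch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Two scanners each yield column names in sorted order; a single merge pass
// emits the names present in both, without materializing either side.
public class SortMergeJoinSketch {
    public static List<String> join(Iterator<String> left, Iterator<String> right) {
        List<String> matches = new ArrayList<>();
        String l = left.hasNext() ? left.next() : null;
        String r = right.hasNext() ? right.next() : null;
        while (l != null && r != null) {
            int cmp = l.compareTo(r);
            if (cmp == 0) {                          // in both: emit, advance both
                matches.add(l);
                l = left.hasNext() ? left.next() : null;
                r = right.hasNext() ? right.next() : null;
            } else if (cmp < 0) {                    // left behind: advance left
                l = left.hasNext() ? left.next() : null;
            } else {                                 // right behind: advance right
                r = right.hasNext() ? right.next() : null;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> cf1 = List.of("a", "c", "e");
        List<String> cf2 = List.of("b", "c", "e", "f");
        System.out.println(join(cf1.iterator(), cf2.iterator())); // [c, e]
    }
}
```

This works precisely because columns within a wide row are stored sorted, which is the property row keys lack under random partitioning.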
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905047#comment-13905047 ] Tupshin Harper commented on CASSANDRA-6704:

So in the context of my suggestion above, you get back a list of iterators. One iterator per partition? I would heartily endorse that, because I almost suggested adding that additional complexity when I wrote it up in the first place.
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904480#comment-13904480 ] Edward Capriolo commented on CASSANDRA-6704:

But I see no reason why a query language could not use a scanner to help it answer a query, potentially paging the results of N scans at once or something like that.
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904466#comment-13904466 ] Edward Capriolo commented on CASSANDRA-6704:

{quote}Today people come to Cassandra with the expectation of a coherent design that makes choices for them.{quote}

I'm a user. That is not why I come to Cassandra. On the other tickets mentioned in this JIRA I see few users commenting or voting. For reference, there are 60 followers of https://github.com/zznate/intravert-ug. This allows me to argue that not all users wish to be locked away from the database behind an access language.

{quote}That is why the right way forward for the project as a whole is to figure out what the right semantics are for CQL{quote}

A blocking query language cannot achieve what scanners can.
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13904253#comment-13904253 ] Jonathan Ellis commented on CASSANDRA-6704:

bq. To step back for a bit, what we have here is an interesting feature developed with minimal intrusion to the plumbing. To encapsulate this a bit better, what if the non-thrift parts - the filter marshaling and execution - were pushed into a new StorageProxy method? Then we can all go hack on our transports of choice.

The problem is that an API is more than a transport. Syntax matters, Turing-completeness arguments aside. That is why the right way forward for the project as a whole is to figure out what the right semantics are for CQL. If we are lucky and it also makes sense in the Thrift world (as with cas and atomic_batch_mutate), so much the better. If not, then we should leave it out rather than forking things. We're past the days of being an "index construction kit"; today people come to Cassandra with the expectation of a coherent design that makes choices for them. One of those choices is CQL. (This doesn't mean we can't prototype things out in Thrift in the meantime, but it does mean that you're taking a risk by doing so if the design ends up going in a different direction.)
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903436#comment-13903436 ] Edward Capriolo commented on CASSANDRA-6704:

The implementation looks like this:

{code}
@Override
public void create_filter(FilterDesc desc) throws InvalidRequestException,
        UnavailableException, TimedOutException, TException
{
    ClientState cState = state();
    NitDesc.NitSpec spec = NitSpec.valueOf(desc.spec);
    if (!DatabaseDescriptor.getDynamicLoading().contains(spec)) {
        throw new InvalidRequestException(spec + " is not in allowed list " +
                DatabaseDescriptor.getDynamicLoading());
    }
{code}
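The allow-list check above can be sketched as follows. DynamicLoadingGate is a hypothetical stand-in for the DatabaseDescriptor/Thrift plumbing, and IllegalArgumentException stands in for InvalidRequestException; only the enum constant names come from the patch.

```java
import java.util.EnumSet;
import java.util.Set;

// The requested spec string is parsed into an enum constant and rejected
// unless the operator enabled that loading mechanism in configuration.
public class DynamicLoadingGate {
    public enum NitSpec { JAVA_LOCAL_CLASSPATH, GROOVY_CLASS_LOADER, CLOJURE_CLOSURE }

    private final Set<NitSpec> allowed;

    public DynamicLoadingGate(Set<NitSpec> allowed) { this.allowed = allowed; }

    // Throws (as the real code throws InvalidRequestException) when the
    // spec is not on the configured allow list.
    public void check(String spec) {
        NitSpec parsed = NitSpec.valueOf(spec);
        if (!allowed.contains(parsed))
            throw new IllegalArgumentException(parsed + " is not in allowed list " + allowed);
    }

    public static void main(String[] args) {
        DynamicLoadingGate gate = new DynamicLoadingGate(
                EnumSet.of(NitSpec.JAVA_LOCAL_CLASSPATH, NitSpec.GROOVY_CLASS_LOADER));
        gate.check("GROOVY_CLASS_LOADER");    // enabled in yaml: accepted
        try {
            gate.check("CLOJURE_CLOSURE");    // not enabled: rejected
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
    }
}
```

Using an enum rather than raw strings means an unknown spec fails fast at valueOf, before the allow-list check runs.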
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903434#comment-13903434 ] Edward Capriolo commented on CASSANDRA-6704: The latest commit supports a yaml parameter:
{code}
dynamic_loading:
    - JAVA_LOCAL_CLASSPATH
    - GROOVY_CLASS_LOADER
{code}
You can use this to disable any dynamic loading.
{code}
@Test
public void testIllegalDynamic() throws InvalidRequestException,
        UnavailableException, TimedOutException, TException
{
    FilterDesc d = new FilterDesc();
    d.setSpec("CLOJURE_CLOSURE");
    d.setName("limit9");
    boolean noClojure = false;
    try {
        server.create_filter(d);
    } catch (InvalidRequestException ex) {
        noClojure = true;
    }
    Assert.assertTrue(noClojure); // the disallowed spec must be rejected
}
{code}
JAVA_LOCAL_CLASSPATH means "allow the mechanism to load using Class.forName".
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902781#comment-13902781 ] Tupshin Harper commented on CASSANDRA-6704: --- And I should note that I meant that it should get an iterator of the result set, and that that iterator should be interruptible. I'm quite open to other mechanisms, though.
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902580#comment-13902580 ] Tupshin Harper commented on CASSANDRA-6704: --- CASSANDRA-4949 addresses trigger reloading. I would assume it would apply to this as well. I share your need to be able to hot-patch/swap these.
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902572#comment-13902572 ] Edward Capriolo commented on CASSANDRA-6704: {quote} One to act on the query itself before it is executed, and another to act on the result set of any query. {quote} In many cases it is not enough to act on the result set of the query. Scanners return a status that is interpreted by the framework, allowing the processing to continue or not. For example, imagine a very wide row where my goal is to search until I find a column that is even. I cannot materialize the result set first and then trim it down; that would likely OOM. That, however, is a totally valid use case. Intravert calls that a filter: https://github.com/zznate/intravert-ug/wiki/Filter-mode. This could be implemented easily enough by allowing a SlicePredicate to supply an optional FilterFunction, although that has one weird issue: if the filter leaves out the last row, how do you know what the last row filtered was? {quote}The main thing that would be sacrificed, with respect to this ticket, would be embedded groovy in select statements, as I believe this is the most controversial aspect.{quote} Think about this: you create a function and load it onto 30 servers. It is found to have a bug. What do you do? Bob wants to create a new function??? Let's shut down the entire cluster. Schedule an outage to rolling-restart every server? Without being able to load and unload, it is just a toy; no one can really use it in production in any meaningful way. Once you put the proper cap on it and disallow the features for those that fear it, the problem is solved.
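The early-termination contract described in the comment above can be modeled independently of the thrift types. This is an illustrative sketch, not the patch's code: the scanner feeds columns to the filter one at a time and stops as soon as the filter reports FILTER_DONE, so the full row is never materialized. All names here are hypothetical stand-ins for the patch's SFilter/FilterReturn types.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ScanModel {
    enum FilterReturn { FILTER_MORE, FILTER_DONE }

    /** Illustrative stand-in for the patch's SFilter contract. */
    interface Filter<T> {
        FilterReturn filter(T item, List<T> filtered);
    }

    /** Drives the filter over a (potentially huge) column stream, stopping early. */
    static <T> List<T> scan(Iterator<T> columns, Filter<T> f) {
        List<T> out = new ArrayList<>();
        while (columns.hasNext())
            if (f.filter(columns.next(), out) == FilterReturn.FILTER_DONE)
                break; // early termination: the rest of the row is never touched
        return out;
    }

    public static void main(String[] args) {
        // "Search until I find a column that is even" over a wide row of ints.
        List<Integer> row = List.of(1, 3, 7, 8, 9, 11);
        List<Integer> hit = scan(row.iterator(), (col, filtered) -> {
            if (col % 2 != 0) return FilterReturn.FILTER_MORE;
            filtered.add(col);
            return FilterReturn.FILTER_DONE;
        });
        System.out.println(hit); // [8]
    }
}
```

Because the loop breaks at the first FILTER_DONE, memory use is bounded by what the filter chooses to accumulate, not by the row width; a post-hoc hook over a fully materialized result set cannot make that guarantee.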
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902558#comment-13902558 ] Tupshin Harper commented on CASSANDRA-6704: --- Given the dissension over this issue, and given my shared interest in many of the objectives of this ticket (over and above the overlap with CASSANDRA-6167), I'd like to propose an alternative way forward. What if we were to create an interface exactly analogous to triggers that would have two hooks (instead of the single one for triggers): one to act on the query itself before it is executed, and another to act on the result set of any query. The result would be jar deployment of a SELECT equivalent of triggers, and it would have all the same pros and caveats as triggers. Admin deployment and table-level permissions to use them would be the same. The main thing that would be sacrificed, with respect to this ticket, would be embedded groovy in select statements, as I believe this is the most controversial aspect. But it would provide a mechanism around which to discuss the possibility of embedded turing-complete scripting in CQL in the future. This would appear to provide Ed the necessary hooks to achieve most of his goals by automating groovy->jar deployment outside of core cassandra code.
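The two-hook interface proposed above can be sketched concretely. All names below are invented for illustration; nothing here is from an actual Cassandra patch:

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class SelectTriggerSketch {
    /** Illustrative "select trigger": one hook on the query, one on its result set. */
    interface QueryInterceptor<Q, R> {
        Q beforeQuery(Q query);            // rewrite or veto the query before execution
        List<R> onResultSet(List<R> rows); // post-process what came back
    }

    /** Wires an interceptor around an opaque query engine. */
    static <Q, R> List<R> execute(Q query, Function<Q, List<R>> engine,
                                  QueryInterceptor<Q, R> hook) {
        return hook.onResultSet(engine.apply(hook.beforeQuery(query)));
    }

    public static void main(String[] args) {
        QueryInterceptor<String, String> upcase = new QueryInterceptor<>() {
            public String beforeQuery(String q) { return q.trim(); }
            public List<String> onResultSet(List<String> rows) {
                return rows.stream().map(String::toUpperCase).collect(Collectors.toList());
            }
        };
        // A toy "engine" that splits the query string into rows.
        List<String> out = execute("  a b c ", q -> List.of(q.split(" ")), upcase);
        System.out.println(out); // [A, B, C]
    }
}
```

Like triggers, implementations of such an interface would be compiled ahead of time and deployed as jars by an admin, which is exactly the trade-off the thread is debating: the onResultSet hook alone cannot terminate a scan early, which is the gap the scanner status codes address.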
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902475#comment-13902475 ] Edward Capriolo commented on CASSANDRA-6704: For reference: Hive allows users to add jar files to the class path and then load user-defined functions from them. Hive has similar issues with "unfortunately" scoped static variables. In practice the pluggability is a huge win, regardless of some subtle risks. Recently I added the capability to create groovy udfs from the hive shell using a similar technique: https://issues.apache.org/jira/browse/HIVE-5250. We also provided a way for admins to allow and disallow this code: https://issues.apache.org/jira/browse/HIVE-5400. The net result is a huge win. Before, we had users that did not have the time to set up a development environment (get jars, build jars, email jars to admins, push jars to cluster). It was not "security"; it was just lots of hoop jumping. It kept people from answers they wanted. When done correctly this is transformative and powerful. Users have new avenues to access their data. Sure, you give someone some rope and they can hang themselves with an infinite loop or whatever; that is the downside. Not saying it is perfect.
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902467#comment-13902467 ] Edward Capriolo commented on CASSANDRA-6704: {quote} These security holes should NOT be the default.{quote} Just to be clear: a system admin deleting the groovy-all.jar is not secure, but a system admin adding a my-trigger.jar is? Let's not fight; I agree. I will add a security role to cassandra that will allow someone to reject loading groovy for this patch.
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902466#comment-13902466 ] Benedict commented on CASSANDRA-6704: - bq. Most people just have hudson pushing jars they care about and restarting. Let's not make this awful security hole for them. Let's let them make it for themselves.
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902465#comment-13902465 ] Edward Capriolo commented on CASSANDRA-6704: Also, just for reference, having a sysadmin deployment blocker as a security system is not actual security. Are sysadmins going to code-review the jars and make sure the jar has no exploits? What if people are coming up with 19 triggers a day? The sysadmin will just automate the trigger deployment. In today's rapid-deploy/continuous-development world, security by sysadmin slowdown is not actually security. Most people just have hudson pushing jars they care about and restarting.
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902464#comment-13902464 ] Benedict commented on CASSANDRA-6704: - bq. Or sysadmin delete groovy.jar from classpath. Problem solved. This is equivalent to sysadmin dropping *in* the groovy.jar. Problem solved. These security holes should NOT be the default.
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902462#comment-13902462 ] Edward Capriolo commented on CASSANDRA-6704: Here are some security points.
1) There are three thrift endpoints:
* compiling endpoint
* scanner_start
* scanner_next
This feature still works if the compiling endpoint is removed. Users will only be able to access SFilter instances statically compiled in java and included on the classpath, like triggers.
2) Users can remove the groovy.jar from the classpath. You can't compile groovy without the groovy jar.
3) The internal bit. Guess what? You really can not protect the world from itself. https://dev.mysql.com/doc/refman/5.0/en/udf-compiling.html Do mysql's pluggable UDFs increase the attack vector? Is that how people attack mysql generally?
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902463#comment-13902463 ] Edward Capriolo commented on CASSANDRA-6704: {quote} As previously discussed, these can only be deployed by sysadmins, so are an acceptable security risk. Sysadmins can already do as much damage as they like, so this isn't an exploit.{quote} Then we add a config variable, or a role, that disables groovy compiling. Or the sysadmin can delete groovy.jar from the classpath. Problem solved.
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902461#comment-13902461 ] Benedict commented on CASSANDRA-6704: - bq. Do triggers have the same issue? As previously discussed, these can *only* be deployed by sysadmins, so are an acceptable security risk. Sysadmins can already do as much damage as they like, so this isn't an exploit.
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902458#comment-13902458 ] Edward Capriolo commented on CASSANDRA-6704: {quote} This is exactly what I'm referring to. But even without the exposed static functionality there would be basically the same problem because of reflection, which without a SecurityManager would allow the user to access almost anything they desired anyway. Running a SecurityManager is a major headache that's best avoided, but would be pretty much essential to avoid the user accessing Unsafe and doing really dangerous things. I'm not sure if a SecurityManager is actually possible with the current state, as we have jars that share package namespaces (e.g. disruptor thrift server) which I'm pretty sure are expressly forbidden. We'd probably also need to start signing jars for this to work, which then makes development and debugging a PITA. Note that without this we'd not only be permitting clients to compromise C*, but the node it is running on as well. {quote} Do triggers have the same issue?
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902457#comment-13902457 ] Benedict commented on CASSANDRA-6704: - bq. Now unfortunately the Cassandra server is packed with globally reachable static objects This is exactly what I'm referring to. But even without the exposed static functionality there would be basically the same problem because of reflection, which without a SecurityManager would allow the user to access almost anything they desired anyway. Running a SecurityManager is a major headache that's best avoided, but would be pretty much essential to avoid the user accessing Unsafe and doing *really* dangerous things. I'm not sure if a SecurityManager is actually possible with the current state, as we have jars that share package namespaces (e.g. disruptor thrift server) which I'm pretty sure are expressly forbidden. We'd probably also need to start signing jars for this to work, which then makes development and debugging a PITA. Note that without this we'd not only be permitting clients to compromise C*, but the node it is running on as well. These things all need to be carefully assessed/discussed.
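The reflection point is easy to demonstrate: without a SecurityManager installed, any dynamically loaded filter code can pry open private state via reflection (and from there reach things like Unsafe). A minimal self-contained sketch — `ReflectionDemo` and its `SECRET` field are illustrative, not Cassandra classes:

```java
import java.lang.reflect.Field;

public class ReflectionDemo {
    // Stand-in for "internal" server state that filter code should not see.
    private static final String SECRET = "internal state";

    public static void main(String[] args) throws Exception {
        // With no SecurityManager, setAccessible succeeds and private
        // fields are readable from anywhere.
        Field f = ReflectionDemo.class.getDeclaredField("SECRET");
        f.setAccessible(true);
        System.out.println(f.get(null)); // prints "internal state"
    }
}
```

A SecurityManager with a suitable policy would make `setAccessible` throw instead, which is exactly the (painful) mitigation being discussed.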
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902453#comment-13902453 ] Edward Capriolo commented on CASSANDRA-6704: {quote}Possibly, although "create function" rights in normal databases do not offer the client the ability to access product internals. This is an unexpected and dangerous behaviour, one that not everyone is going to get behind, and waving it away because it does not concern you does not mean it is not a valid concern.{quote} In the current code the SFilter implementer has the ability to: 1) touch the row moving through the filter in flight; 2) modify the list of rows that will be sent back to the client. We can be extra safe with try/catch and duplicating the byte buffers, thus affecting performance. Now unfortunately the Cassandra server is packed with globally reachable static objects. Maybe this is what you are talking about as "internals". I do not have a solution there. On one hand it is nice to be able to get at things like metadata; on the other hand it just sucks that there is static stuff floating around everywhere. {quote}I mostly agree with you, although not everyone will. However I don't think rushing into one person's ideal solution is the way forward for a large project like Cassandra.{quote} I'm not saying we should rush forward either. But then again, once the feature works and is tested, I am not going to advocate slowing down either.
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902444#comment-13902444 ] Benedict commented on CASSANDRA-6704: - bq. Saying that users will be confused by two ways to do something is not a concern. If you could argue that some of those other features would be done in 1 week or 1 month maybe, but in reality they look very far off. You contradict yourself here. Either it is a valid concern or not; the timeliness of the conflict is irrelevant. Especially since features do not become widespread until months after release, so 1 month is a very short time horizon. Try 1-2yrs for a reasonable *minimum* distance between features if you want to release approaches that conflict, IMO. bq. Suggesting my code go into a fork because it may be redundant with some undone future work by someone else is just plain silly. Calling the concerns of several other people "plain silly" doesn't seem fair. You may not be concerned, but clearly several others are. They are not all being "plain silly" - we're not all out to stamp on your dreams. bq. Sandboxing should be achieved by roles, like a normal database; new features should be added like 'GRANT CREATE FUNCTION'. Making a feature hard to use is not security. Possibly, although "create function" rights in normal databases do not offer the client the ability to access product internals. This is an unexpected and dangerous behaviour, one that not everyone is going to get behind, and waving it away because it does not concern you does not mean it is not a valid concern. bq. Cassandra has a large deficit here I mostly agree with you, although not everyone will. However I don't think rushing into one person's ideal solution is the way forward for a large project like Cassandra.
Let's take some time to reach consensus on how to address this, if we want to (others I'm sure think keeping the compute side separate from Cassandra is the right way forward, and we shouldn't ride roughshod over their position without addressing any concerns). Personally, I would like to see tight integration with a scripting language at some point in Cassandra. But I think that integration needs to be carefully considered, and not rushed into. Not at the official release level, anyway.
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902440#comment-13902440 ] Edward Capriolo commented on CASSANDRA-6704: {quote} They are also expressing concern that it will create a polluted vision of the future of C*.{quote} That is an invalid concern. You are saying that my working code is polluting some future vision of Cassandra. All I see from CASSANDRA-6167 and friends is tickets with some talk, no action, and few followers. Suggesting my code go into a fork because it may be redundant with some undone future work by someone else is just plain silly. Saying that users will be confused by two ways to do something is not a concern. If you could argue that some of those other features would be done in 1 week or 1 month, maybe; but in reality they look very far off.
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902438#comment-13902438 ] Edward Capriolo commented on CASSANDRA-6704: {quote} This is very different. You're equating admins setting up the system with users querying it, which are not the same. In your system, it may be, as you may control all access paths to the database. But this is not the common case, and we should not assume it is. Sandboxing seems absolutely essential, or it needs to be disabled by default, in which case why not just have them drop in an extra jar? {quote} Sandboxing should be achieved by roles, like a normal database; new features should be added like 'GRANT CREATE FUNCTION'. Making a feature hard to use is not security.
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902432#comment-13902432 ] Edward Capriolo commented on CASSANDRA-6704: To describe this feature another way: MongoDB has in-database MapReduce (JavaScript queries that can be supplied at runtime). Cassandra has a large deficit here. We have CQL, which only tackles some problems with limited semantics, and we have Hadoop MapReduce, which only tackles really big problems. This ticket gives us a solution now, to tackle problems that CQL can't. If the average Mongo user can tackle Mongo map reduce, the average Cassandra user should be able to write a function, IMHO.
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902430#comment-13902430 ] Benedict commented on CASSANDRA-6704:

bq. You started the answer with "sorta". You are still allowing a user to put code in the execution path. It is the exact same problem, if you let someone compile dynamic code or you let the admin put the jar in a folder. You give someone the potential to break something. All dynamic compiling does is make the result faster to break and faster to fix.

This is very different. You're equating admins setting up the system with users querying it, which are not the same. In your system, it may be, as you may control all access paths to the database. But this is not the common case, and we should not assume it is. Sandboxing seems absolutely essential, or it needs to be disabled by default, in which case why not just have them drop in an extra jar?

bq. Please do not imply that this feature is not coherent, or bad which has been done several times already. This is a good feature.

I am not suggesting it is an incoherent or bad feature in isolation, and I don't think anybody is. When I say coherent, I mean how it fits in with the overall progress and development of the project, and how users interact with the database. This is a pretty left-field introduction that doesn't fit cleanly with anything we have currently. I do not intend to give the impression I am judging the feature itself negatively; in fact I think it's pretty neat. I just think whether neatness is enough is up for discussion.

bq. I am not asking other developers who make features with no votes to put changes and in forks you should not ask me to do the same.

I am sorry, but I don't follow this? The other developers here are expressing concern that this new feature will place a future burden on them that you will not be able to alleviate.
I think, given the nature and scope of the change, that is a reasonable concern that should not be brushed over lightly. It is not that anybody is singling out "your changes" - this concern would hold regardless of the person suggesting it and, frankly, were it not for your position in the community there probably would not have been anywhere near this level of serious engagement with the discussion. They are also expressing concern that it will create a polluted vision of the future of C*, with multiple conflicting ways to achieve something, both of which are nontrivial to understand, creating further demands on them and the wider community in trying to explain all of these features to newcomers, with the confusion potentially further exacerbating the negative perception of Cassandra's ease of use.
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902424#comment-13902424 ] Edward Capriolo commented on CASSANDRA-6704:

{quote}Sort of, but there are some important differences: 1) as Brandon says, the code is clearly vetted by the database dev team deploying triggers, which can't be said here; and 2) we're all Java experts here, and the execution context is the normal execution context of Cassandra, which again we're all familiar with. Helping users with issues from dynamic class compilation / loading of languages we don't understand is quite a different matter IMO, especially once sandboxing is introduced (which really would be essential as C*'s internal APIs are not safe to be accessed, nor protected, and could be used dangerously). It's not clear to me this will be pain free from our side to ensure it always works, either. Also, with triggers we can more easily justify API breakages across minor/major versions that require some work when upgrading, as they're well contained within their Cassandra deployment, however if we expose internal APIs to client code we will necessarily see more pushback on rapid development of these APIs, as the difficulty for users to migrate will be increased.{quote}

You started the answer with "sorta". You are still allowing a user to put code in the execution path. It is the exact same problem whether you let someone compile dynamic code or you let the admin put the jar in a folder. You give someone the potential to break something. All dynamic compiling does is make the result faster to break and faster to fix.

{quote}I think it's a pretty nice underlying goal, but it's a really heavyweight feature that needs to be approached cautiously, and as Sylvain says, preferably coherently.{quote}

Please do not imply that this feature is not coherent, or bad, which has been done several times already. This is a good feature.
{quote}I do wonder if it mightn't be possible to offer this as an easy to apply patch in the meantime, outside of the main Apache repository.{quote}

A nice compromise, but that is not the Apache way. This option is forcing me into a fork. This is a non-breaking change. It is a new feature. Cassandra is an open source project. I am a user. I want a feature. Someone else on the thread says:

{quote}IMO, harnessing the invokeDynamic stuff in the JVM thusly could have some compelling applications for us.{quote}

I am not asking other developers who make features with no votes to put their changes in forks; you should not ask me to do the same.
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902412#comment-13902412 ] Nate McCall commented on CASSANDRA-6704:

Thanks, Benedict - you make some very valid points regarding inclusion of a full language at runtime.

> I do wonder if it mightn't be possible to offer this as an easy to apply patch in the meantime, outside of the main Apache repository.

What about trying this via extending cassandra.thrift and CassandraServer? I think this approach has been done before somewhere :) This would keep it isolated in a single jar so the community could horse around with it. Thoughts?
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902405#comment-13902405 ] Benedict commented on CASSANDRA-6704:

bq. Essentially the same thing, user code running inside cassandra.

Sort of, but there are some important differences: 1) as Brandon says, the code is clearly vetted by the database dev team deploying triggers, which can't be said here; and 2) we're all Java experts here, and the execution context is the normal execution context of Cassandra, which again we're all familiar with. Helping users with issues from dynamic class compilation / loading of languages we don't understand is quite a different matter IMO, especially once sandboxing is introduced (which really would be essential as C*'s internal APIs are *not* safe to be accessed, nor protected, and could be used dangerously). It's not clear to me this will be pain free from our side to ensure it always works, either. Also, with triggers we can more easily justify API breakages across minor/major versions that require some work when upgrading, as they're well contained within their Cassandra deployment; however, if we expose internal APIs to client code we will necessarily see more pushback on rapid development of these APIs, as the difficulty for users to migrate will be increased.

bq. The language that you chose to implement the filter with is your call.

This only seems to make my issue (1) worse, to my eyes.

bq. think about all the cql iterations like cql2 , execute_cql, execute_cql_3. Set keyspace set _consistency level.

Well, these things are all still present. We may retire CQL2 soon, but that has the advantage of having very quickly been superseded by CQL3, which to my knowledge does not have dramatically different syntax anyway - and yet it still has stuck around.
It's not yet clear what this would be superseded by, or whether the functionality would map easily, and maintaining a deprecated access method doesn't reduce the support burden. I think it's a pretty nice underlying goal, but it's a really heavyweight feature that needs to be approached cautiously and, as Sylvain says, preferably coherently. I do wonder if it mightn't be possible to offer this as an easy-to-apply patch in the meantime, outside of the main Apache repository. There are definitely some users that would be happy with the security risks and would love this to play with, but those people are power users who would be comfortable applying a simple patch to their C* instance, and would not contribute excessively to the support burden as they'd be competent enough to figure out any issues they have. Just my 2c, anyway.
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902317#comment-13902317 ] Edward Capriolo commented on CASSANDRA-6704:

github updated. create_scanner and next_scanner are implemented. Using this interface to sum the columns:

{code}
@Test
public void test_summer() throws Exception
{
    ColumnParent cp = new ColumnParent();
    cp.setColumn_family("Standard1");
    ByteBuffer key = ByteBuffer.wrap("rscannerkey".getBytes());
    for (int i = 0; i < 10; i++) {
        Column c1 = new Column();
        c1.setName((i + "").getBytes());
        c1.setValue(new byte[0]);
        c1.setTimestamp(System.nanoTime());
        server.insert(key, cp, c1, ConsistencyLevel.ONE);
    }
    FilterDesc d = new FilterDesc();
    d.setSpec("JAVA_LOCAL_CLASSPATH");
    d.setName("org.apache.cassandra.thrift.FilterSum");
    d.setCode("");
    server.create_filter(d);
    ScannerResult res = server.create_scanner("Standard1",
        "org.apache.cassandra.thrift.FilterSum", key,
        ByteBuffer.wrap("1".getBytes()), 10);
    Assert.assertEquals("45", ByteBufferUtil.string(res.results.get(0).column.name));
    Assert.assertEquals(FilterReturn.FILTER_DONE, res.getFilter_status());
}
{code}
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902249#comment-13902249 ] Brandon Williams commented on CASSANDRA-6704:

bq. look at triggers. Essentially the same thing, user code running inside cassandra

Not quite; there is at least one very large difference here, and that is from the perspective of security. If someone wants to run a custom trigger, their code can be audited and vetted by the persons responsible for making sure it doesn't access something it shouldn't or do anything malicious before they implement it on the cluster. If you introduce a Turing-complete language that the user can remotely invoke on the cluster, it becomes *our job* to sandbox this correctly.
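The sandboxing problem Brandon raises can be made concrete. One minimal (and deliberately incomplete) building block is a whitelist class loader that refuses to resolve anything outside an approved set of packages before user-supplied filter code can touch it. The class below is a hypothetical sketch, not code from the patch; a real sandbox would also need to police reflection, I/O, and resource exhaustion, which a class loader alone cannot do:

```java
// Hypothetical sketch: a whitelist class loader for user-supplied filters.
// It demonstrates the delegation-time check only; it is NOT a complete sandbox.
public class RestrictedLoader extends ClassLoader {
    private final String[] allowedPrefixes;

    public RestrictedLoader(ClassLoader parent, String... allowedPrefixes) {
        super(parent);
        this.allowedPrefixes = allowedPrefixes;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        for (String prefix : allowedPrefixes) {
            if (name.startsWith(prefix)) {
                // On the whitelist: delegate to the parent as usual.
                return super.loadClass(name, resolve);
            }
        }
        // Off the whitelist: refuse before any user code can reach the class.
        throw new SecurityException("class not permitted in filter sandbox: " + name);
    }
}
```

With this loader, a filter compiled against `java.lang.*` loads normally, while an attempt to pull in, say, `java.io.File` fails at class-resolution time rather than at audit time.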
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902235#comment-13902235 ] Edward Capriolo commented on CASSANDRA-6704:

1) Your point is well taken, but look at triggers. Essentially the same thing, user code running inside Cassandra. This is kinda the opposite: code on the read path, not the write path.

2) The nit_compiler can compile javascript, groovy, and clojure out of the box. You are just implementing a method. The language that you choose to implement the filter with is your call.

3) We can always roll back; think about all the CQL iterations like cql2, execute_cql, execute_cql_3, set_keyspace, set consistency level. These things figure themselves out.
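The point that the result of dynamic compilation is just an ordinary loaded class can be illustrated without Groovy at all: the JDK's own javax.tools.JavaCompiler will compile a source string at runtime and hand back a class, which is the same mechanism a GROOVY_CLASS_LOADER spec would apply to Groovy source shipped over thrift. This is an illustrative sketch, not code from the branch (note it requires a full JDK; ToolProvider returns null on a bare JRE):

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DynamicCompile {
    // Compiles `source` (which must declare `className`) into a temp
    // directory, then loads the resulting class with a throwaway loader.
    public static Class<?> compile(String className, String source) throws Exception {
        Path dir = Files.createTempDirectory("dynfilter");
        Path file = dir.resolve(className + ".java");
        Files.write(file, source.getBytes(StandardCharsets.UTF_8));

        JavaCompiler javac = ToolProvider.getSystemJavaCompiler(); // null on a JRE-only install
        int rc = javac.run(null, null, null, "-d", dir.toString(), file.toString());
        if (rc != 0) {
            throw new IllegalStateException("compilation failed: " + className);
        }
        URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() });
        return Class.forName(className, true, loader);
    }

    public static void main(String[] args) throws Exception {
        // A stand-in for a user-supplied filter class arriving as a string.
        Class<?> cls = compile("Hello",
            "public class Hello { public int answer() { return 42; } }");
        Object instance = cls.getDeclaredConstructor().newInstance();
        System.out.println(cls.getMethod("answer").invoke(instance)); // prints 42
    }
}
```

Once loaded, the class is indistinguishable from one shipped in a jar, which is exactly the "faster to break and faster to fix" trade-off being debated: the mechanism is the same, only the provenance and vetting differ.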
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902200#comment-13902200 ] Benedict commented on CASSANDRA-6704:

I think the two issues that aren't being addressed effectively here are:

1) The support burden of introducing a whole new (*Turing-complete*) language into the database; and (if we decide this is acceptable)
2) What language would be suitable?

Both are very difficult questions, and to make assumptions about either is dangerous, as there is no stepping back from the decision once it's released. Some users will rely on it, and it will have to be maintained. Guaranteeing those hours of support burden is difficult, and not something easily committed to (or convincingly, given there is no mechanism by which anybody can require somebody contribute that support).

As to 1 (ignoring 2): any Turing-complete language is going to have interesting and unexpected interactions with Cassandra once let loose upon the world. To assume that the support burden will be low is very optimistic: naturally we will take on some support burden of users of the language *itself*, as users do not understand where they are making the mistake, be it in their interaction with Cassandra or the language. But also there will be (probably many) unintended edge cases, ones we don't expect and cannot predict because we do not fully understand both sides of the equation, and even if we did, the combination of the two is frankly impossible to model in our heads. These edge cases will change and continually present themselves with each version and new feature in Cassandra, and in the language itself.

2) The choice of Groovy itself is also likely to be a strong point of contention. It may be quick to put in place, but I disagree with your assertion that it is intuitive.
I find it powerful in some situations, but it has some very strange scoping behaviours, and I found myself quite unproductive with it for at least the first day, which is a pretty poor track record given my Java background and how straightforward it should ostensibly be. I don't want to be on the other end of the user confusion, frankly. Also, it does not have a strong backing in general; not weak, but not incredibly widely used. And further, it seems to me a slightly lazily put-together language: useful features, expressive, but not coherently designed with a clearly defined goal, purpose, or specification. This is only my impression of it. The point being it is a point of contention, and not easily brushed under the carpet. So, as far as I can see, even *if* we decide that (1) is acceptable and we want to include a Turing-complete language - or any language other than CQL - and that we are confident we can safely support it, we still need to collectively address (2) carefully, given that we cannot roll back the decision.
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902102#comment-13902102 ] Nate McCall commented on CASSANDRA-6704:

Collective deep breath... To step back for a bit, what we have here is an interesting feature developed with minimal intrusion into the plumbing. To encapsulate this a bit better, what if the non-thrift parts - the filter marshaling and execution - were pushed into a new StorageProxy method? Then we can all go hack on our transports of choice. I seem to recall us taking this approach with other features in the past.
-- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901803#comment-13901803 ] Edward Capriolo commented on CASSANDRA-6704: {quote} I'm sorry but I disagree. When you open a ticket on this JIRA, you're not really in "scratching my own itch in my own backyard" territory anymore, you're saying "I'm suggesting this for the Cassandra project".{quote} Apache has countless documents and guides on this. There is no official language against developing new thrift features. In fact, you pointed out CAS, which was just added to thrift. The only words I have heard are that "thrift support" is "not going anywhere". Now you're trying to bend this interpretation to mean "thrift can only have features that CQL has" or "thrift can only have features if I feel like supporting them". If this is how Cassandra is going to be run, then close the ticket. I'm out. I'm done. Seriously.
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901737#comment-13901737 ] Sylvain Lebresne commented on CASSANDRA-6704: - bq. Which is why I should be able to scratch my own itch. I'm sorry but I disagree. When you open a ticket on this JIRA, you're not really in "scratching my own itch in my own backyard" territory anymore, you're saying "I'm suggesting this for the Cassandra project". And the Cassandra project is about more than just everyone scratching their own itches in isolation, because that's a crappy way to develop software: we're trying to build a coherent piece of software. Don't get me wrong, itching can be a good motivation and the start of new ideas, but itching doesn't give you an inherent right to get something committed. bq. Also no one ever said this has to be a thrift only feature. I just chose to build the POC in thrift because this was easier for me. Fair enough, but what I'm saying is "CQL is the Cassandra API moving forward" (that's the direction the project has been following for more than a year now), and so adding something to CQL and optionally to thrift (the legacy API) if that's trivial and relatively maintenance free is fine, but adding something to thrift and "maybe later to CQL but it's unclear how" is kind of not ok, since it goes in the opposite direction of the project direction. And well, so far, all you've provided us is a thrift-only POC and asked for criticism on the design. So I'm saying that, as is, that's kind of not really ok. If you have something to suggest that is a good fit for CQL and just started with Thrift out of familiarity with it, then please do go on, but since CQL is the important part as far as the Cassandra project is concerned, I'll reserve judgement until the important part is here to see.
If we then need less than 200 lines of additional code on top of that hypothetical solution to support Thrift too, then why not, I probably won't object to that.
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901555#comment-13901555 ] Edward Capriolo commented on CASSANDRA-6704: {quote}This ticket seems non trivial and thrift-only by design and so, for the reason I just expressed,{quote} Also no one ever said this has to be a thrift-only feature. I just chose to build the POC in thrift because this was easier for me. I have some unique perspective on this from working with Hive for a long time. Hive has UDFs, UDTFs, UDAFs, and windowing functions. The work to build the query language and planner to support all these constructs is very complex.
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901548#comment-13901548 ] Edward Capriolo commented on CASSANDRA-6704: {quote}Didn't Sylvain just cover this? Development resources are never infinite.{quote} Which is why I should be able to scratch my own itch. He also said: {quote} and we are even fine exposing some new features through it when that requires very little maintenance effort (CAS for instance).{quote} Thus far this patch is less than 200 lines of code.
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901490#comment-13901490 ] Jonathan Ellis commented on CASSANDRA-6704: --- bq. If other approaches are better clearly they would have been implemented by now Didn't Sylvain just cover this? Development resources are never infinite.
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901470#comment-13901470 ] Edward Capriolo commented on CASSANDRA-6704: The argument that blocking CQL queries that do aggregations are the same thing as scanners is wrong. The argument that custom UDFs are the same thing as scanners is also wrong. Scanners will give us the ability to do a special kind of client/server-side paging that is not possible in a blocking query language. Additionally, it gives a simple way to address all these open tickets, which speaks volumes for this implementation. The 2006 scanner approach is a valid one. It is a good technical solution, and having that solution can only be a win. If other approaches are better, clearly they would have been implemented by now.
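The "special paging" point above refers to resuming a scan from a start column, as the final argument of create_scanner does in the snippet. A self-contained sketch of that cursor-style resumption follows; the page method, the String-based columns, and the zero-byte cursor trick are hypothetical stand-ins used for illustration, not the patch's actual types.

```java
import java.util.ArrayList;
import java.util.List;

public class PagingSketch {
    // Return up to pageSize columns at or after startColumn, in sorted order.
    // The caller resumes by passing a cursor derived from the last column it
    // received, so a scan over a wide row never re-reads earlier columns.
    static List<String> page(List<String> sortedRow, String startColumn, int pageSize) {
        List<String> out = new ArrayList<>();
        for (String col : sortedRow) {
            if (col.compareTo(startColumn) >= 0) {
                out.add(col);
                if (out.size() == pageSize) break;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> row = List.of("a", "b", "c", "d", "e", "f");
        List<String> first = page(row, "a", 3);            // [a, b, c]
        String cursor = first.get(first.size() - 1);
        // Resume strictly after the cursor by appending a zero byte,
        // the smallest possible suffix in lexicographic order.
        List<String> second = page(row, cursor + "\0", 3); // [d, e, f]
        System.out.println(first + " " + second);
    }
}
```

In a blocking query model the second call would be a brand-new query; a scanner can instead hold its position server-side and hand back pages on demand.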
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901463#comment-13901463 ] Sylvain Lebresne commented on CASSANDRA-6704: - bq. Everything CQL is right, and everything else is wrong? I don't think that's really what people mean here. I believe the concern (maybe I should say "my" concern, I'm really speaking in my own name here) is that it would be a bad idea for C* to have 2 APIs (thrift and CQL) that continue to evolve with sets of features that fundamentally do the same thing but have different implementations. In practice, the project doesn't want to maintain 2 APIs: we don't have infinite development resources, and this is confusing for users in the long run. Thrift is the legacy API. We've promised to maintain it in its current state indefinitely (which *is* a non-negligible drain on the project resources btw), and we are even fine exposing some new features through it when that requires very little maintenance effort (CAS for instance), but the C* API moving forward, the one we are developing and not just maintaining, is CQL. This ticket seems non trivial and thrift-only by design, and so, for the reason I just expressed, I do not think that it's a good idea for the C* project, and I agree that we should focus on tickets like CASSANDRA-4914 instead (and granted, no-one has had the time to focus on that yet, but that's really just proving my point that development resources are never infinite. As a side note and for what it's worth, I do intend to make that ticket one of my priorities for 3.0, if no-one else beats me to it of course).
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901436#comment-13901436 ] Jonathan Ellis commented on CASSANDRA-6704: --- Sorry man, "Bigtable had scanners in 2006" is not a very convincing argument for doing the same thing in Cassandra in 2014. Rather the opposite; surely we can do better (as we have done in moving beyond exposing raw partitions/cells).
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901414#comment-13901414 ] Edward Capriolo commented on CASSANDRA-6704: Why are we using the words right and wrong? Everything CQL is right, and everything else is wrong? Prior art: the BigTable white paper says scanners are good, so we should have scanners. They have nothing to do with adding an aggregation or UDF feature to the CQL language.
[jira] [Commented] (CASSANDRA-6704) Create wide row scanners
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901094#comment-13901094 ] Jonathan Ellis commented on CASSANDRA-6704: --- I hear your frustration that 6167 is four months old with little progress, but I think Aleksey is right that a more productive response is "let me help figure out how to do this right," not "let's implement something quick and dirty because that's faster." Actually, 6167 may be a bit of a red herring, because I think CASSANDRA-5970 is even closer to what you want, and I've offered a pretty clear path forward there. (The obvious next step after "built-in filtering functions" would be "user-defined filtering functions.")
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901089#comment-13901089 ] Edward Capriolo commented on CASSANDRA-6704: {quote} This isn't exactly a fair analogy, you should be comparing the complexity of groovy the language against CQL the language, or the complexity of antlr against writing a language for the JVM, but not mixing the two and effectively comparing syntactic complexity on one hand and the machinery necessary to parse it on the other. {quote} As for this issue, Google clearly saw a rationale for making scanners in BigTable when they produced their white paper. CQL is a declarative language, while Scanners and Filters (written in Groovy) are imperative programming. CASSANDRA-6167 is just one tiny case of what you can do with Scanners and Filters, and it is a four-month-old ticket. I'm not against anyone adding whatever to CQL. Go ahead, have fun trying to make a DSL for everything :). Should that stop everyone in the world from adding any cool feature to Thrift, though? I think not.
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901068#comment-13901068 ] Aleksey Yeschenko commented on CASSANDRA-6704: You are welcome to help with those open tickets (:
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901067#comment-13901067 ] Brandon Williams commented on CASSANDRA-6704: bq. You could argue that groovy is complex, but anyone that can write java can write groovy. I can argue CQL is complex, not everyone can write antlr or query parsers. This isn't exactly a fair analogy, you should be comparing the complexity of groovy the language against CQL the language, or the complexity of antlr against writing a language for the JVM, but not mixing the two and effectively comparing syntactic complexity on one hand and the machinery necessary to parse it on the other.
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901064#comment-13901064 ] Edward Capriolo commented on CASSANDRA-6704: I.e., should users wait for months or years until a statement like "SELECT pk,event from cf where pk IN (1,5,10,11) UNTIL PARTITION event {predicate}" is implemented? Agile says iterate fast and don't plan forever. I am a little tired of waiting for scanners and other cool stuff that HBase (and other NoSQL systems) do out of the box.
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901054#comment-13901054 ] Edward Capriolo commented on CASSANDRA-6704: You could argue that Groovy is complex, but anyone who can write Java can write Groovy. I can argue CQL is complex; not everyone can write antlr or query parsers. It took me less than a day to write this interface. About a year ago I was working on intravert (https://github.com/zznate/intravert-ug). You say that it "all maps really neatly into CQL," but in reality it's been a year, and most of these tickets are still just talk.
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901053#comment-13901053 ] Tupshin Harper commented on CASSANDRA-6704: I added a comment to CASSANDRA-6167 explaining how that would be used for read aggregation. Ed is right that there is tremendous overlap in the goals of the two tickets.
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901044#comment-13901044 ] Aleksey Yeschenko commented on CASSANDRA-6704: Which is why I linked to it. With 4914 and 6167 combined (all mapping neatly to CQL), what are the use cases that would justify adding all this complexity (+ groovy), that we can't live without?
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901022#comment-13901022 ] Edward Capriolo commented on CASSANDRA-6704: This example basically accomplishes CASSANDRA-6167. CASSANDRA-4914 is more than this is trying to accomplish (general aggregation). A scanner is a really nice medium between fully blocking "queries" and "map reduce". You could build an aggregation, or even a rolling aggregation, but flow control comes from the client.
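To illustrate the "rolling aggregation with client-side flow control" point, here is a minimal sketch. The `Page`/`sumUntil` shapes are hypothetical, only loosely modeled on `create_scanner`/`ScannerResult` from the snippet; the point is that the client consumes pages and decides when to stop, rather than waiting on a blocking server-side aggregate:

```java
import java.util.List;

public class RollingAggSketch {
    // Hypothetical page of scan results; a real ScannerResult would carry
    // columns plus whatever state is needed to request the next page.
    record Page(List<Integer> values, boolean hasMore) {}

    // Client-driven rolling aggregation: pull pages one at a time and stop
    // as soon as the running total satisfies the client -- flow control
    // stays on the client side.
    static int sumUntil(List<Page> pages, int threshold) {
        int total = 0;
        for (Page p : pages) {
            for (int v : p.values()) total += v;
            if (total >= threshold || !p.hasMore()) break; // client decides to stop
        }
        return total;
    }

    public static void main(String[] args) {
        List<Page> pages = List.of(
            new Page(List.of(1, 2, 3), true),
            new Page(List.of(4, 5, 6), true),
            new Page(List.of(7, 8, 9), false));
        // Stops after the second page (1+2+3+4+5+6 = 21 >= 10),
        // never fetching the third.
        System.out.println(sumUntil(pages, 10)); // prints 21
    }
}
```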
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901016#comment-13901016 ] Aleksey Yeschenko commented on CASSANDRA-6704: + CASSANDRA-6167
[ https://issues.apache.org/jira/browse/CASSANDRA-6704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13901012#comment-13901012 ] Brandon Williams commented on CASSANDRA-6704: Seems similar to CASSANDRA-4914