Re: Node side processing

2014-02-27 Thread Tupshin Harper
Hi David,

Check out the ongoing discussion in
https://issues.apache.org/jira/browse/CASSANDRA-6704 as well as some
related tickets linked to from that one.

No consensus at this point, but I'm personally hoping to see something
along the general lines of Hive's UDFs.

-Tupshin


On Thu, Feb 27, 2014 at 8:50 AM, David Semeria da...@lmframework.com wrote:

 Hi List,

 I was wondering whether there have been any past proposals for
 implementing node side processing (NSP) in C*. By NSP, I mean passing a
 reference to a Java class which would then process the result set before it
 is returned to the client.

 In our particular use case, our clients typically loop through result sets
 of a million or more rows to produce a tiny amount of output (sums, means,
 variance, etc.). The bottleneck -- quite obviously -- is the need to
 transfer a million rows to the client before processing can take place. It
 would be extremely useful to execute this processing on the coordinator
 node and only transfer the results to the client.

 I mention this here because I can imagine other C* users having similar
 requirements.

 Thanks

 D.
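
As a rough illustration of the use case described in the message above, here is a minimal sketch of the kind of single-pass reduction (sum, mean, variance) that could run next to the data so that only a few numbers travel to the client. Everything in it is an assumption made for illustration: the class name `RunningStats` and its methods are hypothetical, it is plain Java with no Cassandra API involved, and it uses Welford's online algorithm for the variance, which the thread itself does not mention.

```java
// Hypothetical sketch: a one-pass accumulator for sum, mean and variance.
// Not part of Cassandra or any proposal in the linked tickets.
final class RunningStats {
    private long count = 0;
    private double sum = 0.0;
    private double mean = 0.0;
    private double m2 = 0.0; // running sum of squared deviations from the mean

    // Fold one value into the running totals (Welford's online update).
    void accept(double x) {
        count++;
        sum += x;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    long count() { return count; }

    double sum() { return sum; }

    // Mean of the values seen so far; NaN if no values were added.
    double mean() { return count > 0 ? mean : Double.NaN; }

    // Population variance (divides by n); NaN if no values were added.
    double populationVariance() { return count > 0 ? m2 / count : Double.NaN; }

    public static void main(String[] args) {
        // Stand-in for a large result set; a real one would be streamed row by row.
        double[] rows = {2, 4, 4, 4, 5, 5, 7, 9};
        RunningStats stats = new RunningStats();
        for (double v : rows) {
            stats.accept(v);
        }
        System.out.println("count=" + stats.count()
                + " sum=" + stats.sum()
                + " mean=" + stats.mean()
                + " variance=" + stats.populationVariance());
    }
}
```

The accumulator keeps only four fields regardless of how many rows it sees, which is the property that makes this sort of processing attractive to run on the coordinator: the million rows stay on the server side and the client receives a handful of values.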



Re: Node side processing

2014-02-27 Thread Brandon Williams
A few:

https://issues.apache.org/jira/browse/CASSANDRA-4914

https://issues.apache.org/jira/browse/CASSANDRA-5184

https://issues.apache.org/jira/browse/CASSANDRA-6704

https://issues.apache.org/jira/browse/CASSANDRA-6167





Re: Node side processing

2014-02-27 Thread Edward Capriolo
Check out intravert on GitHub. I am working to get many of those features into
Cassandra.

