Node side processing

2014-02-27 Thread David Semeria

Hi List,

I was wondering whether there have been any past proposals for 
implementing node side processing (NSP) in C*. By NSP, I mean 
passing a reference to a Java class which would then process the result 
set before it is returned to the client.


In our particular use case our clients typically loop through result 
sets of a million or more rows to produce a tiny amount of output (sums, 
means, variance, etc). The bottleneck -- quite obviously -- is the need 
to transfer a million rows to the client before processing can take 
place. It would be extremely useful to execute this processing on the 
coordinator node and only transfer the results to the client.
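
To make the use case concrete, here is a minimal sketch of the kind of reduction described above, written in Python for brevity. Nothing in it is a Cassandra API: RunningStats and summarize are hypothetical names, and the input is assumed to be a stream of numeric column values. It keeps constant-size state per aggregate, so only the final summary would need to cross the network if the same loop ran on the coordinator.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class RunningStats:
    """Constant-size accumulator for count, sum, mean and variance (hypothetical helper)."""
    count: int = 0
    total: float = 0.0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def add(self, value: float) -> None:
        # Welford's online update: one pass, no need to keep the rows.
        self.count += 1
        self.total += value
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        # Population variance; defined as 0.0 for an empty stream.
        return self.m2 / self.count if self.count else 0.0


def summarize(values: Iterable[float]) -> Dict[str, float]:
    """Reduce a (possibly very large) stream of values to a tiny summary."""
    stats = RunningStats()
    for value in values:
        stats.add(value)
    return {
        "count": stats.count,
        "sum": stats.total,
        "mean": stats.mean,
        "variance": stats.variance,
    }


if __name__ == "__main__":
    # A million "rows" in, four numbers out.
    print(summarize(float(i % 100) for i in range(1_000_000)))
```

The point of the sketch is the shape of the data flow, not the statistics: the input is a million rows, the output is a handful of numbers.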


I mention this here because I can imagine other C* users having similar 
requirements.


Thanks

D.


Re: Node side processing

2014-02-27 Thread Tupshin Harper
Hi David,

Check out the ongoing discussion in
https://issues.apache.org/jira/browse/CASSANDRA-6704 as well as some
related tickets linked to from that one.

No consensus at this point, but I'm personally hoping to see something
along the general lines of Hive's UDFs.

-Tupshin





Re: Node side processing

2014-02-27 Thread Brandon Williams
A few:

https://issues.apache.org/jira/browse/CASSANDRA-4914

https://issues.apache.org/jira/browse/CASSANDRA-5184

https://issues.apache.org/jira/browse/CASSANDRA-6704

https://issues.apache.org/jira/browse/CASSANDRA-6167






Re: Node side processing

2014-02-27 Thread Edward Capriolo
Check intravert on GitHub. I am working to get many of those features into
Cassandra.






Re: [jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-02-27 Thread Benedict Elliott Smith
I've been investigating this, but a bit slowly, as I was looking in the
wrong place. I haven't confirmed it yet, but I have a strong suspicion
about the cause. Could you confirm the total physical memory on the nodes?
If it is 8 GB or less and you are looking at this today, try applying the
not-yet-committed patches in CASSANDRA-6692 (atomic B-tree improvements)
and setting memtable_cleanup_threshold to 0.2. I suspect that will fix it,
although if so it doesn't fully explain the behaviour. If you let me know
which cluster you're using, I can test in the same environment to make
sure I'm comparing like for like.
On 27 Feb 2014 20:58, Ryan McGuire (JIRA) j...@apache.org wrote:


 [
 https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915038#comment-13915038]

 Ryan McGuire commented on CASSANDRA-6746:
 -

 I tried re-running this with a 'nodetool flush' on each node after it's
 done with the write. It looked the same as above. I'm running a test with a
 5 minute wait between the write and read to see if that causes a change.

  Reads have a slow ramp up in speed
  --
 
  Key: CASSANDRA-6746
  URL: https://issues.apache.org/jira/browse/CASSANDRA-6746
  Project: Cassandra
  Issue Type: Bug
  Components: Core
  Reporter: Ryan McGuire
  Assignee: Benedict
  Labels: performance
  Fix For: 2.1 beta2
 
  Attachments: 2.1_vs_2.0_read.png
 
 
  On a physical four-node cluster I am doing a big write and then a big
 read. The read takes a long time to ramp up to respectable speeds.
  !2.1_vs_2.0_read.png!
  [See data here|
 http://ryanmcguire.info/ds/graph/graph.html?stats=stats.2.1_vs_2.0_vs_1.2.retry1.json&metric=interval_op_rate&operation=stress-read&smoothing=1
 ]






Re: [jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed

2014-02-27 Thread Ryan McGuire
32G nodes. I'm not mucking with heap settings so it's using -Xms8023M
-Xmx8023M -Xmn1600M
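
For context on where those flags come from when the heap is left alone: the stock cassandra-env.sh of this era derives them from system memory and core count, using max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)) for the heap and min(100 MB per core, 1/4 heap) for the young generation. The sketch below re-expresses that heuristic in Python from memory of the script, not from anything stated in this thread. The function name default_heap_sizes is hypothetical, and the inputs 32092 MB and 16 cores are assumptions picked because they reproduce the reported values; the actual figures for these nodes were not given.

```python
from typing import Tuple


def default_heap_sizes(system_memory_mb: int, cpu_cores: int) -> Tuple[int, int]:
    """Approximate the default heap heuristic; returns (max_heap_mb, young_gen_mb)."""
    # Integer division twice, mirroring the script's use of `expr`.
    half_mb = system_memory_mb // 2
    quarter_mb = half_mb // 2

    # max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB))
    max_heap_mb = max(min(half_mb, 1024), min(quarter_mb, 8192))

    # Young generation: min(100 MB per core, 1/4 of the heap)
    young_gen_mb = min(100 * cpu_cores, max_heap_mb // 4)
    return max_heap_mb, young_gen_mb


if __name__ == "__main__":
    # Assumed inputs: a "32G" box reporting 32092 MB, with 16 cores.
    heap_mb, young_mb = default_heap_sizes(32092, 16)
    print(f"-Xms{heap_mb}M -Xmx{heap_mb}M -Xmn{young_mb}M")
```

Under those assumed inputs the heap lands just below the 8 GB cap because a quarter of the reported memory is slightly under 8192 MB, and the young generation is limited by the per-core bound rather than by the heap size.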


[GitHub] cassandra pull request: fix minor typo in cqlsh help

2014-02-27 Thread hjacobs
GitHub user hjacobs opened a pull request:

https://github.com/apache/cassandra/pull/26

fix minor typo in cqlsh help



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hjacobs/cassandra trunk

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cassandra/pull/26.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #26


commit f3de4f924ac4bc96808f76678e03a1f602819450
Author: hjacobs henn...@jacobs1.de
Date:   2014-02-27T22:24:05Z

fix minor typo in cqlsh help



