Node side processing
Hi List,

I was wondering whether there have been any past proposals for implementing node side processing (NSP) in C*. By NSP, I mean passing a reference to a Java class which would then process the result set before it is returned to the client.

In our particular use case, our clients typically loop through result sets of a million or more rows to produce a tiny amount of output (sums, means, variance, etc). The bottleneck, quite obviously, is the need to transfer a million rows to the client before processing can take place. It would be extremely useful to execute this processing on the coordinator node and transfer only the results to the client.

I mention this here because I can imagine other C* users having similar requirements.

Thanks
D.
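To make the request concrete, here is a rough sketch of the kind of class such a feature might accept. Everything in it is an assumption for illustration: `RowProcessor`, `StatsProcessor`, `Summary` and `process` are hypothetical names, not part of any Cassandra API, and a plain `Iterable<Double>` stands in for the result set the coordinator would iterate. It computes count, sum, mean and population variance in a single pass (Welford's online algorithm), so only the small summary would need to travel to the client.

```java
import java.util.Arrays;

/** Hypothetical per-row callback; NOT a Cassandra API. */
interface RowProcessor<R> {
    void accept(double value); // called once for each row's numeric column
    R result();                // small summary that would be sent to the client
}

/** Immutable summary: the only data that would cross the network. */
final class Summary {
    final long count;
    final double sum;
    final double mean;
    final double variance; // population variance

    Summary(long count, double sum, double mean, double variance) {
        this.count = count;
        this.sum = sum;
        this.mean = mean;
        this.variance = variance;
    }
}

/** Single-pass count/sum/mean/variance using Welford's online algorithm. */
final class StatsProcessor implements RowProcessor<Summary> {
    private long count;
    private double sum;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    @Override
    public void accept(double value) {
        count++;
        sum += value;
        double delta = value - mean;
        mean += delta / count;
        m2 += delta * (value - mean);
    }

    @Override
    public Summary result() {
        double variance = count > 0 ? m2 / count : 0.0;
        return new Summary(count, sum, mean, variance);
    }
}

final class NodeSideProcessingSketch {
    /** Stand-in for the coordinator looping over a result set. */
    static <R> R process(Iterable<Double> rows, RowProcessor<R> processor) {
        for (double value : rows) {
            processor.accept(value);
        }
        return processor.result();
    }

    public static void main(String[] args) {
        Summary s = process(
                Arrays.asList(2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0),
                new StatsProcessor());
        System.out.println(s.count + " rows, sum=" + s.sum
                + ", mean=" + s.mean + ", variance=" + s.variance);
    }
}
```

The processor keeps only four fields of state no matter how many rows it sees, which is the property that would make running it on the coordinator cheaper than shipping every row to the client.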
Re: Node side processing
Hi David, Check out the ongoing discussion in https://issues.apache.org/jira/browse/CASSANDRA-6704 as well as some related tickets linked to from that one. No consensus at this point, but I'm personally hoping to see something along the general lines of Hive's UDFs. -Tupshin On Thu, Feb 27, 2014 at 8:50 AM, David Semeria da...@lmframework.com wrote: Hi List, I was wondering whether there have been any past proposals for implementing node side processing (NSP) in C*. [...]
Re: Node side processing
A few: https://issues.apache.org/jira/browse/CASSANDRA-4914 https://issues.apache.org/jira/browse/CASSANDRA-5184 https://issues.apache.org/jira/browse/CASSANDRA-6704 https://issues.apache.org/jira/browse/CASSANDRA-6167
Re: Node side processing
Check out IntraVert on GitHub. I am working to get many of those features into Cassandra. -- Sorry this was sent from mobile. Will do less grammar and spell check than usual.
Re: [jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed
I've been investigating this, but a bit slowly, as I was looking in the wrong place. I haven't yet confirmed it, but I have a strong suspicion of the problem.

Could you confirm the total physical memory on the nodes? If it is 8 GB or less and you are looking at this today, try applying the not-yet-committed patches in CASSANDRA-6692 (atomic B-tree improvements) and setting memtable_cleanup_threshold to 0.2. I suspect that will fix it, although if so, it doesn't fully explain the behaviour.

If you let me know which cluster you're using, I can test in the same environment to make sure I'm comparing like for like.

On 27 Feb 2014 20:58, Ryan McGuire (JIRA) j...@apache.org wrote:

[ https://issues.apache.org/jira/browse/CASSANDRA-6746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13915038#comment-13915038 ]

Ryan McGuire commented on CASSANDRA-6746:

I tried re-running this with a 'nodetool flush' on each node after it's done with the write. It looked the same as above. I'm running a test with a 5 minute wait between the write and read to see if that causes a change.

Reads have a slow ramp up in speed

Key: CASSANDRA-6746
URL: https://issues.apache.org/jira/browse/CASSANDRA-6746
Project: Cassandra
Issue Type: Bug
Components: Core
Reporter: Ryan McGuire
Assignee: Benedict
Labels: performance
Fix For: 2.1 beta2
Attachments: 2.1_vs_2.0_read.png

On a physical four node cluster I am doing a big write and then a big read. The read takes a long time to ramp up to respectable speeds.

!2.1_vs_2.0_read.png!

[See data here| http://ryanmcguire.info/ds/graph/graph.html?stats=stats.2.1_vs_2.0_vs_1.2.retry1.json&metric=interval_op_rate&operation=stress-read&smoothing=1 ]

-- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: [jira] [Commented] (CASSANDRA-6746) Reads have a slow ramp up in speed
32G nodes. I'm not mucking with heap settings so it's using -Xms8023M -Xmx8023M -Xmn1600M
[GitHub] cassandra pull request: fix minor typo in cqlsh help
GitHub user hjacobs opened a pull request:

    https://github.com/apache/cassandra/pull/26

    fix minor typo in cqlsh help

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hjacobs/cassandra trunk

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/cassandra/pull/26.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #26

commit f3de4f924ac4bc96808f76678e03a1f602819450
Author: hjacobs henn...@jacobs1.de
Date: 2014-02-27T22:24:05Z

    fix minor typo in cqlsh help

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---