Re: Pointers on writing your own Compaction Strategy
In addition to what Marcus said, take a look at the latest patch in https://issues.apache.org/jira/browse/CASSANDRA-6602 for a relevant example.

-Tupshin

On Sep 4, 2014 2:28 PM, "Marcus Eriksson" <krum...@gmail.com> wrote:

1. Create a class that extends AbstractCompactionStrategy. (I would keep it in-tree while developing to avoid classpath issues etc.)
2. Implement the abstract methods:
   - getNextBackgroundTask: called when Cassandra wants to do a new minor (background) compaction; return a CompactionTask with the sstables you want compacted.
   - getMaximalTask: called when a user triggers a major compaction.
   - getUserDefinedTask: called when a user triggers a user-defined compaction from JMX.
   - getEstimatedRemainingTasks: return the estimated number of tasks remaining before we are done.
   - getMaxSSTableBytes: relevant if your compaction strategy puts a limit on the size of sstables.
3. Execute this in cqlsh to enable your compaction strategy:
   ALTER TABLE foo WITH compaction = { 'class': 'Bar' };
4. Things to think about:
   - Make sure you mark sstables as compacting before returning them from the compaction strategy (and check the return value!): https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java#L271
   - If you do this on 2.1, don't mix repaired and unrepaired sstables (SSTableReader#isRepaired).

Let me know if you need any more information.
/Marcus

On Thu, Sep 4, 2014 at 6:50 PM, "Ghosh, Mainak" <mgho...@illinois.edu> wrote:

Hello,
I am planning to write a new compaction strategy and was hoping someone could point me to the relevant functions and how they are related in the call hierarchy.
Thanks for the help.
Regards,
Mainak.
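To make the two caveats in point 4 concrete, here is a minimal, self-contained Java sketch of the candidate-selection part of a background compaction. It deliberately does not use Cassandra's classes: SSTable, CompactingTracker, SimpleSelector, tryMarkCompacting and the smallest-first policy with minCount/maxCount thresholds are all hypothetical stand-ins. It only illustrates the pattern: keep repaired and unrepaired sstables apart, claim the chosen set atomically, and give up on that set if the claim fails.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for an sstable; NOT Cassandra's SSTableReader.
record SSTable(String name, long sizeBytes, boolean repaired) {}

// Hypothetical tracker of which sstables are currently being compacted.
final class CompactingTracker {
    private final Set<SSTable> compacting = ConcurrentHashMap.newKeySet();

    // Atomically marks all given sstables as compacting. Returns false and marks
    // none of them if any is already claimed by another compaction.
    synchronized boolean tryMarkCompacting(Collection<SSTable> sstables) {
        for (SSTable s : sstables) {
            if (compacting.contains(s)) return false;
        }
        compacting.addAll(sstables);
        return true;
    }

    // Releases sstables once their compaction has finished or failed.
    synchronized void unmark(Collection<SSTable> sstables) {
        compacting.removeAll(sstables);
    }

    boolean isCompacting(SSTable s) {
        return compacting.contains(s);
    }
}

final class SimpleSelector {
    // Picks up to maxCount idle sstables that share one repaired status (smallest
    // first, a hypothetical policy), marks them compacting and returns them.
    // Returns an empty list if no group of at least minCount could be claimed.
    static List<SSTable> nextBackgroundCandidates(List<SSTable> live, CompactingTracker tracker,
                                                  int minCount, int maxCount) {
        for (boolean repaired : new boolean[] {false, true}) {
            List<SSTable> bucket = new ArrayList<>();
            for (SSTable s : live) {
                // Never mix repaired and unrepaired sstables in one compaction.
                if (s.repaired() == repaired && !tracker.isCompacting(s)) bucket.add(s);
            }
            if (bucket.size() < minCount) continue;
            bucket.sort(Comparator.comparingLong(SSTable::sizeBytes));
            List<SSTable> chosen = new ArrayList<>(bucket.subList(0, Math.min(maxCount, bucket.size())));
            // Check the return value: another thread may have claimed one of these meanwhile.
            if (tracker.tryMarkCompacting(chosen)) return chosen;
        }
        return List.of();
    }
}
```

In a real strategy the claimed list would be handed to a CompactionTask, and something must release the sstables when the compaction ends; the sketch leaves that to the caller via unmark.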
Re: Node side processing
Hi David,

Check out the ongoing discussion in https://issues.apache.org/jira/browse/CASSANDRA-6704, as well as some related tickets linked from that one. There is no consensus at this point, but I'm personally hoping to see something along the general lines of Hive's UDFs.

-Tupshin

On Thu, Feb 27, 2014 at 8:50 AM, David Semeria <da...@lmframework.com> wrote:

Hi List,

I was wondering whether there have been any past proposals for implementing node side processing (NSP) in C*.

By NSP, I mean passing a reference to a Java class which would then process the result set before it is returned to the client.

In our particular use case, our clients typically loop through result sets of a million or more rows to produce a tiny amount of output (sums, means, variance, etc.). The bottleneck, quite obviously, is the need to transfer a million rows to the client before processing can take place. It would be extremely useful to execute this processing on the coordinator node and transfer only the results to the client.

I mention this here because I can imagine other C* users having similar requirements.

Thanks
D.
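The reductions David lists (sums, means, variance) can all be computed in a single pass with constant memory, which is what makes running them next to the data attractive: the rows stream through and only a handful of numbers come out. Below is a minimal sketch using Welford's online algorithm; RunningStats is a hypothetical class, not part of Cassandra or of anything proposed in CASSANDRA-6704.

```java
// Hypothetical one-pass aggregator: the kind of reduction a node-side hook could
// run over a result set so that only a few numbers, not a million rows, reach the client.
final class RunningStats {
    private long count;
    private double sum;
    private double mean;
    private double m2; // sum of squared deviations from the running mean

    // Folds one value into the running statistics (Welford's update).
    void accept(double x) {
        count++;
        sum += x;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    long count() { return count; }

    double sum() { return sum; }

    double mean() { return count == 0 ? Double.NaN : mean; }

    // Sample variance (n - 1 denominator); undefined for fewer than two values.
    double sampleVariance() { return count < 2 ? Double.NaN : m2 / (count - 1); }
}
```

Welford's update is preferred here over accumulating a sum of squares because it avoids catastrophic cancellation when the values are large relative to their spread.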
Re: network compatibility from 0.6 to 0.7
As long as network compatibility is in place, it is possible to upgrade a cluster incrementally: restrict Thrift clients to talking only to the 0.6 nodes until half the cluster is upgraded, then modify them to talk to the 0.7 nodes. If network compatibility breaks, there is no way to avoid downtime, or even to test 0.7 under production load.

On Jul 22, 2010 9:50 AM, "Jonathan Ellis" <jbel...@gmail.com> wrote:

How useful is this to insist on, given that the 0.7 Thrift API is fairly incompatible with 0.6's? (The timestamp -> Clock change being the biggest problem there.)

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
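The upgrade sequence described above boils down to a simple client-side routing rule. A minimal sketch, assuming each client knows which version every node runs and that 0.6 and 0.7 nodes can still talk to each other; the Node record and UpgradeRouting class are hypothetical, not part of any Cassandra client library. As Jonathan points out, clients would also need to speak the 0.7 Thrift API once they switch.

```java
import java.util.List;

// Hypothetical description of one node during a rolling upgrade.
record Node(String host, String version) {}

final class UpgradeRouting {
    // Returns the hosts Thrift clients should use: the 0.6 nodes until at least
    // half the cluster runs 0.7, then only the 0.7 nodes.
    static List<String> clientHosts(List<Node> cluster) {
        long upgraded = cluster.stream()
                .filter(n -> n.version().startsWith("0.7"))
                .count();
        String target = (upgraded * 2 >= cluster.size()) ? "0.7" : "0.6";
        return cluster.stream()
                .filter(n -> n.version().startsWith(target))
                .map(Node::host)
                .toList();
    }
}
```

The rule only holds together if the two versions interoperate on the wire; without that, the mixed 0.6/0.7 phase cannot exist and the cluster has to be upgraded all at once.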