[ https://issues.apache.org/jira/browse/CASSANDRA-8543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260204#comment-14260204 ]
Aleksey Yeschenko commented on CASSANDRA-8543:
----------------------------------------------

Use native protocol batching with separate prepared inserts - but make sure that you only batch columns/rows with the same partition key. Use DateTieredCompactionStrategy (https://labs.spotify.com/2014/12/18/date-tiered-compaction/). And, more importantly, don't try to optimize before you actually need to.

In any case, CASSANDRA-6412 is very unlikely to make it into Cassandra until 3.1 or 3.2, if at all, so any wins you could get from your blob-packing will be negated by the need to do a read before write. You also lose convenient querying with limits smaller than 1024, and the ability to reuse 3.0 aggregate functions on your values. It also complicates MR/Spark jobs and loses the ability to use some of those pre-defined methods.

> Allow custom code to control behavior of reading and compaction
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-8543
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8543
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Pavol Slamka
>            Priority: Minor
>
> When storing series data in blob objects for speed, it is sometimes
> necessary to change only a few values of a single blob (say, a few integers
> out of 1024). Right now one could rewrite these using compare-and-set and
> versioning: read the blob and its version, change a few values, write the
> whole updated blob with an incremented version if the version did not
> change, and repeat the whole process otherwise (an optimistic approach).
> However, compare-and-set brings some overhead. Let's try to leave out
> compare-and-set and, instead of reading and updating, write only a "blank"
> blob with only a few values set. The blank blob contains special blank
> placeholder data such as NULL, the max value of int, or similar. Since this
> write in fact only appends a new SSTable record, we have not overwritten the
> old data yet.
> That happens during read or compaction. But if we provided a custom read
> and a custom compaction which would not replace the blob with the new
> "sparse blank" blob, but rather replace values in the first blob (first
> SSTable record) with only the "non-blank" values from the second blob
> (second SSTable record), we would achieve a fast partial blob update,
> without compare-and-set, on a last-write-wins basis. Is such an approach
> feasible? Would it be possible to customize Cassandra so that custom code
> for compaction and reading could be provided for a column (blob)?
> There may be other, better solutions, but speed-wise this seems best to me.
> Sorry for any mistakes, I am new to Cassandra.
> Thanks.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
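Enabling DateTieredCompactionStrategy, as suggested in the comment, is a per-table compaction setting. An illustrative DDL fragment (table name and columns are made up for the example):

```sql
CREATE TABLE series_data (
    series_id text,
    ts timestamp,
    value blob,
    PRIMARY KEY (series_id, ts)
) WITH compaction = {'class': 'DateTieredCompactionStrategy'};
```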
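The "only batch rows with the same partition key" advice in the comment amounts to grouping pending inserts by partition key before issuing each (unlogged) batch, so a batch never spans partitions. A minimal sketch of that grouping step in plain Python, with a hypothetical `group_by_partition_key` helper and tuple-shaped rows (not part of any driver API):

```python
from collections import defaultdict

def group_by_partition_key(rows, key_fn):
    """Group pending inserts so that each batch touches exactly one
    partition; key_fn extracts the partition key from a row."""
    batches = defaultdict(list)
    for row in rows:
        batches[key_fn(row)].append(row)
    return dict(batches)

# Rows as (partition_key, value) tuples; one batch per resulting group.
pending = [("sensor-1", 10), ("sensor-2", 20), ("sensor-1", 30)]
batches = group_by_partition_key(pending, key_fn=lambda r: r[0])
```

Each value in `batches` would then be sent as one batch of prepared inserts; since all its rows share a partition key, the coordinator does not fan the batch out across nodes.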
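The reporter's proposed merge-on-read/merge-on-compaction semantics can be sketched independently of Cassandra: overlay a newer "sparse blank" record onto an older full record, keeping the newer value wherever it is non-blank. A minimal last-write-wins sketch, using `None` as the hypothetical blank placeholder (the issue suggests NULL or max-int):

```python
BLANK = None  # hypothetical placeholder marking "not updated" slots

def merge_blobs(base, sparse):
    """Overlay the non-blank values of the newer sparse record onto the
    older base record (last write wins, per slot)."""
    return [s if s is not BLANK else b for b, s in zip(base, sparse)]

# Older full record, then a sparse update touching slots 1 and 3.
merged = merge_blobs([1, 2, 3, 4], [BLANK, 9, BLANK, 7])
```

This is the operation the issue asks custom read and compaction code to perform on adjacent SSTable records for the same key; as the comment notes, without such a hook the same effect requires a read before write.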