[jira] [Comment Edited] (CASSANDRA-5417) Push composites support in the storage engine

2013-11-13 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13821243#comment-13821243
 ] 

Sylvain Lebresne edited comment on CASSANDRA-5417 at 11/13/13 5:11 PM:
---

I've pushed a rebased/heavily modified version at 
https://github.com/pcmanus/cassandra/commits/5417-v2.

Compared to the first version, this new version cleans things up a bit but also 
tries to be a lot more careful about not allocating too many objects, for 
performance reasons. Still, the main principle is the same: make cell names 
no longer opaque ByteBuffers but rather custom objects (of type CellName in 
the patch), with 3 main objectives:
# to abstract away the details of the cell name layouts. With CQL3 we have 4 
different layouts of cell names, depending on whether 1) the underlying 
comparator is a CompositeType or not and 2) whether one of the components of the 
cell name is used to store a CQL3 column name (and if yes, which one). 
Currently, there is a lot of ad hoc and error-prone code all over the place to 
deal with those differences manually. The CellName interface of this patch 
encapsulates those differences a lot better imo (in the different 
implementations of that interface).
# to keep the components of the cell name separated in memory. Currently, because a 
cell name is an opaque buffer, any code that is concerned with only one of 
the components of the cell name has to manually re-split the byte buffer. On top 
of the ad hoc concern described above (every time you split a cell name, you 
need to worry about the different layouts), the other concern is that we may 
split the same cell name over and over, which has a cost. Typically, each time 
we compare two composite cell names we have to re-decode each component one by 
one.  And during reads, on top of all the cell name comparisons, we also have 
to split the cell name to 1) count results in SliceQueryFilter and 2) process 
the storage engine result into a ResultSet. By keeping components separated 
behind the CellName interface, the attached patch avoids duplicating this work.
# keeping the components of composite names separated should make 
component-specific optimizations a lot easier. Typically, possible 
optimizations like sharing common prefixes, using ids for CQL3 column names 
(CASSANDRA-4175) or even, why not, per-component-type specific implementations 
that use native Java types to save allocations (using an int when the component 
is Int32Type) are likely to be easier if we can deal with individual 
components directly rather than with opaque ByteBuffers. Note that I'm *not* 
claiming that this patch does those optimizations yet, just that it very likely 
makes them a lot easier to implement properly.
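To illustrate the idea behind points 1 and 2 above, here is a rough sketch (the names and layout below are illustrative only, not the actual CellName API from the patch): the interface exposes components individually, and each layout gets its own implementation, so any splitting happens at most once when the name is built.

```java
import java.nio.ByteBuffer;

// Illustrative sketch only -- not the actual CellName API from the patch.
// The point: components live separately in memory, so reading one component
// never requires re-splitting an opaque composite buffer.
interface CellNameSketch {
    int size();              // number of components in this name
    ByteBuffer get(int i);   // i-th component, returned without any decoding
}

// Layout for a non-composite comparator: the whole name is one component.
final class SimpleName implements CellNameSketch {
    private final ByteBuffer name;
    SimpleName(ByteBuffer name) { this.name = name; }
    public int size() { return 1; }
    public ByteBuffer get(int i) { return name; }
}

// Layout for a CompositeType comparator: components are split exactly once,
// when the name object is built, and then reused for every comparison.
final class CompositeName implements CellNameSketch {
    private final ByteBuffer[] components;
    CompositeName(ByteBuffer... components) { this.components = components; }
    public int size() { return components.length; }
    public ByteBuffer get(int i) { return components[i]; }
}
```

Code that compares names or extracts a single component can then call {{get(i)}} directly instead of re-splitting a buffer and worrying about which of the 4 layouts it is looking at.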

The patch itself is pretty big but most of the interesting bits are in the 
newly added {{db.composites}} package. The rest of the patch mainly updates 
the code to use those new interfaces, with some related simplification of the 
CQL3 code. One interesting bit though is the new AtomDeserializer class and 
its use in sstable iterators. The reason is that for composite cell names, 
this patch moves the work of splitting the names read from disk into their 
components from every time we need to access a component to once at 
deserialization time.  That's better, except that when we deserialize an 
index block from disk, the current code fully deserializes atoms even if they are 
not used afterwards by the query (either because it's a query by name and we 
only care about a handful of names in the block, or because the name is before 
the beginning of the slice we're looking for). In that case, the first version 
of this patch was actually doing more work, splitting/allocating all the 
components for names of cells the query doesn't care about, which is why Jake's 
test of the first version was showing slower reads. So this new version 
introduces the AtomDeserializer class, which is basically a lazy deserializer of 
OnDiskAtom (and their cell names) that makes sure we don't do unnecessary work 
for atoms we don't care about.
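The lazy-deserialization pattern can be sketched as follows. This is a toy format and a hypothetical API, not Cassandra's actual on-disk layout or the real AtomDeserializer: each atom here is a length-prefixed name followed by a length-prefixed value, with composite components ':'-joined inside the name. The key property is that peeking at a name and skipping an atom never split the name into components.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;

// Illustrative sketch of the lazy-deserialization idea; the real
// AtomDeserializer works on Cassandra's on-disk format, which this is not.
// Toy format: [u16 name length][name][u16 value length][value], repeated,
// terminated by a zero-length name. Composite components are ':'-joined.
class LazyAtomReader {
    private final DataInputStream in;
    private String pendingName;   // raw name of the next atom, not yet split

    LazyAtomReader(DataInputStream in) { this.in = in; }

    // Cheap: reads only the raw name, performs no component splitting.
    String peekName() throws IOException {
        if (pendingName == null) {
            int len = in.readUnsignedShort();
            if (len == 0) return null;   // end-of-block marker
            byte[] b = new byte[len];
            in.readFully(b);
            pendingName = new String(b, StandardCharsets.UTF_8);
        }
        return pendingName;
    }

    // Expensive path: split the name into components and consume the atom.
    // Only taken for atoms the query actually cares about.
    List<String> readComponents() throws IOException {
        String name = peekName();
        pendingName = null;
        skipValue();
        return Arrays.asList(name.split(":"));
    }

    // Skip an atom entirely: no splitting, no per-component allocation.
    void skipAtom() throws IOException {
        pendingName = null;
        skipValue();
    }

    private void skipValue() throws IOException {
        in.skipBytes(in.readUnsignedShort());
    }

    // Helper to build a block in the toy format, for demonstration.
    static byte[] encode(String[][] atoms) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        for (String[] a : atoms) {
            byte[] n = a[0].getBytes(StandardCharsets.UTF_8);
            out.writeShort(n.length); out.write(n);
            byte[] v = a[1].getBytes(StandardCharsets.UTF_8);
            out.writeShort(v.length); out.write(v);
        }
        out.writeShort(0);
        return bos.toByteArray();
    }
}
```

A names-query style consumer would peek at each name, call skipAtom() for the ones it doesn't want, and pay the splitting cost only for the handful it does.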

On the performance side I did 2 simple tests:
# I checked basic stress writes and reads (with 5M keys but default parameters 
otherwise). On writes, the patch is slower than trunk by about 10%.  I've 
yourkited it but I'm still not entirely sure why tbh (at least why the 
difference is that big). Some of my tests suggest that the MemoryMeter is 
taking more time and that this impacts the overall performance, but that could be 
a red herring. For reads however, there is no noticeable difference.
# Then I did a read test similar to Jake's test above (I wasn't able to reuse 
his sstables because it seems trunk can't read them).  I used the following 
CQL3 table:
{noformat}
CREATE TABLE timeline (
pk text,
ck1 text,
ck2 int,
val 

[jira] [Comment Edited] (CASSANDRA-5417) Push composites support in the storage engine

2013-04-21 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13637737#comment-13637737
 ] 

T Jake Luciani edited comment on CASSANDRA-5417 at 4/22/13 2:45 AM:


I've posted a couple minor commits to 
https://github.com/tjake/cassandra/tree/5417
There was one failing test and one performance regression.

I ran some benchmarks to gauge the effectiveness of this change.
I created a COMPACT table with a few wide rows and created 1k sstables. I 
benchmarked the time it took to perform a major compaction (since that causes 
the most Composite comparisons). 

First the base 
{code}
INFO 20:54:41,504 Compacted 942 sstables to 
[/var/lib/cassandra/data/mjff/data/mjff-data-ja-963,].  1,004,572,909 bytes to 
999,124,671 (~99% of original) in 372,638ms = 2.557011MB/s.  947 total rows, 6 
unique.
{code}


With this change.
{code}
INFO 22:25:46,747 Compacted 942 sstables to 
[/var/lib/cassandra/data/mjff/data/mjff-data-ja-963,].  1,004,572,909 bytes to 
999,124,671 (~99% of original) in 341,939ms = 2.786578MB/s.  947 total rows, 6 
unique.
{code}

So a bit of an improvement (~10%).  
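Since both runs compacted the same 1,004,572,909 bytes down to the same size, the speedup can be double-checked directly from the wall-clock times in the two log lines above:

```java
// Sanity-check the reported numbers: identical input/output sizes, so the
// speedup is just the ratio of the compaction times (equivalently, of the
// MB/s figures: 2.786578 / 2.557011).
public class SpeedupCheck {
    public static void main(String[] args) {
        double baseMs = 372_638.0;      // base compaction time
        double patchedMs = 341_939.0;   // compaction time with the change
        double speedup = baseMs / patchedMs;
        System.out.printf("%.1f%% faster%n", (speedup - 1.0) * 100.0);
        // prints: 9.0% faster
    }
}
```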


Then I tested the performance of reading as quickly as possible.  This 
was unfortunately ~30% slower.
I've attached a screenshot of the current bottleneck if you want to take a look.

http://i.imgur.com/rrp3fsX.png




  
 Push composites support in the storage engine
 -

 Key: CASSANDRA-5417
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5417
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
 Fix For: 2.0


 CompositeType happens to be very useful and is now widely used: CQL3 heavily 
 relies on it, and super columns now use it internally too. Besides, 
 CompositeType has been advised as a replacement for super columns on the 
 thrift side for a while, so it's safe to assume that it's generally used 
 there too.
 CompositeType was initially introduced as just another AbstractType, 
 meaning that the storage engine has no knowledge whatsoever of composites 
 being, well, composite. This has the following drawbacks:
 * Because internally a composite value is handled as just a ByteBuffer, we 
 end up doing a lot of extra work. Typically, each time we compare 2 composite 
 values, we end up deserializing the components (which, while it doesn't copy 
 data per se because we just slice the global ByteBuffer, still wastes some CPU 
 cycles and allocates a bunch of ByteBuffer objects). And since compare can be 
 called *a lot*, this is likely not negligible.
 * This makes CQL3 code uglier than necessary. Basically, CQL3 makes extensive 
 use of composites, and since it gets back ByteBuffers from the internal 
 columns, it always has to check whether it's actually a CompositeType or not, 
 and then split it and pick the different parts it needs. It's only an API 
 problem, but having things exposed as composites directly would definitely 
 make things cleaner. In particular, in most cases, CQL3 doesn't care whether it 
 has a composite with only one component or a truly non-composite value, but 
 we still always distinguish both cases.  Lastly, if we do expose composites 
 more directly internally, it's not a lot more work to better internalize 
 the different parts of the cell name that CQL3 uses (what's the clustering 
 key, what's the actual CQL3 column name, what's the collection element), 
 making things cleaner. Last but not least, there is currently a bunch of 
 places where methods take a ByteBuffer as argument and it's hard to know 
 whether