[jira] [Commented] (CASSANDRA-5417) Push composites support in the storage engine

T Jake Luciani (JIRA) Fri, 26 Apr 2013 19:44:18 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13643479#comment-13643479
 ]


T Jake Luciani commented on CASSANDRA-5417:
-------------------------------------------

The best thing to do is just give you the uncompacted sstables...

https://docs.google.com/file/d/0B4FSNkh7LrJCc040UTRKZFdtTVk/edit?usp=sharing


You should keep the uncompacted sstables around and reset them after each test.
The two scenarios I tested were:
  1. Time it takes to perform a major compaction (with and without the patch)
  2. Latency of reads across all uncompacted sstables (with and without the patch)

Here is the schema:

{code}
CREATE KEYSPACE mjff WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};

use mjff;
CREATE TABLE data (
        name text,
        type text,
        date timestamp,
        value double,
        PRIMARY KEY(name,type,date)
) WITH COMPACT STORAGE;
{code}

The reader code is simple:

{code}
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Compression;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.CqlResult;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class StressReads {

    static int threadCount = 2;

    public static String[] names = new String[]{"APPLE","VIOLET","SUNFLOWER","ROSE","PEONY","ORCHID","ORANGE",
            "MAPLE","LILLY","FLOX","DAISY","DAFODIL","CROCUS","CHERRY"};
    public static String[] types = new String[]{"diffSecs","N.samples",
            "x.mean","x.absolue.deviation","x.standard.deviation",
            "y.mean","y.absolue.deviation","y.standard.deviation",
            "z.mean","z.absolue.deviation","z.standard.deviation"};

    // One Thrift connection per worker thread; Cassandra.Client is not thread-safe.
    static ThreadLocal<Cassandra.Client> client = new ThreadLocal<Cassandra.Client>() {
        @Override
        public Cassandra.Client initialValue() {
            try {
                TTransport trans = new TFramedTransport(new TSocket("localhost", 9160));
                trans.open();

                TProtocol prot = new TBinaryProtocol(trans);
                Cassandra.Client client = new Cassandra.Client(prot);

                client.set_keyspace("mjff");

                return client;
            } catch (Exception e) {
                throw new RuntimeException("err", e);
            }
        }
    };

    static ExecutorService threadPool = Executors.newFixedThreadPool(threadCount);

    static AtomicLong totalReads = new AtomicLong(0);
    static long allReads = 0;
    static int countSeconds = 0 ;

    static Random rand = new Random();
    public static void main(String[] args) throws InterruptedException {

        for(int i=0; i<threadCount; i++) {
           threadPool.submit(new Runnable() {
               @Override
               public void run() {
                   while(true){
                       // Build a CQL3 range query for a random (name, type) pair.
                       StringBuilder sb = new StringBuilder();
                       sb.append("Select value from data where name='");
                       sb.append(names[rand.nextInt(names.length)]);
                       sb.append("' and type='");
                       sb.append(types[rand.nextInt(types.length)]);
                       sb.append("' and date > '2012-03-01 00:00:00' LIMIT 100");

                       try {
                           CqlResult result = client.get().execute_cql3_query(
                                   ByteBufferUtil.bytes(sb.toString()), Compression.NONE, ConsistencyLevel.ONE);

                           // Count rows returned, not queries issued.
                           totalReads.addAndGet(result.getRows().size());
                       } catch (Exception e) {
                           e.printStackTrace();
                       }
                   }
               }
           });
        }


        while (true) {
            Thread.sleep(1000);

            long reads = totalReads.getAndSet(0);
            allReads += reads;
            System.err.println("Read " + reads + " rows/sec, avg " + allReads / ++countSeconds);
        }
    }
}
{code}
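
Scenario 2 is about read latency, while the reader above prints rows per second. A minimal sketch of how per-query latency could be captured around the execute_cql3_query call, assuming one recorder per worker thread (the class is not thread-safe). LatencyRecorder and its methods are hypothetical names, not part of Cassandra or of the test I ran, and the sleep in main is only a stand-in for the real query:

```java
import java.util.Arrays;
import java.util.concurrent.Callable;

/** Hypothetical helper: records per-call latencies and reports percentiles. */
final class LatencyRecorder {
    private final long[] samplesNanos;
    private int count = 0;

    LatencyRecorder(int capacity) {
        samplesNanos = new long[capacity];
    }

    /** Runs one call and records how long it took, even if it throws. */
    <T> T time(Callable<T> query) throws Exception {
        long start = System.nanoTime();
        try {
            return query.call();
        } finally {
            record(System.nanoTime() - start);
        }
    }

    /** Stores one sample; samples beyond the fixed capacity are dropped. */
    void record(long elapsedNanos) {
        if (count < samplesNanos.length) {
            samplesNanos[count++] = elapsedNanos;
        }
    }

    /** Nearest-rank percentile over the recorded samples, p in (0, 100]. */
    long percentileNanos(double p) {
        if (count == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        long[] sorted = Arrays.copyOf(samplesNanos, count);
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * count);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) throws Exception {
        LatencyRecorder recorder = new LatencyRecorder(200);
        for (int i = 0; i < 200; i++) {
            // Stand-in for client.get().execute_cql3_query(...)
            recorder.time(() -> { Thread.sleep(1); return null; });
        }
        System.out.println("p50 = " + recorder.percentileNanos(50) / 1000 + " us");
        System.out.println("p99 = " + recorder.percentileNanos(99) / 1000 + " us");
    }
}
```

In the reader, the body of the try block would be wrapped in recorder.time(...), and the percentiles printed once at the end of a fixed-length run so that the with-patch and without-patch numbers cover the same number of queries.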

                
> Push composites support in the storage engine
> ---------------------------------------------
>
>                 Key: CASSANDRA-5417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5417
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 2.0
>
>
> CompositeType happens to be very useful and is now widely used: CQL3 relies
> heavily on it, and super columns now use it internally too. Besides,
> CompositeType has been advised as a replacement for super columns on the
> thrift side for a while, so it's safe to assume that it's generally used
> there too.
> CompositeType was initially introduced as just another AbstractType,
> meaning that the storage engine knows nothing whatsoever of composites
> being, well, composite. This has the following drawbacks:
> * Because internally a composite value is handled as just a ByteBuffer, we
> end up doing a lot of extra work. Typically, each time we compare two
> composite values, we end up "deserializing" the components (which, while it
> doesn't copy data per se because we just slice the global ByteBuffer, still
> wastes some CPU cycles and allocates a bunch of ByteBuffer objects). And
> since compare can be called *a lot*, this is likely not negligible.
> * This makes the CQL3 code uglier than necessary. Basically, CQL3 makes
> extensive use of composites, and since it gets back ByteBuffers from the
> internal columns, it always has to check whether it's actually a
> CompositeType or not, and then split it and pick the different parts it
> needs. It's only an API problem, but having things exposed as composites
> directly would definitely make things cleaner. In particular, in most cases,
> CQL3 doesn't care whether it has a composite with only one component or a
> not-really-composite value, but we still always distinguish both cases.
> Also, if we do expose composites more directly internally, it's not a lot
> more work to better "internalize" the different parts of the cell name that
> CQL3 uses (what's the clustering key, what's the actual CQL3 column name,
> what's the collection element), making things cleaner. Last but not least,
> there are currently a bunch of places where methods take a ByteBuffer as
> argument and it's hard to know whether it expects a cell name or a CQL3
> column name. This is pretty error-prone.
> * It makes it hard (or impossible) to do a number of performance
> improvements. Consider CASSANDRA-4175: I'm not really sure how you can do it
> properly (in memory) if cell names are just ByteBuffers (since CQL3 column
> names are just one of the components in general). But we also miss
> opportunities for sharing prefixes. If we were able to share prefixes of
> composite names in memory we would 1) lower the memory footprint and 2)
> potentially speed up comparison (of the prefixes) by checking reference
> equality first (also, doing prefix sharing on disk, which is a separate
> concern btw, might be easier to do if we do prefix sharing in memory).
> So I suggest pushing CompositeType support inside the storage engine. What I
> mean by that concretely would be to change the internal {{Column.name}} from
> ByteBuffer to some CellName type. A CellName would API-wise just be a list of
> ByteBuffers. But in practice, we'd have a specific CellName implementation for
> not-really-composite names, and the truly composite implementation would
> allow some prefix sharing. From an external API standpoint, however, nothing
> would change: we would pack the composite as usual before sending it back to
> the client, but at least internally, comparison won't have to deserialize the
> components every time, and the CQL3 code will be cleaner.
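
To make the description above concrete, here is a rough sketch of the idea, not the patch itself. It assumes the usual CompositeType layout (2-byte length, value bytes, one end-of-component byte per component) and compares every component as unsigned bytes, whereas the real engine would use each component's own comparator. CellNameSketch and its methods are hypothetical names used only for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of "cell name as a list of ByteBuffers". */
final class CellNameSketch {

    /** Packs components as [2-byte length][value][end-of-component byte = 0]. */
    static ByteBuffer pack(List<ByteBuffer> components) {
        int size = 0;
        for (ByteBuffer c : components) {
            size += 2 + c.remaining() + 1;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (ByteBuffer c : components) {
            out.putShort((short) c.remaining());
            out.put(c.duplicate()); // duplicate so the caller's position is untouched
            out.put((byte) 0);
        }
        out.flip();
        return out;
    }

    /** Splits a packed name by slicing: no bytes copied, one ByteBuffer allocated per component. */
    static List<ByteBuffer> split(ByteBuffer packed) {
        ByteBuffer in = packed.duplicate();
        List<ByteBuffer> parts = new ArrayList<>();
        while (in.hasRemaining()) {
            int length = in.getShort() & 0xFFFF;
            ByteBuffer part = in.slice();
            part.limit(length);
            parts.add(part);
            in.position(in.position() + length + 1); // skip value and end-of-component byte
        }
        return parts;
    }

    /** Compares component by component, skipping components that are the same object. */
    static int compare(List<ByteBuffer> a, List<ByteBuffer> b) {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            ByteBuffer x = a.get(i);
            ByteBuffer y = b.get(i);
            if (x == y) {
                continue; // shared prefix component: reference equality, no byte comparison
            }
            int c = compareUnsigned(x, y);
            if (c != 0) {
                return c;
            }
        }
        return Integer.compare(a.size(), b.size());
    }

    /** Simplification: unsigned lexicographic order stands in for per-type comparators. */
    static int compareUnsigned(ByteBuffer x, ByteBuffer y) {
        int n = Math.min(x.remaining(), y.remaining());
        for (int i = 0; i < n; i++) {
            int c = (x.get(x.position() + i) & 0xFF) - (y.get(y.position() + i) & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return x.remaining() - y.remaining();
    }

    public static void main(String[] args) {
        // Two cell names sharing the same "type" component object, with different dates.
        ByteBuffer type = ByteBuffer.wrap("x.mean".getBytes(StandardCharsets.UTF_8));
        List<ByteBuffer> first = Arrays.asList(type, ByteBuffer.allocate(8).putLong(0, 1000L));
        List<ByteBuffer> second = Arrays.asList(type, ByteBuffer.allocate(8).putLong(0, 2000L));

        System.out.println("first sorts before second: " + (compare(first, second) < 0));
        System.out.println("round trip intact: " + split(pack(first)).equals(first));
    }
}
```

Today every comparison has to go through something like split() on both sides; with names kept as component lists, split() happens once when the name is read and pack() only when it is sent back to the client.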

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
