[ 
https://issues.apache.org/jira/browse/CASSANDRA-2995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089557#comment-13089557
 ] 

Shunsuke Nakamura edited comment on CASSANDRA-2995 at 8/23/11 4:17 PM:
-----------------------------------------------------------------------

I am a developer of MyCassandra, which provides storage engine pluggability. 
It supports MySQL, Redis, Kyoto Cabinet, MongoDB, and others in addition to 
the original storage engine of Cassandra.

The storage engine is a performance bottleneck in most applications. 
Pluggability offers a very straightforward way to improve performance quickly 
and to adapt to various applications, even though caching techniques have 
started to work well these days.

* Data model

MyCassandra supports the same schemaless multi-dimensional map as Cassandra. 
It maps the data model to tables in an RDB and to key-value pairs in key-value 
stores by serializing objects and labeling keys and columns. 
The mapping does not produce a sparse RDB table because each record is mapped 
to a key-value pair.
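
The mapping described above could be sketched roughly as follows. This is only an illustrative Python sketch, not MyCassandra's actual code: the `make_key` label format, the use of JSON for serialization, and the plain dict standing in for the key-value store are all assumptions.

```python
import json

def make_key(keyspace, column_family, row_key):
    """Build a labeled storage key (hypothetical label format)."""
    return f"{keyspace}:{column_family}:{row_key}"

def put_row(store, keyspace, column_family, row_key, columns):
    """Serialize the whole row (a column-name -> value map) into one value."""
    store[make_key(keyspace, column_family, row_key)] = json.dumps(columns)

def get_row(store, keyspace, column_family, row_key):
    """Fetch and deserialize a row; return an empty map if it is absent."""
    raw = store.get(make_key(keyspace, column_family, row_key))
    return json.loads(raw) if raw is not None else {}

# A plain dict stands in for a KVS or a two-column (key, value) RDB table.
store = {}
put_row(store, "ks", "users", "alice", {"email": "alice@example.org", "age": "30"})
# A row with a different set of columns needs no NULL padding.
put_row(store, "ks", "users", "bob", {"email": "bob@example.org"})
```

Because each row occupies exactly one key-value pair in this sketch, rows with different column sets do not leave empty cells behind.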


* Open problems
Data model mapping requires more elaboration.

・ Secondary indices
  They require a raw column, though a row is usually object-serialized. It is 
also difficult to add a secondary index later in today's MyCassandra because 
it requires a traversal of all rows and a schema change.

・ Row capacity
  The size of a column in an RDB is fixed when the schema is defined; it is static.

・ Lookup efficiency in a row
  MyCassandra serializes a row to store it in a table or a KVS. A lookup within 
a row therefore requires deserialization.
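
To make the secondary-index and lookup problems above more concrete, here is a small Python sketch. It is not taken from MyCassandra: the JSON serialization, the dict used as the store, and the function names `get_column` and `build_secondary_index` are assumptions made for illustration.

```python
import json

def get_column(store, row_key, column):
    """Reading one column still deserializes the entire row."""
    raw = store.get(row_key)
    if raw is None:
        return None
    return json.loads(raw).get(column)

def build_secondary_index(store, column):
    """Adding an index later means scanning and deserializing every row."""
    index = {}
    for row_key, raw in store.items():
        row = json.loads(raw)
        if column in row:
            index.setdefault(row[column], []).append(row_key)
    return index

# Each value is an opaque serialized row, as in the data model above.
store = {
    "alice": json.dumps({"city": "Tokyo", "age": "30"}),
    "bob": json.dumps({"city": "Osaka"}),
    "carol": json.dumps({"city": "Tokyo"}),
}
by_city = build_secondary_index(store, "city")
```

Since the store only sees opaque values, it cannot index on a column by itself. That is why a raw (non-serialized) column would be needed for an index maintained by the underlying engine.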

A research paper about MyCassandra will become available soon. 
It covers the design and performance results of MyCassandra.

http://www.slideshare.net/sunsuk7tp/mycassandra-8499189


> Making Storage Engine Pluggable
> -------------------------------
>
>                 Key: CASSANDRA-2995
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2995
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>    Affects Versions: 0.8.2
>            Reporter: Muga Nishizawa
>
> Will you design and implement Cassandra's storage engine API like MyCassandra?
> MyCassandra provides an extensible architecture for plugging other storage 
> engines, such as MySQL, into Cassandra.  
> https://github.com/sunsuk7tp/MyCassandra/
>   
> It could be advantageous for Cassandra to make the storage engine pluggable.  
> This could allow Cassandra to 
> - deal with potential use cases where maybe the current sstables are not the 
> best fit
> - allow several types of internal storage formats (at the same time) 
> optimized for different data types
> - allow easier experiments and research on new storage formats (encourage 
> research institutions to do strange things with Cassandra)
> - there could also be potential advantages from better isolation of the data 
> engine, in terms of less risk of data corruption if other parts of Cassandra 
> change
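
As a rough idea of what a pluggable storage engine API might look like, here is a minimal Python sketch. It does not reflect any existing Cassandra or MyCassandra interface: `StorageEngine`, `InMemoryEngine`, the `ENGINES` registry, and `open_engine` are all hypothetical names.

```python
from abc import ABC, abstractmethod

class StorageEngine(ABC):
    """Hypothetical minimal engine interface; not an existing Cassandra API."""

    @abstractmethod
    def put(self, key, value):
        """Store a serialized row under the given key."""

    @abstractmethod
    def get(self, key):
        """Return the serialized row for the key, or None if absent."""

class InMemoryEngine(StorageEngine):
    """Toy engine backed by a dict, standing in for a real backend."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

# Registry mapping a configured engine name to its implementation.
ENGINES = {"memory": InMemoryEngine}

def open_engine(name):
    """Instantiate the engine registered under the given name."""
    try:
        return ENGINES[name]()
    except KeyError:
        raise ValueError(f"unknown storage engine: {name}") from None

# Different column families could use different engines at the same time.
engines = {"users": open_engine("memory"), "logs": open_engine("memory")}
engines["users"].put("alice", b"serialized row")
```

Keeping the interface this narrow is what would isolate the rest of the system from the storage format; a real API would also need range scans, deletes, and flush/compaction hooks.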

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

