[ 
https://issues.apache.org/jira/browse/CASSANDRA-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482488#comment-16482488
 ] 

Jason Brown commented on CASSANDRA-13981:
-----------------------------------------

Thanks, [~pree] and [~shylaja.koko...@intel.com], for the patches. I've been 
reading them to understand the scope of the technology, and I see the direction 
you are going in. However, I'd like to propose a slightly different direction.

Stepping back, the pcj library is divided into two parts: the higher-level pcj 
components (as used in the previously posted version of this patch), and the 
lower-level API, called LLPL in the library. LLPL is much smaller than the pcj 
parts, and offers a direct and simple way to just write bytes into a backing 
array from the persistent memory. In my opinion this will be far more natural 
for the cassandra community and developers, and provides more direct access 
to the storage bytes. We already have lots of serialization code, and we 
understand that quite well; thus I'd like to keep leveraging that lower-level 
thinking. We will need to write custom, non-generic data structures (like we 
already have for our LSM-based engine), but I only see this as a complete win. 
We are a database, after all, so we need to optimize our data structures in 
every way we reasonably can. LLPL has some rough edges wrt code optimization, 
and we will want to modify the transaction model a bit, but I suspect the pcj 
authors will work with us toward that end.

With this as background, I've started sketching out a direction I think we 
should pursue. This sketch primarily shows the direction for thinking about 
serialization and memory allocation using LLPL. DISCLAIMER: this code doesn't 
compile, is not syntactically correct, and is wholly incomplete. It should be 
thought of as a loose blueprint (sketch!) for discussion.

The sketch comprises the following concepts:
 - thread per sub-range (to reduce lock contention in the data structures). 
This is kinda inspired by the thread-per-core notion, but on a smaller scale. 
({{TreeManager}} in this patch is a rudimentary dispatch class.)
 - how partitions should be stored - allocate a {{MemoryRegion}} from the LLPL 
allocator, wrap it with a {{DataOutputPlus}}, and write as we normally would.
 - rough implementations of the data structures for the primary index and 
storing rows. A longer treatment of this topic will be in the design doc (see 
below), but the basic idea is to use a tree for the primary index (for 
partition lookup) and then a map for the cql rows. I mostly want to show the 
ideas around serialization, so I didn't actually implement the index or the 
map, except for the leaf/entry nodes, which show how the serialization/data 
layout fits into the data structure.
 - explicitly pass the transaction around on writes (instead of looking for it 
in a {{ThreadLocal}}, as the pcj transactions do).
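
To make the partition storage bullet above concrete, here is a minimal, 
self-contained illustration of the "allocate a region, wrap it, write as we 
normally would" idea. It is not the code in the branch and does not use the 
real LLPL API: {{SketchRegion}}, {{RegionOutputStream}} and 
{{PartitionWriter}} are hypothetical names, a heap {{ByteBuffer}} stands in 
for persistent memory, and {{java.io.DataOutputStream}} stands in for 
{{DataOutputPlus}}. The record layout is likewise only an assumption for 
illustration.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for an LLPL region; a heap ByteBuffer replaces persistent memory. */
final class SketchRegion {
    private final ByteBuffer bytes;

    SketchRegion(int size) { bytes = ByteBuffer.allocate(size); }

    void putByte(long offset, byte value) { bytes.put((int) offset, value); }
    byte getByte(long offset) { return bytes.get((int) offset); }
    int getInt(long offset) { return bytes.getInt((int) offset); }
    long getLong(long offset) { return bytes.getLong((int) offset); }
    long size() { return bytes.capacity(); }
}

/** Adapts a region to OutputStream so DataOutput-based serializers can target it. */
final class RegionOutputStream extends OutputStream {
    private final SketchRegion region;
    private long position;

    RegionOutputStream(SketchRegion region) { this.region = region; }

    @Override
    public void write(int b) throws IOException {
        if (position >= region.size()) throw new IOException("region full");
        region.putByte(position++, (byte) b);
    }

    long position() { return position; }
}

final class PartitionWriter {
    /** Writes [int keyLength][key bytes][long timestamp]; returns the number of bytes written. */
    static long writePartition(SketchRegion region, String key, long timestamp) {
        RegionOutputStream raw = new RegionOutputStream(region);
        DataOutputStream out = new DataOutputStream(raw);
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        try {
            out.writeInt(keyBytes.length); // big-endian, matching ByteBuffer's default order
            out.write(keyBytes);
            out.writeLong(timestamp);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.position();
    }
}
```

Because the adapter is just an {{OutputStream}}, a serializer written against 
{{DataOutput}} does not need to know where the bytes end up; that is the 
property I would want the real region wrapper to have as well.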

||13981-sketch-1||
|[branch|https://github.com/jasobrown/cassandra/tree/13981-sketch-1]|
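
For the thread-per-sub-range and explicit-transaction points in the list 
above, a rough, self-contained illustration follows. Everything in it is 
hypothetical and is not taken from the branch or from pcj/LLPL: 
{{SubRangeDispatcher}} only loosely mirrors the role of {{TreeManager}}, 
{{SketchTransaction}} is an in-memory placeholder for a real transaction, and 
mapping tokens to sub-ranges by modulo is a simplification of real contiguous 
token ranges.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical transaction handle; it only records the operations applied under it. */
final class SketchTransaction {
    private final List<String> log = new ArrayList<>();

    void record(String op) { log.add(op); }

    /** Returns a copy of the recorded operations; a real commit would make them durable. */
    List<String> commit() { return new ArrayList<>(log); }
}

/** Hypothetical dispatcher: one single-threaded executor owns each sub-range. */
final class SubRangeDispatcher implements AutoCloseable {
    private final ExecutorService[] owners;

    SubRangeDispatcher(int subRanges) {
        owners = new ExecutorService[subRanges];
        for (int i = 0; i < subRanges; i++) {
            owners[i] = Executors.newSingleThreadExecutor();
        }
    }

    /** Maps a token to the index of the owning sub-range (modulo, for illustration only). */
    int subRangeFor(long token) {
        return (int) Math.floorMod(token, (long) owners.length);
    }

    /** Runs the write on the owning thread, so that sub-range's structures need no locks. */
    Future<List<String>> write(long token, String value) {
        int idx = subRangeFor(token);
        return owners[idx].submit(() -> {
            SketchTransaction tx = new SketchTransaction();
            applyWrite(tx, idx, value); // the transaction is an explicit argument
            return tx.commit();
        });
    }

    /** Takes the transaction as a parameter instead of reading it from a ThreadLocal. */
    private static void applyWrite(SketchTransaction tx, int subRange, String value) {
        tx.record("range-" + subRange + ":" + value);
    }

    @Override
    public void close() {
        for (ExecutorService owner : owners) {
            owner.shutdown();
        }
    }
}
```

Passing the transaction as a parameter keeps its scope visible at every call 
site, which should make it easier to reason about once work is handed from 
one thread to another.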

I am proposing this sketch as a starting point for discussion, along with a 
forthcoming design doc to help us work out more high-level details of how 
cassandra as a main memory database should look. I'm working on the design doc 
now. 
It will explore how we can have a pluggable storage engine implementation that 
allows cassandra to run as a main memory database using persistent memory, 
while supporting the existing behaviors of cassandra in that kind of system.

> Enable Cassandra for Persistent Memory 
> ---------------------------------------
>
>                 Key: CASSANDRA-13981
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13981
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Preetika Tyagi
>            Assignee: Preetika Tyagi
>            Priority: Major
>             Fix For: 4.0
>
>         Attachments: in-mem-cassandra-1.0.patch, in-mem-cassandra-2.0.patch, 
> readme.txt, readme2_0.txt
>
>
> Currently, Cassandra relies on disks for data storage and hence it needs data 
> serialization, compaction, bloom filters and partition summary/index for 
> speedy access of the data. However, with persistent memory, data can be 
> stored directly in the form of Java objects and collections, which can 
> greatly simplify the retrieval mechanism of the data. What we are proposing 
> is to make use of faster and scalable B+ tree-based data collections built 
> for persistent memory in Java (PCJ: https://github.com/pmem/pcj) and enable a 
> complete in-memory version of Cassandra, while still keeping the data 
> persistent.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
