[ https://issues.apache.org/jira/browse/CASSANDRA-13981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482488#comment-16482488 ]
Jason Brown commented on CASSANDRA-13981:
-----------------------------------------

Thanks, [~pree] and [~shylaja.koko...@intel.com], for the patches. I've been reading them, understanding the scope of the technology, and I see the direction you are going. However, I'd like to propose a slightly different direction.

Stepping back, the pcj library is divided into two parts: the higher-level pcj components (as used in the previously posted version of this patch), and the lower-level API, called LLPL in the library. LLPL is much smaller than the pcj parts, and offers a direct and simple way to just write bytes into a backing array from the persistent memory. In my opinion this will be far more natural for the cassandra community and developers, and provides more direct access to the storage bytes. We already have lots of serialization code, and we understand that quite well; thus I'd like to keep leveraging that lower-level thinking.

We will need to write custom, non-generic data structures (like we already have for our LSM-based engine), but I see this only as a complete win. We need to optimize our data structures in every way we reasonably can, as we are a database, after all. LLPL has some rough edges wrt code optimization and we will want to modify the transaction model a bit, but I suspect the pcj authors will work with us toward that end.

With this as background, I've started sketching out a direction I think we should pursue. This sketch primarily shows the direction for thinking about serialization and memory allocation using LLPL.

DISCLAIMER: this code doesn't compile, is not syntactically correct, and is wholly incomplete. It should be thought of as a loose blueprint (sketch!) for discussion.

The sketch comprises the following concepts:

- thread per sub-range (to reduce lock contention in the data structures). This is kinda inspired by the thread-per-core notion, but on a smaller scale. ({{TreeManager}} in this patch is a rudimentary dispatch class.)
- how partitions should be stored: allocate a {{MemoryRegion}} from the LLPL allocator, wrap it with a {{DataOutputPlus}}, and write as we normally would.
- rough implementations of the data structures for the primary index and storing rows. A longer treatment of this topic will be in the design doc (see below), but using a tree for the primary index (for partition lookup) and then a map for the cql rows is the basic idea. I mostly want to show the ideas around serialization, so I didn't actually implement the index or the map, except for the leaf/entry nodes, which show how the serialization/data layout fits into the data structure.
- explicitly pass the transaction around on writes (instead of looking for it in a {{ThreadLocal}}, as the pcj transactions do).

||13981-sketch-1||
|[branch|https://github.com/jasobrown/cassandra/tree/13981-sketch-1]|

I am proposing this sketch as a starting point for discussion, along with a forthcoming design doc to help us work out more high-level details of how cassandra as a main memory database should look.

I'm working on the design doc now. It will explore how we can have a pluggable storage engine implementation that allows cassandra to run as a main memory database using persistent memory, while supporting the existing behaviors of cassandra in that kind of system.

> Enable Cassandra for Persistent Memory
> ---------------------------------------
>
> Key: CASSANDRA-13981
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13981
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Preetika Tyagi
> Assignee: Preetika Tyagi
> Priority: Major
> Fix For: 4.0
>
> Attachments: in-mem-cassandra-1.0.patch, in-mem-cassandra-2.0.patch, readme.txt, readme2_0.txt
>
>
> Currently, Cassandra relies on disks for data storage and hence it needs data
> serialization, compaction, bloom filters and partition summary/index for
> speedy access of the data.
> However, with persistent memory, data can be
> stored directly in the form of Java objects and collections, which can
> greatly simplify the retrieval mechanism of the data. What we are proposing
> is to make use of faster and scalable B+ tree-based data collections built
> for persistent memory in Java (PCJ: https://github.com/pmem/pcj) and enable a
> complete in-memory version of Cassandra, while still keeping the data
> persistent.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org