Actually, looking at the code, it is a dataset graph that the Cassandra code is built on.

On Mon, Sep 4, 2017 at 5:17 PM, Claude Warren <[email protected]> wrote:

> The jena-on-cassandra solution is quite simple. It is an implementation
> of the graph layer, so it doesn't do the joins directly but lets the higher
> level do it. There are 4 copies of the data stored in different orders:
> gspo, spog and 2 others that escape my mind at the moment but start with
> "o" and "p".
>
> The tables are "indexed" by their first segments. The system looks at the
> known values and finds the table with the best index to solve the query; it
> then performs the query and any filtering as necessary to return the
> results.
>
> Inserts are written into all the tables (as would be expected).
>
> Deletes are done on a separate thread (eventual consistency after all).
>
> It uses the standard model-on-graph to create a model.
>
> Much of the work was really to understand how Cassandra does its indexing
> and how to do deletions.
>
> As a final note, the Object field is stored in several formats (URI,
> numeric value [if appropriate], string value and perhaps one other, I
> forget just now). So when finding a value it uses the proper value index.
> All a bit tricky but it seems to work.
>
> I would be glad to spend some time with you going over the design and
> design decisions if you wish.
>
> Claude
>
> On Mon, Sep 4, 2017 at 12:10 PM, <[email protected]> wrote:
>
>> Little of both? :grin:
>>
>> Primarily I am interested because of a grant [1] in which the Smithsonian
>> Institution (where I work) is participating in a supporting role (partly
>> because I convinced us to). That work involves using Cassandra for
>> distributed storage, and it will also involve a distributed LDP
>> implementation (the Fedora API referred to in that grant description is
>> really just a packaging of Memento [2] with LDP [3]), hence my interest in
>> jena-on-cassandra.
>>
>> As I understand the join question, the usual move with Cassandra is to
>> denormalize and store the joined data together, but that's obviously
>> nontrivial in our situation, where we don't know the potential queries.
>> Have you looked at an indexing solution such as was used by CumulusRDF [4]?
>>
>> ajs6f
>>
>> [1] https://www.imls.gov/grants/awarded/lg-71-17-0159-17
>> [2] http://www.mementoweb.org/guide/quick-intro/
>> [3] https://www.w3.org/TR/ldp/
>> [4] http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf
>>
>> Claude Warren wrote on 9/2/17 12:44 PM:
>>
>>> Are you looking to use jena-on-cassandra or do you have ideas? What leads
>>> you to ask about it?
>>>
>>> On Sat, Sep 2, 2017 at 1:21 PM, <[email protected]> wrote:
>>>
>>>> Hey, Claude--
>>>>
>>>> Just curious as to where https://github.com/Claudenw/jena-on-cassandra
>>>> has ended up. Is that still work-in-progress?
>>>>
>>>> --
>>>>
>>>> ajs6f

-- 
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren
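
A minimal sketch of the table selection described in the quoted message above, not taken from the jena-on-cassandra code. It assumes each table is identified by its column order (g, s, p, o) and that the "best index" is the table whose leading columns cover the most known values. The orders "posg" and "ospg" are hypothetical placeholders for the two orders the message does not name, and all function names are illustrative only.

```python
def bound_prefix_length(order, bound):
    """Count how many leading columns of `order` have known values."""
    count = 0
    for column in order:
        if column not in bound:
            break
        count += 1
    return count


def choose_table(orders, pattern):
    """Pick the column order whose leading columns best match the known values.

    `pattern` maps column letters (g, s, p, o) to a value; a missing key or
    a value of None is treated as a wildcard. Ties go to the earliest order.
    """
    bound = {column for column, value in pattern.items() if value is not None}
    return max(orders, key=lambda order: bound_prefix_length(order, bound))


def residual_filter(order, pattern):
    """Known columns not covered by the table's leading prefix.

    These would have to be checked by filtering the rows that come back.
    """
    bound = {column for column, value in pattern.items() if value is not None}
    covered = set(order[:bound_prefix_length(order, bound)])
    return sorted(bound - covered)


# "gspo" and "spog" come from the message; the other two are assumptions.
ORDERS = ["gspo", "spog", "posg", "ospg"]

if __name__ == "__main__":
    pattern = {"g": "ex:g1", "o": "ex:bob"}
    table = choose_table(ORDERS, pattern)
    print(table, residual_filter(table, pattern))
```

With only the subject known, this picks "spog". With graph and object known, no table covers both in its prefix, so the sketch falls back to "gspo" and leaves the object to be filtered afterwards, which matches the "any filtering as necessary" step in the message.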
