The requirement for distributed storage is really that DRAS-TIC (see that grant description) be used. DRAS-TIC is built entirely around Cassandra, so
in effect the requirement is that Cassandra be used, at least at the core. So part of what I am wondering (if it's not obvious) is "If we're going to have a
Cassandra cluster as part of this, how can we get as much mileage as possible out of it?"
I know that Cassandra offers some ordering capabilities out-of-the-box, although I'm not familiar with them. Maybe they could be used to support merge join
generally.
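To make the merge-join idea concrete, here is a minimal sketch in plain Python. It does not touch Cassandra at all: it only assumes that two result streams arrive already sorted on the join key, which is what an ordered index scan would be expected to provide. The function name `merge_join` and the sample data are made up for illustration.

```python
def merge_join(left, right):
    """Join two lists of (key, value) pairs that are each sorted by key.

    Returns a list of (key, left_value, right_value) for every matching pair.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key is behind, advance left
        elif lk > rk:
            j += 1          # right key is behind, advance right
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for _, lv in left[i:i_end]:
                for _, rv in right[j:j_end]:
                    out.append((lk, lv, rv))
            i, j = i_end, j_end
    return out


# Hypothetical rows for two predicates, each sorted by subject,
# as if read from a PSO-ordered index with P fixed.
knows = [("alice", "bob"), ("alice", "carol"), ("bob", "carol")]
names = [("alice", "Alice"), ("carol", "Carol")]

joined = merge_join(knows, names)
```

Each input is read once, front to back, so the join needs no hash table and no random access; that is the property that makes sorted storage attractive here.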
CumulusRDF (as shown in that paper I forwarded) uses a structure in which the column values are mostly left empty. The information is stored entirely in the keys,
and lookups are done by key prefix. Does your system do something like that, Claude? It sounds like you are storing tuple components in the column values.
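A rough sketch of the keys-only idea, again in plain Python rather than against Cassandra: each triple is stored as a composite key in a sorted list with no associated value, and a pattern is answered by scanning the keys that share a prefix. The class `KeyOnlyIndex` and its methods are hypothetical names for illustration, not CumulusRDF's or jena-on-cassandra's actual layout.

```python
import bisect


class KeyOnlyIndex:
    """Sorted set of composite keys; the triple lives entirely in the key."""

    def __init__(self, order):
        self.order = order   # component order, e.g. "pos" or "pso"
        self.keys = []       # sorted list of 3-tuples, no values stored

    def add(self, s, p, o):
        triple = {"s": s, "p": p, "o": o}
        key = tuple(triple[c] for c in self.order)
        pos = bisect.bisect_left(self.keys, key)
        # Insert only if the key is not already present.
        if pos == len(self.keys) or self.keys[pos] != key:
            self.keys.insert(pos, key)

    def prefix(self, *prefix):
        """Return all keys whose leading components equal `prefix`."""
        # A shorter tuple sorts before any longer tuple it is a prefix of,
        # so bisect_left lands on the first candidate key.
        start = bisect.bisect_left(self.keys, prefix)
        results = []
        for key in self.keys[start:]:
            if key[:len(prefix)] != prefix:
                break
            results.append(key)
        return results


pos_index = KeyOnlyIndex("pos")
pos_index.add("alice", "knows", "bob")
pos_index.add("alice", "knows", "carol")
pos_index.add("bob", "knows", "carol")
pos_index.add("alice", "name", "Alice")

# Pattern "?s knows carol": fix P and O, read S off the matching keys.
who_knows_carol = [key[2] for key in pos_index.prefix("knows", "carol")]
```

Because the keys come back in sorted order, a prefix scan over one index ordering also yields its remaining components sorted, which is the input a merge join wants.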
ajs6f
Andy Seaborne wrote on 9/5/17 4:43 AM:
On Mon, Sep 4, 2017 at 12:10 PM, <[email protected]> wrote:
Little of both? :grin:
Primarily I am interested because of a grant [1] in which the Smithsonian
Institution (where I work) is participating in a supporting role (partly
because I convinced us to). That work involves using Cassandra for
distributed storage, and it will also involve a distributed LDP
implementation (the Fedora API referred to in that grant description is
really just a packaging of Memento [2] with LDP [3]), hence my interest in
jena-on-cassandra.
Turning this round - what are the requirements for the distributed storage?
As I understand the join question, the usual move with Cassandra is to
denormalize and store the joined data together, but that's obviously
nontrivial in our situation, where we don't know the potential queries.
Have you looked at an indexing solution such as was used by CumulusRDF [4]?
(single graph example)
If Cassandra has stored PSO and POS then parallel merge joins are possible.
Andy
ajs6f
[1] https://www.imls.gov/grants/awarded/lg-71-17-0159-17
[2] http://www.mementoweb.org/guide/quick-intro/
[3] https://www.w3.org/TR/ldp/
[4] http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf
Claude Warren wrote on 9/2/17 12:44 PM:
Are you looking to use jena-on-cassandra, or do you have ideas? What leads
you to ask about it?
On Sat, Sep 2, 2017 at 1:21 PM, <[email protected]> wrote:
Hey, Claude--
Just curious as to where https://github.com/Claudenw/jena-on-cassandra
has ended up. Is that still work-in-progress?
--
ajs6f
--
I like: Like Like - The likeliest place on the web
<http://like-like.xenei.com>
LinkedIn: http://www.linkedin.com/in/claudewarren