Sounds in some ways like two different efforts. Jena brings a lot of assumptions and machinery that aren't present and won't ever be present in Commons. I could see a SPARQL-less Commons impl over Cassandra being useful for the LDP-style use case, but that doesn't sound like what Claude is trying to get done.
---
A. Soroka
The University of Virginia Library

> On Oct 31, 2016, at 10:27 AM, Claude Warren <[email protected]> wrote:
>
> Well, I started the process at work with Apache Jena as the target. If I
> change target I have to start the process over. Unless there is a very
> strong reason to move to Commons RDF I would prefer to stay with Jena.
>
> Given that we want to run SPARQL queries over the data I think we want to
> stay with Jena.
>
> Claude
>
> On Mon, Oct 31, 2016 at 2:23 PM, Stian Soiland-Reyes <[email protected]>
> wrote:
>
>> Do you think it would make sense to do a Cassandra Commons RDF API binding
>> for Graph or Dataset..? Or would that be too high level?
>>
>> The streaming part would fit well there I think.
>>
>> Commons RDF 0.3.0 is under vote now, adding Quad, Dataset and "RDF" as the
>> factory interface.
>>
>> https://commonsrdf.incubator.apache.org/apidocs/index.html?org/apache/commons/rdf/api/package-summary.html
>>
>> But it could make more sense as a Jena DatasetGraph so it can be used by
>> sparql queries etc. (And then exposed as Commons RDF Jena bindings if one
>> so wanted)
>>
>> On 31 Oct 2016 1:41 pm, "Claude Warren" <[email protected]> wrote:
>>
>>> Andy,
>>>
>>> This seems like a good approach but does not appear to be in the Jena
>>> code base, which I suppose is your comment about an approach to
>>> developing work.
>>>
>>> Does it make sense to create git clones that contain the new work? Or
>>> perhaps branches?
>>>
>>> Do you have a suggestion or direction you would like to see this go?
>>>
>>> Claude
>>>
>>> On Fri, Oct 28, 2016 at 2:35 PM, Andy Seaborne <[email protected]> wrote:
>>>
>>>> Claude,
>>>>
>>>> These may help:
>>>>
>>>> I have been thinking about an interface that is more oriented to the
>>>> storage than the full DatasetGraph.
>>>>
>>>> StorageRDF breaks down all the operations into those on the default
>>>> graph and those on named graphs. For just a graph, simply ignore the
>>>> named graph operations.
>>>>
>>>> https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/storage/StorageRDF.java
>>>>
>>>> There is an adapter to the DatasetGraph hierarchy (which is needed for
>>>> SPARQL):
>>>>
>>>> https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/DatasetGraphStorage.java
>>>>
>>>> If you want to only use existing classes, DatasetGraphTriplesQuads is
>>>> the place to start - used by TIM and TDB - you can implement without
>>>> needing quads/named graphs. Again, simply ignore the named graph calls
>>>> (throw UnsupportedOperationException for them).
>>>>
>>>> Going the graph route could lead to rework later on for any kind of
>>>> performance issues because find(S,P,O) is so narrow and precludes union
>>>> default graph except by brute force. DatasetGraph works with the SPARQL
>>>> execution engine.
>>>>
>>>> We still need to discuss how best to approach developing work - it
>>>> should not get sucked up by the release cycle.
>>>>
>>>> Andy
>>>>
>>>> On 26/10/16 19:21, Claude Warren wrote:
>>>>
>>>>> My plan is to start with a Graph implementation. We expect to write 3
>>>>> tables: SPO, POS, OPS (I think). Currently we don't have an easy way
>>>>> to handle find( ANY, ANY, ANY) so I suspect we will just start with
>>>>> permitting a column scan on Cassandra.
>>>>>
>>>>> I have not looked at DynamoDB but as I recall there are significant
>>>>> differences under the hood.
>>>>>
>>>>> I expect that we will move on to a custom model or query engine to get
>>>>> the best performance but that is not what we are planning for the
>>>>> first cut.
>>>>>
>>>>> I am still waiting for management approval to do this at work ....
>>>>> sometimes it takes longer to get the paperwork done than it does to
>>>>> design the thing.
>>>>>
>>>>> Claude
>>>>>
>>>>> On Mon, Oct 17, 2016 at 6:39 PM, Paul Houle <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> I like DynamoDB as a target for this sort of thing. There are many
>>>>>> tasks which are small-scale yet critical where it would otherwise be
>>>>>> hard to provide a distributed and reliable database. Put that
>>>>>> together with Lambda, which does the same for computation, and you
>>>>>> are cooking with gas.
>>>>>>
>>>>>> I wrote a 1-1 translation of DynamoDB documents to RDF that I use
>>>>>> throughout an application; the code is DynamoDB idiomatic in every
>>>>>> way, just the application reads and writes (a constrained set of) RDF
>>>>>> documents.
>>>>>>
>>>>>> Right now I dump the documents from the DynamoDB system into a triple
>>>>>> store when I want a panoptic view, but with a distributed graph like
>>>>>> that would mean being able to run SPARQL queries against DynamoDB
>>>>>> directly.
>>>>>>
>>>>>> There are many products in the same family as Cassandra and DynamoDB
>>>>>> and it would be good to think through the math so we can approach
>>>>>> them all in a similar way.
>>>>>>
>>>>>> --
>>>>>> Paul Houle
>>>>>> [email protected]
>>>>>>
>>>>>> On Mon, Oct 17, 2016, at 12:31 PM, A. Soroka wrote:
>>>>>>
>>>>>>> Yep,
>>>>>>>
>>>>>>> http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf
>>>>>>>
>>>>>>> indicates that they are indexing by subject. As someone who has
>>>>>>> implemented LDP, that is definitely the approach that makes sense
>>>>>>> there.
>>>>>>>
>>>>>>> ---
>>>>>>> A. Soroka
>>>>>>> The University of Virginia Library
>>>>>>>
>>>>>>> On Oct 17, 2016, at 12:20 PM, Andy Seaborne <[email protected]> wrote:
>>>>>>>
>>>>>>>> IIRC It stores CBDs indexed by subject so it is the "other" model
>>>>>>>> to Rya. Better for LDP (??).
>>>>>>>>
>>>>>>>> Andy
>>>>>>>>
>>>>>>>> On 17/10/16 15:41, A. Soroka wrote:
>>>>>>>>
>>>>>>>>> There's also:
>>>>>>>>>
>>>>>>>>> https://github.com/cumulusrdf/cumulusrdf
>>>>>>>>>
>>>>>>>>> in a similar vein (RDF over Cassandra). Not sure what kind of
>>>>>>>>> particular uses it expects to support.
>>>>>>>>>
>>>>>>>>> ---
>>>>>>>>> A. Soroka
>>>>>>>>> The University of Virginia Library
>>>>>>>>>
>>>>>>>>> On Oct 17, 2016, at 7:02 AM, Andy Seaborne <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Claude,
>>>>>>>>>>
>>>>>>>>>> There is certainly interest from me.
>>>>>>>>>>
>>>>>>>>>> What the best thing to do depends on various factors. By putting
>>>>>>>>>> it in extras I presume you mean it gets added to the release?
>>>>>>>>>> That is not the only way forward.
>>>>>>>>>>
>>>>>>>>>> An important aspect of Apache is "Community over code" - will
>>>>>>>>>> there be a community around this code? Is that community the
>>>>>>>>>> same, or significant overlap, as the Jena community?
>>>>>>>>>>
>>>>>>>>>> There are various reasons for wanting RDF over a column store -
>>>>>>>>>> which use cases are the most important for this work?
>>>>>>>>>>
>>>>>>>>>> They lead to different ways of using Cassandra. For example,
>>>>>>>>>> Rya (incubating) uses Accumulo tables as indexes, and partial
>>>>>>>>>> scans of the table are streaming. Other systems try to use the
>>>>>>>>>> columns for properties, possibly more useful for LDP style than
>>>>>>>>>> SPARQL.
>>>>>>>>>>
>>>>>>>>>> Andy
>>>>>>>>>>
>>>>>>>>>> On 15/10/16 18:38, Claude Warren wrote:
>>>>>>>>>>
>>>>>>>>>>> Howdy,
>>>>>>>>>>>
>>>>>>>>>>> We have a project at work that is implementing Jena Graph on
>>>>>>>>>>> Cassandra. I am wondering if there is enough interest here to
>>>>>>>>>>> accept it as a contribution. I was thinking that it might fit
>>>>>>>>>>> in the Extras category.
>>>>>>>>>>>
>>>>>>>>>>> I can not promise release of the code yet as I have to present
>>>>>>>>>>> it to our internal Intellectual Property group first.
>>>>>>>>>>>
>>>>>>>>>>> Thoughts?
>>>>>>>>>>>
>>>>>>>>>>> Claude
>>>
>>> --
>>> I like: Like Like - The likeliest place on the web
>>> <http://like-like.xenei.com>
>>> LinkedIn: http://www.linkedin.com/in/claudewarren
>
> --
> I like: Like Like - The likeliest place on the web
> <http://like-like.xenei.com>
> LinkedIn: http://www.linkedin.com/in/claudewarren
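
For anyone following the SPO / POS / OPS discussion in the thread, here is a rough, in-memory sketch of how a find(S, P, O) pattern might be routed to one of three such tables. It is not Jena or Cassandra code: the class `TripleIndexes`, the `choose` helper and the `ANY = None` wildcard are all made up for illustration, and sorted Python lists stand in for tables whose rows are ordered by key. The idea shown is simply "pick the ordering with the longest bound key prefix, range-scan it, then filter".

```python
from bisect import bisect_left

ANY = None  # hypothetical wildcard, standing in for a "match anything" node

# Column order of each table, as positions into an (s, p, o) triple.
ORDERS = {"SPO": (0, 1, 2), "POS": (1, 2, 0), "OPS": (2, 1, 0)}


class TripleIndexes:
    """Three sorted copies of the same triples, one per key ordering."""

    def __init__(self, triples):
        self.tables = {
            name: sorted(tuple(t[i] for i in order) for t in set(triples))
            for name, order in ORDERS.items()
        }

    @staticmethod
    def choose(s, p, o):
        """Return (table name, number of leading key columns that are bound)."""
        pattern = (s, p, o)
        best = ("SPO", 0)
        for name, order in ORDERS.items():
            bound = 0
            for i in order:
                if pattern[i] is ANY:
                    break
                bound += 1
            if bound > best[1]:
                best = (name, bound)
        return best

    def find(self, s=ANY, p=ANY, o=ANY):
        """Yield matching (s, p, o) triples via a prefix range scan plus filter."""
        pattern = (s, p, o)
        name, bound = self.choose(s, p, o)
        order = ORDERS[name]
        prefix = tuple(pattern[i] for i in order[:bound])
        table = self.tables[name]
        # A shorter tuple sorts before any longer tuple it prefixes, so this
        # lands on the first row of the range (or 0 for an empty prefix).
        for row in table[bisect_left(table, prefix):]:
            if row[:bound] != prefix:
                break  # left the contiguous range for this prefix
            triple = [None, None, None]
            for pos, i in enumerate(order):
                triple[i] = row[pos]
            # Filter on any bound term that was not part of the key prefix.
            if all(w is ANY or w == g for w, g in zip(pattern, triple)):
                yield tuple(triple)


idx = TripleIndexes([
    ("a", "knows", "b"),
    ("a", "name", "Al"),
    ("b", "knows", "c"),
    ("c", "knows", "b"),
])
for t in idx.find(p="knows", o="b"):
    print(t)
```

Two things fall out of the sketch. With an empty prefix, find(ANY, ANY, ANY) degenerates to a scan of a whole table, which matches the "column scan" fallback mentioned in the thread. And with exactly these three orderings, a pattern that binds S and O but not P has no two-column prefix, so the sketch scans the S range of SPO and filters on O.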
