We don't have code at the moment.  We (my team at work) are planning to
implement this on Cassandra.  That would mean a couple of developers
watching and at least one working on the code until it is stable.

I was hoping that we would be able to contribute this to the Jena project
as a complete module.  I understand not wanting to put it in as part of
the project at the beginning, but that was my goal.

I don't have a release schedule in mind as the in-house project is still
fluid.  It might make sense to put it on GitHub to start, but I would like
to see it in a Jena-based repo in order to make it more visible to the
development community.

As I keep saying, I need to get final approval from legal before
proceeding.  I expect to hear something later this week.
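
To make the three-table idea from my earlier message (quoted below) a bit
more concrete, here is a minimal sketch of how a find(S, P, O) pattern might
be routed to one of the SPO, POS and OPS tables.  It is plain Java with no
Jena or Cassandra dependency.  The class name IndexChooser, the method
choose, the "SCAN" label and the use of null for ANY are all illustrative
assumptions, not project code.

```java
/**
 * Hypothetical helper (not project code): picks which of the three planned
 * tables could serve a find(S, P, O) pattern with a key-prefix lookup.
 * A null argument stands for ANY.
 */
final class IndexChooser {

    private IndexChooser() { }

    static String choose(String s, String p, String o) {
        if (s != null) {
            // Prefix S, S+P or S+P+O.  For S ? O only the S prefix is usable,
            // so O would have to be filtered after the lookup.
            return "SPO";
        }
        if (p != null) {
            // Prefix P or P+O.
            return "POS";
        }
        if (o != null) {
            // Prefix O.
            return "OPS";
        }
        // find(ANY, ANY, ANY): no prefix is available, so fall back to a scan.
        return "SCAN";
    }

    public static void main(String[] args) {
        System.out.println(choose("s", null, null));   // SPO
        System.out.println(choose(null, "p", "o"));    // POS
        System.out.println(choose(null, null, "o"));   // OPS
        System.out.println(choose(null, null, null));  // SCAN
    }
}
```

With only these three tables the S ? O pattern has no exact prefix, which is
one reason the final choice of tables is still open.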


On Mon, Oct 31, 2016 at 5:53 PM, Andy Seaborne <a...@apache.org> wrote:

> On 31/10/16 13:41, Claude Warren wrote:
>> Andy,
>> This seems like a good approach but does not appear to be in the Jena code
>> base, which I suppose is your comment about an approach to developing
>> work.
>> Does it make sense to create git clones that contain the new work?  Or
>> perhaps branches?
>> Do you have a suggestion or direction you would like to see this go?
> That's the discussion to have.  The first item is "Community".  This is
> all new code? Who is involved? Just you so far?
> A storage layer is not trivial - this is not an "extra" thing.  It is a
> module of its own, and if the community is significantly different, maybe
> different mailing lists (e.g. Solr within the Lucene
> project), maybe even a different project; it can be "straight to TLP" or
> "incubated" - that depends on who is involved.  There is a wide set of
> possibilities.
> If it is starting off, then the Jena git repo isn't a good place to have
> the code.  The lifecycles don't line up.
> A branch that is completely separate is really a separate repo.  Jena can
> get another git repo.
> What would be the release cycle?
> The real issue is the work needed by the PMC for releases.
> To get all options mentioned:
> If this is a one-person effort for now, then starting a github repo and
> creating the initial sketch/framework is an option.  More focused. More
> freedom to try things out and change directions.
>         Andy
>> Claude
>> On Fri, Oct 28, 2016 at 2:35 PM, Andy Seaborne <a...@apache.org> wrote:
>>> Claude,
>>> These may help:
>>> I have been thinking about an interface that is more oriented to the
>>> storage than the full DatasetGraph.
>>> StorageRDF breaks down all the operations into those on the default graph
>>> and those on named graphs.  For just a graph, simply ignore the named
>>> graph operations.
>>> https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/storage/StorageRDF.java
>>> There is an adapter to the DatasetGraph hierarchy (which is needed for
>>> SPARQL):
>>> https://github.com/afs/AFS-Dev/blob/master/src/main/java/projects/dsg2/DatasetGraphStorage.java
>>> If you want to only use existing classes, DatasetGraphTriplesQuads is the
>>> place to start - used by TIM and TDB - you can implement without needing
>>> quads/named graphs. Again, simply ignore (throw
>>> UnsupportedOperationException for the named graph calls).
>>> Going the graph route could lead to rework later on for any kind of
>>> performance issues because find(S,P,O) is so narrow and precludes union
>>> default graph except by brute force.  DatasetGraph works with the SPARQL
>>> execution engine.
>>> We still need to discuss how best to approach developing work - it should
>>> not get sucked up by the release cycle.
>>>         Andy
>>> On 26/10/16 19:21, Claude Warren wrote:
>>>> My plan is to start with a Graph implementation.  We expect to write 3
>>>> tables: SPO, POS, OPS (I think).  Currently we don't have an easy way to
>>>> handle find(ANY, ANY, ANY) so I suspect we will just start with
>>>> permitting a column scan on Cassandra.
>>>> I have not looked at DynamoDB but as I recall there are significant
>>>> differences under the hood.
>>>> I expect that we will move on to a custom model or query engine to get
>>>> the best performance but that is not what we are planning for the first
>>>> cut.
>>>> I am still waiting for management approval to do this at work ....
>>>> sometimes it takes longer to get the paperwork done than it does to
>>>> design the thing.
>>>> Claude
>>>> On Mon, Oct 17, 2016 at 6:39 PM, Paul Houle <paul.ho...@ontology2.com>
>>>> wrote:
>>>>> I like DynamoDB as a target for this sort of thing.  There are many
>>>>> tasks which are small-scale yet critical where it would otherwise be
>>>>> hard to provide a distributed and reliable database.  Put that together
>>>>> with Lambda,  which does the same for computation,  and you are cooking
>>>>> with gas.
>>>>> I wrote a 1-1 translation of DynamoDB documents to RDF that I use
>>>>> throughout an application;  the code is DynamoDB idiomatic in every
>>>>> way, just the application reads and writes (a constrained set of) RDF
>>>>> documents.
>>>>> Right now I dump the documents from the DynamoDB system into a triple
>>>>> store when I want a panoptic view,  but a distributed graph like
>>>>> that would mean being able to run SPARQL queries against DynamoDB
>>>>> directly.
>>>>> There are many products in the same family as Cassandra and DynamoDB
>>>>> and it would be good to think through the math so we can approach them
>>>>> all in a similar way.
>>>>> --
>>>>>   Paul Houle
>>>>>   paul.ho...@ontology2.com
>>>>> On Mon, Oct 17, 2016, at 12:31 PM, A. Soroka wrote:
>>>>>> Yep,
>>>>>> http://iswc2011.semanticweb.org/fileadmin/iswc/Papers/Workshops/SSWS/Ladwig-et-all-SSWS2011.pdf
>>>>>> indicates that they are indexing by subject. As someone who has
>>>>>> implemented LDP, that is definitely the approach that makes sense
>>>>>> there.
>>>>>> ---
>>>>>> A. Soroka
>>>>>> The University of Virginia Library
>>>>>> On Oct 17, 2016, at 12:20 PM, Andy Seaborne <a...@apache.org> wrote:
>>>>>>> IIRC It stores CBDs indexed by subject so it is the "other" model to
>>>>>>> Rya.  Better for LDP (??).
>>>>>>>     Andy
>>>>>>> On 17/10/16 15:41, A. Soroka wrote:
>>>>>>>> There's also:
>>>>>>>> https://github.com/cumulusrdf/cumulusrdf
>>>>>>>> in a similar vein (RDF over Cassandra). Not sure what kind of
>>>>>>>> particular uses it expects to support.
>>>>>>>> ---
>>>>>>>> A. Soroka
>>>>>>>> The University of Virginia Library
>>>>>>>> On Oct 17, 2016, at 7:02 AM, Andy Seaborne <a...@apache.org> wrote:
>>>>>>>>> Hi Claude,
>>>>>>>>> There is certainly interest from me.
>>>>>>>>> What the best thing to do is depends on various factors.  By
>>>>>>>>> putting it in extras I presume you mean it gets added to the
>>>>>>>>> release?  That is not the only way forward.
>>>>>>>>> An important aspect of Apache is "Community over code" - will there
>>>>>>>>> be a community around this code?  Is that community the same as the
>>>>>>>>> Jena community, or is there significant overlap?
>>>>>>>>> There are various reasons for wanting RDF over a column store -
>>>>>>>>> which use cases are the most important for this work?
>>>>>>>>> They lead to different ways of using Cassandra. For example,
>>>>>>>>> Rya (incubating) uses Accumulo tables as indexes, and partial scans
>>>>>>>>> of the table are streaming.  Other systems try to use the columns
>>>>>>>>> for properties, possibly more useful for LDP style than SPARQL.
>>>>>>>>>   Andy
>>>>>>>>> On 15/10/16 18:38, Claude Warren wrote:
>>>>>>>>>> Howdy,
>>>>>>>>>> We have a project at work that is implementing Jena Graph on
>>>>>>>>>> Cassandra.  I am wondering if there is enough interest here to
>>>>>>>>>> accept it as a contribution.  I was thinking that it might fit in
>>>>>>>>>> the Extras category.
>>>>>>>>>> I cannot promise release of the code yet as I have to present it
>>>>>>>>>> to our internal Intellectual Property group first.
>>>>>>>>>> Thoughts?
>>>>>>>>>> Claude

I like: Like Like - The likeliest place on the web
LinkedIn: http://www.linkedin.com/in/claudewarren