Re: [Discussed] Integrating SPARQL-Gremlin 0.2 Plugin with the TinkerPop codebase

Stephen Mallette Wed, 24 Jan 2018 12:39:06 -0800

I just wanted to quickly note that sparql-gremlin is now building properly
on the TINKERPOP-1878 branch (i just pushed some changes to clean up some
pom.xml/dependency conflicts issues). As we discussed in this thread, the
branch currently contains a fairly bare bones model and it will need some
work to get it complete enough for it to be considered for merge to a
release branch. In a way that's good, because it will give the community a
chance to shape exactly how sparql-gremlin will work.


On Tue, Jan 9, 2018 at 10:47 AM, Harsh Thakkar <[email protected]> wrote:

> Hi Stephen,
>
> It does make sense to me. The work is going on slow but steady. Let's wait
> and see how other devs feel about this, as you said.
>
> Cheers,
> Harsh
> On 2018-01-09 16:31, Stephen Mallette <[email protected]> wrote:
> > I've had some thoughts on this thread since December. Since
> sparql-gremlin
> > has a pretty long to-do list and there is likely a lot of discussion
> > required on this list prior to it being ready for merge to a release
> > branch, it seems like we might treat this as a normal feature under
> > development. I think we should just merge it to a development branch in
> the
> > TinkerPop repository and then collaborate on it from there. We've taken
> > similar approaches with other "long term" pull requests which has allowed
> > the code to develop as it would typically would. I'm thinking that's a
> > better approach than a "big-bang" pull request.
> >
> > Harsh, if that's ok with you, feel free to issue your PR against master
> and
> > I'll get it setup against a development branch on our end (no rush,
> please
> > give it a few days to see if everyone is ok with that approach).
> >
> > On Mon, Dec 18, 2017 at 5:16 PM, Stephen Mallette <[email protected]>
> > wrote:
> >
> > > > Should I also remove the northwind file?
> > >
> > > I think I'd prefer to see all of our sparql examples use the existing
> toy
> > > graphs - better not to add more options - so I'd remove it as well. If
> > > anyone disagrees, I don't really feel too strongly about not including
> it,
> > > but it would be good to hear some reasoning as to why the existing
> datasets
> > > that we already package are insufficient for users to learn with.
> > >
> > > >  will need some help (quite possibly) with getting things right as
> far
> > > as the DSL pattern for the gremlin language variants is concerned.
> > >
> > > We can help point you in the right direction when you get stuck or
> need to
> > > clarify things. If you get really stuck, we can move to step 2 and
> have you
> > > issue a PR sooner than later and we'll just merge what you have to a
> > > development branch so others can collaborate with you on it more
> easily.
> > > Let's see how things develop.
> > >
> > > > Also, since you are very well versed in the test suite, I would also
> > > request some assistance for the same when we are there :) as it is our
> > > first time pushing a work to the production level. So bear with us :)
> > >
> > > no worries. i will need to think on the testing approach. my thinking
> will
> > > be focused on what i would call integration tests i.e. tests that
> evaluate
> > > sparql-gremlin across the entire stack. I don't imagine that you need
> my
> > > input to write some unit tests to validate the workings of your current
> > > code though.
> > >
> > > > One question, though there is not a strict deadline, when is the
> 3.3.2
> > > release planned?
> > >
> > > We have no timeline on 3.3.2 at this point (we are just in the process
> of
> > > releasing 3.3.1 so it will be a while before we see 3.3.2). I think the
> > > merging of gremlin-javascript will likely trigger that release, i would
> > > guess no earlier than February 2018 if all goes right with that. I also
> > > don't mean to make it sound like sparql-gremlin needs to be part of
> that
> > > release, so if it's not ready then, it's not ready and it releases with
> > > 3.3.3. You'll find that with TinkerPop, we tend to release when
> software is
> > > "ready" and not by setting long range time deadlines for ourselves. So,
> > > don't worry about when we release sparql-gremlin too much. Let's stay
> > > focused on just getting the code right.
> > >
> > > Thanks for your understanding.
> > >
> > >
> > >
> > >
> > > On Mon, Dec 18, 2017 at 5:01 PM, Harsh Thakkar <[email protected]>
> wrote:
> > >
> > >> Hello Stephen,
> > >>
> > >> Alright, I will remove the bsbm file from the repository and I refer
> to
> > >> it in the docs (with some examples) sharing a link to download from
> the
> > >> website if that is acceptable. No worries.
> > >> Should I also remove the northwind file?
> > >>
> > >>
> > >> Your expectations are reasonable, it was just that I wasn't very clear
> > >> about what needs to be done. Now it is pretty much clear. It will
> take some
> > >> time for me to wrap my head around the specifics of the tinkerpop
> codebase
> > >> in order to satisfy the 3 requirements. I will need some help (quite
> > >> possibly) with getting things right as far as the DSL pattern for the
> > >> gremlin language variants is concerned. I am already reading the
> dev-docs
> > >> on this, from here:
> > >> http://tinkerpop.apache.org/docs/current/reference/#dsl
> > >>
> > >> Also, since you are very well versed in the test suite, I would also
> > >> request some assistance for the same when we are there :) as it is our
> > >> first time pushing a work to the production level. So bear with us :)
> > >>
> > >> I agree with you on not having any API shifts, this does certainly not
> > >> give a good impression, also its a lot of effort down the drain.
> Quality
> > >> must be ensured.
> > >>
> > >> One question, though there is not a strict deadline, when is the 3.3.2
> > >> release planned?
> > >>
> > >> Cheers,
> > >> Harsh
> > >>
> > >>
> > >> On 2017-12-18 20:48, Stephen Mallette <[email protected]> wrote:
> > >> > A quick note about (4) - Having some sample data for user
> convenience is
> > >> > good. Files like that though should not be "resources", but should
> be
> > >> added
> > >> > here:
> > >> >
> > >> > https://github.com/harsh9t/tinkerpop/tree/master/data
> > >> >
> > >> > Placing those files there will allow them to be included in the the
> .zip
> > >> > distribution files we produce for Gremlin Console and Gremlin
> Server.
> > >> Now,
> > >> > that BSBM file is a bit much. It's 90M in size and 22M compressed to
> > >> zip.
> > >> > Either way, that's going to push our already large zip distributions
> > >> bigger
> > >> > than they should be. I don't think the value of this file is worth
> the
> > >> > that. We can definitely make it available as a separate download
> from
> > >> the
> > >> > web site if everyone thinks it's that important and then provide
> links
> > >> to
> > >> > it, but I don't think it should be in the source repository as it is
> > >> now.
> > >> >
> > >> > Aside from (4) I just wanted to make some general points about my
> > >> > expectations for a sparql-gremlin being part of a TinkerPop release
> > >> branch.
> > >> > Apologies if this wasn't clear from when we started. I think we
> need to
> > >> see
> > >> > sparql-gremlin as close to a final form as possible before we look
> to
> > >> merge
> > >> > it. By "final" I mean:
> > >> >
> > >> > 1. sparql-gremlin has a full test suite - that means good unit test
> > >> > coverage at a minimum and integration tests as necessary (and I
> sense
> > >> they
> > >> > will be necessary). I agree with marko, that we also have to
> consider
> > >> the
> > >> > testing pattern carefully, so that we set the stage properly for
> future
> > >> > languages.
> > >> > 2. sparql-gremlin has a clear and easy method of usage that is
> > >> consistent
> > >> > with how TinkerPop works - this is crucial prior to merge because
> > >> TinkerPop
> > >> > has high profile production usage. once merged sparql-gremlin will
> > >> > immediately be consumed by users and we will not want to shift that
> API
> > >> > once it is available. we will break the code of too many people if
> we do
> > >> > that. we need to strive to get this right from the start.
> > >> > 3. sparql-gremlin has a good body of user documentation.
> > >> >
> > >> > I don't think any of this is insurmountable, but it does mean there
> is a
> > >> > fair bit of work to do and it won't happen overnight. We held
> > >> > gremlin-dotnet to the same rigorous level before merging and even
> > >> > gremlin-javascript all these months later is still not merged for
> > >> basically
> > >> > the same reasons, so this is just the process that we tend to go
> > >> through.
> > >> > If we follow what we did for the GLVs, we will likely follow this
> basic
> > >> > process:
> > >> >
> > >> > 1. You get sparql-gremlin "pretty close" to final in your fork
> > >> > 2. Once we all agree that you are "pretty close", you offer the pull
> > >> request
> > >> > 3. We merge it into a TinkerPop branch for further evaluation (this
> > >> will be
> > >> > a development branch and not a release branch)
> > >> > 4. We work together to get the development branch "final"
> > >> > 5. We issue a pull request from that development branch
> > >> > 6. The pull request goes through the standard review/vote process
> and we
> > >> > merge to a release branch.
> > >> > 7. sparql-gremlin will likely be part of 3.3.2 release
> > >> >
> > >> > I hope that make sense.
> > >> >
> > >> >
> > >> > On Mon, Dec 18, 2017 at 12:26 PM, Marko Rodriguez <
> [email protected]
> > >> >
> > >> > wrote:
> > >> >
> > >> > > Actually, my (3) is bad. Given that query() would always return a
> > >> > > Traversal<S, Map<String,E>>, it would be necessary to have that
> > >> linearized
> > >> > > to Traversal<Vertex,Vertex> for the test suite to validate it.
> That
> > >> would
> > >> > > mean making SPARQLTraversal support extended Traversal methods
> like
> > >> > > flatMap(), blah, blah… That seems excessive, though convenient.
> > >> > >
> > >> > > Hm…… Thoughts?,
> > >> > > Marko.
> > >> > >
> > >> > > http://markorodriguez.com
> > >> > >
> > >> > >
> > >> > >
> > >> > > > On Dec 18, 2017, at 9:21 AM, Marko Rodriguez <
> [email protected]>
> > >> > > wrote:
> > >> > > >
> > >> > > > Hello,
> > >> > > >
> > >> > > > A couple of items worth considering.
> > >> > > >
> > >> > > > Regarding (7), that should be done prior to master/ merge. It is
> > >> > > necessary to follow the patterns that are established in TinkerPop
> > >> > > regarding language interoperability. The DSL pattern developed for
> > >> Gremlin
> > >> > > language variants seems to be the best pattern for distinct
> languages
> > >> as
> > >> > > well. In essence, if your language is not a fluent language, and
> > >> instead,
> > >> > > uses a String, then it should be wrapped as such in a fluent
> interface
> > >> > > using all the Strategy, Step, and Traversal methods that makes
> sense
> > >> so it
> > >> > > works within the larger infrastructure of TinkerPop (e.g.
> testing! —
> > >> see
> > >> > > below). What I proposed in my previous email seems the easiest and
> > >> cleanest
> > >> > > way to do things.
> > >> > > >
> > >> > > > Regarding (3), testing is crucial. Given that this would be
> > >> TinkerPop’s
> > >> > > first distinct language, we don’t have a pattern set forth for
> > >> testing.
> > >> > > However, this doesn’t mean we can’t improvise on our current
> model.
> > >> Off the
> > >> > > top of my head, perhaps the best way would be to follow the
> > >> > > ProcessTestSuite and do the SPARQL variants of those. For
> instance:
> > >> > > >
> > >> > > >       https://github.com/apache/tinkerpop/blob/master/gremlin-
> > >> > > test/src/main/java/org/apache/tinkerpop/gremlin/process/
> > >> > > traversal/step/map/VertexTest.java#L62 <
> https://github.com/apache/
> > >> > > tinkerpop/blob/master/gremlin-test/src/main/java/org/apache/
> > >> > > tinkerpop/gremlin/process/traversal/step/map/VertexTest.java#L62>
> > >> > > >
> > >> > > > The SPARQL test version would be:
> > >> > > >
> > >> > > > @Override
> > >> > > > public Traversal<Vertex, Vertex> get_g_VX1X_out(final Object
> v1Id) {
> > >> > > >   return sparql.query(“SELECT ?x WHERE {“ + toURI(v1Id) + “ ?a
> ?x
> > >> }”);
> > >> > > > }
> > >> > > >
> > >> > > > In this way, sparql is your SPARQLTraversalSource for each test
> and
> > >> > > query() will return a Traversal typed according (query() will
> have to
> > >> have
> > >> > > solid generic support). From there, you would implement each and
> > >> every test
> > >> > > that is semantically possible with SPARQL (where SPARQ won’t be
> able
> > >> to
> > >> > > semantically cover all Gremlin tests).
> > >> > > >
> > >> > > > Stephen has done a lot of recent work to generalize our test
> suite
> > >> out
> > >> > > of Java so it is in a language agnostic form. I haven’t been
> > >> following that
> > >> > > work so I’m not sure what I’m am saying above is exactly as it
> should
> > >> be
> > >> > > done, but it is a start.
> > >> > > >
> > >> > > > HTH,
> > >> > > > Marko.
> > >> > > >
> > >> > > > http://markorodriguez.com <http://markorodriguez.com/>
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > >> On Dec 18, 2017, at 7:43 AM, Harsh Thakkar <[email protected]
> > >> <mailto:
> > >> > > [email protected]>> wrote:
> > >> > > >>
> > >> > > >> Hi Stephen and All,
> > >> > > >>
> > >> > > >> Thanks for going through the code. I address your questions
> below
> > >> (in
> > >> > > the same order):
> > >> > > >>
> > >> > > >> 1. Yes, this file can be removed. It was just to test the
> traversal
> > >> > > method.
> > >> > > >>
> > >> > > >> 2. Yes, I have commented out the block of tests at this moment
> > >> since we
> > >> > > do not need to run tests at mvn clean install time. However, I
> kept
> > >> it (in
> > >> > > commented out form) if there arose a need in future for the same.
> It
> > >> can
> > >> > > surely be removed if you think, it won't be necessary.
> > >> > > >>
> > >> > > >> 3. There were two testing units (we continued them from
> Daniel's
> > >> > > version), one to check whether the prefixes are being encoded
> > >> correctly,
> > >> > > the second one is to test whether the generated traversal is
> correct
> > >> (in
> > >> > > short the compiler is functioning as it should). Since, we
> extended
> > >> > > previous work supporting a variety of SPARQL operators, more test
> > >> cases can
> > >> > > be added to validate that each of these is functioning as
> expected.
> > >> > > However, as I mentioned in point #2. we need not do it explicitly
> as
> > >> we
> > >> > > (Dharmen and I) have already tested them on 3-4 different
> datasets and
> > >> > > query-sets. Now, since we did not know if that was going to be
> > >> formally
> > >> > > required in the future or not, we left them as it is, just
> commented
> > >> it out.
> > >> > > >>
> > >> > > >> 4. These resources are the graphml files that we wish to
> provide
> > >> the
> > >> > > users, for (i) loading and querying famous datasets - the Berlin
> > >> SPARQL
> > >> > > Benchmark (BSBM)  (famous in the Semantic Web-RDF community) so
> that
> > >> they
> > >> > > do not have to look elsewhere for the same. (ii) Also, it
> provides a
> > >> strong
> > >> > > use-case for demonstrating the applicability of sparql-gremlin
> > >> (creates
> > >> > > trust in the SW community users) and (iii) to keep the plug-in
> pretty
> > >> much
> > >> > > self-dependent.
> > >> > > >>
> > >> > > >> 5 & 6  YES, damn it. The IDE did this. I will revert these
> changes.
> > >> > > It's like when you are not looking, the IDE does things on it own
> :-/
> > >> > > apologies!
> > >> > > >>
> > >> > > >> 7. Regarding, Marko's thoughts -- Yes, I was waiting for you to
> > >> reply
> > >> > > to the thread. I do have some thoughts on this. But first, I was
> > >> wondering
> > >> > > if this (what Marko suggested) is supposed to be entirely
> implemented
> > >> in
> > >> > > the current version of sparql-gremlin 0.2, i.e. including the
> > >> > > withStrategies() and withStrategies() and remote() features, or
> it is
> > >> to be
> > >> > > supported eventually (after the sparql-gremlin 0.2.0) plugin is
> > >> rolled out.
> > >> > > Also, I am not entirely sure I got what Marko was exactly
> suggesting.
> > >> I
> > >> > > bring this to light in the in-line style reply to Marko's comment
> > >> later
> > >> > > here.
> > >> > > >>
> > >> > > >> The current implementation is more of a typical compiler, the
> > >> users,
> > >> > > however, can use it by specifying the query file and the dataset
> > >> against
> > >> > > which it is to be executed via the command (once in the gremlin
> > >> shell):
> > >> > > >>
> > >> > > >> gremlin> graph = TinkerGraph.open(..)
> > >> > > >> gremlin> SparqlToGremlinCompiler.convertToGremlinTraversal(
> graph,
> > >> > > "SELECT ?a WHERE {....} ")
> > >> > > >> ==>{?x:marko, ?y:29}
> > >> > > >> ==>{?x:josh, ?y:32}
> > >> > > >>
> > >> > > >>
> > >> > > >> i.e. load a graph using pre-defined tinkerpop methods (
> graph.io <
> > >> > > http://graph.io/>(IoCore.gryo()).readGraph(graphName),
> > >> > > TinkerGraph.open(), etc ) , then execute the traversal as above
> with
> > >> > > arguments -- (graph, queryString), where queryString = "SPARQL
> query".
> > >> > > >>
> > >> > > >> Now Let me quote Marko's comment and reply in-line to bring
> more
> > >> > > clarity:
> > >> > > >>
> > >> > > >> 1. There should be a SPARQLTraversalSource which supports one
> spawn
> > >> > > method â€” query(String).
> > >> > > >>      This is already happening inside the code. Therefore, we
> do
> > >> not
> > >> > > need to mention it explicitly. Please correct me if I got it wrong
> > >> here.
> > >> > > >>
> > >> > > >> 2. SPARQLTraversal is spawned and it only supports only the
> > >> Traversal
> > >> > > methods â€” next(), toList(), iterate(), etc.
> > >> > > >>      All traversal methods that are supported, available to a
> > >> regular
> > >> > > gremlin traversal, can be used by the sparql-gremlin compiler
> > >> generated
> > >> > > traversal as well.
> > >> > > >>
> > >> > > >> 3. query(String) adds a ConstantStep(String).
> > >> > > >>             This is happening internally (as shown in the
> example
> > >> > > above), we can also make explicit. i.e. let the user only provide
> the
> > >> > > queryString instead of the whole "SparqlToGremlinCompiler.
> > >> > > convertToGremlinTraversal(graph, "SELECT ?a WHERE {....} ")"
> command.
> > >> > > Does this make sense? or am I missing something here.
> > >> > > >>
> > >> > > >>
> > >> > > >> 4. SPARQLTraversalSource has a registered SPARQLStrategy.
> > >> > > >>      At this moment, we leave it to the default setting for
> this
> > >> > > strategy selection.
> > >> > > >>
> > >> > > >> 5. SPARQLTraversalSource should also support withStrategies(),
> > >> > > withoutStrategies(), withRemote(), etc.
> > >> > > >>      Once the traversal is generated, it can support all
> strategies
> > >> > > like any other gremlin traversal. Does this make sense to you?
> > >> > > >>
> > >> > > >> In a nutshell,
> > >> > > >> What is happening is that we are converting the SPARQL
> queryString
> > >> into
> > >> > > a gremlin traversal and leave it upto the tinkerpop compiler to
> > >> choose what
> > >> > > is best for it.
> > >> > > >> We only map a SPARQL query to its corresponding pattern
> matching
> > >> > > gremlin traversal (i.e. using with .match() clause). Since, the
> > >> > > expressibility of SPARQL is less than that of Gremlin (i.e.
> SPARQL 1.0
> > >> > > doesn't support/allow  performing looping and traversing
> operations),
> > >> we
> > >> > > can only map what is in the scope of SPARQL language to Gremlin.
> Once
> > >> the
> > >> > > traversal is generated, it is left to the tinkerpop compiler to
> > >> select and
> > >> > > execute a wide range of strategies ( various levels of
> optimizations,
> > >> et).
> > >> > > >>
> > >> > > >> NOTE - Also, Right now the sparql-gremlin compiler returns the
> > >> > > traversal (string) and not Bytecode. Returning the Bytecode is
> also
> > >> > > completely possible, if you want so. We can just perform
> > >> > > traversal.asAdmin().getBytecode() for this and it is done.
> > >> > > >>
> > >> > > >> Since, we extended Daniel's work, we have not changed the
> names of
> > >> > > classes, methods and variable which were used. This, however, can
> be
> > >> > > changed, if you suggest so.
> > >> > > >>
> > >> > > >> 8. Yes, working in the academia doesn't groom you much on the
> > >> > > importance of commenting in the code by default, or for that
> matter
> > >> any
> > >> > > "good-practices". I will add appropriate comment block in each
> class
> > >> for
> > >> > > the javadocs.
> > >> > > >>
> > >> > > >> I hope the above reply address your questions to quite some
> extent.
> > >> > > Most of the issues are already handled internally (as I stated
> > >> above). We
> > >> > > can also leave some advanced features such as remote(), for the
> 0.2.1
> > >> > > release (though this is just an option) :D
> > >> > > >> Having said that, Of course, we are in no hurry for the pull
> > >> request. I
> > >> > > also believe it makes complete sense to get things right.
> > >> > > >>
> > >> > > >> Cheers!
> > >> > > >>
> > >> > > >> On 2017-12-18 14:11, Stephen Mallette <[email protected]
> > >> <mailto:
> > >> > > [email protected]>> wrote:
> > >> > > >>> Harsh, I looked at the code in a bit more detail than I have.
> > >> Here's
> > >> > > some
> > >> > > >>> thoughts/questions I had as I was going through things:
> > >> > > >>>
> > >> > > >>> 1. Can this file be removed - it doesn't appear to have any
> usage
> > >> that
> > >> > > I
> > >> > > >>> can see:
> > >> > > >>>
> > >> > > >>> https://github.com/harsh9t/tinkerpop/blob/master/sparql-
> > >> > > gremlin/src/main/java/org/apache/tinkerpop/gremlin/
> sparql/Runable.java
> > >> <
> > >> > > https://github.com/harsh9t/tinkerpop/blob/master/sparql-
> > >> > > gremlin/src/main/java/org/apache/tinkerpop/gremlin/sparql/
> > >> Runable.java>
> > >> > > >>>
> > >> > > >>> 2. I note that this entire block of tests is commented out -
> > >> should
> > >> > > that be
> > >> > > >>> removed?:
> > >> > > >>>
> > >> > > >>> https://github.com/harsh9t/tinkerpop/blob/master/sparql-
> > >> > > gremlin/src/test/java/org/apache/tinkerpop/gremlin/sparql/
> > >> > > SparqlToGremlinCompilerTest.java
> > >> > > >>>
> > >> > > >>> 3. I could be wrong, but even if you didn't remove the tests
> > >> above, it
> > >> > > >>> seems like unit testing is rather thin at this point. Am I
> missing
> > >> > > >>> something? Is there more work to do there?
> > >> > > >>>
> > >> > > >>> 4. I don't understand the nature of these resources:
> > >> > > >>>
> > >> > > >>> https://github.com/harsh9t/tinkerpop/tree/master/sparql-
> > >> > > gremlin/src/main/resources
> > >> > > >>>
> > >> > > >>> Is there any need to package those with the jar? Should those
> be
> > >> "test"
> > >> > > >>> resources instead? Do we need the really large
> data/bsbm1m.graphml
> > >> > > file for
> > >> > > >>> any specific reason?
> > >> > > >>>
> > >> > > >>> 5. What are these changes to these poms?
> > >> > > >>>
> > >> > > >>> https://github.com/harsh9t/tinkerpop/commit/
> > >> > > cb3b6512ea3536f556108e5a257c4586aa4d157a
> > >> > > >>>
> > >> > > >>> I assume that your IDE did that accidentally and it was not
> > >> intended.
> > >> > > >>> Please revert that change.
> > >> > > >>>
> > >> > > >>> 6. This looks odd too - gremlin-shaded repeated again and
> again
> > >> and
> > >> > > again:
> > >> > > >>>
> > >> > > >>> https://github.com/harsh9t/tinkerpop/commit/
> > >> > > 143d16f20dcaa9c915b96cdd4adf7b1504db5d36#diff-
> > >> > > 9e90009f097eabeb25c28159571fc6a2R118
> > >> > > >>>
> > >> > > >>> 7. Did you have any thoughts in reference to Marko's earlier
> > >> reply that
> > >> > > >>> described how sparql-gremlin should be used? Right now, it
> seems
> > >> like
> > >> > > the
> > >> > > >>> code you have there is just the "engine" but lacks the piece
> that
> > >> > > connects
> > >> > > >>> it into the rest of the stack. From my perspective, I think we
> > >> need to
> > >> > > be
> > >> > > >>> sure that users have an easy, clear and consistent way to use
> > >> > > >>> sparql-gremlin before we can merge this work. Obviously,
> having
> > >> that
> > >> > > aspect
> > >> > > >>> of the code thought through will impact the documentation
> that you
> > >> > > write as
> > >> > > >>> well, so I think you need to go down this path a bit further
> > >> before we
> > >> > > get
> > >> > > >>> to the pull request stage.
> > >> > > >>>
> > >> > > >>> 8. We aren't big javadoc sticklers here, but we try to at
> least
> > >> get
> > >> > > class
> > >> > > >>> level javadoc in place for most classes. I don't see much
> javadoc
> > >> or
> > >> > > >>> comments in the code right now. I think I'd like to see a
> modicum
> > >> of
> > >> > > >>> javadoc/comments present as part of this work.
> > >> > > >>>
> > >> > > >>> So, that's my broad level feedback at this point. It seems as
> > >> though
> > >> > > there
> > >> > > >>> are some reasonably large issues there to contend with before
> a
> > >> pull
> > >> > > >>> request is worth issuing. That's not a problem, of
> course....we
> > >> will
> > >> > > just
> > >> > > >>> keep iterating toward the goal. I'm not aware of anything
> that is
> > >> > > pushing
> > >> > > >>> us to rush to a pull request - I'm of the opinion that we can
> > >> take the
> > >> > > time
> > >> > > >>> to get this right.
> > >> > > >>>
> > >> > > >>> Thanks,
> > >> > > >>>
> > >> > > >>> Stephen
> > >> > > >>>
> > >> > > >>>
> > >> > > >>> On Fri, Dec 15, 2017 at 1:46 PM, Joshua Shinavier <
> > >> [email protected]>
> > >> > > wrote:
> > >> > > >>>
> > >> > > >>>> Hi Marko,
> > >> > > >>>>
> > >> > > >>>> I think we're more or less on the same page here; it's clear
> > >> that TP3
> > >> > > has a
> > >> > > >>>> different API than TP2. If you look at the guts of TP3
> GraphSail
> > >> [1],
> > >> > > it
> > >> > > >>>> uses the modern APIs, and yet does adapt them to the Sail
> > >> interface.
> > >> > > >>>>
> > >> > > >>>> Something like PropertyGraphSail (or an equivalent Jena
> thing)
> > >> still
> > >> > > makes
> > >> > > >>>> sense in TP3, as well. One interesting detail here is that in
> > >> TP3,
> > >> > > vertices
> > >> > > >>>> can have labels, which can be turned into rdf:type statements
> > >> (that,
> > >> > > in
> > >> > > >>>> turn, can be used to enable subclass/superclass inheritance
> if
> > >> the
> > >> > > graph is
> > >> > > >>>> combined with a RDF schema.
> > >> > > >>>>
> > >> > > >>>> A TP3 equivalent of SailGraph would indeed be quite
> different in
> > >> > > >>>> implementation -- strategies, not wrapper graph -- than what
> we
> > >> had
> > >> > > for
> > >> > > >>>> Blueprints, and yet would serve the same purpose.
> > >> > > >>>>
> > >> > > >>>> Josh
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>> [1]
> > >> > > >>>> https://github.com/joshsh/graphsail/tree/master/src/
> > >> > > >>>> main/java/net/fortytwo/tpop/sail
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>>
> > >> > > >>>> On Fri, Dec 15, 2017 at 10:22 AM, Marko Rodriguez <
> > >> > > [email protected]>
> > >> > > >>>> wrote:
> > >> > > >>>>
> > >> > > >>>>> Hello,
> > >> > > >>>>>
> > >> > > >>>>> The model proposed below is in-line with TinkerPop2â€™s way
> of
> > >> > > thinking.
> > >> > > >>>>> Unfortunately, TinkerPop3 and more so for TinkerPop4, the
> Graph
> > >> > > >>>> â€œstructure"
> > >> > > >>>>> API will become deprecated. This means that the notion of
> > >> > > â€œwrapping the
> > >> > > >>>>> Graph APIâ€  has gone away for TP3 and will be completely
> gone
> > >> in
> > >> > > TP4. In
> > >> > > >>>>> TP4, there will not even be a Graph API â€” no more Vertex,
> > >> Edge,
> > >> > > Property,
> > >> > > >>>>> etc. Only the concept of a Graph with only methods like
> > >> > > >>>> Graph.traversal(),
> > >> > > >>>>> Graph.partitions(), etc.
> > >> > > >>>>>
> > >> > > >>>>> Why was this route taken? In TinkerPop3, there was a need to
> > >> support
> > >> > > any
> > >> > > >>>>> language besides Java. This was why Gremlin bytecode and the
> > >> concept
> > >> > > of
> > >> > > >>>> the
> > >> > > >>>>> Gremlin traversal machine was introduced. A provider simply
> gets
> > >> > > Gremlin
> > >> > > >>>>> bytecode and has to do something with it. For the Java-based
> > >> Gremlin
> > >> > > >>>>> traversal machine, this is why providers implement their own
> > >> > > GraphStep,
> > >> > > >>>>> VertexStep, etc. For a Python-based Gremlin traversal
> machine,
> > >> > > likewiseâ€¦
> > >> > > >>>>>
> > >> > > >>>>> This means that SailGraph, GraphSail, PropertyGraphSail as
> > >> stated
> > >> > > below
> > >> > > >>>>> donâ€™t make sense in the current and future architectures.
> > >> > > >>>>>
> > >> > > >>>>> The next question becomes, "well how would you turn an RDF
> store
> > >> > > into a
> > >> > > >>>>> PropertyGraph?â€  Easy â€” implement your own custom
> GraphStep,
> > >> > > VertexStep,
> > >> > > >>>>> etc. and respective ProviderStrategies that will handle the
> > >> bytecode
> > >> > > >>>>> compilation accordingly.
> > >> > > >>>>>
> > >> > > >>>>> The next question becomes, â€œwell how would a PropertyGraph
> > >> support
> > >> > > >>>>> reasoning?â€  Easy â€” implement your own custom
> > >> DecorationStrategy
> > >> > > that will
> > >> > > >>>>> insert reasoning into the traversal giving the RDFS schema.
> For
> > >> > > instance:
> > >> > > >>>>>        g.V().out(â€œlikesâ€ )
> > >> > > >>>>>                ==>
> > >> > > >>>>>        g.V().out(â€œknowsâ€ ,â€ likesâ€ )
> > >> > > >>>>>                iff â€œlikesâ€  is a sub-property of
> â€œknowsâ€
> > >> > > >>>>>
> > >> > > >>>>> In essence, it is possible to do this integration of RDF and
> > >> > > TinkerPop,
> > >> > > >>>> it
> > >> > > >>>>> just needs to be done at the correct level of abstraction so
> > >> that it
> > >> > > >>>> stays
> > >> > > >>>>> in line with how TinkerPop is evolving, not how it was back
> in
> > >> 2012.
> > >> > > >>>>>
> > >> > > >>>>> Take care,
> > >> > > >>>>> Marko.
> > >> > > >>>>>
> > >> > > >>>>> http://markorodriguez <http://markorodriguez/>.com
> > >> > > >>>>>
> > >> > > >>>>>
> > >> > > >>>>> On 2017-12-13 07:46, Joshua Shinavier <[email protected]>
> > >> wrote:
> > >> > > >>>>>> Hi Harsh,>
> > >> > > >>>>>>
> > >> > > >>>>>> Glad you are taking Daniel's work forward. In porting the
> code
> > >> to
> > >> > > the>
> > >> > > >>>>>> TinkerPop code base, might I suggest we allow for not only
> > >> > > >>>>> SPARQL-Gremlin,>
> > >> > > >>>>>> but a whole suite of RDF tools as in TP2. Perhaps call the
> > >> module>
> > >> > > >>>>>> rdf-gremlin. Then we could have all of:>
> > >> > > >>>>>>
> > >> > > >>>>>> * SPARQL-Gremlin: executes standard SPARQL queries over a
> > >> Property
> > >> > > >>>> Graph>
> > >> > > >>>>>> database>
> > >> > > >>>>>> * GraphSail [1,2]: stores RDF quads in the database,
> > >> explicitly,
> > >> > > and>
> > >> > > >>>>>> enables SPARQL and triple pattern queries over the quads>
> > >> > > >>>>>> * PropertyGraphSail [3]: exposes a Property Graph with of
> two
> > >> > > mappings
> > >> > > >>>>> to>
> > >> > > >>>>>> the RDF data model>
> > >> > > >>>>>> * SailGraph [4]: takes an RDF triple store (not natively
> > >> supporting>
> > >> > > >>>>>> Gremlin) and enables Gremlin queries>
> > >> > > >>>>>> * others? I have often thought that a continuous SPARQL
> > >> > > implementation>
> > >> > > >>>>>> built on Gremlin would be powerful>
> > >> > > >>>>>>
> > >> > > >>>>>> The biggest mismatch between the TP2 suite and what might
> be
> > >> built
> > >> > > for>
> > >> > > >>>>>> Apache TinkerPop is that the previous suite was implemented
> > >> using
> > >> > > >>>>> (Eclipse)>
> > >> > > >>>>>> RDF4j, whereas things seem to be leaning towards (Apache)
> Jena
> > >> now.>
> > >> > > >>>>>> However, the same principles could be applied.>
> > >> > > >>>>>>
> > >> > > >>>>>> Josh>
> > >> > > >>>>>>
> > >> > > >>>>>>
> > >> > > >>>>>> [1] https://github.com/tinkerpop/blueprints/wiki/Sail-
> > >> > > Ouplementation>
> > >> > > >>>>>> [2] https://github.com/joshsh/graphsail>
> > >> > > >>>>>> [3]>
> > >> > > >>>>>> https://github.com/tinkerpop/blueprints/wiki/
> PropertyGraphSa
> > >> il-
> > >> > > >>>>> Ouplementation>
> > >> > > >>>>>> [4] https://github.com/tinkerpop/blueprints/wiki/Sail-
> > >> > > Implementation
> > >> > > >>>>>
> > >> > > >>>>> http://markorodriguez.com
> > >> > > >
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> > >
> >
>

Re: [Discussed] Integrating SPARQL-Gremlin 0.2 Plugin with the TinkerPop codebase

Reply via email to